Wednesday, June 11, 2008

URL decay in MEDLINE abstracts

Published today in BMC Medical Informatics and Decision Making:

Ducut E, Liu F, Fontelo P. An update on Uniform Resource Locator (URL) decay in MEDLINE abstracts and measures for its mitigation. BMC Medical Informatics and Decision Making 2008, 8:23.

An excerpt from the abstract:
Methods: MEDLINE records from 1994 to 2006 from the National Library of Medicine in Extensible Mark-up Language (XML) format were processed yielding 10,208 URL addresses. These were accessed once daily at random times for 30 days. Titles and abstracts were also searched for the presence of archival tools such as WebCite, Persistent URL (PURL) and Digital Object Identifier (DOI).

Results: Results showed that the average URL length ranged from 13 to 425 characters with a mean length of 35 characters [Standard Deviation (SD) = 13.51; 95% confidence interval (CI) 13.25 to 13.77]. The most common top-level domains were ".org" and ".edu", each with 34%. About 81% of the URL pool was available 90% to 100% of the time, but only 78% of these contained the actual information mentioned in the MEDLINE record. "Dead" URLs constituted 16% of the total. Finally, a survey of archival tool usage showed that since its introduction in 1998, only 519 of all abstracts reviewed had incorporated DOI addresses in their MEDLINE abstracts.
(A quick PubMed search for "http*[tiab]" (without the quotes) turns up some examples of URLs included within abstracts.)
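
For anyone curious how this kind of check might look in practice, here is a rough Python sketch of the two steps the abstract describes: pulling URLs out of abstract text, then scoring each URL by the share of daily checks that succeeded. The paper does not say how the authors did either step, so the regular expression, the function names, and the 0.9 cut-off (chosen to match the "90% to 100% of the time" band quoted above) are all my own assumptions. The daily results are passed in as a plain list of booleans, so nothing here touches the network.

```python
import re

# Hypothetical pattern: an assumption, not the extraction rule used in the paper.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s<>\"')\]]+", re.IGNORECASE)


def extract_urls(abstract_text):
    """Return the URLs found in abstract text, with trailing punctuation stripped."""
    return [match.rstrip(".,;:") for match in URL_PATTERN.findall(abstract_text)]


def availability(daily_results):
    """Fraction of daily checks that succeeded (True means the URL responded)."""
    return sum(daily_results) / len(daily_results)


def is_highly_available(daily_results, threshold=0.9):
    """True if the URL responded on at least `threshold` of the checks (assumed cut-off)."""
    return availability(daily_results) >= threshold
```

With 30 daily checks, as in the study, a URL that failed on three days would still count as highly available under this cut-off, while one that failed on four days would not. Note that this only measures whether a URL responds, not whether the page still holds the content the abstract describes, which the study assessed separately.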

