1 Center for Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA;
2 Center for Dental Informatics, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA;
Correspondence: * corresponding author, wcb{at}cbmi.upmc.edu
| Abstract |
|---|
KEY WORDS: dental research, information retrieval, inter-rater agreement, MEDLINE, MeSH, data mining, text classification
| Introduction |
|---|
The impetus for developing the methodology reported in this paper was a very high-level question: "What are the characteristics of the dental and craniofacial research literature available through MEDLINE, such as the frequency and relationship between research topics? What research trends are evident from the literature?" Bibliometric and content analyses of the literature are common methods to answer such questions and, in doing so, describe a scientific field. Co-occurrence of authors (Marion, 2002), journals (Morris and McCain, 1998), or indexing terms (Marion, 2002) in a body of literature can be used to visualize the structure of a scientific field and determine its boundaries. This type of content analysis has been performed in many fields, including medical informatics, information retrieval, and software engineering (Morris and McCain, 1998; Ding and Foo, 1999; Marion, 2002). Free-text analyses are becoming more prevalent to characterize documents and to determine relationships among them. Recent studies using free text to classify documents identified patient subgroups in dictated chest radiography reports (Chapman et al., 2001) and classified Web pages (Moore et al., 1997; Asirvatham, 2003). Once the relationships among documents have been determined, various statistical methods, such as multidimensional scaling and hierarchical clustering, can be used to quantify those relationships.
In this paper, we describe a method for retrieving the dental and craniofacial research literature from MEDLINE. We also present an analysis of a subset of this literature to illustrate what a "bird's-eye" view of the dental research literature could look like. We hope that our method, once refined, will be useful in answering more general research questions using the MEDLINE database.
| Background and Significance |
|---|
Human indexers make relatively complex decisions when assigning MeSH terms to an article. After reading the complete article, they pick major headings to describe the paper's main focus. Other MeSH terms add more detail to the article's description. The indexers must not only be very familiar with the terms in the MeSH hierarchy but must also predict the searcher's behavior and assign terms that the average searcher would be expected to choose when attempting to find articles about the specific subject(s). A study of indexing consistency in MEDLINE has shown that indexers tend to be more consistent with general MeSH terms and less consistent when using more specific terms (Funk and Reid, 1983). A key problem in indexing is that not all general descriptors that could possibly apply to an article can actually be assigned to it. For example, if a researcher wants to search only papers within the domain of dental research, MeSH indexing is of little help. Searching MEDLINE for articles tagged with the MeSH term DENTAL RESEARCH yielded approximately 850 citations as of August 2003. Most of these papers were about the topic of dental research rather than being dental research papers themselves. In contrast, previous analyses of the dental literature have focused on specific and relatively narrow topics, such as clinical evidence in pediatric dentistry (Yang et al., 2001), orthodontics (Sun et al., 2000), implantology (Russo et al., 2000) and prosthodontics (Nishimura et al., 2002), oro-facial pain (Macfarlane et al., 2001), and randomized controlled trials in dentistry (Sjögren and Halling, 2000).
Published analyses of the literature in a scientific field have used several approaches for identifying the target literature and content analysis. In a recent study of biomedical informatics, the investigators began by identifying journals associated with this field (Morris and McCain, 1998). Then, they performed intercitation studies among productive journal titles and examined co-citation data for proposed core journals using multivariate analyses. The study indicated the presence of a core literature and identified several major research areas. An analysis of the dental research literature faces several challenges and, therefore, must draw on multiple methods. First, it is very difficult to "bound" the field of dental research by attempting to identify a set of core journals, as Morris and McCain did in the biomedical informatics study. Due to dentistry's multi-disciplinarity, its research tends to be published in many different journals, including those in medicine, biomedical engineering, basic sciences, and psychology (Bush, 1996; Macfarlane et al., 2001). Focusing on specific journals would therefore limit one's view of dental research significantly. On the other hand, locating the collection of individual dental research papers from MEDLINE is made difficult by the retrieval challenges discussed above.
Computer-based methods for analyzing text can be helpful in overcoming this obstacle. With these methods, a computer program is trained to search a collection of documents based on the characteristics of a training set of documents similar to those that the searcher would like to retrieve. Various algorithms have been developed and evaluated for classifying or categorizing text (Cavnar and Trenkle, 1994; Lewis and Gale, 1994; Lewis and Linguette, 1994; Yang, 1999). One advantage of these methods is that they can be applied to any text, regardless of source, and they can process very large collections of documents in a short period of time. The success of these methods depends on how well they can discriminate relevant documents from irrelevant ones, and usually involves some trade-off between sensitivity and specificity.
The approach described in this paper is significant for two reasons. First, it provides a method for retrieving and analyzing the dental research literature. Such a comprehensive analysis has not been conducted, but may be useful for several purposes. For instance, it could provide a "bird's-eye" view of dental research as a field, which may be useful to policymakers, funding agencies, and researchers. Analyzing changes in dental research topics over time may provide an indication of past, current, and emerging trends. Second, our methodology may be used to facilitate searches within global topics that cannot be easily delineated using current search engines. For example, our methodology could serve as a filter for searches that target only articles within a specific area of dental research and, thus, may increase specificity for researchers interested only in papers from their field. We are not attempting to evaluate or compare our retrieval performance with text classification methods; our aim is to demonstrate to the dental research community how the science of informatics can be applied to their field.
| Methods |
|---|
Searching MeSH for dental and craniofacial terms
We examined the MeSH manually and recorded all terms that pertained to a dental or craniofacial topic. Iteratively, one of the authors (WCB) and a dental research librarian compared results until they were confident that they had all relevant dental terms. If a term existed as a child of a term already chosen, that term was not included. All terms chosen were exploded for the final search. For example, ORTHODONTICS occurs lower in the MeSH hierarchy than DENTISTRY. Because we included DENTISTRY in our search, all terms below it, including ORTHODONTICS, were part of the final search. Terms were chosen with the goal of higher recall, or sensitivity, since we did not want to exclude any relevant articles from our retrieved set. As a consequence, we purposely accepted the trade-off of more irrelevant citations in our resulting set.
We used the OVID interface to search the MEDLINE database. Once our search strategy was complete, we limited it to English-language articles that contained abstracts and were published between 1966 and 2002. We omitted the following publication types: comment; editorial; biography; historical article; letter; news; review; review of reported cases; review, tutorial; case report; and dictionary. We randomly sampled 1000 articles from the resulting set for manual review by expert raters. These references were divided into 5 mutually exclusive groups of 200 articles each.
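The sampling step can be illustrated with a minimal sketch. It assumes the retrieved citations are available as a list of identifiers; the function name, the seed, and the stand-in identifier range are hypothetical and are not taken from the study.

```python
import random


def sample_and_group(citation_ids, sample_size=1000, n_groups=5, seed=0):
    """Draw a simple random sample and split it into equal, disjoint groups."""
    if sample_size % n_groups != 0:
        raise ValueError("sample_size must be divisible by n_groups")
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    sample = rng.sample(list(citation_ids), sample_size)  # without replacement
    size = sample_size // n_groups
    return [sample[i * size:(i + 1) * size] for i in range(n_groups)]


# Hypothetical stand-in for the identifiers of the filtered citations.
retrieved = range(137816)
groups = sample_and_group(retrieved)
```

Sampling without replacement and slicing the sample into consecutive blocks guarantees that the groups are mutually exclusive.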
Expert ratings
We sought out dental research experts from academia, industry, and government as volunteers to review articles. We used referrals from dental school and medical informatics faculty, along with Internet searches, to identify possible participants. We e-mailed 50 experts, and 16 agreed to participate. We assigned three experts to review each group of 200 abstracts, with the exception of one group that had four reviewers. Our final set contained 990 references, because we removed ten duplicates.
We developed a World Wide Web interface that allowed experts to review articles at their convenience and in as many sessions as needed. The interface was developed in the programming language PHP (Apache Software Foundation) and was connected to a MySQL database (MySQL AB). Reviewers were allowed to change their ratings if necessary. Reviewers were given 4–6 weeks to rate the 200 abstracts. Abstracts were randomized by publication date and among the 5 groups. The rating interface displayed only the title and abstract of each reference, and the rater, using radio buttons, was prompted to classify the text as either:
Criteria for inclusion in each category were displayed on the instructions page within the interface.
The ratings for each abstract were counted, and the reliability (Cronbach's α) was calculated for each rater group. Classifications 3 and 4 were combined, so that three classes were used in the reliability measures: 1, 2, and (either 3 or 4). Each reference was placed in a class based on a majority rating of experts. For example, in a group of abstracts with three raters, two raters must have rated a document as "Dental or craniofacial research" for an article to be placed in that category. Those articles without a majority rating were placed in the "3 or 4" (non-dental or not sure) group for classification purposes.
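A minimal sketch of the class-merging, majority-rating, and reliability steps follows. It makes two assumptions that the text does not state: a strict majority (more than half of the raters) is required, which matters for the group with four raters, and Cronbach's α is computed on the numeric class codes with each rater treated as an "item". All function names are hypothetical.

```python
from collections import Counter
from statistics import variance


def collapse(rating):
    """Merge classes 3 and 4 into a single class."""
    return 3 if rating in (3, 4) else rating


def majority_class(ratings):
    """Return the class chosen by more than half of the raters, else 3."""
    collapsed = [collapse(r) for r in ratings]
    label, count = Counter(collapsed).most_common(1)[0]
    return label if count > len(collapsed) / 2 else 3


def cronbach_alpha(rows):
    """rows: one list of collapsed ratings per abstract, one column per rater.

    Assumes the summed ratings are not all identical (non-zero variance).
    """
    k = len(rows[0])
    rater_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)
```

With three raters, `majority_class([1, 1, 2])` places the abstract in class 1, whereas a three-way split falls back to the combined "3 or 4" class.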
Probabilistic text classification
Identify Patient Sets (IPS) (Aronis et al., 1999) is a general-purpose medical record retrieval program developed at the University of Pittsburgh. Given a set of documents, IPS compiles a dictionary of all words and UMLS (Unified Medical Language System, www.nlm.nih.gov) phrases within them. This dictionary is then used to create a vector for each document, representing the presence or absence of each word in the dictionary. A training set of labeled documents is used to create probabilistic models for the classes of documents in the set.
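The presence/absence representation described above can be sketched as follows. This is not the IPS implementation: it handles single words only (UMLS phrase matching is omitted), and the tokenizer and function names are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(documents):
    """Map every distinct word in the collection to a vector position."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {word: i for i, word in enumerate(vocab)}


def to_binary_vector(document, dictionary):
    """1 if the dictionary word occurs in the document, 0 otherwise."""
    vec = [0] * len(dictionary)
    for tok in set(tokenize(document)):
        if tok in dictionary:
            vec[dictionary[tok]] = 1
    return vec
```

Because the vector records only presence or absence, a word that appears several times in an abstract contributes the same value as a word that appears once.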
IPS is trained in a binary way, that is, a document that is of the desired class is a "HIT", and one that is not in that class is a "MISS". We chose to combine the three categories that were not "dental or craniofacial research" so that IPS would make a binary classification. By using this method, we had more documents in our training and test sets than we could have if we trained and tested on 3 or even 4 classes. For instance, if an article were rated by a majority of raters in that group as "dental or craniofacial research", that article would be labeled as a "HIT" for input to IPS. An article that was not categorized as "dental or craniofacial research" was labeled as a "MISS". The "HIT" and "MISS" groups of documents are analyzed to find those words or UMLS phrases that discriminate between the two groups.
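One simple way to look for words that discriminate between the "HIT" and "MISS" groups is a smoothed log-odds comparison of document frequencies. The sketch below is a stand-in chosen for illustration; the source does not describe how IPS scores terms, and the function name and smoothing constants are assumptions.

```python
import math


def discriminating_terms(hit_docs, miss_docs, top_n=5):
    """hit_docs, miss_docs: lists of token sets, one set per document.

    Returns (term, score) pairs ranked by the absolute difference in
    smoothed log-odds of occurring in HIT versus MISS documents.
    Positive scores favour HIT, negative scores favour MISS.
    """
    vocab = set().union(*hit_docs, *miss_docs)
    scores = {}
    for term in vocab:
        # Add-one smoothing keeps the probabilities away from 0 and 1.
        p_hit = (sum(term in d for d in hit_docs) + 1) / (len(hit_docs) + 2)
        p_miss = (sum(term in d for d in miss_docs) + 1) / (len(miss_docs) + 2)
        scores[term] = (math.log(p_hit / (1 - p_hit))
                        - math.log(p_miss / (1 - p_miss)))
    return sorted(scores.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top_n]
```

A term that occurs equally often in both groups scores zero and therefore ranks last, however frequent it is overall.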
The 990 articles were divided into a training set (n = 693) and a test set (n = 297) such that the proportion of dental research articles was the same in the two sets (60%), and each article had a 30% chance of being randomly assigned to the test set. We used the single-holdout method of cross-validation in this study. That is, 70% of the rated documents were used to train IPS, and 30% of the documents were used to test the IPS models generated. These sets were mutually exclusive. In other words, a document occurring in the test set was "held out", or did not occur in the training set, and vice versa.
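A stratified single-holdout split of this kind can be sketched as below. The sketch fixes the test-set size per class instead of assigning each article independently with probability 0.30, which is an assumption made so that the class proportions match exactly; the function name and seed are hypothetical.

```python
import random


def stratified_holdout(labels, test_fraction=0.30, seed=0):
    """labels: dict mapping article id -> "HIT" or "MISS".

    Returns (train_ids, test_ids) with the same class mix in both sets.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels.values())):
        ids = sorted(i for i, lab in labels.items() if lab == cls)
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


# 591 dental research articles ("HIT") out of 990 rated articles.
labels = {i: ("HIT" if i < 591 else "MISS") for i in range(990)}
train_ids, test_ids = stratified_holdout(labels)
```

Splitting each class separately is what keeps the proportion of dental research articles the same in the training and test sets.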
Three experiments were conducted with the following inputs for each article to IPS: (1) title and abstract only, (2) MeSH terms only, and (3) title, abstract, and MeSH terms. The same training set was used in each experiment to create corresponding IPS models. Those models that attained both sensitivity and specificity of at least 0.60 on the training set were then applied to the test set to predict the category of each article in the set. For instance, a sensitivity of 0.60 of an IPS model indicated that the model correctly identified 60% of the articles rated by experts as "Dental or craniofacial research". Sensitivity, specificity, precision, and F-measure were calculated for each model applied to the test set.
Sensitivity, or recall, is the ratio of documents that IPS correctly categorized as dental research to all documents classified by experts as dental research. Specificity is the ratio of articles that IPS did not categorize as dental research to those articles judged by experts as not being dental research. Precision is analogous to positive predictive value (PPV), or the proportion of dental research documents out of all documents retrieved. F-measure is commonly used to find the best models in information retrieval and takes into account both sensitivity and precision. The F-measure was calculated according to the following equation:
$$F = \frac{2 \times \text{sensitivity} \times \text{precision}}{\text{sensitivity} + \text{precision}}$$
The constant "2" is used to weight sensitivity and precision equally. This constant can be varied if unequal weighting of sensitivity and precision is desired.
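The four measures can be computed from the counts of true and false positives and negatives, as in the following sketch. The `beta` parameter follows the common F-beta generalization, in which `beta = 1` reproduces the constant 2; whether the authors would vary the weighting in exactly this way is an assumption, and the function name is hypothetical.

```python
def evaluate(tp, fp, tn, fn, beta=1.0):
    """Return sensitivity, specificity, precision, and F-measure.

    tp/fn: expert-rated dental research articles the model did / did not flag.
    fp/tn: other articles the model did / did not flag.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    b2 = beta ** 2
    f_measure = (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }
```

Setting `beta` above 1 gives more weight to sensitivity, and setting it below 1 gives more weight to precision.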
Characteristics of gold standard set
We performed several analyses to characterize our gold standard set. As described above, these analyses can serve as a template for a larger set. We generated counts of unique journal titles in the rating categories: dental research, dental non-research, and non-dental. We then looked at the types of journals in which these were published (www.ncbi.nlm.nih.gov/entrez), i.e., dental, medical, basic science, etc., and calculated the percentage published in dental journals. We also calculated total counts of all MeSH terms and major MeSH terms that occurred in each rating category. We used programs written in Python (www.python.org) to extract these data from the OVID output files (Reprint/Medlars format) of the references retrieved.
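The counting step might look like the sketch below. It assumes the OVID output has already been parsed into dictionaries; the record layout, the field names, and the example records are hypothetical, and parsing of the Reprint/Medlars format itself is not shown.

```python
from collections import Counter, defaultdict


def summarize(records, dental_journals):
    """Count journals and MeSH terms per rating category."""
    journals = defaultdict(set)
    totals, in_dental = Counter(), Counter()
    mesh_all, mesh_major = defaultdict(Counter), defaultdict(Counter)
    for rec in records:
        cat = rec["category"]
        totals[cat] += 1
        journals[cat].add(rec["journal"])
        if rec["journal"] in dental_journals:
            in_dental[cat] += 1
        for term, is_major in rec["mesh"]:
            mesh_all[cat][term] += 1
            if is_major:
                mesh_major[cat][term] += 1
    return {
        cat: {
            "articles": totals[cat],
            "unique_journals": len(journals[cat]),
            "pct_in_dental_journals": 100.0 * in_dental[cat] / totals[cat],
            "mesh_terms": mesh_all[cat],
            "major_mesh_terms": mesh_major[cat],
        }
        for cat in totals
    }


# Hypothetical records, for illustration only.
records = [
    {"category": "dental research", "journal": "J Dent Res",
     "mesh": [("DENTAL CARIES", True), ("HUMAN", False)]},
    {"category": "dental research", "journal": "J Biomed Mater Res",
     "mesh": [("DENTAL MATERIALS", True), ("HUMAN", False)]},
]
summary = summarize(records, dental_journals={"J Dent Res"})
```

Keeping separate counters for all MeSH terms and for major MeSH terms mirrors the two totals described above.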
| Results |
|---|
Together, these three categories contained 459,758 articles. Approximately 70 additional dental MeSH terms, such as GINGIVAL CREVICULAR FLUID, STREPTOCOCCUS MUTANS, and DENTAL RECORDS, were not children of the three large categories above and were included. Many terms occurred in isolated locations in the hierarchy. For example, GINGIVAL CREVICULAR FLUID and DENTINAL FLUID are present only under EXUDATES AND TRANSUDATES in the ANATOMY category. They do not occur elsewhere in the MeSH hierarchy. Many articles were indexed with these isolated terms, including terms in BIOMEDICAL AND DENTAL MATERIALS, specific oral microbial species names such as PORPHYROMONAS GINGIVALIS, and many dental public health and education terms such as DENTAL WASTE and DENTAL EDUCATION. In total, 62,255 articles were indexed with MeSH terms that did not fall within the three major categories above. After unwanted publication types were filtered out and those containing abstracts were retained, 137,816 articles remained.
Characteristics of gold standard set
Sixteen dental research experts used the Web interface to rate articles. It took approximately two months for all 16 raters to complete their ratings. All five groups had acceptable reliability (Cronbach's α > 0.70). Table 1 contains a summary of characteristics of our rated set of 990 articles. Our gold standard set contained 72% dental articles (60% dental research and 12% dental non-research), and 13% non-dental articles. Fifteen percent of the articles did not have a majority rating. The 591 dental research articles were published in 250 different journals. Similar diversity in journal titles was seen in the other categories. Sixty-two percent of dental research articles and 68% of dental non-research articles were published in journals that are considered "dental" titles. A surprising finding was that 5% of non-dental articles were published in dental journals. The journals indexed in MEDLINE are assigned one or more of 127 general subject types (www.ncbi.nlm.nih.gov/entrez/query/fcgi?db=journals). We considered those titles labeled with "Dentistry" as dental titles.
| Discussion |
|---|
The variety of journal titles included and major MeSH terms assigned in the dental research and dental non-research sets shows that the science is very diverse and that our retrieved set encompasses various areas of clinical dentistry, the basic sciences, and biomaterials. The diversity of major MeSH terms used to describe dental and craniofacial research articles shows that using MeSH alone to retrieve such articles may be very costly in terms of time needed and knowledge of MeSH required. Using MeSH in combination with text analysis may improve success. Also, since almost 40% of dental research articles appear in non-dental journals, we may consider including other databases in further studies, e.g., BIOSIS or PsycINFO, since these articles may occur in journals that are not indexed in MEDLINE.
For this study, we considered a sensitivity of 0.70 as acceptable. Only 4 of our 11 IPS models met this goal. Since we have determined that discriminating research from non-research involves analyzing free text, and that many non-dental terms were discriminators, further work with text classification methods to increase performance is necessary. Filtering of dental articles from all others may increase performance. Since 60/72 or 84% of dental articles are dental research in our gold standard set, we may be able to retrieve the dental research literature more successfully if non-dental articles are filtered out. Considering that many words that IPS models used to discriminate dental research from non-research were non-dental terms, e.g., "statistically", "differentiation", and "subjects", IPS may perform better when non-dental articles are not included.
Another method that may increase performance is parsing the text semantically and linking it to UMLS concepts and semantic types. A semantic knowledge representation computer program, MetaMap (Aronson, 2001), developed at the National Library of Medicine, may help achieve this task of discriminating dental research from non-research by providing additional information to classify articles.
Limitations
One limitation to our study was that we grouped the dental non-research with non-dental articles for our text classification procedure. Because some dental terms were determined by IPS to be discriminatory between dental research and other articles, combining these two categories may have decreased performance. That is, a dental research article may have been excluded because it contained a dental term that did not occur in many research articles, but occurred in many non-research articles. The word may be associated with a narrow research area, and its exclusion may result in our missing a growing area of research.
A limitation to our text classification method was that the IPS models were constructed with the use of only one training set. A more comprehensive analysis of performance would include cross-validation with many training and test sets. With either ten-fold cross-validation, or the single-hold-out method, we could determine whether the differences in performance among the models were statistically significant. We plan this for future work.
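For reference, ten-fold cross-validation partitions the rated articles into ten folds and uses each fold once as the test set. The sketch below only generates the splits; it is not part of the reported study, and the function name and seed are hypothetical.

```python
import random


def k_fold_splits(ids, k=10, seed=0):
    """Yield (train_ids, test_ids) pairs, one per fold."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


splits = list(k_fold_splits(range(990)))
```

Each article appears in exactly one test fold, so a performance measure can be averaged over the ten runs and its variability estimated.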
The high prevalence (60%) of dental research articles in our gold standard set may have been another limitation. Because of this prevalence, with similar sensitivity, our precision was greater than it would be in a set with a lower percentage of dental research articles, such as all of MEDLINE. However, training IPS on such a set would have required a much larger set of documents. That is, to have a large enough number of dental research documents for a good IPS model to be constructed, more raters rating many more documents would be needed.
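The dependence of precision on prevalence follows from Bayes' rule and can be sketched as below. The sensitivity and specificity values in the example are hypothetical and are not results from this study.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Precision (positive predictive value) for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier with sensitivity = specificity = 0.70.
high = precision_at_prevalence(0.70, 0.70, 0.60)  # prevalence as in the rated set
low = precision_at_prevalence(0.70, 0.70, 0.05)   # a much rarer target class
```

With sensitivity and specificity held constant, lowering the prevalence increases the share of false positives among retrieved documents, so precision falls.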
| Conclusion |
|---|
| Acknowledgments |
|---|
| Footnotes |
|---|
| References |
|---|
Aronson A (2001). Effective mapping of biomedical text to the UMLS metathesaurus: the MetaMap program. Proc AMIA Symp 2001:17–21.
Asirvatham AKR (2003). Web page categorization based on document structure (www.iiit.net/students/stnd_pdfs/kranthi.pdf), pp. 1–9. Last accessed 9/30/2003.
Bush R (1996). Biomaterials: an introduction for librarians. Sci Technol Libr 15(4):3–17.
Cavnar W, Trenkle J (1994). N-Gram-based text categorization. In: SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, April 11–13, 1994, Las Vegas, NV, pp. 161–169.
Chapman WW, Fiszman M, Frederick PR, Chapman BE, Haug PJ (2001). Quantifying the characteristics of unambiguous chest radiography reports in the context of pneumonia. Acad Radiol 8:57–66.
Ding YCG, Foo S (1999). Mapping the intellectual structure of information retrieval studies: an author co-citation analysis, 1987–1997. J Inf Sci 25(1):67–78.
Funk ME, Reid CA (1983). Indexing consistency in MEDLINE. Bull Med Libr Assoc 71:176–183.
Lewis D, Gale W (1994). A sequential algorithm for training text classifiers. In: 17th Annual ACM/SIGIR conference, July 3–6, 1994, Dublin, Ireland. New York, NY: Springer-Verlag, pp. 3–12.
Lewis D, Linguette M (1994). A comparison of two learning algorithms for text categorization. In: Third Annual Symposium on Document Analysis and Information Retrieval, April 11–13, 1994, Las Vegas, NV, pp. 81–93.
Macfarlane TV, Glenny AM, Worthington HV (2001). Systematic review of population-based epidemiological studies of oro-facial pain. J Dent 29:451–467.
Marion LSMK (2002). Contrasting views of software engineering journals, author co-citation choices and indexer vocabulary assignments. J Am Soc Inf Sci Tech 52:297–308.
Moore JEH, Boley D, Gini M, Gross R, Hastings K, Karypis G, et al. (1997). Web page categorization and feature selection using association rule and principal component clustering. In: 7th Workshop on Information Technologies and Systems (WIT3 97), December 13–14, 1997, Atlanta, GA, pp. 1–10.
Morris TA, McCain KW (1998). The structure of medical informatics journal literature. J Am Med Inform Assoc 5:448–466.
Nishimura K, Rasool F, Ferguson MB, Sobel M, Niederman R (2002). Benchmarking the clinical prosthetic dental literature on MEDLINE. J Prosthet Dent 88:533–541.
Russo SP, Fiorellini JP, Weber HP, Niederman R (2000). Benchmarking the dental implant evidence on MEDLINE. Int J Oral Maxillofac Implants 15:792–800.
Sjögren P, Halling A (2000). Trends in dental and medical research and relevance of randomized controlled trials to common activities in general dentistry. Acta Odontol Scand 58:260–264.
Sun RL, Conway S, Zawaideh S, Niederman DR (2000). Benchmarking the clinical orthodontic evidence on Medline. Angle Orthod 70:464–470.
Yang S, Needleman H, Niederman R (2001). A bibliometric analysis of the pediatric dental literature in MEDLINE. Pediatr Dent 23:415–418.
Yang Y (1999). An evaluation of statistical approaches to text categorization. J Inf Retrieval 1(1/2):67–88.