The quality of reporting of diagnostic accuracy studies published in ophthalmic journals
     1 Department of Ophthalmology, Grampian University Hospitals NHS Trust, UK

    2 Health Services Research Unit, University of Aberdeen, UK

    Correspondence to: Augusto Azuara-Blanco PhD FRCS(Ed), The Eye Clinic, Aberdeen Royal Infirmary, Aberdeen AB25 2ZN, UK; aazblanco@aol.com

    Accepted for publication 8 August 2004

    ABSTRACT

    Aim: To evaluate the quality of reporting of all diagnostic studies published in five major ophthalmic journals in the year 2002 using the Standards for Reporting of Diagnostic Accuracy (STARD) initiative parameters.

    Methods: Manual searching was used to identify diagnostic studies published in 2002 in five leading ophthalmic journals, the American Journal of Ophthalmology (AJO), Archives of Ophthalmology (Archives), British Journal of Ophthalmology (BJO), Investigative Ophthalmology and Visual Science (IOVS), and Ophthalmology. The STARD checklist of 25 items and flow chart was used to evaluate the quality of each publication.

    Results: A total of 16 publications were included (AJO = 5, Archives = 1, BJO = 2, IOVS = 2, and Ophthalmology = 6). More than half of the studies (n = 9) were related to glaucoma diagnosis. Other specialties included retina (n = 4), cornea (n = 2), and neuro-ophthalmology (n = 1). The most common description of diagnostic accuracy was sensitivity and specificity values, published in 13 articles. The number of fully reported items in the evaluated studies ranged from eight to 19. Seven studies reported more than 50% of the STARD items.

    Conclusions: The current standards of reporting of diagnostic accuracy tests are highly variable. The STARD initiative may be a useful tool for appraising the strengths and weaknesses of diagnostic accuracy studies.

    --------------------------------------------------------------------------------

    Abbreviations: AJO, American Journal of Ophthalmology; Archives, Archives of Ophthalmology; BJO, British Journal of Ophthalmology; IOVS, Investigative Ophthalmology and Visual Science; MRI, magnetic resonance imaging; ROC, receiver operating characteristic; STARD, Standards for Reporting of Diagnostic Accuracy

    Keywords: diagnostic accuracy studies; ophthalmic journals; quality of reporting

    Current ophthalmological practice relies on diagnostic tests using sophisticated technologies that are constantly evolving. Diagnostic accuracy studies determine the performance of the test in diagnosing the target condition. Improperly conducted and incompletely reported studies are prone to bias that, in turn, may lead to overly optimistic appraisal of evaluated tests.1 The performance of a diagnostic test can be estimated in several ways, including sensitivity, specificity, receiver operating characteristic (ROC) curves, positive and negative predictive values, likelihood ratios, and diagnostic odds ratios.2,3
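
    For orientation, all of these indices can be expressed in terms of the familiar 2×2 table cross classifying the index test result against the reference standard outcome, with cells for true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The following summary of the standard definitions is added here for reference and is not part of the original report:

    \text{sensitivity} = \frac{TP}{TP+FN}, \qquad \text{specificity} = \frac{TN}{TN+FP}, \qquad \text{PPV} = \frac{TP}{TP+FP}, \qquad \text{NPV} = \frac{TN}{TN+FN}

    LR^{+} = \frac{\text{sensitivity}}{1-\text{specificity}}, \qquad LR^{-} = \frac{1-\text{sensitivity}}{\text{specificity}}, \qquad \text{DOR} = \frac{LR^{+}}{LR^{-}} = \frac{TP \times TN}{FP \times FN}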

    To improve the quality of reporting of diagnostic accuracy studies the Standards for Reporting of Diagnostic Accuracy (STARD) initiative was published.4 During a consensus conference in the year 2000, the STARD project group developed a checklist of 25 items and a prototypical flow chart.2,5

    The aim of this study was to examine the current standard of reporting of diagnostic accuracy studies using the STARD parameters. Current standards may provide a useful baseline to measure the impact of the introduction of the STARD statement in the future.

    METHODS

    The five leading ophthalmic journals (according to impact factor) with clinical research sections or articles were selected. Basic science research and subspecialty journals were excluded. The journals evaluated were AJO, Archives, BJO, IOVS, and Ophthalmology. Since search strategies for diagnostic accuracy tests are suboptimal,6 a hand search of all issues of 2002 was done. In these journals all manuscripts related to a diagnostic procedure were identified. Manuscripts were selected for inclusion if the diagnostic test was used in human subjects, the test was intended for clinical use, and measures of diagnostic accuracy were provided. Review articles, case reports, and longitudinal studies were excluded. The full paper was assessed for inclusion by one author; if uncertain, the study was selected as potentially suitable for inclusion. The selected papers were then independently assessed for inclusion by two investigators; if there was a disagreement, a consensus was reached.

    The STARD checklist (table 1) was used to score the studies. Each item could be considered to be fully, partially, or not reported. If the item was "not applicable" it was marked as such. For example, item 21 required reporting of estimates of diagnostic accuracy and measures of statistical uncertainty. If a study reported estimates of accuracy but no measure of precision it was considered partially fulfilled. Similarly, item 20 (reporting of adverse events associated with the test) was scored as not applicable for non-invasive tests (for example, visual field tests, fundus photography).

    Table 1 STARD checklist4,5

    One investigator assessed all the included studies. To evaluate the interobserver variability in the rating of the STARD criteria, a second investigator examined four randomly selected publications, masked to the results of the first investigator.

    RESULTS

    Twenty manuscripts were identified as potentially suitable for inclusion. After review of the full paper, four reports were excluded as they did not meet the inclusion criteria. One longitudinal study evaluated the value of short wavelength automated perimetry to predict the development of glaucoma.7 Another study discussed the use of magnetic resonance imaging (MRI) to differentiate between optic neuritis and non-arteritic anterior ischaemic optic neuropathy.8 Another study evaluated longitudinal changes in the wavefront aberration of patients with keratoconus.9 The fourth excluded paper described videokeratography findings in children with vernal keratoconjunctivitis and compared them with those of healthy children, without attempting to use these differences as a diagnostic test.10 A total of 16 studies (table 2) were included in this review (AJO = 5, Archives = 1, BJO = 2, IOVS = 2, and Ophthalmology = 6).

    Table 2 Included studies

    Glaucoma was the specialty with the highest number of studies (n = 9). Other specialties included retina (n = 4), cornea (n = 2), and neuro-ophthalmology (n = 1) (table 3). Interobserver rating agreement was observed for 92% of items. Among the 16 articles the number of fully reported STARD items ranged from eight to 19. Fewer than half of the studies (n = 7) explicitly reported more than 50% of the STARD items. Reporting of individual STARD items ranged from 1/16 (item 24) to 16/16 (100%; items 2 and 25) (table 4). The commonest description of diagnostic accuracy was sensitivity and specificity values (n = 13), followed by area under the ROC curve (n = 4). The reporting of each of the items is described in table 4.

    Table 3 Details of included studies

    Table 4 Summary score of STARD items

    DISCUSSION

    In 1978 Ransohoff and Feinstein11 first reported a detailed analysis of diagnostic accuracy studies and identified the major sources of bias. Since then there have been numerous articles identifying a variety of biases as potential sources of inaccuracy in the indices of diagnostic accuracy.12–17 Reid et al12 evaluated diagnostic accuracy studies published in four prominent medical journals between 1978 and 1993. They evaluated the quality of 20 diagnostic test studies published during this period against seven methodological standards. Their study showed that reporting was of moderate or low quality, and that the essential elements of data required to evaluate a study were missing in the majority of the reports. Although there had been some improvement over time, most of the diagnostic accuracy tests were inadequately reported.

    Harper and Reeves evaluated the quality of reporting of ophthalmic diagnostic tests15 published in the early and mid-1990s. They showed limited compliance with accepted methodological standards. Compliance in ophthalmic journals was no worse than in evaluations published in general medical journals, but only 25% of articles complied with more than 50% of the methodological standards.

    In this appraisal of recent ophthalmic publications using the STARD checklist, similar flaws were found. Fewer than 50% of articles (n = 7) reported more than half of the STARD items. Information on key elements of design, conduct, analysis, and interpretation of diagnostic studies was frequently missing. To our knowledge, STARD has not been used to appraise the quality of reporting of diagnostic accuracy studies in other medical specialties.

    The importance of describing the selection of the study population in appraising a diagnostic test cannot be overemphasised (item 3). For example, Harper et al showed how indices of diagnostic accuracy of tonometry for glaucoma greatly varied depending on the characteristics of the study population.16 Most publications reported this issue properly (n = 13).

    Review bias can lead to inflation of the measures of diagnostic accuracy. It includes test review bias (inflation of diagnostic accuracy indices by knowing the result of the gold standard while interpreting the index test), diagnostic review bias (knowledge of the outcome of the index test while interpreting the gold standard), and clinical review bias (additional clinical information available to the reader that would not normally be available when interpreting the index test results). Reader masking (item 11) was reported in fewer than half of the studies (n = 6).

    The method of calculating test reproducibility, or citation of reproducibility studies (item 13), was among the least commonly reported items of the STARD checklist (n = 2). There may be a lack of understanding of the effects of poor reproducibility on the final outcome of a diagnostic accuracy test.

    Verification or workup bias (item 16) occurs when the gold standard test is performed only on people who have already tested positive on the index test.3 It is important to describe how many patients satisfying the inclusion criteria failed to undergo the index or reference tests and the reasons for failing to do so. A flow diagram is highly recommended to explain this issue clearly.2,4 This item was reported in four studies.
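
    As a purely hypothetical illustration of the size of this effect (the figures are ours and are not drawn from any included study): suppose the true sensitivity of an index test is 0.80 (80 true positives and 20 false negatives among 100 diseased patients), but only 20% of index test negative patients go on to receive the gold standard. Only four of the 20 false negatives are then verified, and the apparent sensitivity, calculated from verified patients only, is inflated:

    \text{apparent sensitivity} = \frac{80}{80 + 0.2 \times 20} = \frac{80}{84} \approx 0.95 \qquad \text{compared with the true value} \qquad \frac{80}{100} = 0.80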

    Since the technology for existing tests is rapidly improving, it is important to report the actual dates when the study was performed. This allows the reader to consider any technological advances since the study was done. This information was provided in fewer than half of the articles (n = 6).

    Spectrum bias results from differences in the severity of the target condition and in co-morbidity. Incomplete reporting of the clinical spectrum (item 18) may result in inaccurate estimates of diagnostic accuracy; for example, a population with advanced disease would lead to increased sensitivity of a diagnostic test. This item was fully reported in 10 studies.

    Confidence intervals (CIs) were reported in only a quarter (n = 4) of the studies. A recent review by Harper and Reeves17 revealed that CIs were reported in only 50% of diagnostic evaluation reports published in the BMJ during the two year period 1996–7. Since the absolute values of diagnostic accuracy are only estimates, when evaluations of diagnostic accuracy are reported the precision of the sensitivity and specificity or likelihood ratios should also be reported. Reporting of confidence intervals is essential to allow a physician to know the range within which the true values of the indices are likely to lie.17
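
    To illustrate why precision matters, a minimal sketch using the usual normal approximation (the numbers are hypothetical and not taken from the included studies): a 95% CI for an estimated sensitivity \hat{p} based on n diseased eyes is

    \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

    so a sensitivity of 0.90 estimated from only 30 diseased eyes carries a 95% CI of roughly 0.79 to 1.00, a range too wide to guide clinical decisions. Exact or Wilson score intervals are preferable when n is small or the proportion is close to 1.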

    Intermediate, indeterminate, and uninterpretable results may not always be included in the final assessment of the diagnostic accuracy of a test.18 The frequency of these results is, by itself, an important indicator of the overall usefulness of the test.2 Approximately one third of the studies (n = 5) reported this item (item 22). Diagnostic accuracy in subgroups was reported in only a quarter of the studies (n = 4) (item 23).
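
    A hypothetical example (our figures, not from the included studies) shows why the handling of such results matters: if a test yields 80 true positives, 10 false negatives, and 10 uninterpretable results among 100 diseased patients, then

    \text{sensitivity excluding uninterpretable results} = \frac{80}{90} \approx 0.89 \qquad \text{versus} \qquad \text{sensitivity counting them as negative} = \frac{80}{100} = 0.80

    and the 10% uninterpretable rate is itself relevant to the clinical usefulness of the test.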

    The STARD group strongly recommends use of a flow diagram to clearly communicate the design of the study and provide the exact number of participants at each stage of the study.2 A flow diagram has been a valuable addition to the report of randomised clinical trials. It has been reported that flow diagrams are associated with improved quality of reporting of randomised controlled trials.19 A flow diagram was present in only one of the evaluated studies.

    In an earlier, similar effort to improve the quality of reporting in the literature and to prevent shortcomings and biases in randomised controlled trials, the CONSORT statement was introduced in 1995.20 Use of CONSORT has been shown to improve the quality of reporting of randomised controlled trials (RCTs).21 Sanchez-Thorin et al22 compared RCTs published in Ophthalmology during 1999 with those published in 1991–4, before the adoption of the CONSORT statement, and found an improvement in the quality of reporting. Future research will be able to evaluate the impact of the STARD initiative on the accuracy and completeness of reporting of studies of diagnostic accuracy.

    ACKNOWLEDGEMENTS

    The Health Services Research Unit is funded by the Chief Scientist Office of the Scottish Executive Health Department; the views expressed here are those of the authors.

    REFERENCES

    Lijmer JG, Mol BW, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061–6.

    Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7–18.

    Deeks J. Systematic reviews in health care: systematic reviews of diagnostic tests. BMJ 2001;323:157–62.

    Bossuyt PM, Reitsma JB, Bruns DE, et al; Standards for Reporting of Diagnostic Accuracy Steering Group. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003;326:41–4.

    STARD statement. Available at www.consort-statement.org/stardstatement.htm (accessed 19 February 2004).

    Deville WL, Bezemer PD, Bouter LM. Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. J Clin Epidemiol 2000;53:65–9.

    Polo V, Larrosa JM, Pinilla I, et al. Predictive value of short-wavelength automated perimetry: a 3-year follow-up study. Ophthalmology 2002;109:761–5.

    Rizzo JF III, Andreoli CM, Rabinov JD. Use of magnetic resonance imaging to differentiate optic neuritis and nonarteritic anterior ischemic optic neuropathy. Ophthalmology 2002;109:1679–84.

    Maeda N, Fujikado T, Kuroda T, et al. Wavefront aberrations measured with Hartmann-Shack sensor in patients with keratoconus. Ophthalmology 2002;109:1996–2003.

    Lapid-Gortzak R, Rosen S, Weitzman S, et al. Videokeratography findings in children with vernal keratoconjunctivitis versus those of healthy children. Ophthalmology 2002;109:2018–23.

    Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926–30.

    Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA 1995;274:645–51.

    Nierenberg AA, Feinstein AR. How to evaluate a diagnostic marker test. Lessons from the rise and fall of dexamethasone suppression test. JAMA 1988;259:1699–702.

    Power EJ, Tunis SR, Wagner JL. Technology assessment and public health. Annu Rev Public Health 1994;15:561–79.

    Harper R, Reeves B. Compliance with methodological standards when evaluating ophthalmic diagnostic tests. Invest Ophthalmol Vis Sci 1999;40:1650–7.

    Harper R, Henson D, Reeves BC. Appraising evaluations of screening/diagnostic tests: the importance of the study populations. Br J Ophthalmol 2000;84:1198–202.

    Harper R, Reeves B. Reporting of precision of estimates for diagnostic accuracy: a review. BMJ 1999;318:1322–3.

    Simel DL, Feussner JR, DeLong ER, et al. Intermediate, indeterminate, and uninterpretable diagnostic test results. Med Decis Making 1987;7:107–14.

    Egger M, Juni P, Bartlett C, CONSORT Group. Value of flow diagrams in reports of randomized controlled trials. JAMA 2001;285:1996–9.

    CONSORT statement. Available at www.consort-statement.org (accessed 19 February 2004).

    Moher D, Jones A, Lepage L, CONSORT Group. Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA 2001;285:1992–5.

    Sanchez-Thorin JC, Cortes MC, Montenegro M, et al. The quality of reporting of randomized clinical trials published in Ophthalmology. Ophthalmology 2001;108:410–5.