External Validation of a Measurement Tool to Assess Systematic Reviews (AMSTAR)
Beverley J. Shea1,2*, Lex M. Bouter3, Joan Peterson4, Maarten Boers5, Neil Andersson1,6, Zulma Ortiz7, Tim Ramsay4, Annie Bai8, Vijay K. Shukla8, Jeremy M. Grimshaw4
1 Community Information and Epidemiological Technologies (CIET), Ottawa, Ontario, Canada, 2 Institute for Research in Extramural Medicine (EMGO Institute), Vrije Universiteit (VU) University Medical Center, Amsterdam, The Netherlands, 3 Executive Board, Vrije Universiteit (VU) University Amsterdam, Amsterdam, The Netherlands, 4 Clinical Epidemiology Program, Ottawa Health Research Institute, University of Ottawa, Ontario, Canada, 5 Department of Clinical Epidemiology and Biostatistics, Vrije Universiteit (VU) University Medical Center, Amsterdam, The Netherlands, 6 Centro de Investigación de Enfermedades Tropicales (CIET), Universidad Autónoma de Guerrero, Acapulco, Mexico, 7 Epidemiological Research Institute, National Academy of Medicine, Buenos Aires, Argentina, 8 Canadian Agency for Drugs and Technologies in Health (CADTH), Ottawa, Ontario, Canada
Background. Thousands of systematic reviews have been conducted in all areas of health care. However, the methodological quality of these reviews is variable and should routinely be appraised. AMSTAR is a measurement tool to assess systematic reviews. Methodology. AMSTAR was used to appraise 42 reviews focusing on therapies to treat gastro-esophageal reflux disease, peptic ulcer disease, and other acid-related diseases. Two assessors applied the AMSTAR to each review. Two other assessors, plus a clinician and/or methodologist applied a global assessment to each review independently. Conclusions. The sample of 42 reviews covered a wide range of methodological quality. The overall scores on AMSTAR ranged from 0 to 10 (out of a maximum of 11) with a mean of 4.6 (95% CI: 3.7 to 5.6) and median 4.0 (range 2.0 to 6.0). The inter-observer agreement of the individual items ranged from moderate to almost perfect agreement. Nine items scored a kappa of >0.75 (95% CI: 0.55 to 0.96). The reliability of the total AMSTAR score was excellent: kappa 0.84 (95% CI: 0.67 to 1.00) and Pearson’s R 0.96 (95% CI: 0.92 to 0.98). The overall scores for the global assessment ranged from 2 to 7 (out of a maximum score of 7) with a mean of 4.43 (95% CI: 3.6 to 5.3) and median 4.0 (range 2.25 to 5.75). The agreement was lower with a kappa of 0.63 (95% CI: 0.40 to 0.88). Construct validity was shown by AMSTAR convergence with the results of the global assessment: Pearson’s R 0.72 (95% CI: 0.53 to 0.84). For the AMSTAR total score, the limits of agreement were −0.19 ± 1.38. This translates to a minimum detectable difference between reviews of 0.64 ‘AMSTAR points’. Further validation of AMSTAR is needed to assess its validity, reliability and perceived utility by appraisers and end users of reviews across a broader range of systematic reviews.
Citation: Shea BJ, Bouter LM, Peterson J, Boers M, Andersson N, et al (2007) External Validation of a Measurement Tool to Assess Systematic Reviews (AMSTAR). PLoS ONE 2(12): e1350. doi:10.1371/journal.pone.0001350
Academic Editor: Joel Gagnier, University of Toronto, Canada
Received April 17, 2007; Accepted October 22, 2007; Published December 26, 2007
Copyright: © 2007 Shea et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The authors have no support or funding to report.
Competing Interests: The authors have declared that no competing interests exist.
* To whom correspondence should be addressed. E-mail: [email protected]

High quality systematic reviews are increasingly recognized as providing the best evidence to inform health care practice and policy [1]. The quality of a review, and so its worth, depends on the extent to which scientific review methods were used to minimize the risk of error and bias. The quality of published reviews can vary considerably, even when they try to answer the same question [2]. As a result, it is necessary to appraise their quality (as is done for any research study) before the results are implemented into clinical or public health practice. Much has been written on how best to appraise systematic reviews, and while there is some variation on how this is achieved, most agree on key components of the critical appraisal [3]. Methodological quality can be defined as the extent to which the design of a systematic review will generate unbiased results [4].

Several instruments exist to assess the methodological quality of systematic reviews [5], but not all of them have been developed systematically or empirically validated and have achieved general acceptance. The authors of this paper acknowledge that the methodological quality and reporting quality for systematic reviews are very different. The first, methodological quality, considers how well the systematic review was conducted (literature searching, pooling of data, etc.). The second, reporting quality, considers how well systematic reviewers have reported their methodology and findings. Existing instruments often try to include both types of methods without being conceptually clear about the differences.

In an attempt to achieve some consistency in the evaluation of systematic reviews we have developed a tool to assess their methodological quality. This builds on previous work [6], and is based on empirical evidence and expert consensus. A measurement tool to assess systematic reviews (AMSTAR) was highly rated in a recent review (personal communication) of quality assessment instruments performed by the Canadian Agency for Drugs and Technologies in Health (CADTH). In this study we present the results of an external validation of AMSTAR using data from a series of systematic reviews obtained from the gastroenterology [...]

The characteristics and basic properties of the instrument have been described elsewhere [7]. Briefly, a 37-item initial assessment tool was formed by combining a) the enhanced Overview Quality Assessment Questionnaire (OQAQ) scale, b) a checklist created by Sacks, and c) three additional items recently judged by experts in
the field to be of methodological importance. In its development phase the instrument was applied to 99 paper-based and 52 electronic systematic reviews [6], [7]. Exploratory factor analysis was used to identify underlying components. The results were considered by methodological experts using a nominal group process to reduce the number of items and design an assessment tool with face and content validity. This process led to an 11-item instrument [7]. A description of the instrument is provided in [...]

For our validation test set we chose to use systematic reviews or meta-analyses in the area of gastroenterology, specifically upper gastrointestinal. CADTH’s information specialist searched electronic bibliographic databases (i.e. Medline, Central and EMBASE) up to and including 2005. A total of 42 systematic reviews met the a priori criteria and were included [8]. This sample included seven electronic Cochrane systematic reviews and 35 paper-based non-Cochrane reviews. The topics of the reviews ranged across the spectrum of GI problems like dyspepsia, gastro-esophageal reflux disease (GERD), peptic ulcer disease (PUD), and also GI drug interventions such as H2 receptor antagonists and [...]

Two CADTH assessors from two review groups (SS and FA, AL and CY) independently applied AMSTAR to each review and reached agreement on the assessment results. To assess construct validity, two reviewers (JP, ZO) plus a clinician and/or methodologist (MB, DF, DP, MO, and DH) applied a global assessment to each review [51] (Annex S2).

We calculated an overall agreement score using the weighted Cohen’s kappa, as well as one for each item [52] (Table 1). Bland and Altman’s limits of agreement methods were used to display agreement graphically [53], [54] (Fig. 1). We calculated the percentage of the theoretical maximum score. Pearson’s Rank correlation coefficients were used to assess reliability of this total score. For comparisons of rating the methodological quality we calculated chance-corrected agreement (using kappa) and chance-independent agreement (using Φ) [52], [55], [56]. We accepted a correlation of >0.66. We further scrutinized items and reviews with kappa scores below 0.66 [52]. Kappa values of less than 0 rate as less than chance agreement; 0.01–0.20 slight agreement; 0.21–0.40 fair agreement; 0.41–0.60 moderate agreement; 0.61–0.80 substantial agreement; and 0.81–0.99 almost perfect agreement [52], [57]. We calculated phi (Φ) for each question [55], [58].

We assessed construct validity (i.e. evaluation of a hypothesis about the expected performance of an instrument) by converting the total mean score (mean of the two assessors) for each of the 42 reviews to a percentage of the maximum score for AMSTAR and of the maximum score of the global assessment instrument. We used Pearson’s Rank correlation coefficients, Pearson’s R and Kruskal-Wallis test to further explore the impact of the following items on the construct validity of AMSTAR: a) Cochrane systematic review vs. non-Cochrane systematic reviews [59], [60], b) journal type [61], c) year of publication [62], d) conflict of interest [63], e) impact factor [64], and number of pages [64]. We studied these in the context of a priori hypotheses concerning the correlation of AMSTAR scores. Because of the nature of their development, we anticipated that Cochrane systematic reviews would have higher quality scores than non-Cochrane systematic reviews and that electronic or general journals would score higher than specialist journals. We reported on impact factors for these journals. We hypothesized that reviews published more recently would be of higher quality than those published earlier. In addition, we anticipated that reviews declaring a conflict of interest might have lower quality scores [63], [64].

We assessed the practicability of the new instrument by recording the time it took to complete scoring and the instances where scoring was difficult. We interviewed assessors (N = 6) to obtain data on clarity, ambiguity, completeness and user-friendliness.

We used SPSS (versions 13 and 15) and MedCalc for Windows, [...]

The 42 reviews included in the study had a wide range of quality scores. The overall scores estimated by the AMSTAR instrument ranged from 0 to 10 (out of a maximum of 11) with a mean of 4.6 (95% CI: 3.7 to 5.6) and median 4.0 (range 2.0 to 6.0). The overall scores for the global assessment instrument ranged from 2 to 7 (out of a maximum score of seven) with a mean of 4.43 (95% CI: 3.6 to 5.3) and median 4.0 (range 2.5 to 5.3).
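The chance-corrected agreement and the verbal kappa bands described in the methods can be illustrated with a minimal Python sketch. It assumes unweighted Cohen's kappa on yes/no item answers (the study reports a weighted kappa; for two categories the two coincide), and the function names and the ratings are hypothetical, not taken from the study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same reviews."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of reviews on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def agreement_band(kappa):
    """Map a kappa value to the verbal bands quoted in the text.

    Assumption: the quoted bands leave small gaps (e.g. 0.20 to 0.21),
    so each upper bound is treated here as inclusive.
    """
    if kappa < 0:
        return "less than chance"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical yes (1) / no (0) answers from two assessors on one item.
assessor_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
assessor_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
item_kappa = cohen_kappa(assessor_1, assessor_2)
print(round(item_kappa, 2), agreement_band(0.63))
```

With these bands, the kappa of 0.63 reported for the global assessment would read as substantial agreement and the 0.84 reported for the AMSTAR total score as almost perfect.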
Table 1. Assessment of the inter-rater agreement for AMSTAR
1. Was an ‘a priori’ design provided?
2. Was there duplicate study selection and data extraction?
3. Was a comprehensive literature search performed?
4. Was the status of publication (i.e. grey literature) used as an inclusion criterion?
5. Was a list of studies (included and excluded) provided?
6. Were the characteristics of the included studies provided?
7. Was the scientific quality of the included studies assessed and documented?
8. Was the scientific quality of the included studies used appropriately in formulating conclusions?
9. Were the methods used to combine the findings of studies appropriate?
10. Was the likelihood of publication bias assessed?
11. Were potential conflicts of interest included?
doi:10.1371/journal.pone.0001350.t001
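The Bland and Altman limits of agreement used for the total score can be sketched as follows. This is a minimal illustration, assuming the conventional definition (mean difference between the two assessors plus or minus 1.96 standard deviations of the differences); whether the reported figure for AMSTAR was computed exactly this way is an assumption, and the total scores below are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(scores_a, scores_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired scores of equal length")
    # Per-review difference between the two assessors' total scores.
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(differences)
    # Half-width of the interval expected to contain about 95% of differences.
    half_width = z * stdev(differences)
    return bias, bias - half_width, bias + half_width


# Hypothetical total scores (0 to 11) from two assessors on six reviews.
totals_a = [4, 6, 3, 8, 5, 2]
totals_b = [5, 6, 3, 7, 6, 2]
bias, lower, upper = limits_of_agreement(totals_a, totals_b)
print(f"bias {bias:.2f}, limits {lower:.2f} to {upper:.2f}")
```

A bias close to zero with narrow limits indicates that the two assessors' totals rarely differ by more than a point or two.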
The reliability of the total AMSTAR score between two assessors (the sum of all items answered ‘yes’ scored as 1, all others as 0) was kappa 0.84 (95% CI: 0.67 to 1.00; Φ = 0.85) and Pearson’s R 0.96 (95% CI: 0.92 to 0.98). The inter-rater agreement (kappa) between two raters for the global assessment was 0.63 (95% CI: 0.40 to 0.88).

Items in AMSTAR displayed levels of agreement that ranged from moderate to almost perfect; nine items scored a kappa of >0.75 (0.55 to 0.96) and Φ >0.76. Item 4 had a kappa of 0.64 (0.40 to 0.88; Φ = 0.64) and item 8 a kappa of 0.51 (0.25 to 0.78; Φ = 0.56). The reliability of the total AMSTAR score was excellent (kappa 0.84 (95% CI: 0.67 to 1.00) and Pearson’s R 0.96 (95% CI: 0.92 to 0.98)). For the AMSTAR total score, the limits of agreement were −0.19 ± 1.38 (Fig. 1).

Figure 1. Bland and Altman limits of agreement plot for AMSTAR

The mean age of our reviewers was 40.57, median 43. Fifty-seven percent were identified as experts in methodology and 43% were identified as content experts in the field.

Expressed as a percentage of the maximum score, the results of AMSTAR converged with the results of the global assessment instrument [Pearson’s Rank Correlation Coefficient 0.72 (95% CI: 0.53 to 0.84)]. AMSTAR scoring also upheld our other a priori hypotheses. The sub-analysis revealed that Cochrane reviews had significantly higher scores than paper-based reviews (R = 37.21, n = 7 for Cochrane reviews and R = 18.36, n = 35 for paper-based; P < 0.0002). Cochrane reviews (R = 37.21, n = 7) also scored higher than reviews published in general journals (R = 25.77, n = 11) and specialty journals (R = 14.96, n = 24) (P < 0.0001). Reviews published from 2000 onward had higher AMSTAR scores than earlier reviews (R = 25.20, n = 25 vs. [...]

The journals had the following overall summary statistics for the impact factors: mean 5.88 (95% CI: 3.9 to 7.9), median 3.3 (lowest value 1.4, highest value 23.9). There is no statistical association between AMSTAR score and impact factor (Pearson’s R 0.555, P = 0.7922). There was however a significant association found with the number of pages and AMSTAR scores (Pearson’s R 0.5623, P = 0.0001, n = 42). We found no association (R 0.1773, P = 0.0308) when we removed the outliers (i.e. systematic reviews [...]

Conflict of interest was poorly presented. Of the 42 reviews assessed, no study had appropriately declared their conflict of interest. Therefore, we were unable to assess whether or not funding had a positive or negative effect on the AMSTAR score.

Practicability
Both AMSTAR and the global assessment required on average 15 minutes to complete, but with the latter, assessors expressed difficulty in reaching a final decision in the absence of comprehensive guidelines. In contrast, AMSTAR was well received.

DISCUSSION
Principal findings
This paper describes an external validation of AMSTAR. This new measurement tool to assess methodological quality of systematic reviews showed satisfactory inter-observer agreement, reliability and construct validity in this study. Items in AMSTAR displayed levels of agreement that ranged from moderate to almost perfect. The reliability of the total AMSTAR score was excellent. Construct validity was shown by AMSTAR convergence with the results of the global assessment instrument.

We found a significant association between number of published pages and overall AMSTAR score, suggesting that the longer the manuscript, the higher the quality score. It should be interpreted with caution given the fact that only a couple of the longer reviews largely drive the hypothesis tests. We found no association when the outliers were removed from the dataset. We did not find an association between AMSTAR score and impact factor.

The AMSTAR instrument was developed pragmatically using previously published tools and expert consensus. The original 37 items were reduced to an 11-item instrument addressing key domains; the resulting instrument was judged by the expert panel to have face and content validity [7].

This is a prospective external validation study. We compared the new instrument to an independent and reliable gold standard designed for assessing the quality of systematic reviews, allowing multiple testing of convergent validity.

The analytical methods for assessing quality and measuring agreement amongst assessors need further discussion and development. We calculated chance-corrected agreement, using the kappa statistic [57], [65]. While avoiding high levels of agreement due to chance, kappa has its own limitations that have led to academic criticism [66], [67]. One of the major difficulties with kappa is that when the proportion of positive ratings is extreme, the possible agreement above chance agreement is small and it is difficult to achieve even moderate values of kappa. Thus, if one uses the same raters in a variety of settings, as the proportion of positive ratings becomes extreme, kappa will decrease even if the manner in which the assessors rate the quality does not change. To address this limitation, we also calculated chance-independent agreement using phi (Φ), a relatively new approach to assessing [...]

We were unable to test our convergent validity hypothesis about conflict of interest because of missing data in the systematic reviews and primary studies. This highlights the need for journals and journal editors to require that the information is provided.

Our results are based on a small sample of systematic reviews in a particular clinical area and a relatively small number of AMSTAR assessors. There is a need for replication in larger and different data sets with more diverse appraisers.
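The limitation of kappa discussed above, that it falls when the proportion of positive ratings becomes extreme even if the raters behave the same way, can be shown with a small Python sketch. The two 2×2 tables are hypothetical and were chosen so that the observed agreement is identical (90%) in both; the function name is illustrative only.

```python
def kappa_from_counts(both_yes, only_a_yes, only_b_yes, both_no):
    """Return (observed agreement, Cohen's kappa) for a 2x2 rating table."""
    n = both_yes + only_a_yes + only_b_yes + both_no
    if n == 0:
        raise ValueError("table must contain at least one rating pair")
    observed = (both_yes + both_no) / n
    # Marginal proportion of 'yes' ratings for each rater.
    a_yes = (both_yes + only_a_yes) / n
    b_yes = (both_yes + only_b_yes) / n
    # Chance agreement: both say yes by chance plus both say no by chance.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical tables of 100 reviews each, both with 90% observed agreement.
balanced = kappa_from_counts(45, 5, 5, 45)  # about half of ratings are 'yes'
extreme = kappa_from_counts(85, 5, 5, 5)    # about 90% of ratings are 'yes'
print(f"balanced: agreement {balanced[0]:.2f}, kappa {balanced[1]:.2f}")
print(f"extreme:  agreement {extreme[0]:.2f}, kappa {extreme[1]:.2f}")
```

In the balanced table chance agreement is 0.50, whereas in the extreme table it is 0.82, which leaves little room for agreement above chance and pulls kappa down despite unchanged observed agreement.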
Possible mechanisms and implications for clinicians
Existing systematic review appraisal instruments did not reflect current evidence on potential sources of bias in systematic reviews and were generally not validated. The best available instrument prior to the development of AMSTAR was OQAQ, which was formally validated. However, users of OQAQ frequently had to develop their own rules for operationalizing the instrument, and OQAQ does not reflect current evidence on sources of potential bias in systematic reviews (for example funding source and conflict of interest).

Quality assessment instruments can focus on either reporting quality (how well systematic reviewers have reported their methodology and findings) or methodological quality (internal validity; how well the systematic review was conducted: literature searching, pooling of data, etc.). It is possible for a systematic review with poor methodological quality to have good reporting quality. For this reason, the AMSTAR items focus on methodological quality.

Decision-makers have spent the last ten years trying to work out the best way to use the enormous number of systematic reviews available to them. They can hardly know where to start when deciding whether the relevant literature is valid and of the highest quality. AMSTAR is a user friendly methodological quality assessment that has the potential to standardize appraisal of systematic reviews. Early experience suggests that relevant groups are finding the instrument useful.

Further validation of AMSTAR is needed to assess its validity, reliability and perceived utility by appraisers and end users of reviews across a broader range of systematic reviews. We need to assess the responsiveness of AMSTAR looking at its sensitivity to discriminate between high and low methodological quality [...]

We need to assess the applicability of AMSTAR for reviews of observational (diagnostic, etiological and prognostic) studies and if necessary develop AMSTAR extensions for these reviews.

We plan to update AMSTAR as new evidence regarding sources of bias within systematic reviews becomes available.

AMSTAR is a measurement tool created to assess the methodological quality of systematic reviews.
Found at: doi:10.1371/journal.pone.0001350.s001 (0.04 MB [...]

Found at: doi:10.1371/journal.pone.0001350.s002 (0.03 MB [...]

We would like to thank our international panel of assessors: Daniel Francis, David Henry, Marisol Betancourt, Dana Paul, Martin Olmos, and our local team of assessors: Sumeet Singh, Avtar Lal, Changhua Yu, Fida Ahmed. We also thank Dr. Giuseppe G.L. Biondi-Zoccai and Crystal Huntly-Ball for their helpful suggestions on this manuscript.

Conceived and designed the experiments: JG MB BS NA LB. Performed the experiments: ZO JP VS AB. Analyzed the data: BS TR. Wrote the paper: ZO JG MB BS LB. Other: Designed the study: JS. Wrote the first [...]
1. Young D (2005) Policymakers, experts review evidence-based medicine. Am. J.
2. Dolan-Mullen P, Ramírez G (2006) The Promise and Pitfalls of Systematic Reviews. Annual Review of Public Health 27: 81–102.
3. Oxman AD, Guyatt GH (1991) Validation of an index of the quality of review articles. J Clin Epidemiol 44(11): 1271–78.
4. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, et al. (1995) Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials 16(1): 62–73.
5. Shea B, Dube C, Moher D (2001) Assessing the quality of reports of systematic reviews: the QUOROM statement compared to other tools. Systematic review in health care meta-analysis in context. London: BMJ Books (7): 122–39.
6. Shea B (1999) Assessing the quality of reporting meta-analyses of randomized controlled trials. MSc thesis. University of Ottawa, Department of Epidemiology
7. Shea B, Grimshaw JM, Wells GA, Boers M, Andersson N, et al. (2007) Development of AMSTAR: A Measurement Tool to Assess Systematic Reviews. BMC Medical Research Methodology 7: 10, doi:10.1186/1471-2288-7-10.
8. Singh S, Bai A, Lal A, Yu C, Ahmed F, et al. (2006) Developing evidence-based best practices for the prescribing and use of proton pump inhibitors in Canada. Ottawa, Canada: The Canadian Agency for Drugs and Technologies in Health
9. Chiba N, De Gara CJ, Wilkinson JM, Hunt RH (1997) Speed of healing and symptom relief in grade II to IV gastroesophageal reflux disease: a meta-analysis. Gastroenterology 112(6): 1798–810.
10. Caro JJ, Salas M, Ward A (2001) Healing and relapse rates in gastroesophageal reflux disease treated with the newer proton-pump inhibitors lansoprazole, rabeprazole, and pantoprazole compared with omeprazole, ranitidine, and placebo: evidence from randomized clinical trials. Clin Ther 23(7): 998–1017.
11. Klok RM, Postma MJ, van Hout BA, Brouwers JR (2003) Meta-analysis: comparing the efficacy of proton pump inhibitors in short-term use. Aliment
12. Van Pinxteren B, Numans ME, Lau J, de Wit NJ, Hungin AP, et al. (2003) Short-term treatment of gastroesophageal reflux disease. J Gen Intern Med
13. Van Pinxteren B, Numans ME, Bonis PA, Lau J (2004) Short-term treatment with proton pump inhibitors, H2-receptor antagonists and prokinetics for gastro-oesophageal reflux disease-like symptoms and endoscopy negative reflux disease. Cochrane Database Syst Rev (3): CD002095.
14. Rostom A, Dube C, Wells G, Tugwell P, Welch V, et al. (2002) Prevention of NSAID-induced gastroduodenal ulcers. Cochrane Database Syst Rev (4):
15. Laheij RJ, van Rossum LG, Jansen JB, Straatman H, Verbeek AL (1999) Evaluation of treatment regimens to cure Helicobacter pylori infection: a meta-analysis. Aliment Pharmacol Ther 13(7): 857–64.
16. Carlsson R, Galmiche JP, Dent J, Lundell L, Frison L (1997) Prognostic factors influencing relapse of oesophagitis during maintenance therapy with antisecretory drugs: a meta-analysis of long-term omeprazole trials. Aliment Pharmacol
17. Chiba N (1997) Proton pump inhibitors in acute healing and maintenance of erosive or worse esophagitis: a systematic overview. Can J Gastroenterol 11
18. Delaney B, Moayyedi P, Deeks J, Innes M, Soo S, et al. (2000) The management of dyspepsia: a systematic review. Health Technol Assess 4(39); i,iii-189. Available: http://www.ncchta.org/execsumm/summ439.htm.
19. Moayyedi P, Soo S, Deeks J, Delaney B, Harris A, et al. (2005) Eradication of Helicobacter pylori for non-ulcer dyspepsia. Cochrane Database Syst Rev (1):
20. Moayyedi P, Soo S, Deeks J, Delaney B, Innes M, et al. (2005) Pharmacological interventions for non-ulcer dyspepsia. Cochrane Database Syst Rev (1): CD001960. Available: http://www.mrw.interscience.wiley.com/cochrane/clsysrev/articles/CD001960/pdf_fs.html (accessed 2006 Feb 16).
21. Delaney BC, Moayyedi P, Forman D (2003) Initial management strategies for dyspepsia. Cochrane Database Syst Rev (2): CD001961.
22. Hopkins RJ, Girardi LS, Turney EA (1996) Relationship between Helicobacter pylori eradication and reduced duodenal and gastric ulcer recurrence: a review.
23. Huang JQ, Sridhar S, Hunt RH (2002) Role of Helicobacter pylori infection and non-steroidal anti-inflammatory drugs in peptic-ulcer disease: a meta-analysis.
24. Moayyedi P, Soo S, Deeks J, Forman D, Mason J, et al. (2000) Systematic review and economic evaluation of Helicobacter pylori eradication treatment for non-ulcer dyspepsia. Dyspepsia Review Group. BMJ 321(7262): 659–64.
25. Jovell AJ, Aymerich M, Garcia Altes A, Serra Prat M (1998) Clinical practice guideline for the eradicating therapy of Helicobacter pylori infections associated to duodenal ulcer in primary care. Barcelona: Catalan Agency for Health Technology Assessment. Available: http://www.gencat.net/salut/depsan/
26. Gisbert JP, González L, Calvet X, García N, López T (2000) Proton pump inhibitor, clarithromycin and either amoxycillin or nitroimidazole: a meta-analysis of eradication of Helicobacter pylori. Aliment Pharmacol Ther 14(10):
27. Calvet X, García N, López T, Gisbert JP, Gené E, et al. (2000) A meta-analysis of short versus long therapy with a proton pump inhibitor, clarithromycin and either metronidazole or amoxycillin for treating Helicobacter pylori infection. Aliment Pharmacol Ther 14(5): 603–09.
28. Gené E, Calvet X, Azagra R, Gisbert JP (2003) Triple vs. quadruple therapy for treating Helicobacter pylori infection: a meta-analysis. Aliment Pharmacol Ther
29. Huang J, Hunt RH (1999) The importance of clarithromycin dose in the management of Helicobacter pylori infection: a meta-analysis of triple therapies with a proton pump inhibitor, clarithromycin and amoxycillin or metronidazole. Aliment Pharmacol Ther 13(6): 719–29.
30. Leodolter A, Kulig M, Brasch H, Meyer Sabellek W, Willich SN, et al. (2001) A meta-analysis comparing eradication, healing and relapse rates in patients with Helicobacter pylori-associated gastric or duodenal ulcer. Aliment Pharmacol
31. Moayyedi P, Murphy B (2001) Helicobacter pylori: a clinical update. J Appl
32. Oderda G, Rapa A, Bona G (2000) A systematic review of Helicobacter pylori eradication treatment schedules in children. Aliment Pharmacol Ther 14(Suppl
33. Schmid CH, Whiting G, Cory D, Ross SD, Chalmers TC (1999) Omeprazole plus antibiotics in the eradication of Helicobacter pylori infection: a meta-regression analysis of randomized, controlled trials. Am J Ther 6(1): 25–36.
34. Unge P, Berstad A (1996) Pooled analysis of anti-Helicobacter pylori treatment regimens. Scand J Gastroenterol Suppl 220: 27–40.
35. Unge P (1998) Antimicrobial treatment of H. pylori infection: a pooled efficacy analysis of eradication therapies. Eur J Surg Suppl 582: 16–26.
36. Unge P (1997) What other regimens are under investigation to treat Helicobacter pylori infection? Gastroenterology 113(6 Suppl): S131–S148.
37. Vallve M, Vergara M, Gisbert JP, Calvet X (2002) Single vs. double dose of a proton pump inhibitor in triple therapy for Helicobacter pylori eradication: a meta-analysis. Aliment Pharmacol Ther 16(6): 1149–56.
38. Veldhuyzen van Zanten SJ, Sherman PM (1994) Indications for treatment of Helicobacter pylori infection: a systematic overview. CMAJ 150(2): 189–98.
39. Trépanier EF, Agro K, Holbrook AM, Blackhouse G, Goeree R, et al. (1998) Meta-analysis of H pylori (HP) eradication rates in patients with duodenal ulcer (DU). Can J Clin Pharmacol. 5(1): 67.
40. Bamberg P, Caswell CM, Frame MH, Lam SK, Wong EC (1992) A meta-analysis comparing the efficacy of omeprazole with H2-receptor antagonists for acute treatment of duodenal ulcer in Asian patients. J Gastroenterol Hepatol
41. Di Mario F, Battaglia G, Leandro G, Grasso G, Vianello F, et al. (1996) Short-term treatment of gastric ulcer: a meta-analytical evaluation of blind trials. Dig
42. Eriksson S, Langstrom G, Rikner L, Carlsson R, Naesdal J (1995) Omeprazole and H2-receptor antagonists in the acute treatment of duodenal ulcer, gastric ulcer and reflux oesophagitis: a meta-analysis. Eur J Gastroenterol Hepatol 7(5):
43. Poynard T, Lemaire M, Agostini H (1995) Meta-analysis of randomized clinical trials comparing lansoprazole with ranitidine or famotidine in the treatment of acute duodenal ulcer. Eur J Gastroenterol Hepatol 7(7): 661–65.
44. Laine L, Schoenfeld P, Fennerty MB (2001) Therapy for Helicobacter pylori in patients with nonulcer dyspepsia: a meta-analysis of randomized, controlled trials. Ann Intern Med 134(5): 361–9.
45. Mulder CJ, Schipper DL (1990) Omeprazole and ranitidine in duodenal ulcer healing. Analysis of comparative clinical trials. Scand J Gastroenterol Suppl 178:
46. Shiau JY, Shukla VK, Dubé C (2002) The efficacy of proton pump inhibitors in adults with functional dyspepsia. Ottawa: Canadian Coordinating Office for
47. Danesh J, Lawrence M, Murphy M, Roberts S, Collins R (2005) Systematic review of the epidemiological evidence on Helicobacter pylori infection and non-ulcer or uninvestigated dyspepsia. Arch Intern Med 160(8): 1192–98.
48. Gibson PG, Henry RL, Coughlan JL (2005) Gastro-esophageal reflux treatment for asthma in adults and children. Cochrane Database Syst Rev (3): 1–27.
49. Fischbach LA, Goodman KJ, Feldman M, Aragaki C (2002) Sources of variation of helicobacter pylori treatment success in adults worldwide: a meta-analysis.
50. Ford A, Delaney B, Moayyedi P (2003) Eradication therapy for peptic ulcer disease in helicobacter pylori positive patients. Cochrane Database Syst Rev (4):
51. Oxman AD, Guyatt GH (1991) Validation of an index of the quality of review articles. J Clin Epidemiol 44(11): 1271–78.
52. Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol
53. Bland JM, Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical Measurement. Lancet i: 307–10.
54. Bland JM, Altman DG (1987) Statistical methods for assessing agreement between measurement. Biochimica Clinica 11: 399–404.
55. Meade M, Cook R, Guyatt G, Groll R, Kachura J, et al. (2000) Interobserver Variation in Interpreting Chest Radiographs for the Diagnosis of Acute Respiratory Distress Syndrome. Am. J. Respir. Crit. Care Med 161(1): 185–90.
56. Uebersax JS (1987) Diversity of decision-making models and the measurement of inter-rater agreement. Psychological Bulletin 101: 140–46.
57. Cohen J (1968) Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 70: 213–220.
58. McGinn T, Guyatt G, Cook R, Meade M (2002) Diagnosis: measuring agreement beyond chance. In: Guyatt G, Rennie D, eds. Users’ guide to the medical literature. A manual for evidence-based clinical practice. Chicago, IL:
59. Moja LP, Telaro E, D’Amico R, Moschetti I, Coe L, et al. (2005) Assessment of methodological quality of primary studies by systematic reviews: results of the metaquality cross sectional study. BMJ Publishing Group Ltd. 330(7499): 1053.
60. Shea B, Moher D, Graham I, Pham B, Tugwell P (2002) A comparison of the quality of Cochrane reviews and systematic reviews published in paper-based journals. Evaluation & the Health Professions 25(1): 116–29.
61. Moher D, Tetzlaff J, Tricco AC, Sampson M, Altman DG (2007) Epidemiology and Reporting Characteristics of Systematic Reviews. PLoS Med 4(3): e78,
62. Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC (1987) Meta-analyses of randomized controlled trials. New England Journal of Medicine 316: 450–54.
63. Bero LA (2005) Managing financial conflicts of interest in research. Journal of the American College of Dentists 72(2): 4–9.
64. Biondi-Zoccai G, Lotrionte M, Abbate A, Testa L (2006) Compliance with QUOROM and quality of reporting of overlapping meta-analyses on the role of acetylcysteine in the prevention of contrast associated nephropathy: case study.
65. Fleiss JL (1971) Measuring nominal scale agreement among many raters.
66. McClure M, Willett W (1987) Misinterpretation and misuse of the kappa statistic. Am. J. Epidemiol. 126: 161–169.
67. Cook RJ, Farewell VT (1995) Conditional inference for subject-specific and marginal agreement: two families of agreement measures. Can. J. Stat 23:
68. Barnes DE, Bero LA (1998) Why review articles on the health effects of passive smoking reach different conclusions. JAMA 279: 1566–1570.
69. Cho MK, Bero LA (1996) The quality of drug studies published in symposium proceedings. Ann Intern Med 124: 485–489.
70. Lexchin J, Bero LA, Djulbegovic B, Clark O (2003) Pharmaceutical industry sponsorship and research outcome and quality: systematic review. BMJ 326: