ATLA 35, 641–659, 2007 641
Systematic Reviews of Animal Experiments Demonstrate Poor Human Clinical and Toxicological Utility
Animal Consultants International, London, UK
Summary: The assumption that animal models are reasonably predictive of human outcomes provides the basis for their widespread use in toxicity testing and in biomedical research aimed at developing cures for human diseases. To investigate the validity of this assumption, the comprehensive Scopus biomedical bibliographic databases were searched for published systematic reviews of the human clinical or toxicological utility of animal experiments. In 20 reviews in which clinical utility was examined, the authors concluded that animal models were either significantly useful in contributing to the development of clinical interventions, or were substantially consistent with clinical outcomes, in only two cases, one of which was contentious. These included reviews of the clinical utility of experiments expected by ethics committees to lead to medical advances, of highly-cited experiments published in major journals, and of chimpanzee experiments, which involve the species considered most likely to be predictive of human outcomes. Seven additional reviews failed to clearly demonstrate utility in predicting human toxicological outcomes, such as carcinogenicity and teratogenicity. Consequently, animal data may not generally be assumed to be substantially useful for these purposes. Possible causes include interspecies differences, the distortion of outcomes arising from experimental environments and protocols, and the poor methodological quality of many animal experiments, which was evident in at least 11 reviews. No reviews existed in which the majority of animal experiments were of good methodological quality. Whilst the effects of some of these problems might be minimised with concerted effort (given their widespread prevalence), the limitations resulting from interspecies differences are likely to be technically and theoretically impossible to overcome.
Non-animal models are generally required to pass formal scientific validation prior to their regulatory acceptance. In contrast, animal models are simply assumed to be predictive of human outcomes. These results demonstrate the invalidity of such assumptions. The consistent application of formal validation studies to all test models is clearly warranted, regardless of their animal, non-animal, historical, contemporary or possible future status. Likely benefits would include the greater selection of models truly predictive of human outcomes, increased safety of people exposed to chemicals that have passed toxicity tests, increased efficiency during the development of human pharmaceuticals and other therapeutic interventions, and decreased wastage of animal, personnel and financial resources. The poor human clinical and toxicological utility of most animal models for which data exist, in conjunction with their generally substantial animal welfare and economic costs, justifies a ban on animal models lacking scientific data clearly establishing their human predictivity or utility.
Key words: animal experiment, animal study, clinical trial, human outcome, systematic review.
Address for correspondence: Andrew Knight, Animal Consultants International, 91 Vanbrugh Court, Wincott Street, London SE11 4NR, UK. E-mail: [email protected]
Standards for the reporting of laboratory animal use vary internationally, with many countries failing to record or publicise statistics on animal use at all. Of those that do, most record only live animal use, and fail to record the substantial numbers of animals that may be killed prior to certain procedures, such as dissection or the collection of organs, tissues or cells. Hence, making realistic annual estimates of animal use within biomedical research and toxicity testing is difficult. Despite these limitations, it remains clear from consideration of the European Union (EU) and United States alone, that many millions of animals are used worldwide, and that certain trends are resulting in an increase in laboratory animal use.
Trends in laboratory animal use
European Commission (EC) statistics on laboratory animal use in 25 EU Member States, revealed that 12,117,583 animals were used in 2005, the latest reporting period (except for France, which provided figures for 2004). The majority of these were mice (53.1%), rats (19.3%), cold-blooded animals (15.1%, consisting of fish [primarily], amphibians and reptiles), and birds (5.4%). As in previous years, France, Germany and the UK reported the greatest
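To give a sense of scale, the species shares quoted for the 2005 EC statistics can be converted into approximate head counts. The Python sketch below is illustrative arithmetic only: the total and the four percentages come from the text, while the derived counts, the residual "other" category and all variable names are assumptions of this example, not figures taken from the EC report.

```python
# Total animals reported for 2005 and species shares, as quoted in the text.
TOTAL_ANIMALS = 12_117_583

# Shares stored in tenths of a percent so the arithmetic stays in integers.
SHARE_TENTHS = {
    "mice": 531,                  # 53.1%
    "rats": 193,                  # 19.3%
    "cold-blooded animals": 151,  # 15.1%
    "birds": 54,                  # 5.4%
}


def approximate_counts(total, share_tenths):
    """Convert percentage shares into head counts rounded to the nearest animal."""
    counts = {}
    for group, tenths in share_tenths.items():
        # Adding half the divisor before integer division rounds to nearest.
        counts[group] = (total * tenths + 500) // 1000
    # Whatever the named groups leave over is lumped together
    # (a hypothetical category introduced for this example).
    counts["other (residual)"] = total - sum(counts.values())
    return counts


counts = approximate_counts(TOTAL_ANIMALS, SHARE_TENTHS)

if __name__ == "__main__":
    for group, n in counts.items():
        print(f"{group:>22}: {n:>10,}")
```

Because the published percentages are rounded to one decimal place, each derived count is only accurate to within a few thousand animals.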
In the USA, laboratory animal use is federally regulated by the Animal Welfare Act 1966 (amended in 1985), which excludes laboratory-bred mice and rats, as well as non-mammals, from consideration or protection (2, 3), despite the fact that mice and rats comprise the overwhelming majority of all laboratory subjects. This impedes the accurate estimation of laboratory animal use in the USA. For example, although 1,012,713 regulated animals were used in the Fiscal Year 2006 (4), the latest reporting period, Carbone (5) estimated that in excess of 100 million mice are used annually. This represents a dramatic increase from the 17–22 million vertebrates used in the mid-1980s (6).
In recent years, the previous steady decreases in laboratory animal use have been reversed, in some countries, mostly as a result of dramatic increases in the use of genetically-modified (GM) animals. The production of these GM animals requires substantial breeding, which serves to further increase the numbers of animals used. Within the UK, for example, a steady and significant reduction since 1976 stabilised during the early 1990s, and then reversed. 3,012,032 procedures on living, regulated animals (vertebrates and one species of octopus, Octopus vulgaris) were conducted in 2006, the highest number for around 15 years (7). Greater breeding and use of GM animals have contributed to these increasing numbers (8, 9). In 1995, GM animals were used in 8% of regulated procedures. In 2004, the total was 32%, and in 2006 it was 34% (1,035,343 regulated procedures; 7). Increased GM animal use has also been recorded in Germany (10) and Switzerland (11), where total animal use is also
Recently-initiated, large-scale chemical testing programmes are also important drivers of the recent and probable substantial future increases in laboratory animal use (13, 14). These programmes are intended to rectify existing knowledge gaps with regard to the toxicity of chemicals that are produced or imported into the EU and the USA in particularly high quantities (or that otherwise give rise to special concerns), and are likely to result in the use of unprecedented numbers of animals in toxicity testing. Included are three programmes initiated by the US Environmental Protection Agency (EPA) since 1998: the High Production Volume (HPV) Challenge Program, the Endocrine Disruptor Screening Program, and the Voluntary Children’s Chemical Evaluation Program. The 2003 EC proposal for the Registration, Evaluation and Authorisation of Chemicals (REACH), similarly aims to assess the toxicity of chemicals produced or imported in high quantities (15–20). It is reported that the HPV programme, for example, has already subjected over 150,000 animals to chemical tests (21).
Claims supporting laboratory animal use
Biomedical research using laboratory animals is highly controversial. Advocates frequently claim that such research is vital for preventing, curing or alleviating human diseases (e.g. 22, 23), that the greatest achievements of medicine have been possible only due to the use of animals (e.g. 24), and that the complexity of humans requires nothing less than the complexity of laboratory animals to serve as an effective model during biomedical investigations (e.g. 25). They even claim that medical progress would be “severely maimed by prohibition or severe curtailing of animal experiments,” and that “catastrophic consequences would ensue” (26).
However, such claims are hotly contested (e.g. 27), and the right of humans to experiment on animals has also been strongly contested philosophically (e.g. 28, 29). A growing body of empirical evidence also casts doubt upon the scientific utility of animals as experimental models of humans.
Clinical utility of animal models: case studies
Within the field of pharmaceutical development, case studies exemplifying differing human and animal outcomes (sometimes with severe adverse consequences for human patients) are sufficiently numerous to fill entire book chapters (e.g. 30, 31).
A recent notorious example was TGN1412 (also known as CD28-SuperMAB), a fully-humanised monoclonal antibody (i.e. a product developed in a non-human species and protein-engineered to possess specifically-human characteristics) that was undergoing development for the treatment of inflammatory conditions, such as leukaemia and rheumatoid arthritis (32, 33). During a Phase I clinical trial in the UK in 2006, TGN1412 caused severe adverse reactions, culminating in organ failure requiring intensive care, in all six volunteers given the drug, with one volunteer suffering permanent damage. These effects occurred despite the administration of an expected sub-clinical dose of 0.1 mg/kg, 500 times lower than the 50 mg/kg dose found not to cause adverse effects in cynomolgus monkeys. Tests on rhesus macaques, rats and mice also failed to reveal adverse effects (34, 35).
Another recent notorious example was the arthritis drug, Vioxx, which appeared to be safe, and even
Poor human clinical and toxicological utility of animal experiments 643
beneficial to the heart, in animal studies. However, Vioxx was withdrawn from the global market in 2004, after causing as many as 140,000 heart attacks and strokes, and over 60,000 deaths, in the
Since their commercial introduction in the early 1980s, non-steroidal anti-inflammatory drugs (NSAIDs) have also had a problematic clinical history. Although apparently safe in year-long studies in rhesus monkeys, benoxaprofen (Oraflex) produced thousands of serious adverse reactions in humans, including dozens of deaths, within three months of its initial marketing (37). Fenclofenac (Flenac) revealed no toxicity in ten animal species, yet produced severe liver toxicity in humans, and was subsequently withdrawn (38). Similar fates befell some other NSAIDs, including zomepirac (Zomax; 39), bromfenac (Duract; 40), and phenylbutazone (Butazolidin; 41), which produced adverse human effects undetected in animal studies.
Numerous other pharmaceuticals have also been marketed after passing limited clinical trials and more rigorous animal testing, only to subsequently be found to cause serious side-effects or death in human patients. Examples include various antibiotics (e.g. chloramphenicol, clindamycin, temafloxacin), antidepressants (e.g. nomifensine), antivirals (e.g. idoxuridine), cardiovascular medications (e.g. amrinone, cerivastatin, mibefradil, ticrynafen), and
Although 92% of new drugs that pass preclinical testing, which routinely includes animal tests, fail to reach the market because of safety or efficacy failures in human clinical trials (45), adverse drug reactions detected after drugs have been approved for clinical use nevertheless remain common. They are, indeed, sufficiently common to have been recently recorded as the 4th–6th leading cause of death in US hospitals (based on a 95% confidence interval; 46), a rate considered by investigators to
There are also cases of safe and efficacious human pharmaceuticals that would not pass rigorous animal testing, because of severe or lethal toxicity in some laboratory animal species. Notable examples include penicillin (e.g. 47), paracetamol (acetaminophen; e.g. 48), and aspirin (acetylsalicylic acid; e.g. 49). More rigorous animal testing may well have delayed or prevented the use of these highly beneficial drugs in human patients.
The large number of examples of apparent differences between outcomes in laboratory animals and in human patients may be the result of several factors. Flaws may occur during the pharmaceutical development and testing process, in which the design, conduct or interpretation of experiments may fail to adequately highlight the risks to human patients. Such flaws are more likely in animal studies than in human clinical trials, because the experimental quality of the former is usually significantly lower (see Results and Discussion). True discordance in results may also arise from interspecies differences.
Finally, the limited predictivity for wider human outcomes of human clinical trials may result from their focus on small groups of healthy young men, or from insufficient study durations. Particularly in Phases I–II, small cohorts of young men (20–300) are typically used, to minimise experimental variability and to avoid the possibility of endocrinological disruption or other risks to women of reproductive age. Although 1,000–3,000 volunteers may be used in Phase III trials, the final phase before marketing (50), it is nevertheless clear that cohort numbers, study durations or other aspects of protocol design, conduct or interpretation, are inadequate to detect the adverse side-effects of the large number of pharmaceuticals that are found to harm patients after marketing. Longer studies of more-broadly representative human populations would be more predictive, but would increase the time and cost of pharmaceutical development, and are resis-
The necessity of systematic reviews
The premise that laboratory animal models are generally predictive of human outcomes is the basis for their widespread use in human toxicity testing, and in the safety and efficacy testing of putative chemotherapeutic agents and other clinical interventions. However, the numerous cases of discordance between laboratory animal and human outcomes suggest that this premise may well be incorrect, and that the utility of animal experiments for these purposes may not be assured. On the other hand, only small numbers of experiments are normally reviewed in case studies, and their selection may be subject to bias. To provide more-definitive conclusions, systematic reviews of the human clinical or toxicological utility of large numbers of animal experiments are necessary. Experiments included in such reviews should be selected without bias, via randomisation, or similarly methodical and impartial means.
In support of this concept, Pound and colleagues (51) commented that clinicians and the public often consider it axiomatic that animal research has contributed to human clinical knowledge, on the basis of anecdotal evidence or unsupported claims. These constitute an inadequate form of evidence, they asserted, for such a controversial area of research, particularly given increasing competition for scarce research resources. Hence, they called for systematic reviews to examine the human clinical utility of animal experiments, and commenced by examining six existing reviews, which did not demonstrate the clinical utility expected of the experiments in question.
Soon afterwards, the UK Nuffield Council on
Bioethics stated that, “It would… be desirable to undertake further systematic reviews and meta-analyses to evaluate more fully the predictability and transferability of animal models.” They called for these to be undertaken by the UK Home Office, in collaboration with the major funders of research, industry associations and animal protection groups (52).
Since then, several such reviews and meta-analyses have been published, which collectively provide important insights into the human clinical and toxicological utility of animal models. Their identification and examination was the purpose of this
The Scopus biomedical bibliographic databases were searched for systematic reviews of the human clinical or toxicological utility of animal experiments published in the peer-reviewed biomedical literature. Among the world’s most comprehensive databases, they include over 12,850 academic journals, 500 open access journals, 700 conference proceedings, and a total of 29 million abstracts (53). The Life Sciences database includes over 3,400 titles, and the Health Sciences database includes over 5,300 titles, including all of Medline, the leading medical and allied health profession database, which itself contains over 15 million citations from the mid-1950s onwards, sourced from more than 5,000 biomedical journals from over 80 countries
All abstracts, titles and key words were searched for (animal experiment OR animal model OR animal study OR animal trial) AND (clinical trial OR human outcome OR human relevance OR human result). The results were limited to articles or reviews, but no chronological, language or other limitations were applied. Titles and, where necessary, abstracts or complete papers, were examined, in order to locate relevant papers. Additional relevant studies were obtained by examination of the reference lists of the papers retrieved, and by consultation with colleagues working in this field.
To minimise bias, reviews were included only when they had been conducted systematically, by using randomisation or similarly methodical and impartial means to select animal studies. For example, in some cases, all the animal studies within relevant subsets of toxic chemical databases were
The examination covered only reviews which considered the human toxicological predictivity or utility of animal experiments, their contributions toward the development of prophylactic, diagnostic or therapeutic interventions with clear potential for combating human diseases or injuries, or their consistency with human clinical outcomes. Reviews which focused, for example, only on the contributions of animal experiments toward increased understanding of the aetiological, pathogenetic or other aspects of human diseases, or on the clinical utility of animal experiments in non-human species, were excluded from consideration.
Bibliographic databases are constantly updated. 2,274 articles or reviews were retrieved, by using the specified search terms, on 1 March 2007. In total, 27 systematic reviews which examined the utility of animal experiments during the development of human clinical interventions (20), or in deriving human toxicity classifications (seven), were located. Three different approaches that sought to determine the maximum human clinical utility that may be achieved by animal experiments,
Clinical utility of experiments expected to lead to medical advances
Lindl and colleagues (55, 56) examined animal experiments conducted at three German universities between 1991 and 1993, that had been approved by animal ethics committees, at least partly on the basis of claims by researchers that the experiments might lead to concrete advances toward the cure of human diseases. Experiments were only included where previous studies had shown that the applications of related animal research had confirmed the hypotheses of the researchers, and where the experiments had achieved publication in biomedical journals.
For 17 experiments meeting these inclusion criteria, citations were analysed over at least 12 years. Citation frequencies and types of citing papers were recorded: whether they were reviews or animal-based, in vitro, or clinical studies. 1,183 citations were evident, but only 8.2% (97 citations) were in clinical publications, and only 0.3% (4 citations) demonstrated a direct correlation between the results of animal experiments and human outcomes. However, even in these four cases, the hypotheses that had been verified successfully in the animal experiment failed in every respect when applied to humans. None of these 17 experiments led to any new therapies, or had any beneficial clinical impact during the period examined.
As a result of their analysis, Lindl and colleagues called for serious, rather than cursory, evaluations of the likely benefits of animal experiments by animal ethics committees and related authorities, and for a reversal of the current paradigm, in which animal experiments are routinely approved. Instead of approving experiments because of the possibility that benefits might accrue, Lindl and colleagues suggested that where significant doubt exists, laboratory animals should receive the benefit of that
doubt, and that such experiments should not, in
Clinical utility of highly-cited animal experiments
Hackam and Redelmeier (57) also used a citation analysis, but without geographical limitations. Based on the assumption that findings from highly-cited animal experiments would be most likely to be subsequently tested in clinical trials, they searched for experiments with more than 500 citations and published in the seven leading scientific journals, as
Of 76 animal studies located, with a median citation count of 889 (range: 639–2,233), only 36.8% (28/76) were replicated in randomised human trials. 18.4% (14/76) were contradicted by randomised trials, and 44.7% (34/76) had not translated to clinical trials. Ultimately, only 10.5% (8/76) of these medical interventions were subsequently approved for use in patients, and, as stated previously, even in these cases, human benefit cannot be assumed, because adverse reactions to approved interventions are common, and a leading cause of death (46).
A low rate of translation to clinical trials of even these highly-cited animal experiments was apparent, despite 1992 being the median publication year, allowing a median of 14 years for potential translation. For studies that did translate to clinical trials, the median time for translation was seven years (range 1–15). The frequency of translation was not affected by the laboratory animal species used, the type of disease or therapy under examination, the journal, year of publication, methodological quality, and even, surprisingly, the citation rate. However, animal studies incorporating dose–response gradients were more likely to be translated to clinical trials (odds ratio [OR] = 3.3; 95% confidence interval [CI] = 1.1–10.1).
Although the rate of translation of these animal studies to clinical trials was low, as Hackam and Redelmeier stated, it is nevertheless higher than that of most published animal experiments, which are considerably less likely to be translated than these highly-cited animal studies published in leading journals. Furthermore, the selective focus on positive animal data, whilst ignoring negative results (optimism bias), was one of several factors proposed that may have increased the likelihood of translation beyond that which was scientifically merited. As Hackam (58) stated, the rigorous meta-analysis of all relevant animal experimental data would probably significantly decrease the transla-
In addition, only 48.7% (37/76) of these highly-cited animal studies were considered to be of good methodological quality. Despite their publication in leading scientific journals, few included the random allocation of animals to test groups, any adjustment for multiple hypothesis testing, or the blinded assessment of outcomes. Accordingly, Hackam and Redelmeier cautioned patients and physicians about the extrapolation of the findings of even highly-cited animal research to cases of human disease.
Clinical utility of chimpanzee experiments
Chimpanzees are the species most closely related to humans, and consequently, are considered to be the laboratory animals most likely to provide results which are predictive of human outcomes. Hence, in 2005, I conducted a citation analysis of the human clinical utility of chimpanzee experiments (59).
I searched three major biomedical bibliographic databases, and located 749 papers published between 1995 and 2004, which described experiments on captive chimpanzees or their tissues. Although published in the international scientific literature, the vast majority of these experiments were conducted within the USA (60). To obtain 95% CIs with an accuracy of at least plus or minus 10%, when estimating the proportion of chimpanzee studies subsequently cited by other published papers, a subset of at least 86 chimpanzee studies
Of 95 published randomly-selected studies on chimpanzees, 49.5% (47/95) were not cited by any subsequent papers, demonstrating minimal contributions toward the advancement of biomedical knowledge. This is of particular concern, because it can be assumed that research judged to be of lesser value was not published. Hence, it appears that the majority of chimpanzee research generates data of questionable value, which make little obvious contribution toward the advancement of biomedical
35.8% (34/95) of the 95 published chimpanzee studies were cited by 116 papers that clearly did not describe well-developed methods for combating human diseases. Only 14.7% (14/95) of them were cited by 27 papers that had abstracts which indicated well-developed prophylactic, diagnostic or therapeutic methods for combating human diseases. However, a detailed examination of these 27 medically-oriented papers revealed that in vitro studies, human clinical and epidemiological studies, molecular assays and methods, and genomic studies, contributed most to their development. 63.0% (17/27) were wide-ranging reviews of 26–300 (median 104) references, to which these cited chimpanzee studies made very small contributions. Duplication of human outcomes, inconsistency with other human or primate data, and other causes, resulted in the absence of any chimpanzee study able to demonstrate an essential contribution, or, in
most cases, a significant contribution of any kind, toward the development of the medical method
Despite the low utility of chimpanzee experiments in advancing human health which was indicated by these results, it remains true that chimpanzees are the species most closely related to human beings. Hence, it is highly likely other laboratory species are even less useful as experimental models of humans in biomedical research and toxi-
Clinical utility of stroke and head injury
Despite the existence of literature on the efficacy of more than 700 drugs in treating experimental models of stroke (artificially-induced focal cerebral ischaemias; 64), only recombinant tissue plasminogen activator (rt-PA) and aspirin have convincingly demonstrated efficacy in human clinical trials of treatments for acute ischaemic stroke (65–67). Hence, Macleod and colleagues (64) stated that, “This failure of putative neuroprotective drugs in clinical trials represents a major challenge to the doctrine that animals provide a scientifically-valid model for human stroke.” At least 10 published systematic reviews have described the poor human clinical utility of animal experimental models of stroke and head injuries (64, 68–76).
In some cases, clinical trials proceeded, despite equivocal evidence of efficacy in animal studies. For example, Horn and colleagues (68) systematically reviewed 20 animal studies on the efficacy of nimodipine, of which only 50% showed beneficial effects following treatment. They concluded that, “…the results of this review did not show convincing evidence to substantiate the decision to perform trials with nimodipine in large numbers of patients.” These clinical trials also demonstrated equivocal evidence of efficacy, and furthermore, proceeded concurrently with the animal studies, despite the fact that the latter are intended to be conducted prior to clinical trials, to facilitate the detection of
O’Collins and colleagues (69) conducted a very large review of 1,026 experimental drugs for acute stroke that had been tested in animal models. They found that the effectiveness in animals of 114 drugs chosen for human clinical use was no greater than that of the remaining 912 drugs not chosen for clinical use, thereby demonstrating that effectiveness in animal models had no measurable effect on whether or not these drugs were selected for human clinical use. Accordingly, O’Collins and colleagues questioned whether the most efficacious drugs are, in fact, being selected for clinical trials, and called for greater rigour in the conduct, reporting, and
In many cases, animal models did indicate efficacy, but this did not translate to humans. In a few reviews, the authors speculated on the possible causes. For example, Jonas and colleagues (70) hypothesised that the poor clinical efficacy of neuroprotectants which had been found to be successful in animal models, was due to differences in the timing of the initiation of treatment. Curry (71) hypothesised that the human clinical failure of fourteen neuroprotective agents which were successful in animal models, was due to the antagonism of glutamate (which may be associated with neuroprotection) by drug treatment in clinically-normal individuals. He therefore proposed that clinical trials should be restricted to real stroke patients, who experience elevated plasma glutamate levels. However, such speculation has not resulted in improvements in the poor clinical record of neuroprotectants which were previously found to be successful in animal models.
The utility of the majority of these animal studies also appears to have been impeded by their poor methodological quality. Examples include: animal studies on the efficacy of melatonin (64); 20 animal studies on the efficacy of nimodipine (68); 29 animal studies on the efficacy of FK506 (72); 45 animal studies on five compounds from different classes of alleged neuroprotective agents, namely clomethiazole, gavestinel, lubeluzole, selfotel, and tirilazad mesylate (73); 25 animal studies on the efficacy of nitric oxide (NO) donors and L-arginine (74); and 73 animal studies of the efficacy of NO synthase inhibitors (75).
The methodological quality of animal studies was typically scored on the basis of the presence of characteristics such as: appropriate animal models (aged, diabetic or hypertensive animals are considered to more-closely model human stroke patients); power calculations of sample sizes; random allocation to treatment and control groups; use of a clinically-relevant time window for commencement of treatment; blinded drug administration; use of anaesthetics without significant intrinsic neuroprotective activity (ketamine, for example, may alter neuroprotective activity); blinded induction of ischaemia (given that the severity of induced infarcts may be subtly affected by knowledge of treatment allocation); blinded outcome assessment; assessment of both infarct volume and functional outcome; adequate monitoring of physiological parameters; assessment during both the acute (e.g. one to six days) and chronic (e.g. seven to 30 days) phases; statement of temperature control; compliance with animal welfare regulations; peer-reviewed publication; and conflict of interest statements. Typically, one point was given for the presence of each characteristic. For example, The Stroke Therapy Academic Industry Roundtable recommendations for standards with regard to preclinical and restorative drug development involve an eight-point scale (68, 77).
Median quality scores were: four out of 10 (13 studies; range zero to six [64]); four out of 10 (29
studies; range zero to seven [72]); three out of 10 (45 studies [73]); and three out of 8 (73 studies; range one to six [75]). Common deficiencies included lack of: sample size calculations, aged animals or those with appropriate co-morbidities, randomised treatment allocation, blinded drug administration, blinded induction of ischaemia, blinded outcome assessment, and conflict of interest statements. Some studies also used ketamine anaesthesia, and there was also substantial varia-
Van der Worp and colleagues (73), for example, concluded that the collective evidence for neuroprotective efficacy which formed the basis for 21 clinical trials, was obtained in animal studies with a methodological quality that would not, in retrospect, justify such a decision. Wilmot and colleagues (74) also found considerable variations in animal experiment protocols, which concerned: animal species; physiological parameters (such as blood pressure); drug administration (timing, dosage, and route); surgical methodology; and duration of ischaemia. Statistical analysis (Egger’s test) also revealed the likely existence of publication bias (an increased tendency to publish studies in which a treatment effect is apparent, or a decreased tendency to so publish, e.g. result-
review (78–85, of which 79 and 80 described a single review), in only two cases, one of which was contentious, did the animal models appear to be clearly useful in the development of human clinical interventions, or substantially consistent with
As in the case of stroke, some clinical trials proceeded, despite equivocal evidence of efficacy in animal studies. Upon systematically reviewing the effects of Low Level Laser Therapy (LLLT) on wound healing in 36 cell or animal studies, Lucas and colleagues (78) found that an in-depth analysis of studies with the highest methodological quality showed no significant pooled treatment effect. Despite this, the clinical trials proceeded. Furthermore, almost from the beginning of LLLT investigations, animal experiments and clinical studies occurred simultaneously, rather than sequentially. The human trials also failed to demonstrate signif-
Roberts and colleagues (79), and Mapstone and colleagues (80), all systematically reviewed a group of 44 randomised, controlled animal studies on the efficacy of fluid resuscitation in bleeding animals. A previous systematic review by some of these inves-
tigators of clinical trials of fluid resuscitation had
ing from commercial pressures, particularly in the
found no evidence that the practice improved out-
case of patented drugs under development). Macleod
comes, and had even identified the possibility that
and colleagues (64) commented that, These deficien-
it might be harmful (86). In this later review
cies apply to most, if not all, of the animal literature.
(79–80), they found that fluid resuscitation reduced
This is of particular concern, because Macleod and
mortality in animal models of severe haemorrhage,
colleagues (72) reported that efficacy was apparently
but increased mortality in those with less severe
lower in higher quality studies, which raised concerns
that the apparent efficacy may have been artificially
After clinical trials in humans failed to provide
elevated by factors such as poor methodological qual-
evidence of benefit, Lee and colleagues (81) con-
ducted a systematic review and meta-analysis of
A related review, not limited solely to stroke, exem-
controlled trials of endothelin receptor blockade in
plified some of these issues. Perel and colleagues (76)
animal models of heart failure. Meta-analysis failed
examined therapeutic interventions with unambigu-
to provide evidence of overall benefit, and indicated
ous evidence of a treatment effect (benefit or harm),
increased mortality with early administration.
in clinical trials related to the following: corticos-
In their investigation of the contributions of
teroidal treatment for head injury; anti-fibrinolytics
human clinical trial results and analogous experi-
for the treatment of haemorrhage; thrombolysis, and
mental studies to asthma research — one of the
also tirilazad, for the treatment of acute ischaemic
most common and heavily-investigated of modern
stroke; antenatal corticosteroids in the prevention of
diseases — Corry and Kheradmand (82) demon-
neonatal respiratory distress syndrome; and bisphos-
strated that failure to conduct and analyse the
phonates in the treatment of osteoporosis. They
results of animal studies before proceeding to clini-
found that three interventions had similar outcomes in animal models, whilst three did not, suggesting
cal trials is not uncommon: Research along two
that the animal studies did not reliably predict the
fronts, involving experimental models of asthma
human outcomes. Perel and colleagues reported that
and human clinical trials, proceeds in parallel,
the animal studies varied in methodological quality
often with investigators unaware of their counter-
and sample sizes, that randomisation and blinding
were rarely reported, and that publication bias was
The clinical utility of animal models is clearly
questionable in such cases, in which clinical trials proceed concurrently with, or prior to, animal studies, or continue, despite equivocal evidence of efficacy
Clinical utility of other animal experiments
As in the case of stroke, the clinical utility of the
Of seven systematic reviews on the utility of animal
majority of these animal studies also appears to
models in other clinical fields identified by this
have been limited by their poor methodological
quality. Examples include: 36 cell or animal studies
humans with aspirin, but discordant results were
on the effects of LLLT on wound healing (78); 44
obtained with calcium and wheat bran (the equiva-
studies on the efficacy of fluid resuscitation in
lent β-carotene results were not available). Corpet
bleeding animals (79–80); and studies on the effi-
and Pierre concluded that these results suggest that
cacy of endothelin receptor blockade in animal mod-
the use of the rodent models can roughly predict
els of heart failure (81). Common flaws included
treatment effects in humans, but that the predic-
inadequate sample sizes, leaving studies underpow-
tion is not accurate for all agents, and that the car-
ered, and lack of randomisation and blinding.
cinogen-induced rat model is more predictive than
In some cases, obvious deficiencies within the
the Min mouse model. However, relatively few
animal models were identified. In commenting on
agents were tested, and two of the three agents
the clinical relevance of animal models for testing
tested in mice produced different outcomes in
the effects of LLLT on wound healing, Lucas and
humans, so the conclusion that rodents are predic-
colleagues (78) noted that the animal models
tive of human treatment effects, albeit only
excluded common problems associated with wound
healing in humans, such as ischaemia, infection, and necrotic debris.
Difficulties were also apparent, in translating animal outcomes to human clinical protocols, in at least one case. Lazzarini and colleagues (83)
Toxicological utility: carcinogenicity
Due to the limited availability of data on human
reviewed experimental studies on osteomyelitis, to
exposure, the identification and regulation of expo-
ascertain their impacts on the systemic antibiotic
sure to potential human toxins has traditionally
treatment of human osteomyelitis. Although they
relied heavily on animal studies. However, system-
found that most of the animal models reviewed
atic reviews have indicated that the utility of ani-
were reproducible and dependable, they also found
mal studies for these purposes is lacking in the
that the human predictivity of these studies was
fields of carcinogenicity (at least five reviews:
unclear, and was possibly undermined by difficul-
87–91) and teratology (one review: 92). No system-
ties in establishing the right dose regimen in the
atic review demonstrated a contrary result. The
animals. Although they considered that the use of
sensitivities of animal models to a range of human
antibiotic combinations was associated with better
toxicities (i.e. the ability to identify them) high-
outcomes in the majority of animal studies, and
lighted by one review (93) generally appears to be
that these studies did provide indications of appro-
accompanied by poor human specificity (i.e. the
priate minimum treatment durations, they con-
ability to correctly identify human non-toxins),
cluded that these studies had limited relevance to
resulting in a high incidence of false-positive
In two cases, reviewers reported that animal and human outcomes were substantially consistent, although in one case this conclusion was contentious. While reviewing therapeutic approaches to streptococcal endocarditis, Scheld (84) reported
The regulation of human exposure to potentially
good overall correlations among results obtained by
carcinogenic chemicals constitutes society’s most
in vitro susceptibility testing (especially killing
important use of animal carcinogenicity data. In
kinetics in broth), in animal experiments, and in
2004, to examine the utility of animal carcinogenic-
clinical trials on different antimicrobial regimens in
ity data in protecting public health, I surveyed the
humans with streptococcal endocarditis.
EPA’s Integrated Risk Information System (IRIS)
To investigate the efficacy of rodent models of
chemicals database. This database contains the
carcinogenesis in predicting treatment outcomes in
environmental contaminants of greatest concern in
humans, Corpet and Pierre (85) conducted a sys-
the USA, together with their animal, and, in a small
tematic review and meta-analysis of colon cancer
minority of cases, human toxicity data, along with
chemoprevention studies involving the use of
the human toxicity assessments based on this
aspirin, β-carotene, calcium, and wheat bran, in
pooled data. However, of the 160 IRIS chemicals
rats, mice and humans. Controlled intervention
lacking even limited human exposure data, but pos-
studies on the recurrence of adenomas in human
sessing animal data, for which human toxicity
volunteers were compared with chemoprevention
assessments existed, the EPA considered the ani-
studies of carcinogen-induced tumours in rats, and
mal carcinogenicity data to be inadequate to sup-
of polyps in Min (Apc[+/–]) mice. 6,714 humans,
port a classification of probable human carcinogen
3,911 rats and 458 mice were included in the meta-
or non-carcinogen in the majority of cases (58.1%,
analyses. Corpet and Pierre found that comparable
results were achieved in rats and humans with
aspirin, calcium, β-carotene, and wheat bran.
Organisation’s International Agency for Research
Comparable results were found in Min mice and
on Cancer (IARC) indicated that the true utility of
animal carcinogenicity data for deriving human car-
been added to the 1993 number, yielding a total of
cinogenicity assessments is actually substantially
885 agents or exposure circumstances listed in the
lower than that indicated solely by EPA assess-
IARC Monographs (95). The proportion of definite
ments. Of 128 chemicals with human or animal
or probable human carcinogens had increased only
data assessed by both the EPA and the IARC,
slightly, from 13.3% in 1993 to 17.1% in 2004.
human carcinogenicity classifications were consistent between the two agencies only for the 17 chemicals for which at least limited human data were available. For those 111 chemicals for which the classification was primarily reliant on animal data,
Surveys by other investigators have also demon-
the EPA was much more likely than the IARC to
strated the poor human predictivity of animal car-
assign carcinogenicity classifications indicative of
cinogenicity data. After examining the studies on
greater human risk (p < 0.0001; 87).
471 substances contained within the US National
The IARC is a leading international authority on
Toxicology Program (NTP) carcinogenicity data-
carcinogenicity assessments, and the significant dif-
base as of 1 July 1998, Haseman (89) concluded
ferences between its human carcinogenicity classifi-
that, although 250 (53.1%) produced carcinogenic
cations and those of the EPA, for identical
effects in at least one sex–species group, the actual
chemicals, indicate that: i) in the absence of signifi-
proportion which posed a significant carcinogenic
cant human data, the EPA is over-reliant on animal
risk to humans was probably far lower, for reasons
carcinogenicity data; ii) as a result, the EPA tends
such as interspecies differences in mechanisms of
to over-predict carcinogenic risk; and iii) the true
predictivity for human carcinogenicity of animal
Similarly, around half of all chemicals tested on
data is even poorer than that indicated by EPA fig-
animals and included in the comprehensive
ures alone. EPA policy erroneously assuming that
Berkeley-based carcinogenic potency database,
tumours in animals are indicative of human car-
whether natural or synthetic, gave positive results
cinogenicity, was implicated as a primary cause of
(89). Rall (96) estimated that only around 10% of
these errors, which have substantial US public
chemicals are truly carcinogenic to humans. Ashby
health implications concerning the regulation of
and Purchase (97) speculated that all chemicals
human exposures to environmental contaminants
would eventually display some carcinogenic activity, if tested in sufficient rodent strains. Even common table salt has been classified as a tumour promoter in rats (98).
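The predictivity arithmetic applied in this section to the IARC figures (502 agents with definite or limited evidence of animal carcinogenicity, of which at most 104 were definite or probable human carcinogens) can be reproduced in a few lines. The following Python sketch is illustrative only. The function names are hypothetical, and "false-positive rate" follows this article's usage (the proportion of positive results that are not truly positive). The final bound combines the approximate 50% bioassay positivity rate with Rall's estimate of around 10% true human carcinogens, on the assumption, not made in the cited sources, that both figures describe the same set of chemicals.

```python
def positive_predictivity(true_positives: int, test_positives: int) -> float:
    """Proportion of positive test outcomes that are truly positive."""
    if test_positives <= 0:
        raise ValueError("test_positives must be greater than zero")
    if not 0 <= true_positives <= test_positives:
        raise ValueError("true_positives must lie between 0 and test_positives")
    return true_positives / test_positives


def max_positive_predictivity(prevalence: float, positivity_rate: float) -> float:
    """Upper bound on positive predictivity, reached only if every true
    positive is detected (assumption: both rates refer to the same chemicals)."""
    if not 0 < positivity_rate <= 1 or not 0 <= prevalence <= 1:
        raise ValueError("rates must be proportions, with positivity_rate > 0")
    return min(1.0, prevalence / positivity_rate)


# Figures reported for Volumes 1-55 of the IARC Monographs (88).
animal_positive = 502   # definite or limited evidence of animal carcinogenicity
human_positive = 104    # definite or probable human carcinogens

# "At best" values: assumes all 104 human carcinogens were animal positives.
ppv = positive_predictivity(human_positive, animal_positive)
false_positive_rate = 1 - ppv

# Hypothetical bound: ~10% true carcinogens, ~50% bioassay positivity.
bound = max_positive_predictivity(0.10, 0.50)

print(f"Positive predictivity: {ppv:.1%}")
print(f"False-positive rate:   {false_positive_rate:.1%}")
print(f"Upper bound on predictivity: {bound:.0%}")
```

Under these assumptions, a bioassay that flags about half of all tested chemicals cannot achieve a positive predictivity much above one in five, which is consistent with the IARC-based estimate.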
Fung and colleagues (99) estimated that, if all the
75,000 chemicals in use were tested for carcino-
The poor human predictivity of animal carcino-
genicity via the standard NTP bioassay, signifi-
genicity studies was also demonstrated in 1993 by
cantly less than 50% would prove carcinogenic in
Tomatis and Wilbourn (88), who surveyed the 780
animals, and less than 5–10% would warrant fur-
chemical agents or exposure circumstances evalu-
ther investigation. They suggested that the higher
ated and listed within Volumes 1–55 of the IARC
positivity rate recorded is due to chemical selection
Monographs series (94). Of these, 502 (64.4%) had
based on a priori suspicion of carcinogenicity.
definite or limited evidence of animal carcinogenic-
However, examination of the carcinogenicity litera-
ity, and 104 (13.3%) were assessed as definite or
ture reveals that chemicals are selected for study
probable human carcinogens. Virtually all of the
for many reasons other than a priori suspicion,
latter group would, of course, have been members
including production volumes, occupational and
of the former; so at least 398 animal carcinogens
environmental exposure risks, and investigations of
were assessed and considered not to be definite or
mechanisms of carcinogenesis (100). Despite this,
the positivity rate of the carcinogenicity bioassay in
The positive predictivity of a test is the propor-
the general literature remains around 50% (101).
tion of positive outcomes that are truly positive for
Huff (90) demonstrated a significant variation in
the characteristic being tested for, while the false-
carcinogenicity test results between two major car-
positive rate refers to the proportion that are not.
cinogenicity testing programmes, at the NTP
Hence, based on these IARC figures, the positive
(Research Triangle Park, NC, USA) and the Rama -
predictivity of the animal bioassay for definite or
zzini Foundation (RF; Bentivoglio, Italy). Both lab-
probable human carcinogens was, at best, only
oratories had carried out several hundred chemical
20.7% (104/502), while the false-positive rate was at
carcinogenesis bioassays: around 500 at the NTP,
and 200 at the RF. Of these, 21 chemicals were eval-
More-recent IARC classifications indicate little
uated by both laboratories, of which published
improvement in the positive predictivity of the ani-
results were available for 14. The results were
mal bioassay for human carcinogens. By 1 January
inconsistent for 3 of these 14 chemicals (21.4%),
2004, a decade later, only 105 additional agents had
which had been declared carcinogenic by one labo-
ratory but not the other, questioning the reliability
of these assays. Of the remaining 11 chemicals, both laboratories found nine to be carcinogenic, and
Toxicological utility: various
Under the auspices of the International Life
Sciences Institute’s Health and Environmental
Possible causes for such different toxicity results
Sciences Institute, Olsen and colleagues (93) sought
between laboratories include differences in: the test
to determine the extent to which various types of
species, strain, age or gender; the quantity, dura-
human toxicities evident during clinical trials could
tion and consistency of dosing; the route and
be predicted from standard toxicology studies.
method of administration; diet and laboratory envi-
Based on a multi-company database of 131 pharma-
ronmental conditions; and the criteria used for the
ceutical agents with one or more human toxicities
identified during clinical trials, they reported a
Ennever and Lave (91) demonstrated that nei-
true-positive prediction rate of animal models for
ther of the two commonly-used interpretations of
human toxicity of 69%, and also that study results
rodent carcinogenicity data provide valid conclu-
from non-rodent (dog, primate) species have good
sions about human carcinogenicity. If a risk avoid-
potential to identify human toxicities from many
ance interpretation is used, in which any positive
result in male or female mice or rats is considered
These results concur with those of the other tox-
positive, then nine of the 10 known human carcino-
icity reviews described. Animal studies are often
gens among the hundreds of chemicals tested by the
reasonably sensitive for human toxins. However,
NTP are positive (102), but so are an implausible
their human predictivity and toxicological utility
22% of all chemicals tested (99). If a less risk-sensi-
are limited by their poor human specificity, which
tive interpretation is used, whereby only chemicals
results in high false-positive rates.
positive in both mice and rats are considered positive, then only three of the six known human carcinogens tested in both species are positive (102).
The former interpretation could result in the needless denial of potentially useful chemicals to society, while the latter could result in widespread exposure
Causes of the poor human utility of animal
When evaluated overall, these 27 systematic reviews
clearly do not support the widely-held assumptions of animal ethics committees and the opinions of advocates of animal experimentation, that laboratory animal use is generally beneficial in the development of human therapeutic interventions and the assessment
Toxicological utility: teratogenicity
In 2005, my colleagues and I published an extensive
of human toxicity. On the contrary, they frequently
survey examining the human predictivity of animal
demonstrate that animal experiments are of low util-
teratogenicity testing (92). We examined nearly
ity for these purposes. This appears to result both
every putative teratogen tested in more than one
from limitations of the animal models themselves,
species, including 1,396 studies. Data for 11 groups
and also from the poor methodological quality and
of known human teratogens tested in 12 animal
statistical design of many animal experiments.
species were analysed. Discordance between species was apparent in just under 30% of these 1,396 reports. Almost a quarter of all the outcomes in the six main species used (mouse, rat, rabbit, hamster, primate and dog) were equivocal. For known human
Chimpanzees are our closest living relatives, but
teratogens, there was high variability in positive pre-
despite great similarities between the structural
dictivity between species, the mean of which was
regions of chimpanzee DNA and human DNA, impor-
only 51% — hardly better than tossing a coin. Some
tant differences between the regulatory regions exert
species exhibited a high false-negative rate. Only
an “avalanche” effect on large numbers of structural
around half of these known human teratogens were
teratogenic in more than one primate species. Fewer than one in 40 of the substances designated as potential teratogens from animal studies were conclusively linked to human birth defects.
genes (103). Despite nucleotide differences between chimpanzees and humans of only 1–2%, this effect results in differences of around 20%, in terms of protein expression (104), representing marked phenotypic differences between the species. These
We concluded that the poor human predictivity of
differences manifest as: altered susceptibility to the
animal-based teratology warrants the cessation of
aetiology and progression of various diseases; differ-
animal testing, and that resources should be reallo-
ences in the absorption, tissue distribution, metabo-
cated into the further development and implemen-
lism, and excretion of chemotherapeutic agents; and
tation of quicker, cheaper and more reliable,
differences in the toxicity and efficacy of pharmaceu-
scientifically validated alternatives, such as the
ticals and other agents (59, 103). Such effects appear
to be responsible for the demonstrated inability of
most chimpanzee research to contribute substan-
quality of many of the animal studies examined,
tially to the development of methods which are effi-
and none of the reviews demonstrated good
cacious in combating human diseases (59).
methodological quality in a majority of studies.
Other laboratory animal species are much less
While the omission of study details due to publica-
similar to humans, both genetically and phenotypi-
tion space constraints may artificially lower appar-
cally, and are therefore less likely to be useful for
ent quality, the prevalence of such deficiencies
accurately modelling the progression of human dis-
exceeds that which might reasonably be expected,
eases or of human responses to chemicals and puta-
and is, accordingly, grounds for considerable con-
Common deficiencies included lack of: sample
size calculations, sufficient sample sizes, appropriate animal models (e.g. aged animals or those with appropriate comorbidities), randomised treatment
Rodents are by far the most common laboratory
allocation, blinded drug administration, blinded
animal species used in toxicity studies. Several fac-
induction of ischaemia in the case of stroke models,
tors contribute to the demonstrated inability of
blinded outcome assessment, and conflict of inter-
rodent bioassays to reliably predict human toxicity.
est statements. Some studies also used anaesthetics
The stresses incurred during handling, restraint,
that may have altered the experimental outcomes,
other routine laboratory procedures, and particu-
and substantial variation was evident in the param-
larly, the stressful routes of dose administration
common to toxicity tests, alter immune status and
These deficiencies limited the clinical utility of
disease predisposition in ways which are very diffi-
these studies in various significant ways. For exam-
cult to accurately predict, and which distort the pro-
ple, it is well established that studies lacking ran-
gression of diseases and responses to chemicals and
domisation or blinding often over-estimate the
putative chemotherapeutic agents (105, 106).
magnitude of the effects of treatments (107–109).
In addition, animals have a broad range of physi-
Bebarta and colleagues (110) described the impacts
ological defences against general toxic insults, such
of lack of randomisation or blinding on estimations
as epithelial shedding and inducible enzymes,
of the significance of treatment effects in 389 ani-
which commonly prove effective at environmentally
mal studies and in 2,203 cell line studies. They
relevant doses, but which may be overwhelmed at
found that studies lacking randomisation or blind-
the high doses commonly applied in routine toxicity
ing, but not both, were more likely to report a treat-
testing (101). Carcinogenicity assays, in particular,
ment response than studies that used these
involve chronic, high level dosing. This may result,
measures (OR = 3.4; 95% CI = 1.7 to 6.9, and OR =
inter alia, in insufficient rest intervals between
3.2; 95% CI = 1.3 to 7.7, respectively), and that
doses for the effective operation of DNA and tissue
studies lacking both randomisation and blinding
were even more likely to report a treatment response (OR = 5.2; 95% CI = 2.0 to 13.5).
repair mechanisms, which, with the unnatural elevation of cell division rates during ad libitum feeding, may predispose the animals to mutagenesis and carcinogenesis. Lower doses, greater intervals between exposures, shorter total periods of exposure, and intermittent feeding, which represent a more realistic approach to the environmental exposure of humans to most potential toxins, might not result in toxic changes at all (106).
Insufficient sample sizes left many studies underpowered, limiting the statistical validity of the
study conclusions. Animal lives and other resources
Finally, differences in rates of absorption and
may also be wasted, if experiments subsequently
transport mechanisms between test routes of
require repetition as a result. As stated by the UK
administration and other important human routes
Medical Research Council (111), The number of ani-
of exposure, and the considerable variability of
mals used… must be the minimum sufficient to cre-
organ systems in response to toxic insults, between
ate adequate statistical power to answer the question
and within species, strains and genders, render pro-
foundly difficult any attempt to accurately predict
According to Balls and colleagues (112), however,
human hazard on the basis of animal toxicity data
At least 11 systematic reviews (57, 64, 68, 72–76, 78–81 [of which, 79 and 80 described a single review]) demonstrated the poor methodological
…surveys of published papers, as well as more anecdotal information, suggest that more than half of the published papers in biomedical research have statistical mistakes, many seem to use excessive numbers of animals, and a proportion are poorly designed. Festing (113) similarly stated that, Surveys of published papers show that there are many errors, both in the design of the experiments and in the statistical analysis of the resulting data. This must result in a waste of animals and scientific resources, and it
ing the statistical power of small samples, are par-
is surely unethical. De Boo and Hendriksen (114)
ticularly appropriate when marked ethical, cost or
noted the tendency to alter animal numbers based
practical constraints limit the number of animals
on scientifically irrelevant issues, such as availabil-
that may be used (e.g. in experiments involving
Factors that should be considered when calculat-
Finally, the appropriate statistical analysis of the
ing appropriate sample sizes include: detectability
resultant data should be closely linked to the exper-
threshold (the size of the difference between treat-
imental design, and to the type of data produced
ment groups considered significant); known or
(124). The relatively poor statistical knowledge of
expected data variation; the required significance of
many animal researchers may be the cause of the
the test (‘p’ or ‘α’: the probability of a Type I error
high prevalence of poor sample size choices in ani-
— assuming a difference where none exists); the
mal studies. Solutions could include the training of
acceptable probability of assuming no difference
researchers in statistics, and the direct input of
where one does exist (‘β’, a Type II error. The
statisticians in experimental design and data analy-
‘power’ of an experiment = 1–β; 0.8 is the usual
choice); and the type of statistical analysis to which the data will be subjected. Smaller thresholds, greater data variation, smaller acceptable error probabilities (greater power), and certain statistical tests for differences, all require larger samples.
Raising standards: evidence-based medicine
Evidence-based medicine (EBM) bases clinical deci-
No universal rule for calculating correct sample
sions on methodologically-sound, prospective, ran-
sizes exists (114). Festing (115), for example,
domised, blinded, and controlled clinical trials. The
describes two methods, the preferred ‘power calcu-
gold standard for EBM is large prospective epidemio-
lation,’ and the ‘resource equation.’ Power calcula-
logical studies, or meta-analyses of randomised and
tions use formulae which are available in
blinded, controlled clinical trials (126). The applica-
interactive computer programmes (e.g. 116, 117),
tion to animal experiments of the EBM standards
and calculate the minimum sample sizes required to
which are currently applied to human clinical trials,
detect treatment effects with specified degrees of
would make the results more robust and would
certainty. Mead’s ‘resource equation’ (118) calcu-
increase their applicability (76, 127–130). However,
lates sample sizes by using degrees of freedom, and
mechanisms would be needed to ensure compliance
incorporates statistical parameters, such as treat-
with such standards. Compliance could, for example,
ment effects, block effects and error degrees of free-
be made a prerequisite for research funding, ethics
committee approval, and the publication of results.
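The factors listed in this section for choosing a sample size (the detectable difference, the expected data variation, the significance level and the power) can be combined in a simple power calculation, and the 'resource equation' reduces to counting degrees of freedom. The following Python sketch is a minimal illustration under stated assumptions, not the method of the programmes cited above. It assumes a two-sided comparison of two group means with equal group sizes, a common standard deviation, and a normal approximation. The function names are hypothetical, and the commonly quoted target of roughly 10 to 20 error degrees of freedom is a general guideline rather than a figure taken from this article.

```python
import math
from statistics import NormalDist


def animals_per_group(delta: float, sigma: float,
                      alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate animals needed per group to detect a difference `delta`
    between two group means (normal approximation, two-sided test)."""
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    if not 0 < alpha < 1 or not 0 < power < 1:
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against Type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta (Type II)
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)


def error_degrees_of_freedom(total_animals: int, treatments: int,
                             blocks: int = 1) -> int:
    """Resource equation: E = N - B - T, where N, B and T are the total,
    block and treatment degrees of freedom respectively."""
    if total_animals < 1 or treatments < 1 or blocks < 1:
        raise ValueError("counts must be at least 1")
    return (total_animals - 1) - (blocks - 1) - (treatments - 1)


# Smaller thresholds and greater power both require larger samples.
print(animals_per_group(delta=1.0, sigma=1.0))             # one SD difference
print(animals_per_group(delta=0.5, sigma=1.0))             # half an SD
print(animals_per_group(delta=1.0, sigma=1.0, power=0.9))  # higher power
print(error_degrees_of_freedom(total_animals=20, treatments=4))
```

Halving the detectable difference roughly quadruples the required group size, which illustrates why the choice of threshold should be justified before animals are allocated.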
Strategies should also be considered for minimis-
These measures would require the education and co-
ing animal numbers without unacceptably compro-
operation of funding agencies, ethics committees and
mising statistical power. Several of these strategies
aim to decrease data variability by minimising het-
erogeneity in experimental environments and pro-
researchers who are planning clinical trials, to ref-
tocols. This can be achieved by: i) the appropriate
erence systematic reviews of related previous work
use of environmental enrichment, aimed at decreas-
before they are permitted to proceed (51). To facili-
ing physiological variation resulting from barren
tate the detection of toxicity and of potentially effi-
laboratory housing and stressful procedures; ii)
cacious drugs, such reviews should also include all
choosing, where possible, to measure variables with
relevant animal research (76). A similar require-
relatively low inherent variability; iii) the use of
ment to reference, or where necessary, conduct, sys-
genetically homogeneous (isogenic or inbred) or
tematic reviews of relevant animal studies, prior to
specified pathogen-free animal strains; and iv)
the commencement of further animal studies,
screening raw data for obvious errors or outliers
would encourage a more complete and impartial
assessment of the existing evidence (51).
Meta-analysis involves the aggregation and sta-
Mechanisms are also needed to encourage the
tistical analysis of suitable data from multiple
reporting of negative results. The negative results
experiments. For some purposes, treatment and
of preclinical studies are much more likely to
control groups can be combined, permitting group
remain unpublished than are the negative results of
numbers to be minimised. Although new informa-
clinical trials (131). In a systematic review of stud-
tion can be derived through meta-analysis, more
ies on the efficacy of nicotinamide in combating
frequently, the results allow the refinement of
experimentally-induced stroke, comparisons pub-
existing knowledge. By designing experiments and
lished only in abstract form gave a significantly
reporting protocols to maximise their utility for
lower estimate of effect size than those published in
later meta-analyses, the benefit of individual ran-
full, demonstrating publication bias (132). van der
domised controlled experiments can be maximised
Worp and colleagues (73) commented on the pres-
(123). Strategies such as these, aimed at maximis-
sure to obtain and publish positive results: It is
therefore conceivable that the career of a preclinical investigator is more dependent on obtaining positive results, than that of a clinical trialist.

Fundamental constraints on the human utility of animal models

Strategies designed to increase the full and impartial examination of existing data before conducting animal studies, to improve their methodological quality, and to decrease bias during the publication of results, would minimise the consumption of animal, financial and other resources within studies of questionable merit and quality, and would increase the potential utility of animal data in addressing human situations and problems. However, the poor human clinical or toxicological utility of many animal experiments is unlikely to result solely from their poor methodological quality, or from publication bias. As stated by Perel et al. (76), the failure of animal models to adequately represent human disease may be another fundamental cause, which, in contrast, could be technically and theoretically impossible to overcome.

The genetic modification of animal models through the addition of foreign genes (transgenic animals) or the inactivation or deletion of genes (knockout animals) is being attempted, to make them more closely model humans. However, as well as being technically very difficult to achieve, such modification may not permit clear conclusions, due to a large number of factors, including those reflecting the intrinsic complexity of living organisms, such as the variable redundancy of some metabolic pathways between species (133). Furthermore, the animal welfare burdens incurred during the creation and use of GM animals are particularly high.

Implications for scientific validation of experimental models

Proposed non-animal test models are generally required to pass formal scientific validation before their use is widely or officially accepted. Pharmaceutical licensing agencies, for example, are generally unwilling to accept non-animal test data as evidence of the human safety of proposed new pharmaceuticals, until the test models used have […]

Scientific validation has traditionally involved the demonstration, in multiple independent laboratories, that the test in question is relevant and reliable for its specified purpose (practical validation; 135), such as the prediction of a certain in vivo outcome. It should also be preceded by an evaluation of the necessity for the test and of the adequacy of its development (136, 137). A three-stage prevalidation process should be utilised to improve the efficiency of the formal validation process, by ensuring satisfactory protocol refinement and transferability, and test performance (138).

However, it is not always scientifically necessary, or even logistically possible, to conduct multi-centre practical studies. Hence weight-of-evidence validation, also known as validation by retrospective analysis (139, 140), may be conducted, based on the assessment of existing data in a structured, systematic and transparent manner, provided that data of sufficient quantity and quality are available (141).

Regardless of the approach taken, the criteria required for formal validation are comprehensive (136, 141). Key objectives include: establishing the role and necessity of the test model; ensuring clarity of the defined goals; defining a prediction model, i.e. an algorithm for converting the test data into meaningful predictions of in vivo toxicity; examining the mechanistic relevance and credibility of the model with respect to those goals; and providing a description of the limitations of the model.

Where practical validation studies do occur, these should adhere to best practice standards, designed to ensure good methodological quality, including, for example, statistical justifications of sample sizes, randomised allocation to test groups, and blinded treatment and assessment of results. Where possible, inter-laboratory reproducibility should be demonstrated.

Whether validation studies are conducted by practical or weight-of-evidence approaches, experience has shown that transparency and independence from commercial, political or other interests should be maximised through the use of independent experts and the peer-reviewed publication of outcomes (136).

Scientific validation should lead to the reasoned overall assessment that sufficient evidence exists to demonstrate that a model is, or is not, relevant and reliable for the specified purpose, or that insufficient evidence exists to be reasonably certain either way. In some cases, an interim assessment can be made, until further evidence becomes available (141).

The European Centre for the Validation of Alternative Methods (ECVAM) was created by the EC in 1991, to fulfil the requirements of Directive 86/609/EEC on the protection of animals used for experimental and other scientific purposes. These requirements state that the EC and its Member States should actively support the development, validation and acceptance of methods which could replace, refine or reduce the use of laboratory animals (142). The US equivalent is the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM), which has similar goals. Despite the high standards required for successful validation, between 1998 and 2007, 21 distinct tests or categories of test methods that could replace, reduce or refine laboratory animal use,
had been validated and registered with ECVAM, and nine had achieved regulatory acceptance.

However, unlike non-animal models, animal models are generally assumed to be reasonably predictive of human outcomes in preclinical drug development, toxicity testing, and other fields of biomedical research, without the need to undergo formal validation studies. Yet the 27 systematic reviews examined in this study demonstrate that it is insufficient to assume that animal models are reliably predictive of human outcomes, even those in use for long periods, without subjecting them to critical assessment. Clearly, formal validation should be consistently applied to all proposed experimental models, regardless of their animal, non-animal, historical, contemporary or possible future status, and models should be chosen on the basis of critical scientific review, with appropriate consideration also given to animal welfare, ethical, legal, economic, and any […] Chemicals Bureau, the EC agencies responsible for technical aspects of validation and for EU chemicals regulations, respectively, at that time, made a similar call in 1995, in which they urged that prevalidation and independent assessment be applied with equal force to all new or modified animal and non-animal […]

The historical and contemporary paradigm, that animal models are generally reasonably predictive of human outcomes, provides the basis for their widespread use in toxicity testing and biomedical research aimed at preventing or developing cures for human diseases. However, their use persists for historical and cultural reasons, rather than because they have been demonstrated to be scientifically valid. For example, many regulatory officials “feel more comfortable” with animal data (145), and some even believe that animal tests are inherently valid, simply because they are conducted in animals (146).

However, most existing systematic reviews have demonstrated that animal experiments are insufficiently predictive of human outcomes to provide substantial benefits during the development of human clinical interventions, or in deriving human toxicity assessments. In only two of 20 reviews in which clinical utility was examined, did the authors conclude that the animal models were either significantly useful in contributing to the development of clinical interventions, or were substantially consistent with clinical outcomes (84, 85), and one of these conclusions was contentious. Seven additional reviews also failed to clearly demonstrate utility in predicting human toxicological outcomes, such as carcinogenicity and teratogenicity. Consequently, animal data can be generally assumed not to be substantially useful for these purposes.

Likely causes of this inadequacy include inherent genotypic and phenotypic differences between human and non-human species, the distortion of experimental outcomes arising from experimental environments and protocols, and the poor methodological quality of many animal experiments, as was apparent in at least 11 reviews. There were no reviews in which a majority of animal experiments were of good methodological quality. Some of these problems might be minimised with concerted effort (given their widespread prevalence), but the limitations resulting from interspecies differences are likely to be technically and theoretically impossible to overcome.

Despite the fact that they have not passed and, indeed, could not pass, the formal scientific validation process required of non-animal models prior to regulatory acceptance, most animal models are incorrectly assumed to be predictive of human outcomes. The consistent application of formal validation studies to all test models is clearly warranted, regardless of their animal, non-animal, historical, contemporary or […] should be based on such critical scientific review, with appropriate consideration also given to animal welfare, ethical, legal, economic and other relevant […]

Likely benefits would include greater selection of models truly predictive for human outcomes, increased safety of people exposed to chemicals that have passed toxicity tests, increased efficiency during the development of human pharmaceuticals and other therapeutic interventions, and decreased wastage of animal, personnel and financial resources.

In addition, the poor human clinical and toxicological utility of most animal models for which data exists, in conjunction with their generally substantial animal welfare and economic costs, justifies a ban on the use of animal models lacking scientific data clearly establishing their human predictivity or utility.

Received 02.03.07; received in final form 10.07.07; accepted for publication 11.07.07.

Anon. (2007). Annex to the Fifth Report on the Statistics on the Number of Animals Used for Experimental and other Scientific Purposes in the Member States of the European Union (COM(2007)675 final), 277pp. Brussels, Belgium: European Commission.
Goldberg, A.M. (2002). Use of animals in research: a science–society controversy? The American perspective: animal welfare issues. ALTEX 19,
Stephens, M.L., Alvino, G.M. & Branson, J.B. (2002). Animal pain and distress in vaccine testing in the
United States. Developments in Biologicals 111,
Anon. (2007). FY 2006 AWA Inspections, 11pp. Riverdale, MD, USA: United States Department of Agriculture Animal and Plant Health Inspection Service (USDA APHIS). Available at: http://www.aphis.usda.gov/animal_welfare/downloads/awreports/awreport2006.pdf (Accessed 12.12.07).
Carbone, L. (2004). What Animals Want: Expertise and Advocacy in Laboratory Animal Welfare Policy, 291pp. Oxford, UK: Oxford University Press.
Office of Technology Assessment, US Congress (1986). Alternatives to Animal Use in Research, Testing and Education, OTA-BA-273, 437pp. Washington, DC, USA: US Government Printing Office.
Home Office (2007). Statistics of Scientific Procedures on Living Animals: Great Britain 2006, 49pp.
O’Shea, D. (2000). Johns Hopkins enters suit over lab animal regulations. Press Release, 22 September, 2000. Baltimore, MD, USA: Johns Hopkins
Fishbein, E.A. (2001). What price mice? Journal of the American Medical Association 235, 939–941.
Sauer, U.G., Kolar, R. & Rusche, B. (2005). The use of transgenic animals in biomedical research in Germany. Part 1: Status Report 2001–2003. [Die Verwendung transgener Tiere in der biomedischen Forschung in Deutschland. Teil 1: Sachstandsbericht 2001–2003.] ALTEX 22, 233–246.
Anon. (2007). Swiss animal use statistics for 2005. Pain & Distress Report 7, 2. Available at: http://www.hsus.org/pain_distress_report (Accessed 12.12.07).
Rusche, B. (2003). The 3Rs and animal welfare — conflict or the way forward? ALTEX 20 Suppl. 1,
Combes, R.D., Balls, M., Bansil, L., Barratt, M., Bell, D., Botham, P., Broadhead, C., Clothier, R., George, E., Fentem, J., Jackson, M., Indans, I., Loizou, G., Navaratnam, V., Pentreath, V., Phillips, B., Stemplewski, H. & Stewart, J. (2004). The Third FRAME Toxicity Committee: Working toward greater implementation of alternatives in toxicity testing. ATLA 32 Suppl. 1B, 635–642.
Green, S. & Goldberg, A.M. (2004). TestSmart and toxic ignorance. ATLA 32 Suppl. 1A, 359–363.
Fenner-Crisp, P.A., Maciorowski, A.F. & Timm, G.E. (2000). The endocrine disruptor screening program developed by the US Environmental Protection Agency. Ecotoxicology 9, 85–91.
Green, S., Goldberg, A.M. & Zurlo, J. (2001). The TestSmart-HPV program — Development of an integrated approach for testing high production volume chemicals. Regulatory Toxicology & Pharmacology 33, 105–109.
Armstrong, T.W., Zaleski, R.T., Konkel, W.J. & Parkerton, T.J. (2002). A tiered approach to assessing children’s exposure: a review of methods and data. Toxicology Letters 127, 111–119.
Charles, G.D. (2004). In vitro models in endocrine disruptor screening. ILAR Journal 45, 494–501.
Stokes, W.S. (2004). Selecting appropriate animal models and experimental designs for endocrine disruptor research and testing studies. ILAR Journal 45, 387–393.
Louekari, K., Sihvonen, K., Kuittinen, M. & Sømnes, V. (2006). In vitro tests within the REACH information strategies. ATLA 34, 377–386.
Sandusky, C., Even, M., Stoick, K. & Sandler, J. (2006). Strategies to reduce animal testing in US EPA’s HPV program. ALTEX 23 Special Issue,
Brom, F.W. (2002). Science and society: different bioethical approaches towards animal experimentation. ALTEX 19, 78–82.
Festing, M.F.W. (2004). Is the use of animals in biomedical research still necessary in 2002? Unfortunately, “Yes”. ATLA 32 Suppl. 1B, 733–739.
Pawlik, W.W. (1998). The significance of animals in biomedical research. [Znaczenie zwierzat w badaniach biomedycznych.] Folia Medica Cracoviensia 39,
Kjellmer, I. (2002). Animal experiments are necessary. Coordinated control functions are difficult to study without the use of nature’s most complex systems: mammals and human beings. [Djurförsök är nödvändiga. Samordnade kontrollfunktioner låter sig svårligen studeras utan tillgång till naturens mest komplexa system: däggdjur och människa.] Lakartidningen 99, 1172–1173.
Osswald, W. (1992). Ethics of animal research and application to humans. [Etica da investigação no animal e aplicação ao homem.] Acta Medica Portuguesa 5, 222–225.
Greek, C.R. & Greek, J.S. (2002). 4th World Congress Point/Counterpoint: Is Animal Research Necessary in 2002?, 54pp. Los Angeles, CA, US: Americans for
Singer, P. (1990). Animal Liberation: A New Ethics for our Treatment of Animals, 2nd edn, 320pp. New York, NY, USA: New York Review/Random House.
La Follette, H. & Shanks, N. (1994). Animal experimentation: the legacy of Claude Bernard. International Studies in the Philosophy of Science 8,
Greek, C.R. & Greek, J.S. (2000). Sacred Cows and Golden Geese, 242pp. New York, NY, USA: Cont-
Greek, C.R. & Greek, J.S. (2002). Specious Science, 288pp. New York, NY, USA: Continuum.
Anon. (2006). Statement re: TGN1412. Available at: http://www.tegenero.com/news/statement_re_tgn
Anon. (2006). Frequently asked questions regarding TGN1412. Available at: http://www.tegenero.com/news/faqs_re_tgn1412/index.php (Accessed 18.04.06).
Bhogal, N. & Combes, R. (2006). TGN1412: time to change the paradigm for the testing of new pharmaceuticals. ATLA 34, 225–229.
Coghlan, A. (2006). Mystery over drug trial debacle deepens. NewScientist.com news service, 14 August, 2006. Available at: http://www.newscientist.com/article.ns?id=dn9734 (Accessed 12.12.07).
Graham, D.J., Campen, D., Hui, R., Spence, M., Cheetham, C., Levy, G., Shoor, S. & Ray, W.A. (2005). Risk of acute myocardial infarction and sudden cardiac death in patients treated with cyclo-oxygenase 2 selective and non-selective non-steroidal anti-inflammatory drugs: nested case-control study. Lancet 365, 475–481.
Dahl, S.L. & Ward, J.R. (1982). Pharmacology, clinical efficacy, and adverse effects of the nonsteroidal anti-inflammatory agent benoxaprofen. Pharmacotherapy 2, 354–366.
Gad, S.C. (1990). Model selection in toxicology: principles and practice. Journal of the American College of Toxicology 9, 291–302.
Ross-Degnan, D., Soumerai, S.B., Fortess, E.E. & Gurwitz, J.H. (1993). Examining product risk in context. Market withdrawal of zomepirac as a case study. Journal of the American Medical Association 270, 1937–1942.
Peters, T.S. (2005). Do preclinical testing strategies help predict human hepatotoxic potentials? Toxicologic Pathology 33, 146–154.
Venning, G.R. (1983). Identification of adverse reactions to new drugs. I: What have been the important adverse reactions since thalidomide? British Medical Journal 286, 199–202.
Wallenstein, L. & Snyder, J. (1952). Neurotoxic reaction to chloromycetin. Annals of Internal Medicine 36, 1526–1528.
Blum, M.D., Graham, D.J. & McCloskey, C.A. (1994). Temafloxacin syndrome: review of 95 cases. Clinical Infectious Diseases 18, 946–950.
Mulder, P., Richard, V. & Thuillez, C. (1998). Different effects of calcium antagonists in a rat model of heart failure. Cardiology 89 Suppl. 1, 33–37.
Food and Drug Administration, US Department of Health and Human Services (2004). Innovation or Stagnation: Challenge and Opportunity on the Critical Path to New Medical Products, 31pp. Available at: http://www.fda.gov/oc/initiatives/criticalpath/
Lazarou, J. & Pomeranz, B. (1998). Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies. Journal of the American Medical Association 279, 1200–1205.
Koppanyi, T. & Avery, M.A. (1966). Species differences and the clinical trial of new drugs: a review. Clinical Pharmacology & Therapeutics 7, 250–270.
Villar, D., Buck, W.B. & Gonzalez, J.M. (1998). Ibuprofen, aspirin and acetaminophen toxicosis and treatment in dogs and cats. Veterinary & Human Toxicology 40, 156–162.
Wilson, J.G., Ritter, E.J., Scott, W.J. & Fradkin, R. (1977). Comparative distribution and embryotoxicity of acetylsalicylic acid in pregnant rats and rhesus monkeys. Toxicology & Applied Pharmacology 41, 67–78.
National Institutes of Health (2006). Information on Clinical Trials and Human Research Studies. Available at: http://clinicaltrials.gov/ct/info/whatis;jsessionid=B9D601AD55432DBDD59314931CA8385
Pound, P., Ebrahim, S., Sandercock, P., Bracken, M. & Roberts, I. (2004). Where is the evidence that animal research benefits humans? British Medical Journal 328, 514–517.
Nuffield Council on Bioethics (2005). The Ethics of Research Involving Animals, 376pp. London, UK:
Anon. (2006). Scopus in detail: what does it cover? Available at: http://www.info.scopus.com/detail/
National Center for Biotechnology Information (2006). PubMed overview. Available at: http://www.ncbi.nlm.nih.gov/entrez/query/static/overview.html
Lindl, T., Völkel, M. & Kolar, R. (2005). [Animal experiments in biomedical research. An evaluation of the clinical relevance of approved animal experimental projects.] [German.] ALTEX 22, 143–151.
Lindl, T., Völkel, M. & Kolar, R. (2006). Animal experiments in biomedical research. An evaluation of the clinical relevance of approved animal experimental projects: No evident implementation in human medicine within more than 10 years. [Lecture abstract.] ALTEX 23, 111.
Hackam, D.G. & Redelmeier, D.A. (2006). Translation of research evidence from animals to humans. Journal of the American Medical Association 296,
Hackam, D.G. (2007). Translating animal research into clinical benefit: poor methodological standards in animal studies mean that positive results may not translate to the clinical domain. British Medical Journal 334, 163–164.
Knight, A. (2007). The poor contribution of chimpanzee experiments to biomedical progress. Journal of Applied Animal Welfare Science 10, 281–308.
Conlee, K.M., Hoffeld, E.H. & Stephens, M.L. (2004). A demographic analysis of primate research in the United States. ATLA 32 Suppl. 1A, 315–322.
Morris, E. (Undated). Sampling from Small Populations. Available at: http://uregina.ca/~morrisev/Sociology/Sampling%20from%20small%20populations.htm (Accessed 12.12.07).
Guenther, W.C. (1973). A sample size formula for the hypergeometric. Journal of Quality Technology 5, 167–170.
Green, J. (1982). Asymptotic sample size for given confidence interval length. Applied Statistics 31,
Macleod, M.R., O’Collins, T., Horky, L.L., Howells, D.W. & Donnan, G.A. (2005). Systematic review and meta-analysis of the efficacy of melatonin in experimental stroke. Journal of Pineal Research 38,
The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group (1995). Tissue plasminogen activator for acute ischemic stroke. New England Journal of Medicine 333, 1581–1588.
Chinese Acute Stroke Trial (CAST) Collaborative Group (1997). Randomised placebo-controlled trial of early aspirin use in 20,000 patients with acute ischaemic stroke. Lancet 349, 1641–1649.
International Stroke Trial Collaborative Group (1997). The International Stroke Trial (IST): a randomised trial of aspirin, subcutaneous heparin, or both, or neither, among 19,435 patients with acute ischaemic stroke. Lancet 349, 1569–1581.
Horn, J., de Haan, R.J., Vermeulen, M., Luiten, P.G.M. & Limburg, M. (2001). Nimodipine in animal model experiments of focal cerebral ischemia: a systematic review. Stroke 32, 2433–2438.
O’Collins, V.E., Macleod, M.R., Donnan, G.A., Horky, L.L., van der Worp, B.H. & Howells, D.W. (2006). 1026 experimental treatments in acute stroke. Annals of Neurology 59, 467–477.
Jonas, S., Aiyagari, V., Vieira, D. & Figueroa, M. (2001). The failure of neuronal protective agents versus the success of thrombolysis in the treatment of ischemic stroke: the predictive value of animal models. Annals of the New York Academy of Sciences 939,
Curry, S.H. (2003). Why have so many drugs with stellar results in laboratory stroke models failed in clinical trials? A theory based on allometric relationships. Annals of the New York Academy of Sciences 993, 69–74.
Macleod, M.R., O’Collins, T., Horky, L.L., Howells, D.W. & Donnan, G.A. (2005). Systematic review and meta-analysis of the efficacy of FK506 in experimental stroke. Journal of Cerebral Blood Flow & Metabolism 25, 1–9.
van der Worp, H.B., de Haan, P., Morrema, E. & Kalkman, C.J. (2005). Methodological quality of animal studies on neuroprotection in focal cerebral ischaemia. Journal of Neurology 252, 1108–1114.
Willmot, M., Gray, L., Gibson, C., Murphy, S. & Bath, P.M. (2005). A systematic review of nitric oxide donors and L-arginine in experimental stroke; effects on infarct size and cerebral blood flow. Nitric Oxide 12, 141–149.
Willmot, M., Gibson, C., Gray, L., Murphy, S. & Bath, P. (2005). Nitric oxide synthase inhibitors in experimental ischemic stroke and their effects on infarct size and cerebral blood flow: a systematic review. Free Radical Biology & Medicine 39,
Perel, P., Roberts, I., Sena, E., Wheble, P., Briscoe, C., Sandercock, P., Macleod, M., Mignini, L.E., Jayaram, P. & Khan, K.S. (2007). Comparison of treatment effects between animal experiments and clinical trials: systematic review. British Medical Journal 334, 197–200.
Stroke Therapy Academic Industry Roundtable (1999). Recommendations for standards regarding preclinical neuroprotective and restorative drug development. Stroke 30, 2752–2758.
Lucas, C., Criens-Poublon, L.J., Cockrell, C.T. & De Haan, R.J. (2002). Wound healing in cell studies and animal model experiments by Low Level Laser Therapy; were clinical studies justified? A systematic review. Lasers in Medical Science 17, 110–134.
Roberts, I., Kwan, I., Evans, P. & Haig, S. (2002). Does animal experimentation inform human healthcare? Observations from a systematic review of international animal experiments on fluid resuscitation. British Medical Journal 324, 474–476.
Mapstone, J., Roberts, I. & Evans, P. (2003). Fluid resuscitation strategies: a systematic review of animal trials. Journal of Trauma 55, 571–589.
Lee, D.S., Nguyen, Q.T., Lapointe, N., Austin, P.C., Ohlsson, A., Tu, J.V., Stewart, D.J. & Rouleau, J.L. (2003). Meta-analysis of the effects of endothelin receptor blockade on survival in experimental heart failure. Journal of Cardiac Failure 9, 368–374.
Corry, D.B. & Kheradmand, F. (2005). The future of asthma therapy: integrating clinical and experimental studies. Immunologic Research 33, 35–51.
Lazzarini, L., Overgaard, K.A., Conti, E. & Shirtliff, M.E. (2006). Experimental osteomyelitis: What have we learned from animal studies about the systemic treatment of osteomyelitis? Journal of Chemotherapy 18, 451–460.
Scheld, W.M. (1987). Therapy of streptococcal endocarditis: correlation of animal model and clinical studies. Journal of Antimicrobial Chemotherapy 20
Corpet, D.E. & Pierre, F. (2005). How good are rodent models of carcinogenesis in predicting efficacy in humans? A systematic review and meta-analysis of colon chemoprevention in rats, mice and men. European Journal of Cancer 41, 1911–1922.
Roberts, I., Evans, A., Bunn, F., Kwan, I. & Crowhurst, E. (2001). Normalising the blood pressure in bleeding trauma patients may be harmful. Lancet 357, 385–387.
Knight, A., Bailey, J. & Balcombe, J. (2006). Animal carcinogenicity studies: 1. Poor human predictivity. ATLA 34, 19–27.
Tomatis, L. & Wilbourn, J. (1993). Evaluation of carcinogenic risk to humans: the experience of IARC. In New Frontiers in Cancer Causation (ed. O. Iversen), pp. 371–387. Washington, DC, USA: Taylor and
Haseman, K. (2000). Using the NTP database to assess the value of rodent carcinogenicity studies for determining human cancer risk. Drug Metabolism Reviews 32, 169–186.
Huff, J. (2002). Chemicals studied and evaluated in long-term carcinogenesis bioassays by both the Ramazzini Foundation and the National Toxicology Program. Annals of the New York Academy of Sciences 982, 208–230.
Ennever, F.K. & Lave, L.B. (2003). Implications of the lack of accuracy of the lifetime rodent bioassay for predicting human carcinogenicity. Regulatory Toxicology & Pharmacology 38, 52–57.
Bailey, J., Knight, A. & Balcombe, J. (2005). The future of teratology research is in vitro. Biogenic Amines 19, 97–145.
Olson, H., Betton, G., Stritar, J. & Robinson, D. (1998). The predictivity of the toxicity of pharmaceuticals in humans from animal data — an interim assessment. Toxicology Letters 102–103, 535–538.
International Agency for Research on Cancer (IARC) (1972–1992). IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, Volumes 1–55. Lyon,
International Agency for Research on Cancer (IARC) (undated). IARC Monographs Programme on the Evaluation of Carcinogenic Risks to Humans. Available at: http://monographs.iarc.fr (Accessed
Rall, D.P. (2000). Laboratory animal tests and human cancer. Drug Metabolism Reviews 2, 119–128.
Ashby, J. & Purchase, I.F.H. (1993). Will all chemicals be carcinogenic to rodents when adequately evaluated? Carcinogenesis 8, 489–495.
Shirai, T., Fukushima, S., Ohshima, M. & Ito, N. (1984). Effects of butylated hydroxyanisole, butylated hydroxytoluene, and NaCl on gastric carcinogenesis initiated with N-methyl-N-nitro-N-nitrosoguanidine in F344 rats. Journal of the National Cancer Institute 72, 1189–1198.
Fung, V., Barrett, J. & Huff, J. (1995). The carcinogenesis bioassay in perspective: application in identifying human hazards. Environmental Health Perspectives 103, 680–683.
100. Gold, L.S., Bernstein, L., Magaw, R. & Slone, T.H. (1989). Interspecies extrapolation in carcinogenesis: prediction between rats and mice. Environmental Health Perspectives 81, 211–219.
101. Gold, L.S., Slone, T.H. & Ames, B.N. (1998). What do animal cancer tests tell us about human cancer risk? Overview of analyses of the carcinogenic potency database. Drug Metabolism Reviews 30,
102. Johnson, F.M. (2001). Response to Tennant et al.: Attempts to replace the NTP rodent bioassay with transgenic alternatives are unlikely to succeed. Environmental Molecular Mutagenesis 37, 89–92.
103. Bailey, J. (2005). Non-human primates in medical research and drug development: a critical review. Biogenic Amines 19, 235–255.
104. Glazko, G., Veeramachaneni, V., Nei, M. & Makalowski, W. (2005). Eighty percent of proteins are different between humans and chimpanzees. Gene 346, 215–219.
105. Balcombe, J., Barnard, N. & Sandusky, C. (2004). Laboratory routines cause animal stress. Contemporary Topics in Laboratory Animal Science 43,
106. Knight, A., Bailey, J. & Balcombe, J. (2006). Animal carcinogenicity studies: 2. Obstacles to extrapolation of data to humans. ATLA 34, 29–38.
107. Poignet, H., Nowicki, J.P. & Scatton, B. (1992). Lack of neuroprotective effect of some sigma ligands in a model of focal cerebral ischemia in the mouse. Brain Research 596, 320–324.
108. Aronowski, J., Strong, R. & Grotta, J.C. (1996). Treatment of experimental focal ischemia in rats with lubeluzole. Neuropharmacology 35, 689–693.
109. Marshall, J.W., Cross, A.J., Jackson, D.M., Green, A.R., Baker, H.F. & Ridley, R.M. (2000). Clomethiazole protects against hemineglect in a primate model of stroke. Brain Research Bulletin 52, 21–29.
110. Bebarta, V., Luyten, D. & Heard, K. (2003). Emergency medicine animal research: does use of randomisation and blinding affect the results? Academic Emergency Medicine 10, 684–687.
111. Medical Research Council (MRC) (1993). Responsibility in the Use of Animals in Medical Research,
112. Balls, M., Festing, M.F.W. & Vaughan, S. (eds) (2004). Reducing the use of experimental animals where no replacement is yet available. ATLA 32
113. Festing, M.F.W. (2004). Good experimental design and statistics can save animals, but how can it be promoted? ATLA 32 Suppl. 1A, 133–135.
114. De Boo, J. & Hendriksen, C. (2005). Reduction strategies in animal research: a review of scientific approaches at the intra-experimental, supra-experimental and extra-experimental levels. ATLA 33,
115. Festing, M.F.W. (1997). Experimental design and husbandry. Experimental Gerontology 32, 39–47.
116. van Wilgenburg, H., van Schaick Zillesen, P.G. & Krulichova, I. (2003). Sample power and ExpDesign: tools for improving design of animal experiments. Laboratory Animals 32, 39–43.
117. van Wilgenburg, H., van Schaick Zillesen, P.G. & Krulichova, I. (2004). Experimental design: com-
Moore, G.J., Overend, P. & Wilson, M.S. (1998). Reducing the use of laboratory animals in biomedical research: problems and possible solutions. ATLA 26,
125. Balls, M., Goldberg, A.M., Fentem, J.H., Broadhead, C.L., Burch, R.L., Festing, M.F.W., Frazier, J.M., Hendriksen, C.F., Jennings, M., van der Kamp, M.D., Morton, D.B., Rowan, A.N., Russell, C., Russell, W.M.S., Spielmann, H., Stephens, M.L., Stokes, W.S., Straughan, D.W., Yager, J.D., Zurlo, J. & Van Zutphen, B.F. (1995). The Three Rs: the way forward: […] Workshop 11. ATLA 23, 838–866.
126. Evidence-Based Medicine Working Group (1992). Evidence-based medicine. A new approach to teaching the practice of medicine. Journal of the American Medical Association 286, 2420–2425.
127. Watters, M.P.R. & Goodman, N.W. (1999). Comparison of basic methods in clinical studies and in vitro tissue and cell culture studies in three anaesthesia journals. British Journal of Anaesthesia 82,
128. Moher, D., Schulz, K.F. & Altman, D.G. (2001). The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 357, 1191–1194.
129. Arlt, S. & Heuwieser, W. (2005). [Evidence based veterinary medicine.] [German.] Deutsche Tierärztliche Wochenschrift 112, 146–148.
130. Schulz, K.F. (2005). Assessing allocation concealment and blinding in randomised controlled trials: why bother? Equine Veterinary Journal 37, 394–395.
131. Brown, C.M., Calder, C., Linton, C., Small, C., Kenny, B.A., Spedding, M. & Patmore, L. (1995). Neuroprotective properties of lifarizine compared with those of other agents in a mouse model of focal cerebral ischaemia. British Journal of Pharmacology 115, 1425–1432.
132. Oktem, I.S., Menku, A., Akdemir, H., Kontas, O., Kurtsoy, A. & Koc, R.K. (2000). Therapeutic effect of tirilazad mesylate (U-74006F), mannitol, and their combination, on experimental ischemia. Research in Experimental Medicine 199, 231–242.