
Current Drug Discovery Technologies, 2005, 2, 55-67
QSAR Modeling of Carcinogenic Risk Using Discriminant Analysis and
Topological Molecular Descriptors

Joseph F. Contrera*, Philip MacLaughlin(a), Lowell H. Hall(b) and Lemont B. Kier(c)

Center for Drug Evaluation and Research, Office of Pharmaceutical Science, U. S. Food and Drug Administration, Rockville, MD 20857; (a) MDL Information Systems, 200 Wheeler Road, Burlington, MA 01803; (b) Department of Chemistry, Eastern Nazarene College, Quincy, MA 02170; (c) Department of Medicinal Chemistry, School of Pharmacy, Virginia Commonwealth University, Richmond, VA 23298, USA

Abstract: A discriminant analysis model is presented for carcinogenic risk. The data set was obtained from the two-year
rodent study FDA/CDER database and was divided into a training set of 1022 organic compounds and an external
validation test set of 50 compounds. The model is designed for use as a decision support tool for a defined decision
threshold, and is thus a binary discrimination into “high risk” and “low risk” categories. The carcinogenic risk
classification is based on the method for estimating human risk from two-year rodent studies developed at the
FDA/CDER/ICSAS. The paradigm chosen for this model allows a straightforward risk analysis based on historic
information, as well as the computation of coverage, probability and confidence metrics that can further qualify the
computed result. The molecular structures were represented as MDL mol files. The molecular structure information was
obtained as topological structure descriptors, including atom-type and group-type E-State and hydrogen E-State indices,
molecular connectivity chi indices, topological polarity, and counts of molecular features. The MDL®QSAR software
computed all these descriptors. Furthermore, the discriminant analyses were all performed with the MDL®QSAR
software. The reported model is based on fifty-three descriptors, using the nonparametric normal kernel method and the
Mahalanobis distance to determine proximity. The model performed very well on the fifty compounds of the test set,
yielding the following statistics: 76% correctly classified “high risk” (carcinogenic) and 84% correctly classified as “low
risk” (non-carcinogenic).
Keywords: Carcinogenicity, discriminant analysis, in silico, predictive toxicology, topological structure descriptors,
QSAR, e-state, chemoinformatics.
* Address correspondence to this author at the Center for Drug Evaluation and Research, Office of Pharmaceutical Science, U. S. Food and Drug Administration, Rockville, MD 20857, USA; Tel: 301-827-5188; Fax: 301-827-3787; E-mail: [email protected]

1570-1638/05 $50.00+.00 © 2005 Bentham Science Publishers Ltd.

I. INTRODUCTION AND BACKGROUND

Rodent carcinogenicity studies are required for the marketing of most chronically administered drugs. These studies are the most costly and time-consuming non-clinical regulatory testing requirement in the development of a drug. The cost is approximately $2 million for a rat and mouse study, requiring 2 years of treatment, and at least an additional 1-2 years for histopathological analysis and report writing. The human carcinogenic potential of a compound is a property that cannot be evaluated in clinical trials and therefore safety decisions are made mainly on the basis of animal study results and risk/benefit considerations. The results of rodent carcinogenicity studies can have considerable impact on drug approvability. Even when rodent carcinogenicity findings do not prevent marketing, they can seriously restrict the marketing of some products or reduce their competitive advantage. Rodent carcinogenicity studies are usually initiated relatively late in drug development when considerable resources have already been invested in a potential new product. Significant carcinogenic findings at this stage of drug development can have disastrous and costly consequences for both the sponsoring company and the regulatory agency in the form of additional review cycles and time and effort invested in failed applications. Predictive modeling can reduce the likelihood of developing a compound that produces significant rodent tumors and can, therefore, lead to significant savings for both the pharmaceutical industry and the regulatory agency. The rodent carcinogenicity bioassay is also a pivotal component of food safety and environmental [...]

Ready access to scientific knowledge is critical to support safety-related regulatory and product development decisions, particularly in situations where available experimental information is inadequate or unavailable, to identify information gaps, and to prioritize research. A current challenge is the development of better means to identify useful relationships and insights from large sets of data. Based on the major advances in computer technology, chemoinformatics, and predictive toxicology, the accumulated results of rodent carcinogenicity studies in public databases and FDA files can be more effectively used to improve the scientific basis of regulatory and product development decisions and reduce the use of animals in testing.

It is conceivable that over time, with increased experience and confidence in carcinogenicity predictive software, it may be possible to reduce carcinogenicity testing for compounds that have molecular structures that are highly represented in the carcinogenicity database. This process
would reduce unnecessary testing and also free resources for testing compounds that are truly new molecular entities and are poorly represented in the carcinogenicity database.

In recent years methods have been developed for grouping together molecules with similar molecular structures, based on the use of topological structure information. Over the past decade there has been a significant growth in the use of similarity-based searching of databases in drug design and these methods have been shown to have a broader application [1-4]. The objective has been to organize a database of molecules according to a set of structure criteria so that compounds can be identified as being similar to a reference or target molecule. These similar compounds become candidates for screening or further analysis in the design process. The rationale is that compounds that are similar to a reference molecule are likely to be related to the behavior of the reference molecule in some sense. With the growth of combinatorial chemistry, the compounds in a database may be entirely or partially virtual; in other words, they are synthesized in silico. As a result, there may be no property value information with the molecules; hence, similarity is based entirely on the structural descriptions chosen in a particular study. There is thus no useful way of evaluating similarity based on physical properties except by virtue of the future success of the drug design project employing this general method.

Lajiness has shown quite clearly that a random search through a list of molecules is inferior to a search through an organized database, based on its ability to generate similarity or diversity in a study [3]. Some form of encoding structure information should be present for meaningful exploitation of a database. The code of structure information thus becomes the metric to evaluate similarity or its complement, diversity. This approach is not an exercise in multi-parameter QSAR modeling. With virtual molecules, many or all of the property values are unknown. The search is conducted by selecting a set of descriptors deemed important and finding the relation of molecules relative to a reference molecule using a metric such as distance or a grouping such as nearest neighbors. The objective is to create a cluster of molecules of potential interest based on several structure indices. Interesting compounds may appear that can be selected for screening or for further applications in the database search [...]

The encoding and subsequent searching can be a browsing process, using electrotopological state indices (E-State) values or other information-rich indices, such as molecular connectivity, removing the need for carefully delineated structural features which may be unknown or which can severely limit diversity. The choice of limiting distance values among molecules in the database makes it possible to reduce the number of output molecules. A qualitative advantage of this process is the stimulation of the chemist's imagination [5].

A large number of descriptors are available to be employed in the organization of a database. It is not our intention here to create a list of these or to make comparisons, each method being suitable for different circumstances. Our intention however is to build on the use of atom-type E-State descriptors along with molecular connectivity indices as a systematic organizer of a database, imparting a rich information source that can produce potentially useful structure patterns [6-8]. This present study requires that the structure representation employed be able to organize the data set molecules in such a way that those with high potential for a particular property (i.e. carcinogenic risk) be more closely associated with each other than with those that are associated with another property such as low carcinogenic risk. Previous work appears to indicate such a possibility [6-9]. The ability of the simple molecular connectivity indices to organize a set of skeletal structures has been demonstrated [9]. Molecular skeletons are grouped in a meaningful manner. Furthermore, directions within the representation space have meaning in terms of significant chemical information such as degree of skeletal branching, adjacency of branch points, and number of rings and types of fused ring systems. The atom-type E-State structure descriptors have also been shown to organize molecular structures in a chemically meaningful manner, emphasizing electronic information [10,11]. Based on the structure space provided by the atom type E-State descriptors, excellent similarity searches through a chemical database have been reported [7,8,10]. This combination of structure information representations provided the basis for the use of structure similarity methods together with topological descriptors that have recently been applied to QSAR modeling of rodent [...]

The result of these investigations indicates that the use of the atom-type E-State descriptors together with the molecular connectivity chi indices provides a structure space in which molecular structures are organized in chemically meaningful ways so that carcinogenic properties associated with those structures can also be expected to be usefully organized. As a result, statistical methods of analysis can be successfully applied to a data set based on E-State and molecular connectivity descriptors, as is demonstrated in this [...]

II. EXPERIMENTAL DATA AND METHODS

The FDA/CDER Rodent Carcinogenicity Database

The FDA/CDER Rodent Carcinogenicity database was created from summary rat and mouse carcinogenicity study findings for over 1300 compounds that include both industrial chemicals and pharmaceuticals. Rodent carcinogenicity study results in the FDA database were obtained from the National Toxicology Program (NTP) rodent carcinogenicity database, the Lois Gold Carcinogen Potency (CPD) Database [13], FDA/CDER archives, and the scientific literature. The database includes the name and identification codes, the chemical structure represented as an MDL MOL file, and numeric carcinogenic activity units assigned (discussed below) to each compound.

Acceptance Criteria for Carcinogenicity Studies

Most of the carcinogenicity study results for pharmaceuticals in the FDA database were derived from pharmacology/toxicology and biostatistics reviews and reports in FDA files. The results of carcinogenicity studies in FDA new drug application (NDA) regulatory reviews for marketed products are available under the Freedom of
Information Act and are considered non-proprietary. The identity of pharmaceuticals currently under regulatory review as an investigational new drug application (IND) or new drug application (NDA), or of drugs that have never been marketed, is proprietary and cannot be disclosed without the consent of the sponsor. Proprietary compounds represent approximately 8% of the total number of carcinogenicity studies in the FDA/CDER database and are coded in this [...]

Carcinogenicity Study Design and Analysis

The design of rodent carcinogenicity studies for pharmaceuticals is essentially the same as the design employed for industrial and environmental chemicals and U.S. National Toxicology Program (NTP) rodent carcinogenicity studies. Male and female rats and mice are divided randomly into one or two control and three treatment groups of 50-70 animals per group per species. Historically, the highest dose in the studies analyzed generally approximates the maximum tolerated dose (MTD) in the test species, and is administered daily, usually in the feed or by oral gavage for 2 years. The rodent strains most often used in NTP studies are the inbred Fischer 344 rat and the hybrid B6C3F1 (C3H x C57BL/6) mouse. In pharmaceutical studies submitted to the FDA, the predominant rodent strains are the Sprague-Dawley derived CD rat and the CD-1 Swiss-Webster derived mouse. Despite the long experience in the FDA with these assays, the significance of tumors from life-time exposure at the maximum tolerated dose, the dose response extrapolation and the relevance of rodent tumors to humans continue to be highly controversial issues.

Classification and Stratification of Rodent Tumor Findings

In studies reviewed by FDA/CDER, tumor findings are classified as positive if either benign and/or malignant findings are statistically significant in pair-wise comparison to concurrent controls by Fisher's Exact Test or equivalent statistical analysis. An adjustment for rare and common events is also applied to tumor findings [14]. Tumors are considered significant if they attained a level of p ≤ 0.01 for common tumors and p ≤ 0.05 for rare tumor types. Rare tumors are those with a spontaneous background incidence rate equal to or less than 1%. The incidences of benign and malignant tumors (adenomas and carcinomas) are combined and statistically evaluated where appropriate [15].

Data Transformation: The Numeric Representation of Carcinogenic Activity

Carcinogenicity studies are generally carried out in male and female rats and mice. Each sex/species is considered an individual study cell and therefore a complete battery of carcinogenicity studies for a compound is comprised of 4 study cells. A simplified numerical activity scale was used to quantify and stratify the results of rodent carcinogenicity studies. Compounds that produce statistically significant (by pair-wise comparison) tumors at multiple organ/tissue sites in a study cell were assigned the highest activity value of 50. Compounds that produce statistically significant single site tumors received an activity value of 40 and weaker or equivocal single site responses were assigned an activity value of 30. Studies with no statistically significant treatment-related tumor findings were assigned an activity value of 10. Compounds with 30 or more activity units in 2 or more study cells (2Plus), that is, having activity that crossed the biological barrier of gender or species, were classified as high risk carcinogens. Compounds with less than 30 activity units in 3 or more study cells were considered not to be high risk carcinogens. Compounds that were tested only in the rat or mouse may also be considered positive if there were significant tumor findings in both males and females. Compounds tested only in one species that have no tumor findings cannot be considered negative without additional information from at least one other study cell. Applying these rules, a training carcinogenicity database was created containing 1022 compounds with 4 cell or equivalent data of which 649 compounds were classified as carcinogenic (High Risk), having tumor findings in at least 2 study cells, and 373 compounds were non-carcinogenic (Low Risk) with negative findings in 3 or more study cells. The greater number of positive compounds is partly a function of the scoring method employed. This scoring method is the same as that used to predict rodent carcinogenicity based on molecular similarity [12] and is a simplification of the multi-cell method used for MCASE-ES rodent carcinogenicity predictions [16].

The name and structure of proprietary compounds were coded and kept confidential by the FDA. Electrotopological descriptors derived from proprietary molecules were included in the training data set. Although the electrotopological state and other topological descriptors employed contain sufficient information for successful modeling, they are insufficient to unambiguously recreate a proprietary [...]

A validation experiment was carried out with a total of 50 test compounds that were not part of the MDL®QSAR (see below) control or training data set. The 50 test compounds were randomly removed from the 1072 compound rodent carcinogenicity training set. The carcinogenicity model was based on the remaining 1022 training set compounds. The 50 randomly selected test compounds included 38 pharmaceuticals, of which 9 (18%) were newer pharmaceuticals currently under regulatory review that are not yet marketed (structures and identity not disclosed), and 12 industrial chemicals. The 50 validation test compounds contained 25 “High Risk” compounds with tumor findings in two or more study cells (2Plus) and 25 “Low Risk” compounds with either no tumor findings or findings in only 1 study cell. Table 3 lists the compounds, their assigned risk level from the FDA/CDER Rodent Carcinogenicity database and the risk level as predicted from the model presented in this work.

III. COMPUTATIONAL METHODS

Descriptors and Descriptor Selection

The MDL®QSAR module implements molecular topological descriptors available within the Molconn-Z program [17a, 17b]. (A list of publications that illustrate the nature of topological descriptors and their applications is available [17c].) An initial set of 195 topological descriptors
was computed by the MDL®QSAR module for the entire training set of 1022 compounds that were tested for rodent carcinogenicity. The descriptors included atom-type, group-type, and individual atom E-State and hydrogen E-State indices, molecular connectivity chi indices, kappa shape indices, topological polarity, counts of molecular features (number of rings, number of H-bond donors and acceptors, etc), and others. This initial set was reduced using the following criteria: first, descriptors were only considered that had non-zero value for at least 95% of all compounds and, second, the variance of the descriptor values had to be no less than a certain threshold, set equal to 1.

Model Development

The compounds in the FDA rodent carcinogenicity dataset are characterized either as carcinogenic or non-carcinogenic; therefore, this dataset presents a typical example of a binary classification problem. For the analysis of such datasets, MDL®QSAR employs methods of discriminant analysis. The complete description of this method of analysis can be found in a number of textbooks and monographs [18,19]. Herein, we provide a short description of the method pertinent to its implementation within MDL®QSAR.

MDL®QSAR incorporates the algorithms to develop discriminant models and the graphics interface that allows users to input data sets, initiate calculations, and analyze and manipulate resulting models. Each model is characterized by rich statistics available to the user. MDL®QSAR implements the entire range of discriminant analysis methods such as parametric, nonparametric kernel, and nearest-neighbor approaches. The classic parametric method of discriminant analysis is applicable in the case of approximately normal within-class distributions. The method generates either a linear discriminant function (the within-class covariance matrices are assumed to be equal) or a quadratic discriminant function (the within-class covariance matrices are assumed to be unequal). Our initial chemometric analysis of the FDA data set demonstrated that the distribution of the descriptor values did not follow the Gaussian law, as indicated by normal distribution hypothesis testing at the 0.01 significance level. When the distribution is not assumed to follow a particular law or is assumed to be other than the multivariate normal distribution, nonparametric methods can be used to derive classification criteria. The nonparametric methods available within MDL®QSAR include the kernel and k-nearest-neighbor (kNN) methods. The main types of kernels implemented in MDL®QSAR include uniform, normal, Epanechnikov, biweight, or tri-weight kernels, which are used to estimate the group specific density at each [...]

In general, either Mahalanobis or Euclidean distances can be used to determine proximity between compound-vectors in multidimensional descriptor space. When the k-nearest-neighbor method is used, the Mahalanobis distances are based on the pooled covariance matrix. When a kernel method is used, the Mahalanobis distances are based on either the individual within-group covariance matrices or the pooled covariance matrix. Either the full covariance matrix or the diagonal matrix of variances can be used to calculate the Mahalanobis distances. In our studies, various combinations of model building preferences have been explored to achieve a model with the highest accuracy. The performance of each candidate model was assessed by making use of the prediction error rate in the training set (i.e., probabilities of misclassification). For each model, the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) rates were studied both for re-substitution analysis and for leave-one-out cross-validation. The computation of each model studied took several seconds on a 2.8 GHz Pentium 4 processor with 1 GB of memory.

IV. COMPUTATIONAL STUDIES

The task of finding the best model falls into two interconnected parts: the search for the best subset of descriptors and the selection of the type and optimal parameters of the model. Both these subtasks admit no formal algorithmic solution and require some experimentation to achieve the best solution. In our studies, more than 3000 discriminant analysis models were built in total, using different criteria for descriptor reduction and various parameters of discriminant analysis as described in the Methods Section. The best model included 53 variables (Table 1) and was characterized by the following parameters: normal kernel; smoothing parameter of 2; Mahalanobis distance to determine proximity; distance calculations based on the full individual within-group covariance matrices; [...]

The correct classification rates for the best discriminant model, which is supplied with MDL®QSAR, are shown in Table 2. Total accuracy of the model, both for C re-substitution and LOO cross-validation analyses, is shown as well as separate data for the test set prediction of carcinogenic (high risk carcinogens) as well as non-carcinogenic (low risk carcinogen) compounds.

V. DISCUSSION

Risk Mitigation

The modeling approach in this study was chosen specifically for a risk analysis approach. Unexpected results in long-term carcinogenicity bioassays on a new drug candidate can be extremely costly in time, money, and market viability. With today’s technology, applicants must carry the risk of long-term carcinogenicity well into phase 2-3 clinical development with little or no mitigation. At this stage of development, even a single failure can result in a huge loss: the late-stage non-approval of a drug can mean the loss of $700 million or more [20], and represent the loss of six to eight years of development effort.

The chosen modeling paradigm, in its selection of chemical structure variable type, its identification of a real world risk threshold in endpoint definition, and its straightforward statistical method, can be called ‘actuarial’ in approach. This allows the user of the model to view various confidence and applicability measures and restrict acceptable ranges as desired. Two such calculated metrics are the Distance and the Probability of Membership in Class.
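The nonparametric rule selected in Computational Studies (normal kernel, smoothing parameter of 2, Mahalanobis distance from the full within-group covariance matrices) can be illustrated with a minimal sketch. This is not the MDL®QSAR implementation: the function names, the equal prior probabilities, and the synthetic three-descriptor data are assumptions made for illustration only, and the density formula is the usual textbook normal-kernel estimate.

```python
import numpy as np


def group_density(x, group, r=2.0):
    """Normal-kernel density estimate of one class at point x.

    Proximity is the squared Mahalanobis distance computed with the
    class's own full covariance matrix; r is the smoothing parameter.
    """
    n, p = group.shape
    cov = np.atleast_2d(np.cov(group, rowvar=False))
    cov_inv = np.linalg.inv(cov)
    diffs = group - x
    # Squared Mahalanobis distance from x to every training compound.
    d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)
    norm = (2.0 * np.pi) ** (p / 2) * r ** p * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * d2 / r ** 2).sum() / (n * norm)


def probability_in_class(x, groups, priors=None, r=2.0):
    """Posterior probability of membership in each class (Bayes rule)."""
    dens = np.array([group_density(x, g, r) for g in groups])
    if priors is None:
        # Assumption: equal priors; the paper does not state the priors used.
        priors = np.full(len(groups), 1.0 / len(groups))
    weighted = priors * dens
    return weighted / weighted.sum()


# Synthetic stand-ins for the "high risk" and "low risk" training sets.
rng = np.random.default_rng(0)
high = rng.normal(0.0, 1.0, size=(40, 3))
low = rng.normal(4.0, 1.0, size=(40, 3))

post = probability_in_class(np.zeros(3), [high, low])
print(post)  # posterior for [high, low] at a point near the "high" cluster
```

A compound is assigned to the class with the larger posterior. For a query far from both training sets the kernel densities can underflow to zero, which is one reason a separate distance check on applicability is useful alongside the probability.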
Table 1. Descriptors selected in the best model for the prediction of carcinogenicity. The descriptors appear in groupings that relate to their structure information content. A ranking is also given, based on descending order of F-value for inclusion of the descriptor in the model. A brief definition is given for each descriptor along with an illustration for a selected few descriptors. For more specific information on structure interpretation see the appropriate references [6-11,17d].

Low Order Chi
Description: Encode degree of skeletal branching and molecular size. Low order indices x0 and x1 increase with molecular size and decrease with increased branching. Chi 2 (x2) shows the greatest sensitivity to differences in branching and increases with branching. Valence indices add information about heteroatoms and valence state. Simple illustration for x2 given below.
  Simple chi 0 index decreases minimally with increased branching, insensitive to adjacency.
  Simple chi 1 index encodes degree of branching and decreases with increased branching.
  Simple chi 2 index gives high sensitivity and increases with increasing branching.
  Valence chi 0 index is highly intercorrelated with molecular surface area and volume.
  Valence chi 1 is similar to x1 but also includes heteroatom and valence state information.
  Valence chi 2 index includes heteroatom and valence state information with high sensitivity.

Path Chi Indices
Description: Encode complexities and specifics of overall skeletal variation, including degree of branching and molecular size. Each higher order path index encodes different aspects of skeletal variation. Valence indices add information about heteroatoms and valence state. A simple illustration for xp3 is given below.
  Simple chi path 3 is sensitive to adjacent branch points in the molecular skeleton.
  Simple chi path 4 is sensitive to branch points separated by one atom in the skeleton.
  Simple chi path 5-8 encode specific skeletal information to discriminate among skeletal classes.
  Valence chi path 3 is similar to xp3 with additional heteroatom and valence state information.
  Valence chi path 4 is similar to xp4 with additional heteroatom and valence state information.
  Valence chi path 5 is similar to xp5 with additional heteroatom and valence state information.

Cluster & Path/Cluster Chi Indices
Description: Encode structure information specifically based on a branch point, emphasizing the immediate branch point environment. Simple illustration for xvpc4 given below.
  Simple chi cluster 3 index is defined for a single branch point and encodes the number and branching [...]
  Simple chi path-cluster 4 index is defined for the isobutane skeleton and is especially sensitive to adjacency of skeletal branch points.
  Valence path-cluster 4 index encodes information similar to xpc4 but with heteroatom and valence [...]
  Knotp gives the difference between chi cluster-3 and chi path/cluster-4 descriptor. Knotp is largest where an xc3 subgraph is not associated with an xpc4 subgraph. Each path cluster-4 (xpc4) subgraph contains a cluster-3 (xc3) subgraph and one additional atom. Each xc3 subgraph may be associated with up to three of these additional atoms and thus be contained within up to 3 xpc4 subgraphs, as shown in the table below. The knotp descriptor helps to separate this overlapping structure information into distinct numerical values.

Atom Type
Description: Encode for a specified atom type a combination of electron accessibility, presence/absence, and the count of the atom type in the molecule based on the electrotopological state indices. The atom type E-State has been shown to be very useful for similarity analysis and structure classification and modeling of various properties [6,10,11].
  Sum of the atom level E-state values for all non-substituted [...]
  Sum of the atom level E-state values for all the substituted [...]
  Sum of the atom level E-state values for all the methylenes [...]
  Sum of the atom level E-state values for all the carbon atoms in methyl groups in the molecule.
  Sum of the hydrogen atom level E-state values for all the [...]
  Sum of the atom level E-state values for all the oxygen [...]
  Sum of the atom level E-state values for all the ether oxygen [...]
  Sum of atom level E-state values in molecule [...]
  Sum of the atom level E-state values for all the nitrogen [...]
  Sum of the atom level E-state values for all the chlorine [...]
  Sum of the atom level E-state values for all the double bonded oxygen atoms in the molecule.

Atom Type Count
  Number (count) of all non-substituted aromatic carbon atoms in the molecule.
  Number (count) of methylene groups in the molecule.
  Number (count) of substituted aromatic carbon atoms in the molecule.
  Number (count) of =C< groups in the molecule.
  Number (count) of methyl groups in the molecule.
  Number (count) of double bonded oxygen atoms in the molecule.

Internal Hydrogen Bonding E-state
Description: The largest single product of E-state and H E-state values from all acceptor and donor pairs separated by 4 skeletal bonds and not part of a rigid skeletal structure.
  Donor acceptor pair do not form an internal hydrogen bond.
  This group is associated with acids, amides, etc.
  Forms 5-membered ring for potential internal H bond.
  Forms 6-membered ring for potential internal H bond.

Molecular Properties
Description: A group of structure descriptors that encode a general aspect of structure information for the whole [...]
  A whole molecule polarity index that decreases in value as the polarity increases and more sensitive to [...]
  Number of graph vertices (non-hydrogen atoms) in the molecule.
  Sum of the Hydrogen E-state values for hydrogens on carbon atoms.
  The maximum hydrogen atom level E-state value in a molecule.
  The maximum positive hydrogen atom level E-state value in a molecule.
  Number (count) of (independent) rings in the molecule.
  A kappa shape molecular flexibility descriptor that increases with homologation and decreases with increased branching or cyclicity. Larger Phia values indicate greater molecular flexibility.
  The maximum atom level E-state value in a molecule.
  Sum of the hydrogen atom level E-state values for all hydrogens bonded to donating atoms.
  Number (count) of hydrogen bond donors in the molecule.
  Number (count) of hydrogen bond acceptors in the molecule.
  Number (count) of chemical elements in the molecule.
  Number (count) of graph circuits in the molecule.
  A whole molecule E-state polarity index that decreases in value as the polarity increases.
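The branching behavior that Table 1 attributes to the simple chi 1 and chi 2 indices can be checked on two small skeletons. The sketch below uses the standard Kier-Hall definition (each subgraph contributes the reciprocal square root of the product of its atoms' delta values, where delta is the number of non-hydrogen neighbors). The function name and the bond-list input format are illustrative assumptions; this is not the Molconn-Z code, and valence indices are not covered.

```python
from itertools import combinations
from math import sqrt


def simple_chi_1_2(bonds):
    """Return (chi1, chi2) for a hydrogen-suppressed molecular graph.

    bonds: list of (atom, atom) pairs, each bond listed once.
    """
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # Simple delta value: number of non-hydrogen neighbors of each atom.
    delta = {atom: len(nbrs) for atom, nbrs in neighbors.items()}

    # chi 1: one term per bond (path of length 1).
    chi1 = sum(1.0 / sqrt(delta[a] * delta[b]) for a, b in bonds)

    # chi 2: one term per two-bond path a-mid-b.
    chi2 = 0.0
    for mid, nbrs in neighbors.items():
        for a, b in combinations(sorted(nbrs), 2):
            chi2 += 1.0 / sqrt(delta[a] * delta[mid] * delta[b])
    return chi1, chi2


n_butane = [(0, 1), (1, 2), (2, 3)]   # unbranched C4 chain
isobutane = [(0, 1), (0, 2), (0, 3)]  # one branch point (atom 0)

chi1_n, chi2_n = simple_chi_1_2(n_butane)
chi1_i, chi2_i = simple_chi_1_2(isobutane)
print(f"n-butane:  chi1={chi1_n:.3f} chi2={chi2_n:.3f}")
print(f"isobutane: chi1={chi1_i:.3f} chi2={chi2_i:.3f}")
```

For these two isomers chi 1 is lower and chi 2 is higher for the branched skeleton, in line with the descriptions given in the table.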
Table 2. Carcinogenic Risk Prediction Accuracy
Training set, 1022 compounds
Test set, 50 compounds
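As a worked check of the test-set statistics, the rates can be recomputed from a confusion matrix. The counts below are not taken from Table 2; they are back-calculated, as an assumption, from the reported 76% and 84% correct classification rates and the 25 "High Risk" / 25 "Low Risk" composition of the test set. The function name is illustrative.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, concordance) from confusion counts."""
    sensitivity = tp / (tp + fn)   # high risk compounds correctly predicted
    specificity = tn / (tn + fp)   # low risk compounds correctly predicted
    concordance = (tp + tn) / (tp + fn + tn + fp)  # overall correct rate
    return sensitivity, specificity, concordance


# Assumed counts: 19 of 25 high risk and 21 of 25 low risk predicted correctly.
sens, spec, conc = classification_rates(tp=19, fn=6, tn=21, fp=4)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} concordance={conc:.0%}")
```

With these counts the overall rate is 40 of 50 compounds, which agrees with the 80% concordance quoted in the text.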
The calculated Distance shows whether the subject compound vector is adequately represented within the historic variance of chemical structure descriptors. The Probability of Membership in Class is a measure of how well the historic knowledge is able to discriminate high risk compounds from low risk compounds within the nearest space of the subject [...]

Probability of Membership in Class

The results for prediction are given in Table 3 along with the original rodent data. The prediction rates are 84% correct for low risk and 76% for high risk with an overall rate of 80% correct. Incorrect predictions are marked in bold. In this present study, we found that by placing limits on probability in class we could trade overall coverage for accuracy. By placing a minimum of 60% Probability-in-Class on our 50 compound test set, overall coverage was reduced to 76%, Sensitivity rose from 76% to 87%, Specificity remained essentially constant at 84% (83%), and Concordance improved from 80% to 84% (See Table 4). By placing a minimum of 65% Probability in Class, Coverage was 70%, Sensitivity 93%, Specificity 86% and Concordance 89% (See Table 5). Exercising this option allows flexibility in how the model is employed, perhaps allowing a wider range of acceptable probability in screening large compound libraries to glean general characteristics, while restricting this range when assessing safety risks in lead compounds for [...]

Distance Measure

MDL®QSAR evaluates two quantitative measures of applicability of data models to new observations: 1. regression
Table 3. List of compounds used in the validation test set, along with the rodent data. Compounds showing only a number for
“Name” are currently confidential at the FDA because they are under regulatory review. Predicted values are shown in bold for
incorrect predictions. Predictions shown here are made without regard to probability level. See Tables 4 and 5 for tabulations based
on selected probability ranges (see text for details). See Experimental Data and Methods in text, pages 5-8, for a detailed description of
the data fields in Tables 3, 4 & 5.

Columns: Female Rat, Male Mouse, Female Mouse, Predicted (‘HIGH’ or ‘LOW’)
Table 4. List of test compounds together with the posterior probability for classification based on a 60% probability-in-class dividing
line (See text). Compounds showing only a number for “molecule” are currently confidential at the FDA because they are under
regulatory review. Compounds ‘not covered’ using this threshold are outlined in grey.

Posterior Probability of Membership in Class
Molecule
Experimental
Predicted
60% Minimum Probability
Table 5. List of test compounds together with the posterior probability for classification based on a 65% probability-in-class dividing
line (See text). Compounds showing only a number for “molecule” are currently confidential at the FDA because they are under
regulatory review. Compounds ‘not covered’ using this threshold are outlined in grey.

Posterior Probability of Membership in Class
Molecule
Experimental
Predicted
65% Minimum Probability
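The coverage, sensitivity, specificity and concordance figures reported for the 60% and 65% probability-in-class thresholds can be illustrated with a minimal Python sketch. The `Prediction` record, the `summarize` function and the inclusive `>=` cutoff are assumptions made for illustration, not the MDL®QSAR implementation. The paper's per-compound values are not reproduced here, so any input data would have to be supplied by the reader.

```python
import math
from dataclasses import dataclass


@dataclass
class Prediction:
    experimental: str   # observed class, "HIGH" or "LOW"
    predicted: str      # class assigned by the model
    probability: float  # posterior probability of membership in the predicted class


def _ratio(numerator, denominator):
    # Return NaN rather than raising when a class has no covered compounds.
    return numerator / denominator if denominator else math.nan


def summarize(predictions, min_probability=0.0):
    """Compute coverage and accuracy statistics for a probability cutoff.

    Compounds whose posterior probability falls below min_probability are
    treated as 'not covered' and excluded from the accuracy statistics.
    """
    covered = [p for p in predictions if p.probability >= min_probability]
    tp = sum(p.experimental == "HIGH" and p.predicted == "HIGH" for p in covered)
    fn = sum(p.experimental == "HIGH" and p.predicted == "LOW" for p in covered)
    tn = sum(p.experimental == "LOW" and p.predicted == "LOW" for p in covered)
    fp = sum(p.experimental == "LOW" and p.predicted == "HIGH" for p in covered)
    return {
        "coverage": _ratio(len(covered), len(predictions)),
        "sensitivity": _ratio(tp, tp + fn),      # correct among high risk
        "specificity": _ratio(tn, tn + fp),      # correct among low risk
        "concordance": _ratio(tp + tn, len(covered)),
    }
```

Calling `summarize(predictions, 0.60)` and `summarize(predictions, 0.65)` on the same list shows how raising the cutoff reduces coverage while restricting the accuracy statistics to the more confidently classified compounds.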
models and 2. discriminant analysis models. These are based on the following simple assumption: each constructed model has a certain applicability region in the space of independent variables. Specifically, if a molecule lies in the observation space “far” from the set used to build the model, the prediction for that object should be treated with caution, with a lesser degree of confidence than when the model was built from objects found “nearer” to the observation.

Let X be an n × p matrix of data, with columns being n-dimensional vectors of variables X_i and rows being p-dimensional vectors of observations x_i. Let m_X denote the vector of column means of X, and let

$$\tilde{X} = \begin{pmatrix} x_1 - m_X \\ x_2 - m_X \\ \vdots \\ x_n - m_X \end{pmatrix}$$

be a centered matrix of observations, and

$$A = \frac{1}{n - 1}\,\tilde{X}^{T}\tilde{X}$$

be a covariance matrix. A reasonable measure of proximity of a molecule to the training set in the observation space is the Mahalanobis distance, evaluated for a new observation x:

$$M(x) = (x - m_X)\,A^{-1}\,(x - m_X)^{T}$$

and its special case (at A = E), the common Euclidean distance D:

$$D(x) = (x - m_X)\,(x - m_X)^{T}$$

Having chosen the distance in the observation space, one should also decide which distances are to be considered “large” or “small”. In the case of regression models, it is natural to use an obvious analogy between outliers in the training set and “far-flung” observations. First note that the sum of Mahalanobis distances for all observations from the training set is p(n − 1). Consider the quantity

$$d = \frac{M(x)}{n - 1}$$

which is referred to as the centered leverage value and, for observations from the training set, lies between 0 and 1. Based on the rules for separation of outliers in the observation space, which are recommended in regression analysis, the degree of applicability of a regression model to an object that is not a member of the training set is evaluated as follows. If d exceeds p/n, its average across the training set, more than twofold, one should treat the prediction for such a case [...] If d > 0.5, the degree of applicability of a model to an object is taken to be very low; if 0.2 < d ≤ 0.5, low; and if d ≤ 0.2, we consider it optimal to use the model.

For discriminant analysis models, such as the model contained in the MDL® Carcinogenicity Module, similar methods for separation of outliers are not used. In order to partition distances into “large” and “small”, an approach to data standardization is applied that is traditional for statistics. Suppose that distance D (Euclidean, see definition above) is normally distributed across the general population of data.
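The regression-model applicability measure described earlier in this section can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the Mahalanobis distance is taken as the quadratic form (x − m_X) A⁻¹ (x − m_X)ᵀ, and the centered leverage is taken as that form divided by n − 1, which is consistent with the stated training-set sum p(n − 1) and average p/n. It is not the MDL®QSAR code.

```python
def mean_vector(X):
    """Column means of a list-of-rows data matrix."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def covariance(X, m):
    """Sample covariance matrix A = (1/(n-1)) * Xc^T Xc of the centered data."""
    n, p = len(X), len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in X) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    p = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("singular covariance matrix")
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def mahalanobis(x, m, A):
    """Quadratic form (x - m) A^{-1} (x - m)^T."""
    diff = [xi - mi for xi, mi in zip(x, m)]
    z = solve(A, diff)
    return sum(di * zi for di, zi in zip(diff, z))


def centered_leverage(x, X):
    """Assumed definition: d = Mahalanobis form / (n - 1)."""
    m = mean_vector(X)
    return mahalanobis(x, m, covariance(X, m)) / (len(X) - 1)


def applicability(d):
    """Map d onto the three categories given in the text."""
    if d > 0.5:
        return "very low"
    if d > 0.2:
        return "low"
    return "optimal"
```

For a training matrix `X`, summing `centered_leverage(row, X)` over the training rows should give p, so the average is p/n as the text states. A new compound's descriptor vector `x` is then classified with `applicability(centered_leverage(x, X))`.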
Then, the standardized distance d1 (the difference between the distance and its sample mean over the training set)
divided by the sample estimate of mean-square deviation, will approximately follow a normal distribution with parameters 0, 1. Thus probabilities for d1 to fall into one or another interval of the real axis are found from the tabulated values of the Laplace function. For example, consider points x0 = 1.65 and x1 = 2.33 marked on the axis, such that the probability to fall to the left of them is, respectively, 0.95 and 0.99. If a new observation produces value d1 ≥ x1, the degree of applicability of a model to such an object may be taken to be very low; if x0 ≤ d1 < x1, low.

VI. CONCLUSIONS

The FDA/CDER rodent carcinogenic database provides a sound basis for development of a model for the prediction of carcinogenicity risk. The combination of the experimental data and the experience in the FDA provided the basis for qualifying 1072 compounds for the database. The MDL®QSAR software provided a useful basis for calculating the molecular descriptors and performing the discriminant analysis to establish a classification algorithm. The topological structure descriptors included electrotopological state (E-State) descriptors and molecular connectivity chi indices that have been shown to provide a sound basis for classifying molecular structures. A nonparametric method was used to obtain the final model based on the normal kernel method. Descriptors were selected by examining models with varying numbers of descriptors and deciding upon the model with the best classification statistics on the training set. The discriminant model presented here demonstrates good prediction statistics on the external validation test set of fifty compounds, with sensitivity of 76% and specificity of 84% in addition to concordance of 80%. This test set includes 38 pharmaceuticals and 12 industrial chemicals. Nine of the pharmaceuticals are newer compounds that are still under regulatory review and confidential. Twenty-five of the test set compounds are considered high risk. Based on these results the model appears useful as an indicator of potential carcinogenicity risk for candidate molecules in a design process and for [...]

Data transformation is an essential component in the QSAR modeling of carcinogenicity from rodent bioassay. This process converts tumor incidence findings into weighted numerical form, with the highest score given to multi-site and trans-species tumor findings. This simulates aspects of the weight of evidence process used in regulatory risk analysis. Tumor site is not considered in this modeling process because there is poor tumor site concordance between rats and mice, making it a poor factor for QSAR modeling [21].

Converting molecular structure into electrotopological state (E-State) descriptors and molecular connectivity chi indices also provides a means for modeling proprietary molecular structures that does not disclose the exact structure and identity of proprietary molecules. In this report proprietary compounds were included in the training data set and in the 50 test compounds. The names and structures of proprietary compounds were encoded and kept confidential by the FDA. The proprietary structure information descriptors contained sufficient information for successful modeling but are insufficient to unambiguously recreate a molecular structure, making this a valuable tool for data sharing while preserving confidentiality.

Computational toxicology combines databases and chemoinformatic data mining techniques with statistical methods to identify relationships between chemical structures and toxicological activities. Computational or predictive toxicology software programs are a means of evaluating knowledge accumulated from decades of toxicology studies to provide effective regulatory and product development decision support information. This approach is especially useful for prioritizing potential hazard and identifying data gaps in situations where toxicological data is limited, e.g., indirect food additives or contaminants and degradants/contaminants in the pharmaceutical manufacturing process. In drug development, the application of combinatorial chemistry and high throughput screening has resulted in an unprecedented increase in the number of compounds identified with potentially desirable pharmacological properties. The selection of lead compounds for development is currently hampered by limitations in the available toxicity screening methods.

Making better use of accumulated scientific knowledge incorporated in predictive software is one way to minimize toxicity-related drug failures and improve pharmaceutical risk management. Identifying serious potential toxicity early in the drug development process, before significant investments in time and resources are expended, is a major goal for FDA/CDER and the pharmaceutical industry. A current cause for concern is that too many drugs are failing late in the development process in phase III, either for lack of efficacy or toxicity. It is estimated that 20% of total R&D costs per drug are spent on compounds that ultimately fail due to unfavorable ADME/Toxicity. The selection of drug candidates with better safety profiles will also reduce the regulatory review burden and speed the approval process by reducing the number of drugs submitted with serious safety issues that necessitate multiple review cycles or result in termination. Review resources expended for drugs that never make it to market are lost and could better be used for [...]

Computational or predictive toxicology has potential regulatory and drug development applications that can ultimately benefit the public health as well as refine and reduce the use of animals in the assessment of safety.

ACKNOWLEDGEMENT

We wish to express our appreciation to Vladimir Shwartz, University of St. Petersburg, St. Petersburg, Russia, for his assistance with the discriminant analysis and related [...]

REFERENCES

Willett P.: Three-Dimensional Chemical Structure Handling; John Wiley & Sons: New York, (1991).
Lajiness M.S.: Molecular Similarity-Based Methods for Selecting Compounds for Screening, In Computational Chemical Graph Theory, Rouvray,
D.H. Ed.; Nova Science: New York, pp. 300-312,
Johnson M., Maggiora G.M.: Concepts and Applications of Molecular Similarity: John Wiley & Sons: New York, (1990).
Willett P.: Similarity and Clustering in Chemical Information Systems, John Wiley & Sons: New York, (1987).
Warr W.: Chemical Structures. The International Language of Chemistry, Springer: Berlin, (1988).
Hall L.H., Kier L.B.: Electrotopological state indices for atom types: A novel combination of electronic, topological and valence state information, J. Chem. Inf. Comput. Sci. 35, 1039-1045, (1995).
Kier L.B., Hall L.H.: Molecular Structure Description: The Electrotopological State: Academic
Hall L.H., Kier L.B.: Molecular Connectivity Indices for Database Analysis and Structure-Property Modeling, in Topological Indices and Related Descriptors in QSAR and QSPR, Devillers, J. and Balaban, A. T. Eds.; pp. 307-360, (1999).
Hall L.H., Kier L.B.: Issues in the representation of molecular structure: The development of molecular connectivity. J. Molecular. Model. Graphics 20, 4-18,
Kier L.B., Hall L.H.: Database organization and similarity searching with E-State indices. SAR QSAR
Hall L.H.: A structure-information approach to prediction of biological activities and properties. Chem. Biodiversity 1, 183-201, (2004).
Contrera J.F., Matthews E.J., Benz R.D.: Predicting the carcinogenic potential of pharmaceuticals in rodents using molecular structural similarity and E-state indices. Regul. Toxicol. Pharm. 38, 243-259,
Gold L.S., Manley N., Slone T., Garfinkel G., Rohrback L., Ames B.N.: The fifth plot of the carcinogenic potency database: Results of animal bioassays published in the general literature through 1988, by the National Toxicology Program through 1989. Environ. Health Perspect. 100, 65-135,
Haseman J.K.: A re-examination of false-positive rates for carcinogenicity studies. Fundam. Appl.
McConnell E.E., Solleveld H.A., Swenberg J.A., Boorman G.A.: Guidelines for combining neoplasms for evaluation of rodent carcinogenesis studies. JNCI,
Matthews E.J., Contrera J.F.: A new highly specific method for predicting the carcinogenic potential of pharmaceuticals in rodents using enhanced MCASE QSAR-ES software. Regul. Toxicol. Pharmacol. 28,
a) MDL Information Systems, 200 Wheeler Road,
b) Kellogg, G. E.; Hall, L. H.; Molconn-Z. See
c) For a list of publications illustrating applications of the descriptors in Molconn-Z, see http://www.eslc.vabiotech.com/molconn/mconpubs.ht
d) See MDL®QSAR Users Guide for specific illustration of topological descriptors.
Anderson T.W.: An Introduction to Multivariate Statistical Analysis, Second Edition, John Wiley &
Kendall M.G., Stuart A., Ord J.K.: The Advanced Theory of Statistics, Macmillan Publishing: New York, Vol. 3, Fourth Edition, (1983).
Grabowski H., Vernon J., DiMasi J.: Returns on research and development for 1990s new drug introductions. Pharmacoeconomics 20, (Suppl. 3), 11-
Contrera J.F., Jacobs A.C., DeGeorge J.J.: Carcinogenicity testing and the evaluation of regulatory requirements for pharmaceuticals. Regul. Toxicol. Pharmacol. 25, 130-145, (1997).

Source: http://akossamples.de/pdf/QSAR_contrera-cddt.pdf
