Comparative and Functional Genomics
Comp Funct Genom 2004; 5: 304–327.
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.411
Research Article

Comparative genomic assessment of novel broad-spectrum targets for antibacterial drugs

Thomas A. White1 and Douglas B. Kell2*
1Department of Biology, University of York, Box 373, Heslington, York YO10 5YW, UK
2Department of Chemistry, UMIST, Faraday Building, Sackville St, PO Box 88, Manchester M60 1QD, UK
*Correspondence to: Douglas B. Kell, Department of Chemistry, UMIST, Faraday Building, Sackville Street, PO Box 88, Manchester, M60 1QD, UK. E-mail: dbk@umist.ac.uk

Received: 24 November 2003
Revised: 24 March 2004
Accepted: 1 April 2004

Abstract
Single and multiple resistance to antibacterial drugs currently in use is spreading, since they act against only a very small number of molecular targets; finding novel targets for anti-infectives is therefore of great importance. All protein sequences from three pathogens (Staphylococcus aureus, Mycobacterium tuberculosis and Escherichia coli O157:H7 EDL993) were assessed via comparative genomics methods for their suitability as antibacterial targets according to a number of criteria, including the essentiality of the protein, its level of sequence conservation, and its distribution in pathogens, bacteria and eukaryotes (especially humans). Each protein was scored and ranked based on weighted variants of these criteria in order to prioritize proteins as potential novel broad-spectrum targets for antibacterial drugs. A number of proteins proved to score highly in all three species and were robust to variations in the scoring system used. Sensitivity analysis indicated the quantitative contribution of each metric to the overall score. After further analysis of these targets, tRNA methyltransferase (trmD) and translation initiation factor IF-1 (infA) emerged as potential and novel antimicrobial targets very worthy of further investigation. The scoring strategy used might be of value in other areas of post-genomic drug discovery. Copyright © 2004 John Wiley & Sons, Ltd.

Keywords: genomics; antibacterial; antimicrobial; pathogen; virulence; comparative genomics; antibiotics; bioinformatics

Introduction

Within two decades of the introduction of penicillin, the majority of the existing classes of antibacterial drugs had been discovered by systematic screening of natural product libraries. Remarkably, no new chemical classes of active antibacterial drugs were successfully introduced for a further 30 years (Hancock and Knowles, 1998). Table 1 shows the very restricted set of modes of action of the major antibacterial drugs currently in use.

Microorganisms have also shown themselves to be extremely versatile in overcoming the effects of antibacterial drugs. Bacteria have developed a variety of resistance mechanisms, and lateral gene transfer mechanisms allow this resistance to be passed between different bacterial strains and species (Davies, 1994; Heinemann, 1999). Antibacterial resistance has developed steadily as new agents have been introduced, and the past 10–15 years have shown a dramatic increase in the occurrence of resistant populations of microbes in both community and hospital environments (Struelens, 1998).

Measures such as chemical modification of existing antibacterial drugs and the development of inhibitors of resistance genes will have a significant impact on antibacterial therapy in the short term. However, it is obvious that new drug targets need to be found if the use of antibacterial
drugs is to continue successfully (Schmid, 1998). To this end, genomic approaches are providing a new strategy by revealing new molecular targets that are giving rise to novel antibacterial agents (Allsop and Illingworth, 2002; Dougherty et al., 2002; Haney et al., 2002; Isaacson, 2002; Ji, 2002; McDevitt and Rosenberg, 2001), as these new agents are unlikely to face the current problems of established mechanisms of resistance (McDevitt and Rosenberg, 2001). In anti-infective research, the inevitable selection for resistant strains means that drugs with multiple targets may be preferred (e.g. multiple penicillin-binding proteins or multiple forms of two-component systems; Stephenson and Hoch, 2002; Stephenson and Hoch, 2004). In other pharmaceutical areas it is encouraging that the rational utility of traditional targets is being confirmed by systematic knock-out studies (Zambrowicz and Sands, 2003).

Table 1. Mode of action of the principal established antibacterial drugs
Drug/class        Function inhibited              Molecular target
β-Lactams         Peptidoglycan synthesis         Transpeptidases and carboxypeptidases
Bacitracin        Peptidoglycan synthesis         Undecaprenyl pyrophosphate
D-Cycloserine     Peptidoglycan synthesis         D-alanine racemase and D-alanyl-D-alanine synthetase
Fosfomycin        Peptidoglycan synthesis         UDP-N-acetylglucosamine enolpyruvyl transferase
Glycopeptides     Peptidoglycan synthesis         Cell wall peptidoglycan
Quinolones        DNA replication/transcription   Gyrase and topoisomerase IV
Rifamycins        Transcription                   RNA polymerase
Aminoglycosides   Protein synthesis               30S ribosomal subunit
Chloramphenicol   Protein synthesis               50S ribosomal subunit
Fusidic acid      Protein synthesis               Elongation factor G
Macrolides        Protein synthesis               50S ribosomal subunit
Oxazolidinones    Protein synthesis               50S ribosomal subunit
Streptogramins    Protein synthesis               50S ribosomal subunit
Tetracyclines     Protein synthesis               30S ribosomal subunit
Mupirocin         Charging of isoleucyl tRNA      Isoleucyl tRNA synthetase
Sulphonamides     Folate synthesis                Dihydropteroate synthetase
Trimethoprim      Folate synthesis                Dihydrofolate reductase
After Chopra et al. (2002).

With the release of data from numerous sequencing projects, the number of potential drug targets has increased massively. However, not all of these molecules will become drug targets (Hopkins and Groom, 2002), and the big challenge is to select the targets most relevant for a given situation (Terstappen and Reggiani, 2001).

Machine learning methods seek to devise new ideas and hypotheses from more or less unstructured data (Gillies, 1996; Kell and Oliver, 2004; Mitchell, 1997; Mjolsness and DeCoste, 2001). It has been shown that such data-driven strategies can be used to identify novel drug targets (Spaltmann et al., 1999). A number of metrics are chosen which should be properties of a potential drug target, such as essentiality and specificity. Each potential target in a genome of interest is scored for these properties. These scores can be weighted differently to add more or less emphasis to any particular property. This scoring system can be tuned so that targets which have already been identified score highly, showing that the scoring system is capable of identifying useful targets. Previously unidentified genes may also score highly, and these can be prioritized as potential drug targets for further study. The top-scoring gene in the study carried out by Spaltmann et al. (1999) on antifungal targets was α,α-trehalose-phosphate synthase, a gene which had never before been suggested as a potential drug target. This shows that post-genomic research has much to offer in terms of novel target identification (Allsop and Illingworth, 2002; Buysse, 2001; Dougherty et al., 2002; Glass et al., 2002; Haney et al., 2002; Isaacson, 2002; Ji, 2002; Knowles and King, 1998; McDevitt and Rosenberg, 2001; Payne et al., 2001a, 2001b; Willins et al., 2002).

In the present study a number of criteria were chosen on which to characterize proteins as targets. These were suggested by the extensive literature on the subject (see e.g. Alksne, 2002; Allsop and Illingworth, 2002; McDevitt and Rosenberg, 2001;
Projan, 2002; Spaltmann et al., 1999; Terstappen and Reggiani, 2001). A full list of the criteria used is given in the Methods section.

Methods

Data collection and motives

Data were collected from three pathogenic bacterial species, Staphylococcus aureus, Escherichia coli O157:H7 EDL993 and Mycobacterium tuberculosis. These species were chosen as they represent a broad cross-section of bacterial types. Targets which prove to score well in these three species will probably be good targets across a broad spectrum of pathogens.

The entire set of sequences of proteins encoded by S. aureus, E. coli O157:H7 EDL993 and M. tuberculosis was downloaded from the NCBI website (http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/micr.html). Each protein was then characterized by a number of criteria which could then be used to prioritize the most suitable proteins as potential antibacterial targets.

A Perl program carried out most of the characterization automatically (see Figure 1 for an overview). Each protein was parsed to find the gene index (gi) number and name of the protein. If the function of the protein was known, or if a function had been assigned to the protein on the basis of sequence homology, then this was noted.

Each protein was then submitted to a BLAST (Altschul et al., 1990, 1997) search (BLASTp, using default parameters except for an ‘expectation value’ of 0.01) against a local copy of the SwissProt database (ftp://ftp.ebi.ac.uk/pub/). The SwissProt database was used because it is well curated, well annotated and non-redundant, and because its entries are easily parseable owing to their consistent format. There also exist a large number of associated files and websites which use SwissProt-style codes (for species and gene/protein names). Using SwissProt therefore allows these resources to be integrated easily into the program, thus making efficient automation possible.

The results of each BLAST search were parsed to find how many homologues of this protein existed in bacteria, pathogenic bacteria, eukaryotes, mice and Lactobacillus. This was done by comparing the SwissProt species ID code of each hit against a look-up table that listed the classification of the organism (http://ca.expasy.org/cgi-bin/speclist). A list of bacteria treated as pathogenic in this study is given in Table 2. Bacteria may or may not act as pathogens, depending on the circumstances and the host, and so the list given here covers a broad range of pathogens but is perhaps not completely comprehensive.

The presence of homologues in mice was considered important not only as this will allow targets which are present in higher organisms to be further down-weighted, but also because further down the line the target’s absence in mice will make animal trials more effective. Lactobacillus spp. are considered to be beneficial or probiotic bacteria, so using this metric might be able to prioritize targets which diminish any unwanted side-effects of a new drug.

The scores of BLAST hits against pathogens were also parsed to find how well conserved a particular gene is amongst pathogens. Obviously a protein that is well-conserved across many pathogens will make a better target for broad-spectrum antibacterial drugs. A high degree of conservation may also mean that mutations in the protein are not tolerated, such that resistance is less likely to emerge. The numbers of identical residues in each pathogenic hit compared to the query sequence were summed and then divided by the number of hits against pathogens. This number was normalized by dividing by the length of the query sequence, to give a ratio of conservation for this protein across pathogens.

The query protein was submitted to BLAST separately against the human genome (protein sequences) (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/protein/) and the number of hits was recorded. The closest hit against a human protein was also recorded, with a ratio of similarity given by the number of positive residue matches (matches where amino acids are identical or have similar biochemical properties) divided by the length of the query sequence. The number of positives was chosen so as to err on the side of caution. Any drug designed against a particular bacterial protein may act just as well against a human protein, even if certain key residues are not identical. Similarity of residues may be enough for activity. This metric was included so that potential targets which were not so similar to human proteins would not be so heavily penalized. Even if a human homologue does exist, it may still be possible (e.g. using
Flow chart boxes (left column, then right column):
- Read in genome from file and split into individual protein sequences (FASTA format).
- Take each protein sequence in turn and parse to find the gene name, gi number and whether a function has been assigned to the protein.
- BLAST each protein against the SwissProt database (expectation value 0.01).
- Parse the BLAST output from each protein to find the taxonomic distribution of the protein and its conservation in pathogens.
- BLAST each protein against all protein sequences from human (expectation value 0.01). Find the number of human homologues and the similarity of the closest human hit.
- For each query protein carry out a restricted BLAST search against essential genes in Bacillus subtilis, Escherichia coli K12, Mycobacterium tuberculosis and Staphylococcus aureus (expectation value 0.01).
- For each query protein carry out a restricted BLAST search against virulence genes in Bacillus anthracis, Escherichia coli O157:H7 EDL993, Mycobacterium tuberculosis, Neisseria meningitidis and Staphylococcus aureus (expectation value 0.01).
- For each query protein carry out another BLAST search of SwissProt (with a more stringent expectation value, 1 × 10^-10) and parse the output to find any hits which are known antibacterial targets or which have a structure in the Protein Data Bank.
- Output the information for each protein to file. This can then be used to score and rank proteins according to potential as novel broad-spectrum drug targets.
Figure 1. Flow chart illustrating the process of data collection
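
The per-protein metrics derived from the BLAST output (taxonomic distribution, conservation across pathogens and similarity to the closest human hit) can be illustrated with a minimal Python sketch. It is not the authors' Perl program. It assumes the BLAST output has already been parsed into simple records; the `Hit` record, the `CLASSIFICATION` look-up table and all function names are hypothetical, and treating the human hit with the most positives as the "closest" hit is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One parsed BLAST hit (hypothetical record, not a real parser's output)."""
    species_code: str  # SwissProt-style species code, e.g. "STAAU"
    identities: int    # identical residues in the alignment
    positives: int     # identical or biochemically similar residues


# Hypothetical look-up table: species code -> classification labels.
CLASSIFICATION = {
    "STAAU": {"bacteria", "pathogen"},
    "MYCTU": {"bacteria", "pathogen"},
    "BACSU": {"bacteria"},
    "HUMAN": {"eukaryote"},
    "MOUSE": {"eukaryote", "mouse"},
}


def taxonomic_counts(hits):
    """Count hits per classification label using the look-up table."""
    counts = {}
    for hit in hits:
        for label in CLASSIFICATION.get(hit.species_code, ()):
            counts[label] = counts.get(label, 0) + 1
    return counts


def conservation_ratio(hits, query_length):
    """Mean identical residues over pathogen hits, divided by query length."""
    pathogen_hits = [h for h in hits
                     if "pathogen" in CLASSIFICATION.get(h.species_code, ())]
    if not pathogen_hits or query_length <= 0:
        return 0.0
    mean_identities = sum(h.identities for h in pathogen_hits) / len(pathogen_hits)
    return mean_identities / query_length


def human_proximity(human_hits, query_length):
    """Positives of the closest human hit divided by query length.

    Assumption: the closest hit is the one with the most positives.
    """
    if not human_hits or query_length <= 0:
        return 0.0
    closest = max(human_hits, key=lambda h: h.positives)
    return closest.positives / query_length


if __name__ == "__main__":
    hits = [Hit("STAAU", 180, 200), Hit("MYCTU", 120, 150), Hit("BACSU", 100, 140)]
    print(taxonomic_counts(hits))
    print(conservation_ratio(hits, query_length=250))
    print(human_proximity([Hit("HUMAN", 40, 75)], query_length=250))
```

Keeping the look-up table separate from the metric functions mirrors the description in the text, where hits are classified by species code before any counting is done.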
structure–activity relationship studies) to design a drug which targets only the bacterial version of the protein.

The query gene was then again submitted to the BLAST program to find homologues which are known antibacterial targets or whose structures have been deciphered. This time an ‘expectation value’ of 1 × 10^-10 was used, as to infer suitability as a target or structural similarity it was thought safer to report only very close homologues.

After running the BLAST algorithm, the output was parsed to find whether the query gene was homologous to a known antibacterial target. This was done by comparing the SwissProt gene ID against a list of SwissProt IDs (from the ExPASy website: http://ca.expasy.org/enzyme/) of proteins
that are known antibacterial targets (Chittum and Champney, 1995; Egebjerg et al., 1989; Kornder, 2002; Lin et al., 1997; Neu and Gootz, 1996; Schnappinger and Hillen, 1996). Of course, not all current drug targets are perfect examples; indeed, many of the drugs that target them are toxic to humans and resistance has begun to emerge in many cases. Nevertheless, treatments which utilize these targets have been shown to be effective in disease control, and so novel targets possessing similar characteristics to known targets may be useful.

Table 2. List of bacteria treated as pathogenic in this study
Acinetobacter calcoaceticus     Klebsiella pneumoniae               Shigella dysenteriae
Bacillus anthracis              Legionella pneumophila              Shigella flexneri
Bacillus cereus                 Leptospira interrogans              Staphylococcus aureus
Bordetella pertussis            Listeria monocytogenes              Staphylococcus aureus strain Mu50/ATCC 700699
Borrelia burgdorferi            Moraxella catarrhalis               Staphylococcus aureus strain MW2
Brucella abortus                Moraxella lacunata                  Staphylococcus aureus strain N315
Brucella melitensis             Mycobacterium leprae                Staphylococcus capitis
Brucella suis                   Mycobacterium tuberculosis          Staphylococcus epidermidis
Campylobacter jejuni            Mycoplasma fermentans               Staphylococcus saprophyticus
Chlamydia muridarum             Mycoplasma genitalium               Streptococcus agalactiae
Chlamydia pneumoniae            Mycoplasma hominis                  Streptococcus agalactiae serotype III
Chlamydia trachomatis           Mycoplasma penetrans                Streptococcus agalactiae serotype V
Clostridium botulinum           Mycoplasma pneumoniae               Streptococcus mutans
Clostridium perfringens         Neisseria gonorrhoeae               Streptococcus pneumoniae
Clostridium tetani              Neisseria meningitidis              Streptococcus pyogenes
Corynebacterium diphtheriae     Neisseria meningitidis serogroup A  Streptococcus pyogenes serotype M18
Enterococcus faecalis           Neisseria meningitidis serogroup B  Streptococcus pyogenes serotype M3
Enterococcus faecium            Neisseria meningitidis serogroup C  Streptococcus pyogenes serotype M5
Escherichia coli O111:H−        Pasteurella multocida               Treponema pallidum
Escherichia coli O127:H6        Propionibacterium acnes             Tropheryma whipplei
Escherichia coli O157:H7        Proteus mirabilis                   Ureaplasma urealyticum
Escherichia coli O6             Providencia rettgeri                Vibrio cholerae
Flavobacterium meningosepticum  Providencia stuartii                Vibrio parahaemolyticus
Francisella tularensis          Pseudomonas aeruginosa              Vibrio vulnificus
Fusobacterium nucleatum         Rickettsia conorii                  Wolinella recta
Haemophilus ducreyi             Rickettsia prowazekii               Wolinella succinogenes
Haemophilus influenzae          Salmonella cholerae-suis            Xanthomonas maltophilia
Haemophilus parainfluenzae      Salmonella enteritidis              Yersinia pestis
Helicobacter pylori             Salmonella typhi
Helicobacter pylori J99         Salmonella typhimurium

The SwissProt species and protein ID codes of each hit in the BLAST results were compared to a look-up table (ftp://beta.rcsb.org/pub/pdb/uniformity/derived_data/) to find out whether any homologues of the query gene had an entry in the PDB database (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/protein/). A protein with a known structure is more attractive from the point of view of further research, as structure-based drug design can be carried out straightaway. A protein with sequence homology to a protein of known structure is likely to have a similar structure (although this is not always true) and so may be favoured as a potential novel drug target.

Each protein was then submitted to several more restricted BLAST searches against selected bacterial genomes. The BLAST searches were restricted by gi number; specifically the gi numbers of genes found to be essential or involved in virulence. The genomes chosen are listed in Table 3.

These genomes were selected as they cover a wide range of bacterial types, and also because they are well characterized and are amongst the few species for which this work has been carried out to any great extent. For those species for which this kind of work has not been done, genomics methods may allow us to predict essentiality or involvement in virulence. Proteins that have significant hits against essential genes or genes involved in virulence are likely to have the same characteristics themselves and so may score highly as potential
drug targets. The more ‘model’ genomes in which the gene is found to be essential, the more likely it is that this gene is indeed essential for the query species, and the greater its potential as a target for a broad-spectrum antibacterial drug.

Table 3. List of genomes used for restricted BLAST searches against essential genes or genes involved in virulence
Gene set         Genomes
Essential genes  Bacillus subtilis (Kobayashi et al., 2003)
                 Escherichia coli K12 (http://www.shigen.nig.ac.jp/ecoli/pec/About.html)
                 Mycobacterium tuberculosis (Sassetti et al., 2003)
                 Staphylococcus aureus (Forsyth et al., 2002)
Virulence genes  Bacillus anthracis (Hoffmaster and Koehler, 1999; Koehler, 2002)
                 Escherichia coli O157:H7 EDL993 (Brunder et al., 2001; Sharma and Dean-Nystrom, 2003; Stuber et al., 2003; Wang et al., 2002)
                 Mycobacterium tuberculosis (Triccas and Gicquel, 2000)
                 Neisseria meningitidis (Sun et al., 2000)
                 Staphylococcus aureus (Dunman et al., 2001)

Having assigned each gene in the query genome values for a number of characteristics, these values could then be weighted, summed and ranked to produce a list of high-priority potential targets. This ranking approach was used instead of a machine learning-based approach, as the ‘training set’ of known antibacterial targets is very small and not necessarily optimal (see Introduction). While the ranking approach is more subjective, it does allow targets to be prioritized which score better according to our metrics than currently known targets.

Assigning weights and the robustness of target prioritization

A number of different weighting schemes were tried so that the weighting scheme could be refined to reflect the relative importance of the various metrics. After a weighting scheme was run on the raw data, the scores for each metric could be summed and the total scores of the targets then ranked.

The refinement of the weightings was done by carrying out a sensitivity analysis on the metric scores for the top few ranking targets. Sensitivity analysis is more normally used in biology to find the importance, or so-called control coefficients (http://dbk.ch.umist.ac.uk/mca_home.htm; Fell, 1996; Heinrich and Schuster, 1996; Kell and Westerhoff, 1986), by which each enzyme controls the flux through a metabolic pathway, but can in fact be used to find the relative importance of any variable which contributes to a total. The sensitivity of overall metric A to individual metric v_i is given by equation (1):

\[ C_i^A = \frac{\partial A}{\partial v_i} \cdot \frac{v_i}{A} = \frac{\partial \ln A}{\partial \ln v_i} \tag{1} \]

Here a more discretized sensitivity analysis was done for each target by taking the score of each metric of the target, finding 1% of this score, dividing this number by the total score and multiplying by 100. When this is done for all metrics, these ‘contributions’ sum to 1. Thus, sensitivity analysis asks, ‘By altering the score of one variable by 1%, what percentage change would this induce in the total score?’. These sensitivity analyses could clearly show when some variables were exerting too much or too little influence on the total score and therefore the weights could be optimized accordingly. This novel approach proved very useful in carefully modifying the scoring systems.

Using different weighting schemes also allowed the analysis of how robust a particular high-ranking target was to the weighting scheme. Clearly, a target which scores highly due to having favourable characteristics in one highly weighted metric is less good than one which ranks highly under a number of different scoring systems.

For each of five different scoring systems (Table 4) used on S. aureus, E. coli O157:H7 EDL993 and M. tuberculosis the top 20 ranking targets were recorded. These top 20 lists could then be checked against each other to see whether robust targets had emerged. The top 20 lists were then cross-checked to see whether any targets were robust in all three species (see Table 5). This ‘voting’ approach can be seen as combining the output of several weak learners, which is known to be a very effective approach to data mining (Bauer and Kohavi, 1999; Dietterich, 2000; Hastie et al., 2001).

The first scoring system was designed to give most influence to those metrics which were felt to be the most important and least influence to those
Copyright  2004 John Wiley & Sons, Ltd.
Comp Funct Genom 2004; 5: 304–327.

310
T. A. White and D. B. Kell
n
o
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
5
as
as
as
r
vati
400
×
as
as
as
as
a
s
a
s
a
s
a
s
a
s
a
s
a
s
a
s
as
as
as
as
a
s
e
e
e
nse
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
o
t
i
o
a
m
ame
am
am
am
am
am
am
am
Sam
Sam
Sam
C
ra
Sam
Sam
Sam
Sam
S
S
S
S
S
S
S
S
Sam
Sam
Sam
Sam
S
o
o
o
o
n
n
n
n
i
f
i
f
i
f
i
f
0
0
0
0
4
1
1
1
1
1
1
1
1
1
1
1
s,
s,
s,
s,
1
1
1
1
1
1
a
s
a
s
a
s
a
s
a
s
a
s
a
s
a
s
a
s
as
as
ye
ye
ye
ye
as
a
s
a
s
a
s
a
s
as
e
e
e
e
e
e
e
e
e
i
f
i
f
i
f
i
f
e
e
e
e
e
e
am
am
am
am
am
am
am
am
me
am
am
am
am
S
S
S
S
S
S
S
S
Sa
Same
Sam
150
150
100
100
Sam
S
S
S
S
Sam
o
o
o
o
n
n
n
n
no
i
f
i
f
i
f
i
f
i
f
0
0
0
0
3
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
s,
s,
s,
s,
1
s
,
0
as
as
as
as
as
as
as
as
a
s
a
s
a
s
a
s
a
s
a
s
a
s
ye
ye
ye
ye
a
s
e
e
e
e
e
e
e
e
e
e
e
e
e
e
y
e
i
f
i
f
i
f
i
f
i
f
e
m
0
e
a
m
ame
am
am
am
am
am
am
Sam
Sam
Sam
Sam
Sam
Sam
Sam
Sam
S
S
S
S
S
S
S
5
100
100
100
100
S
t
argets
f
syst
o
i
n
ne
s
f
istinct
ge
ens
d
c
ent
gue
istinct
o
f
100
og
d
o
×
t
i
o
e
ranking
Scoring
of
l
o
f
ra
s
o
f
t
h
o
ith
o
yoti
pres
e
o
)
an
)
y
ous
)
pa
(No.
w
No.
1
1
m
1
if
o
o
o
o
o
o
o
o
o
o
o
o
t
he
n
n
n
n
n
n
n
n
n
n
n
n
in
No.


r
ati
ukar
+
+
f
+
0
n
copi
hom
No.
ith
]
n
e
hum
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
f
f
o
o
× w
o
nt,
0
0
0
0
0
0
0
0
0
0
0
0
of
of
× es
es
es
o
es
o
es
es
)
u
u
u
u
u
u
bacteria
.
.
(proximit
s
e
s,
s,
s,
s,
s,
s,
s,
s,
s,
s,
s,
o.
o.
r
vati
o
o
(No.
a
b
yes,
ye
ye
ye
ye
ye
ye
ye
ye
ye
ye
ye
eria
N
N


if
if
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
nse
0
s
ystem
ct
100/89)
o
100)
0
100/N
(
100/N
ba
homolog
[(
pathogens
homolog
distinct
homolog
pathogens)
C
100/(
homolog
100/(
homolog
100
×
100
homolog
1
100
100
100
100
100
100
100
100
100
100
100
100
scoring
i
n
istinct
ne
s
ens
f
d
t
he
istinct
o
f
200
ent
f
ge
og
d
o
×
c
t
i
o
e
o
gue
f
of
l
o
f
t
h
o
ith
o
ra
res
s
o
o
(No.
w
yoti
)
)
y
)
p
e
pa
No.
1
an
1
ous
1
i
f
o
o
o
o
o
o
o
o
o
nce
12

r
ati
m
No.
in
No.
ith
]

n
ukar
+
+
f
+
0
n
no
n
no
n
n
n
no
n
n
n
n
copi
hom
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
es
× w es
es
o
e
hum
f
es
f
es
o
es
of
× u
u
u
o
u
o
u
u
ent,
0
0
0
0
0
0
0
0
0
i
nflue
of
)
bacteria
r
vati
.
.
(proximit
s
,
0
s,
s
,
0
s,
s,
s,
s
,
0
s,
s,
s,
s,
o.
o.
o
o
(No.
eria
N
N

a
bs
yes,
y
e
ye
y
e
ye
ye
ye
y
e
ye
ye
ye
ye
the
ct
nse

i
f
if
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
i
f
o
100)
0
0
0
0
0
20/N
(
50/N
ba
homolog
[
(
500/89)
pathogens
homolog
distinct
homolog
pathogens)
C
50/(
homolog
50/(
homolog
100
×
1
homolog
1
50
2
50
5
20
20
10
2
20
20
20
10
test
to
is
is
used
2
2
1
1
K
t
uberculos

aureus
K
t
uberculos

aureus
c
oli

c
oli

erium
erium
meningitidis
systems
nthracis
s
ubtilis

a
hylococcus
hylococcus
cherichia
ap
cherichia
ap
Bacillus
Es
Mycobact
St
Bacillus
Es
Mycobact
Neisseria
St
s
coring
t
s
n
ge
y
f
erent
ns
t
ar
a
l
ntr
dif
gue
s
l
o
ues
nce
e
o
huma
nti
l
e
ve
s

n
athoge
s
t
sse
r
u
o
p
ogue
e
known
e
vi
PDB
hom
ol
ogue
omolog
r
uti
i
n
c
los
h
to
to
to
to
The
of
i
b
n
om
ol
.
be
o
h
t
o
4
on
i
s
tr
om
k
nown
d
c
h
omologues
h
c
illus
gous
gous
gous
gous
:
:
l
e

ic
num
s
r
vati
yoti
l
o
l
o
l
o
l
o
a
n
o
o
i
n
o
i
n
o
b
i
buti
py
c
i
e
nse
o
str
o
ctoba
ne
ne
Ta
Metr
C
Di
Spe
C
Eukar
Hum
Proximity
homologue
Mouse
La
Function
Hom
Hom
ge
Hom
ge
Hom
Copyright  2004 John Wiley & Sons, Ltd.
Comp Funct Genom 2004; 5: 304–327.

Assessment of novel antibacterial targets
311
Table 5. The overall top ten ranking targets
Rank
Gene name/description
Robustness
Total score
1
tRNA methyltransferase (trmD)
15
13 391
2
UDP-N-Acetylmuramate-L-alanine ligase (murC)
15
13 229
3
UDP-N-acetylglucosamine 1-carboxyvinyl transferase (murA)∗
13
13 059
4
Translation initiation factor IF-1 (infA)
14
13 019
5
DNA polymerase III, α chain (dnaE)
13
12 992
6
30S ribosomal protein S4 (rpsD)∗
11
12 779
7
UDP-N-acetylmuramoylalanine-D-glutamate ligase (murD)
11
12 766
8
50S ribosomal protein L10 (rplJ)
11
12 755
9
Chromosomal replication initiator protein (dnaA)
10
12 716
10
UDP-N-acetylmuramoylalanyl-D-glutamate-2,6-diaminopimelate ligase (murE)
9
12 573
These targets rank highly in all three species used and rank in the top 20 of most of the scoring systems used. Robustness is the number of times the gene ranks in the top 20 under five different scoring systems across the three species used, giving a maximum robustness score of 15. Total score is the sum of the scores for this target in all scoring systems across all species used. The maximum possible total score is 24 120.
∗ Indicates that the gene is a known target of an antibacterial drug (murA is targeted by fosfomycin and rpsD is a target of tetracyclines).
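The two summary columns of Table 5 can be illustrated with a short sketch. It assumes a hypothetical input layout in which each (species, scoring system) pair maps gene names to scores, and that the same gene name identifies the corresponding target in every species; neither detail is specified in the text, and the function name is illustrative only.

```python
from typing import Dict, Hashable, Tuple


def robustness_and_total(
    scores: Dict[Hashable, Dict[str, float]], top_n: int = 20
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Summarise per-gene scores over several rankings.

    `scores` maps a ranking key, e.g. (species, scoring_system), to a
    {gene: score} dictionary (hypothetical layout, not from the paper).
    Robustness counts how many rankings place the gene in the top `top_n`;
    the total is the sum of the gene's scores over all rankings.
    """
    robustness: Dict[str, int] = {}
    total: Dict[str, float] = {}
    for ranking in scores.values():
        # Genes ordered from highest to lowest score in this ranking.
        ordered = sorted(ranking, key=ranking.get, reverse=True)
        top = set(ordered[:top_n])
        for gene, value in ranking.items():
            total[gene] = total.get(gene, 0.0) + value
            robustness[gene] = robustness.get(gene, 0) + (1 if gene in top else 0)
    return robustness, total


if __name__ == "__main__":
    # Toy values, not taken from the paper.
    toy = {
        ("S. aureus", 1): {"trmD": 90, "murC": 80, "xyz": 10},
        ("S. aureus", 2): {"trmD": 70, "murC": 85, "xyz": 5},
        ("E. coli", 1): {"trmD": 95, "murC": 60, "xyz": 20},
    }
    print(robustness_and_total(toy, top_n=1))
```

With five scoring systems and three species there are 15 rankings, which is why the maximum robustness reported in the table is 15.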
felt to be least important. Homology to essential genes in M. tuberculosis and S. aureus, and homology to virulence genes in Bacillus anthracis were weighted lower than homology to essential and virulence genes in other organisms. This was done to reflect the quality of the data for these organisms, as different methods were used and lists of essential and virulence genes are not always complete.
Under the second scoring system all metrics were weighted equally, so that a maximum score for one metric would be the same as for another. For the other three scoring systems most of the metrics were weighted as under the first system. However, in the third scoring system homology to virulence genes was given greater influence, in the fourth homology to essential genes was given greater weight, and in the fifth the level of conservation of the target in pathogens was given more importance.
Further investigation of high-scoring targets
Having narrowed down the number of potential drug targets using the methods outlined above, the highest-scoring targets could then be investigated in greater detail. The top genes were again subjected to a BLAST search against the SwissProt database to determine in which pathogens they were present. The databases GenBank, EMBL and DDBJ were also searched via ENTREZ (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi) to find whether or not a copy of the query gene existed in a specific pathogen, in case this had been missed by searching only the SwissProt database. PROSITE (http://us.expasy.org/prosite/) was also searched to find any conserved motifs not identified by BLAST, which could be used to find more distantly related homologues of the query gene. This approach was able to identify any human sequences which, although not closely related in terms of sequence homology, could be very similar in terms of structure and biochemical properties to the query gene. Multiple sequence alignments and phylogenetic trees were created using ClustalX (Thompson et al., 1997) and Mega2.1 (Kumar et al., 2001). This was done to determine how distinct the genes in these pathogens were from those homologues in non-pathogens and eukaryotes, and just how well the ‘active sites’ of these genes were conserved across the different pathogenic species. The available literature was also searched to gain more insights into these suggested targets.
Results and discussion
Scores
According to the scoring systems used, the majority of genes in S. aureus, E. coli O157:H7 EDL993 and M. tuberculosis would make very poor antibacterial targets (see Figures 2–4). In all three bacteria there are also only a few high-scoring genes. The highest ranking of these seem to be fairly robust and tend to rank in the top 20, regardless of which scoring system is used (see Table 5).
It is also apparent there are no targets which are perfect in every way. To obtain a perfect score in the present metrics a target should be present
T. A. White and D. B. Kell
in one copy, be present in all pathogens but not in non-pathogens, eukaryotes or humans. It should be perfectly conserved across all pathogens. Its function should be known, it should be homologous to a known target, homologous to essential and virulence genes in all the model genomes used and its structure should be known. Here even the highest-ranking targets achieve only some 50% of the perfect score.
This is perhaps discouraging, as it means that there is little possibility for the development of a ‘magic bullet’ drug that is highly effective, specifically targets only pathogens, is easy to develop and is immune to the problems of emerging resistance. However, this never was a likely prospect.
The unusual peaks in the distribution graphs are due to genes of unknown function that, when submitted to BLAST with an expectation value of 0.01, did not return any hits. The two peaks in E. coli O157:H7 EDL993 target scores occur for the same reason, except that the peak at the higher score is due to genes that return no hits but have been assigned some sort of function, presumably by other methods.
Scoring systems and sensitivity analysis
The first scoring system used was designed to reflect what is thought to be most important in terms of the properties an antibacterial drug target should possess. Hence, the first scoring system rates as very important the properties ‘species distribution’, ‘conservation in pathogens’, ‘similarity to human’ and ‘homology to essential genes’. A target which does not perform well on any one of these criteria will probably not make a good drug target. The emphasis accorded to these properties means that targets which are not present in a wide range of pathogens, are not well conserved, are very similar to targets in humans, or are not essential will not be able to score highly and thus will not be prioritized. The metric ‘species distribution’ is weighted so that a target will receive the maximum score if it is present in all the bacteria treated as pathogenic by this study and in no non-pathogens. It is unlikely that this maximum would ever be awarded to a target, and so this property is given a very high weighting to compensate for this fact. The other useful properties a target may possess are, in a sense, bonuses and are scored to reflect this. A target does not necessarily need to be (directly) involved in virulence in order for a drug to neutralize an infection. However, involvement in virulence may bring benefits to using a target, in that the target should be absent from most non-pathogens and also absent from humans. The existence of homologues in humans does not matter per se; rather, it is the similarity (or lack) of
[Figure 2: histogram; x-axis: score range in bins of 20, from 0–19 to 700–719; y-axis: frequency, 0 to 800]
Figure 2. Frequency distribution of scores for potential targets in Staphylococcus aureus, based on the first scoring system used
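The frequency distributions in Figures 2–4 group target scores into bins 20 units wide (0–19, 20–39, and so on). A minimal sketch of that binning follows; it assumes non-negative scores, and the function name and input list are hypothetical rather than taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bin_scores(scores: Iterable[float], width: int = 20) -> List[Tuple[str, int]]:
    """Count scores per fixed-width bin, labelled by inclusive integer bounds.

    Assumes non-negative scores. Empty bins below the highest occupied
    bin are kept so that the result can be plotted directly as a histogram.
    """
    # Map each score to the lower bound of its bin, e.g. 45 -> 40.
    counts = Counter((int(s) // width) * width for s in scores)
    if not counts:
        return []
    highest = max(counts)
    return [
        (f"{low}-{low + width - 1}", counts.get(low, 0))
        for low in range(0, highest + width, width)
    ]


if __name__ == "__main__":
    # Toy scores, not taken from the paper.
    for label, frequency in bin_scores([0, 19, 20, 45, 59, 130]):
        print(label, frequency)
```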
[Figure 3: histogram; x-axis: score range in bins of 20, from 0–19 to 700–719; y-axis: frequency, 0 to 1200]
Figure 3. Frequency distribution of scores for potential targets in Escherichia coli O157:H7 EDL993, based on the first scoring system used
[Figure 4: histogram; x-axis: score range in bins of 20, from 0–19 to 680–699; y-axis: frequency, 0 to 1200]
Figure 4. Frequency distribution of scores for potential targets in Mycobacterium tuberculosis, based on the first scoring system used
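The five scoring systems described in the text differ only in how the metrics are weighted, so a sensitivity analysis amounts to re-ranking the same targets under each set of weights. The sketch below illustrates this under loud assumptions: every numeric weight, the boost factor, the metric keys and the normalisation of each metric to the range 0 to 1 are hypothetical, since the actual weightings are not recoverable from this text.

```python
from typing import Dict, List

# Hypothetical weights for the first scoring system (NOT the paper's values).
# Each metric is assumed to be normalised so that 1.0 is the ideal outcome.
BASE_WEIGHTS: Dict[str, float] = {
    "species_distribution": 100.0,
    "conservation_in_pathogens": 50.0,
    "dissimilarity_to_human": 50.0,
    "essential_homology": 50.0,
    "virulence_homology": 20.0,
    "known_function": 10.0,
    "pdb_entry": 10.0,
    "copy_number": 10.0,
}


def make_systems(base: Dict[str, float], boost: float = 2.0) -> Dict[str, Dict[str, float]]:
    """Build five weightings: base, equal, and three with one metric boosted."""

    def boosted(metric: str) -> Dict[str, float]:
        weights = dict(base)
        weights[metric] *= boost  # hypothetical way of giving "greater weight"
        return weights

    return {
        "first": dict(base),
        "equal": {metric: 1.0 for metric in base},
        "virulence": boosted("virulence_homology"),
        "essential": boosted("essential_homology"),
        "conservation": boosted("conservation_in_pathogens"),
    }


def score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of metric values; missing metrics count as 0."""
    return sum(weight * metrics.get(name, 0.0) for name, weight in weights.items())


def rank(targets: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Target names ordered from highest to lowest score."""
    return sorted(targets, key=lambda name: score(targets[name], weights), reverse=True)


if __name__ == "__main__":
    # Toy targets, not taken from the paper.
    targets = {
        "A": {"species_distribution": 1.0, "virulence_homology": 0.0},
        "B": {"species_distribution": 0.5, "virulence_homology": 1.0},
    }
    for system, weights in make_systems(BASE_WEIGHTS).items():
        print(system, rank(targets, weights))
```

A target whose rank stays high under every weighting is robust in the sense used for Table 5; one whose rank changes sharply depends on a single heavily weighted metric.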
the target to a human homologue which is important. Again, this is reflected in the scoring system, with the number of human homologues being less important than proximity. Of course the lack of any human homologues will bring other benefits, such as the reduced need for QSAR studies to find lead compounds that will selectively target only the bacterial version of a protein. In a similar way ‘known function’ and ‘entry in PDB’ are not crucial properties that a potential target must possess. They simply imply that something is already known about these targets which can be used as a jumping-off point for further investigation. ‘Copy number’ could be potentially important as, if a protein exists