Genome prediction of PhoB regulated promoters in Sinorhizobium meliloti and twelve proteobacteria
ABSTRACT
In proteobacteria, genes whose expression is modulated in response to the external concentration of inorganic phosphate
are often regulated by the PhoB protein which binds to a conserved motif (Pho box) within their promoter regions. Using a
position weight matrix algorithm derived from known Pho box sequences, we identified 96 putative Pho regulon members whose
promoter regions contained one or more Pho boxes in the Sinorhizobium meliloti genome. Expression of these genes was examined
through assays of reporter gene fusions and through comparison with published microarray data. Of 96 genes, 31 were induced
and 3 were repressed by Pi starvation in a PhoB dependent manner. Novel Pho regulon members included several genes of unknown
function. Comparative analysis across 12 proteobacterial genomes revealed highly conserved Pho regulon members including
genes involved in Pi metabolism (pstS, phnC and ppdK). Genes with no obvious association with Pi metabolism were predicted to
be Pho regulon members in S.meliloti and multiple organisms. These included smc01605 and smc04317 which are annotated as
substrate binding proteins of iron transporters, and katA encoding catalase. These data suggest that the Pho regulon overlaps
and interacts with several other control circuits, such as the oxidative stress response and iron homeostasis.
INTRODUCTION
Dissection of regulatory networks that control gene transcription is among the primary goals of the post-genomic era of
biology. Whether gene expression is measured from microarrays or reporter gene fusions or other methodologies, it is
generally not possible to distinguish between the direct and indirect modulation of transcription. Bioinformatic approaches
to identify the regulatory networks have included the design of algorithms for genome-wide prediction of conserved regulatory
DNA binding motifs (1,2). A promising approach in the delineation of transcriptional networks lies in combining genomic
scanning or in silico analysis with experimental transcription data obtained from cells grown under diverse experimental
conditions (1,3–12). In this report, we combine in silico prediction with experimental data obtained from reporter gene
fusions and through comparisons with published microarray data. We also explore cross-species comparative genomics as a tool
to identify genes whose expression is controlled by a transcriptional regulator, PhoB, in response to phosphate
starvation.
Inorganic phosphate (Pi) plays key roles in cells. In ATP, it is involved in energy metabolism; in protein
phosphorylation, it is responsible for the regulation of transcription and many other cellular processes, including chemotaxis and
cell division; and, perhaps most importantly, Pi is a major structural component of nucleic acids and membrane phospholipids.
In many gram-negative bacteria, the transport and metabolism of Pi and phosphorus-containing compounds are regulated at the
transcriptional level by a two-component PhoR-PhoB signal transduction system. The Pho regulon consists of genes or operons
regulated by PhoB and this has been well studied in Escherichia coli (13–15). Under Pi limiting conditions, the PhoR
histidine kinase sensor undergoes autophosphorylation and subsequently donates its phosphate group to its cognate response
regulator PhoB. Phosphorylated PhoB (PhoB-Pi) then modulates transcription of its targets by binding to a highly conserved 18
nt DNA sequence called the Pho box (or PhoB binding motif) which usually overlaps the –35 region of PhoB-regulated promoters
(16,17). The majority of identified Pho boxes essentially comprise two 7 nt direct repeats of 5'-CTGTCAT-3' separated by a
conserved 4 nt spacer in the middle. It was postulated that the complex of PhoB bound to the Pho box interacts with the σ70
subunit of RNA polymerase to control transcription initiation (18–23). Over the past 30 years, about 30 Pho regulon members,
which predominantly encompass an ensemble of genes involved in Pi uptake and metabolism, have been identified in E.coli as
reviewed by Wanner (14).
We are studying the gram-negative α-proteobacterium Sinorhizobium meliloti. This organism forms N2-fixing root nodules on
alfalfa (Medicago sativa), and its genome is unusual in that, in addition to a 3.4 Mb chromosome, it contains two megaplasmids, 1.3
and 1.7 Mb in size. Previous studies have identified several Pho regulon members in S.meliloti including the pstSCAB and
phoCDET operons which encode ABC-type high affinity transport systems for Pi and in the case of phoCDET likely phosphonates
(24–27). The orfA-pit operon encodes a low affinity Pi transport system whose expression is negatively regulated by PhoB
(28). Other Pho regulon members include the exp, phn and pta-ackA operons (16,29,30). Using DNA microarray and promoter analysis,
Krol and Becker (31) identified several novel putative Pho regulon members including afuA which is annotated as an iron
transport binding protein. In ongoing studies to understand the response of S.meliloti to Pi limitation, we constructed a Pho
box weight matrix based on known E.coli and S.meliloti PhoB binding sites and used this matrix to predict new PhoB binding
sites in the S.meliloti genome. Expression of predicted Pho regulon members was then examined through the analysis of
transcriptional reporter gene fusions and through the previously reported microarray data (31). The frequency weight matrix
was also employed to predict PhoB binding motifs across 12 closely related proteobacterial genomes with the goal of identifying
a common set of PhoB regulated genes, as might be expected from a conserved biological response to Pi limitation.
MATERIALS AND METHODS
Construction of the Pho box weight matrix for prediction of PhoB binding sites
A total of fifteen known Pho boxes from S.meliloti and E.coli were used for weight matrix construction. Five of those Pho
box sequences were collected from previously identified PhoB binding sites from S.meliloti. Of those five, four were from
S.meliloti strain 1021: one PhoB binding site upstream of orfA-pit, two sites from the phoC promoter and one from the phnG
promoter (24,28,32). The fifth Pho box was taken from the orfA-pta-ackA promoter of S.meliloti strain 104A14 (16). The ten PhoB
binding sites from E.coli were those of phoA, phoB, phoE, phoH, phnC, pstS1, pstS2, ugpB1, ugpB2 and ugpB3 (18,33–35) (see Table 1).
Following their alignment, a matrix was constructed from the relative frequencies of A, T, C or G at each position of the 18
nt Pho box sequence (Table 1). This matrix was used to determine an information-based measure of potential binding sites
according to the method of Schneider et al. (36). An 18 bp window was moved over the entire genome on both strands and the
score ($S_i$) at each nucleotide position (having base i) was calculated according to
$S_i = \frac{1}{18}\sum_{j=1}^{18}\left[2 + \log_2 F_{ij}\right]$, where $F_{ij}$
is the frequency in the matrix of base i at position j of the window. This score, which ranges from –2.62 (the score of the worst match) to 1.39
(the score of the consensus sequence), is a measure of the information content of a potential binding site measured against
the example set. The lowest example score, that of orfA-pit, is 0.36, and a threshold of 0.35 was used to define a ‘hit’. A
scan of the entire S.meliloti genome produced about 1500 hits on each strand. These were filtered to retain only those that
were between –500 and +100 bp on the coding strand from an annotated translational start site.
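As a rough illustration of the scoring scheme described above, the following Python sketch builds a frequency matrix from aligned example sites and scores every window on both strands. It is a minimal sketch under stated assumptions, not the implementation used in this study: the function names are hypothetical, the input is assumed to contain only uppercase A, C, G and T, and a pseudocount is added so that bases never observed at a position do not produce log2(0). The text does not state how zero frequencies were handled, so the pseudocount value is purely an assumption. Note that a position with frequency 1 contributes 2 to the sum, while a position with the uniform frequency 0.25 contributes 0.

```python
import math

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def frequency_matrix(sites, pseudocount=0.5):
    """Per-position base frequencies for equal-length aligned sites.

    The pseudocount is an assumption of this sketch (not from the paper).
    """
    width = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for j in range(width):
        column = [site[j] for site in sites]
        matrix.append({b: (column.count(b) + pseudocount) / total for b in BASES})
    return matrix


def window_score(window, matrix):
    """Mean over window positions j of 2 + log2(F_ij)."""
    total = sum(2 + math.log2(matrix[j][base]) for j, base in enumerate(window))
    return total / len(matrix)


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def scan(genome, matrix, threshold=0.35):
    """Return (start, strand, score) for every window scoring above threshold.

    `start` is always the leftmost forward-strand coordinate of the window.
    """
    width = len(matrix)
    hits = []
    for strand in "+-":
        seq = genome if strand == "+" else reverse_complement(genome)
        for pos in range(len(seq) - width + 1):
            s = window_score(seq[pos:pos + width], matrix)
            if s > threshold:
                start = pos if strand == "+" else len(genome) - pos - width
                hits.append((start, strand, s))
    return hits
```

Calling `scan(genome, frequency_matrix(example_sites))` on a genome string would return the candidate hits that still need to be filtered by their position relative to annotated start codons.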
Generation of gusA transcriptional gene fusions to the PCR amplified Pho box containing promoters
To construct the gusA reporter gene fusions to the Pho box containing promoters, each promoter region was PCR amplified using
the primers listed in Supplementary Table 2, and the PCR amplified promoter fragments were digested with appropriate
restriction enzymes and cloned into either the pFUS1 vector, a broad-host-range replicating vector containing a promoterless gusA
(uidA) gene (37), or the suicide plasmid pTH1360 [pVO155 (38) modified by replacement of the gusA coding and upstream sequences
with those of pFUS1]. The corresponding gene fusion plasmids were verified by sequencing and subsequently introduced into
the S.meliloti wild-type strain RCR2011 and its derivative RmP559 (RCR2011, phoB3::TnV), or into strains RmP110 and RmH852 (Rm1021,
phoB3::Tn5-233), by tri-parental mating using MT616 as the helper strain as described previously (39).
β-Glucuronidase and alkaline phosphatase assays
To determine the expression of the predicted Pho box containing genes or operons in response to Pi limitation, the
S.meliloti wild-type strains and their phoB mutants harbouring the plasmid-borne promoter::gusA gene fusions were inoculated in 2
ml LBmc containing 2.5 μg/ml tetracycline and grown overnight aerobically at 30°C to an OD600 of 1.0. LBmc is Luria–Bertani (LB)
broth supplemented with 2.5 mM MgSO4 and 2.5 mM CaCl2 (40). A total of 0.5 ml of each culture was spun down in a 1.5
ml microcentrifuge tube, washed twice in 1 ml phosphate-free MOPS minimal medium (P0 medium) and resuspended in 250 μl of
P0 medium. MOPS-buffered minimal medium contains 40 mM morpholinopropane sulfonic acid/20 mM potassium hydroxide; 20 mM
NH4Cl; 2 mM MgSO4; 2 mM CaCl2; 100 mM NaCl; 15 mM filter-sterilized glucose as carbon source; and is supplemented with 0.3 μg/ml
biotin and 10 ng/ml CoCl2 (24,41). Ten microliter aliquots of washed cells were subcultured into 5 ml of P0 medium or of MOPS
minimal medium supplemented with 2 mM KH2PO4 (P2 medium). After 32 h incubation at 30°C, 2 ml of each culture was spun down at 10 000
r.p.m. for 1 min and resuspended in 1 M Tris–HCl (pH 8.0) for alkaline phosphatase assays as described by Bardin et al.
(24), and the remaining 3 ml was used for β-glucuronidase assays according to the protocol described by Reeve et al. (37).
RESULTS AND DISCUSSION
Weight matrix prediction of potential PhoB regulated genes
A weight matrix to identify potential PhoB binding sites was generated from five S.meliloti and ten E.coli Pho box
example sequences of 18 nt length (Table 1). The nucleotide frequency matrix was used to calculate an information-based score
for potential binding sites in a scan of the S.meliloti genome. Putative PhoB binding sites were defined by a score of
greater than 0.35 and a location between +100 and –500 nt of the translational start codon on the transcribed strand of an
annotated gene (see Materials and Methods). One hundred and three putative PhoB binding sites were found and are shown with
their downstream annotated genes in Supplementary Table S1. Seven of these promoter regions contained two putative Pho
boxes, so that 96 distinct genes were found. Three out of four genes whose Pho boxes were used for matrix construction were
also among those 96 genes. No orthologue of orfA-pta-ackA from S.meliloti strain 104A14 was found in strain Rm1021.
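The positional filter and the gene count can be sketched as follows. This is an illustrative simplification, assuming hypothetical inputs: hits are given as (position, strand) pairs, genes as (name, start codon position, strand) tuples in forward-strand coordinates, and the width of the 18 nt window is ignored when measuring distance. The arithmetic in the text follows the same logic: 103 sites, of which seven promoter regions carry two sites each, give 103 − 7 = 96 distinct genes.

```python
from collections import defaultdict


def offset_from_start(hit_pos, gene_start, strand):
    """Signed distance of a hit from the start codon (negative = upstream)."""
    return hit_pos - gene_start if strand == "+" else gene_start - hit_pos


def assign_hits(hits, genes, upstream=500, downstream=100):
    """Map each gene name to the same-strand hits lying within
    -upstream..+downstream of its translational start."""
    by_gene = defaultdict(list)
    for pos, strand in hits:
        for name, start, gene_strand in genes:
            if strand != gene_strand:
                continue  # only hits on the transcribed strand are kept
            if -upstream <= offset_from_start(pos, start, strand) <= downstream:
                by_gene[name].append(pos)
    return dict(by_gene)


def summarize(by_gene):
    """Return (number of sites, number of genes, genes with more than one site)."""
    n_sites = sum(len(positions) for positions in by_gene.values())
    n_multi = sum(1 for positions in by_gene.values() if len(positions) > 1)
    return n_sites, len(by_gene), n_multi
```

The coordinates used with these functions would come from a genome annotation; none are supplied here.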
The threshold score (0.35) used to identify putative PhoB binding sites was derived from the lowest score (that of orfA-pit)
among the example sequences. With this threshold, 18 of the top 20 scores were upstream of genes found to be induced by
phosphate starvation, in a PhoB-dependent manner, by gusA fusion analysis in S.meliloti (see next section). However, most
putative PhoB binding sites with scores above the cut-off level did not show phosphate-dependent regulation of transcription.
One possible explanation, in addition to false positives, is that the matrix method did not include other important features of a
PhoB binding site, such as appropriately positioned –10 and –35 promoter elements. It is also possible that some genes with
PhoB binding sites require interaction with additional regulatory proteins before the gene can be regulated by phosphate
limitation.
Blanco et al. (23) showed that the C-terminal domain of PhoB interacts with a 22 bp region of dsDNA that consists of two
direct repeats of 11 bp. Each 11 bp repeat has a conserved 7 bp region (consensus, CTGTCAT) followed by a less conserved 4 bp
segment. Our weight matrix comprises two conserved 7 bp repeats separated by a single, less conserved 4 bp spacer, and
omits the terminal 4 bp segment. However, this terminal segment is not well conserved (23) and will therefore contribute
little to the weight matrix score. Furthermore, our weight matrix will reliably identify overlapping PhoB sites provided that
they are separated by 4 bp ‘spacers’ and individually have component scores greater than 0.35.
Experimental validation of the predicted Pho regulon members by analysis of transcriptional gene fusions
To directly examine whether the S.meliloti genes identified by the frequency matrix were subject to phosphate-dependent
regulation, we generated transcriptional reporter gene fusions to seventy-two of these candidate genes and examined their
expression in defined MOPS-buffered minimal medium during growth under Pi-excess (2 mM Pi) and Pi-starvation (no Pi added)
conditions (see Materials and Methods). Gene expression in a wild-type phoB+ background was compared with expression in an
otherwise isogenic phoB– background (Table 2). Eighteen of the 72 promoter gene fusions were induced upon Pi-starvation in a
PhoB-dependent fashion (Table 2). In addition, regardless of the media Pi concentration, gene fusions to smb20427 (putative
amino acid ABC transport system), smc02886 and smc02675 (rrna) showed 3-, 2- and 10-fold more expression, respectively, in the
wild-type background relative to the phoB– background.
Three reporter gene fusions were found to be repressed upon Pi-starvation in a PhoB-dependent manner. These were smc00801
(transmembrane protein of unknown function), smc02601 (nadABC) and smc02862 (orfA-pit). In the wild-type background these
fusions were expressed at higher levels in media containing 2 mM Pi than in Pi-starved cells. Also, in the phoB mutant
background the expression level was elevated and did not alter with media Pi. We have previously reported that expression of
the low-affinity Pi-transport system encoded by the orfA-pit genes is repressed by PhoB (28). The repression of
smc02601-smc02602-smc02603 (nadABC) expression suggests a down-regulation of NAD+ synthesis in S.meliloti, which possibly
corresponds to the slight down-regulation of smc00161 expression that appears to occur upon Pi-limited growth (Table 2).
smc00161 is annotated as encoding an NH3-dependent NAD+ synthetase, and the promoter region of this gene was predicted to carry
a Pho box (Supplementary Table S1).
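The logic used to call a fusion induced or repressed in a PhoB-dependent manner can be expressed as a small decision rule. The sketch below is illustrative only: the function name, the labels and especially the 2-fold cut-off are assumptions chosen for the example and are not taken from this study. It compares reporter activity in Pi-starved (P0) and Pi-replete (P2) medium in the wild type, and calls the response PhoB-dependent when the phoB mutant no longer responds to Pi.

```python
def classify_fusion(wt_p0, wt_p2, mut_p0, mut_p2, fold=2.0):
    """Classify a reporter fusion from four positive activity measurements.

    wt_* are activities in the wild type, mut_* in the phoB mutant;
    *_p0 is Pi-starved medium and *_p2 is medium with 2 mM Pi.
    The fold threshold is a hypothetical choice for illustration.
    """
    def response(starved, replete):
        ratio = starved / replete
        if ratio >= fold:
            return "induced"
        if ratio <= 1 / fold:
            return "repressed"
        return "unchanged"

    wt = response(wt_p0, wt_p2)
    mut = response(mut_p0, mut_p2)
    if wt == "unchanged":
        return "not Pi-regulated"
    if mut == "unchanged":
        return wt + ", PhoB-dependent"
    return wt + ", not strictly PhoB-dependent"
```

Under this rule, a fusion such as orfA-pit, whose expression is higher with 2 mM Pi in the wild type but elevated and unresponsive to Pi in the phoB mutant, would fall into the "repressed, PhoB-dependent" class.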
Comparison of Pho box predictions with DNA microarray data
Employing DNA microarrays, Krol and Becker (31) identified 98 genes (some of which were in operons) that were more than
3-fold induced in a phoB-dependent manner upon Pi limitation. An additional 50 genes showed a strong increase in expression
under phosphate limitation in a partially phoB-dependent or phoB-independent manner. Krol and Becker (31) also identified
potential Pho-box sequences with 2 mismatches from the Pho-box consensus sequence TG(A/T)CA (C/A)-NNNN-C(C/T)(G/T)TCA(C/T)
defined by Summers et al. (16). Of the 19 Pho-box promoters identified by Krol and Becker (31), 14 were also identified with
our weight matrix (Table 3), and data from our gene fusion experiments revealed that 13 of these 14 genes were
regulated by media Pi in a PhoB dependent manner (Table 2). A reporter fusion to the remaining gene, sma0612, has yet to be
examined. Of the five Pho-boxes not identified by our weight matrix, four (sma1809, sma1822, smc00170 (sinR) and
smc00429) were unusual as they contained 3 or 5 nt in the region flanked by the 7 nt direct repeats instead of 4 (see Table 1). The
Pho-box upstream of the remaining gene, sma0045, lies on the opposite strand to sma0045 and thus would not be included in our
predictions. However, reporter gene fusions to these genes should be analyzed, as the microarray experiments suggested that these
genes were induced in a phoB-dependent manner (31).
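A fixed-width 18 nt matrix cannot align both repeats when the spacer is 3 or 5 nt long, which is why such sites are missed. The following sketch shows one simple way a variable spacer could be accommodated. It is not the method of Krol and Becker (31) nor the weight matrix used here: it searches for two copies of the idealized 7 nt repeat CTGTCAT, each allowed a hypothetical number of mismatches, separated by any spacer length from a chosen set.

```python
REPEAT = "CTGTCAT"  # idealized 7 nt direct repeat of the Pho box


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_repeat_pairs(seq, spacers=(3, 4, 5), max_mismatch=1):
    """Return (start, spacer) for each pair of near-consensus repeats.

    `spacers` and `max_mismatch` are illustrative parameters, not values
    taken from the paper.
    """
    n = len(REPEAT)
    found = []
    for pos in range(len(seq) - n + 1):
        if mismatches(seq[pos:pos + n], REPEAT) > max_mismatch:
            continue
        for spacer in spacers:
            second = seq[pos + n + spacer:pos + 2 * n + spacer]
            if len(second) == n and mismatches(second, REPEAT) <= max_mismatch:
                found.append((pos, spacer))
    return found
```

Restricting `spacers` to `(4,)` reproduces the behaviour of a fixed 18 nt window, so a site with a 3 nt spacer is found with the default setting but not with the restricted one.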
In addition to the 13 Pho regulon members predicted both here and by Krol and Becker (31), both the weight matrix data
and data from reporter fusion assays identified an additional 10 genes whose expression was PhoB-regulated in response to Pi
limitation (Tables 2 and 3). With the exceptions of smb20843 (algI), smc00618 (ppk), smc02601 (nadA) and smc00801 (hypothetical,
global homology), these genes also showed PhoB-dependent transcription in microarray studies. The failure to detect
repression of smc02601 (nadA) and smc00801 expression in microarray experiments is not surprising, as the microarray experiments
also failed to detect orfA-pit repression and this operon is known to be repressed by PhoB (25). The failure to detect
induction of smb20843 (algI) and smc00618 (ppk) upon Pi-starvation is more surprising, as these genes appear to be highly
regulated in the gene fusion experiments. Moreover, expression of ppk is known to be Pi-starvation induced in many organisms.
The differences between the microarray and gene fusion data could result from several factors, including differences in
experimental growth conditions: in the microarray experiments, cells were grown with 100 μM Pi as the Pi-limitation
condition. Alternatively, it is possible that the particular probes employed for ppk and smb20843 yielded low signals.
Through our weight matrix scan, Pho-box sequences were also found upstream of three more genes, sma2410 (rhbF), smc01296
(rpsN) and smc01820 (putative N-carbamyl-L-amino acid aminohydrolase) (Supplementary Table S1). Promoter fusions to these
three genes have not been tested yet. These genes, however, were shown to be repressed in a PhoB-independent manner in
microarray studies (31). Further studies are required to analyze the regulation of these genes and the nature of their
associated Pho-box sequences.
We note that in the case of orfA-pit, the Pho-box identified by Krol and Becker (31) lay on the opposite strand to the
orfA-pit genes and is different from that identified by Bardin et al. (28). Since orfA-pit expression is negatively regulated
by PhoB, it is of interest to determine the actual PhoB binding site as little is known regarding how PhoB represses
transcription. In summary, of 96 genes with upstream Pho-boxes predicted by the frequency matrix genome analysis, 34 appear
to be Pi and PhoB regulated as revealed from gene fusion and microarray analysis data (Table 3).
Analysis of predicted Pho regulon members across proteobacterial genomes
It is reasonable to assume that at least part of the physiological response to Pi-limitation will be conserved. As the
Pho-box sequence identified by the PhoB proteins of different organisms appears to be conserved (14,16,24,28,42–44), we used
the Pho-box frequency matrix described above (Table 1) to search the genomes of twelve gram negative bacteria (Table 4) for
PhoB-binding sites using the same criteria as employed for S.meliloti. Genes that lay downstream of a predicted Pho-box with
scores greater than 0.35 were further examined. We identified genes, such as pstSCAB, phoA, ugpA, phn and ppk that are known
to be associated with phosphate metabolism (Tables 5–7). The pstS gene encodes the Pi-binding protein of the high affinity
PstSCAB transport system (18,27) and expression of this system in E.coli, S.meliloti and Pseudomonas aeruginosa is known to
be highly induced under Pi-limiting conditions and is PhoB dependent (27). In a number of organisms, such as Caulobacter
crescentus, the pstS gene transcript is separate from the pstCAB-phoUB transcript and in these cases predicted Pho-boxes are
also located upstream of the pstC gene (see Table 5).
It was striking that multiple 18 bp Pho-box sequences were predicted upstream of the pst genes in all of the genomes
examined (Table 5). Multiple Pho-boxes consisted of overlapping 7 bp direct repeats separated by 4 bp spacers. The frequency
matrix detected consecutive 18 bp elements and adding a terminal 4 bp spacer formed consecutive 22 bp PhoB binding sites as
defined by Blanco et al. (23). The two 11 bp direct repeat sequences bind the PhoB monomers head to tail (23). The pstS
promoters from E.coli K12 and O157:H7 are predicted to contain five and six of these 11 bp direct repeats, respectively. The
large number of Pho-boxes in all of the pstS promoter regions presumably reflects the importance of the PstSCAB high affinity
transport system in the uptake of Pi under Pi-limiting conditions. Other genes associated with phosphate metabolism for which
multiple Pho-box sequences were detected included alkaline phosphatase-like proteins (phoA), genes involved in phosphonate
uptake and metabolism (phn), in glycerol-3-phosphate uptake (ugp and glp), the regulatory genes phoB and phoR (Table 5) and
genes encoding polyphosphate kinase (Table 6).
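The relationship between overlapping 18 bp predictions and 11 bp direct repeats can be made concrete with a short sketch. It rests on an interpretation of the geometry described above, not on code from this study: an 18 bp hit spans two 7 bp repeats whose starts lie 11 bp apart, so same-strand hits whose starts differ by exactly 11 bp share a repeat, and a chain of k such hits would imply k + 1 direct repeats. Under this reading, a promoter with five 11 bp repeats would correspond to four chained 18 bp hits. The function names are hypothetical.

```python
def chain_hits(positions, step=11):
    """Group same-strand hit start positions into chains spaced `step` apart."""
    pos_set = set(positions)
    chains = []
    for start in sorted(pos_set):
        if start - step in pos_set:
            continue  # belongs to a chain that began further upstream
        chain = [start]
        while chain[-1] + step in pos_set:
            chain.append(chain[-1] + step)
        chains.append(chain)
    return chains


def repeats_in_chain(chain):
    """Number of 11 bp direct repeats implied by a chain of 18 bp hits."""
    return len(chain) + 1
```

Hits on opposite strands should be chained separately, since the direct repeats are read in one orientation.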
In addition to the previously reported Pho-box in the orfA-pit promoter region of S.meliloti, Pho-box sequences were also
detected in the promoter region of the orfA-pit orthologues in the α-proteobacteria Bradyrhizobium japonicum and
Mesorhizobium loti and the γ-proteobacteria Pseudomonas putida and Acinetobacter sp. (Table 5). Bardin et al. (28) showed that
the expression of orfA-pit in S.meliloti is repressed upon Pi-starvation, unlike in E.coli where the pit genes appear to be
constitutively expressed (45) and for which no Pho-boxes were detected. The identification of putative Pho-boxes upstream of
the orfA-pit genes in other bacteria suggests that these may also be repressed by Pi-starvation and that such repression may
be a widespread phenomenon.
A number of predicted Pho regulon members not normally associated with Pi metabolism were identified in several genomes
(Table 7). One of the genes in this category was katA, which encodes catalase and was recently shown to be PhoB dependent in
S.meliloti and P.aeruginosa under Pi-starvation conditions (46). The detection of Pho-box elements upstream of the katA genes of
C.crescentus and P.putida suggests that katA expression in these organisms is also PhoB regulated. Pho-boxes upstream of
several S.meliloti ABC-class transport systems were also detected upstream of homologous clusters in other bacteria. These
were smc01605, smc04317 (afuA) and smc03124 (Table 7). Both the afuABC and smc01605 gene clusters in S.meliloti are annotated
as putatively involved in Fe3+ transport; however, definitive evidence is lacking. Chao et al. (47) did not find either of
these ABC transport systems to be up-regulated in S.meliloti when grown in iron-limiting conditions. Therefore, it appears
unlikely that they are actually involved in iron transport. A third ABC system in S.meliloti, smc03124, with conserved
Pho-box sequences in other proteobacteria (Table 7), is annotated as a putative peptide binding protein. The actual substrate(s)
transported by this system are unknown.
We identified a putative Pho-box upstream of the smc00772 (potH) gene cluster as well as its orthologues in M.loti and Brucella
suis (Table 7). Although fusion data for smc00772 are unavailable, the potFGHI ABC-class putative putrescine transporter
cluster was identified as upregulated by Pi-limitation in the microarray analysis (31), although no Pho-box was identified in
that study. The putative Pho-box upstream of smc00772 (potH) lies within the coding region of potG (smc00771), instead of upstream
of the regulator (potF). The fact that Pho-box-like sequences were identified upstream of genes similar to S.meliloti potH in
M.loti and B.suis suggests that putrescine transport may be PhoB-regulated across a range of organisms and should be further
investigated.
In response to Pi-starvation, S.meliloti replaces phospholipids with non-Pi-containing lipids: sulphoquinovosyl
diacylglycerols (SL), ornithine-containing lipids (OL) and diacylglyceryl-N,N,N-trimethylhomoserines (DGTS) (48,49). In
Rhodobacter sphaeroides it was demonstrated that the smc01848 homolog btaA is directly involved in DGTS biosynthesis (50), and
recently Lopez-Lara et al. (51) established that smc01848 and smc01849 (btaAB) are required for DGTS synthesis. A Pho-box is
predicted 64 nt from the smc01848 start codon, and orthologs of smc01848 in M.loti (mlr1574) and Agrobacterium tumefaciens
(atu2119) also have predicted Pho boxes in the corresponding promoter regions (Table 5). These data strongly suggest that
the DGTS synthesis induced upon Pi limitation is mediated directly via the PhoR-PhoB system.
Pi starvation and polyphosphate metabolism
Inorganic polyphosphates (polyPi) are linear polymers of orthophosphate residues linked by high-energy phosphoanhydride
bonds. These polymers can vary in size from 3 to over 1000 phosphate residues. PolyPi is ubiquitous and the enzyme primarily
responsible for polyPi synthesis in E.coli is polyP kinase (PPK), which uses the gamma phosphate of ATP to make the polymer.
PolyPi can also be hydrolyzed to Pi either by exopolyphosphatases (PPX) or by endopolyphosphatases (PPN). The identification
and assignment of Pho-boxes was sometimes complicated by differences in genome annotation, as in the case of genes encoding
polyphosphate kinase (ppk) (Table 6). Here Pho boxes were predicted in the ppk promoter regions of 10 of the 12 genomes
examined. However, the predicted Pho-boxes from both M.loti (52) and C.crescentus (53) were located within the annotated gene
coding regions. Alignment of the Ppk amino acid sequences suggests that the actual start codons of the ppk genes in M.loti and
C.crescentus are downstream of the annotated start codons (data not shown). Our reporter gene fusion data showed that the
S.meliloti ppk gene is a strongly induced member of the Pho regulon. However, most strikingly, the weight matrix did not
detect a ppk Pho box in either the E.coli K12 genome or the E.coli O157 genome, even at a very low cut-off (0.18). In E.coli
there is genetic evidence that polyphosphates accumulate upon Pi starvation in a PhoB-dependent manner, although the
E.coli ppk promoter has never been mapped. Therefore, it is likely that E.coli PhoB regulates ppk indirectly, as suggested
elsewhere (54).
CONCLUSION
Several complementary approaches were integrated to investigate the cellular response to Pi starvation. As a first step,
computational identification of PhoB binding motifs predicted 96 potential Pho regulon members from the entire S.meliloti
genome. These were subsequently investigated by genetic screening of transcriptional reporter gene fusions and through
comparisons with recently available microarray data (31). It was found that 34 out of the 96 in silico predicted Pho regulon
members were regulated by Pi concentration in a PhoB dependent manner (Table 3). These 34 Pho regulon members were analyzed
in silico for conservation or co-occurrence across 12 genomes scanned (Tables 5 and 7). Nineteen of these 34 candidates were
also predicted as having upstream Pho-boxes in at least one of the other genomes scanned in this study. The in silico
analysis provided evidence for the conservation of a core Pho regulon in bacteria and suggests that these organisms share a
common response to Pi limitation. Such conservation is not surprising as, for example, in both plants and yeast one of the
major responses to Pi-limitation is the induction of a high affinity Pi transport system and of scavenging
enzymes, such as alkaline phosphatases.
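The co-occurrence count described above (candidates that also carry a predicted Pho-box in at least one other genome) amounts to a simple set lookup once orthologues have been assigned. The sketch below assumes, hypothetically, that genes in every genome have already been mapped to shared orthologue identifiers; the function name and the data layout are illustrative and no real results are encoded in it.

```python
def conserved_candidates(candidates, predictions_by_genome, min_genomes=1):
    """Map each candidate to the other genomes in which it has a predicted Pho-box.

    `predictions_by_genome` maps a genome name to the set of orthologue
    identifiers with a predicted upstream Pho-box in that genome.
    Candidates found in fewer than `min_genomes` genomes are omitted.
    """
    conserved = {}
    for gene in candidates:
        genomes = sorted(
            name for name, genes in predictions_by_genome.items() if gene in genes
        )
        if len(genomes) >= min_genomes:
            conserved[gene] = genomes
    return conserved
```

With the 34 regulated candidates as input and one entry per scanned genome, `len(conserved_candidates(...))` would give the number of candidates conserved in at least one other genome.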
Extending the Pho-box analysis to many more genomes should define the core group of genes that respond to Pi-starvation.
Further, it will allow the identification of subgroups of genes, such as katA, whose expression is regulated by PhoB in some
organisms but not in others. Analysis of the distribution of such data may lead to the recognition of associations between
particular regulatory patterns and other phenotypic properties of the organisms.
SUPPLEMENTARY DATA
ACKNOWLEDGEMENTS
This work was supported with funding from the Natural Sciences and Engineering Research Council of Canada, from Genome
Canada through the Ontario Genomics Institute and from the Ontario Research and Development Challenge Fund to T.M.F. The
authors thank Dr Brian Golding for help in computing, and him, Weilong Hao and Ying Fong for help and advice on the in
silico comparison analysis. Funding to pay the Open Access publication charges for this article was provided by NSERC and
Genome Canada.
Conflict of interest statement. None declared.
REFERENCES
McGuire, A.M. and Church, G.M. (2000) Predicting regulons and their cis-regulatory motifs by comparative genomics. Nucleic
Acids Res., 28, 4523–4530.
Tompa, M., Li, N., Bailey, T.L., Church, G.M., De Moor, B., Eskin, E., Favorov, A.V., Frith, M.C., Fu, Y., Kent, W.J., et
al. (2005) Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol., 23,
137–144.
Thieffry, D., Salgado, H., Huerta, A.M., Collado-Vides, J. (1998) Prediction of transcriptional regulatory sites in the
complete genome sequence of Escherichia coli K12. Bioinformatics, 14, 391–400.
Hertz, G.Z. and Stormo, G.D. (1999) Identifying DNA and protein patterns with statistically significant alignments of
multiple sequences. Bioinformatics, 15, 563–577.
Stormo, G.D. (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.
Fernandez De Henestrosa, A.R., Ogi, T., Aoyagi, S., Chafin, D., Hayes, J.J., Ohmori, H., Woodgate, R. (2000)
Identification of additional genes belonging to the LexA regulon in Escherichia coli. Mol. Microbiol., 35, 1560–1572.
Panina, E.M., Mironov, A.A., Gelfand, M.S. (2001) Comparative analysis of FUR regulons in gamma-proteobacteria. Nucleic
Acids Res., 29, 5195–5206.
Tan, K., Moreno-Hagelsieb, G., Collado-Vides, J., Stormo, G.D. (2001) A comparative genomics approach to prediction of
new members of regulons. Genome Res., 11, 566–584.
Baichoo, N., Wang, T., Ye, R., Helmann, J.D. (2002) Global analysis of the Bacillus subtilis Fur regulon and the iron
starvation stimulon. Mol. Microbiol., 45, 1613–1629.
Dombrecht, B., Marchal, K., Vanderleyden, J., Michiels, J. (2002) Prediction and overview of the RpoN-regulon in closely
related species of the Rhizobiales. Genome Biol., 3, research0076.1–research0076.11.
Gaballa, A., Wang, T., Ye, R.W., Helmann, J.D. (2002) Functional analysis of the Bacillus subtilis Zur regulon. J.
Bacteriol., 184, 6508–6514.
Zheng, D., Constantinidou, C., Hobman, J.L., Minchin, S.D. (2004) Identification of the CRP regulon using in vitro and in
vivo transcriptional profiling. Nucleic Acids Res., 32, 5874–5893.
Lee, T.Y., Makino, K., Shinagawa, H., Amemura, M., Nakata, A. (1989) Phosphate regulon in members of the family
Enterobacteriaceae: comparison of the PhoB-PhoR operons of Escherichia coli, Shigella dysenteriae, and Klebsiella pneumoniae.
J. Bacteriol., 171, 6593–6599.
Wanner, B.L. (1993) Gene regulation by phosphate in enteric bacteria. J. Cell. Biochem., 51, 47–54.
Scholten, M., Janssen, R., Bogaarts, C., van Strien, J., Tommassen, J. (1995) The Pho regulon of Shigella flexneri. Mol.
Microbiol., 15, 247–254.
Summers, M.L., Denton, M.C., McDermott, T.R. (1999) Genes coding for phosphotransacetylase and acetate kinase in
Sinorhizobium meliloti are in an operon that is inducible by phosphate stress and controlled by PhoB. J. Bacteriol., 181,
2217–2224.
Makino, K., Amemura, M., Kawamoto, T., Kimura, S., Shinagawa, H., Nakata, A., Suzuki, M. (1996) DNA binding of PhoB and
its interaction with RNA polymerase. J. Mol. Biol., 259, 15–26.
Kimura, S., Makino, K., Shinagawa, H., Amemura, M., Nakata, A. (1989) Regulation of the phosphate regulon of Escherichia
coli: characterization of the promoter of the pstS gene. Mol. Gen. Genet., 215, 374–380.
Wu, H., Kato, J., Kuroda, A., Ikeda, T., Takiguchi, N., Ohtake, H. (2000) Identification and characterization of two
chemotactic transducers for inorganic phosphate in Pseudomonas aeruginosa. J. Bacteriol., 182, 3400–3404.
Rosenberg, H., Gerdes, R.G., Chegwidden, K. (1977) Two systems for the uptake of phosphate in Escherichia coli. J.
Bacteriol., 131, 505–511.
Yuan, Z.C., Zaheer, R., Finan, T.M. (2005) Phosphate limitation induces catalase expression in Sinorhizobium meliloti,
Pseudomonas aeruginosa and Agrobacterium tumefaciens. Mol. Microbiol., 58, 877–894.
Chao, T.C., Buhrmester, J., Hansmeier, N., Puhler, A., Weidner, S. (2005) Role of the regulatory gene rirA in the
transcriptional response of Sinorhizobium meliloti to iron limitation. Appl. Environ. Microbiol., 71, 5969–5982.
Geiger, O., Rohrs, V., Weissenmayer, B., Finan, T.M., Thomas-Oates, J.E. (1999) The regulator gene phoB mediates
Phosphate stress-controlled synthesis of the membrane lipid diacylglyceryl-N,N,N-trimethylhomoserine in Rhizobium
(Sinorhizobium) meliloti Mol. Microbiol, . 32, 63–73 .
Lopez-Lara, I.M., Sohlenkamp, C., Geiger, O. (2003) Membrane lipids in plant-associated bacteria: their biosyntheses and
possible functions Mol. Plant Microbe Interact, . 16, 567–579
Klug, R.M. and Benning, C. (2001) Two enzymes of diacylglyceryl-O-4'-(N,N,N,-trimethyl) homoserine biosynthesis are
encoded by btaA and btaB in the purple bacterium Rhodobacter sphaeroides Proc. Natl Acad. Sci. USA, 98, 5910–5915 .
Lopez-Lara, I., M., Gao, J.L., Soto, M.J., Solares-Perez, A., Weissenmayer, B., Sohlenkamp, C., Verroios, G.P., Thomas-
Oates, J., Geiger, O. (2005) Phosphorus-free membrane lipids of Sinorhizobium meliloti are not required for the symbiosis
with alfalfa but contribute to increased cell yields under phosphorus-limiting conditions of growth Mol. Plant Microbe.
Interact, . 18, 973–982
Kaneko, T., Nakamura, Y., Sato, S., Asamizu, E., Kato, T., Sasamoto, S., Watanabe, A., Idesawa, K., Ishikawa, A.,
Kawashima, K., et al. (2000) Complete genome structure of the nitrogen-fixing symbiotic bacterium Mesorhizobium loti DNA Res,
. 7, 331–338 .
Nierman, W.C., Feldblyum, T.V., Laub, M.T., Paulsen, I.T., Nelson, K.E., Eisen, J.A., Heidelberg, J.F., Alley, M.R.,
Ohta, N., Maddock, J.R., et al. (2001) Complete genome sequence of Caulobacter crescentus Proc. Natl Acad. Sci. USA, 98, 4136
–4141 .
Kornberg, A., Rao, N.N., Ault-Riche, D. (1999) Inorganic polyphosphate: a molecule of many functions Annu. Rev. Biochem,
. 68, 89–125(Ze-Chun Yuan, Rahat Zahee)
In proteobacteria, genes whose expression is modulated in response to the external concentration of inorganic phosphate
are often regulated by the PhoB protein which binds to a conserved motif (Pho box) within their promoter regions. Using a
position weight matrix algorithm derived from known Pho box sequences, we identified 96 putative Pho regulon members whose
promoter regions contained one or more Pho boxes in the Sinorhizobium meliloti genome. Expression of these genes was examined
through assays of reporter gene fusions and through comparison with published microarray data. Of 96 genes, 31 were induced
and 3 were repressed by Pi starvation in a PhoB dependent manner. Novel Pho regulon members included several genes of unknown
function. Comparative analysis across 12 proteobacterial genomes revealed highly conserved Pho regulon members including
genes involved in Pi metabolism (pstS, phnC and ppdK). Genes with no obvious association with Pi metabolism were predicted to
be Pho regulon members in S.meliloti and multiple organisms. These included smc01605 and smc04317 which are annotated as
substrate binding proteins of iron transporters and katA encoding catalase. These data suggest that the Pho regulon overlaps
and interacts with several other control circuits, such as the oxidative stress response and iron homeostasis.
INTRODUCTION
Dissection of regulatory networks that control gene transcription is among the primary goals of the post-genomic era of
biology. Whether gene expression is measured from microarrays or reporter gene fusions or other methodologies, it is
generally not possible to distinguish between the direct and indirect modulation of transcription. Bioinformatic approaches
to identify the regulatory networks have included the design of algorithms for genome-wide prediction of conserved regulatory
DNA binding motifs (1,2). A promising approach in the delineation of transcriptional networks lies in combining genomic
scanning or in silico analysis with experimental transcription data obtained from cells grown under diverse experimental
conditions (1,3–12). In this report, we combine in silico prediction with experimental data obtained from reporter gene
fusions and through comparisons with published microarray data. We also explore cross-species comparative genomics as a tool
to identify genes whose expression is controlled by a transcriptional regulator, PhoB, in response to phosphate
starvation.
Inorganic phosphate (Pi) plays key roles in cells. As part of ATP, it is involved in energy metabolism; through protein
phosphorylation, it contributes to the regulation of transcription and many other cellular processes, including chemotaxis
and cell division; and, perhaps most importantly, it is a major structural component of nucleic acids and membrane
phospholipids. In many Gram-negative bacteria, the transport and metabolism of Pi and phosphorus-containing compounds is
regulated at the
transcriptional level by a two-component PhoR-PhoB signal transduction system. The Pho regulon consists of genes or operons
regulated by PhoB and this has been well studied in Escherichia coli (13–15). Under Pi limiting conditions, the PhoR
histidine kinase sensor undergoes autophosphorylation and subsequently donates its phosphate group to its cognate response
regulator PhoB. Phosphorylated PhoB (PhoB-Pi) then modulates transcription of its targets by binding to a highly conserved 18
nt DNA sequence called the Pho box (or PhoB binding motif) which usually overlaps the –35 region of PhoB-regulated promoters
(16,17). The majority of identified Pho boxes essentially comprised two 7 nt direct repeats of 5'-CTGTCAT-3' separated by a
conserved 4 nt spacer in the middle. It was postulated that the PhoB and Pho box binding complex interacts with the σ70
subunit of RNA polymerase to control transcription initiation (18–23). Over the past 30 years, about 30 Pho regulon members,
which predominantly encompass an ensemble of genes involved in Pi uptake and metabolism, have been identified in E.coli as
reviewed by Wanner (14).
We are studying the Gram-negative α-proteobacterium Sinorhizobium meliloti. This organism forms N2-fixing root nodules on
alfalfa (Medicago sativa), and its genome is unusual in that, in addition to a 3.4 Mb chromosome, it contains two megaplasmids 1.3
and 1.7 Mb in size. Previous studies have identified several Pho regulon members in S.meliloti including the pstSCAB and
phoCDET operons which encode ABC-type high affinity transport systems for Pi and in the case of phoCDET likely phosphonates
(24–27). The orfA-pit operon encodes a low affinity Pi transport system whose expression is negatively regulated by PhoB
(28). Other Pho regulon members include the exp, phn and pta-ackA operons (16,29,30). Using DNA microarray and promoter analysis,
Krol and Becker (31) identified several novel putative Pho regulon members including afuA which is annotated as an iron
transport binding protein. In ongoing studies to understand the response of S.meliloti to Pi limitation, we constructed a Pho
box weight matrix based on known E.coli and S.meliloti PhoB binding sites and used this matrix to predict new PhoB binding
sites in the S.meliloti genome. Expression of predicted Pho regulon members then was examined through the analysis of
transcriptional reporter gene fusions and through the previously reported microarray data (31). The frequency weight matrix
was also employed to predict PhoB binding motifs across 12 closely related proteobacterial genomes with the goal of identifying
a common set of PhoB regulated genes as might be expected from a conserved biological response to Pi-limitation.
MATERIALS AND METHODS
Construction of the Pho box weight matrix for prediction of PhoB binding sites
A total of fifteen known Pho boxes from S.meliloti and E.coli were used for weight matrix construction. Five of those Pho
box sequences were collected from previously identified PhoB binding sites in S.meliloti. Of those five, four were from
S.meliloti strain 1021 (one PhoB binding site upstream of orfA-pit, two sites from the phoC promoter and one from the phnG
promoter) (24,28,32), and one Pho box was taken from the orfA-pta-ackA promoter of S.meliloti strain 104A14 (16). The ten PhoB
binding sites from E.coli were those of phoA, phoB, phoE, phoH, phnC, pstS1, pstS2, ugpB1, ugpB2 and ugpB3 (18,33–35) (see Table 1).
Following their alignment a matrix was constructed from the relative frequencies of A, T, C or G at each position of the 18
nt Pho box sequence (Table 1). This matrix was used to determine an information-based measure of potential binding sites
according to the method of Schneider et al. (36). An 18 bp window was moved over the entire genome on both strands and the
score ($S_i$) at each nucleotide position (having base i) was calculated according to
$S_i = \frac{1}{18}\sum_{j=1}^{18}\left[2 + \log_2(F_{ij})\right]$, where $F_{ij}$ is the frequency matrix entry for base i
at position j. This score, which ranges from –2.62 (the score of the worst match) to 1.39
(the score of the consensus sequence), is a measure of the information content of a potential binding site measured against
the example set. The lowest example score, that of orfA-pit, is 0.36 and a threshold of 0.35 was used to define a ‘hit’. A
scan of the entire S.meliloti genome produced about 1500 hits on each strand. These were filtered to retain only those that
were between –500 and +100 bp on the coding strand from an annotated translational start site.
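The scoring scheme described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the example sites used to exercise it should be supplied by the reader (the fifteen sequences of Table 1 are not reproduced here), and the pseudocount is an assumption introduced only so that log2 stays finite for bases that never occur at a position in the example set. The text does not state how zero frequencies were handled.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def frequency_matrix(sites, pseudocount=0.5):
    """Return one {base: frequency} dict per column of the aligned sites.

    The pseudocount is an assumption (not from the paper); it avoids
    log2(0) for bases absent from a column of the example set.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all example sites must have the same length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for j in range(width):
        column = [s[j] for s in sites]
        matrix.append({b: (column.count(b) + pseudocount) / total for b in BASES})
    return matrix


def site_score(window, matrix):
    """S = (1/L) * sum over positions j of [2 + log2 F(base at j, j)]."""
    return sum(2 + math.log2(matrix[j][b]) for j, b in enumerate(window)) / len(matrix)


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(genome, matrix, threshold=0.35):
    """Yield (start, strand, score) for each window scoring above threshold."""
    width = len(matrix)
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if set(window) - set(BASES):
            continue  # skip windows containing ambiguous bases such as N
        for strand, seq in (("+", window), ("-", reverse_complement(window))):
            score = site_score(seq, matrix)
            if score > threshold:
                yield start, strand, score
```

With an 18-column matrix, `scan` slides an 18 bp window along the sequence and scores both strands at every position, which mirrors the procedure described in the text; the reported start coordinate is always that of the window on the forward strand.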
Generation of gusA transcriptional gene fusions to the PCR amplified Pho box containing promoters
To construct gusA reporter gene fusions to the Pho box-containing promoters, each promoter region was PCR amplified using
the primers listed in Supplementary Table 2. The amplified promoter fragments were digested with appropriate
restriction enzymes and cloned either into pFUS1, a broad-host-range replicating vector containing a promoterless gusA
(uidA) gene (37), or into the suicide plasmid pTH1360 [pVO155 (38) modified by replacement of the gusA coding and upstream
sequences with those from pFUS1]. The resulting gene fusion plasmids were verified by sequencing and subsequently introduced
into S.meliloti wild-type strain RCR2011 and its derivative RmP559 (RCR2011, phoB3::TnV), or into RmP110 and RmH852 (Rm1021,
phoB3::Tn5-233), by tri-parental mating using MT616 as the helper strain as described previously (39).
β-Glucuronidase and alkaline phosphatase assays
To determine the expression of the predicted Pho box-containing genes or operons in response to Pi limitation, the
S.meliloti wild-type strain and its phoB mutant harbouring the plasmid-borne promoter::gusA gene fusion were inoculated into 2
ml LBmc containing 2.5 μg/ml tetracycline and grown overnight aerobically at 30°C to an OD600 of 1.0. LBmc is Luria–Bertani
(LB) broth supplemented with 2.5 mM MgSO4 and 2.5 mM CaCl2 (40). A total of 0.5 ml of culture was spun down in a 1.5
ml microcentrifuge tube, washed twice in 1 ml phosphate-free MOPS minimal medium (P0 medium) and resuspended in 250 μl of
P0 medium. MOPS-buffered minimal medium contains 40 mM morpholinopropane sulfonic acid/20 mM potassium hydroxide; 20 mM
NH4Cl; 2 mM MgSO4; 2 mM CaCl2; 100 mM NaCl; and 15 mM filter-sterilized glucose as carbon source, and is supplemented with
0.3 μg/ml biotin and 10 ng/ml CoCl2 (24,41). Ten microliter aliquots of washed cells were subcultured into 5 ml of P0 medium
or MOPS minimal medium supplemented with 2 mM KH2PO4 (P2 medium). After 32 h incubation at 30°C, 2 ml of each culture was
spun down at 10 000 r.p.m. for 1 min and resuspended in 1 M Tris–HCl (pH 8.0) for alkaline phosphatase assays as described by
Bardin et al. (24), and the remaining 3 ml was used for β-glucuronidase assays according to the protocol described by Reeve
et al. (37).
RESULTS AND DISCUSSION
Weight matrix prediction of potential PhoB regulated genes
A weight matrix to identify potential PhoB binding sites was generated from five S.meliloti and ten E.coli Pho box
example sequences of 18 nt length (Table 1). The nucleotide frequency matrix was used to calculate an information-based score
for potential binding sites in a scan of the S.meliloti genome. Putative PhoB binding sites were defined by a score of
greater than 0.35 and a location between +100 and –500 nt of the translational start codon on the transcribed strand of an
annotated gene (see Materials and Methods). One hundred and three putative PhoB binding sites were found and are shown with
their downstream annotated genes in Supplementary Table S1. Seven of these promoter regions contained two putative Pho
boxes, so that 96 distinct genes were found. Three of the four genes whose Pho boxes were used for matrix construction were
also among those 96 genes. No orthologue of orfA-pta-ackA from S.meliloti strain 104A14 was found in strain Rm1021.
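The positional filter and the step from sites to distinct genes (103 sites, seven promoters with two boxes, hence 96 genes) can be sketched as follows. This is an illustrative outline only: the `Gene` and `Hit` records, the coordinate convention and the function names are assumptions made for the example, and the nested loop is written for clarity rather than speed.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int       # genome coordinate of the first base of the start codon
    strand: str      # "+" or "-"


class Hit(NamedTuple):
    five_prime: int  # genome coordinate of the site's 5' base on its own strand
    strand: str
    score: float


def offset_from_start(hit, gene):
    """Signed distance from the start codon; negative values are upstream."""
    if gene.strand == "+":
        return hit.five_prime - gene.start
    return gene.start - hit.five_prime


def assign_hits(hits, genes, upstream=500, downstream=100):
    """Map gene name -> hits on the gene's own strand within -upstream..+downstream."""
    by_gene = defaultdict(list)
    for hit in hits:
        for gene in genes:
            if hit.strand != gene.strand:
                continue
            if -upstream <= offset_from_start(hit, gene) <= downstream:
                by_gene[gene.name].append(hit)
    return dict(by_gene)


def summarize(by_gene):
    """Return (number of sites, number of distinct genes)."""
    return sum(len(v) for v in by_gene.values()), len(by_gene)
```

Counting sites and genes separately makes the bookkeeping explicit: a promoter with two boxes contributes two sites but only one gene.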
The threshold score (0.35) used to identify putative PhoB binding sites was derived from the lowest score (that of orfA-
pit) among the example sequences. With this threshold, 18 of the top 20 scores were upstream of genes found to be induced by
phosphate starvation, in a PhoB-dependent manner, by gusA fusion analysis in S.meliloti (see next section). However, most
putative PhoB binding sites with scores above the cut-off level did not show phosphate-dependent regulation of transcription.
Possible explanations, in addition to false positives, include the fact that the matrix method does not account for other
important features of a PhoB binding site, such as appropriately positioned –10 and –35 promoter elements. It is also
possible that some genes with
PhoB binding sites require interaction with additional regulatory proteins before the gene can be regulated by phosphate
limitation.
Blanco et al. (23) showed that the C-terminal domain of PhoB interacts with a 22 bp region of dsDNA that consists of two
direct repeats of 11 bp. Each 11 bp repeat has a conserved 7 bp region (consensus, CTGTCAT) followed by a less conserved 4 bp
segment. Our weight matrix comprises two conserved 7 bp repeats separated by a single, less conserved 4 bp spacer, and
omits the terminal 4 bp segment. However, this terminal segment is not well conserved (23) and will therefore contribute
little to the weight matrix score. Furthermore, our weight matrix will reliably identify overlapping PhoB sites provided that
they are separated by 4 bp ‘spacers’ and individually have component scores greater than 0.35.
Experimental validation of the predicted Pho regulon members by analysis of transcriptional gene fusions
To directly examine whether the S.meliloti genes identified by the frequency matrix were subject to phosphate-dependent
regulation, we generated transcriptional reporter gene fusions to seventy-two of these candidate genes and examined their
expression in defined MOPS-buffered minimal medium during growth under Pi-excess (2 mM Pi) and Pi-starvation (no Pi added)
conditions (see Materials and Methods). Gene expression in a wild-type phoB+ background was compared with expression in an
otherwise isogenic phoB– background (Table 2). Eighteen of the 72 promoter gene fusions were induced upon Pi-starvation in a
PhoB-dependent fashion (Table 2). In addition, regardless of the media Pi concentration, gene fusions to smb20427 (putative
amino acid ABC transport system), smc02886 and smc02675 (rrna) showed 3-, 2- and 10-fold more expression respectively in the
wild-type background relative to the phoB– background.
Three reporter gene fusions were found to be repressed upon Pi-starvation in a PhoB-dependent manner. These were smc00801
(transmembrane protein of unknown function), smc02601 (nadABC) and smc02862 (orfA-pit). In the wild-type background these
fusions were expressed at higher levels in media containing 2 mM Pi than in Pi-starved cells. Also, in the phoB mutant
background the expression level was elevated and did not alter with media Pi. We have previously reported that expression of
the low-affinity Pi-transport system encoded by the orfA-pit genes is repressed by PhoB (28). The repression of
smc02601-smc02602-smc02603 (nadABC) expression suggests down-regulation of NAD+ synthesis in S.meliloti, which possibly
corresponds to the slight down-regulation of smc00161 expression that appears to occur upon Pi-limited growth (Table 2).
smc00161 is annotated as encoding an NH3-dependent NAD+ synthetase, and the promoter region of this gene was predicted to
carry a Pho box (Supplementary Table S1).
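The logic used above to call a fusion induced or repressed in a PhoB-dependent manner compares four measurements: wild type and phoB mutant, each under Pi starvation and Pi excess. A rough sketch of that comparison is given below. The 2-fold cut-off and the function name are assumptions chosen for illustration; the text does not state a numerical threshold, and the activities in the usage lines are made-up numbers, not data from Table 2.

```python
def classify_fusion(wt_starved, wt_excess, mut_starved, mut_excess, fold=2.0):
    """Label a reporter fusion from four activities (all assumed > 0).

    `fold` is a hypothetical cut-off for calling a change in expression.
    """
    wt_ratio = wt_starved / wt_excess
    mut_ratio = mut_starved / mut_excess
    mutant_responds = mut_ratio >= fold or mut_ratio <= 1 / fold
    if wt_ratio >= fold and not mutant_responds:
        return "induced, PhoB-dependent"
    if wt_ratio <= 1 / fold and not mutant_responds:
        return "repressed, PhoB-dependent"
    if wt_ratio >= fold or wt_ratio <= 1 / fold:
        return "Pi-regulated, not strictly PhoB-dependent"
    return "not Pi-regulated"


# Hypothetical activities, in arbitrary units.
print(classify_fusion(1000, 50, 60, 55))
print(classify_fusion(100, 400, 420, 410))
```

The second call mimics the repressed pattern described in the text: higher expression with 2 mM Pi in the wild type, and elevated, Pi-insensitive expression in the phoB mutant.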
Comparison of Pho box predictions with DNA microarray data
Employing DNA microarrays, Krol and Becker (31) identified 98 genes (some of which were in operons) that were more than
3-fold induced in a phoB-dependent manner upon Pi limitation. An additional 50 genes showed a strong increase in expression
under phosphate limitation in a partially phoB-dependent or phoB-independent manner. Krol and Becker (31) also identified
potential Pho-box sequences with 2 mismatches from the Pho-box consensus sequence TG(A/T)CA (C/A)-NNNN-C(C/T)(G/T)TCA(C/T)
defined by Summers et al. (16). Of the 19 Pho-box promoters identified by Krol and Becker (31), 14 were also identified with
our weight matrix (Table 3) and data from our gene fusion experiments (Table 2) revealed that 13 of these 14 genes were
regulated by media Pi in a PhoB dependent manner (Table 2). A reporter fusion to the remaining gene, sma0612 has yet to be
examined. Of the five Pho-boxes not identified by our weight matrix, four were unusual (sma1809, sma1822, smc00170 (sinR) and
smc00429) as they contained 3 or 5 nt in the region flanked by the 7 nt direct repeats instead of the 4 (see Table 1). The
Pho-box upstream of the remaining gene, sma0045, lies on the opposite strand to sma0045 and thus would not be included in our
predictions. However, reporter gene fusions to these genes should be analyzed as the microarray experiments suggested these
genes were induced in a phoB-dependent manner (31).
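The difference between a fixed-width weight matrix and a consensus search that tolerates mismatches and variable spacers can be made concrete with a small sketch. The two half-site patterns below are transcribed from the consensus exactly as it is printed above (the left half has six positions as printed); the function names, the toy sequence and the choice of spacer lengths are assumptions for illustration and do not reproduce the procedure of Krol and Becker.

```python
# Each position lists the bases allowed by the printed consensus.
LEFT = ["T", "G", "AT", "C", "A", "CA"]          # TG(A/T)CA(C/A)
RIGHT = ["C", "CT", "GT", "T", "C", "A", "CT"]   # C(C/T)(G/T)TCA(C/T)


def count_mismatches(seq, pattern):
    """Number of positions where the base is not in the allowed set."""
    return sum(base not in allowed for base, allowed in zip(seq, pattern))


def find_sites(seq, spacers=(4,), max_mismatches=2):
    """Return (start, spacer_length, mismatches) for each acceptable site."""
    found = []
    for spacer in spacers:
        width = len(LEFT) + spacer + len(RIGHT)
        for start in range(len(seq) - width + 1):
            left = seq[start:start + len(LEFT)]
            right_start = start + len(LEFT) + spacer
            right = seq[right_start:right_start + len(RIGHT)]
            n = count_mismatches(left, LEFT) + count_mismatches(right, RIGHT)
            if n <= max_mismatches:
                found.append((start, spacer, n))
    return found


# Toy site with a 5 nt spacer between the two half-sites.
toy = "TGTCAC" + "AAAAA" + "CTGTCAT"
print(find_sites(toy, spacers=(4,)))
print(find_sites(toy, spacers=(3, 4, 5)))
```

A search restricted to 4 nt spacers misses the toy site, whereas allowing 3 to 5 nt recovers it, which is the same reason the 18 nt matrix cannot report boxes with unusual spacer lengths.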
In addition to the 13 Pho regulon members predicted both here and by Krol and Becker (31), both the weight matrix data
and data from reporter fusion assays identified an additional 10 genes whose expression was PhoB-regulated in response to Pi
limitation (Tables 2 and 3). With the exceptions of smb20843 (algI), smc00618 (ppk), smc02601 (nadA) and smc00801
(hypothetical, global homology), these genes also showed PhoB-dependent transcription in microarray studies. The failure to
detect repression of smc02601 (nadA) and smc00801 expression in microarray experiments is not surprising, as the microarray
experiments also failed to detect orfA-pit repression and this operon is known to be repressed by PhoB (25). The failure to
detect induction of smb20843 (algI) and smc00618 (ppk) upon Pi-starvation is more surprising, as these genes appear to be
highly regulated in the gene fusion experiments. Moreover, expression of ppk is known to be Pi-starvation induced in many
organisms. The differences between the microarray and gene fusion data could result from several factors, including
differences in experimental growth conditions: in the microarray experiments, cells were grown with 100 μM Pi as the
Pi-limitation condition. Alternatively, it is possible that the particular probes employed for ppk and smb20843 yielded low
signals.
Through our weight matrix scan, Pho-box sequences were also found upstream of three more genes, sma2410 (rhbF), smc01296
(rpsN) and smc01820 (putative N-carbamyl-L-amino acid aminohydrolase) (Supplementary Table S1). Promoter fusions to these
three genes have not been tested yet. These genes, however, were shown to be repressed in a PhoB-independent manner in
microarray studies (31). Further studies are required to analyze the regulation of these genes and the nature of their
associated Pho-box sequences.
We note that in the case of orfA-pit, the Pho-box identified by Krol and Becker (31) lay on the opposite strand to the
orfA-pit genes and is different from that identified by Bardin et al. (28). Since orfA-pit expression is negatively regulated
by PhoB, it is of interest to determine the actual PhoB binding site as little is known regarding how PhoB represses
transcription. In summary, of 96 genes with upstream Pho-boxes predicted by the frequency matrix genome analysis, 34 appear
to be Pi and PhoB regulated as revealed from gene fusion and microarray analysis data (Table 3).
Analysis of predicted Pho regulon members across proteobacterial genomes
It is reasonable to assume that at least part of the physiological response to Pi-limitation will be conserved. As the
Pho-box sequence recognized by the PhoB proteins of different organisms appears to be conserved (14,16,24,28,42–44), we used
the Pho-box frequency matrix described above (Table 1) to search the genomes of twelve Gram-negative bacteria (Table 4) for
PhoB-binding sites using the same criteria as employed for S.meliloti. Genes that lay downstream of a predicted Pho-box with
scores greater than 0.35 were further examined. We identified genes, such as pstSCAB, phoA, ugpA, phn and ppk that are known
to be associated with phosphate metabolism (Tables 5–7). The pstS gene encodes the Pi-binding protein of the high affinity
PstSCAB transport system (18,27) and expression of this system in E.coli, S.meliloti and Pseudomonas aeruginosa is known to
be highly induced under Pi-limiting conditions and is PhoB dependent (27). In a number of organisms, such as Caulobacter
crescentus, the pstS gene transcript is separate from the pstCAB-phoUB transcript and in these cases predicted Pho-boxes are
also located upstream of the pstC gene (see Table 5).
It was striking that multiple 18 bp Pho-box sequences were predicted upstream of the pst genes in all of the genomes
examined (Table 5). Multiple Pho-boxes consisted of overlapping 7 bp direct repeats separated by 4 bp spacers. The frequency
matrix detected consecutive 18 bp elements and adding a terminal 4 bp spacer formed consecutive 22 bp PhoB binding sites as
defined by Blanco et al. (23). The two 11 bp direct repeat sequences bind the PhoB monomers head to tail (23). The pstS
promoters from E.coli K12 and O157:H7 are predicted to contain five and six of these 11 bp direct repeats, respectively. The
large number of Pho-boxes in all of the pstS promoter regions presumably reflects the importance of the PstSCAB high affinity
transport system in the uptake of Pi under Pi-limiting conditions. Other genes associated with phosphate metabolism for which
multiple Pho-box sequences were detected included alkaline phosphatase-like proteins (phoA), genes involved in phosphonate
uptake and metabolism (phn), in glycerol-3-phosphate uptake (ugp and glp), the regulatory genes phoB and phoR (Table 5) and
genes encoding polyphosphate kinase (Table 6).
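Because each 18 bp box is built as 7 + 4 + 7 bp, two overlapping boxes that share a 7 bp repeat start 11 bp apart, and a run of n such boxes contains n + 1 direct repeats. The sketch below groups hit coordinates into such runs; it assumes the hits are on the same strand and in the same promoter, and the function names and coordinates are hypothetical.

```python
def chain_sites(starts, step=11):
    """Group start coordinates into chains of sites spaced `step` bp apart."""
    chains = []
    for start in sorted(set(starts)):
        for chain in chains:
            if start - chain[-1] == step:
                chain.append(start)
                break
        else:
            chains.append([start])
    return chains


def repeat_count(chain):
    """n chained 18 bp boxes share repeats pairwise, giving n + 1 repeats."""
    return len(chain) + 1


# Hypothetical start coordinates of predicted 18 bp boxes in one promoter.
for chain in chain_sites([100, 111, 122, 133, 400]):
    print(chain, repeat_count(chain))
```

Under this arithmetic, a promoter described as carrying five 11 bp direct repeats corresponds to a chain of four overlapping 18 bp boxes.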
In addition to the previously reported Pho-box in the orfA-pit promoter region of S.meliloti, Pho-box sequences were also
detected in the promoter region of the orfA-pit orthologues in the α-proteobacteria Bradyrhizobium japonicum and
Mesorhizobium loti and the γ-proteobacteria Pseudomonas putida and Acinetobacter sp. (Table 5). Bardin et al. (28) showed that
the expression of orfA-pit in S.meliloti is repressed upon Pi-starvation, unlike in E.coli where the pit genes appear to be
constitutively expressed (45) and for which no Pho-boxes were detected. The identification of putative Pho-boxes upstream of
the orfA-pit genes in other bacteria suggests that these may also be repressed by Pi-starvation and that such repression may
be a widespread phenomenon.
A number of predicted Pho regulon members not normally associated with Pi metabolism were identified in several genomes
(Table 7). One of the genes in this category was katA, encoding catalase, which was recently shown to be PhoB dependent in
S.meliloti and P.aeruginosa in Pi-starvation conditions (46). The detection of Pho-box elements upstream of the katA genes of
C.crescentus and P.putida suggests that katA expression in these organisms is also PhoB regulated. Pho-boxes upstream of
several S.meliloti ABC-class transport systems were also detected upstream of homologous clusters in other bacteria. These
were smc01605, smc04317 (afuA) and smc03124 (Table 7). Both the afuABC and smc01605 gene clusters in S.meliloti are annotated
as putatively involved in Fe3+ transport; however, definitive evidence is lacking. Chao et al. (47) did not find either of
these ABC transport systems to be up-regulated in S.meliloti grown under iron-limiting conditions. Therefore, it appears
unlikely that they are actually involved in iron transport. A third ABC system in S.meliloti, smc03124, with conserved Pho-
box sequences in other proteobacteria (Table 7), is annotated as a putative peptide binding protein. The actual substrate(s)
transported by this system is unknown.
We identified a putative Pho-box upstream of the smc00772 (potH) gene cluster as well as upstream of orthologues in M.loti
and Brucella suis (Table 7). Although fusion data for smc00772 are unavailable, the potFGHI ABC-class putative putrescine
transporter cluster was identified as up-regulated by Pi-limitation in the microarray analysis (31), although no Pho-box was
identified in that study. The putative Pho-box upstream of smc00772 (potH) lies within the coding region of potG (smc00771),
instead of upstream of the regulator (potF). The fact that Pho-box-like sequences were identified upstream of genes similar
to S.meliloti potH in M.loti and B.suis suggests that putrescine transport may be PhoB-regulated across a range of organisms
and should be further investigated.
In response to Pi-starvation, S.meliloti replaces phospholipids with non-Pi-containing lipids: sulphoquinovosyl
diacylglycerols (SL), ornithine-containing lipids (OL) and diacylglyceryl-N,N,N-trimethylhomoserines (DGTS) (48,49). In
Rhodobacter sphaeroides it was demonstrated that the smc01848 homolog btaA is directly involved in DGTS biosynthesis (50) and
recently Lopez-Lara et al. (51) established that smc01848 and smc01849 (btaAB) are required for DGTS synthesis. A Pho-box is
predicted 64 nt from the smc01848 start codon and orthologs of smc01848 in M.loti (mlr1574) and Agrobacterium tumefaciens
(atu2119) also have predicted Pho boxes in the corresponding promoter regions (Table 5). These data strongly suggest that
DGTS synthesis induced upon Pi limitation is mediated directly via the PhoR-PhoB system.
Pi starvation and polyphosphate metabolism
Inorganic polyphosphates (polyPi) are linear polymers of orthophosphate residues linked by high-energy phosphoanhydride
bonds. These polymers can vary in size from 3 to over 1000 phosphate residues. PolyPi is ubiquitous and the enzyme primarily
responsible for polyPi synthesis in E.coli is polyP kinase (PPK), which uses the gamma phosphate of ATP to make the polymer.
PolyPi can also be hydrolyzed to Pi either by exopolyphosphatases (PPX) or by endopolyphosphatases (PPN). The identification
and assignment of Pho-boxes was sometimes complicated by differences in genome annotation, as in the case of genes encoding
polyphosphate kinase (ppk) (Table 6). Here Pho boxes were predicted in the ppk promoter regions of 10 of the 12 genomes
examined. However, the predicted Pho-boxes from both M.loti (52) and C.crescentus (53) were located within the annotated gene
coding regions. Alignment of the Ppk amino acid sequences suggests that the actual start codons of the ppk genes in M.loti and
C.crescentus are downstream of the annotated start codons (data not shown). Our reporter gene fusion data showed that the
S.meliloti ppk gene is a strongly induced member of the Pho regulon. Most strikingly, however, the weight matrix did not
detect a ppk Pho box in either the E.coli K12 genome or the E.coli O157 genome, even at a very low cut-off (0.18). In E.coli
there is genetic evidence that polyphosphates accumulate upon Pi starvation in a PhoB-dependent manner, although the
E.coli ppk promoter has never been mapped. Therefore, it is likely that E.coli PhoB regulates ppk indirectly, as suggested
elsewhere (54).
CONCLUSION
Several complementary approaches were integrated to investigate the cellular response to Pi starvation. As a first step,
computational identification of PhoB binding motifs predicted 96 potential Pho regulon members from the entire S.meliloti
genome. These were subsequently investigated by genetic screening of transcriptional reporter gene fusions and through
comparisons with recently available microarray data (31). It was found that 34 out of the 96 in silico predicted Pho regulon
members were regulated by Pi concentration in a PhoB dependent manner (Table 3). These 34 Pho regulon members were analyzed
in silico for conservation or co-occurrence across 12 genomes scanned (Tables 5 and 7). Nineteen of these 34 candidates were
also predicted as having upstream Pho-boxes in at least one of the other genomes scanned in this study. The in silico
analysis provided evidence for the conservation of a core Pho regulon in bacteria and suggests that these organisms share a
common response to Pi limitation. Such conservation is not surprising as, for example, in both plants and yeast one of the
major responses to Pi-limitation is the induction of a high affinity Pi transport system and the induction of scavenging
enzymes, such as alkaline phosphatases.
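The cross-genome tally described in this section (how many reference candidates also have a predicted upstream Pho-box in at least one other genome) reduces to set intersections once genes have been grouped into ortholog families. The sketch below assumes that grouping is already available; the family identifiers, genome labels and function names are placeholders, not data from Tables 5 and 7.

```python
from collections import Counter


def conservation(predictions, reference):
    """Count other genomes sharing each reference family's predicted Pho box.

    predictions: dict of genome name -> set of ortholog-family identifiers
                 lying downstream of a predicted Pho box.
    reference:   genome whose families form the starting list.
    """
    counts = Counter({family: 0 for family in predictions[reference]})
    for genome, families in predictions.items():
        if genome == reference:
            continue
        for family in families & predictions[reference]:
            counts[family] += 1
    return counts


def conserved_families(counts, min_genomes=1):
    """Families predicted in at least `min_genomes` other genomes."""
    return sorted(f for f, n in counts.items() if n >= min_genomes)


# Placeholder input: three genomes and four hypothetical families.
predictions = {
    "reference": {"famA", "famB", "famC"},
    "genome_1": {"famA", "famD"},
    "genome_2": {"famA", "famB"},
}
counts = conservation(predictions, "reference")
print(conserved_families(counts))
```

Raising `min_genomes` separates a broadly conserved core from families whose predicted Pho-box occurs in only a few genomes.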
Extending the Pho-box analysis to many more genomes should define the core group of genes that respond to Pi-starvation.
Further it will allow the identification of subgroups of genes, such as katA, whose expression is regulated by PhoB in some
organisms but not in others. Analysis of the distribution of such data may lead to the recognition of associations between
particular regulatory patterns and other phenotypic properties of the organisms.
SUPPLEMENTARY DATA
ACKNOWLEDGEMENTS
This work was supported with funding from the Natural Sciences and Engineering Research Council of Canada, from Genome
Canada through the Ontario Genomics Institute and from the Ontario Research and Development Challenge Fund to T.M.F. The
authors thank Dr Brian Golding for help with computing, and Dr Golding, Weilong Hao and Ying Fong for help and advice with the
in silico comparison analysis. Funding to pay the Open Access publication charges for this article was provided by NSERC and
Genome Canada.
Conflict of interest statement. None declared.
REFERENCES
McGuire, A.M. and Church, G.M. (2000) Predicting regulons and their cis-regulatory motifs by comparative genomics Nucleic
Acids Res, . 28, 4523–4530 .
Tompa, M., Li, N., Bailey, T.L., Church, G.M., De Moor, B., Eskin, E., Favorov, A.V., Frith, M.C., Fu, Y., Kent, W.J., et
al. (2005) Assessing computational tools for the discovery of transcription factor binding sites Nat. Biotechnol, . 23, 137–
144 .
Thieffry, D., Salgado, H., Huerta, A.M., Collado-Vides, J. (1998) Prediction of transcriptional regulatory sites in the
complete genome sequence of Escherichia coli K12 Bioinformatics, 14, 391–400 .
Hertz, G.Z. and Stormo, G.D. (1999) Identifying DNA and protein patterns with statistically significant alignments of
multiple sequences Bioinformatics, 15, 563–577 .
Stormo, G.D. (2000) DNA binding sites: representation and discovery Bioinformatics, 16, 16–23[Abstract/Free Full Text] .
Fernandez De, Henestrosa,A.R., Ogi, T., Aoyagi, S., Chafin, D., Hayes, J.J., Ohmori, H., Woodgate, R. (2000)
Identification of additional genes belonging to the LexA regulon in Escherichia coli Mol. Microbiol, . 35, 1560–1572
Panina, E.M., Mironov, A.A., Gelfand, M.S. (2001) Comparative analysis of FUR regulons in gamma-proteobacteria Nucleic
Acids Res, . 29, 5195–5206 .
Tan, K., Moreno-Hagelsieb, G., Collado-Vides, J., Stormo, G.D. (2001) A comparative genomics approach to prediction of
new members of regulons Genome Res, . 11, 566–584 .
Baichoo, N., Wang, T., Ye, R., Helmann, J.D. (2002) Global analysis of the Bacillus subtilis Fur regulon and the iron
starvation stimulon. Mol. Microbiol., 45, 1613–1629.
Dombrecht, B., Marchal, K., Vanderleyden, J., Michiels, J. (2002) Prediction and overview of the RpoN-regulon in closely
related species of the Rhizobiales. Genome Biol., 3, research0076.1–research0076.11.
Gaballa, A., Wang, T., Ye, R.W., Helmann, J.D. (2002) Functional analysis of the Bacillus subtilis Zur regulon. J.
Bacteriol., 184, 6508–6514.
Zheng, D., Constantinidou, C., Hobman, J.L., Minchin, S.D. (2004) Identification of the CRP regulon using in vitro and in
vivo transcriptional profiling. Nucleic Acids Res., 32, 5874–5893.
Lee, T.Y., Makino, K., Shinagawa, H., Amemura, M., Nakata, A. (1989) Phosphate regulon in members of the family
Enterobacteriaceae: comparison of the phoB-phoR operons of Escherichia coli, Shigella dysenteriae, and Klebsiella pneumoniae.
J. Bacteriol., 171, 6593–6599.
Wanner, B.L. (1993) Gene regulation by phosphate in enteric bacteria. J. Cell. Biochem., 51, 47–54.
Scholten, M., Janssen, R., Bogaarts, C., van Strien, J., Tommassen, J. (1995) The Pho regulon of Shigella flexneri. Mol.
Microbiol., 15, 247–254.
Summers, M.L., Denton, M.C., McDermott, T.R. (1999) Genes coding for phosphotransacetylase and acetate kinase in
Sinorhizobium meliloti are in an operon that is inducible by phosphate stress and controlled by PhoB. J. Bacteriol., 181,
2217–2224.
Makino, K., Amemura, M., Kawamoto, T., Kimura, S., Shinagawa, H., Nakata, A., Suzuki, M. (1996) DNA binding of PhoB and
its interaction with RNA polymerase. J. Mol. Biol., 259, 15–26.
Kimura, S., Makino, K., Shinagawa, H., Amemura, M., Nakata, A. (1989) Regulation of the phosphate regulon of Escherichia
coli: characterization of the promoter of the pstS gene. Mol. Gen. Genet., 215, 374–380.
Wu, H., Kato, J., Kuroda, A., Ikeda, T., Takiguchi, N., Ohtake, H. (2000) Identification and characterization of two
chemotactic transducers for inorganic phosphate in Pseudomonas aeruginosa. J. Bacteriol., 182, 3400–3404.
Rosenberg, H., Gerdes, R.G., Chegwidden, K. (1977) Two systems for the uptake of phosphate in Escherichia coli. J.
Bacteriol., 131, 505–511.
Yuan, Z.C., Zaheer, R., Finan, T.M. (2005) Phosphate limitation induces catalase expression in Sinorhizobium meliloti,
Pseudomonas aeruginosa and Agrobacterium tumefaciens. Mol. Microbiol., 58, 877–894.
Chao, T.C., Buhrmester, J., Hansmeier, N., Puhler, A., Weidner, S. (2005) Role of the regulatory gene rirA in the
transcriptional response of Sinorhizobium meliloti to iron limitation. Appl. Environ. Microbiol., 71, 5969–5982.
Geiger, O., Rohrs, V., Weissenmayer, B., Finan, T.M., Thomas-Oates, J.E. (1999) The regulator gene phoB mediates
phosphate stress-controlled synthesis of the membrane lipid diacylglyceryl-N,N,N-trimethylhomoserine in Rhizobium
(Sinorhizobium) meliloti. Mol. Microbiol., 32, 63–73.
Lopez-Lara, I.M., Sohlenkamp, C., Geiger, O. (2003) Membrane lipids in plant-associated bacteria: their biosyntheses and
possible functions. Mol. Plant Microbe Interact., 16, 567–579.
Klug, R.M. and Benning, C. (2001) Two enzymes of diacylglyceryl-O-4'-(N,N,N,-trimethyl) homoserine biosynthesis are
encoded by btaA and btaB in the purple bacterium Rhodobacter sphaeroides. Proc. Natl Acad. Sci. USA, 98, 5910–5915.
Lopez-Lara, I.M., Gao, J.L., Soto, M.J., Solares-Perez, A., Weissenmayer, B., Sohlenkamp, C., Verroios, G.P.,
Thomas-Oates, J., Geiger, O. (2005) Phosphorus-free membrane lipids of Sinorhizobium meliloti are not required for the
symbiosis with alfalfa but contribute to increased cell yields under phosphorus-limiting conditions of growth. Mol. Plant
Microbe Interact., 18, 973–982.
Kaneko, T., Nakamura, Y., Sato, S., Asamizu, E., Kato, T., Sasamoto, S., Watanabe, A., Idesawa, K., Ishikawa, A.,
Kawashima, K., et al. (2000) Complete genome structure of the nitrogen-fixing symbiotic bacterium Mesorhizobium loti. DNA
Res., 7, 331–338.
Nierman, W.C., Feldblyum, T.V., Laub, M.T., Paulsen, I.T., Nelson, K.E., Eisen, J.A., Heidelberg, J.F., Alley, M.R.,
Ohta, N., Maddock, J.R., et al. (2001) Complete genome sequence of Caulobacter crescentus. Proc. Natl Acad. Sci. USA, 98,
4136–4141.
Kornberg, A., Rao, N.N., Ault-Riche, D. (1999) Inorganic polyphosphate: a molecule of many functions. Annu. Rev. Biochem.,
68, 89–125.
(Ze-Chun Yuan, Rahat Zahee)