Genometric analyses of the organization of circular chromosomes: a universal pressure determines the direction of ribosomal RNA genes transcription relative to chromosome replication

Lionel Guy and Claude-Alain H. Roten

Département de Microbiologie Fondamentale, Faculté de Biologie et de Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland

Received 24 March 2004; revised 8 June 2004; accepted 29 June 2004. Received by A. Bernardi. Available online 20 August 2004.




Abstract
Selective pressures related to gene function and chromosomal architecture are acting on genome sequences and can be revealed, for instance, by appropriate genometric methods. Cumulative nucleotide skew analyses, i.e., GC, TA, and ORF orientation skews, predict the location of the origin of DNA replication for 88 out of 100 completely sequenced bacterial chromosomes. These methods appear fully reliable for proteobacteria, Gram-positives, and spirochetes as well as for euryarchaeotes.

Based on this genome architecture information, coorientation analyses reveal that in prokaryotes, ribosomal RNA (rRNA) genes encoding the small and large ribosomal subunits are all transcribed in the same direction as DNA replication; that is, they are located along the leading strand. This result offers a simple and reliable method for circumscribing the region containing the origin of DNA replication and reveals a strong selective pressure acting on the orientation of rRNA genes, similar to the weaker one acting on the orientation of ORFs. The rate of coorientation of transfer RNA (tRNA) genes with DNA replication appears to be taxon-specific.

Analyzing nucleotide biases such as GC and TA skews of genes and plotting one against the other reveals a taxonomic clustering of species. All ribosomal RNA genes are enriched in Gs and depleted in Cs, the only exception known so far being the rRNA genes of deuterostomian mitochondria. However, this exception can be explained by the fact that in the chromosome of the human mitochondrion, the model of the deuterostomian organelle genome, DNA replication and rRNA transcription proceed in opposite directions. A general rule is deduced from prokaryotic and mitochondrial genomes: ribosomal RNA genes that are transcribed in the same direction as DNA replication are enriched in Gs, and those transcribed in the opposite direction are depleted in Gs.

Keywords: Origin and terminus of replication; Gene orientation; Genometrics; Skews; RNA genes

Abbreviations: bp, base pair(s); kb, kilobase; mtDNA, mitochondrial DNA; A, adenine; C, cytosine; G, guanine; T, thymine; rRNA, ribosomal RNA; tRNA, transfer RNA


1. Introduction
The Watson–Crick model for duplex DNA (Watson and Crick, 1953) explained the base-pairing rule of Chargaff (1950). An unexpected observation, first reported for the Bacillus subtilis chromosome (Karkas et al., 1968), revealed that the equalities adenine (A) = thymine (T) and guanine (G) = cytosine (C) reported by Chargaff were approximately true at the level of each of the strands of a double-stranded chromosome. This rule, termed parity rule 2 (PR2; Sueoka, 1995) or Chargaff's second parity rule (Forsdyke and Mortimer, 2000), is valid for all chromosomes of eukaryotes, prokaryotes, and organelles sequenced so far, with the exception of a subset of mitochondrial genomes (see www.unil.ch/comparativegenometrics/).

Nevertheless, local nucleotide frequencies deviate from PR2 (Lobry, 1996a) as routinely calculated by algorithmic analyses of numerous genome sequences (Roten et al., 2002). Indeed, in most prokaryotic genomes, the leading strand, synthesized continuously during chromosome replication, is enriched in Gs relative to Cs, while the lagging one, polymerized discontinuously, is C-rich and G-poor (Tillier and Collins, 2000). In proteobacteria and high-G+C Gram-positives, the leading strand is enriched in Ts and depleted in As, while in low-G+C Gram-positives, the leading strand is A-rich and T-poor (see www.unil.ch/comparativegenometrics/). Because the origin and the terminus of DNA replication generally correspond to boundaries of regions with different nucleotide biases, it was deduced that the latter are related directly or indirectly to chromosome synthesis (Lobry, 1996a).

A majority of prokaryotic ORFs are encoded on the leading strand and are therefore transcribed in the same direction as chromosome replication (McLean et al., 1998), a property termed the coorientation rule. Thus, biases in the nucleotide composition of coding sequences would lead to PR2 biases if one considers only one arm of the chromosome or the other. The nucleotide composition of ORFs is affected by several causes. First, preferential use of some nucleotides at the different codon positions biases the codon usage (Trifonov, 1987). Second, purine loading of mRNAs is also proposed as a possible source of biases; to avoid the formation of unwanted secondary structures, messenger RNAs are generally enriched in purines (Szybalski et al., 1966 and Lao and Forsdyke, 2000). The reciprocal relationship between purine loading and G+C content influences the codon usage (Lao and Forsdyke, 2000). Third, cytosine to thymine deaminations are likely to be another important driving force. Such deaminations, known to occur on single-stranded rather than on double-stranded DNA (Frederico et al., 1990), would preferentially affect the leading strand, because it remains temporarily single-stranded during both gene transcription and chromosome replication. During transcription, the nascent RNA is polymerized on the complementary sequence, leaving the coding sequence temporarily unpaired; because coding sequences are mostly on the leading strand, the latter would be the main target for cytosine deamination (Francino and Ochman, 1997). During DNA replication, the leading strand, whose complement is assembled from Okazaki fragments, would, independently of transcription, be in the deamination-prone configuration (Lobry, 1996a). These models could, at least partly, account for the observed enrichment of the leading strand in Gs and, to a lesser extent, in Ts, which is particularly pronounced at the third codon position (Tillier and Collins, 2000).

Although ever increasing, the number of whole genome sequences is still modest compared to the number of presently identified species of living organisms. Ribosomal RNA genes, which due to their widespread use in taxonomy have been sequenced in many prokaryotes as well as in eukaryotes and their organelles, are good candidates to uncover rules common to genomes of cells and organelles. Previous studies on prokaryotes revealed that rRNA genes, and especially unpaired regions of these genes, had an excess of purines (Gutell et al., 2000). Other authors confirmed the same observation and moreover found that the G+C content of the paired regions was higher than 0.5, especially in thermophiles (Wang and Hickey, 2002). Both studies confirmed early experimental results obtained by Elson and Chargaff (1955) on the RNA pellet obtained after ultracentrifugation. We analyzed nucleotide skews on complete ribosomal and transfer RNA genes of the broadest possible taxonomic range. Such analyses, restricted to given genes as illustrated by Rocha (2002), will be designated as intragenic nucleotide skews.

Almost all of the organisms to be examined replicate their chromosomes bidirectionally. However, for eukaryotes, with the exception of Saccharomyces cerevisiae (Yamashita et al., 1997), origins of replication are not precisely located. Mitochondria offer a complex picture with at least three modes of DNA replication. In plants, mitochondrial DNA (mtDNA) is believed to be replicated in the rolling circle mode (Backert et al., 1996); in yeast, the replication of the organelle genome is most likely bidirectional (Lecrenier and Foury, 2000); and mammalian mitochondrial genomes seem to have a special mode of replication (Lecrenier and Foury, 2000). Finally, for plastids, two face-to-face origins of replication each initiate the replication of one of the DNA strands. Once the chromosome is replicated, both DNA polymerases continue to function according to the rolling circle mechanism (Kolodner and Tewari, 1975).

In this contribution, we show that intragenic nucleotide skews of ribosomal RNA genes reveal a stringent coorientation of DNA replication and rRNA gene transcription, which therefore represents a reliable marker for the localization of the origin of replication in various species. In addition, rRNA genes have a specific nucleotide composition, suggesting that the structural constraints inherent to rRNA and tRNA functions generate counter-selective pressures reducing the effect of steady cytosine deaminations. Finally, we discuss the possible application of genometric methods to phylogeny.

2. Materials and methods
2.1. Sequences
For coorientation analyses, ribosomal RNA sequences of fully sequenced organisms were obtained from NCBI (http://www.ncbi.nlm.nih.gov/; Wheeler et al., 2001). When necessary, they were completed with TIGR sequences (http://www.tigr.org/; Peterson et al., 2001). To avoid possible biases, tRNA sequence analyses were performed only on fully sequenced genomes (61), with 28 or more identified tRNA genes. These sequences come mainly from the database of the tRNAscan-SE Search Server (http://www.genetics.wustl.edu/eddy/tRNAscan-SE/; Lowe and Eddy, 1997). When necessary, they were completed with NCBI (http://www.ncbi.nlm.nih.gov/; Wheeler et al., 2001) or TIGR sequences (http://www.tigr.org/; Peterson et al., 2001).

For extensive nucleotide composition analyses, rRNA sequences of small (12,420 sequences) and large subunits (1322 sequences) were obtained from rRNA WWW Server (http://rrna.uia.ac.be/; De Rijk et al., 2000). 5S rRNA sequences (857 sequences) were downloaded from the relevant database (http://biobases.ibch.poznan.pl/5SData/; Szymanski et al., 2002). For tRNA sequences of bacteria and archaea, the same set as in the coorientation analysis was used. For mitochondria, only a limited subset composed of five representative Deuterostomia and nine other mitochondria from representative organisms was selected from NCBI (http://www.ncbi.nlm.nih.gov/; Wheeler et al., 2001). For tRNAs of plastids, another representative subset of plastid genomes was analyzed with tRNAscan-SE 1.21 (Lowe and Eddy, 1997). The list of both representative subsets is available at the address http://www.unil.ch/comparativegenometrics/guy_and_roten_2004/ReprMitoChloro.txt.

2.2. Coorientation and cumulative nucleotide skew analyses
To determine the coorientation index between structural RNA genes and chromosome replication, origins of replication of 100 prokaryotic chromosomes were identified by cumulative nucleotide skew analysis. At a given position p, the cumulative nucleotide skew Cskαβ(p), measuring the deviation between nucleotides α and β, is calculated from α and β nucleotide numbers found between positions 1 and p, i.e. nbα(p) and nbβ(p), respectively:

Csk_{\alpha\beta}(p) = nb_{\alpha}(p) - nb_{\beta}(p)    (1)


As illustrated by figures posted on the Comparative Genometrics website (Roten et al., 2002), each cumulative nucleotide skew is represented by a curve drawn by plotting the nucleotide position p on the chromosome vs. Cskαβ(p).
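Equation (1) can be illustrated with a minimal Python sketch. The function name `cumulative_skew` and the toy sequence are illustrative assumptions and not part of the original analysis pipeline; element p−1 of the returned list holds the value of the cumulative skew at position p.

```python
def cumulative_skew(seq, alpha, beta):
    """Return [Csk(1), ..., Csk(n)] for nucleotides alpha and beta (Eq. 1)."""
    alpha, beta = alpha.upper(), beta.upper()
    skew = 0
    curve = []
    for base in seq.upper():
        if base == alpha:
            skew += 1      # one more alpha between positions 1 and p
        elif base == beta:
            skew -= 1      # one more beta between positions 1 and p
        curve.append(skew)
    return curve


# Toy sequence (hypothetical, for illustration only).
gc_curve = cumulative_skew("GGCAT", "G", "C")   # cumulative GC skew
ta_curve = cumulative_skew("GGCAT", "T", "A")   # cumulative TA skew
```

Plotting the list index plus one against the list values would give a curve of the kind described above.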

For cumulative GC or TA skew analyses, α and β are replaced by G and C or T and A, respectively. A cumulative ORF orientation skew Cskdi(p) is calculated as in (1) by replacing nbα(p) or nbβ(p) by the number of nucleotides contained in ORFs encoded on the genome sequence, nbd(p), or by that of the complementary genome sequence, nbi(p).
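The cumulative ORF orientation skew can be sketched in the same way. The representation of ORFs as hypothetical `(start, end, strand)` tuples with 1-based inclusive coordinates is an assumption made for this example, and ORFs spanning the first coordinate of a circular sequence are not handled.

```python
def cumulative_orf_orientation_skew(orfs, length):
    """Return [Cskdi(1), ..., Cskdi(length)] = nbd(p) - nbi(p).

    orfs: iterable of (start, end, strand), 1-based inclusive, with
    strand '+' for the given sequence and '-' for its complement.
    """
    step = [0] * (length + 2)
    for start, end, strand in orfs:
        sign = 1 if strand == "+" else -1
        step[start] += sign        # ORF coverage begins at start
        step[end + 1] -= sign      # and stops after end
    curve = []
    per_position = 0               # net ORF coverage at position p
    total = 0                      # running sum, i.e. the cumulative skew
    for p in range(1, length + 1):
        per_position += step[p]
        total += per_position
        curve.append(total)
    return curve


# Toy annotation (hypothetical): one direct and one complementary ORF.
orf_curve = cumulative_orf_orientation_skew([(1, 3, "+"), (6, 9, "-")], 10)
```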

As recently shown (Lobry, 1996a, Grigoriev, 1998 and Tillier and Collins, 2000), the extrema of a cumulative GC skew curve frequently correspond to the origin and terminus of chromosome replication. In most of the analyzed genomes, the origin corresponds to the minimum of the cumulative GC curve. In some organisms where this type of analysis does not provide a clear result, a cumulative TA skew analysis was performed. In this case, the origin of replication corresponds either to the minimum or to the maximum of the obtained curve. To determine which of these extrema corresponds to the origin of replication, a cumulative ORF orientation skew analysis can be performed. This analysis has been shown to be strongly correlated with the cumulative GC skew analysis when an origin could be defined by it (Tillier and Collins, 2000). It is less precise than a cumulative nucleotide skew curve, but its minimum always corresponds to the origin of replication.
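As a rough sketch of the extremum rule described above, the following Python function returns the positions of the minimum and maximum of the cumulative GC skew curve. Reading the minimum as the putative origin and the maximum as the putative terminus is only the common case; for cumulative TA skews either extremum may correspond to the origin. The function name and the toy sequence are assumptions made for illustration.

```python
def gc_skew_extrema(seq):
    """Return (position of minimum, position of maximum) of the
    cumulative GC skew, using 1-based positions. Position 0 means the
    curve never went below (or above) its starting value of zero."""
    skew = 0
    min_val, min_pos = 0, 0
    max_val, max_pos = 0, 0
    for p, base in enumerate(seq.upper(), start=1):
        if base == "G":
            skew += 1
        elif base == "C":
            skew -= 1
        if skew < min_val:
            min_val, min_pos = skew, p
        if skew > max_val:
            max_val, max_pos = skew, p
    return min_pos, max_pos


# Toy chromosome (hypothetical): C-rich, then G-rich, then C-rich again.
toy = "CCCC" + "GGGGGGGG" + "CCCC"
origin_guess, terminus_guess = gc_skew_extrema(toy)
```

On real genomes the curve is noisy, so the extrema would normally be inspected on the plotted curve rather than accepted blindly.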

Should the terminus location remain unidentified, we will assume that it is in the third of the chromosome facing the origin and restrict the rRNA genes coorientation analysis to the two-thirds of the chromosome flanking the origin.

Due to the scarcity of experimental data on their origins of replication, eukaryotic nuclear and organelle chromosomes were not subjected to coorientation analysis of rRNA genes. However, the latter was performed on deuterostomian mitochondria, where validation based on experimental data was possible.

2.3. Intragenic skews
Considering two nucleotides α and β, an intragenic αβ skew of gene i, Iskαβ(i) is calculated with nbα(i) and nbβ(i), the number of nucleotides α and β in gene i, a definition previously used for a chromosome sliding window (Lobry, 1996b) or for ORFs (Rocha, 2002):

Isk_{\alpha\beta}(i) = [nb_{\alpha}(i) - nb_{\beta}(i)] / [nb_{\alpha}(i) + nb_{\beta}(i)]    (2)


The intragenic GC or TA skews of gene i are defined by replacing in (2) α and β by G and C or T and A, respectively.

Two other intragenic measures were calculated: the G+C content, i.e. the frequencies of Gs and Cs in a gene, and the purine loading corresponding to its G+A content.
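The four intragenic measures can be computed together, as in the following minimal Python sketch. The function name, the dictionary keys, and the treatment of U as T (so that RNA sequences can be passed directly) are assumptions made for this example; ambiguous symbols such as N are simply ignored.

```python
def intragenic_measures(gene):
    """Return GC skew, TA skew (Eq. 2), G+C content and purine loading."""
    seq = gene.upper().replace("U", "T")
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T/U")

    def skew(alpha, beta):
        denominator = counts[alpha] + counts[beta]
        if denominator == 0:
            return 0.0   # skew undefined; reported as 0 in this sketch
        return (counts[alpha] - counts[beta]) / denominator

    return {
        "gc_skew": skew("G", "C"),
        "ta_skew": skew("T", "A"),
        "gc_content": (counts["G"] + counts["C"]) / total,
        "purine_loading": (counts["G"] + counts["A"]) / total,
    }


# Toy gene (hypothetical): G=3, C=1, A=2, T=2.
measures = intragenic_measures("GGGCAATT")
```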

Intragenic measures for small subunit (ssu) rRNAs are available at the address http://www.unil.ch/comparativegenometrics/guy_and_roten_2004/ssu_rRNAs.zip.

3. Results
3.1. Origin and terminus of DNA replication
Cumulative nucleotide skew analyses of 100 prokaryotic chromosomes, representing 80 species (16 archaea and 64 bacteria), reveal unambiguously the position of the origin and terminus sites in 88 and 77 chromosomes, respectively (supplementary material on the Comparative Genometrics Website at the address http://www.unil.ch/comparativegenometrics/guy_and_roten_2004/table1.htm). For 80 chromosomes, the origin of replication is located by cumulative GC skew analyses, while for the remaining eight chromosomes, cumulative TA and ORF orientation skew analyses have to be performed. For Streptomyces coelicolor A3(2), the origin of replication is located in the middle of the linear chromosome and, surprisingly, corresponds to the maximum of the GC skew curve (Bentley et al., 2002).

Sequence homology analysis allows the identification in 70 bacterial chromosomes (59 species) of dnaA, a gene involved in the initiation of DNA replication (Messer, 2002). In 54 of the analyzed chromosomes (45 species), a strong correlation exists between the location of this gene and the origin of replication predicted by cumulative nucleotide skew analyses. The detection of the origin and terminus of replication by these criteria seems unmistakable in proteobacteria, Gram-positives, and spirochaetes (Table 1).

Table 1. Average coorientation indexes of rRNA (23S, 16S, and 5S) and tRNA genes in major prokaryotic groups (a)

Taxon                                       n (b)   Coorientation indexes of rRNAs   Coorientation indexes of tRNAs
Archaea
  Euryarchaeota                             9       0.94±0.11 (c)                    0.51±0.15
Bacteria                                    77      0.98±0.13                        0.71±0.20
  Alpha-proteobacteria                      13      1±0                              0.58±0.11
  Beta-proteobacteria                       3       0.97±0.05 (c)                    0.72±0.06
  Gamma-proteobacteria                      18      1±0                              0.68±0.10
  Firmicutes (low-G+C Gram-positive)        23      0.96±0.21 (d)                    0.90±0.10
  Actinobacteria (high-G+C Gram-positive)   6       1±0                              0.58±0.06
  Chlamydiales                              5       1±0                              0.56±0.02
  Spirochaetales                            3       1±0                              0.48±0.12

(a) A table with coorientation indexes for the 100 chromosomes used in this contribution is available at the address http://www.unil.ch/comparativegenometrics/guy_and_roten_2004/table1.htm.
(b) Number of chromosomes considered in the group.
(c) All antioriented rRNA genes are 5S rRNA genes not included in an rRNA operon.
(d) In the case of Mycoplasma pneumoniae M129, the origin of replication found by the skew methods is very close to the only rRNA operon, considered as antioriented in our analysis. Discarding this case leads to a coorientation index of 1±0 for firmicutes.



Three different genometric methods (GC, TA, and ORF orientation skews) predict the origin of replication of nine archaeal chromosomes (Table 1). These, representing more than half of the presently available archaeal genome sequences, belong without exception to the euryarchaeotes. Again, the predicted archaeal origins of replication are correlated with the position of orc1/cdc6, a recently uncovered gene functionally similar to the bacterial dnaA (Giraldo, 2003).

3.2. Coorientation of RNA genes transcription and DNA replication
Coorientation analysis is performed on prokaryotic chromosomes along which a putative origin of replication has been detected by nucleotide skew analyses. It clearly appears that rRNA genes are predominantly cooriented with DNA replication (Table 1). Moreover, inspection of supplementary material (http://www2.unil.ch/comparativegenometrics/guy_and_roten_2004/table1.htm) reveals that in four organisms (Methanopyrus kandleri AV19, Pyrococcus furiosus DSM 3638, Neisseria meningitidis MC58 ser. B and Helicobacter pylori 26695) a 5S rRNA gene, which does not belong to a ribosomal operon, is antioriented, while in M. pneumoniae M129, the locus of the antioriented rRNA operon is close to the predicted origin of replication. Therefore, its orientation with respect to DNA replication cannot be accurately predicted. In conclusion, it would appear that 16S and 23S rRNA genes are 100% cooriented with DNA replication.
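The text does not spell out the formula of the coorientation index; the sketch below assumes it is the fraction of genes transcribed in the direction of replication on a circular, bidirectionally replicated chromosome. Genes are given as hypothetical `(position, strand)` pairs, and all names and coordinates are illustrative.

```python
def on_forward_replichore(pos, origin, terminus, length):
    """True if pos lies on the arc running from the origin to the
    terminus in the direction of increasing coordinates."""
    return (pos - origin) % length < (terminus - origin) % length


def coorientation_index(genes, origin, terminus, length):
    """Fraction of genes cooriented with replication (assumed definition).

    A '+' gene is cooriented on the forward replichore, a '-' gene on
    the other replichore, where the fork moves towards lower coordinates.
    """
    cooriented = 0
    for pos, strand in genes:
        forward_arc = on_forward_replichore(pos, origin, terminus, length)
        if (strand == "+") == forward_arc:
            cooriented += 1
    return cooriented / len(genes)


# Toy chromosome (hypothetical): 1000 bp, origin at 100, terminus at 600.
toy_genes = [(200, "+"), (700, "-"), (300, "-"), (900, "+")]
index = coorientation_index(toy_genes, origin=100, terminus=600, length=1000)
```

In this toy case the first two genes are cooriented and the last two are antioriented, so the index is one half.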

In bacteria, tRNA genes exhibit a similar although less pronounced trend with an average 71±20% coorientation, while the situation in archaeal genomes—51±11% coorientation—is not conclusive. However, inspection of coorientation indexes (Table 1) reveals that in both bacteria and archaea, coorientation of tRNA is distributed in a taxon-dependent manner.

Coorientation analyses of eukaryotic rRNA genes are performed only on the chromosome of the human mitochondrion, along which the origin of replication was localized experimentally. It appears that all of the rRNA genes and 14 out of 22 tRNA genes are antioriented and encoded by the light (L)-strand (see Fig. 3); that is, their orientation is opposite to that of DNA replication. It is most likely that this conclusion can be extended to the 198 deuterostomian mitochondria sequenced so far, because they all have nearly identical genetic maps (http://www.ncbi.nlm.nih.gov/genomes/ORGANELLES/mztrna.cgi?tax=33208; Wolfsberg et al., 2001).

3.3. Nucleotidic composition of structural RNA genes
In this section, sequences of structural RNA genes are characterized by four parameters: intragenic GC and TA skews, G+C content, and purine loading.

3.3.1. Transfer RNA
The G+C content of tRNA genes is variable. Differences between taxons are considerable, closely resembling those observed for rRNA genes (see Section 3.3.2 and Fig. 2). Purine loading of tRNA genes is almost exactly 0.5 (Fig. 1), a figure corresponding to the highest number of degrees of freedom (Forsdyke and Mortimer, 2000), a property intrinsically associated with the functions of tRNAs. Average intragenic GC and TA skews of tRNA genes of all major taxons are positive and strongly correlated (p<0.005).



Fig. 1. Genometric patterns of RNA genes. Genometric measures (purine loading, G+C content, GC and TA skews) are represented in histograms. Four structural RNA gene types (5S rRNA, ssu rRNA, lsu rRNA, and tRNA) of each major taxon (Archaea, Bacteria, Plastida, other Mitochondria and deuterostomian mitochondria) are represented.



3.3.2. Ribosomal RNA
G+C content, purine loading, and intragenic skew analyses of the three rRNA species, i.e., 5S, small subunit (ssu) rRNA, and large subunit (lsu) rRNA, reveal a very similar taxonomic distribution (Fig. 1) and a highly similar behavior between ssu and lsu rRNA genes. In most organisms, intragenic nucleotide skews reveal an enrichment in Gs and As and a depletion in Cs and Ts. Deuterostomian mitochondrial rRNA genes represent the only example known so far of a negative intragenic GC skew.

3.4. Phylogenetic analyses on intragenic nucleotide skews of ssu and lsu rRNA
To analyze mutational events on the widest possible scale, we focus on small and large ribosome subunits (1568±278 and 2243±848 nt, respectively), and abandon 5S rRNA genes (mean=119±4 nt). In particular, we devote our attention to ssu rRNA, an extensively used taxonomic marker, whose nucleotide sequences in databases are five times more abundant than those of lsu rRNA.

When intragenic ssu rRNA GC and TA skews are plotted one against the other, they reveal a distribution into clusters compatible with accepted taxons (Fig. 2). Due to negative intragenic GC and TA skews, the cluster corresponding to genes of the ssu rRNA of deuterostomian mitochondria appears to be well separated from rRNA genes of other organisms.



Fig. 2. Intragenic GC skew vs. intragenic TA skew dot-plot for major taxonomic groups. Each organism is represented by a dot in a GC skew vs. TA skew plot. Selected taxons are Archaea (blue squares), Bacteria (orange circles), Plastida (green circles), other mitochondria (brown diamonds), deuterostomian mitochondria (yellow diamonds), and Eukaryota (grey triangles).



4. Discussion
Cumulative nucleotide skews—GC, TA, and ORF orientation skews—have enabled us to determine putative origins of replication in a large majority of published sequences of prokaryotic chromosomes. In many instances, these skew analyses also uncovered the locus of the terminus of replication. The latter was localized by genometric methods in the region facing the origin, an observation confirming earlier reports on near ubiquitous occurrence of bidirectional chromosome replication in prokaryotes (Grigoriev, 1998, McLean et al., 1998 and Tillier and Collins, 2000). The output of these analyses is supported by strong correlations between predictions obtained by each of the three nucleotide skew methods. In addition, the predicted location of the origin (Ori site) of replication was validated by its close proximity to dnaA and orc1/cdc6 in bacteria and archaea, respectively.

Relatively accurate knowledge of the origin of bidirectional DNA replication in the majority of sequenced prokaryotes offers the possibility to examine the orientation relative to DNA replication of different categories of genes, in particular the structural RNA genes. A clear-cut answer is obtained for bacterial as well as archaeal ssu and lsu rRNA genes, which are apparently all cooriented with DNA replication, a situation significantly reducing the possibility of collision between the DNA polymerase and a relatively frequent rRNA transcription (French, 1992). This rule, first established for Escherichia coli (Nomura et al., 1977 and Morgan et al., 1978), turns out to be generally valid for prokaryotes.

This observation, which is independent of nucleotide skew analyses, not only validates the skew-predicted positions of the origin of replication but also offers an intrinsic and reliable tool for marking the boundaries of the regions that encompass the origin and terminus of replication.

Almost all prokaryotes have a positive GC skew on their leading strands, i.e., they have more Gs than Cs on these strands. As discussed in the introduction, this may have several causes: a direct effect of chromosome replication, or an indirect effect of the predominant coorientation of genes with DNA replication. The only exception to this rule known so far in prokaryotes is S. coelicolor (see http://www.unil.ch/comparativegenometrics/), which has an excess of Cs relative to Gs on its leading strand, possibly due to its extremely high G+C content (Bentley et al., 2002). Nevertheless, as in all other prokaryotes, the intragenic GC skew of S. coelicolor rRNAs (encoded on the leading strand) is positive, contrary to what would be expected from the global nucleotide composition of its genome. Conversely, deuterostomian mitochondria seem to be the only exception known so far to the coorientation rule, because the orientation of their structural RNA genes is opposite to that of DNA replication (Fig. 3). This observation can be related to the very small size of this mitochondrial genome [16.5 kilobase (kb)], which is only about 10 times longer than an Okazaki fragment. Therefore, because only a few Okazaki fragments can be synthesized during one replication cycle, a simultaneous replication of both strands would appear uneconomical, especially should the replication time be very long. In conclusion, the strand that remains single-stranded during replication, and is thus subject to cytosine deamination, is not the one encoding the majority of proteins and structural ribosomal genes. Therefore, Lobry-type mutational pressure would be exerted on the lagging strand and generate a negative GC skew on the leading strand.



Fig. 3. Schematic representation of human mitochondrial genome replication. As shown in this paper, deuterostomian mitochondria and other mitochondria reveal very distinct patterns in genometric analyses. Deuterostomia is a taxonomic group which differs from other Metazoa by (i) the presence of a coelom and (ii) its formation; unlike Protostomia, where it is derived from a cleft in the mesoderm, in Deuterostomia the coelom is formed by evagination of the archenteron. From the extreme similarity of the genetic maps of deuterostomian mitochondrial DNA (mtDNA), we suppose that human mtDNA can be used as a model for all other deuterostomian ones. Mammalian mtDNA is about 16.5 kb long, circular, compact, and encodes 13 proteins involved in mitochondrial oxidative respiration and electron transport, 22 tRNAs, and 2 rRNAs. In contrast to other mitochondria, mammalian mtDNA does not have a 5S rRNA gene. Most of the genes (i.e., 12 proteins, 14 tRNAs and both rRNAs) are encoded on the light (L)-strand, the remaining genes being located on the heavy (H)-strand. The mtDNA strands were designated as heavy or light according to their buoyant density in a CsCl gradient. It appeared later that the H-strand was G-rich (more than two Gs for one C), and the L-strand depleted in Gs. The replication mode is particular: the H-strand is unidirectionally replicated from the H-strand origin (OH) until the replication machinery reaches the L-strand origin (OL); next, the L-strand replication starts from OL. Because mtDNA replication lasts about two hours, a large part of the H-strand remains single-stranded longer than the L-strand. Thus, the H-strand is probably more prone to cytosine deamination during DNA replication than the L-strand. This figure is adapted from Lecrenier and Foury (2000).



Our observations suggest that coorientation indexes of tRNA genes (Table 1) are to some extent related to accepted prokaryotic taxons. To include as many organisms as possible, intragenic nucleotide skews, G+C content, and purine loading analyses were performed on structural small and large rRNA genes, which have been sequenced in 12,420 and 1322 organisms, respectively. Differences between major taxons were significant (Fig. 1) and plotting of GC vs. TA skews revealed several distinct domains consisting of species belonging to major taxons (Fig. 2). We believe that combinations of higher numbers of genometric parameters may substantially refine such pictures.

Transfer and ribosomal RNA genes, whose immediate products have to obey major structural constraints, seem to respond to mutational pressures in a rather specific way. As evidenced by coorientation analyses, structural RNAs are predominantly encoded on the leading strand and therefore experience cytosine deaminations leading to a positive GC skew. However, properties inherent to the structure and/or the function of structural RNAs seem to generate a lesser GC skew than the one observed by Lobry at the genome level (Lobry, 1996a) and, in rRNA genes, a negative TA skew—although with cytosine deaminations on the leading strand, one expects a positive TA skew. Indeed, mutations presumably responsible for the skew are less abundant in rRNAs and even more so in tRNAs. In spite of their different functions within the ribosome, 5S, ssu and lsu rRNA genes have a very similar skew pattern, an observation suggesting that all rRNA genes are subject to similar constraints and have similarly responded to pressures postulated in Lobry's model of cytosine deamination directly related to DNA replication. In deuterostomian mitochondria, the unusual negative GC skew resulting from cytosine deamination during replication shows that the effects of replication are probably stronger than those of transcription.

In conclusion, analysis of the orientation of ribosomal RNA genes relative to chromosome replication reveals an almost universal rule, the only exception known so far being the deuterostomian mitochondria. Analysis of biases in nucleotide composition shows that the orientation of the rRNA genes can be predicted from their composition. This observation could be useful to locate origins of replication in eukaryotes. Moreover, our results suggest that nucleotide skew analyses are promising phylogenetic tools.


Acknowledgments

This contribution was presented by LG as MSc thesis at Lausanne University. We thank Jean-Luc Barblan for posting data on the Comparative Genometrics website, and Philippe Moreillon and Noboru Sueoka for constructive discussion. Finally, we warmly thank Jean Lobry and Dimitri Karamata for the critical reading.


References
Backert et al., 1996 S. Backert, P. Dorfel, R. Lurz and T. Borner, Rolling-circle replication of mitochondrial DNA in the higher plant Chenopodium album (L.), Mol. Cell. Biol. 16 (1996), pp. 6285–6294. Abstract-EMBASE | Abstract-Elsevier BIOBASE | Abstract-MEDLINE | $Order Document

Bentley et al., 2002 S.D. Bentley, K.F. Chater, A.M. Cerdeno-Tarraga, G.L. Challis, N.R. Thomson, K.D. James, D.E. Harris, M.A. Quail, H. Kieser, D. Harper, A. Bateman, S. Brown, G. Chandra, C.W. Chen, M. Collins, A. Cronin, A. Fraser, A. Goble, J. Hidalgo, T. Hornsby, S. Howarth, C.H. Huang, T. Kieser, L. Larke, L. Murphy, K. Oliver, S. O'Neil, E. Rabbinowitsch, M.A. Rajandream, K. Rutherford, S. Rutter, K. Seeger, D. Saunders, S. Sharp, R. Squares, S. Squares, K. Taylor, T. Warren, A. Wietzorrek, J. Woodward, B.G. Barrell, J. Parkhill and D.A. Hopwood, Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2), Nature 417 (2002), pp. 141–147. Abstract-MEDLINE | Abstract-Elsevier BIOBASE | Abstract-GEOBASE | Abstract-EMBASE | $Order Document | Full Text via CrossRef

Chargaff, 1950 E. Chargaff, Chemical specificity of nucleic acids and mechanism of their enzymatic degradation, Experientia 6 (1950), pp. 201–209.

De Rijk et al., 2000 P. De Rijk, J. Wuyts, Y. Van de Peer, T. Winkelmans and R. De Wachter, The European large subunit ribosomal RNA database, Nucleic Acids Res. 28 (2000), pp. 177–178. Abstract-MEDLINE | Abstract-EMBASE | Abstract-Elsevier BIOBASE | $Order Document | Full Text via CrossRef

Elson and Chargaff, 1955 D. Elson and E. Chargaff, Evidence of common regularities in the composition of pentose nucleic acids, Biochim. Biophys. Acta 17 (1955), pp. 367–376.

Forsdyke and Mortimer, 2000 D.R. Forsdyke and J.R. Mortimer, Chargaff's legacy, Gene 261 (2000), pp. 127–137. SummaryPlus | Full Text + Links | PDF (312 K)

Francino and Ochman, 1997 M.P. Francino and H. Ochman, Strand asymmetries in DNA evolution, Trends Genet. 13 (1997), pp. 240–245. Abstract | PDF (613 K)

Frederico et al., 1990 L.A. Frederico, T.A. Kunkel and B.R. Shaw, A sensitive genetic assay for the detection of cytosine deamination: determination of rate constants and the activation-energy, Biochemistry 29 (1990), pp. 2532–2537.

French, 1992 S. French, Consequences of replication fork movement through transcription units in vivo, Science 258 (1992), pp. 1362–1365.

Giraldo, 2003 R. Giraldo, Common domains in the initiators of DNA replication in Bacteria, Archaea and Eukarya: combined structural, functional and phylogenetic perspectives, FEMS Microbiol. Rev. 26 (2003), pp. 533–554.

Grigoriev, 1998 A. Grigoriev, Analyzing genomes with cumulative skew diagrams, Nucleic Acids Res. 26 (1998), pp. 2286–2290.

Gutell et al., 2000 R.R. Gutell, J.J. Cannone, Z. Shang, Y. Du and M.J. Serra, A story: unpaired adenosine bases in ribosomal RNAs, J. Mol. Biol. 304 (2000), pp. 335–354.

Karkas et al., 1968 J.D. Karkas, R. Rudner and E. Chargaff, Separation of B. subtilis DNA into complementary strands: II. Template functions and composition as determined by transcription with RNA polymerase, Proc. Natl. Acad. Sci. U. S. A. 60 (1968), pp. 915–920.

Kolodner and Tewari, 1975 R.D. Kolodner and K.K. Tewari, Chloroplast DNA from higher-plants replicates by both Cairns and rolling circle mechanism, Nature 256 (1975), pp. 708–711.

Lao and Forsdyke, 2000 P.J. Lao and D.R. Forsdyke, Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine, Genome Res. 10 (2000), pp. 228–236.

Lecrenier and Foury, 2000 N. Lecrenier and F. Foury, New features of mitochondrial DNA replication system in yeast and man, Gene 246 (2000), pp. 37–48.

Lobry, 1996a J.R. Lobry, Asymmetric substitution patterns in the two DNA strands of bacteria, Mol. Biol. Evol. 13 (1996), pp. 660–665.

Lobry, 1996b J.R. Lobry, Origin of replication of Mycoplasma genitalium, Science 272 (1996), pp. 745–746.

Lowe and Eddy, 1997 T.M. Lowe and S.R. Eddy, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence, Nucleic Acids Res. 25 (1997), pp. 955–964.

McLean et al., 1998 M.J. McLean, K.H. Wolfe and K.M. Devine, Base composition skews, replication orientation, and gene orientation in 12 prokaryote genomes, J. Mol. Evol. 47 (1998), pp. 691–696.

Messer, 2002 W. Messer, The bacterial replication initiator DnaA. DnaA and oriC, the bacterial mode to initiate DNA replication, FEMS Microbiol. Rev. 26 (2002), pp. 355–374.

Morgan et al., 1978 E.A. Morgan, T. Ikemura, L. Lindahl, A.M. Fallon and M. Nomura, Some ribosomal-RNA operons in Escherichia coli have transfer-RNA genes at their distal ends, Cell 13 (1978), pp. 335–344.

Nomura et al., 1977 M. Nomura, E.A. Morgan and S.R. Jaskunas, Genetics of bacterial ribosomes, Annu. Rev. Genet. 11 (1977), pp. 297–347.

Peterson et al., 2001 J.D. Peterson, L.A. Umayam, T. Dickinson, E.K. Hickey and O. White, The comprehensive microbial resource, Nucleic Acids Res. 29 (2001), pp. 123–125.

Rocha, 2002 E.P.C. Rocha, Is there a role for replication fork asymmetry in the distribution of genes in bacterial genomes?, Trends Microbiol. 10 (2002), pp. 393–395.

Roten et al., 2002 C.A. Roten, P. Gamba, J.L. Barblan and D. Karamata, Comparative Genometrics (CG): a database dedicated to biometric comparisons of whole genomes, Nucleic Acids Res. 30 (2002), pp. 142–144.

Sueoka, 1995 N. Sueoka, Intrastrand parity rules of DNA base composition and usage biases of synonymous codons, J. Mol. Evol. 40 (1995), pp. 318–325.

Szybalski et al., 1966 W. Szybalski, H. Kubinski and O. Sheldrick, Pyrimidine clusters on the transcribing strand of DNA and their possible role in the initiation of RNA synthesis, Cold Spring Harbor Symp. Quant. Biol. 31 (1966), pp. 123–127.

Szymanski et al., 2002 M. Szymanski, M.Z. Barciszewska, V.A. Erdmann and J. Barciszewski, 5S ribosomal RNA database, Nucleic Acids Res. 30 (2002), pp. 176–178.

Tillier and Collins, 2000 E.R.M. Tillier and R.A. Collins, The contributions of replication orientation, gene direction, and signal sequences to base-composition asymmetries in bacterial genomes, J. Mol. Evol. 50 (2000), pp. 249–257.

Trifonov, 1987 E.N. Trifonov, Translation framing code and frame-monitoring mechanism as suggested by the analysis of messenger-RNA and 16S ribosomal RNA nucleotide sequences, J. Mol. Biol. 194 (1987), pp. 643–652.

Wang and Hickey, 2002 H.C. Wang and D.A. Hickey, Evidence for strong selective constraint acting on the nucleotide composition of 16S ribosomal RNA genes, Nucleic Acids Res. 30 (2002), pp. 2501–2507.

Watson and Crick, 1953 J.D. Watson and F.H.C. Crick, A structure for deoxyribose nucleic acid, Nature 171 (1953), pp. 737–738.

Wheeler et al., 2001 D.L. Wheeler, D.M. Church, A.E. Lash, D.D. Leipe, T.L. Madden, J.U. Pontius, G.D. Schuler, L.M. Schriml, T.A. Tatusova, L. Wagner and B.A. Rapp, Database resources of the National Center for Biotechnology Information, Nucleic Acids Res. 29 (2001), pp. 11–16.

Wolfsberg et al., 2001 T.G. Wolfsberg, S. Schafer, R.L. Tatusov and T.A. Tatusova, Organelle genome resources at NCBI, Trends Biochem. Sci. 26 (2001), pp. 199–203.

Yamashita et al., 1997 M. Yamashita, Y. Hori, T. Shinomiya, C. Obuse, T. Tsurimoto, H. Yoshikawa and K. Shirahige, The efficiency and timing of initiation of replication of multiple replicons of Saccharomyces cerevisiae chromosome VI, Genes Cells 2 (1997), pp. 655–665.
 
