FUNCTIONAL ROLE OF BACTERIOPHAGE TRANSFER RNAS: CODON USAGE ANALYSIS OF GENOMIC SEQUENCES STORED IN THE GENBANK/EMBL/DDBJ DATABASES

Complete genomic sequence data are stored in the public GenBank/EMBL/DDBJ databases so that any investigator can make use of the data. This report describes a comparative analysis of codon usage that would be impossible without such a public and open data system. A limited number of bacteriophages harbor their own transfer RNAs. Based on a comparison between T4 phage-encoded tRNA species and the relative cellular amounts of host Escherichia coli tRNAs, it is hypothesized that T4 tRNAs could serve to supplement host isoacceptor tRNA species that are present in minor amounts and thus enhance the translational efficiency of phage proteins. When compared to their respective host bacteria, the codon usage data of bacteriophages D3, φC31, HP1, D29 and 933W all show an increased frequency of synonymous codons or amino acids that correspond to phage tRNA species, suggesting their supplemental role in the efficient production of phage proteins. The data analysis presents an example in which the availability of an open and fully accessible database system allows one to obtain comprehensive insights into a fundamental problem in molecular biology.


INTRODUCTION
Recognizing the importance of establishing an international database system that is freely accessible to any researcher, molecular biologists have created a public database system (GenBank/EMBL/DDBJ) through the collaborative efforts of many workers. In this system, a researcher who has experimentally determined a nucleic acid sequence sends the data to the international database so that other researchers can make use of the data. At the same time, the contributing researcher can compare the new sequence data with those already stored in the database, making further development of the research possible. Effective use of such an open data-sharing system could lead the researcher to novel insights into fundamental problems that might not otherwise be gained from the researcher's own original data alone. In this report the author would like to introduce such an attempt.
Bacteriophages utilize their host's cellular translational machinery in the synthesis of their own proteins. So far there are no phage genomes known to encode ribosomal proteins, translation factors, or aminoacyl-tRNA synthetases. The presence of transfer RNA-specifying genes in the phage T4 genome, however, was reported about thirty years ago (McClain, Guthrie, & Barrell, 1972; Broida & Abelson, 1985). Since then, the functional role of the phage-encoded tRNAs has attracted some interest. One suggestion is that the presence of tRNA genes in phage genomes may provide a suitable phage integration sequence by homologous recombination (reviewed in Cheetham & Katz, 1995). Another is that phage tRNAs may boost cellular tRNA populations to enhance translational efficiency (Kan, Kano-Sueoka, & Sueoka, 1968; Daniel, Sarid, & Littauer, 1970; Scherberg & Weiss, 1972). The latter idea was put forward following the discovery of phage-encoded tRNAs, but it has not been verified. In 1992, the advantage of a phage having its own tRNAs was examined in the light of the synonymous codon-usage data of phage T4 and its host Escherichia coli. The results suggested that the phage tRNA species could serve to supplement the host's isoacceptor tRNA species that are present in minor cellular amounts and that recognize those synonymous codons that are rare in the host genes but frequent in the phage genes (Kunisawa, 1992). At that time, T4 presented a unique example in which complete data on phage-encoded tRNAs were available.
Currently, however, more than 90 phage genomic sequences are stored in the GenBank/EMBL/DDBJ databases (GenBank, n.d.; EMBL, n.d.; DDBJ, n.d.). In this report, phage genomes encoding tRNA genes are searched for in the databases, and their significance in translation is examined using codon usage analysis. The aim is to obtain more comprehensive insights into the host-phage translation system. Through this project, the author would like to emphasize the importance of an open and fully accessible database system.

PHAGE-ENCODED TRANSFER RNAS
Table 1 summarizes the results of a database search for bacteriophage genomes that encode tRNA genes. All the phage genomes and their host genomes listed here have been completely sequenced. Gene orders of phage tRNAs are shown in the 5' to 3' direction in Table 1. Two general points can be made. Phage tRNA genes almost always exist in a cluster on the genome, and they are not sufficient for the translation of the 61 sense codons. Coliphage T4 harbors eight tRNA species, the anticodons of which are also shown in the table within square brackets (Broida & Abelson, 1985). The Streptomyces phage φC31 (Smith, Burns, Wilson, & Gregory, 1999) harbors a Thr-tRNA with the anticodon CGU, which reads the ACG codon. The genome of the Pseudomonas phage D3 encodes four tRNA genes for Met, Gly, Asn, and Thr (Kropinski & Sibbald, 1999). From sequence similarity, the genome of the Haemophilus influenzae phage HP1 (Esposito, Fitzmaurice, Benjamin, Goodman, Waldman, & Scocca, 1996) is reported to code for Lys- and Leu-tRNA genes, although the Leu-tRNA gene is a pseudogene (Kropinski & Sibbald, 1999). The virulent mycobacteriophage D29 genome encodes five tRNA genes for Asn, Trp, Gln, Glu, and Tyr in a cluster (Ford, Sarkis, Belanger, Hendrix, & Hatfull, 1998), while its close relative L5 lacks the last two tRNAs in the D29 cluster (Hatfull & Sarkis, 1993). Phage 933W and its close relative VT2-Sa, both of which infect enterohemorrhagic Escherichia coli O157:H7 (Perna, Plunkett III, Burland, Mau, Glasner, Rose, et al., 2001), harbor one tRNA for Ile and two isoacceptor tRNAs for Arg (Plunkett III, Rose, Durfee, & Blattner, 1999; Miyamoto, Nakai, Yajima, Fujibayashi, Higuchi, Sato, et al., 1999).
Table 1. Bacteriophage tRNA genes found in completely sequenced genomes.
In addition to these, the coliphage T5, for which the complete genomic sequence is not known, is reported to harbor at least 24 tRNAs (McCorquodale & Warner, 1988). Furthermore, the Vibrio eltor phage e4 genome is known to encode five tRNAs in a cluster (Chattopadhyay & Ghosh, 1988). However, their sequence data are not available.

TRNA USAGE AND SYNONYMOUS CODON USAGE OF T4
In Table 2, the anticodon species of tRNAs from E. coli and from phage T4 are listed, together with codon usage data. The cellular amounts of host E. coli tRNAs relative to that of Leu-tRNA with anticodon CAG are indicated within square brackets (Ikemura, 1985; 1992). The gene copy number of the respective tRNAs is also tabulated. As a first approximation, the relative cellular amount of a tRNA is proportional to its gene copy number (see Fig. 3, Ikemura 1992). It is interesting to note that in Table 2 all eight anticodon species of T4 tRNAs can be found among the host E. coli's tRNAs, and the phage tRNAs do not carry any novel anticodon species. In E. coli and in most other bacteria, there are multiple isoacceptor tRNAs for the eight amino acids, Gln, Leu, Gly, Pro, Ser, Thr, Ile and Arg, that can be charged to T4 tRNAs. It is remarkable that the phage tRNAs do not correspond in every case to the major E. coli tRNA isoacceptors present in larger cellular amounts. Rather, they correspond to the minor isoacceptors. An extreme case is seen with Ile; the cellular amount of the minor isoacceptor having the CAU anticodon (the base C at the first anticodon position may be modified to lysidine; for details of anticodon modification, see Osawa (1995)) is only 0.05 compared to that of the major isoacceptors with the GAU anticodon, and the phage's Ile-tRNA corresponds to the minor one. Thus, phage tRNAs seem to supplement host tRNAs that are present in small amounts.

In Table 2 the codon usage of T4's protein-coding sequences is compared with that of its host E. coli's genes (Blattner, Plunkett III, Bloch, Perna, Burland, Riley, et al., 1997). The occurrences per thousand codons are shown in the table. The amino acid frequencies are shown in italic for easy comparison of amino acid usage. It is worth noting that the frequency of synonymous codons read by phage tRNAs is always higher in phage genes than in host genes. For example, the frequency of the two Arg codons AGA and AGG, both of which are read by the phage tRNA with anticodon UCU (wobble rules, see Osawa (1995)), is 13.1 per thousand codons for phage genes, but only 3.3 for host genes. Similarly, the frequency of the AUA codon for Ile in phage genes is about three times as high as that in host genes. From these observations, we can hypothesize that phage tRNAs could serve to supplement host tRNAs present in minor amounts and thus enhance the efficiency of translation of phage genes (Kunisawa, 1992).
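The "occurrences per thousand codons" measure used in Table 2 can be illustrated with a minimal Python sketch. It is not the procedure used to build the table: the function names and the two toy sequences are hypothetical, and the sketch counts every in-frame triplet, including the stop codon, which may differ from how the tabulated values were obtained.

```python
from collections import Counter

def codon_counts(cds):
    """Count in-frame codons of one coding sequence (DNA or RNA letters)."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))

def per_thousand(genes):
    """Pool codon counts over all genes and scale to occurrences per 1000 codons."""
    total = Counter()
    for cds in genes:
        total.update(codon_counts(cds))
    n = sum(total.values())
    return {codon: 1000.0 * count / n for codon, count in total.items()}

# Toy sequences, invented for illustration only (not real T4 genes).
toy_genes = ["ATGAGAATAAGGTAA", "ATGATAAGAAAATAA"]
freq = per_thousand(toy_genes)

# Combined frequency of the two Arg codons read by anticodon UCU.
arg_ucu = freq.get("AGA", 0.0) + freq.get("AGG", 0.0)

# Ratio of the phage and host values quoted in the text (13.1 vs 3.3).
enrichment = 13.1 / 3.3
```

With real data, `genes` would be the list of annotated protein-coding sequences of one genome, and the same function would be applied separately to host and phage before comparing the two tables.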
Based on the synonymous codon usage patterns of E. coli genes, Ikemura (1985; 1992) reported a correlation between heavily used synonymous codons and isoacceptor tRNA species that are abundant in the cell. Thus, the synonymous codon choice is constrained by tRNA availability and seems to be designed to optimize the translational efficiency of E. coli genes, in particular its highly expressed genes. Therefore, foreign genes such as phage genes are not expected to be translated efficiently unless their synonymous codon choice is similar to that of the highly expressed genes of E. coli. The synonymous codon usage pattern of T4 genes, however, is clearly different from that of E. coli, as can be seen in Figure 1 (Kunisawa, Kanaya, & Kutter, 1998). In this figure, for each of the 4290 E. coli (shown in gray) or 271 T4 (in red) protein-coding genes, the average G+C content, fgc, is plotted against the fraction, fop, of optimal codons, which are thought to be translationally optimal (Ikemura, 1985; 1992). The fraction of optimal codons in each gene is known to serve as a surrogate measure of the gene expression level. The optimal codon species defined by Ikemura (1985; 1992) are underlined in Table 2. As can be seen in Figure 1, the G+C contents of T4 genes are much lower than those of E. coli genes. Table 2 shows a tendency for T4 genes to frequently use codons ending in U or A owing to the low G+C content, while E. coli genes tend to use codons that are recognized by the abundant tRNA species and thus often use codons ending in G or C rather than codons ending in U or A. Figure 1 also indicates that the majority of T4 genes exhibit low fop values centered around 0.4, suggesting their lower translational efficiency. Thus, it is reasonable to speculate that T4 supplies its own tRNAs to the host tRNA populations that are present in minor amounts for more efficient production of phage proteins.
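The two per-gene quantities plotted in Figure 1 can be sketched as follows. This is a simplified illustration, not the author's code: the function names are hypothetical, fop is taken here as the number of optimal codons divided by the number of codons classified as either optimal or non-optimal, and the two small codon sets are placeholders rather than the full optimal-codon table of Ikemura (1985; 1992).

```python
def gc_content(cds):
    """Fraction of G and C bases in a coding sequence (fgc in the text)."""
    seq = cds.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def fop(cds, optimal, non_optimal):
    """Fraction of optimal codons among the codons classified in either set.

    Codons in neither set (for example Met, Trp and stop codons) are ignored.
    Returns None when the gene contains no classified codon.
    """
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    n_opt = sum(codon in optimal for codon in codons)
    n_non = sum(codon in non_optimal for codon in codons)
    if n_opt + n_non == 0:
        return None
    return n_opt / (n_opt + n_non)

# Placeholder classification, for illustration only.
OPTIMAL = {"CUG", "CGU"}
NON_OPTIMAL = {"CUA", "AGA"}

toy_gene = "ATGCTGCTACGTAGATAA"   # invented sequence
point = (fop(toy_gene, OPTIMAL, NON_OPTIMAL), gc_content(toy_gene))
```

Computing such a pair for every annotated gene of a genome would give one point per gene, which is the kind of scatter the figure describes.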
Phage carrying deletions of all eight T4-encoded tRNA genes show a decreased burst size and a decreased rate of synthesis for several proteins when grown on laboratory host strains, suggesting the existence of a selective pressure to maintain the T4 tRNA genes (Wilson, 1973). In accordance with this reasoning, a T4 mutant that grows normally on most laboratory strains but is unable to grow on a specific strain of E. coli (CT439) has been isolated (Guthrie & McClain, 1973). On the other hand, the tRNA gene repertoire appears to be variable among phages T2, T6 and RB69, which are closely related to T4 (Moen, Seidman, & McClain, 1978). The genome of RB69 has recently been completely sequenced (Karam & Krisch, 2002), and comparisons of codon usage among these T4-related phages would be very interesting subjects for future study.

MYCOBACTERIOPHAGE TRNAS
As listed in Table 1, mycobacteriophage D29 harbors five tRNAs for Asn, Trp, Gln, Glu, and Tyr (Ford et al., 1998). The presence of Asn-, Trp- and Tyr-tRNAs is somewhat surprising, as there is a single anticodon species for each of these D29 tRNAs, in contrast to the case of the T4 tRNAs. Therefore, the supply of isoacceptor tRNAs seems improbable in the case of D29 tRNAs. The host bacterium, M. tuberculosis, encodes tRNA genes in a single copy on the genome except for the Met-tRNA genes (Cole, Brosch, Parkhill, Garnier, Churcher, Harris, et al., 1998). This is in sharp contrast to tRNA genes in E. coli, where the tRNA-gene copy number varies from one up to six, the latter for Lys-tRNA (see Table 2), and the cellular content of tRNA molecules is approximately proportional to the gene copy number (Ikemura, 1992). Therefore, constraints on synonymous codon usage from tRNA availability may not be significant in M. tuberculosis, and the genomic G+C content seems to be a principal factor in determining synonymous codon preference. In fact, the host and phage show almost identically high genomic G+C contents, 65.6% for the host and 63.5% for phage D29. Thus, synonymous codon usage patterns are also similar, as can be seen in Table 2. Despite the similarity in synonymous codon choice, Figure 2 shows that the frequency of amino acid occurrences slightly increases in D29 for all five amino acid species, Asn, Gln, Glu, Tyr and Trp, which correspond to D29 tRNAs. In this figure, amino acid usage is compared between the host (shown in gray) and phage (in green), and a red asterisk marks the amino acids for which a phage tRNA exists. The fraction of Asn, for instance, increases to 32.5 per thousand amino acids in D29 from 25.3 in M. tuberculosis. In the E. coli-T4 system, corresponding increases in amino acid usage cannot be recognized; conversely, the fraction of amino acid usage for Arg, Lys, Gly, Pro, and Gln decreases in T4 (see Table 2). These observations suggest that D29 tRNAs may serve to tune host tRNA populations to the amino acid usage of phage proteins (Kunisawa, 2000). However, this tuning seems imperfect, since the frequency of Lys increases from 20.3 in the host to 45.3 in D29 but there is no phage tRNA for this amino acid. Furthermore, the phage tRNAs may not be required for growth, since another mycobacteriophage, L5, which is closely related to D29, lacks the last two tRNAs in the D29 gene cluster.
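The amino acid usage comparison described for D29 and its host can be sketched in Python as below. This is only an illustration under stated assumptions: the function name and the toy sequences are hypothetical, the standard genetic code is assumed, and stop codons are excluded from the totals.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def amino_acid_usage(genes):
    """Occurrences per thousand amino acids, pooled over all coding sequences."""
    counts = Counter()
    for cds in genes:
        seq = cds.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            aa = CODE.get(seq[i:i + 3])       # None for codons with ambiguous bases
            if aa is not None and aa != "*":  # skip stop codons
                counts[aa] += 1
    n = sum(counts.values())
    return {aa: 1000.0 * c / n for aa, c in counts.items()}

# Toy sequences, invented for illustration only.
host_usage = amino_acid_usage(["ATGAACGGCTAA"])    # Met-Asn-Gly
phage_usage = amino_acid_usage(["ATGAACAACTAA"])   # Met-Asn-Asn

# Relative increase of Asn from the values quoted in the text (32.5 vs 25.3).
asn_ratio = 32.5 / 25.3
```

Applied to the full gene sets of host and phage, the two resulting dictionaries give the paired bars of a plot such as Figure 2; the quoted Asn values correspond to an increase of roughly 28%.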
A comparison of codon usage between the mycobacteriophage D29 and Mycobacterium leprae was also carried out, since D29 may also infect M. leprae, which has a lower genomic G+C content of 57.8%. As with M. tuberculosis, our analysis showed an almost identical codon (amino acid) usage in M. leprae. The host range of mycobacteriophage D29 is broad, and this phage is known to grow on both fast- and slow-growing mycobacteria. Although available data are currently confined to M. tuberculosis and M. leprae, future comparisons of codon usage data among a broad range of host bacteria would open a new horizon for the study of phage evolution.

CODON USAGE OF D3, φC31, HP1 AND 933W/VT2-Sa
As listed in Table 1, the Pseudomonas phage D3 harbors four tRNAs for Met, Gly, Asn, and Thr (Kropinski & Sibbald, 1999). The genomic G+C content of the host bacterium Pseudomonas aeruginosa (Stover, Pahm, Erwin, Mizoguchi, Warrener, Hickey, et al., 2000) is 66.6%, while that of the phage genome is 57.8%. Although the difference in G+C content is not large, codon usage shows that the occurrences of either synonymous codons or amino acids corresponding to the phage's tRNAs are greater in the phage genes than in the host genes; e.g., for Met it is 20.2 (per thousand codons) in the host and 24.2 in the phage, for Gly it is 14.1 and 28.6, for Asn it is 24.8 and 33.8, and for Thr it is 7.2 and 17.9 (see Table 3), consistent with the hypothesis of a translational role for phage tRNAs (Kropinski & Sibbald, 1999). Another example is seen with Streptomyces phage φC31 (Smith, Burns, Wilson, & Gregory, 1999), which harbors a Thr-tRNA with the CGU anticodon, which reads the ACG codon. In the genes of the host S. coelicolor (Bentley, Chater, Cerdeno-Tarraga, Challis, Thomson, James, et al., 2002), the Thr codon ACC is most heavily used (32.8 occurrences per thousand codons, see Table 3) and the ACG codon is less frequently used (6.3). In the phage φC31 genes, the frequency of the ACG codon increases to 11.8. Similarly, in the case of the Lys-tRNA found in the Haemophilus influenzae phage HP1 (Esposito, Fitzmaurice, Benjamin, Goodman, Waldman, & Scocca, 1996), the occurrence of Lys per thousand amino acids slightly increases to 77.4 in HP1 from 63.1 in its host genes (Fleischmann, Adams, White, Clayton, Kirkness, Kerlavage, et al., 1995). Furthermore, an increased frequency of codons that are read by the phage 933W (Plunkett III, Rose, Durfee, & Blattner, 1999) Ile-tRNA or Arg-tRNA with anticodon UCU can also be seen in Table 3. In the Shiga toxin II converting phage VT2-Sa, which is closely related to phage 933W and encodes an identical tRNA gene cluster, it has been reported that the phage-encoded Arg-tRNAs recognize codons that are rare in the host but occur at a high frequency in the toxin genes (Kanjo & Inokuchi, 1999). All these examples suggest a strategy in which phage tRNAs may adjust tRNA populations in infected cells to achieve efficient production of phage proteins. The phage 933W's Arg-tRNA with the UCG anticodon is very unusual; a search of the tRNA database (Sprinzl, Horn, Brown, Ioudovitch, & Steinberg, 1998) shows no other examples of the UCG anticodon in bacteria, although this anticodon can be found in animals. Therefore, it is likely that this unusual anticodon species was laterally transferred into the phage genome during evolution.
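The anticodon-codon assignments used in this section (for example, anticodon CGU reading ACG, or UCU reading AGA and AGG) can be reproduced with a short Python sketch. It assumes the simplified wobble rules for unmodified bases only, with both anticodon and codon written 5' to 3'; the function name is hypothetical, and modified bases are deliberately not modeled, so the sketch does not cover cases such as the minor Ile-tRNA whose CAU anticodon is modified and reads AUA.

```python
# Watson-Crick partners for anticodon positions 2 and 3 (codon positions 2 and 1).
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Simplified wobble pairing for the first anticodon base (third codon base).
# Assumption: unmodified bases only; base modifications change these sets.
WOBBLE = {"G": "CU", "C": "G", "U": "AG", "A": "U"}

def codons_read(anticodon):
    """Return the set of codons (5'->3') paired by an anticodon given 5'->3'."""
    a = anticodon.upper().replace("T", "U")
    if len(a) != 3 or any(base not in WATSON_CRICK for base in a):
        raise ValueError("anticodon must be three bases from A, C, G, U")
    first = WATSON_CRICK[a[2]]    # codon base 1 pairs with anticodon base 3
    second = WATSON_CRICK[a[1]]   # codon base 2 pairs with anticodon base 2
    return {first + second + third for third in WOBBLE[a[0]]}

thr_phi_c31 = codons_read("CGU")   # Thr-tRNA of phage phiC31
arg_ucu = codons_read("UCU")       # Arg-tRNA with anticodon UCU
arg_ucg = codons_read("UCG")       # unusual Arg-tRNA of phage 933W
```

Under these simplified rules the UCG anticodon would pair with the Arg codons CGA and CGG; whether the actual tRNA carries a modified wobble base is not addressed by the sketch.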
As mentioned earlier, phage-encoded tRNA genes tend to appear in a cluster on the phage's genome. Was the gene cluster horizontally transferred? Or was the cluster formed during evolution by disrupting unnecessary tRNA genes and retaining necessary ones? The evolutionary origins and variations of phage tRNAs are interesting issues.

CONCLUSION
The presently available data on the cellular concentrations of tRNAs in E. coli, on T4 phage-encoded tRNA species, and on the codon usage of other phages harboring their own tRNAs suggest a strategy for phage survival. Phage tRNAs supplement host tRNA populations to enable efficient production of phage proteins. The author would like to emphasize the importance of open and fully accessible databases.

Figure 1. Codon usage patterns of E. coli and phage T4.

Figure 2. Comparison of amino acid usage between M. tuberculosis (shown by gray bars) and its phage D29 (green bars).

Table 2. Codon usage and tRNA availability. The total number of codons is 1363498 (4290 protein-coding genes) for E. coli, 53183 (271 genes) for T4, 1335687 (3924 genes) for M. tuberculosis, and 14789 (77 genes) for D29. Sums over synonymous codons are shown in italic. Optimal codons of E. coli are underlined.

Table 3. Codon usage of tRNA-harboring phages and their host bacteria.