
Large Scale Analysis of Expressed Sequence Tags from Indica Rice

Jianwei Zhang1, Qi Feng2, Caoqing Jin2, Lida Zhang1, Dejun Yuan1, Bin Han2, Qifa Zhang1 and Shiping Wang1

1National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China (swang@mail.hzau.edu.cn)
2Institutes for Biological Sciences, Chinese Academy of Sciences, 500 Caobao Road, Shanghai 200233, China

Abstract

Expressed sequence tags (ESTs) and full-length cDNAs of rice in public databases represent less than 40,000 genes. This number is far smaller than the upper limit of the predicted genes of the rice genome. In addition, the majority of the expressed rice sequences (62%) in the databases are from japonica subspecies. Divergence in the coding regions by single-nucleotide polymorphism, insertions and deletions between japonica and indica cultivars has been observed. Thus, a large-scale collection of ESTs from indica cultivars is a prerequisite for genetic studies, gene identification and isolation, and the comparative study of the two subspecies of cultivated rice. In this study we collected 22,409 unique ESTs, including 2,643 new ones, from the indica rice cultivar Minghui 63, an elite restorer line for a number of rice hybrids that have occupied more than 20% of the total rice production area in China for the last two decades. A total of 18,844 (84.1%) of the ESTs were mapped in the rice molecular linkage map by BLASTN searches against rice genomic sequences with known chromosome locations. Another 730 ESTs had sequence homology with rice genomic sequences of unknown chromosomal location. More than half (10,116) of the mapped ESTs are from single-copy genes. The rest of the mapped ESTs (8,728) detected two or more loci in the rice genome. Thus, an EST map containing 66,623 loci was constructed. These data provide information for comparative studies of gene expression, gene structure and genome evolution of the two subspecies of cultivated rice.

Media summary

Indica rice ESTs will provide information for comparative studies of gene expression, gene structure and genome evolution of the two subspecies of cultivated rice.

Keywords

Transcript map, EST database

Introduction

A total of 201,794 ESTs (http://www.ncbi.nlm.nih.gov, release 062003) and 28,469 full-length cDNAs (The Rice Full-Length cDNA Consortium, 2003) of rice are currently available in the public nucleotide database GenBank. The EST and cDNA sequences in the database represent an estimated 39,787 rice genes, based on our clustering analysis using the program ESTClustering (Zhang et al. 2003). The number of rice genes represented by the EST clusters and full-length cDNA sequences is far smaller than the number of rice genes predicted by whole-genome sequence analysis (Goff et al. 2002; Yu et al. 2002). Cultivated rice (Oryza sativa L.) is divided into two subspecies, indica and japonica. Most of the ESTs (61.9%) (http://www.ncbi.nlm.nih.gov, release 062003) and all of the 28,469 full-length cDNAs (The Rice Full-Length cDNA Consortium, 2003) of rice released in the public databases are from japonica cultivars. Comparison of a 2.3-Mbp region between the two rice subspecies revealed divergence in total gene number, single-nucleotide polymorphisms in coding regions, and insertions and deletions in coding regions (Feng et al. 2002). These results indicate that a large-scale analysis of expressed sequences from indica cultivars is needed.

To obtain expressed sequences of rice, especially those from rarely expressed genes, we constructed a normalized whole-life-cycle cDNA library of rice. This library consisted of cDNA from 15 tissues of 9 developmental stages (Chu et al. 2003). Inverse Northern blotting showed that this cDNA library included many rarely expressed sequences. Thus it is a valuable source for identification of new ESTs. Furthermore, the normalized cDNA library was constructed using tissues from the indica cultivar Minghui 63, an elite restorer line for a number of rice hybrids that are widely cultivated in China. The hybrids produced with Minghui 63 have many important virtues, such as high yield and wide adaptability, which enable these hybrids to account for more than 20% of the total rice production area in China during the last two decades. Characterization of the genome of Minghui 63 at the mRNA level will facilitate the identification of novel genes controlling important agricultural traits.

The present study was undertaken to analyze the sequences of the cDNA clones in the normalized whole-life-cycle cDNA library constructed with Minghui 63, and to determine the chromosomal locations of the ESTs.

Materials and methods

Analysis of cDNA sequences

The clones from a normalized whole-life-cycle cDNA library (Chu et al. 2003), constructed with rice cultivar Minghui 63 (O. sativa ssp. indica), were sequenced from the 5’ ends using the T7 primer. The ESTs were obtained by clustering and assembling the cDNA sequences using a modified version of the ESTClustering program (Zhang et al. 2003). Two sequences were assembled only if their overlapping region was longer than 40 bp and the nucleotide identity within the overlap was greater than 94%.
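
The assembly criteria stated above can be written as a simple acceptance test. The following is a minimal sketch only, not the ESTClustering implementation: the function name `can_merge` and its arguments are hypothetical, and it assumes that an alignment step has already reported the overlap length and the number of identical positions.

```python
def can_merge(overlap_len: int, identical_bases: int,
              min_overlap: int = 40, min_identity_pct: int = 94) -> bool:
    """Return True if two cDNA reads satisfy the stated assembly criteria.

    overlap_len     -- length (bp) of the overlapping region between two reads
    identical_bases -- number of identical nucleotides within that overlap
    Both thresholds are strict ("larger than"), as stated in the text.
    """
    if overlap_len <= min_overlap:
        return False
    # Integer arithmetic avoids floating-point edge cases at exactly 94%.
    return identical_bases * 100 > min_identity_pct * overlap_len


# Hypothetical examples
print(can_merge(100, 95))  # overlap 100 bp, 95% identity
print(can_merge(40, 40))   # overlap is not longer than 40 bp
print(can_merge(100, 94))  # identity is not greater than 94%
```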

Mapping of ESTs

The ESTs were mapped onto the rice chromosomes by BLASTN (Altschul et al. 1997) searches for homologous rice genomic sequences with known chromosomal locations (http://rgp.dna.affrc.go.jp and http://www.genome.clemson.edu/projects/rice/fpc), using a threshold of E ≤ 10⁻⁵ and an overlapping region longer than 40 bp. The latest high-density molecular linkage map of rice (JRGP RFLP 2000), containing 3,267 RFLP markers (http://rgp.dna.affrc.go.jp/publicdata/geneticmap2000/index.html), was used as the framework map for mapping of the ESTs.
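
The two mapping thresholds can be illustrated as a filter over BLASTN hits. This is a sketch under stated assumptions rather than the pipeline used in the study: it assumes the hits are available in the 12-column tab-separated layout that current BLAST+ releases write with `-outfmt 6` (query, subject, percent identity, alignment length, ..., E value, bit score), and the EST and clone names in the example are invented.

```python
import csv
import io


def mapped_hits(tabular_text, max_evalue=1e-5, min_overlap=40):
    """Keep hits with E <= max_evalue and an aligned region longer than min_overlap bp.

    Assumes 12-column tab-separated BLAST output (an assumption, see lead-in):
    column 0 = query, 1 = subject, 3 = alignment length, 10 = E value.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        aln_len = int(row[3])
        evalue = float(row[10])
        if evalue <= max_evalue and aln_len > min_overlap:
            hits.append((query, subject, aln_len, evalue))
    return hits


# Hypothetical hits: only the first passes both thresholds.
example = (
    "EST0001\tchr01_clone\t98.5\t350\t5\t0\t1\t350\t1000\t1349\t1e-150\t600\n"
    "EST0001\tchr05_clone\t90.0\t38\t3\t0\t1\t38\t200\t237\t1e-6\t50\n"
    "EST0002\tchr02_clone\t85.0\t120\t18\t0\t1\t120\t500\t619\t0.001\t40\n"
)
for hit in mapped_hits(example):
    print(hit)
```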

Results and discussion

Sequence analysis of the cDNA library

The normalized whole-life-cycle cDNA library consisted of 62,000 clones with an average insert length of 1.4 kb (Chu et al. 2003). Random sequencing of about 40,000 cDNA clones from this library generated 39,208 readable sequences ranging from 100 to 1,495 bp, with an average length of 609 bp. Clustering and assembling of these sequences with a modified version of the ESTClustering program produced 22,409 unique ESTs. The ESTs ranged from 100 to 2,996 bp, with an average length of 612 bp. All the sequences are available to the public through our web site: Rice EST Database, REDB (http://redb.ricefgchina.org or http://bioinformatics.hzau.edu.cn). Most of the ESTs (77.8%) were singletons. Approximately 16% and 5% of the ESTs were assembled from 2 to 3 and 4 to 8 homologous sequences, respectively. Fewer than 2% of the ESTs were assembled from more than 10 homologous sequences.

Analysis of the 22,409 ESTs using the BLASTN program revealed 2,643 (11.8%) new rice ESTs that had no match or a poor match (E > 10⁻⁵) to the 201,794 rice EST entries and 28,469 full-length rice cDNA entries in GenBank. Since the ESTs were from a cDNA library constructed with tissues from 9 developmental stages of plants challenged with abiotic and biotic stresses (Chu et al. 2003), these novel ESTs may mainly represent tissue-specific or stress-responsive genes.

Construction of a high-density EST map

A total of 18,844 (84.1%) of the ESTs from the indica cultivar Minghui 63 could be mapped on the rice molecular linkage map by BLASTN searches against rice genome sequences, which were from the japonica cultivar Nipponbare (E ≤ 10⁻⁵; overlapping regions longer than 40 bp) (Table 1). The number of ESTs anchored on the chromosomes decreased gradually as the stringency of the mapping threshold was increased (Fig. 1). For example, only 28.4% of the ESTs could be mapped on the molecular linkage map when the E-value threshold was set at 0. This suggests that a large portion of the genes from indica and japonica cultivars show some degree of sequence divergence.

More than half (53.7%) of the 18,844 ESTs mapped on the molecular linkage map are from single-copy genes (Table 1). The rest of the ESTs (8,728) detected two or more loci in the rice genome. Thus, a total of 66,623 loci were assigned. The multiple loci detected by one EST are marked by a lower-case letter (a to z) following the EST name, with the ‘a’ locus having the highest sequence identity to the EST. When more than 26 loci are detected by one EST, the loci after the 26th are marked by ‘*’ following the EST name. The EST map can be obtained from our EST Database, REDB (http://redb.ricefgchina.org or http://bioinformatics.hzau.edu.cn). Among the 2,643 new rice ESTs, 710 were assigned to the molecular linkage map of rice. Five hundred and thirty-six of the 710 mapped novel ESTs each detected one locus, and the remaining 174 each detected two or more loci in the rice genome. A striking feature of the distribution of the multiple-copy ESTs is that the multiple loci detected by 7,507 (86%) of them are located on different chromosomes (Table 1). It has been reported that multiple loci detected by the same EST or genomic DNA probe, such as retrotransposon and RFLP markers, are frequently distributed at similar locations on different rice chromosomes (Wang et al. 1999 and 2000; Xiong et al. 2002; Chu et al. 2004). These findings led to the hypothesis that chromosome duplication followed by diversification may be a mechanism for the origin and evolution of the rice chromosomes. Recent studies suggest that partial (Vandepoele et al. 2003) or complete (Paterson et al. 2003) genome duplication in rice predates the divergence of rice and other cereals. The distribution pattern of the 7,507 multi-locus ESTs in the rice genome may provide further evidence for the hypothesis of chromosomal duplication followed by diversification.
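
The locus-naming convention described above can be sketched as follows. This is an illustration only: the function `label_loci` and the EST and locus names are hypothetical. Two points are assumptions not stated in the text: that a single-locus EST carries no suffix, and that the letters after ‘a’ also follow decreasing sequence identity (the text specifies this only for the ‘a’ locus).

```python
import string


def label_loci(est_name, loci):
    """Assign map labels to the loci detected by one EST.

    loci -- list of (locus_id, percent_identity) pairs.
    Returns a list of (label, locus_id) pairs, best match first.
    """
    ranked = sorted(loci, key=lambda item: item[1], reverse=True)
    if len(ranked) == 1:
        # Assumption: single-copy ESTs are shown under the plain EST name.
        return [(est_name, ranked[0][0])]
    labelled = []
    for rank, (locus_id, _identity) in enumerate(ranked):
        # Letters a to z for the first 26 loci, '*' for any further locus.
        suffix = string.ascii_lowercase[rank] if rank < 26 else "*"
        labelled.append((est_name + suffix, locus_id))
    return labelled


# Hypothetical EST detecting three loci
print(label_loci("EST0001", [("chr05_x", 95.0), ("chr01_y", 99.0), ("chr11_z", 91.5)]))
```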

Figure 1. Proportion of the ESTs anchored on the rice molecular linkage map through sequence homology analysis at different thresholds of the expect value. The number of mapped ESTs represented by each bar is 18,844, 18,658, 18,169, 17,621, 12,388 and 6,369 from left to right.
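
As a short worked check, the proportion behind each bar follows from dividing the mapped counts in the caption by the 22,409 unique ESTs. The sketch below uses only numbers given in the text; the variable names are illustrative.

```python
TOTAL_UNIQUE_ESTS = 22409
# Mapped ESTs per bar of Fig. 1, left to right (from the caption)
mapped_counts = [18844, 18658, 18169, 17621, 12388, 6369]

proportions = [round(100 * n / TOTAL_UNIQUE_ESTS, 1) for n in mapped_counts]
for count, pct in zip(mapped_counts, proportions):
    print(f"{count:>6} ESTs mapped = {pct}% of all unique ESTs")
```

The first and last values reproduce the 84.1% and 28.4% quoted in the text.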

Number    Number of ESTs                                          Percentage
of loci   Total    Multi-loci on            Multi-loci on         (%)
                   different chromosomes    the same chromosome
1         10116                                                   53.7
2         3915     3574                     341                   20.8
3         1711     1687                     24                    9.1
4         803      803                                            4.3
5         514      514                                            2.7
6         321      321                                            1.7
7         213      213                                            1.1
8         156      156                                            0.8
9         132      132                                            0.7
10        107      107                                            0.6
>10       856      856                                            4.5
Total     18844    7507                     365                   100

Table 1. Distribution of loci detected by the ESTs in the rice genome
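
The totals and percentages in Table 1 can be recomputed from the per-row EST counts. The sketch below uses only the "Total" column of the table and figures quoted in the text; the variable names are illustrative.

```python
# "Total" column of Table 1: number of loci detected -> number of ESTs
ests_by_locus_count = {
    "1": 10116, "2": 3915, "3": 1711, "4": 803, "5": 514, "6": 321,
    "7": 213, "8": 156, "9": 132, "10": 107, ">10": 856,
}

total_mapped = sum(ests_by_locus_count.values())
multi_copy = total_mapped - ests_by_locus_count["1"]
percentages = {
    loci: round(100 * n / total_mapped, 1)
    for loci, n in ests_by_locus_count.items()
}
# Share of multi-copy ESTs whose loci lie on different chromosomes (7,507 in the text)
share_different = round(100 * 7507 / multi_copy)

print(total_mapped, multi_copy, share_different)
print(percentages)
```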

About 16% (3,565) of the ESTs studied could not be mapped onto the molecular linkage map by sequence homology analysis against the rice genome sequences with known chromosomal locations. Among these unmapped ESTs, 730 (including 127 new ESTs) had sequence homology with rice genomic sequences of unknown chromosomal location released by GenBank and the Beijing Genomics Institute (http://btn.genomics.org.cn/rice/) (E ≤ 10⁻⁵; overlapping regions longer than 40 bp). The remaining 2,835 ESTs may have found no match for the following reasons. First, sequencing of the rice genome has not been completed; the finished sequences of rice chromosomes 1, 4 and 10 cover at most 97.3% of those chromosomes (Feng et al. 2002; Sasaki et al. 2002; The Rice Chromosome 10 Sequencing Consortium 2003). Second, the genomic sequences with known chromosomal locations that were used for the homology analysis of the ESTs from the indica cultivar were from the japonica cultivar Nipponbare, and divergence at the level of gene structure and nucleotide sequence between the indica and japonica subspecies has been observed (Feng et al. 2002; Sasaki et al. 2002; Yu et al. 2002; The Rice Chromosome 10 Sequencing Consortium 2003). Third, some of the ESTs might represent contamination from non-rice sources.

Conclusions

The indica subspecies of cultivated rice occupies the largest rice production area in the world. This is the first report of a large-scale collection, annotation and mapping of ESTs from an indica cultivar. This information will greatly facilitate the annotation of the indica rice genome and the comparison of the evolution between indica and japonica subspecies at the level of gene expression.

References

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402.

Chu Z, Peng K, Zhang L, Zhou B, Wei J, Wang S (2003) Construction and characterization of a normalized whole-life-cycle cDNA library of rice. Chinese Sci Bulletin 48, 229-235.

Chu Z, Ouyang Y, Zhang J, Yang H, Wang S (2004) Genome-wide analysis of defense-responsive genes in bacterial blight resistance of rice mediated by a recessive R gene, xa13. Mol Gen Genomics (in press).

Feng Q, Zhang Y, Hao P, Wang S, Fu G, Huang Y, Li Y, Zhu J, Liu Y, Hu X, Jia P, Zhang Y, Zhao Q, Ying K, Yu S, Tang Y, Weng Q, Zhang L, Lu Y, Mu J, Lu Y, Zhang LS, Yu Z, Fan D, Liu X, Lu T, Li C, Wu Y, Sun T, Lei H, Li T, Hu H, Guan J, Wu M, Zhang R, Zhou B, Chen Z, Chen L, Jin Z, Wang R, Yin H, Cai Z, Ren S, Lv G, Gu W, Zhu G, Tu Y, Jia J, Zhang Y, Chen J, Kang H, Chen X, Shao C, Sun Y, Hu Q, Zhang X, Zhang W, Wang L, Ding C, Sheng H, Gu J, Chen S, Ni L, Zhu F, Chen W, Lan L, Lai Y, Cheng Z, Gu M, Jiang J, Li J, Hong G, Xue Y, Han B (2002) Sequence and analysis of rice chromosome 4. Nature 420, 316-320.

Goff S A, Ricke D, Lan T H, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun W L, Chen L, Cooper B, Park S, Wood T C, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S (2002) A draft sequence of the rice genome (Oryza sativa L, ssp. japonica). Science 296, 92-100.

Paterson AH, Bowers JE, Peterson DG, Estill JC, Chapman BA (2003) Structure and evolution of cereal genomes. Curr Opin Genet Dev 13, 644-650.

Sasaki T, Matsumoto T, Yamamoto K, Sakata K, Baba T, Katayose Y, Wu J, Niimura Y, Cheng Z, Nagamura Y, Antonio BA, Kanamori H, Hosokawa S, Masukawa M, Arikawa K, Chiden Y, Hayashi M, Okamoto M, Ando T, Aoki H, Arita K, Hamada M, Harada C, Hijishita S, Honda M, Ichikawa Y, Idonuma A, Iijima M, Ikeda M, Ikeno M, Ito S, Ito T, Ito Y, Ito Y, Iwabuchi A, Kamiya K, Karasawa W, Katagiri S, Kikuta A, Kobayashi N, Kono I, Machita K, Maehara T, Mizuno H, Mizubayashi T, Mukai Y, Nagasaki H, Nakashima M, Nakama Y, Nakamichi Y, Nakamura M, Namiki N, Negishi M, Ohta I, Ono N, Saji S, Sakai K, Shibata M, Shimokawa T, Shomura A, Song J, Takazaki Y, Terasawa K, Tsuji K, Waki K, Yamagata H, Yamane H, Yoshiki S, Yoshihara R, Yukawa K, Zhong H, Iwama H, Endo T, Ito H, Hahn JH, Kim HI, Eun MY, Yano M, Jiang J, Gojobori T (2002) The genome sequence and structure of rice chromosome 1. Nature 420, 312-316.

The Rice Chromosome 10 Sequencing Consortium (2003) In-depth view of structure, activity, and evolution of rice chromosome 10. Science 300, 1566-1569.

The Rice Full-Length cDNA Consortium (2003) Collection, Mapping, and Annotation of over 28,000 cDNA clones from japonica rice. Science 301, 376-379.

Vandepoele K, Simillion C, Van de Peer Y (2003) Evidence that rice and other cereals are ancient aneuploids. Plant Cell 15, 2192-2202.

Wang S, Liu N, Peng K, Zhang Q (1999) The distribution and copy number of copia-like retrotransposons in rice (Oryza sativa L.) and their implications in the organization and evolution of the rice genome. Proc Natl Acad Sci USA 96, 6824-6828.

Wang S, Liu K, Zhang Q (2000) Segmental duplications are common in rice genome. Acta Botanica Sinica 42, 1150-1155.

Xiong M, Wang S, Zhang Q (2002) Coincidence in map positions between pathogen-induced defense-responsive genes and quantitative resistance loci in rice. Science in China (Series C) 45, 518-526.

Yu J, Hu S, Wang J, Wong G K, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79-92.

Zhang LD, Yuan DJ, Zhang JW, Wang S, Zhang Q (2003) A new method for EST clustering. Acta Genetica Sinica 30, 147-153.
