Award Title:
Arabidopsis 2010: Large-scale fluorescent tagging of full-length genes to characterize native expression patterns and subcellular targeting of Arabidopsis proteins of unknown function
Award Abstract:
This pilot project will develop a high-throughput strategy to analyze native expression patterns and subcellular localization of Arabidopsis gene products of unknown function. This strategy, Fluorescent Tagging of Full-Length Proteins (FTFLP), will comprise five major steps: (1) selection of “functionally unassigned” Arabidopsis genes and prediction of their protein structure and of a suitable site for fluorescent tag insertion; (2) amplification of each gene in two parts, with the junction between the two parts corresponding to the chosen insertion site for the fluorescent tag; (3) introduction of the fluorescent tag, yellow fluorescent protein (YFP), using a triple overlap PCR approach; (4) insertion of the PCR products into binary vectors; and (5) production of transgenic Arabidopsis lines and analysis of the expression pattern and intracellular localization of each tagged protein.
As a pilot approach, the project aims to analyze a statistically significant number of genes to support the applicability of the strategy to a subsequent, wider study. To this end, approximately 800 genes (listed at the already operational project website http://arabidopsis.org/info/2010_projects/proteintagging.html) were selected from a total of ca. 8,000 unknown genes. This pilot list was chosen based on the following sequentially applied criteria: the genes 1) have a matching full-length cDNA, 2) are annotated as ‘unknown protein’ or ‘putative protein’, and 3) do not have any Gene Ontology annotations. The selected genes reflect the diversity of all the unknown Arabidopsis genes with respect to plant specificity, predicted domain and/or gene family information, and availability of matching full-length cDNA sequences.
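The three sequentially applied selection criteria can be illustrated with a minimal Python sketch. It is not the project's actual selection pipeline: the `GeneRecord` fields, the function name `select_pilot_candidates`, and the placeholder locus identifiers are all hypothetical, and the annotation strings are matched only against the two labels named above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneRecord:
    """Hypothetical, simplified view of one annotated gene."""
    locus_id: str
    has_full_length_cdna: bool
    annotation: str
    go_terms: List[str] = field(default_factory=list)


# The two annotation labels named in the abstract (compared case-insensitively).
UNKNOWN_ANNOTATIONS = {"unknown protein", "putative protein"}


def select_pilot_candidates(genes: List[GeneRecord]) -> List[GeneRecord]:
    """Apply the three criteria one after another, preserving input order."""
    # Criterion 1: a matching full-length cDNA exists.
    with_cdna = [g for g in genes if g.has_full_length_cdna]
    # Criterion 2: annotated as 'unknown protein' or 'putative protein'.
    unknown = [
        g for g in with_cdna
        if g.annotation.strip().lower() in UNKNOWN_ANNOTATIONS
    ]
    # Criterion 3: no Gene Ontology annotations at all.
    return [g for g in unknown if not g.go_terms]


if __name__ == "__main__":
    # Placeholder records, not real Arabidopsis loci.
    example = [
        GeneRecord("GENE_A", True, "unknown protein"),
        GeneRecord("GENE_B", False, "unknown protein"),
        GeneRecord("GENE_C", True, "putative protein", ["GO:0000000"]),
    ]
    for gene in select_pilot_candidates(example):
        print(gene.locus_id)
```

Because each criterion only removes genes, the order of the filters does not change the final list; applying them in sequence simply makes it easy to count how many genes survive each step.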
FTFLP as a tool for functional proteomics offers
three significant advantages: it focuses on genes of unknown function, it
produces internally tagged full-length proteins that are more likely to exhibit
faithful intracellular localization, and it utilizes native promoters to allow
us to determine tissue specificity. Three deliverables will be offered to the
research community:
1) Expression vectors harboring full-length sequences for each gene under its native promoter and tagged with YFP flanked by unique restriction sites,
2) Arabidopsis transgenic lines expressing each construct, and
3) A website and a searchable database containing information about the lines and constructs, including the gene sequences highlighted with positions of primers and tagging sites, vector construct information, images and text descriptions of the protein expression pattern and intracellular localization, and protocols and standard operating procedures for experimentation, analysis, and interpretation.
In addition, a Reference Protein Subcellular Localization Map will be constructed using fluorescently tagged proteins with known intracellular targeting.
These resources will be available to the public
through two unrestricted venues: DNA constructs and transgenic seeds will be
distributed through the Arabidopsis
Biological Resource Center (ABRC) whereas gene sequences and expression and
subcellular localization data, including fluorescence microscopy images, will
be disseminated via the project website integrated into The Arabidopsis Information Resource (TAIR).
Importantly, the resources and results of this project will be shared through
ABRC and TAIR, respectively, on a continuous basis as the
deliverables become available. Announcements of newly available
resources will be made through electronic media such as the Bionet USENET newsgroups and parallel e-mail lists.
This project significantly advances the overall objectives of the 2010 Project by characterizing on a large scale the expression and subcellular localization of unknown Arabidopsis genes. Our understanding of Arabidopsis biology will be glaringly incomplete without such knowledge. In addition, this project has a broader impact on society and science. Once this pilot project demonstrates the feasibility of the proposed approach, it will serve as a basis for developing a laboratory curriculum for cell biology training of high school students and teachers, as well as beginning investigators, at the CSHL DNA Learning Center, the annual Arabidopsis Molecular Genetics Course, and the biannual UCR Plant Cell Biology course. Finally, a teaching outreach program with community colleges will involve undergraduates in summer research. Thus, our program will bridge genomic approaches with cell biology in the laboratory and classroom, and generate important novel information and tools to characterize the Arabidopsis proteome.