Analysis of Genomes and Transcriptomes in terms of the Occurrence of Protein Parts and Features

Mark Gerstein
Molecular Biophysics & Biochemistry Department
Yale University
New Haven, CT 06520

My talk will focus on analyzing genomes and gene-expression data in terms of the finite list of protein "parts". Depending on context, a part could be a structural fold or a sequence superfamily. I will touch on the following topics:

* How one can compare different genomes in terms of the occurrence of various parts in them, and how this idea can be extended to compare the representation of parts in the genome with that in the transcriptome. In particular, this comparison shows which protein features are enriched in highly expressed proteins.

* How one can analyze the relationship between where a part is located and its occurrence in the transcriptome, that is, between a protein's subcellular localization and its level of gene expression. We extend this work to develop a formal Bayesian system for predicting subcellular localization, partially based on gene expression data.

* To what degree protein function and protein-protein interactions are related to similarities in the level of gene expression. Using a formalism for statistical significance, I will argue that while there is a definite relationship for certain classes of protein functions and protein-protein interactions, the relationship is not general and global. The absence of a global correlation is principally due to the inconsistent way protein function is defined.

REFERENCES

http://bioinfo.mbb.yale.edu

M Gerstein & R Jansen (2000). "The current excitement in bioinformatics, analysis of whole-genome expression data: How does it relate to protein structure and function?" Curr. Opin. Struc. Biol. (in press).

A Drawid, R Jansen & M Gerstein (2000). "Gene Expression Levels are Correlated with Protein Subcellular Localization," Trends in Genetics 16: 426-430.

A Drawid & M Gerstein (2000). "A Bayesian System Integrating Expression Data with Sequence Patterns for Localizing Proteins: Comprehensive Application to the Yeast Genome," J. Mol. Biol. 301: 1059-1075.

R Jansen & M Gerstein (2000). "Analysis of the Yeast Transcriptome with Broad Structural and Functional Categories: Characterizing Highly Expressed Proteins," Nuc. Acids Res. 28: 1481-1488.

M Gerstein (1998). "Patterns of Protein-Fold Usage in Eight Microbial Genomes: A Comprehensive Structural Census," Proteins 33: 518-534.