TITLE: Computational Proteomics: Genome-scale Analysis of Protein Structure, Function, & Evolution

Mark Gerstein

P Harrison, J Qian, R Jansen, V Alexandrov, P Bertone, R Das, D Greenbaum, W Krebs, Y Liu, H Hegyi, N Echols, J Lin, C Wilson, A Drawid, Z Zhang, Y Kluger, N Lan, N Luscombe, S Balasubramanian

Molecular Biophysics & Biochemistry Department
Yale University
New Haven, CT
http://bioinfo.mbb.yale.edu

My talk will address two major post-genomic challenges: predicting protein function on a genomic scale and interpreting intergenic regions. I will approach both by analyzing the properties and attributes of proteins in a database framework.

The work on predicting protein function will discuss the strengths and limitations of four approaches: (i) using sequence similarity; (ii) using structural similarity; (iii) clustering microarray experiments; and (iv) data integration. The last approach involves systematically combining information from the other three and holds the most promise for the future. For the sequence analysis, I will present a similarity threshold above which functional annotation can be transferred, and for the microarray analysis, I will present a new method of clustering expression timecourses that finds "time-shifted" relationships.

In the second part of the talk, I will survey the occurrence of pseudogenes in several large eukaryotic genomes, concentrating on grouping them into families and functional categories and comparing these groupings with those of existing "living" genes. In particular, we have found that duplicated pseudogenes have a very different distribution from the one expected if they were derived at random from the population of genes in the genome. They tend to lie at the ends of chromosomes, have a composition intermediate between that of genes and intergenic DNA, and, most importantly, have environmental-response functions.
This suggests that they may be resurrectable protein parts, and there is a potential mechanism for this in yeast.

References

J Qian, B Stenger, C Wilson, J Lin, R Jansen, W Krebs, V Alexandrov, N Echols, S Teichmann, J Park, M Gerstein. "PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information." Nucleic Acids Res 29: 1750-1764 (2001).

P Harrison, H Hegyi, P Bertone, N Echols, T Johnson, S Balasubramanian, N Luscombe, M Gerstein. "Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome." Nucleic Acids Res 29: 818-830 (2001).

J Qian, M Dolled-Filhart, J Lin, H Yu, M Gerstein. "Beyond synexpression relationships: Local Clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions." J Mol Biol 314: 1053-1066 (2001).

R Jansen, D Greenbaum, M Gerstein. "Relating whole-genome expression data with protein-protein interactions." Genome Res 12: 37-46 (2002).

P Harrison, H Hegyi, P Bertone, N Echols, T Johnson, S Balasubramanian, N Luscombe, M Gerstein. "Molecular fossils in the human genome: Identification and analysis of pseudogenes in chromosomes 21 and 22." Genome Res 12: 273-281 (2002).