Below is an abstract for my general Spring 2000 talk.

Mark Gerstein

--

Title: Comparative Genomics: Surveys of a Finite Parts List

Abstract: My talk will deal with topics from comparative genomics, structural genomics, and large-scale analysis of gene-expression data. I will focus on how the "finite list of protein parts" can help simplify and interpret genome sequences and expression data. I'll try to touch on some of the specific points below:

1. A PARTS LIST OF FOLDS. An essential requirement for a structure survey is a library of folds, which groups the known structures into "fold families." I will describe various aspects of a fold library, including methods of structural alignment and the importance of objective statistical measures for assessing similarities within the library. I will also describe how a library of folds can be used to precisely parameterize the degree to which structural annotation can be transferred as a function of sequence divergence.

2. THE OCCURRENCE OF PARTS IN GENOMES. One can use a fold library to count the number of folds in genomes, expressing the results in the form of Venn diagrams, "top-10" lists, and whole-genome fold trees for shared and common folds. One particular analysis shows that the common folds shared between very different microorganisms - i.e. in different kingdoms - have a remarkably similar supersecondary structure, consisting of repeated strand-helix-strand units.

3. FUNCTIONS OF THE PARTS. Next, I will turn to relating structure to function, looking comprehensively at how many functional roles a given structural part can have. This will focus on the occurrence of folds and functions in the yeast genome. Also, I will present a measure of the degree to which functional annotation can be transferred at a given level of sequence identity.

4. USING PARTS TO ANALYZE EXPRESSION DATA. Finally, I will look at how protein parts and their attributes can be used to help analyze whole-genome expression data.
I will look at the structural and functional characteristics of the most highly expressed proteins in the yeast transcriptome and develop a list of the most highly expressed folds. I will show that simple clustering of microarray data does not necessarily find proteins of similar function. I will also show how expression level is correlated with the subcellular localization of proteins and how this fact can be combined with sequence patterns to predict localization in a Bayesian framework.

Continuously updated tables and further information pertinent to this talk are available over the web at http://bioinfo.mbb.yale.edu/genome. The talk is available from http://bioinfo.mbb.yale.edu/lectures, sublink http://bioinfo.mbb.yale.edu/lectures/spring2000 .
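
As a rough illustration of the Bayesian idea mentioned in point 4 (not the actual method presented in the talk), the sketch below updates a prior over subcellular compartments using the likelihood of observed features, assuming the features are independent given the compartment. The function name, compartment names, feature names, and all probabilities are hypothetical placeholders chosen only to make the arithmetic concrete.

```python
def posterior(prior, likelihoods, observed):
    """Naive Bayes update: P(loc | features) is proportional to
    P(loc) times the product of P(feature | loc) over observed features."""
    unnormalized = {}
    for loc, p in prior.items():
        for feature in observed:
            p *= likelihoods[loc][feature]
        unnormalized[loc] = p
    total = sum(unnormalized.values())
    # Normalize so the posterior probabilities sum to 1.
    return {loc: p / total for loc, p in unnormalized.items()}


# Hypothetical prior over compartments (illustrative values, not real data).
prior = {"cytoplasm": 0.5, "nucleus": 0.3, "membrane": 0.2}

# Hypothetical P(feature | compartment) values (illustrative, not real data).
likelihoods = {
    "cytoplasm": {"high_expression": 0.6, "signal_motif": 0.1},
    "nucleus": {"high_expression": 0.2, "signal_motif": 0.7},
    "membrane": {"high_expression": 0.1, "signal_motif": 0.2},
}

post = posterior(prior, likelihoods, ["high_expression", "signal_motif"])
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

In this toy setup the expression-level feature alone favors the cytoplasm, but combining it with the sequence-pattern feature shifts the most probable compartment to the nucleus, which is the kind of evidence combination the abstract describes.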