Human Genome Annotation, Focusing on Pseudogenes

Z Zhang, P Harrison, Y Liu, N Carriero, D Zhang, P Bertone, J Karro, D Milburn, N Echols, J Rinn, M Snyder, M Gerstein
MB&B Dept., Yale University

A central problem for 21st-century science will be the analysis and understanding of the human genome. My talk will address topics within this area, in particular the annotation of pseudogenes (protein fossils) in the genome. I will discuss the comprehensive pseudogene identification pipeline and storage database we have built. These have enabled us to identify more than 10,000 pseudogenes in the human and mouse genomes and to analyze their distribution with respect to age, protein family, and chromosomal location. One notable finding is the large number of ribosomal protein pseudogenes in the human genome: 80 functional ribosomal proteins have given rise to ~2,000 ribosomal protein pseudogenes. At the end, I will talk more broadly about pseudogenes in terms of their composition and mutation rates, and I will compare human pseudogenes with those in a number of other model organisms, including worm, fly, yeast, and various prokaryotes. I will also discuss how the problem of identifying pseudogenes relates to the overall problem of finding genes in a genome.

http://bioinfo.mbb.yale.edu
http://pseudogene.org

Comparative analysis of processed pseudogenes in the mouse and human genomes. Z Zhang, N Carriero, M Gerstein (2004) Trends Genet 20: 62-7.
Identification of pseudogenes in the Drosophila melanogaster genome. PM Harrison, D Milburn, Z Zhang, P Bertone, M Gerstein (2003) Nucleic Acids Res 31: 1033-7.
Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Z Zhang, PM Harrison, Y Liu, M Gerstein (2003) Genome Res 13: 2541-58.
Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes. Z Zhang, M Gerstein (2003) Nucleic Acids Res 31: 5338-48.
Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. PM Harrison, M Gerstein (2002) J Mol Biol 318: 1155-74.
Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. Z Zhang, P Harrison, M Gerstein (2002) Genome Res 12: 1466-82.
Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome. PM Harrison, N Echols, MB Gerstein (2001) Nucleic Acids Res 29: 818-30.