Social insect genomes: dramatic evolution in gene composition & regulation, preserving regulatory features linked to sociality. Simola DF, Wissler L, Donahue G, Waterhouse RM, Helmkampf M, Roux J, Nygaard S, Glastad K, Hagen DE, Viljakainen L, Reese JT, Hunt BG, Graur D, Elhaik E, Kriventseva E, Wen J, Parker BJ, Cash E, Privman E, Childers CP, Munoz-Torres MC, Boomsma JJ, Bornberg-Bauer E, Currie C, Elsik CG, Suen G, Goodisman MA, Keller L, Liebig J, Rawls A, Reinberg D, Smith CD, Smith CR, Tsutsui N, Wurm Y, Zdobnov EM, Berger SL, Gadau J. Genome Research PMID: 23636946 Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains ~4,000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared to Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of non-coding regulatory elements; however, extant conserved regions are enriched for novel non-coding RNAs and transcription factor binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., CREB) and trans (e.g., Forkhead) for nearly 2000 genes, many of which exhibit phenotypic plasticity. Our results emphasize that genomic changes can occur remarkably fast in ants, as two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared to other ants or Drosophila.
Thus, while the "socio-genomes" of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations.
OrthoDB: a hierarchical catalog of animal, fungal and bacterial orthologs. Waterhouse RM, Tegenfeldt F, Li J, Zdobnov EM, Kriventseva EV. Nucleic Acids Res. PMID: 23180791 The concept of orthology provides a foundation for formulating hypotheses on gene and genome evolution, and thus forms the cornerstone of comparative genomics, phylogenomics and metagenomics. We present the update of OrthoDB, the hierarchical catalog of orthologs (http://www.orthodb.org). From its conception, OrthoDB promoted delineation of orthologs at varying resolution by explicitly referring to the hierarchy of species radiations, now also adopted by other resources. The current release provides comprehensive coverage of animals and fungi representing 252 eukaryotic species, and is now extended to prokaryotes with the inclusion of 1115 bacteria. Functional annotations of orthologous groups are provided through mapping to InterPro, GO, OMIM and model organism phenotypes, with cross-references to major resources including UniProt, NCBI and FlyBase. Uniquely, OrthoDB provides computed evolutionary traits of orthologs, such as gene duplicability and loss profiles, divergence rates and sibling groups, now extended with exon-intron architectures, syntenic orthologs and parent-child trees. The interactive web interface allows navigation along the species phylogenies, complex queries with various identifiers, annotation keywords and phrases, as well as with gene copy-number profiles and sequence homology searches. With the explosive growth of available data, OrthoDB also provides mapping of newly sequenced genomes and transcriptomes to the current orthologous groups.
Identification of Site-Specific Adaptations Conferring Increased Neural Cell Tropism during Human Enterovirus 71 Infection. Cordey S, Petty TJ, Schibler M, Martinez Y, Gerlach D, van Belle S, Turin L, Zdobnov EM, Kaiser L, Tapparel C. PLoS Pathog PMID: 22910880 Enterovirus 71 (EV71) is one of the most virulent enteroviruses, but the specific molecular features that enhance its ability to disseminate in humans remain unknown. We analyzed the genomic features of EV71 in an immunocompromised host with disseminated disease according to the different sites of infection. Comparison of five full-length genomes sequenced directly from respiratory, gastrointestinal, nervous system, and blood specimens revealed three nucleotide changes that occurred within a five-day period: a non-conservative amino acid change in VP1 located within the BC loop (L97R), a region considered as an immunogenic site and possibly important in poliovirus host adaptation; a conservative amino acid substitution in protein 2B (A38V); and a silent mutation in protein 3D (L175). Infectious clones were constructed using both BrCr (lineage A) and the clinical strain (lineage C) backgrounds containing either one or both non-synonymous mutations. In vitro cell tropism and competition assays revealed that the VP1(97) Leu to Arg substitution within the BC loop conferred a replicative advantage in SH-SY5Y cells of neuroblastoma origin. Interestingly, this mutation was frequently associated in vitro with a second non-conservative mutation (E167G or E167A) in the VP1 EF loop in neuroblastoma cells. Comparative models of these EV71 VP1 variants were built to determine how the substitutions might affect VP1 structure and/or interactions with host cells and suggest that, while no significant structural changes were observed, the substitutions may alter interactions with host cell receptors.
Taken together, our results show that the VP1 BC loop region of EV71 plays a critical role in cell tropism independent of EV71 lineage and, thus, may have contributed to dissemination and neurotropism in the immunocompromised patient.
A remarkably stable TipE gene cluster: evolution of insect Para sodium channel auxiliary subunits. Li J, Waterhouse RM and Zdobnov EM. BMC Evolutionary Biology 2011, 11:337 (18 November 2011) PMID: 22098672
Background
First identified in fruit flies with temperature-sensitive paralysis phenotypes, the Drosophila melanogaster TipE locus encodes four voltage-gated sodium (NaV) channel auxiliary subunits. This cluster of TipE-like genes on chromosome 3L, and a fifth family member on chromosome 3R, are important for the optimal expression and functionality of the Para NaV channel but appear quite distinct from auxiliary subunits in vertebrates. Here, we exploited available arthropod genomic resources to trace the origin of TipE-like genes by mapping their evolutionary histories and examining their genomic architectures.
Results
We identified a remarkably conserved synteny block of TipE-like orthologues with well-maintained local gene arrangements from 21 insect species. Homologues in the water flea, Daphnia pulex, suggest an ancestral pancrustacean repertoire of four TipE-like genes; a subsequent gene duplication may have generated functional redundancy allowing gene losses in the silk moth and mosquitoes. Intronic nesting of the insect TipE gene cluster probably occurred following the divergence from crustaceans, but in the flour beetle and silk moth genomes the clusters apparently escaped from nesting. Across Pancrustacea, TipE gene family members have experienced intronic nesting, escape from nesting, retrotransposition, translocation, and gene loss events while generally maintaining their local gene neighbourhoods. D. melanogaster TipE-like genes exhibit coordinated spatial and temporal regulation of expression distinct from their host gene but well-correlated with their regulatory target, the Para NaV channel, suggesting that functional constraints may preserve the TipE gene cluster. We identified homology between TipE-like NaV channel regulators and vertebrate Slo-beta auxiliary subunits of big-conductance calcium-activated potassium (BKCa) channels, which suggests that ion channel regulatory partners have evolved distinct lineage-specific characteristics.
Conclusions
TipE-like genes form a remarkably conserved genomic cluster across all examined insect genomes. This study reveals likely structural and functional constraints on the genomic evolution of insect TipE gene family members maintained in synteny over hundreds of millions of years of evolution. The likely common origin of these NaV channel regulators with BKCa auxiliary subunits highlights the evolutionary plasticity of ion channel regulatory mechanisms.
Rhinovirus genome variation during chronic upper and lower respiratory tract infections. Tapparel C, Cordey S, Junier T, Farinelli L, Van Belle S, Soccal PM, Aubert JD, Zdobnov EM, Kaiser L. PLoS One PMID: 21713005 Routine screening of lung transplant recipients and hospital patients for respiratory virus infections allowed the identification of human rhinovirus (HRV) in the upper and lower respiratory tracts, including in immunocompromised hosts chronically infected with the same strain over weeks or months. Phylogenetic analysis of 144 HRV-positive samples showed no apparent correlation between a given viral genotype or species and its ability to invade the lower respiratory tract or lead to protracted infection. By contrast, protracted infections were found almost exclusively in immunocompromised patients, thus suggesting that host factors, in particular the immune response, rather than the virus genotype modulate disease outcome. Complete genome sequencing of five chronic cases to study rhinovirus genome adaptation showed that the calculated mutation frequency was in the range observed during acute human infections. Analysis of mutation hot spot regions between specimens collected at different times or in different body sites revealed that non-synonymous changes were mostly concentrated in the viral capsid genes VP1, VP2 and VP3, independent of the HRV type. In an immunosuppressed lung transplant recipient infected with the same HRV strain for more than two years, both classical and ultra-deep sequencing of samples collected at different time points in the upper and lower respiratory tracts showed that these virus populations were phylogenetically indistinguishable over the course of infection, except for the last month. Specific signatures were found in the last two lower respiratory tract populations, including changes in the 5'UTR polypyrimidine tract and the VP2 immunogenic site 2.
These results highlight for the first time the ability of a given rhinovirus to evolve in the course of a natural infection in immunocompromised patients and complement data obtained from previous experimental inoculation studies in immunocompetent volunteers.
Loss of Dicer in Sertoli cells has a major impact on the testicular proteome of mice. Papaioannou MD, Lagarrigue M, Vejnar CE, Rolland AD, Kühne F, Aubry F, Schaad O, Fort A, Descombes P, Neerman-Arbez M, Guillou F, Zdobnov EM, Pineau C, Nef S. Molecular & Cellular Proteomics PMID: 20467044 Sertoli cells (SCs) are the central, essential coordinators of spermatogenesis, without which germ cell development cannot occur. We previously showed that Dicer, an RNaseIII endonuclease required for microRNA (miRNA) biogenesis, is absolutely essential for Sertoli cells to mature, survive, and ultimately sustain germ cell development. Here, using isotope-coded protein labeling, a technique for protein relative quantification by mass spectrometry, we investigated the impact of Sertoli cell-Dicer and subsequent miRNA loss on the testicular proteome. We found that a large proportion of proteins (50 out of 130) are up-regulated by more than 1.3-fold in testes lacking Sertoli cell-Dicer, yet that this protein up-regulation is mild, never exceeding a 2-fold change, and is not preceded by alterations of the corresponding mRNAs. Of note, the expression levels of six proteins of interest were further validated using the Absolute Quantification (AQUA) peptide technology. Furthermore, through 3'UTR luciferase assays we identified one up-regulated protein, SOD-1, a Cu/Zn superoxide dismutase whose overexpression has been linked to enhanced cell death through apoptosis, as a likely direct target of three Sertoli cell-expressed miRNAs, miR-125a-3p, miR-872 and miR-24. Altogether, our study, which is one of the few in vivo analyses of miRNA effects on protein output, suggests that, at least in our system, miRNAs play a significant role in translation control.
Silencing of c-Fos expression by microRNA-155 is critical for dendritic cell maturation and function. Dunand-Sauthier I, Santiago-Raber ML, Capponi L, Vejnar CE, Schaad O, Irla M, Seguín-Estévez Q, Descombes P, Zdobnov EM, Acha-Orbea H, Reith W. Blood PMID: 21385848 MicroRNAs (miRNAs) are small, noncoding RNAs that regulate target mRNAs by binding to their 3' untranslated regions. There is growing evidence that microRNA-155 (miR155) modulates gene expression in various cell types of the immune system and is a prominent player in the regulation of innate and adaptive immune responses. To define the role of miR155 in dendritic cells (DCs) we performed a detailed analysis of its expression and function in human and mouse DCs. A strong increase in miR155 expression was found to be a general and evolutionarily conserved feature associated with the activation of DCs by diverse maturation stimuli in all DC subtypes tested. Analysis of miR155-deficient DCs demonstrated that miR155 induction is required for efficient DC maturation and is critical for the ability of DCs to promote antigen-specific T-cell activation. Expression-profiling studies performed with miR155(-/-) DCs and DCs overexpressing miR155, combined with functional assays, revealed that the mRNA encoding the transcription factor c-Fos is a direct target of miR155. Finally, all of the phenotypic and functional defects exhibited by miR155(-/-) DCs could be reproduced by deregulated c-Fos expression. These results indicate that silencing of c-Fos expression by miR155 is a conserved process that is required for DC maturation and function.
The ecoresponsive genome of Daphnia pulex. Colbourne JK, Pfrender ME, Gilbert D, Thomas WK, Tucker A, Oakley TH, Tokishita S, Aerts A, Arnold GJ, Basu MK, Bauer DJ, Cáceres CE, Carmel L, Casola C, Choi JH, Detter JC, Dong Q, Dusheyko S, Eads BD, Fröhlich T, Geiler-Samerotte KA, Gerlach D, Hatcher P, Jogdeo S, Krijgsveld J, Kriventseva EV, Kültz D, Laforsch C, Lindquist E, Lopez J, Manak JR, Muller J, Pangilinan J, Patwardhan RP, Pitluck S, Pritham EJ, Rechtsteiner A, Rho M, Rogozin IB, Sakarya O, Salamov A, Schaack S, Shapiro H, Shiga Y, Skalitzky C, Smith Z, Souvorov A, Sung W, Tang Z, Tsuchiya D, Tu H, Vos H, Wang M, Wolf YI, Yamagata H, Yamada T, Ye Y, Shaw JR, Andrews J, Crease TJ, Tang H, Lucas SM, Robertson HM, Bork P, Koonin EV, Zdobnov EM, Grigoriev IV, Lynch M, Boore JL. Science PMID: 21292972 We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 megabases and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than a third of Daphnia's genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The coexpansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes, including many additional loci within sequenced regions that are otherwise devoid of annotations, are the most responsive genes to ecological challenges.
OrthoDB: the hierarchical catalog of eukaryotic orthologs in 2011. Waterhouse RM, Zdobnov EM, Tegenfeldt F, Li J, Kriventseva EV. Nucleic Acids Res. PMID: 20972218 The concept of homology drives speculation on a gene's function in any given species when its biological roles in other species are characterized. With reference to a specific species radiation, homologous relations define orthologs, i.e. descendants from a single gene of the ancestor. The large-scale delineation of gene genealogies is a challenging task, and the numerous approaches to the problem reflect the importance of the concept of orthology as a cornerstone for comparative studies. Here, we present the updated OrthoDB catalog of eukaryotic orthologs delineated at each radiation of the species phylogeny in an explicitly hierarchical manner for over 100 species of vertebrates, arthropods and fungi (including the metazoa level). New database features include functional annotations, and quantification of evolutionary divergence and relations among orthologous groups. The interface features extended phyletic profile querying and enhanced text-based searches. The ever-increasing sampling of sequenced eukaryotic genomes brings a clearer account of the majority of gene genealogies that will facilitate informed hypotheses of gene function in newly sequenced genomes. Furthermore, uniform analysis across lineages as different as vertebrates, arthropods and fungi with divergence levels varying from several to hundreds of millions of years will provide essential data for uncovering and quantifying long-term trends of gene evolution. OrthoDB is freely accessible from http://cegg.unige.ch/orthodb.
Correlating traits of gene retention, sequence divergence, duplicability and essentiality in vertebrates, arthropods, and fungi. Waterhouse RM, Zdobnov EM, Kriventseva EV. Genome Biol Evol PMID: 21148284 Delineating ancestral gene relations among a large set of sequenced eukaryotic genomes allowed us to rigorously examine links between evolutionary and functional traits. We classified 86% of over 1.36 million protein-coding genes from 40 vertebrates, 23 arthropods, and 32 fungi into orthologous groups, and linked over 90% of them to Gene Ontology or InterPro annotations. Quantifying properties of ortholog phyletic retention, copy-number variation, and sequence conservation, we examined correlations with gene essentiality and functional traits. More than half of vertebrate, arthropod, and fungal orthologs are universally present across each lineage. These universal orthologs are preferentially distributed in groups with almost all single-copy or all multi-copy genes, and sequence evolution of the predominantly single-copy orthologous groups is markedly more constrained. Essential genes from representative model organisms, Mus musculus, Drosophila melanogaster, and Saccharomyces cerevisiae, are significantly enriched in universal orthologs within each lineage, and essential-gene-containing groups consistently exhibit greater sequence conservation than those without. This study of eukaryotic gene repertoire evolution identifies shared fundamental principles and highlights lineage-specific features; it also confirms that essential genes are highly retained and conclusively supports the 'knockout-rate prediction' of stronger constraints on essential gene sequence evolution. However, the distinction between sequence conservation of single- versus multi-copy orthologs is quantitatively more prominent than between orthologous groups with and without essential genes.
The previously under-appreciated difference in the tolerance of gene duplications and contrasting evolutionary modes of "single-copy control" versus "multi-copy license" may reflect a major evolutionary mechanism that allows extended exploration of gene sequence space.
Pathogenomics of Culex quinquefasciatus and meta-analysis of infection responses to diverse pathogens. Bartholomay LC, Waterhouse RM, Mayhew GF, Campbell CL, Michel K, Zou Z, Ramirez JL, Das S, Alvarez K, Arensburger P, Bryant B, Chapman SB, Dong Y, Erickson SM, Karunaratne SH, Kokoza V, Kodira CD, Pignatelli P, Shin SW, Vanlandingham DL, Atkinson PW, Birren B, Christophides GK, Clem RJ, Hemingway J, Higgs S, Megy K, Ranson H, Zdobnov EM, Raikhel AS, Christensen BM, Dimopoulos G, Muskavitch MA. Science PMID: 20929811 The mosquito Culex quinquefasciatus poses a substantial threat to human and veterinary health as a primary vector of West Nile virus (WNV), the filarial worm Wuchereria bancrofti, and an avian malaria parasite. Comparative phylogenomics revealed an expanded canonical C. quinquefasciatus immune gene repertoire compared with those of Aedes aegypti and Anopheles gambiae. Transcriptomic analysis of C. quinquefasciatus genes responsive to WNV, W. bancrofti, and non-native bacteria facilitated an unprecedented meta-analysis of 25 vector-pathogen interactions involving arboviruses, filarial worms, bacteria, and malaria parasites, revealing common and distinct responses to these pathogen types in three mosquito genera. Our findings provide support for the hypothesis that mosquito-borne pathogens have evolved to evade innate immune responses in three vector mosquito species of major medical importance.
Sequencing of Culex quinquefasciatus establishes a platform for mosquito comparative genomics. Arensburger P, Megy K, Waterhouse RM, Abrudan J, Amedeo P, Antelo B, Bartholomay L, Bidwell S, Caler E, Camara F, Campbell CL, Campbell KS, Casola C, Castro MT, Chandramouliswaran I, Chapman SB, Christley S, Costas J, Eisenstadt E, Feschotte C, Fraser-Liggett C, Guigo R, Haas B, Hammond M, Hansson BS, Hemingway J, Hill SR, Howarth C, Ignell R, Kennedy RC, Kodira CD, Lobo NF, Mao C, Mayhew G, Michel K, Mori A, Liu N, Naveira H, Nene V, Nguyen N, Pearson MD, Pritham EJ, Puiu D, Qi Y, Ranson H, Ribeiro JM, Robertson HM, Severson DW, Shumway M, Stanke M, Strausberg RL, Sun C, Sutton G, Tu ZJ, Tubio JM, Unger MF, Vanlandingham DL, Vilella AJ, White O, White JR, Wondji CS, Wortman J, Zdobnov EM, Birren B, Christensen BM, Collins FH, Cornel A, Dimopoulos G, Hannick LI, Higgs S, Lanzaro GC, Lawson D, Lee NH, Muskavitch MA, Raikhel AS, Atkinson PW. Science PMID: 20929810 Culex quinquefasciatus (the southern house mosquito) is an important mosquito vector of viruses such as West Nile virus and St. Louis encephalitis virus, as well as of nematodes that cause lymphatic filariasis. C. quinquefasciatus is one species within the Culex pipiens species complex and can be found throughout tropical and temperate climates of the world. The ability of C. quinquefasciatus to take blood meals from birds, livestock, and humans contributes to its ability to vector pathogens between species. Here, we describe the genomic sequence of C. quinquefasciatus: its repertoire of 18,883 protein-coding genes is 22% larger than that of Aedes aegypti and 52% larger than that of Anopheles gambiae, with multiple gene-family expansions, including olfactory and gustatory receptors, salivary gland genes, and genes associated with xenobiotic detoxification.
Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle. Kirkness EF, Haas BJ, Sun W, Braig HR, Perotti MA, Clark JM, Lee SH, Robertson HM, Kennedy RC, Elhaik E, Gerlach D, Kriventseva EV, Elsik CG, Graur D, Hill CA, Veenstra JA, Walenz B, Tubío JM, Ribeiro JM, Rozas J, Johnston JS, Reese JT, Popadic A, Tojo M, Raoult D, Reed DL, Tomoyasu Y, Krause E, Mittapalli O, Margam VM, Li HM, Meyer JM, Johnson RM, Romero-Severson J, Vanzee JP, Alvarez-Ponce D, Vieira FG, Aguadé M, Guirao-Rico S, Anzola JM, Yoon KS, Strycharz JP, Unger MF, Christley S, Lobo NF, Seufferheld MJ, Wang N, Dasch GA, Struchiner CJ, Madey G, Hannick LI, Bidwell S, Joardar V, Caler E, Shao R, Barker SC, Cameron S, Bruggner RV, Regier A, Johnson J, Viswanathan L, Utterback TR, Sutton GG, Lawson D, Waterhouse RM, Venter JC, Strausberg RL, Berenbaum MR, Collins FH, Zdobnov EM, Pittendrigh BR. Proc Natl Acad Sci U S A. 2010 Jun 21. [Epub ahead of print] PMID: 20566863 As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes.
The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes fewer than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens.
The Newick Utilities: High-throughput Phylogenetic Tree Processing in the UNIX Shell. Junier T, Zdobnov EM. Bioinformatics. 2010 May 13 PMID: 20472542 Summary: We present a suite of UNIX shell programs for processing any number of phylogenetic trees of any size. They perform frequently used tree operations without requiring user interaction. They also allow tree drawing as scalable vector graphics (SVG), suitable for high-quality presentations and further editing, and as ASCII graphics for command-line inspection. As an example we include an implementation of bootscanning, a procedure for finding recombination breakpoints in viral genomes.
Availability: C source code, Python bindings, and executables for various platforms are available from http://cegg.unige.ch/newick_utils. The distribution includes a manual and example data. The package is distributed under the BSD License.
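To make concrete what a basic, non-interactive tree operation on Newick data might look like, here is a minimal Python sketch. It is not the Newick Utilities code or its Python bindings: the names `Node`, `parse_newick`, and `leaf_labels` are hypothetical, and the parser assumes a simplified Newick dialect with no quoted labels, comments, or internal whitespace.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One tree node (hypothetical structure, for illustration only)."""
    label: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse one Newick tree; assumes no quotes, comments, or inner spaces."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume '('
            while True:
                node.children.append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # end of this child list
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at {pos}")
        # Optional label, terminated by a structural character.
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        node.label = text[start:pos]
        # Optional branch length introduced by ':'.
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            node.length = float(text[start:pos])
        return node

    root = parse_node()
    if pos != len(text) - 1:
        raise ValueError(f"unexpected {text[pos]!r} at {pos}")
    return root


def leaf_labels(node: Node) -> List[str]:
    """Return leaf labels in left-to-right order."""
    if not node.children:
        return [node.label]
    return [name for child in node.children for name in leaf_labels(child)]


if __name__ == "__main__":
    tree = parse_newick("((A:1,B:2)ab:0.5,C:3)root;")
    print(leaf_labels(tree))
```

Because the parser reads a plain string and returns a plain data structure, it can be applied to each line of a file in a loop, which is the batch-processing style the summary describes.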
Rhinovirus Genome Evolution during Experimental Human Infection. Cordey S, Junier T, Gerlach D, Gobbini F, Farinelli L, Zdobnov EM, Winther B, Tapparel C, Kaiser L. PLoS One. 2010 May 11;5(5):e10588 PMID: 20485673 Human rhinoviruses (HRVs) evolve rapidly due in part to their error-prone RNA polymerase. Knowledge of the diversity of HRV populations emerging during the course of a natural infection is essential and represents a basis for the design of future potential vaccines and antiviral drugs. To evaluate HRV evolution in humans, nasal wash samples were collected daily for five days from 15 immunocompetent volunteers experimentally infected with a reference stock of HRV-39. In parallel, HeLa-OH cells were inoculated to compare HRV evolution in vitro. Nasal wash samples assessed by real-time PCR showed an in vivo viral load that peaked at 48-72 h. Ultra-deep sequencing was used to compare the low-frequency mutation populations present in the HRV-39 inoculum, in two human subjects, and in one HeLa-OH supernatant collected 5 days post-infection. The analysis revealed hypervariable mutation locations in VP2, VP3, VP1, 2C and 3C genes and conserved regions in VP4, 2A, 2B, 3A, 3B and 3D genes. These results were confirmed by classical sequencing of additional samples, both from inoculated volunteers and independent cell infections, and suggest that HRV inter-host transmission is not associated with a strong bottleneck effect. A specific analysis of the VP1 capsid gene of 15 human cases confirmed the high mutation incidence in this capsid region, but not in the antiviral drug-binding pocket. We could also estimate an in vivo mutation frequency of 3.4 x 10^-4 and 3.1 x 10^-4 mutations per nucleotide over the entire ORF and the VP1 gene, respectively. In vivo, HRVs generate new variants rapidly during the course of an acute infection due to mutations that accumulate in hot spot regions located at the capsid level, as well as in 2C and 3C genes.
A Teratocarcinoma-Like Human Embryonic Stem Cell (hESC) Line and Four hESC Lines Reveal Potentially Oncogenic Genomic Changes. Hovatta O, Jaconi M, Töhönen V, Béna F, Gimelli S, Bosman A, Holm F, Wyder S, Zdobnov EM, Irion O, Andrews PW, Antonarakis SE, Zucchelli M, Kere J, Feki A. PLoS ONE 5(4): e10263 PMID: 20428235 The first Swiss human embryonic stem cell (hESC) line, CH-ES1, has shown features of a malignant cell line. It originated from the only single blastomere that survived cryopreservation of an embryo, and it more closely resembles teratocarcinoma lines than other hESC lines with respect to its abnormal karyotype and its formation of invasive tumors when injected into SCID mice. To characterize the molecular basis of the oncogenicity of CH-ES1 cells, we looked for abnormal chromosomal copy number (by array Comparative Genomic Hybridization, aCGH) and single nucleotide polymorphisms (SNPs). To see how unique these changes were, we compared these results to data collected from the 2102Ep teratocarcinoma line and four hESC lines (H1, HS293, HS401 and SIVF-02) which displayed normal G-banding results. We identified genomic gains and losses in CH-ES1, including gains in areas containing several oncogenes. These features are similar to those observed in teratocarcinomas, and this explains the high malignancy. The CH-ES1 line was trisomic for chromosomes 1, 9, 12, 17, 19, 20 and X. The karyotypically (based on G-banding) normal hESC lines were also found to have several genomic changes that involved genes with known roles in cancer. The largest changes were found in the H1 line at passage number 56, when large 5 Mb duplications in chromosomes 1q32.2 and 22q12.2 were detected, but the losses and gains were already seen at passage 22. These changes found in the other lines highlight the importance of assessing the acquisition of genetic changes by hESCs before their use in regenerative medicine applications.
They also point to the possibility that the acquisition of genetic changes by ESCs in culture may be used to explore certain aspects of the mechanisms regulating oncogenesis.
Functional Characterization of Transcription Factor Motifs Using Cross-species Comparison across Large Evolutionary Distances. Kim J, Cunningham R, James B, Wyder S, Gibson JD, Niehuis O, Zdobnov EM, Robertson HM, Robinson GE, Werren JH, Sinha S. PLoS Computational Biology 6(1):e1000652 PMID: 20126523
Abstract
We address the problem of finding statistically significant associations between cis-regulatory motifs and functional gene sets, in order to understand the biological roles of transcription factors. We develop a computational framework for this task, whose features include a new statistical score for motif scanning, the use of different scores for predicting targets of different motifs, and new ways to deal with redundancies among significant motif–function associations. This framework is applied to the recently sequenced genome of the jewel wasp, Nasonia vitripennis, making use of the existing knowledge of motifs and gene annotations in another insect genome, that of the fruitfly. The framework uses cross-species comparison to improve the specificity of its predictions, and does so without relying upon non-coding sequence alignment. It is therefore well suited for comparative genomics across large evolutionary divergences, where existing alignment-based methods are not applicable. We also apply the framework to find motifs associated with socially regulated gene sets in the honeybee, Apis mellifera, using comparisons with Nasonia, a solitary species, to identify honeybee-specific associations.
Author Summary
We develop a computational pipeline for predicting the functions of transcription factor motifs, through DNA sequence analysis. The pipeline is applied to the newly sequenced genome of the jewel wasp, Nasonia vitripennis. It exploits the wealth of molecular data available in another insect species, the fruitfly Drosophila melanogaster, and uses cross-species comparison to its advantage. Our main contribution is to show how this can be done despite the large evolutionary divergence between the two species. The methodology presented here may be applied more generally to other scenarios (genomes) where comparative regulatory genomics must deal with large evolutionary divergences.
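As a rough illustration of what a motif-function association test involves, the sketch below scores the overlap between a motif's predicted target genes and a functional gene set with a one-sided hypergeometric test. This is a generic stand-in and not the statistical score developed in the paper; the function name, parameters, and toy counts are assumptions made for the example.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, gene_set: int, targets: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe: total number of genes considered (hypothetical)
    gene_set: number of genes annotated with the function
    targets:  number of genes predicted as targets of the motif
    overlap:  number of predicted targets that fall in the gene set
    """
    max_overlap = min(gene_set, targets)
    total = comb(universe, targets)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(gene_set, k) * comb(universe - gene_set, targets - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers: 4 genes, 2 in the functional set, 2 predicted targets, both in the set.
p_value = hypergeom_enrichment_p(universe=4, gene_set=2, targets=2, overlap=2)
```

A small p-value suggests the motif's targets are over-represented in the gene set; with many motifs and gene sets, a multiple-testing correction would be needed on top of this.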
Sociality is linked to rates of protein evolution in a highly social insect Hunt BG, Wyder S, Elango N, Werren JH, Zdobnov EM, Yi SV, Goodisman MAD Molecular Biology and Evolution 27(3):497-500 PMID: 20110264 Eusocial insects exhibit unparalleled levels of cooperation and dominate terrestrial ecosystems. The success of eusocial insects stems from the presence of specialized castes that undertake distinct tasks. We investigated whether the evolutionary transition to societies with discrete castes was associated with changes in protein evolution. We predicted that proteins with caste-biased gene expression would evolve rapidly due to reduced antagonistic pleiotropy. We found that queen-biased proteins of the honeybee Apis mellifera did indeed evolve rapidly, as predicted. However, worker-biased proteins exhibited slower evolutionary rates than queen-biased or non-biased proteins. We suggest that distinct selective pressures operating on caste-biased genes, rather than a general reduction in pleiotropy, explain the observed differences in evolutionary rates. Our study highlights, for the first time, the interaction between highly social behavior and dynamics of protein evolution.
Functional and evolutionary insights from the genomes of three parasitoid Nasonia species The Nasonia Genome Working Group (incl. Junier T, Gerlach D, Waterhouse RM, Kriventseva EV, Wyder S, Zdobnov EM) Science. 2010 Jan 15;327(5963):343-8. PMID: 20075255 We report here genome sequences and comparative analyses of three closely related parasitoid wasps: Nasonia vitripennis, N. giraulti, and N. longicornis. Parasitoids are important regulators of arthropod populations, including major agricultural pests and disease vectors, and Nasonia is an emerging genetic model, particularly for evolutionary and developmental genetics. Key findings include the identification of a functional DNA methylation tool kit; hymenopteran-specific genes including diverse venoms; lateral gene transfers among Pox viruses, Wolbachia, and Nasonia; and the rapid evolution of genes involved in nuclear-mitochondrial interactions that are implicated in speciation. Newly developed genome resources advance Nasonia for genetic research, accelerate mapping and cloning of quantitative trait loci, and will ultimately provide tools and knowledge for further increasing the utility of parasitoids as pest insect-control agents.
Integration of microRNA miR-122 in hepatic circadian gene expression Gatfield D, Le Martelot G, Vejnar CE, Gerlach D, Schaad O, Fleury-Olela F, Ruskeepää AL, Oresic M, Esau CC, Zdobnov EM, Schibler U Genes Dev. 2009 June 1;23(11):1313-1326 PMID: 19487572 In liver, most metabolic pathways are under circadian control, and hundreds of protein-encoding genes are thus transcribed in a cyclic fashion. Here we show that rhythmic transcription extends to the locus specifying miR-122, a highly abundant, hepatocyte-specific microRNA. Genetic loss-of-function and gain-of-function experiments have identified the orphan nuclear receptor REV-ERBα as the major circadian regulator of mir-122 transcription. Although, due to its long half-life, mature miR-122 accumulates at nearly constant rates throughout the day, this miRNA is tightly associated with control mechanisms governing circadian gene expression. Thus, the knockdown of miR-122 expression via an antisense oligonucleotide (ASO) strategy resulted in the up- and down-regulation of hundreds of mRNAs, of which a disproportionately high fraction accumulates in a circadian fashion. miR-122 has previously been linked to the regulation of cholesterol and lipid metabolism. The transcripts associated with these pathways indeed show the strongest time point-specific changes upon miR-122 depletion. The identification of Pparβ/δ and the peroxisome proliferator-activated receptor α (PPARα) coactivator Smarcd1/Baf60a as novel miR-122 targets suggests an involvement of the circadian metabolic regulators of the PPAR family in miR-122-mediated metabolic control.
The impact of transmission clusters on primary drug resistance in newly diagnosed HIV-1 infection Yerly S, Junier T, Gayet-Ageron A, Amari EB, von Wyl V, Günthard HF, Hirschel B, Zdobnov EM, Kaiser L, and the Swiss HIV Cohort Study. AIDS. 2009 May 29. [Epub ahead of print] PMID: 19487906 OBJECTIVES: To monitor HIV-1 transmitted drug resistance (TDR) in a well defined urban area with large access to antiretroviral therapy and to assess the potential source of infection of newly diagnosed HIV individuals. METHODS: All individuals resident in Geneva, Switzerland, with a newly diagnosed HIV infection between 2000 and 2008 were screened for HIV resistance. An infection was considered as recent when the positive test followed a negative screening test within less than 1 year. Phylogenetic analyses were performed by using the maximum likelihood method on pol sequences including 1058 individuals with chronic infection living in Geneva.
RESULTS: Of 637 individuals with newly diagnosed HIV infection, 20% had a recent infection. Mutations associated with resistance to at least one drug class were detected in 8.5% [nucleoside reverse transcriptase inhibitors (NRTIs), 6.3%; non-nucleoside reverse transcriptase inhibitors (NNRTIs), 3.5%; protease inhibitors, 1.9%]. TDR (P-trend = 0.015) and, in particular, NNRTI resistance (P = 0.002) increased from 2000 to 2008. Phylogenetic analyses revealed that 34.9% of newly diagnosed individuals, and 52.7% of those with recent infection were linked to transmission clusters. Clusters were more frequent in individuals with TDR than in those with sensitive strains (59.3 vs. 32.6%, respectively; P < 0.0001). Moreover, 84% of newly diagnosed individuals with TDR were part of clusters composed of only newly diagnosed individuals.
CONCLUSION: Reconstruction of the HIV transmission networks using phylogenetic analysis shows that newly diagnosed HIV infections are a significant source of onward transmission, particularly of resistant strains, thus suggesting an important self-fueling mechanism for TDR.
The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution The Bovine Genome Sequencing and Analysis Consortium (incl. Gerlach D, Junier T, Kriventseva EV, Zdobnov EM) Science. 2009 Apr 24;324(5926):522-528 PMID: 19390049 To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
The bovine lactation genome: insights into the evolution of mammalian milk Lemay DG, Lynn DJ, Martin WF, Neville MC, Casey TM, Rincon G, Kriventseva EV, Barris WC, Hinrichs AS, Molenaar AJ, Pollard KS, Maqbool NJ, Singh K, Murney R, Zdobnov EM, Tellam RL, Medrano JF, German JB, Rijnkels M. Genome Biol. 2009;10(4):R43. Epub 2009 Apr 24. PMID: 19393040 BACKGROUND: The newly assembled Bos taurus genome sequence enables the linkage of bovine milk and lactation data with other mammalian genomes. RESULTS: Using publicly available milk proteome data and mammary expressed sequence tags, 197 milk protein genes and over 6,000 mammary genes were identified in the bovine genome. Intersection of these genes with 238 milk production quantitative trait loci curated from the literature decreased the search space for milk trait effectors by more than an order of magnitude. Genome location analysis revealed a tendency for milk protein genes to be clustered with other mammary genes. Using the genomes of a monotreme (platypus), a marsupial (opossum), and five placental mammals (bovine, human, dog, mouse, rat), gene loss and duplication, phylogeny, sequence conservation, and evolution were examined. Compared with other genes in the bovine genome, milk and mammary genes are: more likely to be present in all mammals; more likely to be duplicated in therians; more highly conserved across Mammalia; and evolving more slowly along the bovine lineage. The most divergent proteins in milk were associated with nutritional and immunological components of milk, whereas highly conserved proteins were associated with secretory processes. CONCLUSIONS: Although both copy number and sequence variation contribute to the diversity of milk protein composition across species, our results suggest that this diversity is primarily due to other mechanisms.
Our findings support the essentiality of milk to the survival of mammalian neonates and the establishment of milk secretory mechanisms more than 160 million years ago.
New respiratory enterovirus genotype and rhinovirus strains identified by genotyping circulating picornaviruses Tapparel C, Junier T, Gerlach D, Van Belle S, Turin L, Cordey S, Muehlemann K, Regamey N, Aubert JD, Soccal PM, Eigenmann P, Zdobnov EM, Kaiser L Emerg Infect Dis. 2009 May;15(5):719-726 PMID: 19402957 Rhinoviruses and enteroviruses are leading causes of respiratory infections. To evaluate genotypic diversity and identify forces shaping picornavirus evolution, we screened persons with respiratory illnesses by using rhinovirus-specific or generic real-time PCR assays. We then sequenced the 5′ untranslated region, capsid protein VP1, and protease precursor 3CD regions of virus-positive samples. Subsequent phylogenetic analysis identified the large genotypic diversity of rhinoviruses circulating in humans. We identified and completed the genome sequence of a new enterovirus genotype associated with respiratory symptoms and acute otitis media, confirming the close relationship between rhinoviruses and enteroviruses and the need to detect both viruses in respiratory specimens. Finally, we identified recombinants among circulating rhinoviruses and mapped their recombination sites, thereby demonstrating that rhinoviruses can recombine in their natural host. This study clarifies the diversity and explains the reasons for evolution of these viruses.
Expression profiles of Urbilaterian genes uniquely shared between honey bee and vertebrates Matsui T, Yamamoto T, Wyder S, Zdobnov EM, Kadowaki T BMC Genomics 2009, 10:17 PMID: 19138430
Background
Large-scale comparison of metazoan genomes has revealed that a significant fraction of genes of the last common ancestor of Bilateria (Urbilateria) is lost in each animal lineage. This event could be one of the underlying mechanisms involved in generating metazoan diversity. However, the present functions of these ancient genes have not been addressed extensively. To understand the functions and evolutionary mechanisms of such ancient Urbilaterian genes, we carried out comprehensive expression profile analysis of genes shared between vertebrates and honey bees but not with the other sequenced ecdysozoan genomes (honey bee-vertebrate specific, HVS genes) as a model.
Results
We identified 30 honey bee and 55 mouse HVS genes. Many HVS genes exhibited tissue-selective expression patterns; intriguingly, the expression of 60% of honey bee HVS genes was found to be brain enriched, and 24% of mouse HVS genes were highly expressed in either or both the brain and testis. Moreover, a minimum of 38% of mouse HVS genes demonstrated neuron-enriched expression patterns, and 62% of them exhibited expression in selective brain areas, particularly the forebrain and cerebellum. Furthermore, gene ontology (GO) analysis of HVS genes predicted that 35% of genes are associated with DNA transcription and RNA processing.
Conclusions
These results suggest that HVS genes include genes that are biased towards expression in the brain and gonads. They also demonstrate that at least some of the Urbilaterian genes retained in a specific animal lineage may be selectively maintained to support species-specific phenotypes.
Sertoli cell Dicer is essential for spermatogenesis in mice Papaioannou MD, Pitetti JL, Ro S, Park C, Aubry F, Schaad O, Vejnar CE, Kühne F, Descombes P, Zdobnov EM, McManus MT, Guillou F, Harfe BD, Yan W, Jégou B, Nef S Dev Biol. 2008 Nov 28. Epub ahead of print PMID: 19071104 Spermatogenesis requires intact, fully competent Sertoli cells. Here, we investigate the functions of Dicer, an RNaseIII endonuclease required for microRNA and small interfering RNA biogenesis, in mouse Sertoli cell function. We show that selective ablation of Dicer in Sertoli cells leads to infertility due to complete absence of spermatozoa and progressive testicular degeneration. The first morphological alterations appear already at postnatal day 5 and correlate with a severe impairment of the prepubertal spermatogenic wave, due to defective Sertoli cell maturation and incapacity to properly support meiosis and spermiogenesis. Importantly, we find several key genes known to be essential for Sertoli cell function to be significantly down-regulated in neonatal testes lacking Dicer in Sertoli cells. Overall, our results reveal novel essential roles played by the Dicer-dependent pathway in mammalian reproductive function, and thus pave the way for new insights into human infertility.
miROrtho: computational survey of microRNA genes Gerlach D, Kriventseva EV, Rahman N, Vejnar CE, Zdobnov EM Nucleic Acids Res. 2009 Jan;37(Database issue):D111-D117. Epub 2008 Oct 15 PMID: 18927110 MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature approximately 22-nt long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database (http://cegg.unige.ch/mirortho) presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for potent hairpins, plus homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary support.
The cis-acting replication elements define human enterovirus and rhinovirus species Cordey S, Gerlach D, Junier T, Zdobnov EM, Kaiser L, Tapparel C. RNA. 2008 Aug;14(8):1568-78 PMID: 18541697 Replication of picornaviruses is dependent on VPg uridylylation, which is linked to the presence of the internal cis-acting replication element (cre). Cre are located within the sequence encoding polyprotein, yet at distinct positions as demonstrated for poliovirus and coxsackievirus-B3, cardiovirus, and human rhinovirus (HRV-A and HRV-B), overlapping proteins 2C, VP2, 2A, and VP1, respectively. Here we report a novel distinct cre element located in the VP2 region of the recently reported HRV-A2 species and provide evolutionary evidence of its functionality. We also experimentally interrogated the functionality of the recently identified HRV-B cre in the 2C region that is orthologous to the human enterovirus (HEV) cre and show that it is dispensable for replication and appears to be a nonfunctional evolutionary relic. In addition, our mutational analysis highlights two amino acids in the 2C protein that are crucial for replication. Remarkably, we conclude that each genetic clade of HRV and HEV is characterized by a unique functional cre element, where evolutionary success of a new genetic lineage seems to be associated with an invention of a novel cre motif and decay of the ancestral one. Therefore, we propose that the cre element could be considered as an additional criterion for human rhinovirus and enterovirus classification.
The genome of the model beetle and pest Tribolium castaneum. Tribolium Genome Sequencing Consortium; Project leader, Richards S; Principal investigators, Gibbs RA, Weinstock GM; White paper, Brown SJ, Denell R, Beeman RW, Gibbs R; Analysis leaders, Beeman RW, Brown SJ, Bucher G, Friedrich M, Grimmelikhuijzen CJ, Klingler M, Lorenzen M, Richards S, Roth S, Schröder R, Tautz D, Zdobnov EM; DNA sequence and global analysis: DNA sequencing, Muzny D, Gibbs RA, Weinstock GM, Attaway T, Bell S, Buhay CJ, Chandrabose MN, Chavez D, Clerk-Blankenburg KP, Cree A, Dao M, Davis C, Chacko J, Dinh H, Dugan-Rocha S, Fowler G, Garner TT, Garnes J, Gnirke A, Hawes A, Hernandez J, Hines S, Holder M, Hume J, Jhangiani SN, Joshi V, Khan ZM, Jackson L, Kovar C, Kowis A, Lee S, Lewis LR, Margolis J, Morgan M, Nazareth LV, Nguyen N, Okwuonu G, Parker D, Richards S, Ruiz SJ, Santibanez J, Savard J, Scherer SE, Schneider B, Sodergren E, Tautz D, Vattahil S, Villasana D, White CS, Wright R; EST sequencing, Park Y, Beeman RW, Lord J, Oppert B, Lorenzen M, Brown S, Wang L, Savard J, Tautz D, Richards S, Weinstock G, Gibbs RA; genome assembly, Liu Y, Worley K, Weinstock G; G+C content, Elsik CG, Reese JT, Elhaik E, Landan G, Graur D; repetitive DNA, transposons and telomeres, Arensburger P, Atkinson P, Beeman RW, Beidler J, Brown SJ, Demuth JP, Drury DW, Du YZ, Fujiwara H, Lorenzen M, Maselli V, Osanai M, Park Y, Robertson HM, Tu Z, Wang JJ, Wang S; gene prediction and consensus gene set, Richards S, Song H, Zhang L, Sodergren E, Werner D, Stanke M, Morgenstern B, Solovyev V, Kosarev P, Brown G, Chen HC, Ermolaeva O, Hlavina W, Kapustin Y, Kiryutin B, Kitts P, Maglott D, Pruitt K, Sapojnikov V, Souvorov A, Mackey AJ, Waterhouse RM, Wyder S, Zdobnov EM; global gene content analysis, Zdobnov EM, Wyder S, Kriventseva EV, Kadowaki T, Bork P; Developmental processes and signalling pathways, Aranda M, Bao R, Beermann A, Berns N, Bolognesi R, Bonneton F, Bopp D, Brown SJ, Bucher G, Butts T, Chaumot A,
Denell RE, Ferrier DE, Friedrich M, Gordon CM, Jindra M, Klingler M, Lan Q, Lattorff HM, Laudet V, von Levetsow C, Liu Z, Lutz R, Lynch JA, da Fonseca RN, Posnien N, Reuter R, Roth S, Savard J, Schinko JB, Schmitt C, Schoppmeier M, Schröder R, Shippy TD, Simonnet F, Marques-Souza H, Tautz D, Tomoyasu Y, Trauner J, Van der Zee M, Vervoort M, Wittkopp N, Wimmer EA, Yang X; Pest biology, senses, Medea and RNAi: ligand gated ion channels, Jones AK, Sattelle DB; oxidative phosphorylation, Ebert PR; P450 genes, Nelson D, Scott JG, Beeman RW; chitin and cuticular proteins, Muthukrishnan S, Kramer KJ, Arakane Y, Beeman RW, Zhu Q, Hogenkamp D, Dixit R; digestive proteinases, Oppert B, Jiang H, Zou Z, Marshall J, Elpidina E, Vinokurov K, Oppert C; immunity, Zou Z, Evans J, Lu Z, Zhao P, Sumathipala N, Altincicek B, Vilcinskas A, Williams M, Hultmark D, Hetru C, Jiang H; neurohormones and GPCRs, Grimmelikhuijzen CJ, Hauser F, Cazzamali G, Williamson M, Park Y, Li B, Tanaka Y, Predel R, Neupert S, Schachtner J, Verleyen P; neuropeptide processing enzymes, Raible F, Bork P; opsins, Friedrich M; odorant receptors and gustatory receptors, Walden KK, Robertson HM; odorant binding and chemosensory proteins, Angeli S, Forêt S, Bucher G, Schuetz S, Maleszka R, Wimmer EA; Medea, Beeman RW, Lorenzen M; systemic RNAi, Tomoyasu Y, Miller SC, Grossmann D, Bucher G. Nature. 2008 Mar 23 PMID: 18362917 Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products. We describe its genome sequence here. This omnivorous beetle has evolved the ability to interact with a diverse chemical environment, as shown by large expansions in odorant and gustatory receptors, as well as P450 and other detoxification enzymes. Development in Tribolium is more representative of other insects than is Drosophila, a fact reflected in gene content and function. 
For example, Tribolium has retained more ancestral genes involved in cell-cell communication than Drosophila, some being expressed in the growth zone crucial for axial elongation in short-germ development. Systemic RNA interference in T. castaneum functions differently from that in Caenorhabditis elegans, but nevertheless offers similar power for the elucidation of gene function and identification of targets for selective insect control.
The Aedes aegypti genome: a comparative perspective Waterhouse RM, Wyder S, Zdobnov EM Insect Mol Biol. 2008 Feb;17(1):1-8 PMID: 18237279 The sequencing of the second mosquito genome, Aedes aegypti, in addition to Anopheles gambiae, is a major milestone that will drive molecular-level and genome-wide high-throughput studies of not only these but also other mosquito vectors of human pathogens. Here we overview the ancestry of the mosquito genes, list the major expansions of gene families that may relate to species adaptation processes, as exemplified by CYP9 cytochrome P450 genes, and discuss the conservation of chromosomal gene arrangements among the two mosquitoes and fruit fly. Many more invertebrate genomes are expected to be sequenced in the near future, including additional vectors of human pathogens (see http://www.vectorbase.org), and further comparative analyses will become increasingly refined and informative, hopefully improving our understanding of the genetic basis of phenotypical differences among these species, their vectorial capacity, and ultimately leading to the development of novel disease control strategies.
OrthoDB: the hierarchical catalog of eukaryotic orthologs Kriventseva EV, Rahman N, Espinosa O, Zdobnov EM Nucleic Acids Res. 2008 Jan;36(Database issue):D271-5. Epub 2007 Oct 18 PMID: 17947323 The concept of orthology is widely used to relate genes across different species using comparative genomics, and it provides the basis for inferring gene function. Here we present the web accessible OrthoDB database that catalogs groups of orthologous genes in a hierarchical manner, at each radiation of the species phylogeny, from more general groups to more fine-grained delineations between closely related species. We used a COG-like and Inparanoid-like ortholog delineation procedure on the basis of all-against-all Smith-Waterman sequence comparisons to analyze 58 eukaryotic genomes, focusing on vertebrates, insects and fungi to facilitate further comparative studies. The database is freely available at orthodb.
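To make the idea of ortholog delineation from all-against-all comparisons more concrete, here is a minimal sketch of reciprocal best hits between two species, the starting point that COG-like and Inparanoid-like procedures build on. It is not OrthoDB's actual procedure: the function name, gene identifiers, and scores are invented for illustration, and in practice the scores would come from pairwise alignments such as Smith-Waterman.

```python
from typing import Dict, List, Tuple


def reciprocal_best_hits(scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Return gene pairs (a, b) where b is a's best hit and a is b's best hit.

    scores maps (gene_in_species_A, gene_in_species_B) to an alignment score.
    """
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        # Track the highest-scoring partner seen so far in each direction.
        if a not in best_for_a or score > best_for_a[a][1]:
            best_for_a[a] = (b, score)
        if b not in best_for_b or score > best_for_b[b][1]:
            best_for_b[b] = (a, score)
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )


# Hypothetical scores between three genes of species A and two of species B.
toy_scores = {
    ("a1", "b1"): 90.0,
    ("a1", "b2"): 40.0,
    ("a2", "b1"): 60.0,
    ("a2", "b2"): 30.0,
    ("a3", "b2"): 50.0,
}
pairs = reciprocal_best_hits(toy_scores)
```

In this toy input, a2 finds b1 as its best hit but is not b1's best hit, so it is left unpaired. Real procedures add further steps, for example handling in-paralogs and clustering across more than two species.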
Quantification of ortholog losses in insects and vertebrates Wyder S, Kriventseva EV, Schroder R, Kadowaki T and Zdobnov EM Genome Biol. 2007 Nov 16;8(11):R242 PMID: 18021399 Background The increasing number of sequenced insect and vertebrate genomes of variable divergence enables refined comparative analyses to quantify the major modes of animal genome evolution and allows tracing of gene genealogy (orthology) and pinpointing of gene extinctions (losses), which can reveal lineage-specific traits. Results We compared the gene repertoires of 5 vertebrates and 5 insects, including honeybee and Tribolium beetle that represent insect orders outside the previously sequenced Diptera, to consistently quantify losses of orthologous groups of genes. We found hundreds of lost Urbilateria genes in each of the lineages and assessed their phylogenetic origin. The rate of losses correlates well with the species' rates of molecular evolution and radiation times, without distinction between insects and vertebrates, indicating their stochastic nature. Remarkably, this extends to the universal single-copy orthologs, losses of dozens of which have been tolerated in each species. Nevertheless, the propensity for loss differs substantially among genes, where roughly 20% of the orthologs have an 8-fold higher chance of becoming extinct. Extrapolation of our data also suggests that the Urbilateria genome contained more than 7,000 genes.
Conclusions Our results indicate that the seemingly higher number of observed gene losses in insects can be explained by their 2-3 fold higher evolutionary rate. Despite the profound effect of many losses on cellular machinery, overall, they seem to be guided by neutral evolution.
Evolution of genes and genomes on the Drosophila phylogeny Drosophila 12 Genomes Consortium (Zdobnov EM) Nature. 2007 Nov 8;450(7167):203-18. PMID: 17994087 Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
New complete genome sequences of human rhinoviruses shed light on their phylogeny and genomic features Tapparel C, Junier T, Gerlach D, Cordey S, Van Belle S, Perrin L, Zdobnov EM and Kaiser L BMC Genomics. 2007 Jul 10; 8(1):224 PMID: 17623054 Background
Human rhinoviruses (HRV), the most frequent cause of respiratory infections, include 99 different serotypes segregating into two species, A and B. Rhinoviruses share extensive genomic sequence similarity with enteroviruses and both are part of the picornavirus family. Nevertheless they differ significantly at the phenotypic level. The lack of HRV full-length genome sequences and the absence of analysis comparing picornaviruses at the whole genome level limit our knowledge of the genomic features supporting these differences.
Results
Here we report complete genome sequences of 12 HRV-A and HRV-B serotypes, more than doubling the current number of available HRV sequences. The whole-genome maximum-likelihood phylogenetic analysis suggests that HRV-B and human enteroviruses (HEV) diverged from the last common ancestor after their separation from HRV-A. On the other hand, compared to HEV, HRV-B are more closely related to HRV-A in the capsid and 3B-C regions. We also identified the presence of a 2C cis-acting replication element (cre) in HRV-B that is not present in HRV-A, and that had been previously characterized only in HEV. In contrast to HEV, HRV-A and HRV-B also share a markedly lower GC content along the whole genome length.
Conclusions
Our findings provide a basis to speculate about both the biological similarities and the differences (e.g. tissue tropism, temperature adaptation or acid lability) of these three groups of viruses.
Evolutionary dynamics of immune-related genes and pathways in disease vector mosquitoes Waterhouse RM, Kriventseva EV, Meister S, Xi Z, Alvarez KS, Bartholomay LC, Barillas-Mury C, Bian G, Blandin S, Christensen BM, Dong Y, Jiang H, Kanost MR, Koutsos AC, Levashina EA, Li J, Ligoxygakis P, MacCallum RM, Mayhew GF, Mendes A, Michel K, Osta MA, Paskewitz S, Shin SW, Vlachou D, Wang L, Wei W, Zheng L, Zou Z, Severson DW, Raikhel AS, Kafatos FC, Dimopoulos G, Zdobnov EM, Christophides GK Science. 2007 Jun 22;316(5832):1738-43. PMID: 17588928 Mosquitoes are vectors of parasitic and viral diseases of immense importance for public health. The genome sequence of the yellow fever and Dengue vector, Aedes aegypti (Aa), has enabled a comparative phylogenomic analysis of the insect immune repertoire: in Aa, the malaria vector Anopheles gambiae (Ag) and the fruitfly Drosophila melanogaster (Dm). Analysis of immune signaling pathways and response modules reveals both conservative and rapidly evolving features associated with different functional gene categories and particular aspects of immune reactions. These dynamics reflect in part continuous readjustment between accommodation and rejection of pathogens and suggest how innate immunity may have evolved.
Life cycle transcriptome of the malaria mosquito Anopheles gambiae and comparison with the fruitfly Drosophila melanogaster Koutsos AC, Blass C, Meister S, Schmidt S, Maccallum RM, Soares MB, Collins FH, Benes V, Zdobnov EM, Kafatos FC, Christophides GK Proc Natl Acad Sci USA. 2007 Jun 11 PMID: 17563388 The African mosquito Anopheles gambiae is the major vector of human malaria. We report a genome-wide survey of mosquito gene expression profiles clustered temporally into developmental programs and spatially into adult tissue-specific patterns. Global expression analysis shows that genes that belong to related functional categories or that encode the same or functionally linked protein domains are associated with characteristic developmental programs or tissue patterns. Comparative analysis of our data together with data published from Drosophila melanogaster reveals an overall strong and positive correlation of developmental expression between orthologous genes. The degree of correlation varies, depending on association of orthologs with certain developmental programs or functional groups. Interestingly, the similarity of gene expression is not correlated with the coding sequence similarity of orthologs, indicating that expression profiles and coding sequences evolve independently. In addition to providing a comprehensive view of temporal and spatial gene expression during the A. gambiae life cycle, this large-scale comparative transcriptomic analysis has detected important evolutionary features of insect transcriptomes.
Computational and transcriptional evidence for microRNAs in the honey bee genomeWeaver DB, Anzola JM, Evans JD, Reid JG, Reese JT, Childs KL, Zdobnov EM, Samanta MP, Miller J, Elsik CG Genome Biol. 2007 Jun 1;8(6):R97 PMID: 17543122 BACKGROUND: Noncoding microRNAs (miRNAs) are key regulators of gene expression in eukaryotes. Insect miRNAs help regulate the levels of proteins involved with development, metabolism, and other life history traits. The recently sequenced honey bee genome provides an opportunity to detect novel miRNAs in both this species and others, and begin to infer the roles of miRNAs in honey bee development.
RESULTS: Three independent computational surveys of the assembled honey bee genome identified a total of 68 non-redundant candidate miRNAs, several of which appear to have previously unrecognized orthologs in the Drosophila genome. A subset of these candidate miRNAs was screened for expression by qRT-PCR and/or genome tiling arrays, and most predicted miRNAs were confirmed as being expressed in at least one honey bee tissue. Interestingly, the transcript abundance for several known and novel miRNAs displayed caste- or age-related differences in honey bees. Genes in proximity to miRNAs in the bee genome are disproportionately associated with the GO terms "physiological process", "nucleus" and "response to stress".
CONCLUSIONS: Computational approaches successfully identified miRNAs in the honey bee and indicated previously unrecognized miRNAs in the well-studied Drosophila melanogaster genome despite the 280 million years of divergence between these insects. Differentially transcribed miRNAs are likely to be involved in regulating honey bee development, and arguably in the extreme developmental switch between sterile worker bees and highly fertile queens.
Genome Sequence of Aedes aegypti, a Major Arbovirus VectorNene V, Wortman JR, Lawson D, Haas B, Kodira C, Tu ZJ, Loftus B, Xi Z, Megy K, Grabherr M, Ren Q, Zdobnov EM, Lobo NF, Campbell KS, Brown SE, Bonaldo MF, Zhu J, Sinkins SP, Hogenkamp DG, Amedo P, Arensburger P, Atkinson PW, Bidwell S, Biedler J, Birney E, Bruggner RV, Costas J, Coy MR, Crabtree J, Crawford M, Debruyn B, Decaprio D, Eiglmeier K, Eisenstadt E, El-Dorry H, Gelbart WM, Gomes SL, Hammond M, Hannick LI, Hogan JR, Holmes MH, Jaffe D, Johnston SJ, Kennedy RC, Koo H, Kravitz S, Kriventseva EV, Kulp D, Labutti K, Lee E, Li S, Lovin DD, Mao C, Mauceli E, Menck CF, Miller JR, Montgomery P, Mori A, Nascimento AL, Naveira HF, Nusbaum C, O'leary SB, Orvis J, Pertea M, Quesneville H, Reidenbach KR, Rogers YH, Roth CW, Schneider JR, Schatz M, Shumway M, Stanke M, Stinson EO, Tubio JM, Vanzee JP, Verjovski-Almeida S, Werner D, White O, Wyder S, Zeng Q, Zhao Q, Zhao Y, Hill CA, Raikhel AS, Soares MB, Knudson DL, Lee NH, Galagan J, Salzberg SL, Paulsen IT, Dimopoulos G, Collins FH, Bruce B, Fraser-Liggett CM, Severson DW. Science. 2007 Jun 22;316(5832):1718-23. Epub 2007 May 17. PMID: 17510324 We present a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at ~1.38 Gbp is ~5-fold larger in size than the genome of the malaria vector, Anopheles gambiae. Nearly 50% of the Aedes aegypti genome consists of transposable elements. These contribute to a ~4-6 fold increase in average gene length and the size of intergenic regions relative to Anopheles gambiae and Drosophila melanogaster. Nevertheless, chromosomal synteny is generally maintained between all three insects although conservation of orthologous gene order is higher (~2-fold) between the mosquito species than between either of them and fruit fly.
An increase in genes encoding odorant binding, cytochrome P450 and cuticle domains relative to Anopheles gambiae suggests that members of these protein families underpin some of the biological differences between them.
Quantification of insect genome divergenceZdobnov EM, Bork P Trends Genet. 2007 Jan;23(1):16-20. Epub 2006 Nov 9. PMID: 17097187 The recent sequencing of twelve insect genomes has enabled us to quantify their divergence using synteny conservation and sequence identity of single-copy orthologs. Protein identity correlates well with synteny and is about three times more conserved, an observation consistent with comparisons among vertebrates. The observed distribution of the lengths of synteny blocks follows a power law and differs from the expectations of the currently accepted random breakage model. Our results show that there is only limited selection for conservation of gene order and reveal a few hundred genes whose proximity to one another seems to be vital.
Overgrowth caused by misexpression of a microRNA with dispensable wild-type functionNairz K, Rottig C, Rintelen F, Zdobnov EM, Moser M, Hafen E. Dev Biol. 2006 Mar 15;291(2):314-24. Epub 2006 Jan 27 PMID: 16443211 MicroRNAs (miRNAs) represent an abundant class of non-coding RNAs that negatively regulate gene expression, primarily at the post-transcriptional level. miRNA genes are frequently located in proximity to fragile chromosomal sites associated with cancers and amplification of a miRNA cluster has been correlated with the etiology of lymphomas and solid tumors. The oncogenic potential of a miRNA polycistron has recently been demonstrated in vivo. Here, we show that misexpression of the Drosophila miRNA mirvana/mir-278 in the developing eye causes massive overgrowth, in part due to inhibition of apoptosis. A single base substitution affecting the mature miRNA blocks the gain-of-function phenotype but is not associated with a detectable reduction-of-function phenotype when homozygous. This result demonstrates that misexpressed miRNAs may acquire novel functions that cause unscheduled proliferation in vivo and thus exemplifies the potential of miRNAs to promote tumor formation.
AnoEST: toward A. gambiae functional genomicsKriventseva EV, Koutsos AC, Blass C, Kafatos FC, Christophides GK, Zdobnov EM Genome Res. 2005 Jun;15(6):893-9. Epub 2005 May 17 PMID: 15899967 Here, we present an analysis of 215,634 EST and cDNA sequences of a major vector of human malaria, Anopheles gambiae, structured into the AnoEST database. The expressed sequences are grouped into clusters using genomic sequence as template and associated with inferred functional annotation, including the following: corresponding Ensembl gene prediction, putative orthologous genes in other species, homology to known proteins, protein domains, associated Gene Ontology terms, and corresponding classification into broad GO-slim functional groups. AnoEST is a vital resource for interpretation of expression profiles derived using recently developed A. gambiae cDNA microarrays. Using these cDNA microarrays, we have experimentally confirmed the expression of 7961 clusters during mosquito development. Of these, 3100 are not associated with currently predicted genes. Moreover, we found that clusters with confirmed expression are nonbiased with respect to the current gene annotation or homology to known proteins. Consequently, we expect that many as yet unconfirmed clusters are likely to be actual A. gambiae genes. [AnoEST is publicly available at http://komar.embl.de, and is also accessible as a Distributed Annotation Service (DAS).]
Consistency of genome-based methods in measuring Metazoan evolutionZdobnov EM, von Mering C, Letunic I, Bork P FEBS Lett. 2005 Jun 13;579(15):3355-61. Epub 2005 Apr 18 PMID: 15943981 Seven distinct genome-wide divergence measures were applied pairwise to the nine sequenced animal genomes of human, mouse, rat, chicken, pufferfish, fruit fly, mosquito, and two nematode worms (Caenorhabditis briggsae and Caenorhabditis elegans). Qualitatively, all of these divergence measures are found to correlate with the estimated time since speciation; however, marked deviations are observed in a few lineages. The distinct genome divergence measures also correlate well among themselves, indicating that most of the processes shaping genomes are dominated by neutral events. The deviations from the clock-like scenario in some lineages are observed consistently by several measures, implicitly confirming their reliability.
Protein coding potential of retroviruses and other transposable elements in vertebrate genomesZdobnov EM, Campillos M, Harrington ED, Torrents D, Bork P Nucleic Acids Res. 2005 Feb 16;33(3):946-54. Print 2005 PMID: 15716312 We suggest an annotation strategy for genes encoded by retroviruses and transposable elements (RETRA genes) based on a set of marker protein domains. Usually RETRA genes are masked in vertebrate genomes prior to the application of automated gene prediction pipelines under the assumption that they provide no selective advantage to the host. Yet, we show that about 1000 genes in four vertebrate gene sets analyzed contain at least one RETRA gene marker domain. Using the conservation of genomic neighborhood (synteny), we were able to discriminate between RETRA genes with putative functionality in the vertebrates and those that probably function only in the context of mobile elements. We identified 35 such genes in human, along with their corresponding mouse and rat orthologs, which included almost all known human genes with similarity to mobile elements. The results also imply that the vast majority of the remaining RETRA genes in current gene sets are unlikely to encode vertebrate functions. To automatically annotate RETRA genes in other vertebrate genomes, we provide as a tool a set of marker protein domains and a manually refined list of domesticated or ancestral RETRA genes for rescuing genes with vertebrate functions.
Genome evolution reveals biochemical networks and functional modulesvon Mering C, Zdobnov EM, Tsoka S, Ciccarelli FD, Pereira-Leal JB, Ouzounis CA, Bork P Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15428-33. Epub 2003 Dec 12 PMID: 14673105 The analysis of completely sequenced genomes uncovers an astonishing variability between species in terms of gene content and order. During genome history, the genes are frequently rearranged, duplicated, lost, or transferred horizontally between genomes. These events appear to be stochastic, yet they are under selective constraints resulting from the functional interactions between genes. These genomic constraints form the basis for a variety of techniques that employ systematic genome comparisons to predict functional associations among genes. The most powerful techniques to date are based on conserved gene neighborhood, gene fusion events, and common phylogenetic distributions of gene families. Here we show that these techniques, if integrated quantitatively and applied to a sufficiently large number of genomes, have reached a resolution which allows the characterization of function at a higher level than that of the individual gene: global modularity becomes detectable in a functional protein network. In Escherichia coli, the predicted modules can be benchmarked by comparison to known metabolic pathways. We found as many as 74% of the known metabolic enzymes clustering together in modules, with an average pathway specificity of at least 84%. The modules extend beyond metabolism, and have led to hundreds of reliable functional predictions both at the protein and pathway level. The results indicate that modularity in protein networks is intrinsically encoded in present-day genomes.
The InterPro Database, 2003 brings increased coverage and new featuresMulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJ, Vaughan R, Zdobnov EM Nucleic Acids Res. 2003 Jan 1;31(1):315-8 PMID: 12520011 InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
Initial sequencing and comparative analysis of the mouse genomeMouse Genome Sequencing Consortium; Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, 
Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES Nature. 2002 Dec 5;420(6915):520-62 PMID: 12466850 The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogasterZdobnov EM, von Mering C, Letunic I, Torrents D, Suyama M, Copley RR, Christophides GK, Thomasova D, Holt RA, Subramanian GM, Mueller HM, Dimopoulos G, Law JH, Wells MA, Birney E, Charlab R, Halpern AL, Kokoza E, Kraft CL, Lai Z, Lewis S, Louis C, Barillas-Mury C, Nusskern D, Rubin GM, Salzberg SL, Sutton GG, Topalis P, Wides R, Wincker P, Yandell M, Collins FH, Ribeiro J, Gelbart WM, Kafatos FC, Bork P Science. 2002 Oct 4;298(5591):149-59 PMID: 12364792 Comparison of the genomes and proteomes of the two diptera Anopheles gambiae and Drosophila melanogaster, which diverged about 250 million years ago, reveals considerable similarities. However, numerous differences are also observed; some of these must reflect the selection and subsequent adaptation associated with different ecologies and life strategies. Almost half of the genes in both genomes are interpreted as orthologs and show an average sequence identity of about 56%, which is slightly lower than that observed between the orthologs of the pufferfish and human (diverged about 450 million years ago). This indicates that these two insects diverged considerably faster than vertebrates. Aligned sequences reveal that orthologous genes have retained only half of their intron/exon structure, indicating that intron gains or losses have occurred at a rate of about one per gene per 125 million years. Chromosomal arms exhibit significant remnants of homology between the two species, although only 34% of the genes colocalize in small "microsyntenic" clusters, and major interarm transfers as well as intra-arm shuffling of gene order are detected.
The genome sequence of the malaria mosquito Anopheles gambiaeHolt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, Salzberg SL, Loftus B, Yandell M, Majoros WH, Rusch DB, Lai Z, Kraft CL, Abril JF, Anthouard V, Arensburger P, Atkinson PW, Baden H, de Berardinis V, Baldwin D, Benes V, Biedler J, Blass C, Bolanos R, Boscus D, Barnstead M, Cai S, Center A, Chaturverdi K, Christophides GK, Chrystal MA, Clamp M, Cravchik A, Curwen V, Dana A, Delcher A, Dew I, Evans CA, Flanigan M, Grundschober-Freimoser A, Friedli L, Gu Z, Guan P, Guigo R, Hillenmeyer ME, Hladun SL, Hogan JR, Hong YS, Hoover J, Jaillon O, Ke Z, Kodira C, Kokoza E, Koutsos A, Letunic I, Levitsky A, Liang Y, Lin JJ, Lobo NF, Lopez JR, Malek JA, McIntosh TC, Meister S, Miller J, Mobarry C, Mongin E, Murphy SD, O'Brochta DA, Pfannkoch C, Qi R, Regier MA, Remington K, Shao H, Sharakhova MV, Sitter CD, Shetty J, Smith TJ, Strong R, Sun J, Thomasova D, Ton LQ, Topalis P, Tu Z, Unger MF, Walenz B, Wang A, Wang J, Wang M, Wang X, Woodford KJ, Wortman JR, Wu M, Yao A, Zdobnov EM, Zhang H, Zhao Q, Zhao S, Zhu SC, Zhimulev I, Coluzzi M, della Torre A, Roth CW, Louis C, Kalush F, Mural RJ, Myers EW, Adams MD, Smith HO, Broder S, Gardner MJ, Fraser CM, Birney E, Bork P, Brey PT, Venter JC, Weissenbach J, Kafatos FC, Collins FH, Hoffman SL Science. 2002 Oct 4;298(5591):129-49 PMID: 12364791 Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. 
There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.
The EBI SRS server-new featuresZdobnov EM, Lopez R, Apweiler R, Etzold T Bioinformatics. 2002 Aug;18(8):1149-50 PMID: 12176845 MOTIVATION: Here we report on recent developments at the EBI SRS server (http://srs.ebi.ac.uk). SRS has become an integration system for both data retrieval and sequence analysis applications. The EBI SRS server is a primary gateway to major databases in the field of molecular biology produced and supported at EBI, as well as a European public access point to the MEDLINE database provided by the US National Library of Medicine (NLM). It is a reference server for the latest developments in data and application integration. The new additions include: the concept of virtual databases; integration of XML databases like the Integrated Resource of Protein Domains and Functional Sites (InterPro), Gene Ontology (GO), MEDLINE, Metabolic pathways, etc.; user-friendly data representation in 'Nice views'; and SRSQuickSearch bookmarklets. AVAILABILITY: SRS6 is a licensed product of LION Bioscience AG freely available for academics. The EBI SRS server (http://srs.ebi.ac.uk) is a free central resource for molecular biology data as well as a reference server for the latest developments in data integration.
Comparative genomic analysis in the region of a major Plasmodium-refractoriness locus of Anopheles gambiaeThomasova D, Ton LQ, Copley RR, Zdobnov EM, Wang X, Hong YS, Sim C, Bork P, Kafatos FC, Collins FH Proc Natl Acad Sci U S A. 2002 Jun 11;99(12):8179-84 PMID: 12060762 We have sequenced six overlapping clones from a library of bacterial artificial chromosome (BAC) clones derived from a laboratory strain of the mosquito, Anopheles gambiae, the major vector of human malaria in Africa. The resulting uninterrupted 528-kb sequence is from the 8C region of the mosquito 2R chromosome, at or very near the major refractoriness locus associated with melanotic encapsulation of parasites. This sequence represents the first extensive view of the mosquito genome structure encompassing 48 genes. Genomic comparison reveals that the majority of the orthologues are found in six microsyntenic clusters in Drosophila melanogaster. A BAC clone that is wholly contained within this region demonstrates the existence of a remarkable degree of local polymorphism in this species, which may prove important for its population structure and vectorial capacity.
The EBI SRS server--recent developmentsZdobnov EM, Lopez R, Apweiler R, Etzold T Bioinformatics. 2002 Feb;18(2):368-73 PMID: 11847095 MOTIVATION: Here we report on recent developments at the EBI SRS server (http://srs.ebi.ac.uk). SRS has become an integration system for both data retrieval and sequence analysis applications. The EBI SRS server is a primary gateway to major databases in the field of molecular biology produced and supported at EBI, as well as a European public access point to the MEDLINE database provided by the US National Library of Medicine (NLM). It is a reference server for the latest developments in data and application integration. The new additions include: the concept of virtual databases; integration of XML databases like the Integrated Resource of Protein Domains and Functional Sites (InterPro), Gene Ontology (GO), MEDLINE, Metabolic pathways, etc.; user-friendly data representation in 'Nice views'; and SRSQuickSearch bookmarklets. AVAILABILITY: SRS6 is a licensed product of LION Bioscience AG freely available for academics. The EBI SRS server (http://srs.ebi.ac.uk) is a free central resource for molecular biology data as well as a reference server for the latest developments in data integration.
Experimental evidence for slipped loop DNA, a novel folding type for polynucleotide chainMinyat EE, Khomyakova EB, Petrova MV, Zdobnov EM, Ivanov VI. J Biomol Struct Dyn. 1995 Dec;13(3):523-7 PMID: 8825732 DNA regions containing short direct repeats (5-7 bp) separated by a spacer are known to become susceptible to the single-strand-specific nuclease S1 when under superhelical stress. This is consistent with the formation of two shifted loops protruding from opposite chains. Such a fold could be additionally stabilized by base pairing between the complementary parts of the loops, which would explain why some moieties of the loops are protected from S1. To test this possibility, we designed and synthesized an oligonucleotide of 56 bases that forms a hairpin whose stem, owing to a special sequence, fails to adopt a traditional helix but may favor formation of the proposed Slipped Loop Structure (SLS). The folding of the oligonucleotide was studied by chemical modification at single-nucleotide resolution. Three zones protected from the probes were found: one that forms the stem, and two others located within the two by-loops, in those moieties that have base-pairing potential. On the basis of these data and stereochemical analysis, a 3-D scheme for the SLS form of DNA is suggested.
|