
Bibliography on: Pangenome

The Electronic Scholarly Publishing Project: Providing world-wide, free access to classic scientific papers and other scholarly materials, since 1993.


ESP: PubMed Auto Bibliography. Created: 29 May 2023 at 01:32


Although the enforced stability of genomic content is ubiquitous among multicellular eukaryotes (MCEs), the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier: entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all, members of the "species". Together these make up the species' pan-genome.
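The core/flex/pan distinction described above can be illustrated with plain set operations. The following is a minimal sketch, assuming each strain has already been reduced to a set of gene-family names; the strain and gene names are invented for illustration, and real pangenome tools must first cluster genes into families, which is not shown here.

```python
# Hypothetical toy data: each strain maps to the set of gene families it carries.
genomes = {
    "strain_A": {"dnaA", "gyrB", "recA", "toxX"},
    "strain_B": {"dnaA", "gyrB", "recA", "phgY"},
    "strain_C": {"dnaA", "gyrB", "recA", "toxX", "resZ"},
}

def partition_pangenome(genomes):
    """Return (pan, core, flex) gene sets for a dict of strain -> gene set."""
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)        # genes seen in at least one strain
    core = set.intersection(*gene_sets)  # genes present in every strain
    flex = pan - core                    # genes present in some, but not all, strains
    return pan, core, flex

pan, core, flex = partition_pangenome(genomes)
print("core:", sorted(core))
print("flex:", sorted(flex))
```

Re-running the partition after adding a newly sequenced strain shows the effect the paragraph describes: the pan-genome can only grow or stay the same, while the core-genome can only shrink or stay the same.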

Created with PubMed® Query: ( pangenome OR "pan-genome" OR "pan genome" ) NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)


RevDate: 2023-05-27

Lobb B, Shapter A, Doxey AC, et al (2023)

Functional Profiling and Evolutionary Analysis of a Marine Microalgal Virus Pangenome.

Viruses, 15(5): pii:v15051116.

Phycodnaviridae are large double-stranded DNA viruses, which facilitate studies of host-virus interactions and co-evolution due to their prominence in algal infection and their role in the life cycle of algal blooms. However, the genomic interpretation of these viruses is hampered by a lack of functional information, stemming from the surprising number of hypothetical genes of unknown function. It is also unclear how many of these genes are widely shared within the clade. Using one of the most extensively characterized genera, Coccolithovirus, as a case study, we combined pangenome analysis, multiple functional annotation tools, AlphaFold structural modeling, and literature analysis to compare the core and accessory pangenome and assess support for novel functional predictions. We determined that the Coccolithovirus pangenome shares 30% of its genes with all 14 strains, making up the core. Notably, 34% of its genes were found in at most three strains. Core genes were enriched in early expression based on a transcriptomic dataset of Coccolithovirus EhV-201 algal infection, were more likely to be similar to host proteins than the non-core set, and were more likely to be involved in vital functions such as replication, recombination, and repair. In addition, we generated and collated annotations for the EhV representative EhV-86 from 12 different annotation sources, building up information for 142 previously hypothetical and putative membrane proteins. AlphaFold was further able to predict structures for 204 EhV-86 proteins with a modelling accuracy of good-high. These functional clues, combined with generated AlphaFold structures, provide a foundational framework for the future characterization of this model genus (and other giant viruses) and a further look into the evolution of the Coccolithovirus proteome.

RevDate: 2023-05-27

Xia L, Wang H, Zhao X, et al (2023)

Chloroplast Pan-Genomes and Comparative Transcriptomics Reveal Genetic Variation and Temperature Adaptation in the Cucumber.

International journal of molecular sciences, 24(10): pii:ijms24108943.

Although whole genome sequencing, genetic variation mapping, and pan-genome studies have been done on a large group of cucumber nuclear genomes, organelle genome information is largely unclear. As an important component of the organelle genome, the chloroplast genome is highly conserved, which makes it a useful tool for studying plant phylogeny, crop domestication, and species adaptation. Here, we have constructed the first cucumber chloroplast pan-genome based on 121 cucumber germplasms, and investigated the genetic variations of the cucumber chloroplast genome through comparative genomic, phylogenetic, haplotype, and population genetic structure analysis. Meanwhile, we explored the changes in expression of cucumber chloroplast genes under high- and low-temperature stimulation via transcriptome analysis. As a result, a total of 50 complete chloroplast genomes were successfully assembled from 121 cucumber resequencing data, ranging in size from 156,616-157,641 bp. The 50 cucumber chloroplast genomes have typical quadripartite structures, consisting of a large single copy (LSC, 86,339-86,883 bp), a small single copy (SSC, 18,069-18,363 bp), and two inverted repeats (IRs, 25,166-25,797 bp). Comparative genomic, haplotype, and population genetic structure results showed that there is more genetic variation in Indian ecotype cucumbers compared to other cucumber cultivars, which means that many genetic resources remain to be explored in Indian ecotype cucumbers. Phylogenetic analysis showed that the 50 cucumber germplasms could be classified into 3 types: East Asian, Eurasian + Indian, and Xishuangbanna + Indian. The transcriptomic analysis showed that matK was significantly up-regulated under high- and low-temperature stresses, further demonstrating that cucumber chloroplasts respond to temperature adversity by regulating lipid metabolism and ribosome metabolism. Further, accD has higher editing efficiency under high-temperature stress, which may contribute to the heat tolerance. These studies provide useful insight into genetic variation in the chloroplast genome, and established the foundation for exploring the mechanisms of temperature-stimulated chloroplast adaptation.

RevDate: 2023-05-27

Dey S, Gaur M, Sykes EME, et al (2023)

Unravelling the Evolutionary Dynamics of High-Risk Klebsiella pneumoniae ST147 Clones: Insights from Comparative Pangenome Analysis.

Genes, 14(5): pii:genes14051037.

BACKGROUND: The high prevalence and rapid emergence of antibiotic resistance in high-risk Klebsiella pneumoniae (KP) ST147 clones is a global health concern and warrants molecular surveillance.

METHODS: A pangenome analysis was performed using publicly available ST147 complete genomes. The characteristics and evolutionary relationships among ST147 members were investigated through a Bayesian phylogenetic analysis.

RESULTS: The large number of accessory genes in the pangenome indicates genome plasticity and openness. Seventy-two antibiotic resistance genes were found to be linked with antibiotic inactivation, efflux, and target alteration. The exclusive detection of the blaOXA-232 gene within the ColKp3 plasmid of KP_SDL79 suggests its acquisition through horizontal gene transfer. The association of seventy-six virulence genes with the acrAB efflux pump, T6SS system and type I secretion system describes its pathogenicity. The presence of Tn6170, a putative Tn7-like transposon in KP_SDL79 with an insertion at the flanking region of the tnsB gene, establishes its transmission ability. The Bayesian phylogenetic analysis estimates ST147's initial divergence in 1951 and the most recent common ancestor for the entire KP population in 1621.

CONCLUSIONS: The present study highlights the genetic diversity and evolutionary dynamics of high-risk clones of K. pneumoniae. Further inter-clonal diversity studies will help us understand its outbreak more precisely and pave the way for therapeutic interventions.

RevDate: 2023-05-25

Jha UC, Nayyar H, Chattopadhyay A, et al (2023)

Major viral diseases in grain legumes: designing disease resistant legumes from plant breeding and OMICS integration.

Frontiers in plant science, 14:1183505.

Grain legumes play a crucial role in human nutrition and as a staple crop for low-income farmers in developing and underdeveloped nations, contributing to overall food security and agroecosystem services. Viral diseases are major biotic stresses that severely challenge global grain legume production. In this review, we discuss how exploring naturally resistant grain legume genotypes within germplasm, landraces, and crop wild relatives could be used as promising, economically viable, and eco-environmentally friendly solution to reduce yield losses. Studies based on Mendelian and classical genetics have enhanced our understanding of key genetic determinants that govern resistance to various viral diseases in grain legumes. Recent advances in molecular marker technology and genomic resources have enabled us to identify genomic regions controlling viral disease resistance in various grain legumes using techniques such as QTL mapping, genome-wide association studies, whole-genome resequencing, pangenome and 'omics' approaches. These comprehensive genomic resources have expedited the adoption of genomics-assisted breeding for developing virus-resistant grain legumes. Concurrently, progress in functional genomics, especially transcriptomics, has helped unravel underlying candidate gene(s) and their roles in viral disease resistance in legumes. This review also examines the progress in genetic engineering-based strategies, including RNA interference, and the potential of synthetic biology techniques, such as synthetic promoters and synthetic transcription factors, for creating viral-resistant grain legumes. It also elaborates on the prospects and limitations of cutting-edge breeding technologies and emerging biotechnological tools (e.g., genomic selection, rapid generation advances, and CRISPR/Cas9-based genome editing tool) in developing virus-disease-resistant grain legumes to ensure global food security.

RevDate: 2023-05-25

Groza C, Chen X, Pacis A, et al (2023)

Genome graphs detect human polymorphisms in active epigenomic state during influenza infection.

Cell genomics, 3(5):100294 pii:S2666-979X(23)00060-5.

Genetic variants, including mobile element insertions (MEIs), are known to impact the epigenome. We hypothesized that genome graphs, which encapsulate genetic diversity, could reveal missing epigenomic signals. To test this, we sequenced the epigenome of monocyte-derived macrophages from 35 ancestrally diverse individuals before and after influenza infection, allowing us to investigate the role of MEIs in immunity. We characterized genetic variants and MEIs using linked reads and built a genome graph. Mapping epigenetic data revealed 2.3%-3% novel peaks for H3K4me1, H3K27ac chromatin immunoprecipitation sequencing (ChIP-seq), and ATAC-seq. Additionally, the use of a genome graph modified some quantitative trait loci estimates and revealed 375 polymorphic MEIs in an active epigenomic state. Among these is an AluYh3 polymorphism whose chromatin state changed after infection and was associated with the expression of TRIM25, a gene that restricts influenza RNA synthesis. Our results demonstrate that graph genomes can reveal regulatory regions that would have been overlooked by other approaches.

RevDate: 2023-05-25

Tonkin-Hill G, Corander J, Parkhill J (2023)

Challenges in prokaryote pangenomics.

Microbial genomics, 9(5):.

Horizontal gene transfer (HGT) and the resulting patterns of gene gain and loss are a fundamental part of bacterial evolution. Investigating these patterns can help us to understand the role of selection in the evolution of bacterial pangenomes and how bacteria adapt to a new niche. Predicting the presence or absence of genes can be a highly error-prone process that can confound efforts to understand the dynamics of horizontal gene transfer. This review discusses both the challenges in accurately constructing a pangenome and the potential consequences errors can have on downstream analyses. We hope that by summarizing these issues researchers will be able to avoid potential pitfalls, leading to improved bacterial pangenome analyses.

RevDate: 2023-05-24

Wisecaver JH, Auber RP, Pendleton AL, et al (2023)

Extreme genome diversity and cryptic speciation in a harmful algal-bloom-forming eukaryote.

Current biology : CB pii:S0960-9822(23)00597-3 [Epub ahead of print].

Harmful algal blooms of the toxic haptophyte Prymnesium parvum are a recurrent problem in many inland and estuarine waters around the world. Strains of P. parvum vary in the toxins they produce and in other physiological traits associated with harmful algal blooms, but the genetic basis for this variation is unknown. To investigate genome diversity in this morphospecies, we generated genome assemblies for 15 phylogenetically and geographically diverse strains of P. parvum, including Hi-C guided, near-chromosome-level assemblies for two strains. Comparative analysis revealed considerable DNA content variation between strains, ranging from 115 to 845 Mbp. Strains included haploids, diploids, and polyploids, but not all differences in DNA content were due to variation in genome copy number. Haploid genome size between strains of different chemotypes differed by as much as 243 Mbp. Syntenic and phylogenetic analyses indicate that UTEX 2797, a common laboratory strain from Texas, is a hybrid that retains two phylogenetically distinct haplotypes. Investigation of gene families variably present across the strains identified several functional categories associated with metabolic and genome size variation in P. parvum, including genes for the biosynthesis of toxic metabolites and proliferation of transposable elements. Together, our results indicate that P. parvum comprises multiple cryptic species. These genomes provide a robust phylogenetic and genomic framework for investigations into the eco-physiological consequences of the intra- and inter-specific genetic variation present in P. parvum and demonstrate the need for similar resources for other harmful algal-bloom-forming morphospecies.

RevDate: 2023-05-24

Tchan BGO, Ngazoa-Kakou S, Aka N, et al (2023)

PPE Barcoding Identifies Biclonal Mycobacterium ulcerans Buruli Ulcer, Côte d'Ivoire.

Microbiology spectrum [Epub ahead of print].

Mycobacterium ulcerans, an environmental opportunistic pathogen, causes necrotic cutaneous and subcutaneous lesions, named Buruli ulcers, in tropical countries. PCR-derived tests used to detect M. ulcerans in environmental and clinical samples do not allow one-shot detection, identification, and typing of M. ulcerans among closely related Mycobacterium marinum complex mycobacteria. We established a 385-member M. marinum/M. ulcerans complex whole-genome sequence database by assembling and annotating 341 M. marinum/M. ulcerans complex genomes and added 44 M. marinum/M. ulcerans complex whole-genome sequences already deposited in the NCBI database. Pangenome, core genome, and single-nucleotide polymorphism (SNP) distance-based comparisons sorted the 385 strains into 10 M. ulcerans taxa and 13 M. marinum taxa, correlating with the geographic origin of strains. Aligning conserved genes identified one PPE (proline-proline-glutamate) gene sequence to be species and intraspecies specific, thereby genotyping the 23 M. marinum/M. ulcerans complex taxa. PCR sequencing of the PPE gene correctly genotyped nine M. marinum/M. ulcerans complex isolates among one M. marinum taxon and three M. ulcerans taxa in the African taxon (T2.4). Further, successful PPE gene PCR sequencing in 15/21 (71.4%) swabs collected from suspected Buruli ulcer lesions in Côte d'Ivoire exhibited positive M. ulcerans IS2404 real-time PCR and identified the M. ulcerans T2.4.1 genotype in eight swabs and M. ulcerans T2.4.1/T2.4.2 mixed genotypes in seven swabs. PPE gene sequencing could be used as a proxy for whole-genome sequencing for the one-shot detection, identification, and typing of clinical M. ulcerans strains, offering an unprecedented tool for identifying M. ulcerans mixed infections.

IMPORTANCE: We describe a new targeted sequencing approach that characterizes the PPE gene to disclose the simultaneous presence of different variants of a single pathogenic microorganism. This approach has direct implications on the understanding of pathogen diversity and natural history and potential therapeutic implications when dealing with obligate and opportunistic pathogens, such as Mycobacterium ulcerans presented here as a prototype.

RevDate: 2023-05-23

Drott MT, Park SC, Wang YW, et al (2023)

Pangenomics of the death cap mushroom Amanita phalloides, and of Agaricales, reveals dynamic evolution of toxin genes in an invasive range.

The ISME journal [Epub ahead of print].

The poisonous European mushroom Amanita phalloides (the "death cap") is invading California. Whether the death caps' toxic secondary metabolites are evolving as it invades is unknown. We developed a bioinformatic pipeline to identify the MSDIN genes underpinning toxicity and probed 88 death cap genomes from an invasive Californian population and from the European range, discovering a previously unsuspected diversity of MSDINs made up of both core and accessory elements. Each death cap individual possesses a unique suite of MSDINs, and toxin genes are significantly differentiated between Californian and European samples. MSDIN genes are maintained by strong natural selection, and chemical profiling confirms MSDIN genes are expressed and result in distinct phenotypes; our chemical profiling also identified a new MSDIN peptide. Toxin genes are physically clustered within genomes. We contextualize our discoveries by probing for MSDINs in genomes from across the order Agaricales, revealing MSDIN diversity originated in independent gene family expansions among genera. We also report the discovery of an MSDIN in an Amanita outside the "lethal Amanitas" clade. Finally, the identification of an MSDIN gene and its associated processing gene (POPB) in Clavaria fumosa suggest the origin of MSDINs is older than previously suspected. The dynamic evolution of MSDINs underscores their potential to mediate ecological interactions, implicating MSDINs in the ongoing invasion. Our data change the understanding of the evolutionary history of poisonous mushrooms, emphasizing striking parallels to convergently evolved animal toxins. Our pipeline provides a roadmap for exploring secondary metabolites in other basidiomycetes and will enable drug prospecting.

RevDate: 2023-05-22

Leonard AS, Crysnanto D, Mapel XM, et al (2023)

Graph construction method impacts variation representation and analyses in a bovine super-pangenome.

Genome biology, 24(1):124.

BACKGROUND: Several models and algorithms have been proposed to build pangenomes from multiple input assemblies, but their impact on variant representation, and consequently downstream analyses, is largely unknown.

RESULTS: We create multi-species super-pangenomes using pggb, cactus, and minigraph with the Bos taurus taurus reference sequence and eleven haplotype-resolved assemblies from taurine and indicine cattle, bison, yak, and gaur. We recover 221 k nonredundant structural variations (SVs) from the pangenomes, of which 135 k (61%) are common to all three. SVs derived from assembly-based calling show high agreement with the consensus calls from the pangenomes (96%), but validate only a small proportion of variations private to each graph. Pggb and cactus, which also incorporate base-level variation, have approximately 95% exact matches with assembly-derived small variant calls, which significantly improves the edit rate when realigning assemblies compared to minigraph. We use the three pangenomes to investigate 9566 variable number tandem repeats (VNTRs), finding 63% have identical predicted repeat counts in the three graphs, while minigraph can over or underestimate the count given its approximate coordinate system. We examine a highly variable VNTR locus and show that repeat unit copy number impacts the expression of proximal genes and non-coding RNA.

CONCLUSIONS: Our findings indicate good consensus between the three pangenome methods but also show their individual strengths and weaknesses that need to be considered when analysing different types of variants from multiple input assemblies.

RevDate: 2023-05-22

Anonymous (2023)

Combining reference genomes into a pangenome graph improves accuracy and reduces bias.

Nature biotechnology [Epub ahead of print].

RevDate: 2023-05-22

Geoffroy V, Lamouche JB, Guignard T, et al (2023)

The AnnotSV webserver in 2023: updated visualization and ranking.

Nucleic acids research pii:7175348 [Epub ahead of print].

Much of the human genetics variant repertoire is composed of single nucleotide variants (SNV) and small insertion/deletions (indel) but structural variants (SV) remain a major part of our modified DNA. SV detection has often been a complex question to answer either because of the necessity to use different technologies (array CGH, SNP array, Karyotype, Optical Genome Mapping…) to detect each category of SV or to get an appropriate resolution (Whole Genome Sequencing). Thanks to the deluge of pangenomic analysis, human geneticists are accumulating SV and their interpretation remains time consuming and challenging. The AnnotSV webserver aims at being an efficient tool to (i) annotate and interpret SV potential pathogenicity in the context of human diseases, (ii) recognize potential false positive variants from all the SV identified and (iii) visualize the patient variants repertoire. The most recent developments in the AnnotSV webserver are: (i) updated annotations sources and ranking, (ii) three novel output formats to allow diverse utilization (analysis, pipelines), as well as (iii) two novel user interfaces including an interactive circos view.

RevDate: 2023-05-22

Fan J, Singh NP, Khan J, et al (2023)

Fulgor: A fast and compact k-mer index for large-scale matching and color queries.

bioRxiv : the preprint server for biology pii:2023.05.09.539895.

The problem of sequence identification or matching - determining the subset of references from a given collection that are likely to contain a query nucleotide sequence - is relevant for many important tasks in Computational Biology, such as metagenomics and pan-genome analysis. Due to the complex nature of such analyses and the large scale of the reference collections a resource-efficient solution to this problem is of utmost importance. The reference collection should therefore be pre-processed into an index for fast queries. This poses the threefold challenge of designing an index that is efficient to query, has light memory usage, and scales well to large collections. To solve this problem, we describe how recent advancements in associative, order-preserving, k-mer dictionaries can be combined with a compressed inverted index to implement a fast and compact colored de Bruijn graph data structure. This index takes full advantage of the fact that unitigs in the colored de Bruijn graph are monochromatic (all k-mers in a unitig have the same set of references of origin, or "color"), leveraging the order-preserving property of its dictionary. In fact, k-mers are kept in unitig order by the dictionary, thereby allowing for the encoding of the map from k-mers to their inverted lists in as little as 1 + o(1) bits per unitig. Hence, one inverted list per unitig is stored in the index with almost no space/time overhead. By combining this property with simple but effective compression methods for inverted lists, the index achieves very small space. We implement these methods in a tool called Fulgor. Compared to Themisto, the prior state of the art, Fulgor indexes a heterogeneous collection of 30,691 bacterial genomes in 3.8× less space, a collection of 150,000 Salmonella enterica genomes in approximately 2× less space, and is at least twice as fast for color queries.

Applied computing → Bioinformatics.
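The "color query" in the Fulgor abstract above can be made concrete with a deliberately naive index: a dictionary from each k-mer to the set of references that contain it. This is only a sketch of the query semantics, not Fulgor's data structure. It uses no unitigs, no compression, and no canonical (reverse-complement) k-mers, and the function names, the toy sequences, and the `threshold` parameter are assumptions made for illustration.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every k-mer (substring of length k) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_color_index(references, k):
    """Map each k-mer to the set of reference names (its "color") containing it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index

def query_colors(index, query, k, threshold=1.0):
    """Return references that contain at least `threshold` of the query's k-mers."""
    hits = defaultdict(int)
    total = 0
    for kmer in kmers(query, k):
        total += 1
        for name in index.get(kmer, ()):
            hits[name] += 1
    if total == 0:
        return set()  # query shorter than k: nothing to match
    return {name for name, n in hits.items() if n / total >= threshold}

# Hypothetical toy references.
references = {"ref1": "ACGTACGGA", "ref2": "TTACGGATC"}
index = build_color_index(references, k=4)
print(sorted(query_colors(index, "ACGGA", k=4)))
print(sorted(query_colors(index, "ACGTA", k=4)))
```

A naive index like this stores one set per distinct k-mer, which is the cost the abstract says is avoided by storing one inverted list per unitig instead.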

RevDate: 2023-05-22

Ferrero-Serrano Á, Chakravorty D, Kirven KJ, et al (2023)

Oryza CLIMtools: An Online Portal for Investigating Genome-Environment Associations in Rice.

bioRxiv : the preprint server for biology pii:2023.05.10.540241.

Elite crop varieties display an evident mismatch between their current distributions and the suitability of the local climate for their productivity. To this end, we present Oryza CLIMtools, the first pan-genome prediction of climate-associated genetic variants in a crop species. This resource consists of interactive web-based databases that allow the user to: i) explore the local environment and its interaction with natural existing genetic variation in local rice varieties (landraces) in South-Eastern Asia, and; ii) investigate the environment × genome associations for 658 Indica and 283 Japonica rice landrace accessions included in the 3K Rice Genomes Project and previously collected from their geo-referenced local environments. We exemplify the value of these resources, identifying an interplay between flowering time and temperature in the local environment that is facilitated by adaptive natural variation in OsHD2 and disrupted by maladaptive variation in OsSOC1. Prior QTL analysis has suggested the importance of heterotrimeric G proteins in the control of agronomic traits. Accordingly, we analyzed the climate associations of the different heterotrimeric G protein subunits. We identified a coordinated role of G proteins in adaptation to the prevailing Potential Evapotranspiration gradient and their regulation of key agronomic traits including plant height, seed, and panicle length. We conclude by highlighting the prospect of targeting heterotrimeric G proteins to produce crops that are climate-change-ready.

RevDate: 2023-05-22

Zachariasen T, Petersen AØ, Brejnrod A, et al (2023)

Identification of representative species-specific genes for abundance measurements.

Bioinformatics advances, 3(1):vbad060.

MOTIVATION: Metagenomic binning facilitates the reconstruction of genomes and identification of Metagenomic Species Pan-genomes or Metagenomic Assembled Genomes. We propose a method for identifying a set of de novo representative genes, termed signature genes, which can be used to measure the relative abundance and used as markers of each metagenomic species with high accuracy.

RESULTS: An initial set of the 100 genes that correlate with the median gene abundance profile of the entity is selected. A variant of the coupon collector's problem was utilized to evaluate the probability of identifying a certain number of unique genes in a sample. This allows us to reject the abundance measurements of strains exhibiting a significantly skewed gene representation. A rank-based negative binomial model is employed to assess the performance of different gene sets across a large set of samples, facilitating identification of an optimal signature gene set for the entity. When benchmarked on a synthetic gene catalog, our optimized signature gene sets estimate relative abundance significantly closer to the true relative abundance compared to the starting gene sets extracted from the metagenomic species. The method was able to replicate results from a study with real data and identify around three times as many metagenomic entities.

The code used for the analysis is available on GitHub:

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
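The coupon collector's reasoning mentioned in the abstract above can be sketched for the textbook case. The abstract says the authors used "a variant" of the problem, so the code below is not their model. It assumes that every read lands on one of the signature genes uniformly at random and independently, and it computes the distribution of the number of distinct genes hit. The figure of 100 genes comes from the abstract; the read count of 150 is hypothetical.

```python
def distinct_gene_distribution(n_genes, n_reads):
    """Return probs where probs[j] = P(exactly j distinct genes are hit)
    after n_reads independent, uniform draws over n_genes genes."""
    probs = [0.0] * (n_genes + 1)
    probs[0] = 1.0  # before any read, zero genes have been seen
    for _ in range(n_reads):
        new = [0.0] * (n_genes + 1)
        for j, p in enumerate(probs):
            if p == 0.0:
                continue
            new[j] += p * j / n_genes  # read hits an already-seen gene
            if j < n_genes:
                new[j + 1] += p * (n_genes - j) / n_genes  # read hits a new gene
        probs = new
    return probs

def expected_distinct(n_genes, n_reads):
    """Closed-form expected number of distinct genes hit under the same model."""
    return n_genes * (1 - (1 - 1 / n_genes) ** n_reads)

probs = distinct_gene_distribution(100, 150)  # 150 reads is a hypothetical value
mean = sum(j * p for j, p in enumerate(probs))
print(round(mean, 2), round(expected_distinct(100, 150), 2))
```

Under this simplified model, a sample whose observed number of distinct genes falls far into the lower tail of `probs` would show the kind of skewed gene representation the abstract describes as grounds for rejecting an abundance measurement.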

RevDate: 2023-05-22

Youngblom MA, Shockey AC, Callaghan MM, et al (2023)

The Gonococcal Genetic Island defines distinct sub-populations of Neisseria gonorrhoeae.

Microbial genomics, 9(5):.

The incidence of gonorrhoea is increasing at an alarming pace, and therapeutic options continue to narrow as a result of worsening drug resistance. Neisseria gonorrhoeae is naturally competent, allowing the organism to adapt rapidly to selection pressures including antibiotics. A sub-population of N. gonorrhoeae carries the Gonococcal Genetic Island (GGI), which encodes a type IV secretion system (T4SS) that secretes chromosomal DNA. Previous research has shown that the GGI increases transformation efficiency in vitro, but the extent to which it contributes to horizontal gene transfer (HGT) during infection is unknown. Here we analysed genomic data from clinical isolates of N. gonorrhoeae to better characterize GGI+ and GGI- sub-populations and to delineate patterns of variation at the locus itself. We found the element segregating at an intermediate frequency (61%), and it appears to act as a mobile genetic element with examples of gain, loss, exchange and intra-locus recombination within our sample. We further found evidence suggesting that GGI+ and GGI- sub-populations preferentially inhabit distinct niches with different opportunities for HGT. Previously, GGI+ isolates were reported to be associated with more severe clinical infections, and our results suggest this could be related to metal-ion trafficking and biofilm formation. The co-segregation of GGI+ and GGI- isolates despite mobility of the element suggests that both niches inhabited by N. gonorrhoeae remain important to its overall persistence as has been demonstrated previously for cervical- and urethral-adapted sub-populations. These data emphasize the complex population structure of N. gonorrhoeae and its capacity to adapt to diverse niches.

RevDate: 2023-05-19

Qanmber G, You Q, Yang Z, et al (2023)

Transcriptional and translational landscape fine-tune genome annotation and explores translation control in cotton.

Journal of advanced research pii:S2090-1232(23)00142-X [Epub ahead of print].

INTRODUCTION: The unavailability of intergenic region annotation in whole genome sequencing and pan-genomics hinders efforts to enhance crop improvement.

OBJECTIVES: Despite advances in research, the impact of post-transcriptional regulation on fiber development and translatome profiling at different stages of fiber growth in cotton (G. hirsutum) remains unexplored.

METHODS: We utilized a combination of reference-guided de novo transcriptome assembly and ribosome profiling techniques to uncover the hidden mechanisms of translational control in eight distinct tissues of upland cotton.

RESULTS: Our study identified P-site distribution at three-nucleotide periodicity and dominant ribosome footprint at 27 nucleotides. Specifically, we have detected 1,589 small open reading frames (sORFs), including 1,376 upstream ORFs (uORFs) and 213 downstream ORFs (dORFs), as well as 552 long non-coding RNAs (lncRNAs) with potential coding functions, which fine-tune the annotation of the cotton genome. Further, we have identified novel genes and lncRNAs with strong translation efficiency (TE), while sORFs were found to affect mRNA transcription levels during fiber elongation. The reliability of these findings was confirmed by the high consistency in correlation and synergetic fold change between RNA-sequencing (RNA-seq) and Ribosome-sequencing (Ribo-seq) analyses. Additionally, integrated omics analysis of the normal fiber ZM24 and short fiber pag1 cotton mutant revealed several differentially expressed genes (DEGs), and fiber-specific expressed (high/low) genes associated with sORFs (uORFs and dORFs). These findings were further supported by the overexpression and knockdown of GhKCS6, a gene associated with sORFs in cotton, and demonstrated the potential regulation of the mechanism governing fiber elongation on both the transcriptional and post-transcriptional levels.

CONCLUSION: Reference-guided transcriptome assembly and the identification of novel transcripts fine-tune the annotation of the cotton genome and predicted the landscape of fiber development. Our approach provided a high-throughput method, based on multi-omics, for discovering unannotated ORFs, hidden translational control, and complex regulatory mechanisms in crop plants.

RevDate: 2023-05-19

Zhang B, Huang H, Tibbs-Cortes LE, et al (2023)

Streamline unsupervised machine learning to survey and graph indel-based haplotypes from pan-genomes.

Molecular plant pii:S1674-2052(23)00139-9 [Epub ahead of print].

RevDate: 2023-05-18

Ahmed OY, Rossi M, Gagie T, et al (2023)

SPUMONI 2: improved classification using a pangenome index of minimizer digests.

Genome biology, 24(1):122.

Genomics analyses use large reference sequence collections, like pangenomes or taxonomic databases. SPUMONI 2 is an efficient tool for sequence classification of both short and long reads. It performs multi-class classification using a novel sampled document array. By incorporating minimizers, SPUMONI 2's index is 65 times smaller than minimap2's for a mock community pangenome. SPUMONI 2 achieves a speed improvement of 3-fold compared to SPUMONI and 15-fold compared to minimap2. We show SPUMONI 2 achieves an advantageous mix of accuracy and efficiency in practical scenarios such as adaptive sampling, contamination detection and multi-class metagenomics classification.
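
As a rough illustration of the minimizer idea mentioned in the abstract above, the following sketch picks the smallest k-mer from every window of w consecutive k-mers and concatenates the picks into a shorter "digest" string. It is not SPUMONI 2's implementation: the function names `minimizers` and `digest`, the lexicographic ordering, and the leftmost tie-break are assumptions made here for illustration (practical tools typically order k-mers by a hash).

```python
def minimizers(seq, k, w):
    """Return (position, k-mer) pairs chosen as window minimizers.

    Assumption for this sketch: k-mers are compared lexicographically
    and ties are broken by the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    # Slide a window over w consecutive k-mers and keep the smallest one.
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        picked.add(best)
    return [(i, kmers[i]) for i in sorted(picked)]


def digest(seq, k, w):
    """Concatenate the selected minimizers into a digest string."""
    return "".join(kmer for _, kmer in minimizers(seq, k, w))


if __name__ == "__main__":
    print(minimizers("ACGTACGT", k=3, w=2))
    print(digest("ACGTACGT", k=3, w=2))
```

Indexing digests instead of full sequences is one way an index can shrink, at the cost of only seeing the sampled k-mers; the actual size reduction reported above depends on the tool's own parameters and data structures.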

RevDate: 2023-05-18

Anbazhagan S, Himani KM, Karthikeyan R, et al (2023)

Comparative genomics of Brucella abortus and Brucella melitensis unravels the gene sharing, virulence factors and SNP diversity among the standard, vaccine and field strains.

International microbiology : the official journal of the Spanish Society for Microbiology [Epub ahead of print].

Brucella abortus and Brucella melitensis are the primary etiological agents of brucellosis in large and small ruminants, respectively. There are limited comparative genomic studies involving Brucella strains that explore the relatedness among both species. In this study, we involved strains (n=44) representing standard, vaccine and Indian field origin for pangenome, single nucleotide polymorphism (SNP) and phylogenetic analysis. Both species shared a common gene pool representing 2884 genes out of a total of 3244 genes. SNP-based phylogenetic analysis indicated higher SNP diversity among B. melitensis (3824) strains in comparison to B. abortus (540) strains, and a clear demarcation was identified between standard/vaccine and field strains. The analysis for virulence genes revealed that virB3, virB7, ricA, virB5, ipx5, wbkC, wbkB, and acpXL genes were highly conserved in most of the Brucella strains. Interestingly, the virB10 gene was found to have high variability among the B. abortus strains. The cgMLST analysis revealed distinct sequence types for the standard/vaccine and field strains. B. abortus strains from north-eastern India fall within a similar sequence type, differing from other strains. In conclusion, the analysis revealed a highly shared core genome between the two Brucella species. SNP analysis revealed that B. melitensis strains exhibit high diversity as compared to B. abortus strains. Strains with absence or high polymorphism of virulence genes can be exploited for the development of novel vaccine candidates effective against both B. abortus and B. melitensis.

RevDate: 2023-05-17

Tian R, Xu S, Li P, et al (2023)

Characterization of G-type Clostridium perfringens bacteriophages and their disinfection effect on chicken meat.

Anaerobe pii:S1075-9964(23)00045-8 [Epub ahead of print].

OBJECTIVE: Clostridium perfringens is one of the most important bacterial pathogens in the poultry industry and mainly causes necrotizing enteritis (NE). This pathogen and its toxins can cause foodborne diseases in humans through the food chain. In China, with the rise of antibiotic resistance and the banning of antibiotic growth promoters (AGPs) in poultry farming, food contamination and NE are becoming more prevalent. Bacteriophages are a viable technique to control C. perfringens as an alternative to antibiotics. We isolated Clostridium phage from the environment, providing a new method for the prevention of NE and C. perfringens contamination in meat.

METHODS: In this study, we selected C. perfringens strains from various regions and animal sources in China for phage isolation. The biological characteristics of Clostridium phage were studied in terms of host range, MOI, one-step curve, temperature and pH stability. We sequenced and annotated the genome of the Clostridium phage and performed phylogenetic and pangenomic analyses. Finally, we studied its antibacterial activity against bacterial culture and its disinfection effect against C. perfringens in meat.

RESULTS: A Clostridium phage, named ZWPH-P21 (P21), was isolated from chicken farm sewage in Jiangsu, China. P21 has been shown to specifically lyse C. perfringens type G. Further analysis of basic biological characteristics showed that P21 was stable under the conditions of pH 4-11 and temperature 4-60 °C, and the optimal multiplicity of infection (MOI) was 0.1. In addition, P21 could form a "halo" on agar plates, suggesting that the phage may encode depolymerase. Genome sequence analysis showed that P21 was the most closely related to Clostridium phage CPAS-15 belonging to the Myoviridae family, with a recognition rate of 97.24% and a query coverage rate of 98%. No virulence factors or drug resistance genes were found in P21. P21 showed promising antibacterial activity in vitro and in chicken disinfection experiments. In conclusion, P21 has the potential to be used for preventing and controlling C. perfringens in chicken food production.

RevDate: 2023-05-17

Tanwar AS, Shruptha P, Jnana A, et al (2023)

Emerging Pathogens in Planetary Health and Lessons from Comparative Genome Analyses of Three Clostridia Species.

Omics : a journal of integrative biology [Epub ahead of print].

Clostridioides difficile (CD) is a major planetary health burden. A Gram-positive opportunistic pathogen, CD, colonizes the large intestine and is implicated in sepsis, pseudomembranous colitis, and colorectal cancer. C. difficile infection typically following antibiotic exposure results in dysbiosis of the gut microbiome, and is one of the leading causes of diarrhea in the elderly population. While several studies have focused on the toxigenic strains of CD, gut commensals such as Clostridium butyricum (CB) and Clostridium tertium (CT) could harbor toxin/virulence genes, and thus pose a threat to human health. In this study, we sequenced and characterized three isolates, namely, CT (MALS001), CB (MALS002), and CD (MALS003) for their antimicrobial, cytotoxic, antiproliferative, genomic, and proteomic profiles. Although in vitro cytotoxic and antiproliferative potential were observed predominantly in CD MALS003, genome analysis revealed pathogenic potential of CB MALS002 and CT MALS001. Pangenome analysis revealed the presence of several accessory genes typically involved in fitness, virulence, and resistance characteristics in the core genomes of sequenced strains. The presence of an array of virulence and antimicrobial resistance genes in CB MALS002 and CT MALS001 suggests their potential role as emerging pathogens with significant impact on planetary health.

RevDate: 2023-05-17

Murik O, Zeevi DA, Mann T, et al (2023)

Whole-Genome Sequencing Reveals Differences among Kingella kingae Strains from Carriers and Patients with Invasive Infections.

Microbiology spectrum [Epub ahead of print].

As a result of the increasing use of sensitive nucleic acid amplification tests, Kingella kingae is being recognized as a common pathogen of early childhood, causing medical conditions ranging from asymptomatic oropharyngeal colonization to bacteremia, osteoarthritis, and life-threatening endocarditis. However, the genomic determinants associated with the different clinical outcomes are unknown. Employing whole-genome sequencing, we studied 125 international K. kingae isolates derived from 23 healthy carriers and 102 patients with invasive infections, including bacteremia (n = 23), osteoarthritis (n = 61), and endocarditis (n = 18). We compared their genomic structures and contents to identify genomic determinants associated with the different clinical conditions. The mean genome size of the strains was 2,024,228 bp, and the pangenome comprised 4,026 predicted genes, of which 1,460 (36.3%) were core genes shared by >99% of the isolates. No single gene discriminated between carried and invasive strains; however, 43 genes were significantly more frequent in invasive isolates, compared to asymptomatically carried organisms, and a few showed a significant differential distribution among isolates from skeletal system infections, bacteremia, and endocarditis. The gene encoding the iron-regulated protein FrpC was uniformly absent in all 18 endocarditis-associated strains but was present in one-third of other invasive isolates. Similar to other members of the Neisseriaceae family, the K. kingae differences in invasiveness and tropism for specific body tissues appear to depend on combinations of multiple virulence-associated determinants that are widely distributed throughout the genome. The potential role of the absence of the FrpC protein in the pathogenesis of endocardial invasion deserves further investigation. 
IMPORTANCE: The wide range of clinical severities exhibited by invasive Kingella kingae infections strongly suggests that isolates differ in their genomic contents, and strains associated with life-threatening endocarditis may harbor distinct genomic determinants that result in cardiac tropism and severe tissue damage. The results of the present study show that no single gene discriminated between asymptomatically carried isolates and invasive strains. However, 43 putative genes were significantly more frequent among invasive isolates than among pharyngeal colonizers. In addition, several genes displayed a significant differential distribution among isolates from bacteremia, skeletal system infections, and endocarditis, suggesting that the virulence and tissue tropism of K. kingae are multifactorial and polygenic, depending on changes in the allele content and genomic organization. Further analysis of these putative genes may identify genomic determinants of the invasiveness of K. kingae and its affinity for specific body tissues and potential targets for a future protective vaccine.
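
The core/accessory split used in abstracts such as the one above can be sketched from a gene presence/absence table. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `classify_genes`, the input layout (a dict mapping each gene to the set of isolates carrying it), and the toy data are hypothetical; only the ">99% of isolates" core threshold is taken from the abstract.

```python
def classify_genes(presence, isolates, core_threshold=0.99):
    """Split genes into core and accessory lists.

    presence: dict mapping gene name -> set of isolates carrying it
              (hypothetical input layout for this sketch).
    isolates: collection of all isolate names in the study.
    A gene counts as core when it is found in MORE than
    core_threshold of the isolates, mirroring the ">99%" wording.
    """
    n = len(isolates)
    if n == 0:
        raise ValueError("at least one isolate is required")
    core, accessory = [], []
    for gene in sorted(presence):
        fraction = len(presence[gene]) / n
        if fraction > core_threshold:
            core.append(gene)
        else:
            accessory.append(gene)
    return core, accessory


if __name__ == "__main__":
    isolates = ["s1", "s2", "s3", "s4"]
    presence = {
        "geneA": {"s1", "s2", "s3", "s4"},  # present everywhere
        "geneB": {"s1", "s2"},              # present in half
        "geneC": {"s3"},                    # strain-specific
    }
    core, accessory = classify_genes(presence, isolates)
    print(core, accessory)
    # Share of core genes reported in the abstract: 1,460 of 4,026.
    print(round(100 * 1460 / 4026, 1))
```

The threshold is a study-level choice; other abstracts in this list define the core as genes present in all genomes, which corresponds to requiring a fraction of exactly 1.0.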

RevDate: 2023-05-16

Kalaivanan NS, Ghoshal T, Lakshmi MA, et al (2023)

Complete genome resource unravels the close relation of an Indian Xanthomonas oryzae pv. oryzae strain IXOBB0003 with Philippines strain causing bacterial blight of rice.

3 Biotech, 13(6):187.

Xanthomonas oryzae pv. oryzae (Xoo) is a pathogen of concern for rice growers as it limits the production potential of rice varieties worldwide. Due to its high genomic plasticity, the pathogen continues to evolve, nullifying the deployed resistance mechanisms. It is pertinent to monitor the evolving Xoo population for the virulent novel strains, and the affordable sequencing technologies made the task feasible with an in-depth understanding of their pathogenesis arsenals. We present the complete genome of a highly virulent Indian Xoo strain IXOBB0003, predominantly found in northwestern parts of India, by employing next-generation sequencing and single-molecule sequencing in real-time technologies. The final genome assembly comprises 4,962,427 bp and has 63.96% GC content. The pan genome analysis reveals that strain IXOBB0003 houses a total of 3655 core genes, 1276 accessory genes and 595 unique genes. Comparative analysis of the predicted gene clusters of coding sequences and protein count of strain IXOBB0003 depicts 3687 of almost 90% gene clusters shared by other Asian strains, 17 unique to IXOBB0003 and 139 CDSs of IXOBB0003 are shared with PXO99[A]. AnnoTALE-based studies revealed 16 TALEs conferred from the whole genome sequence. Prominent TALEs of our strain are found orthologous to TALEs of the Philippines strain PXO99[A]. The genomic features of Indian Xoo strain IXOBB0003 and in comparison with other Asian strains would certainly contribute significantly while formulating novel strategies for BB management.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13205-023-03596-x.

RevDate: 2023-05-16

Price RJ, Davik J, Fernandéz Fernandéz F, et al (2023)

Chromosome-scale genome sequence assemblies of the 'Autumn Bliss' and 'Malling Jewel' cultivars of the highly heterozygous red raspberry (Rubus idaeus L.) derived from long-read Oxford Nanopore sequence data.

PloS one, 18(5):e0285756 pii:PONE-D-22-31806.

Red raspberry (Rubus idaeus L.) is an economically valuable soft-fruit species with a relatively small (~300 Mb) but highly heterozygous diploid (2n = 2x = 14) genome. Chromosome-scale genome sequences are a vital tool in unravelling the genetic complexity controlling traits of interest in crop plants such as red raspberry, as well as for functional genomics, evolutionary studies, and pan-genomics diversity studies. In this study, we developed genome sequences of a primocane fruiting variety ('Autumn Bliss') and a floricane variety ('Malling Jewel'). The use of long-read Oxford Nanopore Technologies sequencing data yielded long read lengths that permitted well resolved genome sequences for the two cultivars to be assembled. The de novo assemblies of 'Malling Jewel' and 'Autumn Bliss' contained 79 and 136 contigs respectively, and 263.0 Mb of the 'Autumn Bliss' and 265.5 Mb of the 'Malling Jewel' assembly could be anchored unambiguously to a previously published red raspberry genome sequence of the cultivar 'Anitra'. Single copy ortholog analysis (BUSCO) revealed high levels of completeness in both genomes sequenced, with 97.4% of sequences identified in 'Autumn Bliss' and 97.7% in 'Malling Jewel'. The density of repetitive sequence contained in the 'Autumn Bliss' and 'Malling Jewel' assemblies was significantly higher than in the previously published assembly and centromeric and telomeric regions were identified in both assemblies. A total of 42,823 protein coding regions were identified in the 'Autumn Bliss' assembly, whilst 43,027 were identified in the 'Malling Jewel' assembly. These chromosome-scale genome sequences represent an excellent genomics resource for red raspberry, particularly around the highly repetitive centromeric and telomeric regions of the genome that are less complete in the previously published 'Anitra' genome sequence.

RevDate: 2023-05-15

Kuzmanović N, diCenzo GC, Bunk B, et al (2023)

Genomics of the "tumorigenes" clade of the family Rhizobiaceae and description of Rhizobium rhododendri sp. nov.

MicrobiologyOpen, 12(2):e1352.

Tumorigenic members of the family Rhizobiaceae, known as agrobacteria, are responsible for crown and cane gall diseases of various crops worldwide. Tumorigenic agrobacteria are commonly found in the genera Agrobacterium, Allorhizobium, and Rhizobium. In this study, we analyzed a distinct "tumorigenes" clade of the genus Rhizobium, which includes the tumorigenic species Rhizobium tumorigenes, as well as strains causing crown gall disease on rhododendron. Here, high-quality, closed genomes of representatives of the "tumorigenes" clade were generated, followed by comparative genomic and phylogenomic analyses. Additionally, the phenotypic characteristics of representatives of the "tumorigenes" clade were analyzed. Our results showed that the tumorigenic strains isolated from rhododendron represent a novel species of the genus Rhizobium for which the name Rhizobium rhododendri sp. nov. is proposed. This species also includes additional strains originating from blueberry and Himalayan blackberry in the United States, whose genome sequences were retrieved from GenBank. Both R. tumorigenes and R. rhododendri contain multipartite genomes, including a chromosome, putative chromids, and megaplasmids. Synteny and phylogenetic analyses indicated that a large putative chromid of R. rhododendri resulted from the cointegration of an ancestral megaplasmid and two putative chromids, following its divergence from R. tumorigenes. Moreover, gene clusters specific for both species of the "tumorigenes" clade were identified, and their biological functions and roles in the ecological diversification of R. rhododendri and R. tumorigenes were predicted and discussed.

RevDate: 2023-05-14

Pham HH, Kim DH, TL Nguyen (2023)

Wide-genome selection of lactic acid bacteria harboring genes that promote the elimination of antinutritional factors.

Frontiers in plant science, 14:1145041.

Anti-nutritional factors (ANFs) in plant products, such as indigestible non-starchy polysaccharides (α-galactooligosaccharides, α-GOS), phytate, tannins, and alkaloids, can impede the absorption of many critical nutrients and cause major physiological disorders. To enhance silage quality and its tolerance threshold for humans as well as other animals, ANFs must be reduced. This study aims to identify and compare the bacterial species/strains that are of potential use for industrial fermentation and ANF reduction. A pan-genome study of 351 bacterial genomes was performed, and binary data was processed to quantify the number of genes involved in the removal of ANFs. Among the four pan-genome analyses, all 37 tested Bacillus subtilis genomes had one phytate degradation gene, while 91 out of 150 Enterobacteriaceae genomes harbored at least one gene (maximum three). Although no gene encoding phytase was detected in genomes of Lactobacillus and Pediococcus species, they have genes involved indirectly in the metabolism of phytate derivatives to produce myo-inositol, an important compound in animal cell physiology. In contrast, genes related to the production of lectin, tannase and saponin-degrading enzymes were not found in genomes of B. subtilis and Pediococcus species. Our findings suggest that a combination of bacterial species and/or unique strains in fermentation, for example two Lactobacillus strains (DSM 21115 and ATCC 14869) with B. subtilis SRCM103689, would maximize the efficiency of reducing the ANF concentration. In conclusion, this study provides insights into bacterial genome analysis for maximizing nutritional value in plant-based food. Further investigations of gene numbers and repertoires correlated with the metabolism of different ANFs will help clarify efficiency in terms of time consumption and food quality.

RevDate: 2023-05-14

Meng X, Chen F, Xiong M, et al (2023)

A new pathogenic isolate of Kocuria kristinae identified for the first time in the marine fish Larimichthys crocea.

Frontiers in microbiology, 14:1129568.

In recent years, new emerging pathogenic microorganisms have frequently appeared in animals, including marine fish, possibly due to climate change, anthropogenic activities, and even cross-species transmission of pathogenic microorganisms among animals or between animals and humans, which poses a serious issue for preventive medicine. In this study, a bacterium was clearly characterized among 64 isolates from the gills of diseased large yellow croaker Larimichthys crocea that were raised in marine aquaculture. This strain was identified as K. kristinae by biochemical tests with a VITEK 2.0 analysis system and 16S rRNA sequencing and named K. kristinae_LC. The potential genes that might encode virulence-factors were widely screened through sequence analysis of the whole genome of K. kristinae_LC. Many genes involved in the two-component system and drug-resistance were also annotated. In addition, 104 unique genes in K. kristinae_LC were identified by pan genome analysis with the genomes of this strain from five different origins (woodpecker, medical resource, environment, and marine sponge reef) and the analysis results demonstrated that their predicted functions might be associated with adaptation to living conditions such as higher salinity, complex marine biomes, and low temperature. A significant difference in genomic organization was found among the K. kristinae strains that might be related to their hosts living in different environments. The animal regression test for this new bacterial isolate was carried out using L. crocea, and the results showed that this bacterium could cause the death of L. crocea and that the fish mortality was dose-dependent within 5 days post infection, indicating the pathogenicity of K. kristinae_LC to marine fish. Since K. kristinae has been reported as a pathogen for humans and bovines, in our study, we revealed a new isolate of K. kristinae_LC from marine fish for the first time, suggesting the potentiality of cross-species transmission among animals or from marine animals to humans, from which we would gain insight to help in future public prevention strategies for new emerging pathogens.

RevDate: 2023-05-13

An B, Cai H, Li B, et al (2023)

Molecular Evolution of Histone Methylation Modification Families in the Plant Kingdom and Their Genome-Wide Analysis in Barley.

International journal of molecular sciences, 24(9): pii:ijms24098043.

In this study, based on the OneKP database and through comparative genetic analysis, we found that HMT and HDM may originate from Chromista and are highly conserved in green plants, and that during the evolution from algae to land plants, histone methylation modifications gradually became complex and diverse, which is more conducive to the adaptation of plants to complex and variable environments. We also characterized the number of members, genetic similarity, and phylogeny of HMT and HDM families in barley using the barley pangenome and the Tibetan Lasa Goumang genome. The results showed that HMT and HDM were highly conserved in the domestication of barley, but there were some differences in the Lasa Goumang SDG subfamily. Expression analysis showed that HvHMTs and HvHDMs were highly expressed in specific tissues and had complex expression patterns under multiple stress treatments. In summary, the amplification and variation of HMT and HDM facilitate plant adaptation to complex terrestrial environments, while they are highly conserved in barley and play an important role in barley growth and development with abiotic stresses. In brief, our findings provide a novel perspective on the origin and evolutionary history of plant HvHMTs and HvHDMs, and lay a foundation for further investigation of their functions in barley.

RevDate: 2023-05-12

Abdella B, Abozahra NA, Shokrak NM, et al (2023)

Whole spectrum of Aeromonas hydrophila virulence determinants and the identification of novel SNPs using comparative pathogenomics.

Scientific reports, 13(1):7712.

Aeromonas hydrophila is a ubiquitous fish pathogen and an opportunistic human pathogen. It is mostly found in aquatic habitats, but it has also been isolated from food and bottled mineral waters. It causes hemorrhagic septicemia, ulcerative disease, and motile Aeromonas septicemia (MAS) in fish and other aquatic animals. Moreover, it might cause gastroenteritis, wound infections, and septicemia in humans. Different variables influence A. hydrophila virulence, including the virulence genes expressed, host susceptibility, and environmental stresses. The identification of virulence factors for a bacterial pathogen will help in the development of preventive and control measures. 95 Aeromonas spp. genomes were examined in the current study, and 53 strains were determined to be valid A. hydrophila. These genomes were examined for pan- and core-genomes using a comparative genomics technique. A. hydrophila has an open pan-genome with 18,306 total genes and 1620 genes in its core-genome. In the pan-genome, 312 virulence genes have been detected. The effector delivery system category had the largest number of virulence genes (87), followed by immunological modulation and motility genes (69 and 46, respectively). This provides new insight into the pathogenicity of A. hydrophila. In the pan-genome, a few distinctive single-nucleotide polymorphisms (SNPs) have been identified in four genes, namely: D-glycero-beta-D-manno-heptose-1,7-bisphosphate 7-phosphatase, chemoreceptor glutamine deamidase, Spermidine N (1)-acetyltransferase, and maleylpyruvate isomerase, which are present in all A. hydrophila genomes, which make them molecular marker candidates for precise identification of A. hydrophila. Therefore, for precise diagnostic and discrimination results, we suggest these genes be considered when designing primers and probes for sequencing, multiplex-PCR, or real-time PCR.

RevDate: 2023-05-12

Raza A, Bohra A, RK Varshney (2023)

Pan-genome for pearl millet that beats the heat.

Trends in plant science pii:S1360-1385(23)00156-5 [Epub ahead of print].

A better understanding of crop genomes reveals that structural variations (SVs) are crucial for genetic improvement. A graph-based pan-genome by Yan et al. uncovered 424 085 genomic SVs and provided novel insights into heat tolerance of pearl millet. We discuss how these SVs can fast-track pearl millet breeding under harsh environments.

RevDate: 2023-05-12

Büchler T, Olbrich J, E Ohlebusch (2023)

Efficient short read mapping to a pangenome that is represented by a graph of ED strings.

Bioinformatics (Oxford, England) pii:7160913 [Epub ahead of print].

MOTIVATION: A pangenome represents many diverse genome sequences of the same species. In order to cope with small variations as well as structural variations, recent research focused on the development of graph based models of pangenomes. Mapping is the process of finding the original location of a DNA read in a reference sequence, typically a genome. Using a pangenome instead of a (linear) reference genome can e.g. reduce mapping bias, the tendency to incorrectly map sequences that differ from the reference genome. Mapping reads to a graph, however, is more complex and needs more resources than mapping to a reference genome. Reducing the complexity of the graph by encoding simple variations like SNPs in a simple way can accelerate read mapping and reduce the memory requirements at the same time.

RESULTS: We introduce graphs based on elastic-degenerate strings (ED strings, EDS) and the linearised form of these EDS graphs as a new representation for pangenomes. In this representation, small variations are encoded directly in the sequence. Structural variations are encoded in a graph structure. This reduces the size of the representation in comparison to sequence graphs. In the linearised form, mapping techniques that are known from ordinary strings can be applied with appropriate adjustments. Since most variations are expressed directly in the sequence, the mapping process rarely has to take edges of the EDS graph into account. We developed a prototypical software tool GED-MAP that uses this representation together with a minimizer index to map short reads to the pangenome. Our experiments show that the new method works on a whole human genome scale, taking structural variants properly into account. The advantage of GED-MAP, compared with other pangenomic short read mappers, is that the new representation allows for a simple indexing method. This makes GED-MAP fast and memory efficient.

AVAILABILITY: Sources are available at:

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
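
To make the elastic-degenerate (ED) string representation described above more concrete, the sketch below models an ED string as a list of segments, each segment being a list of alternative strings (an empty string stands for a deletion), and checks whether a pattern occurs in any sequence the ED string spells. This is a toy illustration only and is not GED-MAP's algorithm: the name `ed_contains`, the list-of-lists layout, and the prefix-tracking search are assumptions made here, and the graph edges used for structural variants as well as the minimizer index are left out.

```python
def ed_contains(eds, pattern):
    """Return True if pattern occurs in some string spelled by eds.

    eds: list of segments; each segment is a list of alternative
         strings, and "" models a deletion (hypothetical layout).
    The search tracks, at each segment boundary, every length j such
    that pattern[:j] matches a suffix of some path read so far.
    """
    m = len(pattern)
    if m == 0:
        return True
    active = set()  # prefix lengths matched up to the current boundary
    for segment in eds:
        new_active = set()
        for alt in segment:
            # Try to extend partial matches that started earlier.
            for j in active:
                rest = pattern[j:]
                if len(alt) >= len(rest):
                    if alt.startswith(rest):
                        return True
                elif rest.startswith(alt):
                    new_active.add(j + len(alt))
            # A match lying entirely inside this alternative.
            if pattern in alt:
                return True
            # New partial matches that begin inside this alternative.
            for j in range(1, min(len(alt), m - 1) + 1):
                if alt.endswith(pattern[:j]):
                    new_active.add(j)
        active = new_active
    return False


if __name__ == "__main__":
    # Spells ACGTA, ACTTA and ACTA (SNP G/T or a one-base deletion).
    eds = [["AC"], ["G", "T", ""], ["TA"]]
    print(ed_contains(eds, "CGT"))  # occurs in ACGTA
    print(ed_contains(eds, "CTA"))  # occurs in ACTA
    print(ed_contains(eds, "CGA"))  # occurs in none of the three
```

Because small variants sit inside the segments, most of the matching work stays on ordinary strings, which is the property the abstract highlights; how the actual tool linearises and indexes the structure is not shown here.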

RevDate: 2023-05-11

Riborg A, Gulla S, Fiskebeck EZ, et al (2023)

Pan-genome survey of the fish pathogen Yersinia ruckeri links accessory- and amplified genes to virulence.

PloS one, 18(5):e0285257 pii:PONE-D-23-00078.

While both virulent and putatively avirulent Yersinia ruckeri strains exist in aquaculture environments, the relationship between the distribution of virulence-associated factors and de facto pathogenicity in fish remains poorly understood. Pan-genome analysis of 18 complete genomes, representing established virulent and putatively avirulent lineages of Y. ruckeri, revealed the presence of a number of accessory genetic determinants. Further investigation of 68 draft genome assemblies revealed that the distribution of certain putative virulence factors correlated well with virulence and host-specificity. The inverse-autotransporter invasin locus yrIlm was, however, the only gene present in all virulent strains, while absent in lineages regarded as avirulent. Strains known to be associated with significant mortalities in salmonid aquaculture display a combination of serotype O1-LPS and yrIlm, with the well-documented highly virulent lineages, represented by MLVA clonal complexes 1 and 2, displaying duplication of the yrIlm locus. Duplication of the yrIlm locus was further found to have evolved over time in clonal complex 1, where some modern, highly virulent isolates display up to three copies.

RevDate: 2023-05-10

Liao WW, Asri M, Ebler J, et al (2023)

A draft human pangenome reference.

Nature, 617(7960):312-324.

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals[1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

RevDate: 2023-05-10

Guarracino A, Buonaiuto S, de Lima LG, et al (2023)

Recombination between heterologous human acrocentric chromosomes.

Nature, 617(7960):335-343.

The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications[1,2]. Although the resolution of these regions in the first complete assembly of a human genome, the Telomere-to-Telomere Consortium's CHM13 assembly (T2T-CHM13), provided a model of their homology[3], it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain pseudo-homologous regions (PHRs) indicative of recombination between non-homologous sequences. Utilizing an all-to-all comparison of the human pangenome from the Human Pangenome Reference Consortium[4] (HPRC), we find that contigs from all of the SAACs form a community. A variation graph[5] constructed from centromere-spanning acrocentric contigs indicates the presence of regions in which most contigs appear nearly identical between heterologous acrocentric chromosomes in T2T-CHM13. Except on chromosome 15, we observe faster decay of linkage disequilibrium in the pseudo-homologous regions than in the corresponding short and long arms, indicating higher rates of recombination[6,7]. The pseudo-homologous regions include sequences that have previously been shown to lie at the breakpoint of Robertsonian translocations[8], and their arrangement is compatible with crossover in inverted duplications on chromosomes 13, 14 and 21. The ubiquity of signals of recombination between heterologous acrocentric chromosomes seen in the HPRC draft pangenome suggests that these shared sequences form the basis for recurrent Robertsonian translocations, providing sequence and population-based confirmation of hypotheses first developed from cytogenetic studies 50 years ago[9].

RevDate: 2023-05-10

Vollger MR, Dishuck PC, Harvey WT, et al (2023)

Increased mutation and gene conversion within human segmental duplications.

Nature, 617(7960):325-334.

Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data[1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions[3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences[5,6].

RevDate: 2023-05-10

Massarat A, Gymrek M, McStay B, et al (2023)

Human pangenome supports analysis of complex genomic regions.

Nature, 617(7960):256-258.

RevDate: 2023-05-10

Liverpool L (2023)

First human 'pangenome' aims to catalogue genetic diversity.

RevDate: 2023-05-10

Petrić Howe N, S Bundell (2023)

'Pangenome' aims to capture the breadth of human diversity.

RevDate: 2023-05-10

Hickey G, Monlong J, Ebler J, et al (2023)

Pangenome graph construction from genome alignments with Minigraph-Cactus.

Nature biotechnology [Epub ahead of print].

Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.

RevDate: 2023-05-10

Porubsky D, Vollger MR, Harvey WT, et al (2023)

Gaps and complex structurally variant loci in phased genome assemblies.

Genome research pii:gr.277334.122 [Epub ahead of print].

There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.

RevDate: 2023-05-08

Castillo AI, RPP Almeida (2023)

The Multifaceted Role of Homologous Recombination in a Fastidious Bacterial Plant Pathogen.

Applied and environmental microbiology [Epub ahead of print].

Homologous recombination plays a key function in the evolution of bacterial genomes. Within Xylella fastidiosa, an emerging plant pathogen with increasing host and geographic ranges, it has been suggested that homologous recombination facilitates host switching, speciation, and the development of virulence. We used 340 whole-genome sequences to study the relationship between inter- and intrasubspecific homologous recombination, random mutation, and natural selection across individual X. fastidiosa genes. Individual gene orthologs were identified and aligned, and a maximum likelihood (ML) gene tree was generated. Each gene alignment and tree pair were then used to calculate gene-wide and branch-specific r/m values (relative effect of recombination to mutation), gene-wide and branch-site nonsynonymous over synonymous substitution rates (dN/dS values; episodic selection), and branch length (as a proxy for mutation rate). The relationships between these variables were evaluated at the global level (i.e., for all genes among and within a subspecies), among specific functional classes (i.e., COGs), and between pangenome components (i.e., accessory versus core genes). Our analysis showed that r/m varied widely among genes as well as across X. fastidiosa subspecies. While r/m and dN/dS values were positively correlated in some instances (e.g., core genes in X. fastidiosa subsp. fastidiosa and both core and accessory genes in X. fastidiosa subsp. multiplex), low correlation coefficients suggested no clear biological significance. Overall, our results indicate that, in addition to its adaptive role in certain genes, homologous recombination acts as a homogenizing and a neutral force across phylogenetic clades, gene functional groups, and pangenome components. IMPORTANCE There is ample evidence that homologous recombination occurs frequently in the economically important plant pathogen Xylella fastidiosa. 
Homologous recombination has been known to occur among sympatric subspecies and is associated with host-switching events and virulence-linked genes. As a consequence, it is generally assumed that recombinant events in X. fastidiosa are adaptive. This mindset influences expectations of how homologous recombination acts as an evolutionary force as well as how management strategies for X. fastidiosa diseases are determined. Yet, homologous recombination plays roles beyond that of a source for diversification and adaptation. Homologous recombination can act as a DNA repair mechanism, as a means to facilitate nucleotide compositional change, as a homogenization mechanism within populations, or even as a neutral force. Here, we provide a first assessment of long-held beliefs regarding the general role of recombination in adaptation for X. fastidiosa. We evaluate gene-specific variations in homologous recombination rate across three X. fastidiosa subspecies and its relationship to other evolutionary forces (e.g., natural selection, mutation, etc.). These data were used to assess the role of homologous recombination in X. fastidiosa evolution.

RevDate: 2023-05-08

Saroha T, Patil PP, Rana R, et al (2023)

Genomic features, antimicrobial susceptibility, and epidemiological insights into Burkholderia cenocepacia clonal complex 31 isolates from bloodstream infections in India.

Frontiers in cellular and infection microbiology, 13:1151594.

INTRODUCTION: Burkholderia cepacia complex (Bcc) clonal complex (CC) 31, the predominant lineage causing devastating outbreaks globally, has been a growing concern of infections in non-cystic fibrosis (NCF) patients in India. B. cenocepacia is very challenging to treat owing to its virulence determinants and antibiotic resistance. Improving the management of these infections requires a better knowledge of their resistance patterns and mechanisms.

METHODS: Whole-genome sequences of 35 CC31 isolates obtained from patient samples were analyzed against the 210 CC31 genomes available in the NCBI database to glean details of resistance, virulence, mobile elements, and phylogenetic markers and to study the genomic diversity and evolution of the CC31 lineage in India.

RESULTS: Genomic analysis revealed that the 35 isolates belonging to CC31 were categorized into 11 sequence types (ST), of which five STs were reported exclusively from India. Phylogenetic analysis classified 245 CC31 isolates into eight distinct clades (I-VIII) and unveiled that NCF isolates are evolving independently from the global cystic fibrosis (CF) isolates, forming a distinct clade. The detection rate of seven classes of antibiotic-related genes in 35 isolates was 35 (100%) for tetracyclines, aminoglycosides, and fluoroquinolones; 26 (74.2%) for sulphonamides and phenicols; 7 (20%) for beta-lactamases; and 1 (2.8%) for trimethoprim resistance genes. Additionally, 3 (8.5%) NCF isolates were resistant to disinfecting agents and antiseptics. Antimicrobial susceptibility testing revealed that the majority of NCF isolates were resistant to chloramphenicol (77%) and levofloxacin (34%). NCF isolates have a comparable number of virulence genes to CF isolates. A well-studied pathogenicity island of B. cenocepacia, GI11, is present in ST628 and ST709 isolates from the Indian Bcc population. In contrast, genomic island GI15 (highly similar to the island found in B. pseudomallei strain EY1) is exclusively reported in ST839 and ST824 isolates from two different locations in India. Horizontal acquisition of lytic phage ST79 of pathogenic B. pseudomallei is demonstrated in ST628 isolates Bcc1463, Bcc29163, and BccR4654 within the CC31 lineage.

DISCUSSION: The study reveals a high diversity of CC31 lineages among B. cenocepacia isolates from India. The extensive information from this study will facilitate the development of rapid diagnostic and novel therapeutic approaches to manage B. cenocepacia infections.

RevDate: 2023-05-08

Aziz T, Naveed M, Jabeen K, et al (2023)

Integrated genome based evaluation of safety and probiotic characteristics of Lactiplantibacillus plantarum YW11 isolated from Tibetan kefir.

Frontiers in microbiology, 14:1157615.

The comparative genomic analysis of Lactiplantibacillus plantarum YW11 (L. plantarum YW11) isolated from Tibetan kefir involves comparison of the complete genome sequences of the isolated strain with other closely related L. plantarum strains. This type of analysis can be used to identify the genetic diversity among strains and to explore the genetic characteristics of the YW11 strain. The genome of L. plantarum YW11 was found to be composed of a circular single chromosome of 4,597,470 bp with a G + C content of 43.2%. A total of 4,278 open reading frames (ORFs) were identified in the genome and the coding density was found to be 87.8%. A comparative genomic analysis was conducted using two other L. plantarum strains, L. plantarum C11 and L. plantarum LMG21703. Genomic comparison revealed that L. plantarum YW11 shared 72.7 and 75.2% of gene content with L. plantarum C11 and L. plantarum LMG21703, respectively. Most of the genes shared between the three L. plantarum strains were involved in carbohydrate metabolism, energy production and conversion, amino acid metabolism, and transcription. In this analysis, 10 previously sequenced entire genomes of the species were compared using an in-silico technique to discover genomic divergence in genes linked with carbohydrate intake and their potential adaptations to distinct human intestinal environments. The subspecies pan-genome was open, which correlated with its extraordinary capacity to colonize several environments. Phylogenetic analysis revealed that the novel genomes were homogenously grouped among subspecies of Lactiplantibacillus. L. plantarum was resistant to cefoxitin, erythromycin, and metronidazole, inhibited pathogens including Listeria monocytogenes, Clostridium difficile, Vibrio cholerae, and others, and had excellent aerotolerance, which is useful for industrial operations. The comparative genomic analysis of L. plantarum YW11 isolated from Tibetan kefir can provide insights into the genetic characteristics of the strain, which can be used to further understand its role in the production of kefir.

RevDate: 2023-05-05

Mun T, Vaddadi NSK, B Langmead (2023)

Pangenomic genotyping with the marker array.

Algorithms for molecular biology : AMB, 18(1):2.

We present a new method and software tool called rowbowt that applies a pangenome index to the problem of inferring genotypes from short-read sequencing data. The method uses a novel indexing structure called the marker array. Using the marker array, we can genotype variants with respect to large panels like the 1000 Genomes Project while reducing the reference bias that results when aligning to a single linear reference. rowbowt can infer accurate genotypes in less time and memory compared to existing graph-based methods. The method is implemented in the open source software tool rowbowt available at .

RevDate: 2023-05-05

Basharat Z, A Meshal (2023)

Pan-genome mediated therapeutic target mining in Kingella kingae and inhibition assessment using traditional Chinese medicinal compounds: an informatics approach.

Journal of biomolecular structure & dynamics [Epub ahead of print].

Kingella kingae causes bacteremia, endocarditis, osteomyelitis, septic arthritis, meningitis, spondylodiscitis, and lower respiratory tract infections in pediatric patients. Usually it demonstrates disease after inflammation of mouth, lips or infections of the upper respiratory tract. To date, therapeutic targets in this bacterium remain unexplored. We have utilized a battery of bioinformatics tools to mine these targets in this study. Core genes were initially inferred from 55 genomes of K. kingae and 39 therapeutic targets were mined using an in-house pipeline. We selected aroG product (KDPG aldolase) involved in chorismate pathway, for inhibition analysis of this bacterium using lead-like metabolites from traditional Chinese medicinal plants. Pharmacophore generation was done using control ZINC36444158 (1,16-bis[(dihydroxyphosphinyl)oxy]hexadecane), followed by molecular docking of top hits from a library of 36,000 compounds. Top prioritized compounds were ZINC95914016, ZINC33833283 and ZINC95914219. ADME profiling and simulation of compound dosing (100 mg tablet) was done to infer compartmental pharmacokinetics in a population of 300 individuals in fasting state. PkCSM based toxicity analysis revealed the compounds ZINC95914016 and ZINC95914219 as safe and with almost similar bioavailability. However, ZINC95914016 takes less time to reach maximum concentration in the plasma and shows several optimal parameters compared to other leads. In light of obtained data, we recommend this compound for further testing and induction in experimental drug design pipeline. Communicated by Ramaswamy H. Sarma.

RevDate: 2023-05-04

Gong Y, Li Y, Liu X, et al (2023)

A review of the pangenome: how it affects our understanding of genomic variation, selection and breeding in domestic animals?

Journal of animal science and biotechnology, 14(1):73.

As large-scale genomic studies have progressed, it has been revealed that a single reference genome pattern cannot represent genetic diversity at the species level. Domestic animals tend to have complex routes of origin and migration, suggesting a possible omission of some population-specific sequences in the current reference genome. Conversely, the pangenome is a collection of all DNA sequences of a species that contains sequences shared by all individuals (core genome) and is also able to display sequence information unique to each individual (variable genome). The progress of pangenome research in humans, plants and domestic animals has proved that the missing genetic components and the identification of large structural variants (SVs) can be explored through pangenomic studies. Many individual specific sequences have been shown to be related to biological adaptability, phenotype and important economic traits. The maturity of technologies and methods such as third-generation sequencing, telomere-to-telomere genomes, graphic genomes, and reference-free assembly will further promote the development of pangenome research. In the future, pangenome combined with long-read data and multi-omics will help to resolve large SVs and their relationship with the main economic traits of interest in domesticated animals, providing better insights into animal domestication, evolution and breeding. In this review, we mainly discuss how pangenome analysis reveals genetic variations in domestic animals (sheep, cattle, pigs, chickens) and their impacts on phenotypes and how this can contribute to the understanding of species diversity. Additionally, we also go through potential issues and the future perspectives of pangenome research in livestock and poultry.

RevDate: 2023-05-04

Sorouri B, Rodriguez CI, Gaut BS, et al (2023)

Variation in Sphingomonas traits across habitats and phylogenetic clades.

Frontiers in microbiology, 14:1146165.

Whether microbes show habitat preferences is a fundamental question in microbial ecology. If different microbial lineages have distinct traits, those lineages may occur more frequently in habitats where their traits are advantageous. Sphingomonas is an ideal bacterial clade in which to investigate how habitat preference relates to traits because these bacteria inhabit diverse environments and hosts. Here we downloaded 440 publicly available Sphingomonas genomes, assigned them to habitats based on isolation source, and examined their phylogenetic relationships. We sought to address whether: (1) there is a relationship between Sphingomonas habitat and phylogeny, and (2) whether there is a phylogenetic correlation between key, genome-based traits and habitat preference. We hypothesized that Sphingomonas strains from similar habitats would cluster together in phylogenetic clades, and key traits that improve fitness in specific environments should correlate with habitat. Genome-based traits were categorized into the Y-A-S trait-based framework for high growth yield, resource acquisition, and stress tolerance. We selected 252 high quality genomes and constructed a phylogenetic tree with 12 well-defined clades based on an alignment of 404 core genes. Sphingomonas strains from the same habitat clustered together within the same clades, and strains within clades shared similar clusters of accessory genes. Additionally, key genome-based trait frequencies varied across habitats. We conclude that Sphingomonas gene content reflects habitat preference. This knowledge of how environment and host relate to phylogeny may also help with future functional predictions about Sphingomonas and facilitate applications in bioremediation.

RevDate: 2023-05-04

Zhou Y, Jiang D, Yao X, et al (2023)

Pan-genome wide association study of Glaesserella parasuis highlights genes associated with virulence and biofilm formation.

Frontiers in microbiology, 14:1160433.

Glaesserella parasuis is a gram-negative bacterium that causes fibrotic polyserositis and arthritis in pigs, significantly affecting the pig industry. The pan-genome of G. parasuis is open. As the number of genes increases, the core and accessory genomes may show more pronounced differences. The genes associated with virulence and biofilm formation are also still unclear due to the diversity of G. parasuis. Therefore, we have applied a pan-genome-wide association study (Pan-GWAS) to 121 strains of G. parasuis. Our analysis revealed that the core genome consists of 1,133 genes associated with the cytoskeleton, virulence, and basic biological processes. The accessory genome is highly variable and is a major cause of genetic diversity in G. parasuis. Furthermore, two biologically important traits (virulence, biofilm formation) of G. parasuis were studied via Pan-GWAS to search for genes associated with the traits. A total of 142 genes were associated with strong virulence traits. By affecting metabolic pathways and capturing the host nutrients, these genes are involved in signal pathways and virulence factors, which are beneficial for bacterial survival and biofilm formation. This research lays the foundation for further studies on virulence and biofilm formation and provides potential new drug and vaccine targets against G. parasuis.

RevDate: 2023-05-04

Zhao Y, Wei HM, Yuan JL, et al (2023)

A comprehensive genomic analysis provides insights on the high environmental adaptability of Acinetobacter strains.

Frontiers in microbiology, 14:1177951.

Acinetobacter is ubiquitous, and it has a high species diversity and a complex evolutionary pattern. To elucidate the mechanism of its high ability to adapt to various environments, 312 genomes of Acinetobacter strains were analyzed using the phylogenomic and comparative genomics methods. It was revealed that the Acinetobacter genus has an open pan-genome and strong genome plasticity. The pan-genome consists of 47,500 genes, with 818 shared by all the genomes of Acinetobacter, while 22,291 are unique genes. Although Acinetobacter strains do not have a complete glycolytic pathway to directly utilize glucose as carbon source, most of them harbored the n-alkane-degrading genes alkB/alkM (97.1% of tested strains) and almA (96.7% of tested strains), which were responsible for medium- and long-chain n-alkane terminal oxidation reaction, respectively. Most Acinetobacter strains also have catA (93.3% of tested strains) and benAB (92.0% of tested strains) genes that can degrade the aromatic compounds catechol and benzoic acid, respectively. These abilities enable the Acinetobacter strains to easily obtain carbon and energy sources from their environment for survival. The Acinetobacter strains can manage osmotic pressure by accumulating potassium and compatible solutes, including betaine, mannitol, trehalose, glutamic acid, and proline. They respond to oxidative stress by synthesizing superoxide dismutase, catalase, disulfide isomerase, and methionine sulfoxide reductase that repair the damage caused by reactive oxygen species. In addition, most Acinetobacter strains contain many efflux pump genes and resistance genes to manage antibiotic stress and can synthesize a variety of secondary metabolites, including arylpolyene, β-lactone and siderophores among others, to adapt to their environment. These genes enable Acinetobacter strains to survive extreme stresses. 
The genome of each Acinetobacter strain contained different numbers of prophages (0-12) and genomic islands (GIs) (6-70), and genes related to antibiotic resistance were found in the GIs. The phylogenetic analysis revealed that the alkM and almA genes have a similar evolutionary position with the core genome, indicating that they may have been acquired by vertical gene transfer from their ancestor, while catA, benA, benB and the antibiotic resistance genes could have been acquired by horizontal gene transfer from the other organisms.
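
The split into shared, partly shared, and strain-specific genes that pan-genome abstracts such as the one above report can be illustrated with a minimal Python sketch. It assumes gene families have already been clustered and that each genome is given as a set of family names; the function name `partition_pangenome`, the strain labels, and the toy gene sets are hypothetical and are not taken from any of the cited studies.

```python
from typing import Dict, Set


def partition_pangenome(genomes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into pan, core, accessory and unique sets.

    Assumes at least two genomes; `genomes` maps a strain name to the
    set of gene families detected in that strain.
    """
    # Count in how many genomes each gene family occurs.
    counts: Dict[str, int] = {}
    for genes in genomes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    n = len(genomes)
    core = {g for g, c in counts.items() if c == n}             # present in every genome
    unique = {g for g, c in counts.items() if c == 1 and n > 1}  # present in exactly one genome
    accessory = set(counts) - core - unique                     # present in some, but not all
    return {"pan": set(counts), "core": core,
            "accessory": accessory, "unique": unique}


# Hypothetical toy input: three strains with made-up gene content.
example = {
    "strain_A": {"gyrB", "recA", "alkB", "catA"},
    "strain_B": {"gyrB", "recA", "alkB"},
    "strain_C": {"gyrB", "recA", "benA"},
}
parts = partition_pangenome(example)
for name in ("pan", "core", "accessory", "unique"):
    print(name, sorted(parts[name]))
```

Published analyses typically rely on dedicated pan-genome software and often use softer thresholds (for example, treating genes present in nearly all genomes as core); the strict all-or-one rule here is only the simplest possible convention.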

RevDate: 2023-05-04

Oddy J, Chhetry M, Awal R, et al (2023)

Genetic control of grain amino acid composition in a UK soft wheat mapping population.

The plant genome [Epub ahead of print].

Wheat (Triticum aestivum L.) is a major source of nutrients for populations across the globe, but the amino acid composition of wheat grain does not provide optimal nutrition. The nutritional value of wheat grain is limited by low concentrations of lysine (the most limiting essential amino acid) and high concentrations of free asparagine (precursor to the processing contaminant acrylamide). There are currently few available solutions for asparagine reduction and lysine biofortification through breeding. In this study, we investigated the genetic architecture controlling grain free amino acid composition and its relationship to other traits in a Robigus × Claire doubled haploid population. Multivariate analysis of amino acids and other traits showed that the two groups are largely independent of one another, with the largest effect on amino acids being from the environment. Linkage analysis of the population allowed identification of quantitative trait loci (QTL) controlling free amino acids and other traits, and this was compared against genomic prediction methods. Following identification of a QTL controlling free lysine content, wheat pangenome resources facilitated analysis of candidate genes in this region of the genome. These findings can be used to select appropriate strategies for lysine biofortification and free asparagine reduction in wheat breeding programs.

RevDate: 2023-05-04

Derbyshire MC, Marsh J, Tirnaz S, et al (2023)

Diversity of fatty acid biosynthesis genes across the soybean pangenome.

The plant genome [Epub ahead of print].

Soybean (Glycine max) is a major crop that contributes more than half of global oilseed production. Much research has been directed towards improvement of the fatty acid profile of soybean seeds through marker assisted breeding. Recently published soybean pangenomes, based on thousands of soybean lines, provide an opportunity to identify new alleles that may be involved in fatty acid biosynthesis. In this study, we identify fatty acid biosynthesis genes in soybean pangenomes based on sequence identity with known genes and examine their sequence diversity across diverse soybean collections. We find three possible instances of a gene missing in wild soybean, including FAD8 and FAD2-2D, which may be involved in oleic and linoleic acid desaturation, respectively, although we recommend follow-up research to verify the absence of these genes. More than half of the 53 fatty acid biosynthesis genes identified contained missense variants, including one linked with a previously identified QTL for oil quality. These variants were present in multiple studies based on either short read mappings or alignment of reference grade genomes. Missense variants were found in previously characterized genes including FAD2-1A and FAD2-1B, both of which are involved in desaturation of oleic acid, as well as uncharacterized candidate fatty acid biosynthesis genes. We find that the frequency of missense alleles in fatty acid biosynthesis genes has been reduced significantly more than the global average frequency of missense mutations during domestication, and missense variation in some genes is near absent in modern cultivars. This could be due to the selection for fatty acid profiles in seed, though future work should be conducted towards understanding the phenotypic impacts of these variants.

RevDate: 2023-05-02

Maki JJ, Howard M, Connelly S, et al (2023)

Species Delineation and Comparative Genomics within the Campylobacter ureolyticus Complex.

Journal of clinical microbiology [Epub ahead of print].

Campylobacter ureolyticus is an emerging pathogen increasingly appreciated as a common cause of gastroenteritis and extra-intestinal infections in humans. Outside the setting of gastroenteritis, little work has been done to describe the genomic content and relatedness of the species, especially regarding clinical isolates. We reviewed the epidemiology of clinical C. ureolyticus cultured by our institution over the past 10 years. Fifty-one unique C. ureolyticus isolates were identified between January 2010 and August 2022, mostly originating from abscesses and blood cultures. To clarify the taxonomic relationships between isolates and to attribute specific genes with different clinical manifestations, we sequenced 19 available isolates from a variety of clinical specimen types and conducted a pangenomic analysis with publicly available C. ureolyticus genomes. Digital DNA:DNA hybridization suggested that these C. ureolyticus comprised a species complex of 10 species clusters (SCs) and several subspecies clusters. Although some orthologous genes or gene functions were enriched in isolates found in different SCs and clinical specimens, no association was significant. Nearly a third of the isolates possessed antimicrobial resistance genes, including the ermA resistance gene, potentially conferring resistance to macrolides, the treatment of choice for severe human campylobacteriosis. This work effectively doubles the number of publicly available C. ureolyticus genomes, provides further clarification of taxonomic relationships within this bacterial complex, and identifies target SCs for future analysis.

RevDate: 2023-05-01

Weller CA, Andreev I, Chambers MJ, et al (2023)

Highly complete long-read genomes reveal pangenomic variation underlying yeast phenotypic diversity.

Genome research pii:gr.277515.122 [Epub ahead of print].

Understanding the genetic causes of trait variation is a primary goal of genetic research. One way that individuals can vary genetically is through variable pangenomic genes - genes that are only present in some individuals in a population. The presence or absence of entire genes could have large effects on trait variation. However, variable pangenomic genes can be missed in standard genotyping workflows, due to reliance on aligning short-read sequencing to reference genomes. A popular method for studying the genetic basis of trait variation is linkage mapping, which identifies quantitative trait loci (QTLs), regions of the genome that harbor causative genetic variants. Large-scale linkage mapping in the budding yeast Saccharomyces cerevisiae has found thousands of QTLs affecting myriad yeast phenotypes. To enable the resolution of QTLs caused by variable pangenomic genes, we used long-read sequencing to generate highly complete de novo assemblies of 16 diverse yeast isolates. With these assemblies we resolved QTLs for growth on maltose, sucrose, raffinose, and oxidative stress to specific genes that are absent from the reference genome but present in the broader yeast population at appreciable frequency. Copies of genes also duplicate onto chromosomes where they are absent in the reference genome, and we found that these copies generate additional QTLs whose resolution requires pangenome characterization. Our findings demonstrate the need for highly complete genome assemblies to identify the genetic basis of trait variation.

RevDate: 2023-05-01

Saxena P, Rauniyar S, Thakur P, et al (2023)

Integration of text mining and biological network analysis: Identification of essential genes in sulfate-reducing bacteria.

Frontiers in microbiology, 14:1086021.

The growth and survival of an organism in a particular environment depends heavily on certain indispensable genes, termed essential genes. Sulfate-reducing bacteria (SRB) are obligate anaerobes that thrive on sulfate reduction for their energy requirements. The present study used Oleidesulfovibrio alaskensis G20 (OA G20) as a model SRB to categorize the essential genes based on their key metabolic pathways. Herein, we reported a feedback loop framework for gene of interest discovery, from bio-problem to gene set of interest, leveraging expert annotation with computational prediction. Defined bio-problem was applied to retrieve the genes of SRB from literature databases (PubMed and PubMed Central) and annotated them to the genome of OA G20. Retrieved gene list was further used to enrich protein-protein interaction and was corroborated to the pangenome analysis, to categorize the enriched gene sets and the respective pathways under essential and non-essential. Interestingly, the sat gene (dde_2265) from the sulfur metabolism was the bridging gene between all the enriched pathways. Gene clusters involved in essential pathways were linked with the genes from seleno-compound metabolism, amino acid metabolism, secondary metabolite synthesis, and cofactor biosynthesis. Furthermore, pangenome analysis demonstrated the gene distribution, where 69.83% of the 116 enriched genes were mapped under "persistent," inferring the essentiality of these genes. Likewise, 21.55% of the enriched genes, which include in particular the formate dehydrogenases and metallic hydrogenases, appeared under "shell." Our methodology suggested that semi-automated text mining and network analysis may play a crucial role in deciphering the previously unexplored genes and key mechanisms which can help to generate a baseline prior to performing any experimental studies.
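
The percentages quoted in the abstract above can be converted back into gene counts with a short worked calculation. The Python sketch below is illustrative only: the helper `count_from_percent` is hypothetical, and the only inputs taken from the abstract are the total of 116 enriched genes and the two reported shares (69.83% "persistent", 21.55% "shell"). The abstract does not say how the remaining genes were classified, so the sketch reports them only as a remainder.

```python
def count_from_percent(percent: float, total: int) -> int:
    """Return the integer count whose share of `total` rounds to `percent`."""
    count = round(percent / 100 * total)
    # Sanity check: the recovered count must reproduce the reported
    # percentage to two decimal places.
    if abs(100 * count / total - percent) >= 0.005:
        raise ValueError(f"{percent}% of {total} is not a whole number of genes")
    return count


TOTAL_ENRICHED = 116  # total enriched genes, as stated in the abstract

persistent = count_from_percent(69.83, TOTAL_ENRICHED)
shell = count_from_percent(21.55, TOTAL_ENRICHED)
remaining = TOTAL_ENRICHED - persistent - shell  # not broken down in the abstract

print(persistent, shell, remaining)  # prints "81 25 10"
```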

RevDate: 2023-04-30

Porubsky D, Harvey WT, Rozanski AN, et al (2023)

Inversion polymorphism in a complete human genome assembly.

Genome biology, 24(1):100.

The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared it to the GRCh38 reference. We find a ~ 21% increase in sensitivity improving mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomes 15q25.2, 16p11.2, 16q22.1-23.1, and 22q11.21.

RevDate: 2023-04-28

Jacob JJ, Pragasam AK, Vasudevan K, et al (2023)

Genomic analysis unveils genome degradation events and gene flux in the emergence and persistence of S. Paratyphi A lineages.

PLoS pathogens, 19(4):e1010650 pii:PPATHOGENS-D-22-01006 [Epub ahead of print].

Paratyphoid fever caused by S. Paratyphi A is endemic in parts of South Asia and Southeast Asia. The proportion of enteric fever cases caused by S. Paratyphi A has substantially increased, yet only limited data is available on the population structure and genetic diversity of this serovar. We examined the phylogenetic distribution and evolutionary trajectory of S. Paratyphi A isolates collected as part of the Indian enteric fever surveillance study "Surveillance of Enteric Fever in India (SEFI)." In the study period (2017-2020), S. Paratyphi A comprised 17.6% (441/2503) of total enteric fever cases in India, with the isolates highly susceptible to all the major antibiotics used for treatment except fluoroquinolones. Phylogenetic analysis clustered the global S. Paratyphi A collection into seven lineages (A-G), and the present study isolates were distributed in lineages A, C and F. Our analysis highlights that the genome degradation events and gene acquisitions or losses are key molecular events in the evolution of new S. Paratyphi A lineages/sub-lineages. A total of 10 hypothetically disrupted coding sequences (HDCS) or pseudogenes-forming mutations possibly associated with the emergence of lineages were identified. The pan-genome analysis identified the insertion of P2/PSP3 phage and acquisition of IncX1 plasmid during the selection in 2.3.2/2.3.3 and 1.2.2 genotypes, respectively. We have identified six characteristic missense mutations associated with lipopolysaccharide (LPS) biosynthesis genes of S. Paratyphi A, however, these mutations confer only a low structural impact and possibly have minimal impact on vaccine effectiveness. Since S. Paratyphi A is human-restricted, high levels of genetic drift are not expected unless these bacteria transmit to naive hosts. However, public-health investigation and monitoring by means of genomic surveillance would be constantly needed to avoid S. Paratyphi A serovar becoming a public health threat similar to the S. Typhi of today.

RevDate: 2023-04-28

Ariute JC, Felice AG, Soares S, et al (2023)

Characterization and Association of Rips Repertoire to Host Range of Novel Ralstonia solanacearum Strains by In Silico Approaches.

Microorganisms, 11(4): pii:microorganisms11040954.

Ralstonia solanacearum species complex (RSSC) causes several phytobacterioses in many economically important crops around the globe, especially in the tropics. In Brazil, phylotypes I and II cause bacterial wilt (BW) and are indistinguishable by classical microbiological and phytopathological methods, while Moko disease is caused only by phylotype II strains. Type III effectors of RSSC (Rips) are key molecular actors regarding pathogenesis and are associated with specificity to some hosts. In this study, we sequenced and characterized 14 new RSSC isolates from Brazil's Northern and Northeastern regions, including BW and Moko ecotypes. Virulence and resistance sequences were annotated, and the Rips repertoire was predicted. Confirming previous studies, the RSSC pangenome is open, with α≅0.77. Genomic information regarding these isolates matches those for R. solanacearum in NCBI. All of them fit in phylotype II with a similarity above 96%, with five isolates in phylotype IIB and nine in phylotype IIA. Almost all R. solanacearum genomes in NCBI are actually from other species in RSSC. The Rips repertoire of Moko IIB was more homogeneous, except for isolate B4, which presented ten non-shared Rips. The Rips repertoire of phylotype IIA was more diverse in both Moko and BW, with 43 common shared Rips among all 14 isolates. New BW isolates shared more Rips with Moko IIA and Moko IIB than with other public BW genome isolates from Brazil. Rips not shared with other isolates might contribute to individual virulence, but commonly shared Rips are good avirulence candidates. The high number of Rips shared by new Moko and BW isolates suggests they are actually Moko isolates infecting solanaceous hosts. Finally, infection assays and Rips expression on different hosts are needed to better elucidate the association between Rips repertoire and host specificities.
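
The abstract above reports an open pangenome with α≅0.77 but does not say how α was estimated. A minimal sketch, assuming the commonly used Heaps' law formulation in which the number of new genes found when the N-th genome is added follows n(N) = κ·N^(−α), and the pangenome is called open when α ≤ 1. The function names and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import math

def fit_heaps_alpha(points):
    """Fit n = kappa * N**(-alpha) by least squares in log-log space.

    points: iterable of (N, n) pairs, where N is the number of genomes
    sampled and n is the number of new genes observed at that step.
    Returns (kappa, alpha). Assumes all N and n are positive.
    """
    xs = [math.log(N) for N, _ in points]
    ys = [math.log(n) for _, n in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = -slope                      # the power-law exponent is the negated slope
    kappa = math.exp(my - slope * mx)   # intercept gives log(kappa)
    return kappa, alpha

def is_open_pangenome(alpha):
    """Assumed convention: alpha <= 1 means open, alpha > 1 means closed."""
    return alpha <= 1.0

# Synthetic, noise-free data generated with alpha = 0.77 (illustrative only).
synthetic = [(N, 1000.0 * N ** -0.77) for N in range(2, 15)]
kappa, alpha = fit_heaps_alpha(synthetic)
```

Under this convention, the reported α≅0.77 falls below 1, which is consistent with the abstract's description of the pangenome as open.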

RevDate: 2023-04-27

Henaut-Jacobs S, Passarelli-Araujo H, TM Venancio (2023)

Comparative genomics and phylogenomics of Campylobacter unveil potential novel species and provide insights into niche segregation.

Molecular phylogenetics and evolution pii:S1055-7903(23)00086-6 [Epub ahead of print].

Campylobacter is a bacterial genus associated with community outbreaks and gastrointestinal symptoms. Studies on Campylobacter generally focus on specific pathogenic species such as C. coli and C. jejuni. Currently, there are thousands of publicly available Campylobacter genomes, allowing a more complete assessment of the genus diversity. In this work, we report a network-based analysis of all available Campylobacter genomes to explore the genus structure and diversity, revealing potentially new species and elucidating genus features. We also hypothesize that the previously established Clade III of C. coli is in fact a novel species (referred here as Campylobacter spp12). Finally, we found a negative correlation between pangenome fluidity and saturation coefficient, with potential implications to the lifestyles of distinct Campylobacter species. Since pangenome analysis depends on the number of available genomes, this correlation could help estimate pangenome metrics of Campylobacter species with less sequenced genomes, helping understand their lifestyle and niche adaptation. Together, our results indicate that the Campylobacter genus should be re-evaluated, with particular attention to the interplay between genome structure and niche segregation.
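
The abstract above refers to pangenome fluidity without defining it. A minimal sketch, assuming the widely used genomic fluidity measure: the average, over all pairs of genomes, of the number of gene families unique to either genome divided by the total number of gene families in the two genomes. The function name and the toy gene-family sets are hypothetical and are not drawn from the study.

```python
from itertools import combinations

def genomic_fluidity(genomes):
    """Mean pairwise ratio of unique gene families to total gene families.

    genomes: dict mapping genome name -> set of gene-family identifiers.
    For each pair (k, l): (U_k + U_l) / (M_k + M_l), where U is the number
    of families found in only one genome of the pair and M is the number
    of families in each genome.
    """
    pairs = list(combinations(genomes.values(), 2))
    if not pairs:
        raise ValueError("need at least two genomes")
    total = 0.0
    for a, b in pairs:
        unique = len(a - b) + len(b - a)
        total += unique / (len(a) + len(b))
    return total / len(pairs)

# Toy example with three genomes and four gene families (illustrative only).
toy = {
    "g1": {"famA", "famB", "famC"},
    "g2": {"famA", "famB", "famD"},
    "g3": {"famA", "famB", "famC"},
}
phi = genomic_fluidity(toy)
```

A value near 0 means the sampled genomes share almost all gene families, while larger values indicate more genome-to-genome variation in gene content.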

RevDate: 2023-04-27

Matussek A, Mernelius S, Chromek M, et al (2023)

Genome-wide association study of hemolytic uremic syndrome causing Shiga toxin-producing Escherichia coli from Sweden, 1994-2018.

European journal of clinical microbiology & infectious diseases : official publication of the European Society of Clinical Microbiology [Epub ahead of print].

Shiga toxin-producing Escherichia coli (STEC) infection can cause clinical manifestations ranging from diarrhea to potentially fatal hemolytic uremic syndrome (HUS). This study is aimed at identifying STEC genetic factors associated with the development of HUS in Sweden. A total of 238 STEC genomes from STEC-infected patients with and without HUS between 1994 and 2018 in Sweden were included in this study. Serotypes, Shiga toxin gene (stx) subtypes, and virulence genes were characterized in correlation to clinical symptoms (HUS and non-HUS), and pan-genome wide association study was performed. Sixty-five strains belonged to O157:H7, and 173 belonged to non-O157 serotypes. Our study revealed that strains of O157:H7 serotype especially clade 8 were most commonly found in patients with HUS in Sweden. stx2a and stx2a + stx2c subtypes were significantly associated with HUS. Other virulence factors associated with HUS mainly included intimin (eae) and its receptor (tir), adhesion factors, toxins, and secretion system proteins. Pangenome wide-association study identified numbers of accessory genes significantly overrepresented in HUS-STEC strains, including genes encoding outer membrane proteins, transcriptional regulators, phage-related proteins, and numerous genes related to hypothetical proteins. Whole-genome phylogeny and multiple correspondence analysis of pangenomes could not differentiate HUS-STEC from non-HUS-STEC strains. In O157:H7 cluster, strains from HUS patients clustered closely; however, no significant difference in virulence genes was found in O157 strains from patients with and without HUS. These results suggest that STEC strains from different phylogenetic backgrounds may independently acquire genes determining their pathogenicity and confirm that other non-bacterial factors and/or bacteria-host interaction may affect STEC pathogenesis.

RevDate: 2023-04-26

Rodrigues C, Lanza VF, Peixe L, et al (2023)

Phylogenomics of Globally Spread Clonal Groups 14 and 15 of Klebsiella pneumoniae.

Microbiology spectrum [Epub ahead of print].

Klebsiella pneumoniae sequence type 14 (ST14) and ST15 caused outbreaks of CTX-M-15 and/or carbapenemase producers worldwide, but their phylogeny and global dynamics remain unclear. We clarified the evolution of K. pneumoniae clonal group 14 (CG14) and CG15 by analyzing the capsular locus (KL), resistome, virulome, and plasmidome of public genomes (n = 481) and de novo sequences (n = 9) representing main sublineages circulating in Portugal. CG14 and CG15 evolved independently within 6 main subclades defined according to the KL and the accessory genome. The CG14 (n = 65) clade was structured in two large monophyletic subclades, CG14-I (KL2, 86%) and CG14-II (KL16, 14%), whose emergences were dated to 1932 and 1911, respectively. Genes encoding extended-spectrum β-lactamase (ESBL), AmpC, and/or carbapenemases were mostly observed in CG14-I (71% versus 22%). CG15 clade (n = 170) was segregated into subclades CG15-IA (KL19/KL106, 9%), CG15-IB (variable KL types, 6%), CG15-IIA (KL24, 43%) and CG15-IIB (KL112, 37%). Most CG15 genomes carried specific GyrA and ParC mutations and emerged from a common ancestor in 1989. CTX-M-15 was especially prevalent in CG15 (68% CG15 versus 38% CG14) and in CG15-IIB (92%). Plasmidome analysis revealed 27 predominant plasmid groups (PG), including particularly pervasive and recombinant F-type (n = 10), Col (n = 10), and new plasmid types. While blaCTX-M-15 was acquired multiple times by a high diversity of F-type mosaic plasmids, other antibiotic resistance genes (ARGs) were dispersed by IncL (blaOXA-48) or IncC (blaCMY/TEM-24) plasmids. We first demonstrate an independent evolutionary trajectory for CG15 and CG14 and how the acquisition of specific KL, quinolone-resistance determining region (QRDR) mutations (CG15), and ARGs in highly recombinant plasmids could have shaped the expansion and diversification of particular subclades (CG14-I and CG15-IIA/IIB). 
IMPORTANCE Klebsiella pneumoniae represents a major threat in the burden of antibiotic resistance (ABR). Available studies to explain the origin, the diversity, and the evolution of certain ABR K. pneumoniae populations have mainly been focused on a few clonal groups (CGs) using phylogenetic analysis of the core genome, the accessory genome being overlooked. Here, we provide unique insights into the phylogenetic evolution of CG14 and CG15, two poorly characterized CGs which have contributed to the global dissemination of genes responsible for resistance to first-line antibiotics such as β-lactams. Our results point out an independent evolution of these two CGs and highlight the existence of different subclades structured by the capsular type and the accessory genome. Moreover, the contribution of a turbulent flux of plasmids (especially multireplicon F type and Col) and adaptive traits (antibiotic resistance and metal tolerance genes) to the pangenome reflect the exposure and adaptation of K. pneumoniae under different selective pressures.

RevDate: 2023-04-26

Cui X, Hu M, Yao S, et al (2023)

BnaOmics: a comprehensive platform combining pan-genome and multi-omics data of Brassica napus.

Plant communications pii:S2590-3462(23)00120-7 [Epub ahead of print].

RevDate: 2023-04-25

Gong H, Huang X, Zhu W, et al (2023)

Pan-genome analysis of the Burkholderia gladioli PV. Cocovenenans reveal the extent of variation in the toxigenic gene cluster.

Food microbiology, 113:104249.

Burkholderia gladioli has been reported as the pathogen responsible for cases of foodborne illness in many countries. The poisonous bongkrekic acid (BA) produced by B. gladioli was linked to a gene cluster absent in non-pathogenic strains. The whole genome sequence of eight bacteria strains, which were screened from the collected 175 raw food and environmental samples, were assembled and analyzed to detect a significant association of 19 protein-coding genes with the pathogenic status. Except for the common BA synthesis-related gene, several other genes, including the toxin-antitoxin genes, were also absent in the non-pathogenic strains. The bacteria strains with the BA gene cluster were found to form a single cluster in the analysis of all B. gladioli genome assemblies for the variants in the gene cluster. Divergence of this cluster was detected in the analysis for both the flanking sequences and those of the whole genome level, which indicates its complex origin. Genome recombination was found to cause a precise sequence deletion in the gene cluster region, which was found to be predominant in the non-pathogenic strains indicating the possible effect of horizontal gene transfer. Our study provided new information and resources for understanding the evolution and divergence of the B. gladioli species.

RevDate: 2023-04-24

Baumdicker F, A Kupczok (2023)

Tackling the pangenome dilemma requires the concerted analysis of multiple population genetic processes.

Genome biology and evolution pii:7137407 [Epub ahead of print].

The pangenome is the set of all genes present in a prokaryotic population. Most pangenomes contain many accessory genes of low and intermediate frequencies. Different population genetics processes contribute to the shape of these pangenomes, namely selection and fitness-independent-processes such as gene transfer, gene loss, and migration. However, their relative importance is unknown and highly debated. Here we argue that the debate around prokaryotic pangenomes arose due to the imprecise application of population genetics models. Most importantly, two different processes of horizontal gene transfer act on prokaryotic populations, which are frequently confused, despite their fundamentally different behavior. Genes acquired from distantly related organisms (termed here acquiring gene transfer, AGT) is most comparable to mutation in nucleotide sequences. In contrast, gene gain within the population (termed here spreading gene transfer, SGT) has an effect on gene frequencies that is identical to the effect of positive selection on single genes. We thus show that selection and fitness-independent population genetic processes affecting pangenomes are indistinguishable at the level of single gene dynamics. Nevertheless, population genetics processes are fundamentally different when considering the joint distribution of all accessory genes across individuals of a population. We propose that, to understand to which degree the different processes shaped pangenome diversity, the development of comprehensive models and simulation tools is mandatory. Furthermore, we need to identify summary statistics and measurable features that can distinguish between the processes, where considering the joint distribution of accessory genes across individuals of a population will be particularly relevant.

RevDate: 2023-04-24

Zhong H, Zheng N, Wang J, et al (2023)

Isolation and pan-genome analysis of Enterobacter hormaechei Z129, a ureolytic bacterium, from the rumen of dairy cow.

Frontiers in microbiology, 14:1169973.

INTRODUCTION: Urea is an important non-protein nitrogen source for ruminants. In the rumen, ureolytic bacteria play critical roles in urea-nitrogen metabolism; however, only a few ureolytic strains have been isolated and genomically sequenced. The purpose of this study was to isolate a novel ureolytic bacterial strain from cattle rumen and characterize its genome and function.

METHODS: The ureolytic bacterium was isolated using an anaerobic medium with urea and phenol red as a screening indicator from the rumen fluid of dairy cattle. The genome of isolates was sequenced, assembled, annotated, and comparatively analyzed. The pan-genome analysis was performed using IPGA and the biochemical activity was also analyzed by test kits.

RESULTS: A gram-positive ureolytic strain was isolated. Its genome had a length of 4.52 Mbp and predicted genes of 4223. The 16S rRNA gene and genome GTDB-Tk taxonomic annotation showed that it was a novel strain of Enterobacter hormaechei, and it was named E. hormaechei Z129. The pan-genome analysis showed that Z129 had the highest identity to E. hormaechei ATCC 49162 with a genome average nucleotide identity of 98.69% and possessed 238 unique genes. Strain Z129 was the first E. hormaechei strain isolated from the rumen as we know. The functional annotation of the Z129 genome showed genes related to urea metabolism, including urea transport (urtA-urtE), nickel ion transport (ureJ, tonB, nixA, exbB, exbD, and rcnA), urease activation (ureA-ureG) and ammonia assimilation (gdhA, glnA, glnB, glnE, glnL, glsA, gltB, and gltD) were present. Genes involved in carbohydrate metabolism were also present, including starch hydrolysis (amyE), cellulose hydrolysis (celB and bglX), xylose transport (xylF-xylH) and glycolysis (pgi, pgk, fbaA, eno, pfkA, gap, pyk, gpmL). Biochemical activity analysis showed that Z129 was positive for alkaline phosphatase, leucine arylamidase, acid phosphatase, naphthol-AS-BI-phosphohydrolase, α-glucosidase, β-glucosidase, and pyrrolidone arylaminase, and had the ability to use D-ribose, L-arabinose, and D-lactose. Urea-nitrogen hydrolysis rate of Z129 reached 55.37% at 48 h of incubation.

DISCUSSION: Therefore, the isolated novel ureolytic strain E. hormaechei Z129 had diverse nitrogen and carbon metabolisms, and is a preferred model to study the urea hydrolysis mechanism in the rumen.

RevDate: 2023-04-21

Williams AN, Croxen MA, Demczuk WHB, et al (2023)

Genomic characterization of emerging invasive Streptococcus agalactiae serotype VIII in Alberta, Canada.

European journal of clinical microbiology & infectious diseases : official publication of the European Society of Clinical Microbiology [Epub ahead of print].

Invasive Group B Streptococcus (GBS) can infect pregnant women, neonates, and older adults. Invasive GBS serotype VIII is infrequent in Alberta; however, cases have increased in recent years. Here, genomic analysis was used to characterize fourteen adult invasive serotype VIII isolates from 2009 to 2021. Trends in descriptive clinical data and antimicrobial susceptibility results were evaluated for invasive serotype VIII isolates from Alberta. Isolate genomes were sequenced and subjected to molecular sequence typing, virulence and antimicrobial resistance gene identification, phylogenetic analysis, and pangenome determination. Multilocus sequencing typing identified eight ST42 (Clonal Complex; CC19), four ST1 (CC1), and two ST2 (CC1) profiles. Isolates were susceptible to penicillin, erythromycin, chloramphenicol, and clindamycin, apart from one isolate that displayed erythromycin and inducible clindamycin resistance. All isolates carried genes for peptide antibiotic resistance, three isolates for tetracycline resistance, and one for macrolide, lincosamide, and streptogramin resistance. All genomes carried targets currently being considered for protein-based vaccines (e.g., pili and/or Alpha family proteins). Overall, invasive GBS serotype VIII is emerging in Alberta, primarily due to ST42. Characterization and continued surveillance of serotype VIII will be important for outbreak prevention, informing vaccine development, and contributing to our understanding of the global epidemiology of this rare serotype.

RevDate: 2023-04-21

Gangurde SS, Xavier A, Naik YD, et al (2022)

Two decades of association mapping: Insights on disease resistance in major crops.

Frontiers in plant science, 13:1064059.

Climate change across the globe has an impact on the occurrence, prevalence, and severity of plant diseases. About 30% of yield losses in major crops are due to plant diseases; emerging diseases are likely to worsen the sustainable production in the coming years. Plant diseases have led to increased hunger and mass migration of human populations in the past, thus a serious threat to global food security. Equipping the modern varieties/hybrids with enhanced genetic resistance is the most economic, sustainable and environmentally friendly solution. Plant geneticists have done tremendous work in identifying stable resistance in primary genepools and many times other than primary genepools to breed resistant varieties in different major crops. Over the last two decades, the availability of crop and pathogen genomes due to advances in next generation sequencing technologies improved our understanding of trait genetics using different approaches. Genome-wide association studies have been effectively used to identify candidate genes and map loci associated with different diseases in crop plants. In this review, we highlight successful examples for the discovery of resistance genes to many important diseases. In addition, major developments in association studies, statistical models and bioinformatic tools that improve the power, resolution and the efficiency of identifying marker-trait associations. Overall this review provides comprehensive insights into the two decades of advances in GWAS studies and discusses the challenges and opportunities this research area provides for breeding resistant varieties.

RevDate: 2023-04-20

Pucker B, Irisarri I, de Vries J, et al (2022)

Plant genome sequence assembly in the era of long reads: Progress, challenges and future directions.

Quantitative plant biology, 3:e5.

Third-generation long-read sequencing is transforming plant genomics. Oxford Nanopore Technologies and Pacific Biosciences are offering competing long-read sequencing technologies and enable plant scientists to investigate even large and complex plant genomes. Sequencing projects can be conducted by single research groups and sequences of smaller plant genomes can be completed within days. This also resulted in an increased investigation of genomes from multiple species in large scale to address fundamental questions associated with the origin and evolution of land plants. Increased accessibility of sequencing devices and user-friendly software allows more researchers to get involved in genomics. Current challenges are accurately resolving diploid or polyploid genome sequences and better accounting for the intra-specific diversity by switching from the use of single reference genome sequences to a pangenome graph.

RevDate: 2023-04-19

Pugh HL, Connor C, Siasat P, et al (2023)

E. coli ST11 (O157:H7) does not encode a functional AcrF efflux pump.

Microbiology (Reading, England), 169(4):.

Escherichia coli is a facultative anaerobe found in a wide range of environments. Commonly described as the laboratory workhorse, E. coli is one of the best characterized bacterial species to date, however much of our understanding comes from studies involving the laboratory strain E. coli K-12. Resistance-nodulation-division efflux pumps are found in Gram-negative bacteria and can export a diverse range of substrates, including antibiotics. E. coli K-12 has six RND pumps; AcrB, AcrD, AcrF, CusA, MdtBC and MdtF, and it is frequently reported that all E. coli strains possess these six pumps. However, this is not true of E. coli ST11, a lineage of E. coli, which is primarily composed of the highly virulent important human pathogen, E. coli O157:H7. Here we show that acrF is absent from the pangenome of ST11 and that this lineage of E. coli has a highly conserved insertion within the acrF gene, which when translated encodes 13 amino acids and two stop codons. This insertion was found to be present in 97.59 % of 1787 ST11 genome assemblies. Non-function of AcrF in ST11 was confirmed in the laboratory as complementation with acrF from ST11 was unable to restore AcrF function in E. coli K-12 substr. MG1655 ΔacrB ΔacrF. This shows that the complement of RND efflux pumps present in laboratory bacterial strains may not reflect the situation in virulent strains of bacterial pathogens.

RevDate: 2023-04-18

Eisenstein M (2023)

Every base everywhere all at once: pangenomics comes of age.

Nature, 616(7957):618-620.

RevDate: 2023-04-17

Garrison E, Guarracino A, Heumos S, et al (2023)

Building pangenome graphs.

bioRxiv : the preprint server for biology pii:2023.04.05.535718.

Pangenome graphs can represent all variation between multiple genomes, but existing methods for constructing them are biased due to reference-guided approaches. In response, we have developed PanGenome Graph Builder (PGGB), a reference-free pipeline for constructing unbiased pangenome graphs. PGGB uses all-to-all whole-genome alignments and learned graph embeddings to build and iteratively refine a model in which we can identify variation, measure conservation, detect recombination events, and infer phylogenetic relationships.

RevDate: 2023-04-17

Wan X, Takala TM, Huynh VA, et al (2023)

Comparative genomics of 40 Weissella paramesenteroides strains.

Frontiers in microbiology, 14:1128028.

Weissella strains are often detected in spontaneously fermented foods. Because of their abilities to produce lactic acid and functional exopolysaccharides as well as their probiotic traits, Weissella spp. improve not only the sensorial properties but also nutritional values of the fermented food products. However, some Weissella species have been associated with human and animal diseases. In the era of vast genomic sequencing, new genomic/genome data are becoming available to the public on daily pace. Detailed genomic analyses are due to provide a full understanding of individual Weissella species. In this study, the genomes of six Weissella paramesenteroides strains were de novo sequenced. The genomes of 42 W. paramesenteroides strains were compared to discover their metabolic and functional potentials in food fermentation. Comparative genomics and metabolic pathway reconstructions revealed that W. paramesenteroides is a compact group of heterofermentative bacteria with good capacity of producing secondary metabolites and vitamin Bs. Since the strains rarely harbored plasmid DNA, they did not commonly possess the genes associated with bacteriocin production. All 42 strains were shown to bear vanT gene from the glycopeptide resistance gene cluster vanG. Yet none of the strains carried virulence genes.

RevDate: 2023-04-14

Olson ND, Wagner J, Dwarshuis N, et al (2023)

Variant calling and benchmarking in an era of complete human genome sequences.

Nature reviews. Genetics [Epub ahead of print].

Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.

RevDate: 2023-04-13

Miranda RP, Turrini PCG, Bonadio DT, et al (2023)

Genome Organization of Four Brazilian Xanthomonas albilineans Strains Does Not Correlate with Aggressiveness.

Microbiology spectrum [Epub ahead of print].

An integrative approach combining genomics, transcriptomics, and cell biology is presented to address leaf scald disease, a major problem for the sugarcane industry. To gain insight into the biology of the causal agent, the complete genome sequences of four Brazilian Xanthomonas albilineans strains with differing virulence capabilities are presented and compared to the GPEPC73 reference strain and FJ1. Based on the aggressiveness index, different strains were compared: Xa04 and Xa11 are highly aggressive, Xa26 is intermediate, and Xa21 is the least, while, based on genome structure, Xa04 shares most of its genomic features with Xa26, and Xa11 shares most of its genomic features with Xa21. In addition to presenting more clustered regularly interspaced short palindromic repeats (CRISPR) clusters, four more novel prophage insertions are present than the previously sequenced GPEPC73 and FJ1 strains. Incorporating the aggressiveness index and in vitro cell biology into these genome features indicates that disease establishment is not a result of a single determinant factor, as in most other Xanthomonas species. The Brazilian strains lack the previously described plasmids but present more prophage regions. In pairs, the most virulent and the least virulent share unique prophages. In vitro transcriptomics shed light on the 54 most highly expressed genes among the 4 strains compared to ribosomal proteins (RPs), of these, 3 outer membrane proteins. Finally, comparative albicidin inhibition rings and in vitro growth curves of the four strains also do not correlate with pathogenicity. In conclusion, the results disclose that leaf scald disease is not associated with a single shared characteristic between the most or the least pathogenic strains.
IMPORTANCE An integrative approach is presented which combines genomics, transcriptomics, and cell biology to address leaf scald disease. The results presented here disclose that the disease is not associated with a single shared characteristic between the most pathogenic strains or a unique genomic pattern. Sequence data from four Brazilian strains are presented that differ in pathogenicity index: Xa04 and Xa11 are highly virulent, Xa26 is intermediate, and Xa21 is the least pathogenic strain, while, based on genome structure, Xa04 shares with Xa26, and Xa11 shares with Xa21 most of the genome features. Other than presenting more CRISPR clusters and prophages than the previously sequenced strains, the integration of aggressiveness and cell biology points out that disease establishment is not a result of a single determinant factor as in other xanthomonads.

RevDate: 2023-04-13

Tenea GN (2023)

Metabiotics Signature through Genome Sequencing and In Vitro Inhibitory Assessment of a Novel Lactococcus lactis Strain UTNCys6-1 Isolated from Amazonian Camu-Camu Fruits.

International journal of molecular sciences, 24(7): pii:ijms24076127.

Metabiotics are the structural components of probiotic bacteria, functional metabolites, and/or signaling molecules with numerous beneficial properties. A novel Lactococcus lactis strain, UTNCys6-1, was isolated from wild Amazonian camu-camu fruits (Myrciaria dubia), and various functional metabolites with antibacterial capacity were found. The genome size is 2,226,248 base pairs, and it contains 2248 genes, 2191 protein-coding genes (CDSs), 50 tRNAs, 6 rRNAs, 1 16S rRNA, 1 23S rRNA, and 1 tmRNA. The average GC content is 34.88%. In total, 2148 proteins have been mapped to the EggNOG database. The specific annotation consisted of four incomplete prophage regions, one CRISPR-Cas array, six genomic islands (GIs), four insertion sequences (ISs), and four regions of interest (AOI regions) spanning three classes of bacteriocins (enterolysin_A, nisin_Z, and sactipeptides). Based on pangenome analysis, there were 6932 gene clusters, of which 751 (core genes) were commonly observed within the 11 lactococcal strains. Among them, 3883 were sample-specific genes (cloud genes) and 2298 were shell genes, indicating high genetic diversity. A sucrose transporter of the SemiSWEET family (PTS system: phosphoenolpyruvate-dependent transport system) was detected in the genome of UTNCys6-1 but not the other 11 lactococcal strains. In addition, the metabolic profile, antimicrobial susceptibility, and inhibitory activity of both protein-peptide extract (PPE) and exopolysaccharides (EPSs) against several foodborne pathogens were assessed in vitro. Furthermore, UTNCys6-1 was predicted to be a non-human pathogen that was unable to tolerate all tested antibiotics except gentamicin; metabolized several substrates; and lacks virulence factors (VFs), genes related to the production of biogenic amines, and acquired antibiotic resistance genes (ARGs). 
Overall, this study highlighted the potential of this strain for producing bioactive metabolites (PPE and EPSs) for agri-food and pharmaceutical industry use.
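
As a rough illustration of the core/shell/cloud partition reported in the abstract above, the following Python sketch classifies gene clusters by the number of genomes that carry them. The function name, the thresholds (core = present in every genome, cloud = present in at most one) and the toy data are assumptions made for illustration; the study's actual tools and cutoffs are not stated here.

```python
from collections import Counter


def partition_pangenome(presence, n_genomes, core_frac=1.0, cloud_max=1):
    """Label each gene cluster as 'core', 'shell', or 'cloud'.

    presence  : dict mapping cluster id -> set of genome ids carrying it
    n_genomes : total number of genomes compared
    core_frac : minimum fraction of genomes for 'core' (assumed threshold)
    cloud_max : maximum genome count for 'cloud' (assumed threshold)
    """
    labels = {}
    for cluster, genomes in presence.items():
        k = len(genomes)
        if k >= core_frac * n_genomes:
            labels[cluster] = "core"
        elif k <= cloud_max:
            labels[cluster] = "cloud"
        else:
            labels[cluster] = "shell"
    return labels


# Hypothetical toy input: four clusters across three genomes A, B, C.
toy = {
    "g1": {"A", "B", "C"},
    "g2": {"A", "B"},
    "g3": {"C"},
    "g4": {"A", "B", "C"},
}
labels = partition_pangenome(toy, n_genomes=3)
counts = Counter(labels.values())

# Consistency check on the figures quoted in the abstract:
# core + shell + cloud should equal the total number of gene clusters.
abstract_total = 751 + 2298 + 3883
```

With the counts quoted in the abstract, the three categories sum to the reported 6932 gene clusters.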

RevDate: 2023-04-12

Ma X, Sun T, Zhou J, et al (2023)

Pangenomic Study of Fusobacterium nucleatum Reveals the Distribution of Pathogenic Genes and Functional Clusters at the Subspecies and Strain Levels.

Microbiology spectrum [Epub ahead of print].

Fusobacterium nucleatum is a prevalent periodontal pathogen and is associated with many systemic diseases. Our knowledge of the genomic characteristics and pathogenic effectors of different F. nucleatum strains is limited. In this study, we completed the whole genome assembly of 4 F. nucleatum strains and carried out a comprehensive pangenomic study of 30 strains with their complete genome sequences. Phylogenetic analysis revealed that the F. nucleatum strains are mainly divided into 4 subspecies, while 1 of the sequenced strains was classified into a new subspecies. Gene composition analysis revealed that a total of 517 "core/soft-core genes" with housekeeping functions were widely distributed in almost all the strains. Each subspecies had a unique gene cluster shared by strains within the subspecies. Analysis of the virulence factors revealed that many virulence factors were widely distributed across all the strains, with some present in multiple copies. Some virulence genes showed no consistent occurrence rule at the subspecies level and were specifically distributed in certain strains. The genomic islands mainly revealed strain-specific characteristics instead of subspecies level consistency, while CRISPR types and secondary metabolite biosynthetic gene clusters were identically distributed in F. nucleatum strains from the same subspecies. The variation in amino acid sites in the adhesion protein FadA did not affect the monomer and dimer 3D structures, but it may affect the binding surface and the stability of binding to host receptors. This study provides a basis for the pathogenic study of F. nucleatum at the subspecies and strain levels. IMPORTANCE We used F. nucleatum as an example to analyze the genomic characteristics of oral pathogens at the species, subspecies, and strain levels and elucidate the similarities and differences in functional genes and virulence factors among different subspecies/strains of the same oral pathogen. 
We believe that the unique biological characteristics of each subspecies/strain can be attributed to the differences in functional gene clusters or the presence/absence of certain virulence genes. This study showed that F. nucleatum strains from the same subspecies had similar functional gene compositions, CRISPR types, and secondary metabolite biosynthetic gene clusters, while pathogenic genes, such as virulence genes, antibiotic resistance genes, and GIs, had more strain level specificity. The findings of this study suggest that, for microbial pathogenicity studies, we should carefully consider the subspecies/strains being used, as different strains may vary greatly.

RevDate: 2023-04-10

Lu TY, Smaruj PN, Fudenberg G, et al (2023)

The motif composition of variable-number tandem repeats impacts gene expression.

Genome research pii:gr.276768.122 [Epub ahead of print].

Understanding the impact of DNA variation on human traits is a fundamental question in human genetics. Variable number tandem repeats (VNTRs) make up roughly 3% of the human genome but are often excluded from association analysis due to poor read mappability or divergent repeat content. While methods exist to estimate VNTR length from short-read data, it is known that VNTRs vary in both length and repeat (motif) composition. Here, we use a repeat-pangenome graph (RPGG) constructed on 35 haplotype-resolved assemblies to detect variation in both VNTR length and repeat composition. We align population scale data from the Genotype-Tissue Expression (GTEx) Consortium to examine how variations in sequence composition may be linked to expression, including cases independent of overall VNTR length. We find that 9,422 out of 39,125 VNTRs are associated with nearby gene expression through motif variations, of which only 23.4% associations are accessible from length. Fine-mapping identifies 174 genes to be likely driven by variation in certain VNTR motifs and not overall length. We highlight two genes, CACNA1C and RNF213 that have expression associated with motif variation, demonstrating the utility of RPGG analysis as a new approach for trait association in multiallelic and highly variable loci.

RevDate: 2023-04-07

Anonymous (2023)

Tomato super-pangenome highlights the potential use of wild relatives in tomato breeding.

Nature genetics [Epub ahead of print].

RevDate: 2023-04-07

De Mesa CA, Mendoza RM, Penir SMU, et al (2023)

Genomic analysis of Vibrio harveyi strain PH1009, a potential multi-drug resistant pathogen due to acquisition of toxin genes.

Heliyon, 9(4):e14926.

It has increasingly been observed that viral and bacterial coinfection frequently occurs among cultured shrimp and that this coinfection could exacerbate the disease phenotype. Here, we describe a newly discovered bacterial strain, Vibrio harveyi PH1009, collected from Masbate Island, Philippines, that was found to be co-infecting with the White Spot Syndrome virus in a sample of black tiger prawn, Penaeus monodon. The genome of V. harveyi PH1009 was sequenced, assembled, and annotated. Average nucleotide identity calculation with Vibrio harveyi strains confirmed its taxonomic identity. It is a potential multi-drug and multi-heavy metal resistant strain based on the multiple antibiotic and heavy metal resistance determinants annotated on its genome. Two prophage regions were identified in its genome. One contained genes for Zona occludens toxin (Zot) and Accessory cholera toxin (Ace), essential toxins of toxigenic V. cholerae strains apart from CTX toxins. Pan-genome analysis of V. harveyi strains, including PH1009, revealed an "open" pan-genome for V. harveyi and a core genome mainly composed of genes necessary for growth and metabolism. A phylogenetic tree based on the core genome alignment revealed that PH1009 was closest to strains QT520, CAIM 1754, and 823tez1. Published virulence factors present on the strain QT520 suggest similar pathogenicity with PH1009. However, PH1009 Zot was not found on related strains but was present in strains HENC-01 and CAIM 148. Most unique genes found in the PH1009 strain were identified as hypothetical proteins. Further annotation showed that several of these hypothetical proteins were phage transposases, integrases, and transcription regulators, implying the role of bacteriophages in the distinct genomic features of the PH1009 genome. The PH1009 genome will serve as a valuable genomic resource for comparative genomic studies and in understanding the disease mechanism of the Vibrio harveyi species.

RevDate: 2023-04-06

Li N, He Q, Wang J, et al (2023)

Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species.

Nature genetics [Epub ahead of print].

Effective utilization of wild relatives is key to overcoming challenges in genetic improvement of cultivated tomato, which has a narrow genetic basis; however, current efforts to decipher high-quality genomes for tomato wild species are insufficient. Here, we report chromosome-scale tomato genomes from nine wild species and two cultivated accessions, representative of Solanum section Lycopersicon, the tomato clade. Together with two previously released genomes, we elucidate the phylogeny of Lycopersicon and construct a section-wide gene repertoire. We reveal the landscape of structural variants and provide entry to the genomic diversity among tomato wild relatives, enabling the discovery of a wild tomato gene with the potential to increase yields of modern cultivated tomatoes. Construction of a graph-based genome enables structural-variant-based genome-wide association studies, identifying numerous signals associated with tomato flavor-related traits and fruit metabolites. The tomato super-pangenome resources will expedite biological studies and breeding of this globally important crop.

RevDate: 2023-04-06

Hochhauser D, Millman A, R Sorek (2023)

The defense island repertoire of the Escherichia coli pan-genome.

PLoS genetics, 19(4):e1010694 pii:PGENETICS-D-23-00150 [Epub ahead of print].

It has become clear in recent years that anti-phage defense systems cluster non-randomly within bacterial genomes in so-called "defense islands". Despite serving as a valuable tool for the discovery of novel defense systems, the nature and distribution of defense islands themselves remain poorly understood. In this study, we comprehensively mapped the defense system repertoire of >1,300 strains of Escherichia coli, the most widely studied organism for phage-bacteria interactions. We found that defense systems are usually carried on mobile genetic elements including prophages, integrative conjugative elements and transposons, which preferentially integrate at several dozens of dedicated hotspots in the E. coli genome. Each mobile genetic element type has a preferred integration position but can carry a diverse variety of defensive cargo. On average, an E. coli genome has 4.7 hotspots occupied by defense system-containing mobile elements, with some strains possessing up to eight defensively occupied hotspots. Defense systems frequently co-localize with other systems on the same mobile genetic element, in agreement with the observed defense island phenomenon. Our data show that the overwhelming majority of the E. coli pan-immune system is carried on mobile genetic elements, explaining why the immune repertoire varies substantially between different strains of the same species.

RevDate: 2023-04-05

Dart E, NA Ahlgren (2023)

New tRNA-targeting transposons that hijack phage and vesicles.

Trends in genetics : TIG pii:S0168-9525(23)00065-3 [Epub ahead of print].

Genomic islands are hotspots for horizontal gene transfer (HGT) in bacteria, but, for Prochlorococcus, an abundant marine cyanobacterium, how these islands form has puzzled scientists. With the discovery of tycheposons, a new family of transposons, Hackl et al. provide evidence for elegant new mechanisms of gene rearrangement and transfer among Prochlorococcus and bacteria more broadly.

RevDate: 2023-04-05

Muzahid NH, Hussain MH, Huët MAL, et al (2023)

Molecular characterization and comparative genomic analysis of Acinetobacter baumannii isolated from the community and the hospital: an epidemiological study in Segamat, Malaysia.

Microbial genomics, 9(4).

Acinetobacter baumannii is a common cause of multidrug-resistant (MDR) nosocomial infections around the world. However, little is known about the persistence and dynamics of A. baumannii in a healthy community. This study investigated the role of the community as a prospective reservoir for A. baumannii and explored possible links between hospital and community isolates. A total of 12 independent A. baumannii strains were isolated from human faecal samples from the community in Segamat, Malaysia, in 2018 and 2019. Another 15 were obtained in 2020 from patients at the co-located tertiary public hospital. The antimicrobial resistance profile and biofilm formation ability were analysed, and the relatedness of community and hospital isolates was determined using whole-genome sequencing (WGS). Antibiotic profile analysis revealed that 12 out of 15 hospital isolates were MDR, but none of the community isolates were MDR. However, phylogenetic analysis based on single-nucleotide polymorphisms (SNPs) and a pangenome analysis of core genes showed clustering between four community and two hospital strains. Such clustering of strains from two different settings based on their genomes suggests that these strains could persist in both. WGS revealed 41 potential resistance genes on average in the hospital strains, but fewer (n=32) were detected in the community strains. In contrast, 68 virulence genes were commonly seen in strains from both sources. This study highlights the possible transmission threat to public health posed by virulent A. baumannii present in the gut of asymptomatic individuals in the community.

RevDate: 2023-04-04

Commichaux S, Rand H, Javkar K, et al (2023)

Assessment of plasmids for relating the 2020 Salmonella enterica serovar Newport onion outbreak to farms implicated by the outbreak investigation.

BMC genomics, 24(1):165.

BACKGROUND: The Salmonella enterica serovar Newport red onion outbreak of 2020 was the largest foodborne outbreak of Salmonella in over a decade. The epidemiological investigation suggested two farms as the likely source of contamination. However, single nucleotide polymorphism (SNP) analysis of the whole genome sequencing data showed that none of the Salmonella isolates collected from the farm regions were linked to the clinical isolates, preventing the use of phylogenetics in source identification. Here, we explored an alternative method for analyzing the whole genome sequencing data driven by the hypothesis that if the outbreak strain had come from the farm regions, then the clinical isolates would disproportionately contain plasmids found in isolates from the farm regions due to horizontal transfer.

RESULTS: SNP analysis confirmed that the clinical isolates formed a single, nearly-clonal clade with evidence for ancestry in California going back a decade. The clinical clade had a large core genome (4,399 genes) and a large and sparsely distributed accessory genome (2,577 genes, at least 64% on plasmids). At least 20 plasmid types occurred in the clinical clade, more than were found in the literature for Salmonella Newport. A small number of plasmids, 14 from 13 clinical isolates and 17 from 8 farm isolates, were found to be highly similar (> 95% identical), indicating they might be related by horizontal transfer. Phylogenetic analysis was unable to determine the geographic origin, isolation source, or time of transfer of the plasmids, likely due to their promiscuous and transient nature. However, our resampling analysis suggested that observing a similar number and combination of highly similar plasmids in random samples of environmental Salmonella enterica within the NCBI Pathogen Detection database was unlikely, supporting a connection between the outbreak strain and the farms implicated by the epidemiological investigation.

CONCLUSION: Horizontally transferred plasmids provided evidence for a connection between clinical isolates and the farms implicated as the source of the outbreak. Our case study suggests that such analyses might add a new dimension to source tracking investigations, but highlights the need for detailed and accurate metadata, more extensive environmental sampling, and a better understanding of plasmid molecular evolution.
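
The resampling argument in the abstract above can be sketched generically as an empirical p-value: draw many random samples from a background population and count how often they match or exceed the observed statistic. The Python below is only a minimal sketch under stated assumptions. The function name, the statistic (number of sampled isolates sharing a plasmid with a reference set), the sample sizes and the toy population are all hypothetical and do not reproduce the authors' procedure or data.

```python
import random


def empirical_p_value(observed, population, sample_size, statistic,
                      n_resamples=10_000, seed=0):
    """Fraction of random samples whose statistic is >= the observed value.

    Uses the common (hits + 1) / (n_resamples + 1) correction so the
    estimate is never exactly zero.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(population, sample_size)  # without replacement
        if statistic(sample) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


# Hypothetical toy data: each isolate is the set of plasmid types it carries.
reference = frozenset({"pA", "pB"})          # assumed "farm" plasmid types
population = [frozenset({"pA"})] * 5 + [frozenset({"pX"})] * 95


def shared_count(sample):
    """Number of isolates in the sample sharing any plasmid with reference."""
    return sum(1 for isolate in sample if isolate & reference)


p = empirical_p_value(observed=4, population=population, sample_size=8,
                      statistic=shared_count, n_resamples=2000)
```

A small value of `p` would indicate that random background samples rarely show as much plasmid sharing as was observed, which is the shape of the inference the abstract describes.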

RevDate: 2023-04-04

Li W, Wang D, Hong X, et al (2023)

Identification and validation of new MADS-box homologous genes in 3010 rice pan-genome.

Plant cell reports [Epub ahead of print].

Identification and validation of ten new MADS-box homologous genes in 3010 rice pan-genome for rice breeding. The functional genome is significant for rice breeding. MADS-box genes encode transcription factors that are indispensable for rice growth and development. The reported 15,362 novel genes in the rice pan-genome (RPAN) of Asian cultivated rice accessions provided a useful gene reservoir for the identification of more MADS-box candidates to overcome the limitation for the usage of only 75 MADS-box genes identified in Nipponbare for rice breeding. Here, we report the identification and validation of ten MADS-box homologous genes in RPAN. Origin and identity analysis indicated that they originated from different wild rice accessions, and motif structure analysis revealed high variation in their amino acid sequences. Phylogenetic results with 277 MADS-box genes in 41 species showed that all these ten MADS-box homologous genes belong to type I (SRF-like, M-type). Gene expression analysis confirmed the existence of these ten MADS-box genes in IRIS_313-10,394; all of them were expressed in flower tissues, and six of them were highly expressed during seed development. Altogether, we identified and validated experimentally, for the first time, ten novel MADS-box genes in RPAN, which provides new genetic sources for rice improvement.

RevDate: 2023-04-03

von Meijenfeldt FAB, Hogeweg P, BE Dutilh (2023)

A social niche breadth score reveals niche range strategies of generalists and specialists.

Nature ecology & evolution [Epub ahead of print].

Generalists can survive in many environments, whereas specialists are restricted to a single environment. Although a classical concept in ecology, niche breadth has remained challenging to quantify for microorganisms because it depends on an objective definition of the environment. Here, by defining the environment of a microorganism as the community it resides in, we integrated information from over 22,000 environmental sequencing samples to derive a quantitative measure of the niche, which we call social niche breadth. At the level of genera, we explored niche range strategies throughout the prokaryotic tree of life. We found that social generalists include opportunists that stochastically dominate local communities, whereas social specialists are stable but low in abundance. Social generalists have a more diverse and open pan-genome than social specialists, but we found no global correlation between social niche breadth and genome size. Instead, we observed two distinct evolutionary strategies, whereby specialists have relatively small genomes in habitats with low local diversity, but relatively large genomes in habitats with high local diversity. Together, our analysis shines data-driven light on microbial niche range strategies.

RevDate: 2023-04-03

Maranga M, Szczerbiak P, Bezshapkin V, et al (2023)

Comprehensive Functional Annotation of Metagenomes and Microbial Genomes Using a Deep Learning-Based Method.

mSystems [Epub ahead of print].

Comprehensive protein function annotation is essential for understanding microbiome-related disease mechanisms in the host organisms. However, a large portion of human gut microbial proteins lack functional annotation. Here, we have developed a new metagenome analysis workflow integrating de novo genome reconstruction, taxonomic profiling, and deep learning-based functional annotations from DeepFRI. This is the first approach to apply deep learning-based functional annotations in metagenomics. We validate DeepFRI functional annotations by comparing them to orthology-based annotations from eggNOG on a set of 1,070 infant metagenomes from the DIABIMMUNE cohort. Using this workflow, we generated a sequence catalogue of 1.9 million nonredundant microbial genes. The functional annotations revealed 70% concordance between Gene Ontology annotations predicted by DeepFRI and eggNOG. DeepFRI improved the annotation coverage, with 99% of the gene catalogue obtaining Gene Ontology molecular function annotations, although they are less specific than those from eggNOG. Additionally, we constructed pangenomes in a reference-free manner using high-quality metagenome-assembled genomes (MAGs) and analyzed the associated annotations. eggNOG annotated more genes on well-studied organisms, such as Escherichia coli, while DeepFRI was less sensitive to taxa. Further, we show that DeepFRI provides additional annotations in comparison to the previous DIABIMMUNE studies. This workflow will contribute to novel understanding of the functional signature of the human gut microbiome in health and disease as well as guiding future metagenomics studies. IMPORTANCE The past decade has seen advancement in high-throughput sequencing technologies resulting in rapid accumulation of genomic data from microbial communities. While this growth in sequence data and gene discovery is impressive, the majority of microbial gene functions remain uncharacterized. 
The coverage of functional information coming from either experimental sources or inferences is low. To solve these challenges, we have developed a new workflow to computationally assemble microbial genomes and annotate the genes using a deep learning-based model DeepFRI. This improved microbial gene annotation coverage to 1.9 million metagenome-assembled genes, representing 99% of the assembled genes, which is a significant improvement compared to 12% Gene Ontology term annotation coverage by commonly used orthology-based approaches. Importantly, the workflow supports pangenome reconstruction in a reference-free manner, allowing us to analyze the functional potential of individual bacterial species. We therefore propose this alternative approach combining deep-learning functional predictions with the commonly used orthology-based annotations as one that could help us uncover novel functions observed in metagenomic microbiome studies.

RevDate: 2023-04-03

Heng E, Tan LL, Tay DWP, et al (2023)

Cost-effective hybrid long-short read assembly delineates alternative GC-rich Streptomyces hosts for natural product discovery.

Synthetic and systems biotechnology, 8(2):253-261.

With the advent of rapid automated in silico identification of biosynthetic gene clusters (BGCs), genomics presents vast opportunities to accelerate natural product (NP) discovery. However, prolific NP producers, Streptomyces, are exceptionally GC-rich (>80%) and highly repetitive within BGCs. These pose challenges in sequencing and high-quality genome assembly which are currently circumvented via intensive sequencing. Here, we outline a more cost-effective workflow using multiplex Illumina and Oxford Nanopore sequencing with hybrid long-short read assembly algorithms to generate high quality genomes. Our protocol involves subjecting long read-derived assemblies to up to 4 rounds of polishing with short reads to yield accurate BGC predictions. We successfully sequenced and assembled 8 GC-rich Streptomyces genomes whose lengths range from 7.1 to 12.1 Mb with a median N50 of 8.2 Mb. Taxonomic analysis revealed previous misrepresentation among these strains and allowed us to propose a potentially new species, Streptomyces sydneybrenneri. Further comprehensive characterization of their biosynthetic, pan-genomic and antibiotic resistance features, especially for molecules derived from type I polyketide synthase (PKS) BGCs, reflected their potential as alternative NP hosts. Thus, the genome assemblies and insights presented here are envisioned to serve as a gateway for the scientific community to expand their avenues in NP discovery.
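
For readers unfamiliar with the N50 statistic quoted in the abstract above, the following Python sketch shows the standard definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly size. The function name and the toy contig lengths are illustrative assumptions and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached at least half the assembly
            return length


# Hypothetical toy assembly (arbitrary units): total length is 20.
toy_contigs = [8, 5, 3, 2, 2]
toy_n50 = n50(toy_contigs)
```

In the toy assembly, the longest contig alone covers 8 of 20 units, and adding the second brings the total to 13, which passes the halfway mark, so the N50 is the length of that second contig.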

RevDate: 2023-04-02

Raza Q, Rashid MAR, Waqas M, et al (2023)

Genomic diversity of aquaporins across genus Oryza provides a rich genetic resource for development of climate resilient rice cultivars.

BMC plant biology, 23(1):172.

BACKGROUND: Plant aquaporins are critical genetic players performing multiple biological functions, especially climate resilience and water-use efficiency. Their genomic diversity across genus Oryza is yet to be explored.

RESULTS: This study identified 369 aquaporin-encoding genes from 11 cultivated and wild rice species and further categorized these into four major subfamilies, among which small basic intrinsic proteins are speculated to be ancestral to all land plant aquaporins. Evolutionarily conserved motifs in peptides of aquaporins participate in transmembrane transport of materials and their relatively complex gene structures provide an evolutionary playground for regulation of genome structure and transcription. Duplication and evolution analyses revealed higher genetic conservation among Oryza aquaporins and strong purifying selections are assisting in conserving the climate resilience associated functions. Promoter analysis highlighted enrichment of gene upstream regions with cis-acting regulatory elements involved in diverse biological processes, whereas miRNA target site prediction analysis unveiled substantial involvement of osa-miR2102-3p, osa-miR2927 and osa-miR5075 in post-transcriptional regulation of gene expression patterns. Moreover, expression patterns of japonica aquaporins were significantly perturbed in response to different treatment levels of six phytohormones and four abiotic stresses, suggesting their multifarious roles in plant survival under stressed environments. Furthermore, superior haplotypes of seven conserved orthologous aquaporins for higher thousand-grain weight are reported from a gold mine of 3,010 sequenced rice pangenomes.

CONCLUSIONS: This study unveils the complete genomic atlas of aquaporins across genus Oryza and provides a comprehensive genetic resource for genomics-assisted development of climate-resilient rice cultivars.

RevDate: 2023-03-31

Pagnossin D, Weir W, Smith A, et al (2023)

Streptococcus canis genomic epidemiology reveals the potential for zoonotic transfer.

Microbial genomics, 9(3).

Streptococcus canis, a multi-host pathogen commonly isolated from dogs and cats, has been occasionally reported in severe cases of human infection. To address the gap in knowledge on its virulence and host tropism, we investigated S. canis genomic epidemiology and report the results of this analysis for the first time. We analysed 59 S. canis whole genome sequences originating from a variety of host species, comprising 39 newly sequenced isolates from UK sources, along with all (n=20) publicly available genomes. Antimicrobial resistance (AMR) phenotype was determined for all 39 available isolates. Genomes were screened for determinants of resistance and virulence. We created a core SNP phylogeny and compared strain clustering to multi-locus sequence typing (MLST) and S. canis M-like protein (SCM) typing. We investigated the dataset for signals of host adaptation using phylogenetic analysis, accessory genome clustering and pan-genome-wide association study analysis. A total of 23 % (9/39) of isolates exhibited phenotypic resistance to lincosamides, macrolides and/or tetracyclines. This was complemented by the identification of AMR-encoding genes in all genomes: tetracycline (tetO 14 %, 8/59; and tetM 7 %, 4/59) and lincosamide/macrolide (ermB, 7 %, 4/59). AMR was more common in human (36 %, 4/11) compared to companion animal (18 %, 5/28) isolates. We identified 19 virulence gene homologues, 14 of which were present in all strains analysed. In an S. canis strain isolated from a dog with otitis externa we identified a homologue of S. pyogenes superantigen SMEZ. The MLST and SCM typing schemes were found to be incapable of accurately representing core SNP-based genomic diversity of the S. canis population. No evidence of host adaptation was detected, suggesting the potential for inter-species transmission, including zoonotic transfer.

RevDate: 2023-03-30

Akparov Z, Hajiyeva S, Abbasov M, et al (2023)

Two major chromosome evolution events with unrivaled conserved gene content in pomegranate.

Frontiers in plant science, 14:1039211.

Pomegranate has a unique evolutionary history given that different cultivars have eight or nine bivalent chromosomes with possible crossability between the two classes. Therefore, it is important to study chromosome evolution in pomegranate to understand the dynamics of its population. Here, we de novo assembled the Azerbaijani cultivar "Azerbaijan guloyshasi" (AG2017; 2n = 16) and re-sequenced six cultivars to track the evolution of pomegranate and to compare it with previously published de novo assembled and re-sequenced cultivars. High synteny was observed between AG2017, Bhagawa (2n = 16), Tunisia (2n = 16), and Dabenzi (2n = 18), but these four cultivars diverged from the cultivar Taishanhong (2n = 18) with several rearrangements indicating the presence of two major chromosome evolution events. Major presence/absence variations were not observed as >99% of the five genomes aligned across the cultivars, while >99% of the pan-genic content was represented by Tunisia and Taishanhong only. We also revisited the divergence between soft- and hard-seeded cultivars with less structured population genomic data, compared to previous studies, to refine the selected genomic regions and detect global migration routes for pomegranate. We reported a unique admixture between soft- and hard-seeded cultivars that can be exploited to improve the diversity, quality, and adaptability of local pomegranate varieties around the world. Our study adds to the body of knowledge on the evolution of the pomegranate genome and its implications for the population structure of global pomegranate diversity, as well as for planning breeding programs aiming to develop improved cultivars.

RevDate: 2023-03-30

Carballo J, Bellido AM, Selva JP, et al (2023)

From tetraploid to diploid, a pangenomic approach to identify genes lost during synthetic diploidization of Eragrostis curvula.

Frontiers in plant science, 14:1133986.

INTRODUCTION: In Eragrostis curvula, commonly known as weeping lovegrass, a synthetic diploidization event of the facultative apomictic tetraploid Tanganyika INTA cv. originated from the sexual diploid Victoria cv. Apomixis is an asexual reproduction by seeds in which the progeny is genetically identical to the maternal plant.

METHODS: To assess the genomic changes related to ploidy and to the reproductive mode occurring during diploidization, a mapping approach was followed to obtain the first E. curvula pangenome assembly. In this way, gDNA of Tanganyika INTA was extracted and sequenced in 2x250 Illumina paired-end reads and mapped against the Victoria genome assembly. The unmapped reads were used for variant calling, while the mapped reads were assembled using Masurca software.

RESULTS: The length of the assembly was 28,982,419 bp distributed in 18,032 contigs, and the variable genes annotated in these contigs rendered 3,952 gene models. Functional annotation of the genes showed that the reproductive pathway was differentially enriched. PCR amplification in gDNA and cDNA of Tanganyika INTA and Victoria was conducted to validate the presence/absence variation in five genes related to reproduction and ploidy. The polyploid nature of the Tanganyika INTA genome was also evaluated through the variant calling analysis showing the single nucleotide polymorphism (SNP) coverage and allele frequency distribution with a segmental allotetraploid pairing behavior.

DISCUSSION: The results presented here suggest that the genes were lost in Tanganyika INTA during the diploidization process that was conducted to suppress the apomictic pathway, affecting severely the fertility of Victoria cv.

RevDate: 2023-03-29

Zhen C, Chen XK, Ge XF, et al (2023)

Streptomonospora mangrovi sp. nov., isolated from mangrove soil showing similar metabolic capabilities, but distinct secondary metabolites profiles.

Archives of microbiology, 205(4):148.

A novel actinomycete, designated strain S1-112[T], was isolated from a mangrove soil sample from Hainan, China, and characterized using a polyphasic approach. Strain S1-112[T] showed the highest similarity of the 16S rRNA gene to Streptomonospora nanhaiensis 12A09[T] (99.24%). Their close relationship was further supported by phylogenetic analyses, which placed these two strains within a stable clade. The highest values of digital DNA-DNA hybridization (dDDH, 41.4%) and average nucleotide identity (ANI, 90.55%) were detected between strain S1-112[T] and Streptomonospora halotolerans NEAU-Jh2-17[T]. Genotypic and phenotypic characteristics demonstrated that strain S1-112[T] could be distinguished from its closely related relatives. We also profiled the pan-genome and metabolic features of genomic assemblies of strains belonging to the genus Streptomonospora, indicating similar functional capacities and metabolic activities. However, all of these strains showed promising potential for producing diverse types of secondary metabolites. In conclusion, strain S1-112[T] represents a novel species of the genus Streptomonospora, for which the name Streptomonospora mangrovi sp. nov. was proposed. The type strain is S1-112[T] (= JCM 34292[T]).

RevDate: 2023-03-29

Karetnikov DI, Vasiliev GV, Toshchakov SV, et al (2023)

Analysis of Genome Structure and Its Variations in Potato Cultivars Grown in Russia.

International journal of molecular sciences, 24(6): pii:ijms24065713.

Solanum tuberosum L. (common potato) is one of the most important crops produced almost all over the world. Genomic sequences of potato open the way for studying the molecular variations related to diversification. We performed a reconstruction of genomic sequences for 15 tetraploid potato cultivars grown in Russia using short reads. Protein-coding genes were identified; conserved and variable parts of the pan-genome and the repertoire of the NBS-LRR genes were characterized. For comparison, we used additional genomic sequences for twelve South American potato accessions, performed analysis of genetic diversity, and identified the copy number variations (CNVs) in these two groups of potato. Genomes of Russian potato cultivars were more homogeneous by CNV characteristics and had a smaller maximum deletion size in comparison with South American ones. Genes with different CNV occurrences in these two groups of potato accessions were identified. We revealed genes of immune/abiotic stress response, transport and five genes related to tuberization and photoperiod control among them. Four genes related to tuberization and photoperiod were investigated in potatoes previously (phytochrome A among them). A novel gene, homologous to the poly(ADP-ribose) glycohydrolase (PARG) of Arabidopsis, was identified that may be involved in circadian rhythm control and contribute to the acclimatization processes of Russian potato cultivars.

RevDate: 2023-03-29

Wartha S, Bretschneider N, Dangel A, et al (2023)

Genetic Characterization of Listeria from Food of Non-Animal Origin Products and from Producing and Processing Companies in Bavaria, Germany.

Foods (Basel, Switzerland), 12(6): pii:foods12061120.

Reported cases of listeriosis from food of non-animal origin (FNAO) are increasing. In order to assess the risk of exposure to Listeria monocytogenes from FNAO, the genetic characterization of the pathogen in FNAO products and in primary production and processing plants needs to be investigated. For this, 123 samples of fresh and frozen soft fruit and 407 samples of 39 plants in Bavaria, Germany that produce and process FNAO were investigated for Listeria contamination. As a result, 64 Listeria spp. isolates were detected using ISO 11290-1:2017. Environmental swabs and water and food samples were investigated. L. seeligeri (36/64, 56.25%) was the most frequently identified species, followed by L. monocytogenes (8/64, 12.50%), L. innocua (8/64, 12.50%), L. ivanovii (6/64, 9.38%), L. newyorkensis (5/64, 7.81%), and L. grayi (1/64, 1.56%). Those isolates were subsequently sequenced by whole-genome sequencing and subjected to pangenome analysis to retrieve data on the genotype, serotype, antimicrobial resistance (AMR), and virulence markers. Eight out of sixty-four Listeria spp. isolates were identified as L. monocytogenes. The serogroup analysis detected that 62.5% of the L. monocytogenes isolates belonged to serogroup IIa (1/2a and 3a) and 37.5% to serogroup IVb (4b, 4d, and 4e). Furthermore, the MLST (multilocus sequence typing) analysis of the eight detected L. monocytogenes isolates identified seven different sequence types (STs) and clonal complexes (CCs), i.e., ST1/CC1, ST2/CC2, ST6/CC6, ST7/CC7, ST21/CC21, ST504/CC475, and ST1413/CC739. The core genome MLST analysis also showed high allelic differences and suggests plant-specific isolates. Regarding the AMR, we detected phenotypic resistance against benzylpenicillin, fosfomycin, and moxifloxacin in all eight L. monocytogenes isolates. Moreover, virulence factors, such as prfA, hly, plcA, plcB, hpt, actA, inlA, inlB, and mpl, were identified in pathogenic and nonpathogenic Listeria species. 
The significance of L. monocytogenes in FNAO is growing and should receive increasing levels of attention.

RevDate: 2023-03-29

Weltzer ML, D Wall (2023)

Social Diversification Driven by Mobile Genetic Elements.

Genes, 14(3): pii:genes14030648.

Social diversification in microbes is an evolutionary process where lineages bifurcate into distinct populations that cooperate with themselves but not with other groups. In bacteria, this is frequently driven by horizontal transfer of mobile genetic elements (MGEs). Here, the resulting acquisition of new genes changes the recipient's social traits and consequently how they interact with kin. These changes include discriminating behaviors mediated by newly acquired effectors. Since the producing cell is protected by cognate immunity factors, these selfish elements benefit from selective discrimination against recent ancestors, thus facilitating their proliferation and benefiting the host. Whether social diversification benefits the population at large is less obvious. The widespread use of next-generation sequencing has recently provided new insights into population dynamics in natural habitats and the roles MGEs play. MGEs belong to accessory genomes, which often constitute the majority of the pangenome of a taxon, and contain most of the kin-discriminating loci that fuel rapid social diversification. We further discuss mechanisms of diversification and its consequences to populations and conclude with a case study involving myxobacteria.

RevDate: 2023-03-29

Sedeek AM, Salah I, Kamel HL, et al (2023)

Genome-Based Analysis of the Potential Bioactivity of the Terrestrial Streptomyces vinaceusdrappus Strain AC-40.

Biology, 12(3): pii:biology12030345.

Streptomyces are factories of antimicrobial secondary metabolites. We isolated a Streptomyces species associated with the Pelargonium graveolens rhizosphere. Its total metabolic extract exhibited potent antibacterial and antifungal properties against all the tested pathogenic microbes. Whole genome sequencing and genome analyses were performed to examine its main characteristics and to reconstruct the metabolic pathways that can be associated with biotechnologically useful traits. AntiSMASH was used to identify the secondary metabolite gene clusters. In addition, we searched for known genes associated with plant growth-promoting characteristics. Finally, a comparative and pan-genome analysis with three closely related genomes was conducted. The isolate was identified as Streptomyces vinaceusdrappus strain AC-40. Genome mining indicated the presence of several secondary metabolite gene clusters. Some of them are identical or homologous to gene clusters of known metabolites with antimicrobial, antioxidant, and other bioactivities. It also showed the presence of several genes related to plant growth promotion traits. The comparative genome analysis indicated that at least five of these gene clusters are highly conserved across rochei group genomes. The genotypic and phenotypic characteristics of S. vinaceusdrappus strain AC-40 indicate that it is a promising source of beneficial secondary metabolites with pharmaceutical and biotechnological applications.

RevDate: 2023-03-28

Lu W, Zhang T, Zhang Q, et al (2023)

FibH Gene Complete Sequences (FibHome) Revealed Silkworm Pedigree.

Insects, 14(3): pii:insects14030244.

The highly repetitive and variable fibroin heavy chain (FibH) gene can be used for silkworm identification; however, only a few complete FibH sequences are known. In this study, we extracted and examined 264 FibH gene complete sequences (FibHome) from a high-resolution silkworm pan-genome. The average FibH lengths of the wild silkworm, local, and improved strains were 19,698 bp, 16,427 bp, and 15,795 bp, respectively. All FibH sequences had a conserved 5' and 3' terminal non-repetitive (5' and 3' TNR, 99.74% and 99.99% identity, respectively) sequence and a variable repetitive core (RC). The RCs differed greatly, but they all shared the same motif. During domestication or breeding, the FibH gene mutated with hexanucleotide (GGTGCT) as the core unit. Numerous variations existed that were not unique to wild and domesticated silkworms. However, the transcriptional factor binding sites, such as fibroin modulator-binding protein, were highly conserved and had 100% identity in the FibH gene's intron and upstream sequences. The local and improved strains with the same FibH gene were divided into four families using this gene as a marker. Family I contained a maximum of 62 strains with the optional FibH (Opti-FibH, 15,960 bp) gene. This study provides new insights into FibH variations and silkworm breeding.

RevDate: 2023-03-27

Baaijens JA, Bonizzoni P, Boucher C, et al (2022)

Computational graph pangenomics: a tutorial on data structures and their applications.

Natural computing, 21(1):81-108.

Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations, thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes as well as the specificity of an individual genome in a personalized approach to medicine is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome, is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.
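
Editorial illustration (not taken from the tutorial): the idea of a graph pangenome described in the abstract above can be sketched as a small sequence graph in which nodes hold sequence segments and each haplotype is a path through the nodes. The class name `SequenceGraph`, its methods, the helper `build_example`, and the toy sequences are all hypothetical and chosen only for this sketch; real tools use far more compact, indexed representations.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy graph pangenome: nodes are sequence segments, paths are haplotypes."""

    nodes: dict[int, str] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_path(self, name: str, node_ids: list[int]) -> None:
        # Store the haplotype as a walk and record every edge it traverses.
        self.paths[name] = list(node_ids)
        for a, b in zip(node_ids, node_ids[1:]):
            self.edges.add((a, b))

    def spell(self, name: str) -> str:
        # Reconstruct the linear sequence of one haplotype from its path.
        return "".join(self.nodes[n] for n in self.paths[name])

    def shared_nodes(self) -> set[int]:
        # Nodes visited by every haplotype (segments common to all genomes).
        path_sets = [set(p) for p in self.paths.values()]
        return set.intersection(*path_sets) if path_sets else set()


def build_example() -> SequenceGraph:
    # Two hypothetical haplotypes that differ by a single A/G substitution.
    g = SequenceGraph()
    g.add_node(1, "ACG")
    g.add_node(2, "A")    # allele carried by hapA
    g.add_node(3, "G")    # allele carried by hapB
    g.add_node(4, "TTC")
    g.add_path("hapA", [1, 2, 4])
    g.add_path("hapB", [1, 3, 4])
    return g


if __name__ == "__main__":
    graph = build_example()
    for name in graph.paths:
        print(name, graph.spell(name))
    print("shared nodes:", sorted(graph.shared_nodes()))
```

In this toy graph the variant site appears as a "bubble" (nodes 2 and 3 between nodes 1 and 4), so both haplotypes are stored without duplicating their common flanking sequence.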

RevDate: 2023-03-27

Rehman MNU, Dawar FU, Zeng J, et al (2023)

Complete genome sequence analysis of Edwardsiella tarda SC002 from hatchlings of Siamese crocodile.

Frontiers in veterinary science, 10:1140655.

Edwardsiella tarda is a Gram-negative, facultative anaerobic rod-shaped bacterium and the causative agent of the systemic disease "Edwardsiellosis". It is commonly prevalent in aquatic organisms with subsequent economic loss and hence has attracted increasing attention from researchers. In this study, we investigated the complete genome sequence of a highly virulent isolate Edwardsiella tarda SC002 isolated from hatchlings of the Siamese crocodile. The genome of SC002 consisted of one circular chromosome of length 3,662,469 bp with a 57.29% G+C content and four novel plasmids. A total of 3,734 protein-coding genes, 12 genomic islands (GIs), 7 prophages, 48 interspersed repeat sequences, 248 tandem repeat sequences, a CRISPR component with a total length of 175 bp, and 171 ncRNAs (tRNA = 106, sRNA = 37, and rRNA = 28) were predicted. In addition, the coding genes of the assembled genome were successfully annotated against eight general databases (NR = 3,618/3,734, COG = 2,947/3,734, KEGG = 3,485/3,734, SWISS-PROT = 2,787/3,734, GO = 2,648/3,734, Pfam = 2,648/3,734, CAZy = 130/3,734, and TCDB = 637/3,734) and four pathogenicity-related databases (ARDB = 11/3,734, CARD = 142/3,734, PHI = 538/3,734, and VFDB = 315/3,734). Pan-genome and comparative genome analyses of the completely sequenced genomes confirmed their evolutionary relationships. The present study confirmed that E. tarda SC002 is a potential pathogen bearing a large number of antibiotic resistance, virulence, and pathogenic genes and its open pan-genome may enhance its host range in the future.


ESP Quick Facts

ESP Origins

In the early 1990s, Robert Robbins was a faculty member at Johns Hopkins, where he directed the informatics core of GDB, the human gene-mapping database of the international human genome project. To share papers with colleagues around the world, he set up a small paper-sharing section on his personal web page. This small project evolved into The Electronic Scholarly Publishing Project.

ESP Support

In 1995, Robbins became the VP/IT of the Fred Hutchinson Cancer Research Center in Seattle, WA. Soon after arriving in Seattle, Robbins secured funding, through the ELSI component of the US Human Genome Project, to create the original ESP.ORG web site, with the formal goal of providing free, world-wide access to the literature of classical genetics.

ESP Rationale

Although the methods of molecular biology can seem almost magical to the uninitiated, the original techniques of classical genetics are readily appreciated by one and all: cross individuals that differ in some inherited trait, collect all of the progeny, score their attributes, and propose mechanisms to explain the patterns of inheritance observed.

ESP Goal

In reading the early works of classical genetics, one is drawn, almost inexorably, into ever more complex models, until molecular explanations begin to seem both necessary and natural. At that point, the tools for understanding genome research are at hand. Helping readers reach this point was the original goal of The Electronic Scholarly Publishing Project.

ESP Usage

Usage of the site grew rapidly and has remained high. Faculty began to use the site for their assigned readings. Other on-line publishers, ranging from The New York Times to Nature, referenced ESP materials in their own publications. Nobel laureates (e.g., Joshua Lederberg) regularly used the site and even wrote to suggest changes and improvements.

ESP Content

When the site began, no journals were making their early content available in digital format. As a result, ESP was obliged to digitize classic literature before it could be made available. For many important papers — such as Mendel's original paper or the first genetic map — ESP had to produce entirely new typeset versions of the works, if they were to be available in a high-quality format.

ESP Help

Early support from the DOE component of the Human Genome Project was critically important for getting the ESP project on a firm foundation. Since that funding ended (nearly 20 years ago), the project has been operated as a purely volunteer effort. Anyone wishing to assist in these efforts should send an email to Robbins.

ESP Plans

With the development of methods for adding typeset side notes to PDF files, the ESP project now plans to add annotated versions of some classical papers to its holdings. We also plan to add new reference and pedagogical material. We have already started providing regularly updated, comprehensive bibliographies to the ESP.ORG site.

Electronic Scholarly Publishing
961 Red Tail Lane
Bellingham, WA 98226

E-mail: RJR8222 @
