I. Pipeline of de novo assembly

II. Pipeline for obtaining novel sequences


I. Samples and population structure

Our database integrates resequencing data from published cattle genetic studies, giving a total of 432 samples representing 54 breeds. The set contains 10 geographic groups: 108 West European cattle, 83 Central South European cattle, 9 Middle East cattle, 9 Tibetan cattle, 28 Northeast Asian cattle, 47 North and Central Chinese cattle, 33 South Chinese cattle, 24 Indo-Pakistani cattle and 70 African cattle. Principal component analysis (PCA) and ADMIXTURE analysis showed a clear genetic structure, with samples from each geographic region clustering together. The six geographically distributed ancestral components can be roughly ascribed to African taurine, European taurine, Eurasian taurine, East Asian taurine, Chinese indicine, and Indian indicine.


Fig 1. Geographic distribution and population genetics analyses of 432 cattle individuals.

II. Gene Quick search

We integrated information from NCBI, AmiGO 2 and KEGG. Users can enter a gene symbol to view basic gene information (e.g., genomic location, transcript and protein sequences, GO IDs and GO terms, and relevant KEGG pathways), gene variation information (e.g., SNPs, indels, and CNVs), and gene selective signatures (e.g., FST, XP-CLR, XP-EHH, Pi, Hp, iHS). We also provide links to GBrowse and to external databases (NCBI, AmiGO 2, and KEGG) so that users can obtain further information, such as gene/mRNA/protein sequences, KEGG Orthology (KO) entries, and motifs.

III. Variation search

The BGVD allows users to retrieve information on SNPs, indels and CNVs by searching for a specific gene or a genomic region in three versions of the bovine genome (Btau 5.0.1, UMD3.1.1 and the newly published ARS-UCD1.2). Users can filter SNPs and indels further with "Advanced Search", where parameters such as minor allele frequency and consequence type can be set; this option lets users narrow down the items of interest efficiently and intuitively. The results are presented in an interactive table and graph. For SNPs and indels, users can obtain related details including variant position, alleles, minor allele frequency, variant effect, rs ID and the allele frequency distribution pattern in 54 world-wide cattle breeds or six "core" cattle groups. For CNVs, users can obtain information about the CNV region, such as the intersected genomic region, CNV length, the closest gene, consequence type and the copy number distribution in 432 individuals representing 49 cattle populations.
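The kind of filtering offered by "Advanced Search" can be illustrated with a minimal Python sketch. It is not the BGVD implementation: the Variant record, its field names, the filter_variants function and the example records are all hypothetical, and the minor allele frequency is assumed to be derived from the alternate allele frequency of a biallelic site.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_freq: float      # frequency of the alternate allele (0..1)
    consequence: str     # e.g. "missense_variant"


def minor_allele_frequency(alt_freq):
    """For a biallelic site, the MAF is the smaller of the two allele frequencies."""
    return min(alt_freq, 1.0 - alt_freq)


def filter_variants(variants, chrom, start, end, min_maf=0.0, consequences=None):
    """Keep variants inside chrom:start-end (inclusive) that pass the MAF
    threshold and, if given, match one of the requested consequence types."""
    selected = []
    for v in variants:
        if v.chrom != chrom or not (start <= v.pos <= end):
            continue
        if minor_allele_frequency(v.alt_freq) < min_maf:
            continue
        if consequences is not None and v.consequence not in consequences:
            continue
        selected.append(v)
    return selected


# Made-up records, for illustration only.
variants = [
    Variant("6", 100, "A", "G", 0.02, "intron_variant"),
    Variant("6", 250, "C", "T", 0.40, "missense_variant"),
    Variant("6", 900, "G", "A", 0.90, "missense_variant"),
    Variant("7", 250, "T", "C", 0.30, "missense_variant"),
]

hits = filter_variants(variants, "6", 1, 1000,
                       min_maf=0.05, consequences={"missense_variant"})
for v in hits:
    print(v.chrom, v.pos, v.ref, v.alt, round(minor_allele_frequency(v.alt_freq), 3))
```

In this sketch the variant at position 900 is retained even though its alternate allele is common, because the filter is applied to the minor allele frequency rather than to the alternate allele frequency.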

1. SNPs or indels Search

2. CNVs Search

IV. Signature Search

Users can select a specific gene symbol or genomic region, one of the statistical methods (Pi, Hp, iHS, FST, XP-CLR, XP-EHH), and a specific "core" cattle group to view the selection scores. In our database, the selection scores are pre-processed with transformations (Z-transform, logarithm) that are commonly used in published papers. The results are returned in tabular format. When users click the "show" button in the table, the selective signals are displayed as Manhattan plots or other common graphics, in which the target region or gene is highlighted in red or blue.
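The two transformations named above can be sketched in Python as follows. This is only an illustration: the text does not state which variance estimator or logarithm base the BGVD uses, so the population standard deviation and base 10 are assumptions, and the function names and window scores are hypothetical.

```python
import math
import statistics


def z_transform(scores):
    """Standardise scores to mean 0 and standard deviation 1.
    Assumption: the population standard deviation is used."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("all scores are identical; Z-transform is undefined")
    return [(s - mean) / sd for s in scores]


def log10_transform(values):
    """Base-10 logarithm of strictly positive values.
    Assumption: base 10; other bases are equally plausible."""
    if any(v <= 0 for v in values):
        raise ValueError("logarithm requires strictly positive values")
    return [math.log10(v) for v in values]


# Made-up per-window scores, for illustration only.
window_scores = [0.12, 0.30, 0.25, 0.90, 0.18]
z_scores = z_transform(window_scores)
for raw, z in zip(window_scores, z_scores):
    print(f"{raw:.2f} -> {z:+.3f}")
```

After a Z-transform, windows from different statistics or groups sit on a common scale, which is why outlier thresholds in published scans are often expressed in standard deviations.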

V. BGVD Tools

1. Local UCSC Genome Browser

Users can search with a gene symbol, a transcript name, or a genomic region to view SNPs, indels, CNVs, genomic signatures, QTLs and conserved elements in a global view. Currently, 57 tracks have been released for the Btau 5.0.1 assembly. The "PDF/PS" item under the "View" menu of the navigation bar can be used to generate a high-quality image in PostScript or PDF format.

2. Alignment Search Tools (BLAT/BLAST)

We provide two sequence alignment tools, webBlat and NCBI wwwBLAST. webBlat can be used to quickly search for regions homologous to a DNA or mRNA sequence, which can then be displayed in the browser. BLAST finds regions of local similarity between sequences, which can be used to infer functional and evolutionary relationships between them.

3. Genome Coordinate Conversion Tool (liftOver)

We also provide a genome coordinate conversion tool, liftOver. The liftOver tool translates genomic coordinates from one assembly version to another and can also retrieve putative orthologous regions in other species. Our database provides two liftOver chain files (Btau5.0.1ToUMD3.1.1.chain.gz and Btau5.0.1ToARS-UCD1.2.chain.gz) and offers online conversion from Btau_5.0.1 to UMD_3.1.1 and from Btau_5.0.1 to ARS-UCD1.2.
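To show what a chain file encodes, the Python sketch below maps a single position through one chain record. It is a simplified illustration, not the liftOver program itself: it handles one chain on the "+" strand only, uses 0-based positions, and the chain record, function names and coordinates are made up rather than taken from the BGVD chain files.

```python
def parse_chain(lines):
    """Parse one chain record into ungapped blocks.

    Header fields: chain score tName tSize tStrand tStart tEnd
                   qName qSize qStrand qStart qEnd id
    Each following line is "size dt dq" (aligned block size, then the gap to
    the next block in the source and in the destination assembly); the last
    line holds only "size".
    """
    lines = [ln for ln in lines if ln.strip()]
    header = lines[0].split()
    t_name, t_pos = header[2], int(header[5])
    q_name, q_strand, q_pos = header[7], header[9], int(header[10])
    if q_strand != "+":
        raise ValueError("this sketch handles only '+' strand chains")
    blocks = []
    for line in lines[1:]:
        fields = [int(x) for x in line.split()]
        size = fields[0]
        blocks.append((t_pos, q_pos, size))
        if len(fields) == 3:
            t_pos += size + fields[1]
            q_pos += size + fields[2]
    return t_name, q_name, blocks


def lift_position(pos, blocks):
    """Map a 0-based source position to the destination assembly.
    Returns None when the position falls in a gap or outside the chain."""
    for t_start, q_start, size in blocks:
        if t_start <= pos < t_start + size:
            return q_start + (pos - t_start)
    return None


# Made-up chain record, for illustration only.
example_chain = """\
chain 1000 chr1 1000 + 100 250 chr1 1200 + 300 460 1
50 10 20
90
""".splitlines()

t_name, q_name, blocks = parse_chain(example_chain)
for pos in (120, 155, 160):
    print(t_name, pos, "->", q_name, lift_position(pos, blocks))
```

Positions that fall between aligned blocks have no counterpart in the destination assembly, which is why a coordinate conversion can report some input regions as unmapped.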

Project Organizers

Yu Jiang

Northwest A&F University, Yangling, Shaanxi, China

Email: yu.jiang@nwafu.edu.cn

Chuzhao Lei

Northwest A&F University, Yangling, Shaanxi, China

Email: leichuzhao1118@nwafu.edu.cn