Pan-genome analysis has been applied to a variety of plant species to capture genomic content that is missing from the reference assembly. Such analyses remain rare in mammalian species, partly because multiple de novo assemblies are seldom available. The newly reported goat reference genome (ARS1) is the most contiguous livestock assembly generated so far and therefore provides an excellent reference for building a goat pan-genome. In addition, we collected nine further de novo assemblies from seven Caprini species (sheep, argali sheep, mouflon sheep, wild goat, ibex, Barbary sheep and blue sheep). By iteratively comparing each assembly to ARS1 and validating the results with resequencing data, we identified 38.3 Mb of pan-sequences that are absent from ARS1, accounting for ~57.0% of the total estimated goat pan-genome size. These pan-sequences contain genic regions and show population-specific patterns, suggesting that they may have important biological functions. Furthermore, our results show that the pan-genome can greatly improve the efficacy of variant calling (SNPs and CNVs). In conclusion, by presenting the first goat pan-genome, our study shows that pan-genome analysis should be extended to mammals, especially livestock species, to better explore the underlying genomic variation. We also provide a new strategy for pan-genome construction based on comparing de novo assemblies from closely related species.
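The iterative strategy described above can be sketched in a few lines. Each assembly is compared against ARS1 plus every segment kept so far, and only stretches that do not align are added, so later assemblies do not re-count sequence already recovered. This is a minimal sketch under stated assumptions, not the authors' pipeline. The helper `kmer_hits` is a hypothetical toy exact k-mer matcher standing in for a real whole-genome aligner (for example minimap2 or MUMmer). The minimum segment length `min_len` and k-mer size `k` are arbitrary illustrative values. The resequencing-based validation step is omitted. The function also tracks the cumulative size of added sequence after each assembly.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on a contig


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching half-open intervals, sorted by start."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def unaligned_segments(length: int, hits: List[Interval], min_len: int) -> List[Interval]:
    """Return stretches of a contig not covered by any hit and at least min_len bp long."""
    gaps: List[Interval] = []
    pos = 0
    for start, end in merge_intervals(hits):
        if start - pos >= min_len:
            gaps.append((pos, start))
        pos = max(pos, end)
    if length - pos >= min_len:
        gaps.append((pos, length))
    return gaps


def kmer_hits(query: str, references: List[str], k: int) -> List[Interval]:
    """Hypothetical toy aligner: report every query k-mer window that occurs
    exactly in some reference sequence. A real analysis would use a
    whole-genome aligner instead."""
    index = {ref[i:i + k] for ref in references for i in range(len(ref) - k + 1)}
    return [(i, i + k) for i in range(len(query) - k + 1) if query[i:i + k] in index]


def build_pan_genome(
    reference: List[str],
    assemblies: List[Dict[str, str]],
    k: int = 4,
    min_len: int = 6,
) -> Tuple[List[str], List[Tuple[int, str, int, int]], List[int]]:
    """Iteratively add sequence absent from the growing pan-genome.

    Returns the pan-genome sequences, the (assembly index, contig, start, end)
    of every added segment, and the cumulative bp of added sequence after
    each assembly.
    """
    pan = list(reference)  # reference sequences plus everything added so far
    added: List[Tuple[int, str, int, int]] = []
    cumulative: List[int] = []
    total = 0
    for idx, assembly in enumerate(assemblies):
        new_segments: List[str] = []
        for name, seq in assembly.items():
            hits = kmer_hits(seq, pan, k)
            for start, end in unaligned_segments(len(seq), hits, min_len):
                new_segments.append(seq[start:end])
                added.append((idx, name, start, end))
        # Later assemblies are also compared against these new segments,
        # so the same novel sequence is not counted twice.
        pan.extend(new_segments)
        total += sum(len(s) for s in new_segments)
        cumulative.append(total)
    return pan, added, cumulative


if __name__ == "__main__":
    ref = ["AAAACCCCGGGGTTTT"]
    asms = [
        {"ctg1": "AAAACCCCATATATATATGGGGTTTT"},  # carries a novel 10 bp insert
        {"ctg2": "CCCCATATATATATGGGG"},          # same insert, already recovered
        {"ctg3": "GGGGCACACACACACATTTT"},        # a different 12 bp insert
    ]
    _, _, cumulative = build_pan_genome(ref, asms)
    print(cumulative)  # [10, 10, 22]
```

In this toy run the second assembly adds nothing because its insert was already recovered from the first, which is why the cumulative curve flattens. Comparing each assembly only against the original reference would count that insert twice.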

[Figure: Cumulative size of additional sequences]

[Figure: Alignment hits in pan-sequences of each protein family]

Li R, Fu W, Su R, et al. Towards the Complete Goat Pan-Genome by Recovering Missing Genomic Segments From the Reference Genome[J]. Frontiers in Genetics, 10:1169.