Genotyping by Sequencing for Crop Improvement, by a group of authors
1.5.1 Genotyping‐by‐Sequencing (GBS)
GBS is an NGS‐based reduced representation sequencing technique for identifying genome‐wide SNPs and genotyping large populations (Bhatia et al. 2013). It combines marker discovery and genotyping in a single step. It is a complexity reduction procedure in which a combination of restriction enzymes (REs) is used to separate low‐copy sequences from high‐copy repetitive regions. In general, GBS involves sequencing, on an NGS platform, the fragments generated by restriction digestion of the genome. The DNA of each individual in the population is digested with the RE, and RE‐specific adaptors are then ligated to the fragments; the adaptors carry genotype‐specific barcode sequences and binding sites for PCR and sequencing primers (Figure 1.1). The resulting fragments are PCR amplified, and equal volumes of PCR product from the different individuals are pooled in a single tube. The pooled fragments are size‐selected and sequenced on the NGS platform. The choice of restriction enzymes depends on the complexity and size of the genome.
Several versions of GBS are currently available, including RAD‐seq (restriction site‐associated DNA sequencing), ddRAD‐seq (double‐digest restriction site‐associated DNA sequencing), SLAF‐seq (specific‐locus amplified fragment sequencing), Rest‐seq (restriction fragment sequencing), and Skim GBS (skim‐based GBS) (Bhatia 2020). These versions differ in fragment size selection, the extent of complexity reduction, and genome coverage.
Because GBS is applied to whole populations, low‐depth sequencing is adopted to keep it cost‐effective, which results in a high rate of missing data. Low depth also makes GBS an ineffective genotyping approach in heterozygous populations, because both alleles at a locus must be sampled to call a heterozygote correctly. In addition, genome coverage is low because only a reduced representation of the genome is sequenced.
Figure 1.1 An example of GBS and GBS data analysis workflow for identification of SNP markers.
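The digestion and size‐selection steps described above can be illustrated in silico. The following is a minimal Python sketch, not the protocol used by the authors: the toy genome, the function names `digest` and `size_select`, and the size window are all hypothetical, and ApeKI (recognition site GCWGC, cutting after the first G) is used only as an example of an enzyme common in GBS protocols.

```python
import re


def digest(sequence, site_regex, cut_offset):
    """Cut `sequence` at every occurrence of the recognition site.

    `site_regex` is the recognition site written as a regular expression;
    `cut_offset` is how many bases into the site the enzyme cuts.
    A lookahead is used so that overlapping sites are also found.
    """
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={site_regex})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls inside the size window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Hypothetical toy genome containing three ApeKI sites (GCAGC, GCTGC, GCAGC).
genome = "ATATAT" + "GCAGC" + "T" * 10 + "GCTGC" + "AAA" + "GCAGC" + "CCGG"

# ApeKI: G^CWGC, where W is A or T (assumed here for illustration).
fragments = digest(genome, "GC[AT]GC", cut_offset=1)
selected = size_select(fragments, min_len=8, max_len=10)

print([len(f) for f in fragments])  # fragment lengths after digestion
print(selected)                     # fragments that would be sequenced
```

Only the fragments inside the size window would go on to sequencing, which is where the reduction in genome complexity (and in genome coverage) comes from. Real fragment windows are in the range of hundreds of base pairs; the tiny window here only suits the toy sequence.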
GBS is widely used to capture SNPs and other marker variation by NGS. It has overtaken conventional genotyping procedures based on traditional markers such as RAPD, AFLP, SSR, and many others in terms of the time, labor, and cost involved. For example, GBS can generate data for thousands of markers across a large population in a week, and these data can be analyzed within a month (Bhatia et al. 2018). The approach has been used to map several economically important traits in a number of crop plants (Poland and Rife 2012). Most developing countries have in‐house computational facilities that are used for GBS analysis. A few online servers, such as CyVerse (www.cyverse.org), also offer built‐in pipelines for GBS analysis; however, they are unable to handle large datasets, and the speed of analysis depends on the speed of the internet connection. Aligning the NGS reads and calling SNPs and indels are the two major steps in GBS analysis, for which several pipelines are publicly available, such as Stacks, IGST, GB‐eaSY, TASSEL‐GBS, FAST‐GBS, and UNEAK (Wickland et al. 2017).
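To make the SNP‐calling step and its sensitivity to sequencing depth more concrete, here is a deliberately naive Python sketch. It is not the algorithm used by any of the pipelines named above: the function names, the `min_depth` threshold, and the error‐free, equal‐sampling model are assumptions made for illustration only. Genotypes are written in the VCF style (`0/0`, `0/1`, `1/1`, `./.` for missing).

```python
def call_genotype(ref_reads, alt_reads, min_depth=2):
    """Naive genotype call from allele read counts at one biallelic site.

    Sites covered by fewer than `min_depth` reads are reported as missing,
    which is one reason low-depth GBS data contain many missing calls.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "./."
    if alt_reads == 0:
        return "0/0"
    if ref_reads == 0:
        return "1/1"
    return "0/1"


def prob_het_miscalled(depth):
    """Chance that a true heterozygote shows only one allele in `depth` reads.

    Assumes each read samples either allele with probability 0.5 and that
    there are no sequencing errors, so the result is 2 * 0.5 ** depth.
    """
    return 2 * 0.5 ** depth


print(call_genotype(0, 1))    # below the depth threshold
print(call_genotype(2, 2))    # both alleles observed
print(prob_het_miscalled(2))  # half of heterozygotes look homozygous
print(prob_het_miscalled(8))  # the risk falls quickly as depth increases
```

Under these assumptions, a heterozygous site covered by two reads is called homozygous half of the time, which illustrates why low‐depth GBS is problematic in heterozygous populations.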
Another widely used pipeline for NGS data analysis is dDocent (www.dDocent.com), a simple bash wrapper that performs quality control, assembly, mapping, and SNP calling for almost any kind of RAD sequencing data (Puritz et al. 2014). However, most of these pipelines are difficult to run for a student with little bioinformatics background, and they vary in the genome complexity they can handle and in the computational space they require. In addition, several bioinformatics tools, such as BWA, Bowtie2, SAMtools, GATK, and BCFtools, together with a set of Perl utility scripts (Kagale et al. 2016), can be used for GBS data analysis; using them properly requires knowing how to install and run them. With the advancements in NGS approaches, GBS has become widely used in plant breeding and genetics, particularly for understanding complex quantitative traits.
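As a rough illustration of how the standalone tools can be chained, the Python sketch below assembles one possible sequence of BWA, SAMtools, and BCFtools commands for a single, already demultiplexed sample. This is an assumed minimal workflow, not the one described by Kagale et al. (2016) or by any pipeline cited here; the file names (`ref.fa`, `reads.fq`, `sample`) and the helper `build_commands` are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def build_commands(reference, fastq, sample, threads=4):
    """Return align-sort-call commands for one sample as argument lists."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    bcf = f"{sample}.bcf"
    vcf = f"{sample}.vcf.gz"
    return [
        # Index the reference genome once before alignment.
        ["bwa", "index", reference],
        # Align the reads to the reference and write SAM output.
        ["bwa", "mem", "-t", str(threads), "-o", sam, reference, fastq],
        # Sort alignments by coordinate and index the resulting BAM file.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Compute genotype likelihoods, then call variant sites only.
        ["bcftools", "mpileup", "-f", reference, "-O", "u", "-o", bcf, bam],
        ["bcftools", "call", "-m", "-v", "-O", "z", "-o", vcf, bcf],
    ]


commands = build_commands("ref.fa", "reads.fq", "sample")
for cmd in commands:
    print(shlex.join(cmd))
```

With the tools installed, each argument list could be passed to `subprocess.run(cmd, check=True)`. A real GBS analysis would also need demultiplexing, read trimming, and filtering of the resulting variants, which are omitted here.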
DArT‐seq GBS (https://www.diversityarrays.com/technology-and-resources/dartseq/) overcomes the limitation of missing data points to some extent. The technique is an extension of traditional DArT technology in which DArT representations are sequenced on an NGS platform. Sequencing the fragments dramatically increases the number of genomic fragments analyzed and the number of markers reported, making it more cost‐effective than the original DArT method.