Algorithms in Bioinformatics - Paul A. Gagniuc
2.2 Rules of Engagement
Genome size refers to the amount of DNA contained in a haploid genome (a single set of chromosomes). It is expressed in base pairs (bp) and their multiples: kilo base pairs (1 kbp = 1000 bp), mega base pairs (1 Mbp = 1 000 000 bp), giga base pairs (1 Gbp = 1000 Mbp), and so on. By their nature, base pairs are discrete units. Nonetheless, these units of measurement are also used to express averages. For single-stranded DNA (ssDNA) or RNA sequences, the unit of measurement is the nucleotide (nt), and the same length can be written as 1 000 000 nt, 1000 knt, 1 Mnt, 0.001 Gnt, and so on. More often than not, however, base pairs are written simply as bases when the context is understood (e.g. 1 000 000 b, 1000 kb, 1 Mb, 0.001 Gb, and so on). For instance, the notations “b,” “kb,” “Mb,” and “Gb” are used when referring to DNA/RNA sequences in text format.

FASTA files contain nucleic acid sequences written in the 5′–3′ direction. Technically, all nucleic acids represented in FASTA format are single-stranded; however, through complementarity, the reference can be considered double-stranded. In this chapter, the GC% content is mentioned as an intuitive parameter for the overall composition of the genomes of different species. Note that the (C+G)% or GC% content represents the percentage of guanine and cytosine along a DNA or RNA sequence (e.g. a DNA/RNA fragment, a gene, an entire genome).
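The conventions above can be illustrated with a minimal Python sketch. It is not taken from the book: the function names `format_length`, `gc_percent`, and `reverse_complement` are hypothetical, and the choice to ignore ambiguous symbols (such as N) when computing GC% is an assumption made here for simplicity.

```python
def format_length(bases: int, unit: str = "bp") -> str:
    """Express a sequence length with the k/M/G prefixes (factors of 1000).

    `unit` may be "bp", "nt", or "b", following the notations in the text.
    """
    for factor, prefix in ((10**9, "G"), (10**6, "M"), (10**3, "k")):
        if bases >= factor:
            return f"{bases / factor:g} {prefix}{unit}"
    return f"{bases} {unit}"


def gc_percent(sequence: str) -> float:
    """Return the (C+G)% of a DNA or RNA sequence given as text."""
    seq = sequence.upper()
    # Assumption: only A, C, G, T, and U are counted; other symbols are ignored.
    counted = sum(seq.count(base) for base in "ACGTU")
    if counted == 0:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Translation table for DNA complementarity (A<->T, C<->G).
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the complementary DNA strand, also written 5'-3'."""
    return sequence.translate(_DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(format_length(1_000_000))        # length in base pairs
    print(format_length(1_000_000, "nt"))  # same length in nucleotides
    print(gc_percent("ATGCGC"))            # GC% of a short fragment
    print(reverse_complement("ATGC"))      # the implied second strand
```

The `reverse_complement` helper shows why a single-stranded FASTA record is enough to describe a double-stranded reference: the second strand follows from the first through base pairing. Because a base pairs with its complement, the GC% of both strands is the same.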