Genome Editing in Drug Discovery
3.2 CRISPR Biology in a Nutshell
CRISPR, or Clustered Regularly Interspaced Short Palindromic Repeats, represents an adaptive immune system of microbes, able to adapt in response to invasions by mobile genetic elements such as bacteriophages, plasmids, and transposons. The locus encoding the components of the CRISPR system was discovered as an array of palindromic repeats interrupted by 20–40 nt sequences downstream of the iap gene in Escherichia coli (Ishino et al. 1987; Nakata et al. 1989). Thanks to the increasing availability of sequences from microbial species, the structure of CRISPR loci, along with many properties relating to their function, was uncovered (Mojica et al. 2000; Jansen et al. 2002).
The identifying trait of a CRISPR locus is an array of alternating identical repeats interleaved with unique spacer sequences (Figure 3.1a). The spacers show extreme interspecies diversity in sequence and size, usually in the range of 20–72 bp (Grissa et al. 2007). Bioinformatic analysis of spacer sequences within CRISPR arrays has shown a striking homology to various sequences found in bacteriophages and plasmids (Bolotin et al. 2005; Mojica et al. 2005; Pourcel et al. 2005), which was an early clue that CRISPR systems act as a defense mechanism against mobile genetic elements (Makarova et al. 2006). The collection of spacers within a microbial species is extremely diverse and dynamic (Weinberger et al. 2012), as the microbes can acquire new spacers upon neutralizing an infection (Barrangou et al. 2007; Marraffini and Sontheimer 2008). Spacer acquisition represents the adaptive component of this immune system, as it confers resistance to reinfection by the same mobile genetic element.
In most species, repeat monomers are between 23 and 47 bp long (Godde and Bickerton 2006) and consist of partially palindromic sequences able to form stable secondary structures (Kunin et al. 2007). Related species can have similar repeat sequences, but the overall diversity of both spacers and repeats across bacteria and archaea is great.
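The repeat and spacer layout described above can be illustrated with a minimal Python sketch. It assumes the repeat sequence is already known and that every repeat copy in the array is exactly identical, which real arrays do not always satisfy. The function name `split_crispr_array` and all sequences are made up for illustration and are far shorter than natural repeats and spacers.

```python
def split_crispr_array(array_seq: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    pieces = array_seq.split(repeat)
    # pieces[0] precedes the first repeat (e.g. the leader) and pieces[-1]
    # follows the last repeat; neither is flanked by repeats on both sides.
    return [p for p in pieces[1:-1] if p]


# Hypothetical toy sequences, not taken from any real genome.
TOY_LEADER = "TATAAATA"
TOY_REPEAT = "GGTTCAACC"
TOY_SPACERS = ["ATGCATGCAA", "TTTACGTACG", "CCGATAGATA"]

# Build leader + repeat + spacer + repeat + spacer + ... + repeat.
toy_array = TOY_LEADER + TOY_REPEAT + "".join(s + TOY_REPEAT for s in TOY_SPACERS)

recovered = split_crispr_array(toy_array, TOY_REPEAT)
print(recovered)
```

In this toy case the recovered list equals the spacers used to build the array. A practical tool would also need to tolerate mismatches between repeat copies and to discover the repeat itself.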
Figure 3.1 Phases of CRISPR‐mediated immune response. (a) The simplified architecture of an idealized CRISPR locus, containing a CRISPR array consisting of a promoter‐containing leader sequence and spacer and repeat sequences (depicted on the left) and several different cas genes (depicted in different colors on the right), which generate functional components of the CRISPR‐Cas system. During the infection by a mobile genetic element (depicted as a bacteriophage here), invading DNA can be processed by the RecBCD/AddAB systems so that it can be integrated as a spacer by the Cas1:Cas2 adaptation complex as a part of the naïve adaptation phase (b). In the expression phase (c), the CRISPR array is transcribed into pre‐crRNA, which is further processed to crRNA and paired with Cas proteins to form a functional effector complex. Upon a reinfection with phage carrying the same or similar protospacer sequence, the effector complex recognizes the sequence (d) and induces the degradation of the target. Suitable fragments generated during the interference phase can be used as spacers in the primed adaptation phase (e), updating the CRISPR array with the most recent version of suitable spacers.
Immediately adjacent to the CRISPR array is an AT‐rich element, most frequently between 100 and 400 bp long (Jansen et al. 2002; Alkhnbashi et al. 2016). It was subsequently shown that this sequence, termed the leader sequence, contains the promoter used to direct transcription of the CRISPR array (Figure 3.1a) and determines the transcriptional start site (Pougach et al. 2010; Pul et al. 2010). Transcription of the CRISPR locus and subsequent processing of the transcript ultimately give rise to CRISPR RNA (crRNA), which provides specificity to the CRISPR systems and guides the degradation of the invading nucleic acids (Brouns et al. 2008). Much like the repeats, leaders are up to 80% identical within a genome, but quite dissimilar among species.
CRISPR arrays are often surrounded by a collection of conserved protein‐coding genes, termed CRISPR‐associated or cas genes, which encode the protein components of the CRISPR immune systems. These groups of genes vary dramatically in composition, order, and sequence between species, underpinning the diversity of CRISPR systems and their modes of action. Indeed, the complement of cas genes, together with the structure of the CRISPR locus, is the basis for CRISPR classification (Makarova et al. 2019). Many genes not evolutionarily related to cas genes can also be found within CRISPR loci. These genes often encode proteins providing ancillary functions, or are evolutionary relicts with no role attributed to them yet, and some loci also encode noncoding RNA. In addition, there may be multiple CRISPR loci within a single genome: E. coli has four loci (Touchon and Rocha 2010) and Methanocaldococcus jannaschii has up to 18 (Lillestol et al. 2006), but rarely more than one or two are active simultaneously (Horvath et al. 2008).
All the components of CRISPR loci function together to provide an adaptive immune response. The CRISPR immune response can be divided into three phases (Figure 3.1):
1 Adaptation (spacer acquisition, or immunization). On rare occasions during infection by a bacteriophage or other mobile element, suitable pieces of the invader's genome are captured by the microbe’s adaptation machinery and integrated into the CRISPR array as a spacer (Figure 3.1b). This acquisition of spacers from a previously unencountered invader is known as naïve adaptation. Once acquired, the spacer acts as a heritable record of immunization and will restrict mobile elements that carry the exact or a similar sequence through the next stages of the immune response.
2 Expression (crRNA biogenesis). Once acquired, the spacer can be utilized to fight future infections (Figure 3.1c). The whole CRISPR array is transcribed from the promoter located within the leader sequence, producing a pre‐crRNA transcript. With the help of Cas proteins, other small RNA molecules, or the host’s machinery, pre‐crRNA is processed into mature crRNAs, which are paired with the effector Cas proteins to form a functional effector complex that confers immunity.
3 Interference (immunity) is carried out by the assembled Cas:crRNA effector complex. Invading genomes that carry a sequence complementary (or partially complementary) to one of the spacers are recognized by base pairing with the crRNA and subsequently degraded by the nucleolytic activity of the associated Cas proteins, thus terminating the infection (Figure 3.1d).
In some circumstances, the degraded nucleic acids can be captured by the adaptation complex and integrated as new spacers into the CRISPR array, restarting the process. This primed adaptation (Figure 3.1e), in which an invading genome is neutralized by means of a previously acquired spacer and the Cas interference machinery is actively involved, is several orders of magnitude more efficient than naïve adaptation (Staals et al. 2016; Stringer et al. 2020), and is a striking example of adaptive immunity.
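The three phases can be summarized in a deliberately simplified Python toy model. It is only a sketch under strong assumptions: a single DNA strand, exact matching only (whereas real interference tolerates partial complementarity), no PAM check, and guides reduced to the spacer sequence written as RNA. The class `ToyCrisprArray`, its method names, the choice to place new spacers at the front of the list, and the sequences are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCrisprArray:
    """Toy stand-in for a CRISPR array: just an ordered list of spacers."""
    spacers: list[str] = field(default_factory=list)

    def adapt(self, invader_dna: str, start: int, length: int = 20) -> str:
        """Adaptation: copy a fragment of the invader into the array."""
        spacer = invader_dna[start:start + length]
        if len(spacer) < length:
            raise ValueError("fragment runs past the end of the invader")
        if spacer not in self.spacers:
            # Modelling choice: newest spacer goes to the front of the list.
            self.spacers.insert(0, spacer)
        return spacer

    def express(self) -> list[str]:
        """Expression: one guide per spacer, written as RNA (T -> U)."""
        return [s.replace("T", "U") for s in self.spacers]

    def interfere(self, invader_dna: str) -> bool:
        """Interference: True if any guide matches the invader exactly."""
        return any(g.replace("U", "T") in invader_dna for g in self.express())


# Hypothetical sequences for illustration only.
toy_phage = "ATGGCTAGCTAGGATCCGATCGTACGATCGATTAGCGCGATATCGATCGGCTA"
other_phage = "T" * 50

array = ToyCrisprArray()
before = array.interfere(toy_phage)      # no spacers yet, so no immunity
array.adapt(toy_phage, start=5)          # naive adaptation
after = array.interfere(toy_phage)       # reinfection is now recognized
unrelated = array.interfere(other_phage)
print(before, after, unrelated)
```

Primed adaptation could be modelled by calling `adapt` again on an invader for which `interfere` already returned `True`, but the efficiency difference described in the text is not captured by this sketch.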
One challenge for an immune system relying on nucleic acid base recognition is how to discriminate between the invading genome and endogenous sequences (for example, the spacers in the CRISPR array itself). Nearly all CRISPR systems have a discrimination mechanism in which a short sequence adjacent to the target sequence must be recognized by the effector complex for it to efficiently bind and then degrade the target (Garneau et al. 2010; Sashital et al. 2012; Anders et al. 2014). Similarly, these species‐specific protospacer adjacent motifs (PAMs) are recognized by the adaptation complex and are processed in such a way that they are not integrated into the CRISPR array (Datsenko et al. 2012; Wang et al. 2015; Rollie et al. 2018). The presence of a PAM adjacent to the target sequence (termed the protospacer) and its absence from the CRISPR array ensure correct recognition of invading genomes as nonself and prevent cleavage of the host genome. It is important to note that the exact PAM sequence required for the interference and adaptation stages varies dramatically between species (for example, for Streptococcus pyogenes, the PAM is 5’‐NGG‐3’, while for Staphylococcus aureus, it is 5’‐NNGRRT‐3’, where N denotes any nucleotide and R is A or G) and between different taxa of CRISPR systems (Mojica et al. 2009; Shah et al. 2013). PAM requirements can often be fairly liberal (Leenay et al. 2016), allowing the immune response to remain effective even if mutations arise within the PAM or protospacer.
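The PAM notation above maps directly onto a regular expression, which the following Python sketch uses to list candidate protospacers. Several details are assumptions made for illustration and are not stated in the text: the PAM is taken to lie immediately 3′ of a 20‐nt protospacer, only the given strand is scanned, and input is uppercase A, C, G, and T. The function names are hypothetical, and only the two degenerate codes defined in the text (N and R) are supported.

```python
import re

# Degenerate codes used in the text: N = any nucleotide, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}


def pam_to_regex(pam: str) -> str:
    """Translate a PAM such as 'NGG' into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in pam)


def find_protospacers(dna: str, pam: str, length: int = 20):
    """List (start, protospacer, pam_seq) for each window followed by a PAM.

    A lookahead is used so that overlapping candidates are all reported.
    """
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})({pam_to_regex(pam)}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Hypothetical target: a 20-nt window followed by TGG, then filler.
toy_target = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
ngg_hits = find_protospacers(toy_target, "NGG")
nngrrt_hits = find_protospacers("A" * 20 + "CTGAGT", "NNGRRT")
print(ngg_hits)
print(nngrrt_hits)
```

The same spacer sequence stored in a CRISPR array would not be reported by such a scan unless it happened to be followed by a PAM, which mirrors the self versus nonself discrimination described above.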