Читать книгу Multiblock Data Fusion in Statistics and Machine Learning - Tormod Næs - Страница 29
Example 1.2: Genetics example
ОглавлениеIn cancer research, often cell-lines are used derived from tumour tissue (Iorio et al., 2016). Of these cell-lines many measurements are made available in public databases. Such measurements may consist of measured RNA-levels (ratio-scaled values), but also measurements related to mutations (so-called single nucleotide polymorphisms or SNPs) which are on/off measurements and intrinsic of a binary nature.
One of the possible genetic determinants is the copy number of a gene, see Figure 1.5(a); such a gene may be duplicated. An extra layer of gene-regulation is provided by methylation of certain nucleotides of the genome (see Figure 1.5(b)). If a nucleotide is methylated, then transcription of the corresponding gene cannot occur; this area of genetics is called epi-genetics. There are different ways of expressing methylation, but the most simple one is a yes or no whether or not a specific site is methylated. At a certain position on the genome, one nucleotide may have been changed (see Figure 1.5(c)). This is obviously binary since there may be a SNP or no SNP at a certain position on the genome. Hence, treating such data in a multiblock fusion setting requires specialised methods, see Chapter 5.