5.3 Variational Autoencoder

Many variants of the autoencoder have been developed in recent years, but the variational autoencoder (VAE) is the one that achieved a major improvement in this field. The VAE is a framework that attempts to describe an observation in latent space in a probabilistic manner: instead of using a single value to describe each dimension of the latent space, the encoder part of the VAE uses a probability distribution to describe each latent dimension [17].
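As a concrete illustration, when the approximate posterior is chosen to be a diagonal Gaussian, the encoder outputs a mean and a log-variance for each latent dimension rather than a point estimate. The following is a minimal sketch in PyTorch; the fully connected architecture, layer sizes, and variable names are illustrative assumptions, not a specification from the text.

```python
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """Maps an input x to the parameters (mu, log_var) of a diagonal
    Gaussian q_phi(z | x), i.e., a distribution per latent dimension."""

    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)       # mean of each latent dimension
        self.log_var = nn.Linear(hidden_dim, latent_dim)  # log-variance of each latent dimension

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return self.mu(h), self.log_var(h)
```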

Figure 6 shows the structure of the VAE. The assumption is that each input data point $x$ is generated by some random process conditioned on an unobserved random latent variable $z$. The random process consists of two steps: first, $z$ is generated from some prior distribution $p_\theta(z)$, and then $x$ is generated from a conditional distribution $p_\theta(x \mid z)$. The probabilistic decoder part of the VAE performs this random generation process. We are interested in the posterior over the latent variable, $p_\theta(z \mid x) = p_\theta(x \mid z)\, p_\theta(z) / p_\theta(x)$, but it is intractable since the marginal likelihood $p_\theta(x)$ is intractable. To approximate the true posterior, the posterior distribution over the latent variable is approximated by a distribution $q_\phi(z \mid x)$ parameterized by $\phi$.
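The two-step generative process maps directly onto the probabilistic decoder. The sketch below assumes a standard normal prior $p_\theta(z) = \mathcal{N}(0, I)$ and a Bernoulli decoder for binary data; both are common choices from [17] but are assumptions here, as are the layer sizes.

```python
import torch
import torch.nn as nn

class BernoulliDecoder(nn.Module):
    """Maps a latent code z to the parameters of p_theta(x | z)."""

    def __init__(self, latent_dim=20, hidden_dim=400, output_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, output_dim), nn.Sigmoid(),  # per-pixel probabilities
        )

    def forward(self, z):
        return self.net(z)

decoder = BernoulliDecoder()
z = torch.randn(16, 20)      # step 1: z ~ p(z) = N(0, I)
x_prob = decoder(z)          # step 2: parameters of p_theta(x | z)
x = torch.bernoulli(x_prob)  # a generated observation
```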

Given an observed dataset $X = \{x^{(1)}, \ldots, x^{(N)}\}$, the marginal log-likelihood is composed of a sum over the marginal log-likelihoods of all individual data points, $\log p_\theta(x^{(1)}, \ldots, x^{(N)}) = \sum_{i=1}^{N} \log p_\theta(x^{(i)})$, where each marginal log-likelihood can be written as

$$\log p_\theta\big(x^{(i)}\big) = D_{\mathrm{KL}}\Big(q_\phi\big(z \mid x^{(i)}\big) \,\Big\|\, p_\theta\big(z \mid x^{(i)}\big)\Big) + \mathcal{L}\big(\theta, \phi; x^{(i)}\big) \tag{4}$$

where the first term is the KL divergence [18] between the approximate and the true posterior, and the second term $\mathcal{L}(\theta, \phi; x^{(i)})$ is called the variational lower bound. Since the KL divergence is nonnegative, $\mathcal{L}(\theta, \phi; x^{(i)})$ indeed bounds the marginal log-likelihood from below, and it can be written as

$$\mathcal{L}\big(\theta, \phi; x^{(i)}\big) = \mathbb{E}_{q_\phi(z \mid x^{(i)})}\Big[\log p_\theta\big(x^{(i)} \mid z\big)\Big] - D_{\mathrm{KL}}\Big(q_\phi\big(z \mid x^{(i)}\big) \,\Big\|\, p_\theta(z)\Big) \tag{5}$$


Figure 6 Architecture of the variational autoencoder (VAE).

Therefore, the loss function for training a VAE can be written as the negative of the variational lower bound:

$$\ell\big(\theta, \phi; x^{(i)}\big) = -\mathbb{E}_{q_\phi(z \mid x^{(i)})}\Big[\log p_\theta\big(x^{(i)} \mid z\big)\Big] + D_{\mathrm{KL}}\Big(q_\phi\big(z \mid x^{(i)}\big) \,\Big\|\, p_\theta(z)\Big) \tag{6}$$

where the first term captures the reconstruction loss, and the second term is a regularization on the embedding. To optimize the loss function (6), a reparameterization trick is used. For a chosen approximate posterior $q_\phi(z \mid x)$, a sample of the latent variable $\tilde{z}$ is expressed as

$$\tilde{z} = g_\phi(\epsilon, x), \qquad \epsilon \sim p(\epsilon) \tag{7}$$

where $\epsilon$ is an auxiliary variable with independent marginal $p(\epsilon)$, and $g_\phi(\cdot)$ is some vector-valued function parameterized by $\phi$. With this reparameterization trick, the expectation in the variational lower bound can be estimated by sampling a batch of $\epsilon^{(i,l)}$ from $p(\epsilon)$:

$$\tilde{\mathcal{L}}\big(\theta, \phi; x^{(i)}\big) = \frac{1}{L} \sum_{l=1}^{L} \log p_\theta\big(x^{(i)} \mid z^{(i,l)}\big) - D_{\mathrm{KL}}\Big(q_\phi\big(z \mid x^{(i)}\big) \,\Big\|\, p_\theta(z)\Big) \tag{8}$$

where $z^{(i,l)} = g_\phi\big(\epsilon^{(i,l)}, x^{(i)}\big)$ and $\epsilon^{(i,l)} \sim p(\epsilon)$. The selections of $q_\phi(z \mid x)$ and $g_\phi(\cdot)$ are discussed in detail in Kingma and Welling [17].
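Equations (6)–(8) translate almost line for line into code. The sketch below is a minimal version assuming the common Gaussian case from [17]: $q_\phi(z \mid x) = \mathcal{N}(\mu, \sigma^2 I)$, $p(\epsilon) = \mathcal{N}(0, I)$, and $g_\phi(\epsilon, x) = \mu + \sigma \odot \epsilon$, so the KL term against a standard normal prior has a closed form; the Bernoulli reconstruction term and $L = 1$ sample per data point are also assumptions.

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, log_var):
    """Eq. (7) in the Gaussian case: z = g_phi(eps, x) = mu + sigma * eps
    with eps ~ N(0, I), so gradients can flow through mu and log_var."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var):
    """Negative estimated lower bound, Eqs. (6) and (8), with L = 1.
    The KL term D_KL(N(mu, sigma^2) || N(0, I)) is closed-form:
    -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)."""
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")  # -log p_theta(x | z)
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```

With the encoder and decoder sketched earlier, one training step would compute `mu, log_var = encoder(x)`, then `z = reparameterize(mu, log_var)` and `x_recon = decoder(z)`, and finally backpropagate `vae_loss(x, x_recon, mu, log_var)`.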
