
Chapter 2 Tomographic reconstruction based on a learned dictionary

2.1 Prior information guided reconstruction


In the first chapter, we discussed the acquisition and representation of prior knowledge about images from two perspectives: natural image statistics and neurophysiological HVS functions. In this chapter, we discuss computational methods that solve the inverse problem iteratively, aided by prior information.

Previously, we presented the approach for solving inverse problems in the Bayesian inference framework, which is equivalent to minimizing an objective function via Lagrangian optimization:

$$\hat{x} = \arg\min_{x} \big\{ \phi(x, y) + \lambda \psi(x) \big\}, \qquad (2.1)$$

where ϕ(x,y) is the data fidelity term that encourages an estimated CT image x to be consistent with the observed projection data y; ϕ(x,y) is derived from the logarithm of the data likelihood and thus reflects the noise statistics. As a result, the fidelity term can be characterized by a norm that models the type of statistical noise. For instance, if the noise obeys a Gaussian distribution, we use the L2-norm, in the form $\frac{1}{2}\|Ax-y\|_2^2$. In the case of Poisson noise, an informational measure is proper, in the form $\int (Ax - y \ln Ax)$. If the inverse problem is subject to impulsive noise, such as salt-and-pepper noise, the L1-norm can be used, expressed as $\|Ax-y\|_1$. On the other hand, the regularization term ψ(x) promotes solutions with desirable properties. As explained in the first chapter, this term reflects the characteristics of natural images, which can be obtained by removing the redundancy of images. As mentioned before, by using either principal component analysis (PCA) whitening basis functions or zero-phase component analysis (ZCA) whitening basis functions prior to independent component analysis (ICA), the first- and second-order redundancies of images can be removed so that the whitened features are uncorrelated. Moreover, a number of excellent basis functions can be learned using a sparse coding technique to capture statistically independent features. Going a step further in this direction, a multi-layer neural network can extract and represent structural information or semantics, which can improve the effectiveness and efficiency of the solution to inverse problems. This neural network perspective is the focus of this book.
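To make these choices concrete, the sketch below writes the three fidelity terms as plain Python functions. It is an illustration rather than the book's implementation: the system matrix A, image estimate x, and measured data y are assumed to be NumPy arrays of compatible shapes.

```python
import numpy as np

def fidelity_gaussian(A, x, y):
    """L2 fidelity 0.5 * ||Ax - y||_2^2, matched to Gaussian noise."""
    r = A @ x - y
    return 0.5 * np.dot(r, r)

def fidelity_poisson(A, x, y, eps=1e-12):
    """Poisson negative log-likelihood (up to a constant independent of x):
    sum_i [(Ax)_i - y_i * ln(Ax)_i]; eps guards against log(0)."""
    Ax = A @ x
    return np.sum(Ax - y * np.log(Ax + eps))

def fidelity_l1(A, x, y):
    """L1 fidelity ||Ax - y||_1, robust to impulsive (salt-and-pepper) noise."""
    return np.sum(np.abs(A @ x - y))
```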

It should be underlined that Bayesian inference is a classic approach to solving inverse problems. In the Bayesian framework, an image prior is introduced to constrain the solution space so as to suppress measurement noise and image artifacts. This strategy generally requires an iterative algorithm. Indeed, although there are many optimization methods for minimizing a regularized objective function, the optimal result is almost always obtained iteratively as a balance between the data fidelity term and the regularization term. In other words, the fidelity term is not necessarily driven to zero, because the observation y contains both the ideal signal we want to recover and the noise/error that cannot be avoided in practice. A regularizer, encoding prior knowledge, can then be used to guide the search for an optimal solution x̂ as a trade-off between the imperfect measurement and desirable image properties.

In the following, let us intuitively explain regularized image reconstruction. Without loss of generality, let us consider the objective function with the fidelity term in the L2-norm:

$$\hat{x} = \arg\min_{x} \|Ax - y\|_2^2 + \lambda \psi(x). \qquad (2.2)$$

It is assumed that the observation y is related to the unknowns of interest x through a degradation operator A and, at the same time, y is corrupted by non-ideal factors in the data acquisition process, such as Gaussian noise, and is modeled as

$$y = Ax + n, \qquad y, x, n \in \mathbb{R}^N. \qquad (2.3)$$
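As a quick illustration, this degradation model can be simulated in a few lines; the problem size, noise level, and the random matrix standing in for the operator A below are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                                   # number of unknowns (illustrative)
A = rng.standard_normal((N, N))          # random stand-in for the operator A
x_true = rng.standard_normal(N)          # "true" image, flattened to a vector
sigma = 0.1                              # noise standard deviation
n = sigma * rng.standard_normal(N)       # zero-mean Gaussian noise
y = A @ x_true + n                       # observation per equation (2.3)
```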

Indeed, this imaging model is totally different from conventional image processing tasks, which map images to images or images to features. The quintessence of image processing is to discriminate the signal from its noise- or error-contaminated background. This is often not an easy task in the spatial domain, because an image appearing as a collection of pixel values does not present the signal and the noise/error separately. Fortunately, we can transform the image into a feature space in which the image signal and the noise/measurement error can be discriminated much more easily.

The workflow involved consists of three steps. First, an original image is transformed from the spatial domain into a feature space in which a specific aspect of the physical properties of the image is well presented. It should be noted that the transform is invertible and conservative, meaning that the transformational loss is zero; the Fourier transform and the wavelet transform are good examples. With such a lossless transform, structures, errors, and noise in the image are all preserved but exist in a different form. In the second step, according to statistical rules in the transformed feature space, noise and errors are suppressed by modifying the features so that they satisfy the underlying statistical laws, for example by soft thresholding in the wavelet domain or frequency filtering in the Fourier domain. The former refines the coefficients using the prior information that structural components have a sparse distribution of significant coefficients in the wavelet domain, while image noise has a broad and weak spectrum of amplitudes there. The latter is based on the fact that the frequency components of the image concentrate in a low-frequency band while the noise spreads over the whole Fourier space. Finally, in the third step, the output is obtained by applying the corresponding inverse transform to the modified features.
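A minimal sketch of this three-step workflow, using wavelet-domain soft thresholding, could look as follows. It assumes the PyWavelets package as one possible transform implementation, a 2D float image, and a single global threshold; these choices are illustrative, not the book's prescription.

```python
import pywt  # PyWavelets, one possible wavelet transform implementation

def wavelet_denoise(image, wavelet="db4", level=3, threshold=0.1):
    # Step 1: transform the image into the wavelet feature space.
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    # Step 2: soft-threshold the detail coefficients; significant structural
    # coefficients survive, while small, broadly spread noise coefficients
    # shrink to zero. The coarse approximation band is kept unchanged.
    denoised = [coeffs[0]]
    for cH, cV, cD in coeffs[1:]:
        denoised.append(tuple(pywt.threshold(c, threshold, mode="soft")
                              for c in (cH, cV, cD)))
    # Step 3: invert the transform to return to the spatial domain.
    return pywt.waverec2(denoised, wavelet)
```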

Different from the transform method, the key ingredient of the regularization method is to estimate the underlying image by leveraging prior knowledge of desirable features/structures of images while eliminating noise and error. The use of a regularizer constrains the image model. In this way, it is convenient and effective to promote favorable image properties, so that the learned model represents the image characteristics we want, such as sparseness, low rank, smoothness, and so on. Mathematically, the regularizer can be expressed as a norm that measures the image x in a way that is optimal for the inverse problem of interest.
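For example, two regularizers of this kind can be written down directly. The sketch below is an illustration, not the book's specific choice: an anisotropic total variation term promoting piecewise-smooth images with sharp edges, and an L1 term on transform-domain coefficients promoting sparsity.

```python
import numpy as np

def tv_regularizer(x):
    """Anisotropic total variation: L1 norm of finite-difference gradients
    of a 2D image array x."""
    dx = np.abs(np.diff(x, axis=0)).sum()
    dy = np.abs(np.diff(x, axis=1)).sum()
    return dx + dy

def sparsity_regularizer(coeffs):
    """L1 norm of transform-domain coefficients (e.g. wavelet or learned
    dictionary coefficients), promoting a sparse representation."""
    return np.sum(np.abs(coeffs))
```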

It is worth mentioning that both of the above strategies utilize natural image statistics, but in different ways. To be specific, the Bayesian framework uses prior knowledge as a constraint in the objective function, while the transform approach uses the statistical distributions of signals and noise/errors in the feature space, which can also be regarded as prior knowledge.

We have discussed the regularization term in the previous chapter. In the following, we will give an intuitive example to show the impact of the regularizer on the efficiency of the solution to the inverse problem. More specifically, the L1-norm-based regularized solution will be compared with that based on the L2-norm.

We will focus on the regularization issue in the context of machine learning-based inverse problem solutions in this book. Mathematically, the L2-norm-based term is called Tikhonov regularization, or ridge regression, and is a commonly used method. The L1-norm-based term is called lasso regression, which is essentially a sparsity constraint. In the previous chapter, we showed that the sparsity constraint favors an efficient information representation. Next, let us elaborate on the effects of the two regularization terms, respectively.

Based on equations (2.2) and (2.3), we want to estimate the original signal, i.e. the image x, from the observation y. When n is Gaussian white noise with zero mean and variance σ², the noise has a constant power spectral density in the frequency domain. Equation (2.3) can then be rewritten as the constraint $\|Ax - y\|_2^2 = \sigma^2$. The expected solution space of x lies on the surface of the corresponding hyper-ball, as shown in figure 2.1.


Figure 2.1. Under the data fidelity constraint, the minimized L2 distance corresponds to a non-sparse solution, and the minimized L1 distance most likely gives a sparse solution. Adapted with permission from Hastie et al (2009). Copyright 2009 Springer.

How do we find the optimal solution? It is well known that the L1-norm is defined as the sum of the absolute component values of a vector, while the L2-norm is the square root of the sum of the squared component values. Therefore, the geometric meanings of the L1- and L2-norm-based regularizations are quite different, with the L1-based regularization clearly favoring a sparse solution, as shown in figure 2.1.

Let us consider the optimization problem as expanding the region defined by the regularizer until it touches the manifold defined by $\|Ax - y\|_2^2 = \sigma^2$, and then returning the touching point or singling out any point in the intersection. When the touching point is not the optimal solution, the result can be an over-smoothed or noisy version of the image, as depicted in figure 2.1. Since the data fidelity term is quadratic, it is very likely that the touching point in the L1 case is at a corner of the L1 ball defined by $\|x\|_1$ (a tilted square in two dimensions, and its higher-dimensional analogs). On the other hand, in the L2 case, the circle, ball, or hyper-ball defined by $\|x\|_2$ has no corners at all, so the probability that the touching point lands at a sparse position is very slim. Intuitively, as the L1 ball expands, the only way to reach a non-sparse solution is for its boundary to meet the solution manifold on a flat 'side' of the tilted square/cube rather than at a corner. But this only happens if the solution manifold is parallel to that side, which could occur, such as in an under-determined case, but with very low probability. Thus, the L1 solution will almost certainly be sparse, which explains why L1 regularization tends to give a sparse solution. Accordingly, L1-norm minimization is widely performed to enforce the sparsity constraint.
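This geometric intuition is easy to confirm numerically. The following sketch uses arbitrary illustrative sizes and scikit-learn's Lasso and Ridge estimators as stand-ins for L1- and L2-regularized least squares on an under-determined system, then counts the nonzero coefficients of each solution.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))       # under-determined system (50 x 200)
x_true = np.zeros(200)
x_true[rng.choice(200, size=10, replace=False)] = rng.standard_normal(10)
y = A @ x_true + 0.01 * rng.standard_normal(50)

lasso = Lasso(alpha=0.05).fit(A, y)      # L1-regularized least squares
ridge = Ridge(alpha=0.05).fit(A, y)      # L2-regularized least squares

print("nonzero coefficients, L1:", np.sum(np.abs(lasso.coef_) > 1e-6))
print("nonzero coefficients, L2:", np.sum(np.abs(ridge.coef_) > 1e-6))
# Typically the L1 solution has only a few dozen nonzero coefficients,
# close to the true sparsity, whereas essentially all 200 coefficients
# of the L2 solution are nonzero.
```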

L1-norm-based regularization has shown high utility in solving inverse problems. Theoretically, the convergence of the resulting iterative algorithms is also important. An iterative solution to equation (2.2) with total variation minimization, which is measured with an L1-norm of the image gradient, has been proved to be convergent (Chambolle 2004). Nevertheless, the sparsest solution would ideally be obtained with L0-norm minimization, which counts the number of nonzero components of the solution vector and leads to the highest representation efficiency. Unfortunately, L0-norm minimization is NP-hard and computationally impractical. In 2006, it was demonstrated that the L1-norm prior is equivalent to the L0 semi-norm prior in signal recovery for some important linear systems (Candès 2006). Thus, L1-norm-based sparse regularization is popular for solving the optimization problem. Moreover, because they are capable of efficiently extracting structural information from images, various machine learning-based techniques have been applied in this optimization framework and exhibit excellent performance, superior to conventional sparsity operators such as total variation minimization (Chambolle and Lions 1997).
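As a concrete example of such an iterative L1-regularized solver, the sketch below implements the classic iterative soft-thresholding algorithm (ISTA) for $\min_x \frac{1}{2}\|Ax-y\|_2^2 + \lambda\|x\|_1$. It is given as one standard approach to this type of problem, not as the specific algorithm analyzed in the works cited above.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=200):
    # Step size 1/L, where L is the largest eigenvalue of A^T A, i.e. an
    # upper bound on the Lipschitz constant of the fidelity gradient.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)              # gradient of 0.5 * ||Ax - y||_2^2
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

In practice, accelerated variants of this scheme converge much faster, but the basic update already illustrates how the soft-thresholding step enforces sparsity at every iteration.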

