
2.2.2 The K-SVD algorithm


As discussed before, OMP works well for a fixed dictionary. On the other hand, it should work even better if we optimize the dictionary to fit the data, which brings us to the second problem, i.e. how to train a dictionary from data. K-SVD, first introduced by Aharon et al (2006), is a classical dictionary learning algorithm for designing an over-complete dictionary for a sparse representation via singular value decomposition. The core of K-SVD can be regarded as a direct generalization of K-means clustering, in which each input signal is allowed to be represented by a linear combination of dictionary atoms rather than by a single cluster center.

Theoretically, an over-complete dictionary $D \in \mathbb{R}^{M \times K}$ can be directly chosen as a pre-specified transform matrix, such as over-complete wavelets, curvelets, steerable wavelet filters, short-time Fourier transforms, and others. However, dictionary learning driven by training data is a newer route, and it has attracted major attention recently due to its excellent performance in terms of both the effectiveness and efficiency of the resultant sparse representation. Specifically, given the training signals $Y \in \mathbb{R}^{M \times N}$ that contain $N$ samples, the goal is to find a dictionary $D \in \mathbb{R}^{M \times K}$ that can be used for the sparse representation of $Y$ with a coefficient matrix $\varphi \in \mathbb{R}^{K \times N}$. Note that, unlike equation (2.4) in which $y$ and $\alpha$ are column vectors, $Y$ and $\varphi$ are both matrices, representing a number of vectorized samples stacked together to train the dictionary. The problem is expressed as

$$\min_{D,\varphi}\ \|Y - D\varphi\|_F^2 \quad \text{s.t.} \quad \|\varphi_i\|_0 \leqslant \varepsilon \quad \forall i. \tag{2.9}$$

This is also a nonconvex problem over the dictionary $D$ and the sparse coefficients $\varphi$. Here the metric on the representation residual, $\|\cdot\|_F^2$, is the squared Frobenius norm, which is the sum of the squares of all the elements of a matrix. Moreover, finding a sparse code and constructing a dictionary for a sparse representation have to be performed in a coordinated fashion. Next, let us introduce K-SVD as a typical example. The objective function of K-SVD is formulated as

$$\min_{D,\varphi}\ \|Y - D\varphi\|_F^2 \quad \text{s.t.} \quad \forall i,\ \|\varphi_i\|_0 \leqslant T_0. \tag{2.10}$$
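For concreteness, the squared Frobenius norm appearing in equations (2.9) and (2.10) can be checked numerically; the short numpy snippet below is our own illustration, not part of the original algorithm:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
# ||A||_F^2 is the sum of the squares of all entries: 1 + 4 + 9 + 16 = 30.
assert np.isclose(np.linalg.norm(A, 'fro')**2, (A**2).sum())
```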

The K-SVD algorithm consists of two steps, namely, sparse coding and dictionary updating.

Step 1: The sparse coding stage.

Before the sparse coding procedure, an initial dictionary is generated by a traditional data sparsifier, such as an over-complete discrete cosine transform (DCT) dictionary. Then, we seek the sparse coding matrix $\varphi$ for the current dictionary. Figure 2.3 depicts the sparse coding stage, in which the red rectangles in the $\varphi \in \mathbb{R}^{K \times N}$ matrix indicate the coefficients of the selected atoms. Usually, the OMP algorithm is used to compute the coefficients, as it can supply a solution with a predetermined number of nonzero entries $T_0$. The OMP method was discussed in the previous section; here we focus on the next stage.


Figure 2.3. The sparse coding stage of the dictionary learning method.
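To make the sparse coding stage concrete, here is a minimal numpy sketch (our own illustration; `omp` and `sparse_coding_stage` are hypothetical names, and the atoms of $D$ are assumed to be $\ell_2$-normalized):

```python
import numpy as np

def omp(D, y, T0):
    """Orthogonal matching pursuit: code y over dictionary D with at most
    T0 nonzero coefficients (minimal sketch; see the previous section).
    Assumes the columns (atoms) of D are l2-normalized."""
    residual = y.astype(float).copy()
    support, coef = [], np.empty(0)
    for _ in range(T0):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:          # no new atom improves the fit; stop early
            break
        support.append(k)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    phi = np.zeros(D.shape[1])
    phi[support] = coef
    return phi

def sparse_coding_stage(D, Y, T0):
    """Code every training signal (column of Y) over the fixed dictionary D,
    producing the K x N coefficient matrix phi."""
    return np.column_stack([omp(D, Y[:, i], T0) for i in range(Y.shape[1])])
```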

Step 2: The dictionary updating stage.

In this stage, the $K$ atoms in the dictionary are updated with $\varphi$ fixed. In practice, the atoms are updated one by one; that is, in each step only one atom is updated while the others are kept fixed. As seen in figure 2.4(1), consider updating the $k$th column of $D$, denoted $d_k$, $k \in \{1, 2, \ldots, K\}$, together with the coefficients that correspond to it, i.e. the $k$th row of $\varphi$, denoted $\varphi_T^k$ (note that this is not the vector $\varphi_k$, which is the $k$th column of $\varphi$), while fixing the rest of the atoms. The objective function is then rewritten as

$$\|Y - D\varphi\|_F^2 = \left\| Y - \sum_{j=1}^{K} d_j \varphi_T^j \right\|_F^2 = \left\| \left( Y - \sum_{j \neq k} d_j \varphi_T^j \right) - d_k \varphi_T^k \right\|_F^2 = \left\| E_k - d_k \varphi_T^k \right\|_F^2. \tag{2.11}$$


Figure 2.4. Dictionary update stage. (1) Independent update of each column of the dictionary and (2) shrink operation.

In this equation, $D = [d_1, d_2, \ldots, d_K] \in \mathbb{R}^{M \times K}$ contains the $K$ atoms, and $E_k$ stands for the approximation error when the atom $d_k$ is removed from the dictionary. With this variable separation, $D\varphi$ is decomposed into a sum of $K$ rank-1 matrices, of which only the $k$th term remains in question for updating.
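The rank-1 decomposition underlying equation (2.11) is easy to verify numerically; the snippet below (our own check, with arbitrary illustrative sizes) confirms that $D\varphi$ equals the sum of the $K$ outer products $d_j \varphi_T^j$:

```python
import numpy as np

M, K, N = 8, 12, 20                      # arbitrary illustrative sizes
rng = np.random.default_rng(0)
D = rng.standard_normal((M, K))
phi = rng.standard_normal((K, N))
# D @ phi equals the sum of K rank-1 matrices, one outer product per atom.
rank1_sum = sum(np.outer(D[:, j], phi[j, :]) for j in range(K))
assert np.allclose(D @ phi, rank1_sum)
```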

Based on the refined objective function in equation (2.11), whose goal is to find the $d_k$ and coefficient row $\varphi_T^k$ that best approximate $E_k$, the SVD is a straightforward tool for updating a single column of $D$ together with $\varphi_T^k$. The SVD is a factorization of the form $E_k = U \Delta V^\top$, where $U$ and $V$ are orthogonal matrices and $\Delta$ is a diagonal matrix. The key point of the SVD here is that the leading singular components closely approximate the matrix; in particular, the first singular triplet gives the best rank-1 approximation in the Frobenius norm. Accordingly, $d_k$ is taken as the first column of $U$, while $\varphi_T^k$ is taken as the first column of $V$ multiplied by the first element of the diagonal matrix $\Delta$, denoted $\Delta(1,1)$.

However, directly decomposing $E_k$ by the SVD to update $d_k$ and the corresponding $\varphi_T^k$ cannot ensure the sparsity of the updated $\varphi_T^k$, because the SVD contains no sparsity-controlling regularizer (Sahoo and Makur 2013); in other words, the positions and values of the zero elements in the updated $\varphi_T^k$ may change. To address this loss of sparsity, K-SVD keeps only those columns of $E_k$ for which the corresponding entries of $\varphi_T^k$ are nonzero, obtaining a restricted matrix denoted $\bar{E}_k$. A simple way to obtain it is the shrink operation $\bar{E}_k = E_k \Omega_k$, illustrated in figure 2.4(2), where $\Omega_k$ is a selection matrix that restricts $E_k$ by discarding the columns corresponding to zero entries. Similarly, we obtain $\bar{\varphi}_T^k = \varphi_T^k \Omega_k$. Hence, rewriting the objective function in equation (2.11), we have the desired $\hat{d}_k$ and the corresponding $\hat{\varphi}_T^k$:

$$\hat{d}_k,\ \hat{\varphi}_T^k = \mathop{\arg\min}_{d_k,\,\varphi_T^k} \left\| E_k \Omega_k - d_k \varphi_T^k \Omega_k \right\|_F^2 = \mathop{\arg\min}_{d_k,\,\bar{\varphi}_T^k} \left\| \bar{E}_k - d_k \bar{\varphi}_T^k \right\|_F^2. \tag{2.12}$$

To solve this, the $\bar{E}_k$ term is approximated by a rank-1 matrix, for which the SVD again gives the optimal $d_k$ and $\bar{\varphi}_T^k$. In detail, $\bar{E}_k$ is decomposed as $U \Delta V^\top$; $d_k$ is set to the first column of $U$, and $\bar{\varphi}_T^k$ to the first column of $V$ multiplied by $\Delta(1,1)$. Updating the whole dictionary in this way, the process solves for $\varphi$ and $D$ iteratively. Once all $K$ atoms of $D$ have been updated column by column to obtain the desired dictionary, we fix the dictionary and return to the sparse coding stage, repeating until the stopping condition is reached. Note that the K-SVD algorithm is flexible: other methods can be used in the sparse coding stage. The workflow of the K-SVD algorithm is summarized in algorithm 2.1.
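The following numpy sketch implements one dictionary-updating sweep along the lines of equations (2.11) and (2.12); it is a minimal illustration under our own naming (`update_dictionary` is a hypothetical helper, and `phi` is modified in place):

```python
import numpy as np

def update_dictionary(D, phi, Y):
    """One K-SVD dictionary-updating sweep: atoms are updated one at a
    time with the others fixed, following equations (2.11)-(2.12)."""
    for k in range(D.shape[1]):
        # Shrink operation Omega_k: indices of the signals that use atom k,
        # i.e. where the k-th row of phi is nonzero.
        omega = np.flatnonzero(phi[k, :])
        if omega.size == 0:
            continue                     # unused atom; leave it as-is
        # Restricted error matrix \bar{E}_k: the residual without atom k,
        # kept only on the columns selected by Omega_k.
        E_k = (Y[:, omega] - D @ phi[:, omega]
               + np.outer(D[:, k], phi[k, omega]))
        # The best rank-1 approximation of \bar{E}_k comes from its SVD.
        U, S, Vt = np.linalg.svd(E_k, full_matrices=False)
        D[:, k] = U[:, 0]                # new atom: first left singular vector
        phi[k, omega] = S[0] * Vt[0, :]  # new coefficients on the support
    return D, phi
```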

Algorithm 2.1 The K-SVD algorithm for dictionary learning. Reprinted with permission from Aharon et al (2006). Copyright 2019 IEEE.

Input: A set of training data $Y = \{y_i\}_{i=1}^{N} \in \mathbb{R}^{M \times N}$
Output: An over-complete dictionary $D \in \mathbb{R}^{M \times K}$ and a sparse coding matrix $\varphi \in \mathbb{R}^{K \times N}$
1: Initialize the dictionary $D$ with $K$ randomly selected training signals
2: while not converged do
3: Sparse coding:
4: for each training signal $y_i \in Y$, use OMP to compute the representation vector $\varphi_i$, $i = 1, 2, \ldots, N$: do
5: $\min_{\varphi_i} \|y_i - D\varphi_i\|_2^2$ s.t. $\|\varphi_i\|_0 \leqslant T_0$
6: end for
7: Dictionary updating:
8: for $k = 1, 2, \ldots, K$, update the $k$th atom $d_k$ of $D$ and the $k$th row of the coding coefficients $\varphi_T^k$: do
9: Compute the representation error matrix: $E_k = Y - \sum_{j \neq k} d_j \varphi_T^j$
10: Obtain $\bar{E}_k$ and $\bar{\varphi}_T^k$ by discarding zero entries with the shrink operations $\bar{E}_k = E_k \Omega_k$ and $\bar{\varphi}_T^k = \varphi_T^k \Omega_k$
11: Apply the SVD $\bar{E}_k = U \Delta V^\top$. Update the atom $d_k$ with the first column of $U$; the corresponding coefficient vector $\bar{\varphi}_T^k$ is updated with the first column of $V$ multiplied by $\Delta(1,1)$
12: end for
13: end while
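Putting the two stages together, a compact driver in the spirit of algorithm 2.1 might look as follows (a sketch only: `ksvd`, the fixed iteration budget, and the initialization details are our own choices, and it reuses the `sparse_coding_stage` and `update_dictionary` sketches above):

```python
import numpy as np

def ksvd(Y, K, T0, n_iter=20, seed=0):
    """Learn an M x K over-complete dictionary D and K x N sparse codes phi
    from training data Y (M x N), following algorithm 2.1."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize D with K randomly selected training signals,
    # normalized so each atom has unit l2 norm.
    D = Y[:, rng.choice(Y.shape[1], size=K, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):                    # fixed iteration budget as a
        phi = sparse_coding_stage(D, Y, T0)    # simple stopping condition
        D, phi = update_dictionary(D, phi, Y)
    return D, phi
```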

Although the K-SVD algorithm converges quickly, it is still computationally expensive, as the SVD must be performed $K$ times per iteration, and all $N$ training signals are sparse-coded at each iteration. The task also demands heavy memory use, in particular when the set of training data is large. Several modifications of the original algorithm have been proposed to alleviate the computational burden; for example, the batch-OMP variant of K-SVD presented by Rubinstein et al (2008). In the next section, we will introduce an application of the dictionary learning approach to CT reconstruction.
