
3

Simple and Practical Algorithm

3.1 Introduction

In this chapter, we propose a new sublinear Sparse Fourier Transform algorithm over the complex field. The key feature of this algorithm is its simplicity: the algorithm has a simple structure, which leads to an efficient runtime with a low big-Oh constant. This was the first algorithm to outperform FFT in practice for a practical range of sparsity, as we will show later in Chapter 6.

3.1.1 Results

We present an algorithm which has the runtime of

$$O\!\left(\log n\,\sqrt{nk\log(n/\delta)}\right),$$
where n is the size of the signal and k is the number of non-zero frequency coefficients. Thus, the algorithm is faster than FFT for k up to O(n/log n). In contrast, earlier algorithms required asymptotically smaller bounds on k. This asymptotic improvement is also reflected in empirical runtimes. For example, we show in Chapter 6 that for $n = 2^{22}$, this algorithm outperforms FFT for k up to about 2,200, which is an order of magnitude higher than what was achieved by prior work.
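To see where the n/log n threshold comes from, compare the stated runtime against the Θ(n log n) cost of FFT; with δ = 1/n^{O(1)} we have log(n/δ) = Θ(log n), so (constants suppressed):

$$\log n\,\sqrt{nk\log n} \;<\; n\log n \;\Longleftrightarrow\; nk\log n \;<\; n^2 \;\Longleftrightarrow\; k \;<\; \frac{n}{\log n}.$$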

The estimations provided by our algorithm satisfy the so-called ℓ∞/ℓ₂ guarantee. Specifically, let y be the k-sparse vector minimizing the approximation error $\|\hat{x} - y\|_2$. Then, for a precision parameter δ = 1/n^{O(1)} and a constant ε > 0, our (randomized) algorithm outputs $\hat{x}'$ such that:

$$\|\hat{x} - \hat{x}'\|_\infty^2 \;\le\; \frac{\epsilon}{k}\,\|\hat{x} - y\|_2^2 + \delta^2\,\|x\|_1^2 \qquad (3.1)$$

with probability 1 − 1/n. The additive term that depends on δ appears in all past algorithms [Akavia 2010, Akavia et al. 2003, Gilbert et al. 2002, Gilbert et al. 2005a, Iwen 2010b, Mansour 1992], although typically (with the exception of Iwen [2010a]) it is eliminated by assuming that all coordinates are integers in the range $\{-n^{O(1)}, \ldots, n^{O(1)}\}$. In this chapter, we keep the dependence on δ explicit.

The ℓ∞/ℓ₂ guarantee of Equation (3.1) is stronger than the ℓ₂/ℓ₂ guarantee of Equation (1.2). In particular, the ℓ∞/ℓ₂ guarantee with a constant approximation factor C implies the ℓ₂/ℓ₂ guarantee with a constant approximation factor C′, if one sets all but the k largest entries in $\hat{x}'$ to 0.^1 Furthermore, instead of bounding only the collective error, the ℓ∞/ℓ₂ guarantee ensures that every Fourier coefficient is well approximated.
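For intuition, here is the standard truncation argument in sketch form (our sketch; constants are left loose and the δ term is dropped for readability). Let S′ be the support of the k largest entries of $\hat{x}'$ and let $\hat{y}'$ be $\hat{x}'$ restricted to S′. Then

$$\|\hat{x} - \hat{y}'\|_2^2 \;=\; \|\hat{x}_{-S'}\|_2^2 + \|(\hat{x} - \hat{x}')_{S'}\|_2^2 \;\le\; \|\hat{x}_{-S'}\|_2^2 + k\,\|\hat{x} - \hat{x}'\|_\infty^2,$$

and a standard exchange argument (each coordinate of S that S′ discards is, up to $2\|\hat{x} - \hat{x}'\|_\infty$, no larger than the coordinate kept in its place) bounds $\|\hat{x}_{-S'}\|_2^2$ by a constant times $\|\hat{x}_{-S}\|_2^2 + k\,\|\hat{x} - \hat{x}'\|_\infty^2$. Substituting the ℓ∞/ℓ₂ bound $k\,\|\hat{x} - \hat{x}'\|_\infty^2 \le C\epsilon\,\|\hat{x}_{-S}\|_2^2$ gives the ℓ₂/ℓ₂ guarantee with a constant factor C′.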

3.1.2 Techniques

Recall from Chapter 1 that the Sparse Fourier Transform algorithms work by binning^2 the Fourier coefficients into a small number of buckets. Since the signal is sparse in the frequency domain, each bucket is likely^3 to have only one large coefficient, which can then be located (to find its position) and estimated (to find its value). For the algorithm to be sublinear, the binning has to be done in sublinear time. Binning the Fourier coefficients is done using an n-dimensional filter vector G that is concentrated both in time and frequency; i.e., G is zero except at a small number of time coordinates, and its Fourier transform $\hat{G}$ is negligible except at a small fraction (about 1/k) of the frequency coordinates (the "pass" region).

Prior work uses different types of filters. Depending on the choice of the filter G, past algorithms can be classified as iteration-based or interpolation-based.

Iteration-based algorithms use a filter that has a significant mass outside its pass region [Gilbert et al. 2002, Gilbert et al. 2005a, Mansour 1992]. For example, Gilbert et al. [2002] and Gilbert et al. [2005a] set G to the rectangular filter shown in Figure 1.3(a), in which case $\hat{G}$ is the Dirichlet kernel,^4 whose tail decays in an inverse linear fashion. Since the tail decays slowly, the Fourier coefficients binned to a particular bucket "leak" into other buckets. On the other hand, Mansour [1992] estimates the convolution in time domain via random sampling, which also leads to a large estimation error. To reduce these errors and obtain the ℓ₂/ℓ₂ guarantee, these algorithms have to perform multiple iterations, where each iteration estimates the largest Fourier coefficient (the one least impacted by leakage) and subtracts its contribution from the time signal. The iterative update of the time signal causes a large increase in runtime. The algorithms in Gilbert et al. [2002] and Mansour [1992] perform this update by going through O(k) iterations, each of which updates at least O(k) time samples, resulting in an O(k²) term in the runtime. The algorithm of Gilbert et al. [2005a] introduced a "bulk sampling" technique that amortizes this process, but it requires solving instances of a non-uniform Fourier transform, which is expensive in practice.

Interpolation-based algorithms are less common and limited to the design in Iwen [2010b]. This approach uses the aliasing filter presented in Chapter 1, which is a leakage-free filter that allows Iwen [2010b] to avoid the need for iteration. Recall that in this case, the filter G has $G_i = 1$ iff $i \bmod (n/p) = 0$ and $G_i = 0$ otherwise. The Fourier transform $\hat{G}$ of this filter is a "spike train" with period p, and hence this filter does not leak; it is equal to 1 on a 1/p fraction of the coordinates and is zero elsewhere. Unfortunately, however, such a filter requires that p divides n, and the algorithm in Iwen [2010b] needs many different values of p. Since in general one cannot assume that n is divisible by all numbers p, the algorithm treats the signal as a continuous function and interpolates it at the required points. Interpolation introduces additional complexity and increases the exponents in the runtime.
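A quick numerical check of this spike-train behavior (our own snippet, using numpy's unnormalized FFT, under which the spikes have height p rather than 1):

```python
import numpy as np

n, p = 256, 8
G = np.zeros(n)
G[:: n // p] = 1.0                    # G_i = 1 iff i mod (n/p) == 0
G_hat = np.fft.fft(G)

spikes = np.arange(0, n, p)           # a 1/p fraction of the coordinates
assert np.allclose(G_hat[spikes], p)  # constant on the spike train
mask = np.ones(n, dtype=bool)
mask[spikes] = False
assert np.allclose(G_hat[mask], 0.0)  # exactly zero elsewhere: no leakage
print("aliasing filter has zero leakage")
```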

3.1.3 Our Approach

The key feature of our algorithm is the use of a different type of filter. In the simplest case, we use a filter obtained by convolving a Gaussian function with a box-car function.^5 Because of this new filter, our algorithm does not need to either iterate or interpolate. Specifically, the frequency response of our filter is nearly flat inside the pass region and has an exponential tail outside it. This means that leakage from frequencies in other buckets is negligible, and hence, our algorithm need not iterate. Also, filtering can be performed using the existing input samples $x_i$, and hence our algorithm need not interpolate the signal at new points. Avoiding both iteration and interpolation is the key feature that makes this algorithm efficient.
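The following sketch (our own construction and parameters, not the book's reference code) builds such a filter numerically and checks the three properties the argument relies on: a nearly flat pass region, an exponentially small tail, and time-domain concentration on roughly B log(n/δ) samples.

```python
import numpy as np

n, B = 65536, 64
flat_edge, stop_edge = n // (2 * B), n // B       # flat for |xi| <= n/(2B),
                                                  # negligible for |xi| >= n/B
xi = np.fft.fftfreq(n) * n                        # frequencies in FFT order
box = (np.abs(xi) <= (flat_edge + stop_edge) / 2).astype(float)
sigma = (stop_edge - flat_edge) / 12.0            # 6-sigma slack on each side
gauss = np.exp(-0.5 * (xi / sigma) ** 2)
gauss /= gauss.sum()

# Frequency response: circular convolution of box-car and Gaussian via FFT.
G_hat = np.real(np.fft.ifft(np.fft.fft(box) * np.fft.fft(gauss)))

pass_band = np.abs(xi) <= flat_edge
stop_band = np.abs(xi) >= stop_edge
assert np.all(np.abs(G_hat[pass_band] - 1) < 1e-7)   # nearly flat pass region
assert np.all(np.abs(G_hat[stop_band]) < 1e-7)       # exponentially small tail

# Time domain: the Gaussian envelope makes the filter concentrated, so
# truncating it to ~6 standard deviations around t = 0 (about B*log(n/delta)
# samples for delta ~ 1e-9) barely perturbs the spectrum.
G = np.fft.ifft(G_hat)
w = int(6 * n / (2 * np.pi * sigma))              # ~2900 samples per side
G_trunc = G.copy()
G_trunc[w:-w] = 0.0                               # zero outside the center
assert np.max(np.abs(np.fft.fft(G_trunc) - G_hat)) < 1e-6
print("flat window: flat pass region, tail and truncation error < 1e-6")
```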

Further, once a large coefficient is isolated in a bucket, one needs to identify its frequency. In contrast to past work which typically uses binary search for this task, we adopt an idea from Porat and Strauss [2010] and tailor it to our problem. Specifically, we simply select the set of "large" bins which are likely to contain large coefficients, and directly estimate all frequencies in those bins. To balance the cost of the bin selection and estimation steps, we make the number of bins somewhat larger than the typical value of O(k). Specifically, we use $B = O\!\left(\sqrt{nk/(\epsilon\log(n/\delta))}\right)$, which leads to the stated runtime.^6

3.2 Algorithm

We refer to our algorithm as SFT 1.0 and it is shown in Algorithm 3.1. A key element of this algorithm is the inner loop, which finds and estimates each “large” coefficient with constant probability. In Section 3.2.1 we describe the inner loop, and in Section 3.2.2 we show how to use it to construct the full algorithm.

3.2.1 Inner Loop

Let B be a parameter that divides n, to be determined later. Let G be a (1/B, 1/(2B), δ, ω) flat window function, as described in Section 2.2.1, for some δ and $\omega = O(B\log(n/\delta))$. We will have $\delta \approx 1/n^c$, so one can think of it as negligibly small.

There are two versions of the inner loop: location loops and estimation loops. Location loops, described as the procedure LOCATIONINNERLOOP in Algorithm 3.1, are given a parameter d, and output a set $I \subset [n]$ of dkn/B coordinates that contains each large coefficient with "good" probability. Estimation loops, described as the procedure ESTIMATIONINNERLOOP in Algorithm 3.1, are given a set $I \subset [n]$ and estimate $\hat{x}_I$ such that each coordinate is estimated well with "good" probability.

By Claim 2.4, we can compute $\hat{z}$ in $O(B\log(n/\delta))$ time. Location loops thus take $O(B\log(n/\delta) + dkn/B)$ time and estimation loops take $O(B\log(n/\delta) + |I|)$ time. Figure 3.1 illustrates the inner loop.
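For concreteness, the identity behind this computation can be checked in a few lines (our snippet): summing the windowed signal over its residue classes mod B and taking a B-point FFT equals sampling the full n-point FFT at multiples of n/B.

```python
import numpy as np

n, B = 1024, 16
y = np.random.randn(n) + 1j * np.random.randn(n)      # stand-in for G * P(x)
z_hat = np.fft.fft(y.reshape(n // B, B).sum(axis=0))   # fold, then B-point FFT
assert np.allclose(z_hat, np.fft.fft(y)[:: n // B])    # rate-B spectrum samples
```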

For estimation loops, we get the following guarantee.

Lemma 3.1 Let S be the support of the largest k coefficients of $\hat{x}$, and $\hat{x}_{-S}$ contain the rest. Then for any $\epsilon \le 1$,

$$\Pr\left[\,\big|\hat{x}'_i - \hat{x}_i\big|^2 \ge \frac{\epsilon}{k}\,\|\hat{x}_{-S}\|_2^2 + 3\delta^2\,\|x\|_1^2\,\right] < O\!\left(\frac{k}{\epsilon B}\right).$$
Proof The proof can be found in Appendix A.1. ■

Furthermore, since $\hat{G}_{o_\sigma(i)} \in [1-\delta,\, 1+\delta]$, $|\hat{z}_{h_\sigma(i)}|$ is a good estimate for $|\hat{x}_i|$—the division is mainly useful for fixing the phase. Therefore, in location loops, we get the following guarantee:

Algorithm 3.1 SFT 1.0: Non-iterative Sparse Fourier Transform (pseudocode listing not reproduced; the procedures LOCATIONINNERLOOP and ESTIMATIONINNERLOOP are described in the text)
Figure 3.1 Example inner loop of the algorithm on sparse input. This run has parameters n = 256, k = 4, G being the (0.11, 0.06, 2 × 10⁻⁹, 133) flat window function in Figure 2.1, and selecting the top 4 of B = 16 samples. In part (a), the algorithm begins with time domain access to $P_{\sigma,\tau,b}x$ given by $(P_{\sigma,\tau,b}x)_i = x_{\sigma(i-\tau)}\,\omega^{\sigma b i}$, which permutes the spectrum of x by permuting the samples in the time domain. In part (b), the algorithm computes the time domain signal $y = G \cdot P_{\sigma,\tau,b}x$. The spectrum of y (pictured) is large around the large coordinates of $P_{\sigma,\tau,b}x$. The algorithm then computes $\hat{z}$, which is the rate-B subsampling of $\hat{y}$, as pictured in part (c). During estimation loops, the algorithm estimates $\hat{x}_i$ based on the value of the nearest coordinate in $\hat{z}$, namely $\hat{z}_{h_\sigma(i)}$. During location loops (part (d)), the algorithm chooses J, the top dk (here, 4) coordinates of $\hat{z}$, and selects the elements of [n] that are closest to those coordinates (the shaded region of the picture). It outputs the set I of preimages of those elements. In this example, the two coordinates on the left landed too close in the permutation and form a "hash collision." As a result, the algorithm misses the second-from-left coordinate in its output. Our guarantee is that each large coordinate has a low probability of being missed if we select the top O(k) samples.
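To make the steps in Figure 3.1 concrete, here is a self-contained numpy sketch of one inner loop (our own simplified reconstruction, not the book's reference implementation). For readability it ignores the optimizations that make the loop sublinear—the window is applied across all n samples rather than its O(B log(n/δ)) support—and it uses the plain permutation $t \mapsto \sigma t + \tau$, which sends spectrum coordinate j to σj (mod n), instead of the full $P_{\sigma,\tau,b}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, B, dk = 16384, 4, 64, 8        # signal size, sparsity, buckets, top-dk

# Flat window function (Gaussian-smoothed box-car, Section 3.1.3):
# flat for |xi| <= n/(2B), negligible for |xi| >= n/B.
xi = np.fft.fftfreq(n) * n
box = (np.abs(xi) <= 3 * n // (4 * B)).astype(float)
sig = (n / (2 * B)) / 12.0
gauss = np.exp(-0.5 * (xi / sig) ** 2)
G_hat = np.real(np.fft.ifft(np.fft.fft(box) * np.fft.fft(gauss / gauss.sum())))
G = np.fft.ifft(G_hat)               # concentrated around t = 0

# Random k-sparse spectrum and its time-domain signal.
freqs = rng.choice(n, size=k, replace=False)
x_hat = np.zeros(n, dtype=complex)
x_hat[freqs] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
x = np.fft.ifft(x_hat)

def inner_loop(x):
    """One location + estimation loop: returns (candidate set I, estimates)."""
    sigma = 2 * int(rng.integers(n // 2)) + 1     # odd, hence invertible mod n
    tau = int(rng.integers(n))
    t = np.arange(n)
    y = G * x[(sigma * t + tau) % n]              # permute spectrum, then window
    z_hat = np.fft.fft(y.reshape(n // B, B).sum(axis=0))  # = FFT(y) at rate B

    J = np.argsort(np.abs(z_hat))[-dk:]           # the dk largest buckets
    sigma_inv = pow(sigma, -1, n)
    I = set()                                     # spectral preimages of J
    for b in J:
        for p in range(b * (n // B) - n // (2 * B),
                       b * (n // B) + n // (2 * B)):
            I.add(int(sigma_inv * p % n))

    est = {}
    for j in I:                                   # estimate from j's bucket
        p = sigma * j % n                         # position after permutation
        h = int(round(p * B / n)) % B             # bucket index h_sigma(j)
        o = (p - h * (n // B) + n // 2) % n - n // 2   # offset o_sigma(j)
        est[j] = n * z_hat[h] * np.exp(-2j * np.pi * j * tau / n) / G_hat[o]
    return I, est

I, est = inner_loop(x)
found = [int(f) for f in freqs if int(f) in I]
print(f"located {len(found)}/{k} large frequencies in one loop")
for f in found:                                   # close up to bucket noise
    print(f, x_hat[f], est[f])
```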

Lemma 3.2 Define $E = \sqrt{\frac{\epsilon}{k}\,\|\hat{x}_{-S}\|_2^2 + 3\delta^2\,\|x\|_1^2}$ to be the error tolerated in Lemma 3.1. Then for any $i \in [n]$ with $|\hat{x}_i| \ge 4E$,

$$\Pr\left[\,i \notin I\,\right] \le O\!\left(\frac{k}{\epsilon B} + \frac{1}{\epsilon d}\right).$$
Proof The proof can be found in Appendix A.2. ■

3.2.2 Non-Iterative Sparse Fourier Transform

Our SFT 1.0 algorithm, shown in Algorithm 3.1, is parameterized by ε and δ. It runs L = O(log n) iterations of the inner loop, with parameters $B = O\!\left(\sqrt{nk/(\epsilon\log(n/\delta))}\right)$ and d = O(1/ε), as well as δ.^7 The final output $\hat{x}'$ is supported on the set I′ of coordinates that appear in at least half of the L location sets $I_r$; each value $\hat{x}'_i$ is the median of the estimates from the estimation loops, taken separately over the real and imaginary parts.
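Continuing the sketch from Section 3.2.1, a minimal outer loop (our own illustrative parameter choices) that implements this voting-and-medians scheme looks as follows; it reuses `inner_loop`, `x`, `x_hat`, `freqs`, and `n` from the earlier snippet.

```python
import numpy as np
from collections import Counter

L = 2 * int(np.log2(n))                    # O(log n) repetitions
runs = [inner_loop(x) for _ in range(L)]   # (candidate set, estimates) pairs

counts = Counter()
for I_r, _ in runs:
    counts.update(I_r)
I_prime = [i for i, c in counts.items() if c >= L / 2]   # majority vote

x_hat_prime = np.zeros(n, dtype=complex)
for i in I_prime:
    vals = np.array([est[i] for _, est in runs if i in est])
    # Median of real and imaginary parts separately, as in the proof of
    # Theorem 3.1: robust even if a constant fraction of the runs are bad.
    x_hat_prime[i] = np.median(vals.real) + 1j * np.median(vals.imag)

print("recovered support:", sorted(int(i) for i in np.flatnonzero(x_hat_prime)))
print("true support:     ", sorted(int(f) for f in freqs))
```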

Lemma 3.3 The algorithm runs in time $O\!\left(\sqrt{\frac{nk\log(n/\delta)}{\epsilon}}\,\log n\right)$.

Proof To analyze this algorithm, note that

$$|I'|\,\frac{L}{2} \;\le\; \sum_{r=1}^{L} |I_r| \;=\; L\,\frac{dkn}{B},$$

or $|I'| \le 2dkn/B$. Therefore, the running time of both the location and estimation inner loops is $O(B\log(n/\delta) + dkn/B)$. Computing I′ and computing the medians both take linear time, namely O(Ldkn/B). Thus, the total running time is $O(LB\log(n/\delta) + Ldkn/B)$. Plugging in $B = O\!\left(\sqrt{nk/(\epsilon\log(n/\delta))}\right)$ and d = O(1/ε), this running time is $O\!\left(\sqrt{nk\log(n/\delta)/\epsilon}\,\log n\right)$. We require B = Ω(k/ε), however; for k > εn/log(n/δ), this would cause the run time to be larger. But in this case, the predicted run time is Ω(n log n) already, so the standard FFT is faster and we can fall back on it. ■

Theorem 3.1 Running the algorithm with parameters ε, δ < 1 gives $\hat{x}'$ satisfying

$$\|\hat{x}' - \hat{x}\|_\infty^2 \;\le\; \frac{\epsilon}{k}\,\|\hat{x}_{-S}\|_2^2 + \delta^2\,\|x\|_1^2$$

with probability 1 − 1/n and running time $O\!\left(\sqrt{nk\log(n/\delta)/\epsilon}\,\log n\right)$.

Proof Define

$$E = \sqrt{\frac{\epsilon}{k}\,\|\hat{x}_{-S}\|_2^2 + 3\delta^2\,\|x\|_1^2}$$

as in Lemma 3.2, and let $s_i = |\{r : i \in I_r\}|$ be the number of location iterations whose output set contains i, so that $I' = \{i : s_i \ge L/2\}$.
Lemma 3.2 says that in each location iteration r, for any i with $|\hat{x}_i| \ge 4E$,

$$\Pr[\,i \in I_r\,] \;\ge\; \frac{3}{4}.$$
Thus, $\mathrm{E}[s_i] \ge 3L/4$, and each iteration is an independent trial, so by a Chernoff bound the chance that $s_i < L/2$ is at most $2^{-\Omega(L)} < 1/n^3$. Therefore by a union bound, with probability at least $1 - 1/n^2$, $i \in I'$ for all i with $|\hat{x}_i| \ge 4E$.

Next, Lemma 3.1 says that for each estimation iteration r and index i,

$$\Pr\left[\,\big|\hat{x}^{(r)}_i - \hat{x}_i\big| \ge E\,\right] < O\!\left(\frac{k}{\epsilon B}\right) < \frac{1}{4},$$

where $\hat{x}^{(r)}_i$ denotes the estimate produced in iteration r.
Therefore, with probability $1 - 2^{-\Omega(L)} \ge 1 - 1/n^3$, we have $|\hat{x}^{(r)}_i - \hat{x}_i| \le E$ in at least 2L/3 of the iterations.

Since $\mathrm{re}(\hat{x}'_i)$ is the median of the $\mathrm{re}(\hat{x}^{(r)}_i)$, there must exist two r with $|\hat{x}^{(r)}_i - \hat{x}_i| \le E$ but with one $\mathrm{re}(\hat{x}^{(r)}_i)$ above $\mathrm{re}(\hat{x}'_i)$ and one below. Hence, one of these r has $|\mathrm{re}(\hat{x}'_i - \hat{x}_i)| \le |\mathrm{re}(\hat{x}^{(r)}_i - \hat{x}_i)| \le E$, and similarly for the imaginary axis. Then

$$\big|\hat{x}'_i - \hat{x}_i\big| \;\le\; \sqrt{2}\,\max\!\left(\big|\mathrm{re}(\hat{x}'_i - \hat{x}_i)\big|,\; \big|\mathrm{im}(\hat{x}'_i - \hat{x}_i)\big|\right) \;\le\; \sqrt{2}\,E.$$
By a union bound over $i \in I'$, with probability at least $1 - 1/n^2$ we have $|\hat{x}'_i - \hat{x}_i| \le \sqrt{2}E$ for all $i \in I'$. Since all $i \notin I'$ have $\hat{x}'_i = 0$ and $|\hat{x}_i| \le 4E$ with probability $1 - 1/n^2$, with total probability $1 - 2/n^2$ we have

$$\|\hat{x}' - \hat{x}\|_\infty^2 \;\le\; 16E^2 \;=\; \frac{16\epsilon}{k}\,\|\hat{x}_{-S}\|_2^2 + 48\delta^2\,\|x\|_1^2.$$
Rescaling ε and δ gives our theorem. ■

3.2.3 Extension

In this section, we present an extension of the SFT 1.0 algorithm which adds a heuristic to improve the runtime. We refer to this new algorithm as SFT 2.0; it is shown in Algorithm 3.2. The idea of the heuristic is to apply the aliasing filter to restrict the locations of the large coefficients. The algorithm is parameterized by a value M that divides n. It applies the aliasing filter as a preprocessing step to SFT 1.0 and uses the filter's output to restrict the frequency locations in the sets $I_r$ output by the location loops, as shown in Algorithm 3.2.

Observe that the M-dimensional signal y defined by $y_i = x_{i(n/M)+\tau}$, for a random shift τ, has the spectrum

$$\hat{y}_i = \sum_{j \equiv i \ (\mathrm{mod}\ M)} \hat{x}_j\,\omega^{\tau j}.$$

This means that the filter is very efficient, in that it has no leakage at all. Also, it is simple to compute. Unfortunately, it cannot be "randomized" using $P_{\sigma,\tau,b}$: after permuting by σ and b, any two colliding elements j and j′ (i.e., such that $j = j' \bmod M$) continue to collide. Nevertheless, if $\hat{x}_j$ is large, then j mod M is likely to lie in T, the set of bucket indices with large $|\hat{y}|$ selected during preprocessing—at least heuristically on random input.

SFT 2.0 assumes that all "large" coefficients j have j mod M in T. That is, we restrict our sets $I_r$ to contain only coordinates i with $i \bmod M \in T$. We expect that $|I_r| \approx (|T|/M)\,dkn/B = O(dk^2n/(BM))$, rather than the previous dkn/B. This means that our heuristic will improve the runtime of the inner loops from $O(B\log(n/\delta) + dkn/B)$ to $O(B\log(n/\delta) + M + dk + dk^2n/(BM))$, at the cost of O(M log M) preprocessing.
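A small numpy sketch of this restriction step (our own reconstruction; the test signal's residues mod M are chosen distinct so the demonstration is deterministic):

```python
import numpy as np

rng = np.random.default_rng(2)
n, M, k = 16384, 256, 4                       # M divides n

# k-sparse spectrum with distinct residues mod M (for a clean demo).
residues = rng.choice(M, size=k, replace=False)
freqs = residues + M * rng.integers(0, n // M, size=k)
x_hat = np.zeros(n, dtype=complex)
x_hat[freqs] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
x = np.fft.ifft(x_hat)

# Preprocessing: M-point FFT of a randomly shifted subsampling of x.
tau = int(rng.integers(n // M))
y_hat = np.fft.fft(x[tau :: n // M])          # y_hat[i] ~ sum over j = i (mod M)
T = {int(i) for i in np.argsort(np.abs(y_hat))[-2 * k :]}
assert all(int(f) % M in T for f in freqs)    # large coefficients' residues in T

# Restricting a location loop's candidate set to those residues:
I_r = {int(i) for i in rng.choice(n, size=512, replace=False)}  # stand-in set
I_r_restricted = {i for i in I_r if i % M in T}
print(len(I_r), "->", len(I_r_restricted))    # shrinks by roughly a factor M/|T|
```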

Algorithm 3.2 SFT 2.0: Non-iterative Sparse Fourier Transform with heuristic (pseudocode listing not reproduced)
Note that on worst-case input, SFT 2.0 may give incorrect output with high probability. For example, if $x_i = 1$ when i is a multiple of n/M and 0 otherwise, then y = 0 with probability 1 − M/n and the algorithm will output 0 over supp(x). However, in practice the algorithm works for "sufficiently random" x.
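This failure mode is easy to reproduce (our snippet, continuing the conventions above):

```python
import numpy as np

n, M = 16384, 256
x = np.zeros(n)
x[:: n // M] = 1.0                        # x_i = 1 iff i is a multiple of n/M
hits = [tau for tau in range(n // M) if np.any(x[tau :: n // M] != 0)]
print(f"subsampling sees the signal for {len(hits)} of {n // M} shifts")
# -> 1 of 64, i.e., y = 0 with probability 1 - M/n over the random shift.
```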

Claim 3.1 As a heuristic approximation, SFT 2.0 runs in $O\!\left((k^2 n \log(n/\delta)/\epsilon)^{1/3}\log n\right)$ as long as $k \le \epsilon^2 n\log(n/\delta)$.

Justification. First we will show that the heuristic improves the inner loop running time to $O\!\left(B\log(n/\delta) + M + dk + \frac{dk^2n}{BM}\right)$, and then we will optimize the parameters M and B.

Heuristically, one would expect each of the $I_r$ to be a factor M/|T| = Ω(M/k) smaller than if we did not require the elements to lie in T modulo M. Hence, we expect each of the $I_r$ and I′ to have size $O(dk^2n/(BM))$. Then in each location loop, rather than spending O(dkn/B) time to list our output, we spend $O(dk^2n/(BM))$ time—plus the time required to figure out where to start listing coordinates from each of the dk chosen elements J of $\hat{z}$. We do this by sorting J and $\{\sigma i \mid i \in T\} \pmod M$, then scanning through the elements. It takes O(M + dk) time to sort O(dk) elements in [M], so the total runtime of each location loop is $O\!\left(B\log(n/\delta) + M + dk + \frac{dk^2n}{BM}\right)$. The estimation loops are even faster, since they benefit from |I′| being smaller but avoid the M + dk penalty.

The full algorithm does O(M log M) preprocessing and runs the inner loop L = O(log n) times with d = O(1/ε). Therefore, given parameters B and M, the algorithm takes

$$O\!\left(M\log M + \left(B\log\frac{n}{\delta} + \frac{k^2 n}{\epsilon BM} + M + \frac{k}{\epsilon}\right)\log n\right)$$

time. Optimizing over B, we take

$$O\!\left(M\log M + \left(k\sqrt{\frac{n\log(n/\delta)}{\epsilon M}} + M + \frac{k}{\epsilon}\right)\log n\right)$$

time. Then, optimizing over M, this becomes

$$O\!\left(\left(\frac{k^2 n\log(n/\delta)}{\epsilon}\right)^{1/3}\log n + \frac{k}{\epsilon}\log n\right)$$
time. If $k < \epsilon^2 n\log(n/\delta)$, the first term dominates.
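For completeness, the two balancing steps unwound (our sketch; constants and the L = O(log n) factor suppressed, with d = Θ(1/ε)):

$$B\log\frac{n}{\delta} \;=\; \frac{k^2 n}{\epsilon B M} \;\Longrightarrow\; B \;=\; k\sqrt{\frac{n}{\epsilon M\log(n/\delta)}}\,,$$

$$M \;=\; k\sqrt{\frac{n\log(n/\delta)}{\epsilon M}} \;\Longrightarrow\; M \;=\; \left(\frac{k^2 n\log(n/\delta)}{\epsilon}\right)^{1/3}.$$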

Note that this is a $\Theta\!\left(\left(\frac{n\log(n/\delta)}{\epsilon k}\right)^{1/6}\right)$ factor smaller than the running time of SFT 1.0.

1. This fact was implicit in Cormode and Muthukrishnan [2006]. For an explicit statement and proof see Gilbert and Indyk [2010, remarks after Theorem 2].

2. Throughout this book, we will use the terms “Binning” and “Bucketization” interchangeably.

3. One can randomize the positions of the frequencies by sampling the signal in time domain appropriately as we showed in Section 2.2.2.

4. The Dirichlet kernel is the discrete version of the sinc function.

5. A more efficient filter can be obtained by replacing the Gaussian function with a Dolph-Chebyshev function. (See Figure 2.1 for an illustration.)

6. Although it is plausible that one could combine our filters with the binary search technique of Gilbert et al. [2005a] and achieve an algorithm with a $O(k\log^c n)$ runtime, our preliminary analysis indicates that the resulting algorithm would be slower. Intuitively, observe that for $n = 2^{22}$ and $k = 2^{11}$, the values of $\sqrt{nk} \approx 92{,}682$ and $k\log_2 n = 45{,}056$ are quite close to each other.

7. Note that B is chosen in order to minimize the running time. For the purpose of correctness, it suffices that $B \ge ck/\epsilon$ for some constant c.
