1 The Frontier of Intractability
The combination of vast amounts of data, unprecedented computing power, and clever algorithms allows today’s computer systems to drive autonomous cars, beat the best human players at chess and Go, and live stream videos of cute cats across the world. Yet computers can fail miserably at predicting outcomes of social processes and interactions, elections being only the most salient example. And this is hardly surprising, as there are two potent reasons for such unpredictability: One reason is that human agents are driven by emotions and hormones and billions of neurons interacting with a complex environment, and as a result their behavior is extremely hard to model mathematically. The second reason is perhaps a bit surprising, and certainly closer to the concerns of this book: Even very simple and idealized models of social interaction, in which agents have extremely clear-cut objectives, can be intractable.
To have any hope of reasoning about human behavior on national and global scales, we need a mathematically sound theory. Game theory is the mathematical discipline that models and analyzes the interaction between agents (e.g., voters) who have different goals and are affected by each other’s actions. The central solution concept in game theory is the Nash equilibrium. It has an endless list of applications in economics, politics, biology, etc. (e.g., [Aumann 1987]).
By Nash’s theorem [Nash 1951], an equilibrium always exists. Furthermore, once at a Nash equilibrium, players have no incentive to deviate. The main missing piece in the puzzle is:
How do players arrive at an equilibrium in the first place?
After many attempts by economists for more than six decades1 (e.g., [Brown 1951, Foster and Young 2006, Hart and Mas-Colell 2003, Kalai and Lehrer 1993, Lemke and Howson 1964, Robinson 1951, Scarf 1967, Shapley 1964]), we still don’t have a satisfying explanation.
About ten years ago, the study of this fundamental question found a surprising answer in computer science: Chen et al. [2009b] and Daskalakis et al. [2009a] proved that finding a Nash equilibrium is computationally intractable.2 And if no centralized, specialized algorithm can find an equilibrium, it is even less likely that distributed, selfish agents will naturally converge to one. This casts doubt over the entire solution concept.
For the past decade, the main remaining hope for Nash equilibrium has been approximation. The central open question in this field has been:
Is there an efficient algorithm for finding an approximate Nash equilibrium?
We give a strong negative resolution to this question: our main result rules out efficient approximation algorithms for finding Nash equilibria.3
Our theorem is the latest development in a long and intriguing technical story. The first question we computer scientists ask when encountering a new algorithmic challenge is: is it in P, the class of polynomial time (tractable) problems; or is it NP-hard, like Satisfiability and the Traveling Salesperson Problem (where the best known algorithms require exponential time)? Approximate Nash equilibrium falls into neither category; its complexity lies between P and NP-hard—hence the title of our book. Let us introduce two (completely orthogonal) reasons why we do not expect it to be NP-hard.
The first obstacle for NP-hardness is the totality of Nash equilibrium. When we say that Satisfiability or the Traveling Salesperson Problems are NP-hard, we formally mean that it is NP-hard to determine whether a given formula has a satisfying assignment, or whether a given network allows the salesperson to complete her travels within a certain budget. In contrast, by Nash’s theorem an equilibrium always exists. Deciding whether an equilibrium exists is trivial, but we still don’t know how to find one. This is formally captured by an intermediate complexity class called PPAD.
The second obstacle for NP-hardness is an algorithm (the worst kind of obstacle for intractability). The best known algorithms for solving NP-hard or PPAD-hard problems require exponential (≈ 2^n) time, and there is a common belief (formulated a few years ago as the “Exponential Time Hypothesis” [Impagliazzo et al. 2001]) that much faster algorithms do not exist. In contrast, an approximate Nash equilibrium can be found in quasi-polynomial (≈ n^{log n}) time. Notice that this again places the complexity of approximate Nash equilibrium between the polynomial time solvable problems in P and the exponential time required by NP-hard problems. Therefore, approximate Nash equilibrium is unlikely to be NP-hard—or even PPAD-hard, since we know of no quasi-polynomial algorithm for any other PPAD-hard problem.
As illustrated in the last two paragraphs, approximate Nash equilibrium is very far from our standard notion of intractability, NP-hardness. In some sense, it is one of the computationally easiest problems that we can still prove to be intractable—it lies at the frontier of our understanding of intractability. Unsurprisingly, the techniques we had to master to prove the intractability of approximate Nash equilibrium are useful for many other problems. In particular, we also prove hardness of approximation for several other interesting problems that either belong to the class PPAD or have quasi-polynomial time algorithms.
1.1 PPAD: Finding a Needle You Know Is in the Haystack
Consider a directed graph G. Each edge contributes to the degree of the two vertices incident on it. Hence the sum of the degrees is twice the number of edges, and in particular it is even. Now, given an odd-degree vertex v, there must exist another odd-degree vertex. But can you find one? There is, of course, the trivial algorithm, which simply brute-force enumerates over all the graph vertices.
Now suppose G further has the property4 that every vertex has in- and out-degree at most 1. Thus, G is a disjoint union of lines and cycles. If v has an outgoing edge but no incoming edge, it is the beginning of a line. Now the algorithm for finding another odd-degree node is a bit more clever—follow the path to the end—but its worst case is still linear in the number of nodes.
In the worst case, both algorithms run in time linear in the number of vertices. When the graph is given explicitly, e.g., as an adjacency matrix, this is quite efficient compared to the size of the input. But what if the graph is given as black-box oracles S and P that return, for each vertex, its successor and predecessor in the graph? The amount of work5 required for finding another odd-degree vertex by the exhaustive algorithm, or the path-following algorithm, is still linear in the number of vertices, but now this number is exponential in the natural parameter of the problem, the length of the input to the oracle, which equals the logarithm of the number of nodes. In the oracle model, it is not hard to prove that there are no better algorithms.
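To make the oracle model concrete, here is a minimal sketch of the path-following algorithm (our own Python rendering, assuming the standard convention that an edge u → w exists iff S(u) = w and P(w) = u, and that S(u) = u when u has no outgoing edge):

```python
def other_odd_degree_vertex(v, S, P):
    """Given a vertex v that begins a line (outgoing edge, no incoming edge),
    follow successor pointers until the line ends. The endpoint has degree 1,
    hence odd degree. Convention: edge u -> w exists iff S(u) = w and P(w) = u."""
    current = v
    while True:
        nxt = S(current)
        if nxt == current or P(nxt) != current:  # no valid outgoing edge: end of the line
            return current
        current = nxt
```

The number of oracle calls is linear in the length of the line, which can be exponential in the bit-length of a vertex label—exactly the gap discussed above.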
Let us consider the computational analog of the black-box oracle model: the oracles S and P are implemented as explicit (“white-box”) circuits, which are given as the input to the algorithm. This is the END-OF-A-LINE problem, which is the starting point for most of our reductions. The computational class PPAD (which stands for Polynomial Parity Argument in Directed graphs) is the class of search problems reducible to END-OF-A-LINE.
Definition 1.1
END-OF-A-LINE [Daskalakis et al. 2009a]. Given two circuits S and P, with m input bits and m output bits each, such that P(0^m) = 0^m ≠ S(0^m), find an input x ∈ {0, 1}^m such that P(S(x)) ≠ x or S(P(x)) ≠ x ≠ 0^m.
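To make the definition concrete, here is a hedged brute-force sketch (our own helper, not an efficient algorithm): it treats S and P as Python functions on m-bit strings and enumerates all 2^m candidate solutions.

```python
from itertools import product

def end_of_a_line_brute_force(S, P, m):
    """Brute-force search for a solution to END-OF-A-LINE.
    S, P: functions from m-bit strings to m-bit strings with P(0^m) = 0^m != S(0^m).
    Returns an x with P(S(x)) != x, or with S(P(x)) != x and x != 0^m."""
    zero = "0" * m
    assert P(zero) == zero and S(zero) != zero  # the promise in Definition 1.1
    for bits in product("01", repeat=m):
        x = "".join(bits)
        if P(S(x)) != x:                        # x is the end of a line
            return x
        if x != zero and S(P(x)) != x:          # x starts a line other than 0^m
            return x
    return None  # cannot happen when the promise holds
```

A PPAD algorithm would have to do exponentially better than this enumeration; the parity argument only guarantees that some solution exists.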
How hard is END-OF-A-LINE? We believe that it is likely to be very hard, almost as hard as the black-box variant. This belief is backed by relativized separations [Morioka 2003] and cryptographic hardness [Bitansky et al. 2015, Garg et al. 2016, Hubácek and Yogev 2017], as well as zero algorithmic progress, even on special cases, since it was first introduced in Papadimitriou [1994]. More importantly, we are embarrassingly bad at gaining insight into a circuit’s functionality by looking at its structure (with some notable exceptions in cryptography [Barak 2004])—hence it seems reasonable to conjecture that the white-box variant is not much easier than the black-box problem.
Even more embarrassing is our inability to prove computational intractability. Essentially all “proofs” of computational intractability (including the ones in this book) are conditional; i.e., we assume that some problem is hard (e.g., END-OF-A-LINE or SATISFIABILITY), and reduce it to a new problem we want to “prove” is hard. The intractability of the new problem only holds conditioned on the intractability of the original hard problem. Without first resolving whether P = NP, we cannot prove that END-OF-A-LINE is indeed hard. Furthermore, even merely proving that END-OF-A-LINE is NP-hard would already imply unlikely consequences like NP = coNP [Megiddo and Papadimitriou 1991]. The ultimate reason we believe that END-OF-A-LINE is intractable is that we have no idea how to prove it.
Once we believe that END-OF-A-LINE is indeed intractable, any problem that is PPAD-complete, i.e., “at least as hard as END-OF-A-LINE,” is also intractable. The celebrated works of Chen et al. [2009b] and Daskalakis et al. [2009a] prove that finding an exact Nash equilibrium is PPAD-complete. In Section 1.3 we describe some variants of approximate Nash equilibrium that are also PPAD-complete.
We conclude this section with a few problems (other than variants of Nash equilibrium) that we show are also PPAD-hard:
• Finding an approximate fixed point of an implicit function; Brouwer’s fixed point theorem, which guarantees the existence of a fixed point, lays the mathematical foundation for the rest of our PPAD-complete problems.
• Finding an approximate market equilibrium, which is the conceptual foundation of neoclassical economics.
• Finding an Approximate Competitive Equilibrium from Equal Incomes (A-CEEI), which is an algorithmic problem of practical interest due to its use for allocating classes to students (CourseMatch).
1.1.1 Brouwer’s Fixed Point
Brouwer’s fixed point theorem, together with its generalizations (in particular, Kakutani’s fixed point theorem), is the basis for many of the equilibrium concepts in game theory and economics. It states that any continuous function f from a compact convex set (in particular, the n-dimensional hypercube) to itself has a fixed point, i.e., a point x* such that f (x*) = x*. Just like in the case of a Nash equilibrium and the odd-degree vertex, the existence does not come with an algorithm for finding the fixed point. Brouwer eventually became so unsatisfied with the non-constructive nature of his proof that he founded intuitionism, a formalism of mathematics mandating constructive proofs [Iemhoff 2016].
Before we discuss the computational tractability of finding a fixed point, there are two subtle but important technicalities that we have to clear. The first is that finding an exact fixed point may be “obviously” intractable when all the fixed points are irrational. Fortunately there is a natural notion6 of approximate fixed point: find x such that f (x) ≈ x. The second issue is that we want a finite description of the input to our algorithm. Naively, one can define the value of the function on a grid, but then the set is no longer convex and a fixed point might not exist at all (see, e.g., Roughgarden and Weinstein [2016]). It turns out that a better way to define the input is as an arithmetic circuit: this gives a local and easy-to-verify guarantee that the function is indeed continuous.
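For intuition about what “find x such that f(x) ≈ x” asks for, here is a minimal grid-search sketch (assuming, hypothetically, that f is given as a Python callable on [0, 1]^d that is Lipschitz in the ℓ∞-norm); as noted below, such brute force is only viable in constant dimension and for weak approximation.

```python
import itertools
import numpy as np

def approx_fixed_point(f, dim, eps, lipschitz=1.0):
    """Grid search for x in [0, 1]^dim with max_i |f(x)_i - x_i| <= eps.
    Assumes f maps [0, 1]^dim to itself and is `lipschitz`-Lipschitz in the
    inf-norm, so a grid of spacing eps / (lipschitz + 1) contains a witness
    near any exact fixed point (which exists by Brouwer's theorem).
    Runs in time exponential in dim -- viable only in constant dimension."""
    step = eps / (lipschitz + 1.0)
    grid = np.linspace(0.0, 1.0, int(np.ceil(1.0 / step)) + 1)
    for point in itertools.product(grid, repeat=dim):
        x = np.asarray(point)
        if np.max(np.abs(f(x) - x)) <= eps:
            return x
    return None  # unreachable under the stated assumptions
```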
Admittedly, there is something dissatisfying about proving computational hardness of “circuit problems”: we are going to prove that this problem is PPAD-hard, i.e., no easier than END-OF-A-LINE—but the reason we believe in the first place that END-OF-A-LINE is intractable is that we don’t know how to gain insight into the functionality of a circuit by staring at its structure. Nevertheless, we begin with the complexity of Brouwer’s fixed point because: this is a fundamental question; it is the starting point of many of our reductions; and, as we discuss later, even the black-box hardness in this case is highly non-trivial.
We know of two algorithms for computing an approximate Brouwer fixed point: there is Scarf’s algorithm [Scarf 1967], but its running time may be exponential in the description of the function; the same holds for brute-force search.
On the complexity side, Daskalakis et al. [2009a] proved that even in three dimensions, finding an exponentially close approximation (x such that ∥f(x) − x∥_∞ ≤ 2^{−n}, where n is the size of the input) is PPAD-complete; Chen and Deng [2009] proved that the same problem remains hard even in two dimensions. Notice that in a constant number of dimensions, any weaker approximation requirement can be met by brute-force search over a grid. Chen et al. [2009b] showed that in n dimensions, even a 1/poly(n) approximation is PPAD-complete. We will later prove that even constant approximations are PPAD-complete, i.e., there exists some absolute constant ε > 0 such that it’s hard to find an x for which ∥f(x) − x∥_∞ ≤ ε; this is already a significant improvement over the existing state of the art.
We can prove an even stronger form of inapproximability for finding a Brouwer fixed point: so far, we characterized the approximate fixed point with ℓ∞-norm, which means that f(x) and x must be close in every coordinate. Our main result in this regard (Theorem 4.2) is that even if we only require that f(x) and x are close in most of the coordinates, finding such x and f(x) remains PPAD-complete.
1.1.2 Market Equilibrium
Supply and demand is central to our modern understanding of economics: when demand exceeds supply, raising the price makes it less attractive to consumers and more attractive to producers, thereby reducing demand and increasing supply. Vice versa if the supply exceeds the demand. This simple idea marks the birth of neoclassical economics [Jevons 1866, Menger 1871, Walras 1874], and continues to inspire us today when we reason about free market economies. Since the consumption and production of one good depends on other goods, we are interested in a market equilibrium: a vector of prices where the supply and demand of every good matches.
The supply and demand argument has many weak spots, one of which is Giffen goods [Marshall 1895]: Consider a poor 19th-century English family consuming 10,000 calories a day from meat and bread. They prefer to eat meat, but the bread is cheaper—so they spend their daily income on 6 loaves of bread and 4 pounds of meat (each provides 1,000 calories). What happens if the price of bread increases? They have less free money to spend on the preferred luxury good (meat), which increases their demand for the Giffen good (bread). The existence of Giffen goods in practice has been contentiously debated for over a century [Edgeworth 1909, Stigler 1947], with evidence ranging from econometric analysis of historical market data to experiments on lab rats [Battalio et al. 1991].
Despite counterintuitive issues like Giffen goods, Arrow and Debreu proved that, under quite general conditions, a market equilibrium always exists [Arrow and Debreu 1954]. In particular, in the Arrow-Debreu model, agents sell the goods they own and use the revenue to buy other goods they want; this is in contrast to Fisher markets, where agents with a monetary budget purchase from a centralized seller. Market equilibria in both models exist, but can we find them? As in the case of Nash equilibrium, this question is of particular importance because if a centralized, omniscient algorithm cannot compute an equilibrium, it is hard to expect a distributed market with selfish agents to converge to one. In the words of Kamal Jain [Jain 2004, Nisan 2009b]:
If your laptop cannot find it, neither can the market.
In the same paper, Jain also gave an algorithm for computing Arrow-Debreu’s equilibrium when consumers’ utilities are linear. Since then, there has been a long sequence of algorithms for computing or approximating market equilibria (e.g., [Birnbaum et al. 2011, Codenotti et al. 2005, Cole and Fleischer 2008, Devanur et al. 2008, Garg et al. 2015, Garg and Kapoor 2006, Jain 2004, Jain and Vazirani 2010]), for certain models and utility functions, as well as intractability results in other settings [Chen et al. 2009a, Deng and Du 2008, Garg et al. 2017, Vazirani and Yannakakis 2011]. In particular, Chen et al. [2013] consider a setting of markets that exhibit non-monotonicity: when the price of one product increases, its demand increases. Giffen goods, described above, are one example of non-monotone markets; Chen et al. [2013] construct examples of Arrow-Debreu markets where an increase in the price of a product increases the revenue of the seller, who now has more available money and demands more of the same good. Chen et al. show that for essentially any class of utility function that exhibits some non-monotonicity, computing the Arrow-Debreu market equilibrium is PPAD-hard.
We extend the PPAD-hardness proof of Chen et al. [2013] and show that even approximate market equilibrium is PPAD-hard (Theorem 9.1). We note that although our inapproximability factor is stronger than that shown by Chen et al., the results are incomparable, as ours only holds for the stronger notion of “tight” approximate equilibrium, by which we mean the more standard definition that bounds the two-sided error of the market equilibrium. Chen et al., in contrast, prove that even if we allow arbitrary excess supply, finding a (1/n)-approximate equilibrium is PPAD-hard. Furthermore, for the interesting case of constant elasticity of substitution (CES) utilities with parameter ρ < 0, they show that there exist markets where every (1/2)-tight equilibrium requires prices that are doubly exponentially large (and thus require an exponential-size representation). Indeed, for a general non-monotone family of utility functions, the problem of computing a (tight or not) approximate equilibrium may not belong to PPAD. Nevertheless, the important family of additively separable, concave piecewise-linear utilities is known to satisfy the non-monotone condition [Chen et al. 2013], and yet the computation of (exact) market equilibrium is in PPAD [Vazirani and Yannakakis 2011]. Therefore, we obtain as a corollary that computing an ε-tight approximate equilibrium for Arrow-Debreu markets with additively separable, concave piecewise-linear utilities is PPAD-complete.
1.1.3 A-CEEI (CourseMatch)
University courses have limited capacity, and some are more popular than others. This creates an interesting allocation problem. Imagine that each student has ordered all the possible schedules—bundles of courses—from most desirable to least desirable, and the capacities of the classes are known. What is the best way to allocate seats in courses to students? There are several desiderata for a course allocation mechanism:
Fairness. In what sense is the mechanism “fair”?
Efficiency. Are seats in courses allocated to the students who want them the most?
Feasibility. Are any courses oversubscribed?
Truthfulness. Are students motivated to honestly report their preferences to the mechanism?
Computational efficiency. Can the allocation be computed from the data in polynomial time?
Competitive Equilibrium from Equal Incomes (CEEI) [Foley 1967, Thomson and Varian 1985, Varian 1974] is a venerable mechanism with many attractive properties: In CEEI all agents are allocated the same amount of “funny money”; next they declare their preferences, and then a price equilibrium is found that clears the market. The market clearing guarantees Pareto efficiency and feasibility. The mechanism has a strong, albeit technical, ex post fairness guarantee that emerges from the notion that agents who miss out on a valuable, competitive item will have extra funny money to spend on other items at equilibrium. Truthfulness is problematic—as usual with market mechanisms—but potential incentives for any individual agent to deviate are mitigated by the large number of agents. However, CEEI only works when the resources to be allocated are divisible and the utilities are relatively benign. This restriction has both benefits and drawbacks. It ensures computational feasibility, because CEEI can be computed in polynomial time with a linear or convex program, depending on the utilities involved [Devanur et al. 2008, Ghodsi et al. 2011, Varian 1974]; on the other hand, it is easy to construct examples in which a CEEI does not exist when preferences are complex or the resources being allocated are not divisible. Indeed, both issues arise in practice in a variety of allocation problems, including shifts to workers, landing slots to airplanes, and the setting that we focus on here, courses to students [Budish 2011, Varian 1974].
It was shown in Budish [2011] that an approximation to a CEEI solution, called A-CEEI, exists even when the resources are indivisible and agent preferences are arbitrarily complex, as required by the course allocation problems one sees in practice. The approximate solution guaranteed to exist is approximately fair (in that the students are given almost the same budget), and approximately Pareto efficient and feasible (in that all courses are filled close to capacity, with the possible exception of courses with more capacity than popularity). This result seems to be wonderful news for the course allocation problem. However, there is a catch: Budish’s proof is non-constructive, as it relies on Kakutani’s fixed point theorem.
A heuristic search algorithm for solving A-CEEI was introduced in Othman et al. [2010]. The algorithm resembles a traditional tâtonnement process, in which the prices of courses that are oversubscribed are increased and the prices of courses that are undersubscribed are decreased. A modified version of this algorithm that guarantees courses are not oversubscribed is currently used by the Wharton School (University of Pennsylvania) to assign their MBA students to courses [Budish et al. 2014]. While it has been documented that the heuristic algorithm often produces much tighter approximations than the theoretical bound, on some instances it fails to find even the guaranteed approximation [Budish 2011 (Section 9)].
Thus A-CEEI is a problem where practical interest motivates theoretical inquiry. We have a theorem that guarantees the existence of an approximate equilibrium—the issue is finding it. Can the heuristic algorithms currently used to assign Wharton MBAs to their courses be replaced by a fast and rigorous algorithm for finding an approximate CEEI? Theorem 10.3 answers this question in the negative, showing that computing an A-CEEI is PPAD-complete.
1.2 Quasi-Polynomial Time and the Birthday Paradox
The following bilinear optimization meta-problem captures a wide range of applications in areas like statistics (Sparse Principal Component Analysis, or Sparse PCA), graph theory (Clique), and game theory (Nash equilibrium):
maximize x⊤Ay over feasible x ∈ X, y ∈ Y,   (1.1)
where A is a matrix and X, Y are the application-specific feasible sets.
For all the applications we consider, once we fix some y*, finding the best feasible x that maximizes x⊤Ay* is a tractable problem. (Similarly, if we were given a good x*, finding a matching y is easy.) But optimizing x and y simultaneously is NP-hard. What about approximations?
Caratheodory’s theorem states that a point v in the convex hull of n points in ℝ^d can be written as a convex combination of d + 1 points. In general, d + 1 points are necessary, but this number can be reduced drastically if we are willing to settle for approximation. In particular, Barman7 [2015] proves an approximate Caratheodory’s theorem that requires only r = O(p/ε^2) points to express a point v̂ such that ∥v̂ − v∥_p < ε, assuming the n points belong to a unit ℓ_p-ball. In particular, v̂ can be written as an average over a multi-set of r out of the n points.
Viewing the columns of A as n vectors in ℝ^n, Barman observes that (after proper normalization) the point v = Ay* is in their convex hull. If we only want to approximately solve the bilinear optimization (1.1), we can drastically reduce the dimension of the problem by enumerating over all choices of v̂, and for each candidate v̂ solving the now-tractable problem of finding the best feasible x against v̂. It turns out that in order to guarantee a good additive approximation of (1.1), we need to set p ≈ log n. The dominant term in the running time of this meta-algorithm is the enumeration over all choices of v̂ (all multi-sets of r of the columns of A), which takes approximately n^{O(log n/ε^2)} time, i.e., quasi-polynomial time.
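The following sketch spells out the meta-algorithm under simplifying assumptions of ours: X and Y are probability simplices, and the application supplies two “best response” oracles, best_x and best_y, that solve the tractable one-sided problems described above.

```python
from itertools import combinations_with_replacement
import numpy as np

def quasi_poly_bilinear_max(A, best_x, best_y, eps):
    """Enumeration meta-algorithm for max x^T A y over the simplices.
    A: n x n matrix with entries in [0, 1].
    best_x(v): argmax over feasible x of x^T v   (assumed tractable).
    best_y(x): argmax over feasible y of x^T A y (assumed tractable).
    Enumerates sparse candidates y_hat supported on r = O(log n / eps^2)
    columns, which by approximate Caratheodory suffices to get eps-close."""
    n = A.shape[1]
    r = max(1, int(np.ceil(np.log(max(n, 2)) / eps ** 2)))
    best_val, best_pair = -np.inf, None
    for support in combinations_with_replacement(range(n), r):
        y_hat = np.zeros(n)
        for j in support:
            y_hat[j] += 1.0 / r        # uniform average over the multiset
        x = best_x(A @ y_hat)          # best response to the sparse candidate
        y = best_y(x)                  # re-optimize y against that x
        val = x @ A @ y
        if val > best_val:
            best_val, best_pair = val, (x, y)
    return best_pair, best_val
```

The enumeration ranges over roughly n^r = n^{O(log n/ε^2)} multisets, which is exactly where the quasi-polynomial running time comes from.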
The above meta-algorithm provides evidence that the approximate variant of all those problems is much easier than solving them exactly: in particular we believe that NP-hard (respectively, PPAD-hard) problems like 3-SAT (resp. END-OF-A-LINE) require approximately 2^n time. This belief is formulated by the Exponential Time Hypothesis, or ETH [Impagliazzo et al. 2001] (resp. ETH for PPAD [Babichenko et al. 2016]). The reasons we believe in ETH are similar to the ones outlined in the previous section for our belief that END-OF-A-LINE is intractable: on one hand, we have a combination of unconditional lower bounds in analogous models and little or no progress on algorithms; on the other hand, we are “forced” to believe it because we have no idea how to prove it (in particular, ETH implies the much weaker P ≠ NP).
However, quasi-polynomial time is still both unrealistic to implement in practice, and does not meet our gold standard of polynomial time in theory. The main question we address in this section is:
Can the quasi-polynomial time be improved to polynomial time?
For Sparse PCA this is indeed possible [Alon et al. 2013, Asteris et al. 2015, Chan et al. 2016]. But for several other problems we can prove that, assuming ETH, quasi-polynomial time is actually necessary:
• Finding a k-subgraph that is almost a clique; this is one of the most fundamental approximation problems in theoretical computer science.
• Finding and counting stable communities in a friendship graph; this problem received a lot of attention in recent years with the emergence of online social networks.
• Finding an approximately optimal signaling scheme; this resolves a recent open question by Dughmi.
• Computing the VC and Littlestone’s Dimensions of a binary concept class, two of the most fundamental quantities in learning theory.
A common approach to all our proofs is the birthday repetition framework due to Aaronson et al. [2014]: construct a reduction from 3-SAT to any of the above problems, with reduction size N ≈ 2^{√n}. Then, assuming ETH, one needs approximately N^{log N} ≈ 2^n time to solve the problem on the larger instance. A key step in the reduction is to consider subsets of roughly √n variables each; by the birthday paradox, any two such subsets are likely to share at least one variable (hence the name “birthday repetition”).
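The birthday-paradox step can be checked with a short calculation; the numbers below are only an illustration of why subsets of size about √n are the right scale.

```python
import math

def share_probability(n, k):
    """Probability that two independent uniformly random k-subsets of an
    n-element variable set intersect (a birthday-paradox calculation)."""
    p_disjoint = math.comb(n - k, k) / math.comb(n, k)
    return 1.0 - p_disjoint

# With n = 10_000 variables and k = 100 = sqrt(n), two random subsets
# intersect with probability about 1 - 1/e, roughly 0.63.
print(share_probability(10_000, 100))
```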
1.2.1 Densest k-Subgraph
k-CLIQUE is one of the most fundamental problems in computer science: given a graph, decide whether it has a fully connected induced subgraph on k vertices. Since it was proven NP-complete by Karp [1972], extensive research has investigated the complexity of its relaxations.
We consider two natural relaxations of k-CLIQUE that have received significant attention from both the algorithmic and complexity communities. The first is to relax the “k” requirement, i.e., to look for a smaller clique: Given an n-vertex graph G containing a clique of size k, find a clique of size at least δk, for some parameter 0 < δ < 1.
The second natural relaxation is to relax the “Clique” requirement, replacing it with the more modest goal of finding a subgraph that is almost a clique: Given an n-vertex graph G containing a clique of size k, find an induced subgraph of G of size k with (edge) density at least 1 − ε, for some parameter 0 < ε < 1.
The first relaxation has been a motivating example throughout a long line of research that laid the foundations for NP-hardness of approximation [Arora and Safra 1998, Arora et al. 1998, Feige et al. 1996, Håstad 1999, Khot 2001, Zuckerman 2007]. In particular, we now know that it is NP-hard to distinguish between a graph that has a clique of size k, and a graph whose largest induced clique is of size at most k′ = δk, where δ = 1/n^{1−ε} [Zuckerman 2007]. Until our work, the computational complexity of the second relaxation remained largely open. There are a couple of (very different) quasi-polynomial algorithms that guarantee finding a (1 − ε)-dense k-subgraph in every graph containing a k-clique: the meta-algorithm by Barman, which we outlined above, and an older algorithm due to Feige and Seltser [1997], but nothing non-trivial was known about hardness of approximation.
In Chapter 12 we prove that, assuming ETH, even if one makes both relaxations the problem remains intractable. In particular, even if the graph contains a clique of size k, it takes quasi-polynomial time to find a (1 − ε)-dense δk-subgraph, for constant ε > 0 and δ = o(1).
1.2.2 Community Detection
Identifying communities is a central graph-theoretic problem with important applications to sociology and marketing (when applied to social networks), biology and bioinformatics (when applied to protein interaction networks), and more (see, e.g., Fortunato’s classic survey [Fortunato 2010]). Defining what exactly is a community remains an interesting problem on its own (see Arora et al. [2012] and Borgs et al. [2016] for excellent treatment from a theoretical perspective). Ultimately, there is no single “right” definition, and the precise meaning of community should be different for social networks and protein interaction networks.
In Chapter 13 we focus on the algorithmic questions arising from one of the simplest and most canonical definitions, which has been considered by several theoretical computer scientists [Arora et al. 2012, Balcan et al. 2013, Braverman et al. 2017, Mishra et al. 2008].
Definition 1.2
(α, β)-community. Given an undirected graph G = (V, E), an (α, β)-community is a subset S ⊆ V that satisfies:
Strong ties inside the community. For every v ∈ S, |({v} × S) ∩ E| ≥ α · |S|; and
Weak ties to nodes outside the community. For every u ∉ S, |({u} × S) ∩ E| ≤ β · |S|.
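Definition 1.2 is easy to check directly; here is a minimal verification sketch (the adjacency-set encoding and function name are our own assumptions):

```python
def is_alpha_beta_community(G, S, alpha, beta):
    """Check whether S is an (alpha, beta)-community per Definition 1.2.
    G: dict mapping each vertex to its set of neighbors (undirected graph);
    S: candidate community, given as a set of vertices."""
    size = len(S)
    for v in S:                       # strong ties inside the community
        if len(G[v] & S) < alpha * size:
            return False
    for u in set(G) - S:              # weak ties from every vertex outside S
        if len(G[u] & S) > beta * size:
            return False
    return True
```

Of course, verifying a candidate community is easy; the hardness results below concern finding (or counting) one.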
This problem has been considered by several researchers before: Mishra et al. [2008] gave a polynomial-time algorithm for finding (α, β)-communities that contain a vertex with very few neighbors outside the community. Balcan et al. [2013] give a polynomial-time algorithm for enumerating (α, β)-communities in the special case where the degree of every node is Ω(n). Arora et al. [2012] consider several semi-random models where the edges inside the community are generated at random, according to the expected degree model. For the general case, the latter paper by Arora et al. gave a simple quasi-polynomial (n^{O(log n)}) time algorithm for detecting (α, β)-communities whenever α − β is at least some positive constant. (Their algorithm is essentially identical to the meta-algorithm for bilinear optimization that we outlined above.)
We show that, for all constants α > β ∈ (0, 1], community detection requires quasi-polynomial time (assuming ETH). For example, when α = 1 and β = 0.01, this means that we can hide a clique C, such that every single vertex not in C is connected to at most 1% of C. Our main result is actually a much stronger inapproximability: even in the presence of a (1, o(1))-community, finding any (β + o(1), β)-community is hard.
Unlike all the quasi-polynomial approximation schemes mentioned above, Arora et al.’s algorithm has the unique property that it can also exactly count all the (α, β)-communities. Our second result is that even counting the number of (1, o(1))-communities requires quasi-polynomial time. A nice feature of this result is that we can base it on the much weaker #ETH assumption, which asserts that counting the satisfying assignments of a 3-SAT instance requires time 2^{Ω(n)}. (Note, for example, that #ETH is likely to be true even if P = NP.)
1.2.3 VC and Littlestone’s Dimensions
A common and essential assumption in learning theory is that the concepts we want to learn come from a nice, simple concept class, or (in the agnostic case) that they can at least be approximated by a concept from a simple class. When the concept class is sufficiently simple, there is hope for good (i.e., sample-efficient and low-error) learning algorithms.
There are many different ways to measure the simplicity of a concept class. The most influential measure of simplicity is the VC Dimension [Vapnik and Chervonenkis 1971], which captures learning in the Probably Approximately Correct (PAC) model. We also consider Littlestone’s Dimension [Littlestone 1987], which corresponds to minimizing mistakes in online learning (see Section 2.5 for definitions). When either dimension is small, there are algorithms that exploit the simplicity of the class to obtain good learning guarantees.
In Chapter 14 we consider the algorithmic challenge of computing either dimension. In particular, we study the most optimistic setting, where the entire universe and concept class are given as explicit input (a binary matrix whose (x, c)-th entry is 1 iff element x belongs to concept c). In this model, both dimensions can be computed in quasi-polynomial time. Interestingly, the algorithm does not go through the bilinear optimization problem; instead, it exploits the fact that for a concept class C, both dimensions are bounded by log |C|. Two decades ago, it was shown that quasi-polynomial time is indeed necessary for both dimensions [Frances and Litman 1998, Papadimitriou and Yannakakis 1996]. The computational intractability of computing the (VC, Littlestone’s) dimension of a concept class suggests that even in cases where a simple structure exists, it may be inaccessible to computationally bounded algorithms.
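To illustrate the log |C| bound, here is a hedged sketch of the brute-force computation of the VC Dimension in this explicit-matrix model (essentially the quasi-polynomial enumeration alluded to above, in our own Python rendering):

```python
from itertools import combinations

def vc_dimension(matrix):
    """VC Dimension of an explicitly given concept class.
    matrix[c][x] = 1 iff element x belongs to concept c (one row per concept).
    A shattered set of size d needs 2^d distinct row patterns, so d <= log2 |C|;
    enumerating all subsets up to that size takes roughly |U|^{log |C|} time."""
    universe = range(len(matrix[0]))
    max_d = len(matrix).bit_length()          # upper bound on log2 |C|
    best = 0
    for d in range(1, max_d + 1):
        for subset in combinations(universe, d):
            patterns = {tuple(row[x] for x in subset) for row in matrix}
            if len(patterns) == 2 ** d:       # this subset is shattered
                best = d
                break                         # try to find a larger shattered set
        else:
            return best                       # no shattered set of size d exists
    return best
```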
Theorems 14.1 and 14.2 extend the results of Frances and Litman [1998] and Papadimitriou and Yannakakis [1996] to show that the VC and Littlestone’s Dimensions cannot even be approximately computed in polynomial time.
1.2.4 Signaling
Many classical questions in economics involve extracting information from strategic agents. Lately, there has been growing interest within algorithmic game theory in signaling: the study of how to reveal information to strategic agents (see, e.g., [Cheng et al. 2015, Dughmi 2014, Dughmi et al. 2013, Emek et al. 2014, Miltersen and Sheffet 2012] and references therein). Signaling has been studied in many interesting economic and game theoretic settings. Among them, ZERO-SUM SIGNALING proposed by Dughmi [2014] stands out as a canonical problem that cleanly captures the computational nature of signaling. In particular, focusing on zero-sum games clears away issues of equilibrium selection and computational tractability of finding an equilibrium.
Definition 1.3
ZERO-SUM SIGNALING [Dughmi 2014]. Alice and Bob play a Bayesian zero-sum game where the payoff matrix M is drawn from a publicly known prior. The signaler Sam privately observes the state of nature (i.e., the payoff matrix), and then publicly broadcasts a signal φ(M) to both Alice and Bob. Alice and Bob Bayesian-update their priors according to φ(M) and play the Nash equilibrium of the expected game; but they receive payoffs according to the true M. Sam’s goal is to design an efficient signaling scheme φ (a function from payoff matrices to strings) that maximizes Alice’s expected payoff.
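As a hedged illustration of the model, the sketch below evaluates a given deterministic signaling scheme: for each signal it solves the posterior-expected zero-sum game by linear programming and averages Alice’s value. All names, and the restriction to deterministic schemes, are our own simplifications.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(M):
    """Value of the zero-sum game with payoff matrix M for the row (max) player."""
    n, m = M.shape
    c = np.zeros(n + 1)
    c[-1] = -1.0                                   # maximize v  <=>  minimize -v
    A_ub = np.hstack([-M.T, np.ones((m, 1))])      # v <= (x^T M)_j for every column j
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])                         # x is a probability distribution
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return -res.fun

def signaling_value(prior, matrices, scheme):
    """Alice's expected payoff under a deterministic signaling scheme.
    prior[i]: probability of state i; matrices[i]: payoff matrix in state i;
    scheme[i]: the signal broadcast in state i (assumed to have positive
    prior probability). For each signal the players play the zero-sum game
    given by the posterior-expected matrix; by linearity, Alice's expected
    payoff equals the value of that expected game."""
    total = 0.0
    for sig in set(scheme):
        states = [i for i, s in enumerate(scheme) if s == sig]
        p_sig = sum(prior[i] for i in states)
        posterior_avg = sum(prior[i] * matrices[i] for i in states) / p_sig
        total += p_sig * zero_sum_value(posterior_avg)
    return total
```

Sam’s algorithmic problem is to choose the scheme maximizing this quantity; the hardness result below concerns approximating that maximum.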
Dughmi’s main result proves that assuming the hardness of the PLANTED CLIQUE problem, there is no additive Fully Polynomial Time Approximation Scheme (FPTAS) for ZERO-SUM SIGNALING. The main open question left by Dughmi [2014] is whether an additive PTAS exists. Here we answer this question in the negative: we prove that assuming the ETH [Impagliazzo et al. 2001], obtaining an additive ε-approximation (for some constant ε > 0) requires quasi-polynomial time (roughly n^{Ω̃(log n)}). This result is tight thanks to a recent quasi-polynomial time algorithm by Cheng et al. [2015]. Another important advantage of our result is that it replaces the hardness of PLANTED CLIQUE with a more believable worst-case hardness assumption (see, e.g., the discussion in Braverman et al. [2015]).
1.3 Approximate Nash Equilibrium
The main result in this book rules out polynomial-time approximation schemes (PTAS) for two-player Nash equilibrium. Consider a game between two players, each choosing between a large number (n) of actions. The inputs to the algorithm are two n × n matrices with entries in [0, 1]. The goal is to find, for every constant ε, an ε-approximate Nash equilibrium; i.e., a mixed strategy for each player, such that either player can gain at most ε by deviating to a different strategy.
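For concreteness, here is a minimal sketch of the verification side of this definition (checking a candidate solution; the hard problem, of course, is finding one):

```python
import numpy as np

def is_eps_approximate_nash(R, C, x, y, eps):
    """Check whether mixed strategies (x, y) form an eps-approximate Nash
    equilibrium of the bimatrix game (R, C): neither player can gain more
    than eps by deviating. R, C: n x n payoff matrices with entries in [0, 1];
    x, y: probability vectors. The best deviation over mixed strategies is
    always achieved by a pure strategy, so checking pure deviations suffices."""
    row_payoff = x @ R @ y
    col_payoff = x @ C @ y
    best_row_deviation = np.max(R @ y)    # best pure response for the row player
    best_col_deviation = np.max(x @ C)    # best pure response for the column player
    return (best_row_deviation - row_payoff <= eps and
            best_col_deviation - col_payoff <= eps)
```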
This has been the central open question in equilibrium computation for the past decade. There were good reasons to be hopeful: there was a quasi-polynomial time algorithm [Lipton et al. 2003], a series of improved approximation ratios [Bosse et al. 2010, Daskalakis et al. 2007, 2009b, Kontogiannis et al. 2009, Tsaknakis and Spirakis 2008], and several approximation schemes for special cases [Alon et al. 2013, Barman 2015, Daskalakis and Papadimitriou 2009, Kannan and Theobald 2007]. Our main result settles this question in the negative:
Theorem 1.1
Main theorem. There exists a constant ε > 0 such that, assuming ETH for PPAD, finding an ε-approximate Nash equilibrium in a two-player n × n game requires time n^{log^{1−o(1)} n}.
We supplement Theorem 1.1 with a series of other hardness of approximation results for Nash equilibrium in related settings, further establishing the point that even approximate Nash equilibria are intractable. First, we consider different sources of complexity. Our main result shows intractability of approximate Nash equilibria when the complexity of the game arises from each player choosing among many actions. In Theorems 5.1 and 5.2, we prove that in games where each player only has two actions, the complexity can arise from a large number of players; finding an approximate Nash equilibrium in n-player, binary action games is PPAD-hard (settling another open question from Daskalakis’s thesis [Daskalakis 2008]). Alternatively, even if there are only two players and the number of actions is constant, a different source of complexity can be the players’ uncertainty; finding an approximate Bayesian Nash equilibrium in such incomplete information games is also PPAD-hard (Corollary 8.1).
We also prove intractability in different models: query complexity, communication complexity, and uncoupled dynamics (settling a long list of open questions from Babichenko [2012, 2016], Chen et al. [2017], Fearnley et al. [2013], Hart and Mansour [2010], Hart and Nisan [2013], Nisan [2009a], and Roughgarden and Weinstein [2016]). The main advantage of these results is that they are unconditional, i.e., they do not rely on complexity assumptions such as ETH for PPAD, or P ≠ NP. In particular, in the setting where each player knows her own utility function, even computationally unbounded players have to communicate almost all their private information in order to reach even an approximate equilibrium.
1. In fact, “more than six decades” is an understatement: Irving Fisher’s thesis from 1891 dealt with the closely related question of convergence to market equilibria [Brainard and Scarf 2000].
2. Assuming P ≠ PPAD; see the discussion in the next section about this assumption.
3. Under a complexity assumption stronger than P ≠ PPAD that we call the Exponential Time Hypothesis (ETH) for PPAD; see Section 1.2 for details.
4. This guarantee is actually without loss of generality [Papadimitriou 1994], but this is not so important for our purposes.
5. Here work is measured by the number of oracle calls rather than running time; indeed this model will be the starting point of our reductions for query and communication complexity.
6. There is also a stricter notion that requires x to be close to an x* for which f(x*) = x* exactly. See, e.g., Etessami and Yannakakis [2010].
7. Both the approximate Caratheodory’s theorem and the resulting algorithm have been described by many researchers (e.g., [Arora et al. 2012, Lipton et al. 2003]); however, the presentation in Barman [2015] is our favorite.