Theory of Computing Blog Aggregator

The Paul G. Allen School of Computer Science & Engineering at the University of Washington has multiple tenure-track positions in a wide variety of technical areas in both Computer Science and Computer Engineering. We welcome applicants in all research areas in Computer Science and Computer Engineering. Applications received by December 15, 2017 will be given priority consideration.


by shacharlovett at November 22, 2017 01:51 AM UTC

UBC CS has positions: Postdoc in data understanding/intelligence with Prof. Rachel Pottinger (Applications or questions should be sent via email to, 3 assistant professors in any area ( ) and a tenure-track teaching position ( ). Vancouver is beautiful, the faculty is strong, and Rachel is an excellent mentor. 

by metoo ( at November 21, 2017 08:17 PM UTC

For researchers interested in data science, broadly construed, including those with both a methodological and applications focus, and who are independent and will seek out collaborations with other fellows and Harvard faculty. Fellows provided with the opportunity to pursue their research agenda in an intellectually vibrant environment with ample mentorship.

Normal fellowship duration: 2 years.


by shacharlovett at November 21, 2017 07:12 PM UTC

To give a Hilldale Lecture and learn about fairness and dichotomies

UB CSE50 anniversary source

Jin-Yi Cai was kind enough to help get me, Dick, invited last month to give the Hilldale Lecture in the Physical Sciences for 2017-2018. These lectures are held at The University of Wisconsin-Madison and are supported by the Hilldale Foundation. The lectures started in 1973-1974, which is about the time I started at Yale University—my first faculty appointment.

Today Ken and I wish to talk about my recent visit, discuss new ideas of algorithmic fairness, and then appreciate something about Jin-Yi’s work on “dichotomies” between polynomial time and {\mathsf{\#P}}-completeness.

The Hilldale Lectures have four tracks: Arts & Humanities and Physical, Biological, and Social Sciences. Not all have a speaker each year. The Arts & Humanities speaker was Yves Citton of the University of Paris last April, and in Social Sciences, Peter Bearman of Columbia spoke on Sept. 28, three weeks before I. Last year’s speaker in the Physical Sciences track was Frank Shu of Berkeley and UCSD on the economics of climate change. Before him came my former colleagues Peter Sarnak and Bill Cook. Dick Karp was invited in 2004, and mathematicians Barry Mazur and Persi Diaconis came between him and Cook.

I am delighted and honored to be in all this company. We all may not seem to have much to do with the physical sciences but let’s see. I spoke on ({\cdots}) a subject for another time—let this post be about my hosts.

My Visit

The other highlight of my visit was meeting with the faculty of the CS department and also with the graduate students. I hope they enjoyed our discussions as much as I did.

One topic that came up multiple times is the notion of fair algorithms. This is a relatively new notion and is being studied by several researchers at Wisconsin. The area has its own blog. The authors of that blog (one I know well uses his other blog’s name as his name there—are we to become blogs?) also wrote a paper titled, “On the (Im)Possibility of Fairness,” whose abstract we quote:

What does it mean for an algorithm to be fair? Different papers use different notions of algorithmic fairness, and although these appear internally consistent, they also seem mutually incompatible. We present a mathematical setting in which the distinctions in previous papers can be made formal. In addition to characterizing the spaces of inputs (the “observed” space) and outputs (the “decision” space), we introduce the notion of a construct space: a space that captures unobservable, but meaningful variables for the prediction. We show that in order to prove desirable properties of the entire decision-making process, different mechanisms for fairness require different assumptions about the nature of the mapping from construct space to decision space. The results in this paper imply that future treatments of algorithmic fairness should more explicitly state assumptions about the relationship between constructs and observations.

There is a notion of fairness in distributed algorithms but this is different. The former is about the allocation of system resources so that all tasks receive due processing attention. The latter has to do with due process in social decision making where algorithmic models have taken the lead. Titles of academic papers cited in a paper by three of the people I met in Madison and someone from Microsoft (see also their latest from OOPSLA 2017) speak why the subject has arisen:

  • “Hiring by algorithm: predicting and preventing disparate impact.”

  • “Predictive policing: The role of crime forecasting in law enforcement operations.”

  • “Three naive Bayes approaches for discrimination-free classification.”

  • “Algorithmic transparency via quantitative input influence.”

  • “Discrimination in online ad delivery.”

Also among the paper’s 29 references are newspaper and magazine articles whose titles state the issues with less academic reserve: “Websites vary prices, deals based on users’ information”; “Who do you blame when an algorithm gets you fired?”; “Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks”; “The dangers of letting algorithms enforce policy.” Evan a May, 2014 statement by the Obama White House is cited.

Yet also among the references are papers familiar in theory: “Satisfiability modulo theories”; “Complexity of polytope volume computation” (by Leonid Khachiyan no less), “On the complexity of computing the volume of a polyhedron”; “Hyperproperties” (by Michael Clarkson and Fred Schneider), “On probabilistic inference by weighted model counting.” What’s going on?

The Formal Method

What’s going on can be classed as a meta-example of the subject’s own purpose:

How does one formalize a bias-combating concept such as fairness without instilling the very kind of bias one is trying to combat?

We all can see the direction of bias in the above references. You might think that framing concepts to apply bias in the other direction might be OK but there’s a difference. Bias in a measuring apparatus is more ingrained than bias in results. What we want to do—as scientists—is to formulate criteria that are framed in terms apart from those of the applications in a simple, neutral, and natural manner. Then we hope the resulting formal definition distinguishes the outcomes we desire from those we do not and stays robust and consistent in its applications.

This is the debate—at ‘meta’ level—that Ken and I see underlying the two papers we’ve highlighted above. We blogged about Kenneth Arrow’s discovery of the impossibility of formalizing a desirable “fairness” notion for voting systems. The blog guys don’t find such a stark impossibility theorem but they say that to avoid issues with analyzing inputs and outcomes, one has to attend also to some kind of “hidden variables.” The paper by Madison people tries to ground a framework in formal methods for program verification, which is connects to probabilistic inference via polytope volume computations.

Many other ingredients from theory can be involved. The basic idea is determining sensitivity of outcomes to various facets of the inputs. The inputs are weighted for relevance to an objective. Fairness is judged according to how well sensitivity corresponds to relevance and also to how the distribution of subjects receiving favorable decisions breaks according to low-weight factors such as gender. Exceptions may be made by encoding some relations as indelible constraints—the Madison plus Microsoft paper gives as an example that a Catholic priest must be male.

Thus we see Boolean sensitivity measures, ideas of juntas and dictators, constraint satisfaction problems, optimization over polytopes—lots of things I’ve known and sometimes studied in less-particular contexts. My Madison hosts brought up how Gaussian distributions are robust for this analysis because they have several invariance properties including under rotations of rectangles. This recent Georgetown thesis mixes in even more theory ideas. The meta-question is:

Can all these formal ingredients combine to yield the desired outcomes in ways whose scientific simplicity and naturalness promote confidence in them?

Thus wading in with theory to a vast social area like this strikes us as a trial of “The Formal Method.” Well, there is the Hilldale Social Sciences track…

Did “hidden variables” bring quantum to your mind? We are going there next, with Ken writing now.

A Magic Madison Upper Bound

We covered Jin-Yi’s work in 2014 and 2012 and 2009. So you could say in 2017 we’re “due” but we’ll take time for more-topical remarks. All of these were on dichotomy theorems. For a wide class {\mathcal{C}} of counting problems in {\mathsf{\#P}} his dichotomy is that every problem in {\mathcal{C}} either belongs to polynomial time or is {\mathsf{\#P}}-complete. There are no in-between cases—that’s the meaning of dichotomy.

Jin-Yi’s answer to a question last month between Dick and me recently brought home to us both how wonderfully penetrating and hair-trigger this work is. Dick had added some contributions to the paper covered in the 2009 post and was included on the paper’s final 2010 workshop version. Among its results is the beautiful theorem highlighted at the end of that post:

Theorem 1 There is an algorithm that given any {K > 0} and formula for a quadratic polynomial {f(x_1,\dots,x_n)} over {\mathbb{Z}_K} computes the exponential sum

\displaystyle  Z(f) = \sum_{x_1,\dots,x_n \in \mathbb{Z}_K} \omega^{f(x_1,\dots,x_n)} = \sum_{j=0}^{K-1} F_j \omega^j

exactly in time that is polynomial in both {n} and {\log K}. Here {\omega = e^{2\pi i/K}} and {F_j} means the number of arguments on which {f} takes the value {j} modulo {K}.

That the time is polynomial in {\log K} not {K} is magic. We can further compute the individual weights {F_j} by considering also the resulting expressions for {Z(2f),Z(3f),\dots,Z((K-1)f)}. Together with {\sum_j F_j = K^n} they give us {K} equations in the {K} unknowns {F_j} in the form of a Vandermonde system, which is always solvable. Solving that takes time polynomial in {K}, though, and we know no faster way of computing any given {F_j}.

When {K} is fixed, however, polynomial in {n} is all we need to say. So the upshot is that for any fixed modulus, solution counting for polynomials of degree {d = 2} is in {\mathsf{P}}. Andrzej Ehrenfeucht and Marek Karpinski proved this modulo primes and also that the solution-counting problem for degree {d = 3} is {\mathsf{\#P}}-complete even for {K = 2}. So the flip from {\mathsf{P}} to {\mathsf{\#P}}-complete when {d} steps from {2} to {3} is an older instance of dichotomy. The newer one, however, is for polynomials of the same degree {d = 2}.

Dichotomy and Quantum

I wrote a post five years ago on his joint work with Amlan Chakrabarti for reducing the simulation of quantum circuits to counting solutions in {\{0,1\}^n}. One motive is which subclasses of quantum circuits might yield tractable cases of counting. The classic—pun intended—case is the theorem that all circuits of so-called Clifford gates can be simulated in classical polynomial time (not even randomized). I observed that such circuits yield polynomials {f} over {\mathbb{Z}_4} that are sums of terms of the form

\displaystyle  x^2 \qquad\text{or}\qquad 2xy.

These terms are invariant on replacing {x} by {x+2} or {y} by {y+2} modulo {4}. Hence for such {f} there is an exactly {2^n}-to-{1} correspondence between solutions in {\{0,1\}^n} and those in {\mathbb{Z}_4^n}. Since counting the latter is in {\mathsf{P}} by Theorem 1, the theorem for Clifford gates follows.

Adding any non-Clifford gate makes a set that is universal—i.e., has the full power of {\mathsf{BQP}}. The gates I thought of at the time of my post all bumped the degree up from {2} to {3} or more. A related but different representation by David Bacon, Wim van Dam, and Alexander Russell gives a dichotomy of linear versus higher degree. The controlled-phase gate, however, is non-Clifford but in my scheme produces polynomials {f'} as sums of terms of the form

\displaystyle  x^2 \qquad\text{or}\qquad xy.

Those are quadratic too, so Theorem 1 counts all the solutions in polynomial time. Does this make {\mathsf{BQP = P}}? The hitch is that quantum needs counting binary solutions—and having {xy} not {2xy} defeats the above exact correspondence.

I thought maybe the counting problem for quadratic-and-binary could be intermediate—perhaps at the level of {\mathsf{BQP}} itself. But Jin-Yi came right back with the answer that his dichotomy cuts right there: this 2014 paper with his students Pinyan Lu and Mingji Xia has a general framework for CSPs that drops down to say the problem is {\mathsf{\#P}}-complete. A more-recent paper of his with Heng Guo and Tyson Williams lays out the connection to Clifford gates specifically, proving an equivalence to the condition called “affine” in his framework which renders counting tractable. Thus the state of play is:

  1. Counting all solutions of quadratic polynomials over {\mathbb{Z}_4} is easy.

  2. Counting the binary solutions however is hard: {\mathsf{\#P}}-complete.

  3. Counting the binary solutions when the coefficient of all terms of the form {xy} is {2} is easy again.

Thus the difference between easy and general quantum circuits hangs on that ‘{2}‘—a coefficient, not an exponent—as does factoring (not?) being in {\mathsf{P}}. Of course this doesn’t mean quantum circuits are {\mathsf{\#P}}-complete—they are generally believed not to be even {\mathsf{NP}}-hard—but that saying {f'} has terms of the form {x^2} and {xy} captures ostensibly more than {\mathsf{BQP}}.

Might there be some other easily-identified structural properties of the {f'} produced (say) by circuits of Hadamard and controlled-phase gates that make the counting problem intermediate between {\mathsf{P}} and {\mathsf{\#P}}, if not exactly capturing {\mathsf{BQP}}? Well, the dichotomy grows finer and richer and stronger with each new paper by Jin-Yi and his group. This feeds in to Jin-Yi’s most subtle argument for believing {\mathsf{\#P \neq P}} as we said it here, echoing reasons expounded by Scott Aaronson and others for {\mathsf{NP \neq P}}: if the classes were equal there would be no evident basis for our experiencing such fine and sharp distinctions.

Open Problems

Trying to make up for blog pause while we were busy with a certain seasonal event, we offer several open problems:

  • Is there a ‘fair’ way to define algorithmic fairness?

  • What are the most important structural and numerical properties from theory to use in such a definition and key results about it?

  • Can solution counting modulo {K} be solved in time polynomial in {n} and {\log K}? We may suppose the polynomial {f} has zero constant term, so that we get questions both for {j} fixed and {j} given with the instance.

  • What structural properties of the polynomials {f} arising from quantum circuits might capture {\mathsf{BQP}} or at least evade the dichotomy?

  • The exponential sum in Theorem 1 has long-known physical meaning as a partition function. It is hence reasonable to expect that algebraic-geometric properties of the polynomial {f} involved—and of polynomial ideals formed by its derivatives—govern physical properties. The final point of Ken’s 2012 post was to note that the property of geometric degree underlies the super-linear arithmetic complexity lower bounds of Walter Baur and Volker Strassen, which we expounded here. When {f} comes from a quantum circuit, what might this property say about its computations?

[“stabilizer theorem”->”theorem for Clifford gates”; added links to OOPSLA 2017 paper and May 2017 Cai-Guo-Williams paper]

by RJLipton+KWRegan at November 21, 2017 04:59 AM UTC

I just received an advanced copy of my very first book: “Gina Says: Adventures in the Blogsphere String War” published by Word Scientific. It is a much changed version compared to the Internet version of 8 years ago and it contains beautiful drawings by my daughter Neta Kalai. How exciting!

Here is the World Scientific page, and here is its Amazon page (paperback)  and Amazon page (Kindle).


by Gil Kalai at November 20, 2017 05:14 PM UTC

By now as you've read from Luca or Scott or PhD Comics or a variety of other sources on the dangerous changes to the tax code that passed the US House of Representatives last week. Among a number of university unfriendly policies, the tax code will eliminate the tax exemption for graduate student tuition for students supported with teaching or research duties, nearly every PhD student in STEM fields. The CRA, ACM, IEEE, AAAI, SIAM and Usenix put out a joint statement opposing this tax increase on graduate students. This is real.

Without other changes, a tax on tuition will make grad school unaffordable to most doctoral students. In computer science where potential PhD students can typically get lucrative jobs in industry, we'll certainly see a precipitous drop in those who choose to continue their studies. Universities will have to adjust by lower tuition, if finances and state law allows, and raising stipends. US government science funding will at best remain flat so in almost any scenario we'll see far fewer students pursue PhD degrees particularly in CS and STEM fields. Keep in mind we already don't come close to producing enough CS PhD students entering academia to meet the dramatically growing demand and these moves could frustrate faculty who also might head off to industry.

The current senate proposal leaves the exemption in place though no one can predict what will happen the the two bills get reconciled. In the best case scenario this bill goes the same way as the failed health care reform but republicans seem desperate to pass something major this fall. So reach out to your representatives, especially your senators, and express the need to leave in the exemption.

by Lance Fortnow ( at November 20, 2017 01:15 PM UTC

An projective configuration is a collection of points and lines in the plane, so that each two points belong to the same number of lines and each two lines contain the same number of points. When that same number is three for both points and lines, then the numbers of points and lines must be equal; call this number and call the configuration an -configuration. Here’s an example, with :


What makes this particular configuration interesting is a property it shares with the Fano configuration (which I can’t draw with straight lines in the plane) but not with some other configurations that I can draw, such as the Pappus configuration or Desargues configuration. That property is that, if I give the points two colors, as I did in the illustration, then at least one of the lines will be monochromatic. For instance, there’s an all-red line in the illustration. Let’s call a configuration with this property “uncolorable”, for short.

Uncolorability of the Fano configuration follows from the fact that every two points belong to a three-point line. If its seven points are colored with four red points and three yellow points, then each of the six pairs of red points must form a separate line with one yellow point, or else we would have a red line. But if they do, there is only one line left out of the seven that can cover all three pairs of yellow points, so we have a yellow line instead. Similar reasoning shows that five-two and six-one splits also don’t work.

If you have two uncolorable - and -configurations, you can glue them into a single uncolorable -configuration, as follows: delete one point from the -configuration and one line from the -configuration, leaving three lines and three points with too few incidences. Then, move the points and lines around (preserving their other incidences) so that the three neighboring lines from the configuration each contain one of the three neighboring points from the configuration. It may not be clear how to perform this geometrically in all cases, but at least the abstract pattern of incidences that one should obtain as a result is clear.

If we could color the resulting glued configuration, avoiding monochromatic lines, with the three gluing points not all given the same color, then the same coloring could be used on the configuration, and the deleted line would also be non-monochromatic. On the other hand, if we could color the glued configuration, avoiding monochromatic lines and giving the three gluing points the same color as each other, then the same coloring could be used on the configuration. In this case, giving the same color as the gluing points to the deleted point preserves the non-monochromatic coloring of the lines. So if the - and -configurations are both uncolorable without monochromatic lines, then the glued -configuration is also uncolorable.

In 1894, Steinitz showed in his dissertation that any abstract -configuration can be drawn in the plane with at most one line non-straight. Using this fact, for any abstract glued configuration, one can realize the configuration making the non-straight line be the deleted one. By projective duality, one can realize the configuration in such a way that all its triples of lines meet at the points of the configuration except for at one point, the deleted point. Using a projective transformation to match up the three points of one configuration with the three lines of the other shows that every glued configuration can be realized by straight lines in the plane.

The configuration shown in the illustration is what you get when you glue together two copies of the Fano configuration. (In moving it around to make it more legible, I lost track of which points and lines came from which side of the gluing, and I don’t know whether that can be reconstructed from the drawing or whether there are multiple valid partitions. The partition is certainly not given by the colors of the points.) Since the Fano configuration is uncolorable, so is this -configuration, and so are the infinitely many configurations obtained by gluing together more than two Fanos.


by David Eppstein at November 19, 2017 05:31 PM UTC

If and when you emerged from your happiness bubble to read the news, you’ll have seen (at least if you live in the US) that the cruel and reckless tax bill has passed the House of Representatives, and remains only to be reconciled with an equally-vicious Senate bill and then voted on by the Republican-controlled Senate.  The bill will add about $1.7 trillion to the national debt and raise taxes for about 47.5 million people, all in order to deliver a massive windfall to corporations, and to wealthy estates that already pay some of the lowest taxes in the developed world.

In a still-functioning democracy, those of us against such a policy would have an intellectual obligation to seek out the strongest arguments in favor of the policy and try to refute them.  By now, though, it seems to me that the Republicans hold the public in such contempt, and are so sure of the power of gerrymandering and voter restrictions to protect themselves from consequences, that they didn’t even bother to bring anything to the debate more substantive than the schoolyard bully’s “stop punching yourself.”  I guess some of them still repeat the fairytale about the purpose of tax cuts for the super-rich being to trickle down and help everyone else—but can even they advance that “theory” anymore without stifling giggles?  Mostly, as far as I can tell, they just brazenly deny that they’re doing what they obviously are doing: i.e., gleefully setting on fire anything that anyone, regardless of their ideology, could recognize as the national interest, in order to enrich a small core of supporters.

But none of that is what interests me in this post—because it’s “merely” as bad as, and no worse than, what one knew to expect when a coalition of thugs, kleptocrats, and white-nationalist demagogues seized control of Hamilton’s and Jefferson’s experiment.  My concern here is only with the “kill shot” that the Republicans have now aimed, with terrifying precision, at the system that’s kept American academic science the envy of the world in spite of the growing dysfunction all around it.

As you’ve probably heard, one of the ways Republicans intend to pay for their tax giveaway, is to change the tax code so that graduate students will now need to pay taxes on “tuition”—a large sum of money (as much as $50,000/year) that PhD students never actually see, that can easily exceed the stipends they do see, and that’s basically just an accounting trick that serves the internal needs of universities and granting agencies.  Again, to eliminate any chance of misunderstanding: PhD students, who are effectively low-wage employees, already pay taxes on their actual stipends.  The new proposal is that they’ll also have to pay taxes on a whopping, make-believe “X” on their payroll sheet that’s always exactly balanced out by “-X.”

For detailed analyses of the impacts, see, e.g. Luca Trevisan’s post or Inside Higher Ed or the Chronicle of Higher Ed or Vox or NPR.  Briefly, though, the proposal would raise taxes by a few thousand dollars per year, or in some cases as much as $10,000 per year (!), on PhD students who already live hand-to-mouth-to-ramen-bowl, with the largest impact falling on students in STEM fields.  For many students who aren’t independently wealthy, this could push a PhD beyond the realm of affordability, and cause them to leave academia or to do their graduate work in other countries.

“But isn’t there some workaround?”  Indeed, financial ignoramus that I am, my first reaction was to ask: if PhD tuition is basically an accounting fiction anyway, then why can’t the universities just declare that the tuition in question no longer exists, or is now zero dollars?  Feel free to explain further in the comments if you understand this stuff, but as far as I can tell, the answer is: because PhD tuition is used to calculate how much “tax” the universities can take from professors’ grant money.  If universities could no longer take that tax, and they had no other way to make up for it, then except for the richest few universities, they’d have to scale back research and teaching pretty drastically.  To avoid that outcome, the universities would be relying on the granting agencies to let them keep taking the overhead they needed to operate, even though the “PhD tuition” no longer existed.  But the granting agencies aren’t set up for this: you just can’t throw a bomb into one part of a complicated bureaucratic machine built up over decades, and expect the machine to continue working with no disruption to science.

But more ominously: as my friend Daniel Harlow and many others pointed out, it’s hard to look at the indefensible, laser-specific meanness of this policy, without suspecting that for many in Congress, the destruction of American higher education isn’t a regrettable byproduct, but the goal—just another piece of red meat to throw to the base.  If so, then we’d expect Congress to direct federal granting agencies not to loosen their rules about overhead, thereby forcing the students to pay the tax, and achieving the desired destruction.  (Note that the Trump administration has already made tightening overhead rules—i.e., doing the exact opposite of what would be needed to counteract the new tax—a central focus of its attempt to cut federal research funding.)

OK, two concluding thoughts:

  1. When Republicans in Congress defended Trump’s travel ban, they at least had the craven excuse that they were only following the lead of the populist strongman who’d taken over their party.  Here they don’t even have that.  As far as I know, this targeted destruction of American higher education was Congress’s initiative, not Trump’s—which to me, underscores again the feather-thinness of any moral distinction between the Vichy GOP leadership and the administration with which it collaborates.  Trump didn’t emerge from nowhere.  It took decades of effort—George W. Bush, Sarah Palin, Karl Rove, Rush Limbaugh, Mitch McConnell, and all the rest—to transform the GOP into the pure seething cauldron of anti-intellectual resentment and hatred that we know today.
  2. Given the existential risk to American higher education, why didn’t I blog about this earlier?  The answer is embarrassing to admit, and reflects no credit on me.  It’s simply that I didn’t believe it—even given all the other stuff that could “never happen in the US,” until it happened this past year.  I didn’t believe it, not because it was too far from me but because it was too close—because if true, it would mean the crippling of the research world in which I’ve spent most of my life since age 15, so therefore it couldn’t be true.  Surely even the House Republicans would realize they’d screwed up this time, and would take out this crazy provision before the full bill was voted on?  Or surely there’s some workaround that makes the whole thing less awful than it sounds?  There has to be … right?

Anyway, what else is there to say, except to call your representative, if you’re American and still have the faith in the system that such an act implies.

by Scott at November 17, 2017 11:44 PM UTC

Many believe that BPP $=$ P "should" hold for Turing machines. We even have some "witnesses" for this: otherwise some "strange" things would happen; see e.g. this paper by Implagliazzo and Wigderson. But the proof of BPP $=$ P remains still not in sight. The main difficulty seems here to lie in the uniformity: we have one randomized algorithm working for inputs of every dimension (length) $n$, and we want to also find one deterministic algorithm working for inputs of every dimension $n$.

It is long known, at least since Adleman's theorem, that BPP $\subseteq$ P/poly holds in the non-uniform setting (even for algorithms computing functions over infinite domains, like arithmetic circuits working over all real numbers). The disadvantage of these results is, however, that after the derandomizing, we get a sequence of deterministic algorithms, that can be different for inputs from different dimensions $n$.

But if we are unable to show BPP $=$ P in the model of Turing machines, can we at least show this in some restricted uniform, but still "practically relevant" models of computation? One of such models is that of dynamic programming algorithms. At a high (extremely high) level, the model of DP algorithms can be described as follows. We parameterize a given problem $P$ by one (or more) parameters $n$ ("dimension" of inputs), consider the resulting "subproblems" $P_1,P_2,\ldots$ and give one (or more) recurrence equations

$P_n(x)$ = minimum or maximum of some arithmetic combination of the subproblems $P_m(x)$ for $m < n $.

In a randomized DP algorithm, we allow some additional random variables be used as parameters in the recursion equations. Even if very vaguely defined, this model is uniform: we have one DP recurrence for all dimension $n$. But the model is "clearly" much weaker than the (universal) model of Turing machines: no loops, no "crazy" operations - just min or max combined with arithmetic operations.

Question: Is BPP = P known for (at least restricted) DP algorithms?

What, say, about DP algorithms that only use $\min,+$ or $\max,+$ operations in their recursion equations? Actually, I am not aware of any proof of BPP = P in any "non-pathological", at least "somewhat interesting" but uniform model of computation (whatever these "non-pathological" and "somewhat interesting" should mean).

N.B. I have no "stomach feeling" concerning the issue of uniformity in computation. So, any hints/references even to "well-known" results, are welcome.

by Stasys at November 17, 2017 09:14 PM UTC

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

1 Lectures 12-13, Scribe: Giorgos Zirdelis

In these lectures we study the communication complexity of some problems on groups. We give the definition of a protocol when two parties are involved and generalize later to more parties.

Definition 1. A 2-party c-bit deterministic communication protocol is a depth-c binary tree such that:

  • the leaves are the output of the protocol
  • each internal node is labeled with a party and a function from that party’s input space to \{0,1\}

Computation is done by following a path on edges, corresponding to outputs of functions at the nodes.

A public-coin randomized protocol is a distribution on deterministic protocols.

2 2-party communication protocols

We start with a simple protocol for the following problem.

Let G be a group. Alice gets x \in G and Bob gets y \in G and their goal is to check if x \cdot y = 1_G, or equivalently if x = y^{-1}.

There is a simple deterministic protocol in which Alice simply sends her input to Bob who checks if x \cdot y = 1_G. This requires O(\log |G|) communication complexity.

We give a randomized protocol that does better in terms on communication complexity. Alice picks a random hash function h: G \rightarrow \{0,1\}^{\ell }. We can think that both Alice and Bob share some common randomness and thus they can agree on a common hash function to use in the protocol. Next, Alice sends h(x) to Bob, who then checks if h(x)=h(y^{-1}).

For \ell = O(1) we get constant error and constant communication.

3 3-party communication protocols

There are two ways to extend 2-party communication protocols to more parties. We first focus on the Number-in-hand (NIH), where Alice gets x, Bob gets y, Charlie gets z, and they want to check if x \cdot y \cdot z = 1_G. In the NIH setting the communication depends on the group G.

3.1 A randomized protocol for the hypercube

Let G=\left ( \{0,1\}^n, + \right ) with addition modulo 2. We want to test if x+y+z=0^n. First, we pick a linear hash function h, i.e. satisfying h(x+y) = h(x) + h(y). For a uniformly random a \in \{0,1\}^n set h_a(x) = \sum a_i x_i \pmod 2. Then,

  • Alice sends h_a(x)
  • Bob send h_a(y)
  • Charlie accepts if and only if \underbrace {h_a(x) + h_a(y)}_{h_a(x+y)} = h_a(z)

The hash function outputs 1 bit. The error probability is 1/2 and the communication is O(1). For a better error, we can repeat.

3.2 A randomized protocol for \mathbb {Z}_m

Let G=\left (\mathbb {Z}_m, + \right ) where m=2^n. Again, we want to test if x+y+z=0 \pmod m. For this group, there is no 100% linear hash function but there are almost linear hash function families h: \mathbb {Z}_m \rightarrow \mathbb {Z}_{\ell } that satisfy the following properties:

  1. \forall a,x,y we have h_a(x) + h_a(y) = h_a(x+y) \pm 1
  2. \forall x \neq 0 we have \Pr _{a} [h_a(x) \in \{\pm 2, \pm 1, 0\}] \leq 2^{-\Omega (\ell )}
  3. h_a(0)=0

Assuming some random hash function h (from a family) that satisfies the above properties the protocol works similar to the previous one.

  • Alice sends h_a(x)
  • Bob sends h_a(y)
  • Charlie accepts if and only if h_a(x) + h_a(y) + h_a(z) \in \{\pm 2, \pm 1, 0\}

We can set \ell = O(1) to achieve constant communication and constant error.


To prove correctness of the protocol, first note that h_a(x) + h_a(y) + h_a(z) = h_a(x+y+z) \pm 2, then consider the following two cases:

  • if x+y+z=0 then h_a(x+y+z) \pm 2 = h_a(0) \pm 2 = 0 \pm 2
  • if x+y+z \neq 0 then \Pr _{a} [h_a(x+y+z) \in \{\pm 2, \pm 1, 0\}] \leq 2^{-\Omega (\ell )}

It now remains to show that such hash function families exist.

Let a be a random odd number modulo 2^n. Define

\begin{aligned} h_a(x) := (a \cdot x \gg n-\ell ) \pmod {2^{\ell }} \end{aligned}

where the product a \cdot x is integer multiplication. In other words we output the bits n-\ell +1, n-\ell +2, \ldots , n of the integer product a\cdot x.

We now verify that the above hash function family satisfies the three properties we required above.

Property (3) is trivially satisfied.

For property (1) we have the following. Let s = a\cdot x and t = a \cdot y and u=n-\ell . The bottom line is how (s \gg u) + (t \gg u) compares with (s+t) \gg u. In more detail we have that,

  • h_a(x+y) = ((s+t) \gg u) \pmod {2^{\ell }}
  • h_a(x) = (s \gg u) \pmod {2^{\ell }}
  • h_a(x) = (t \gg u) \pmod {2^{\ell }}

Notice, that if in the addition s+t the carry into the u+1 bit is 0, then

\begin{aligned} (s \gg u) + (t \gg u) = (s+t) \gg u \end{aligned}


\begin{aligned} (s \gg u) + (t \gg u) + 1 = (s+t) \gg u \end{aligned}

which concludes the proof for property (1).

Finally, we prove property (2). We start by writing x=s \cdot 2^c where s is odd. Bitwise, this looks like (\cdots \cdots 1 \underbrace {0 \cdots 0}_{c~ \textrm {bits}}).

The product a \cdot x for a uniformly random a, bitwise looks like ( \textit {uniform} ~ 1 \underbrace {0 \cdots 0}_{c~\textrm {bits}}). We consider the two following cases for the product a \cdot x:

  1. If a \cdot x = (\underbrace {\textit {uniform} ~ 1 \overbrace {00}^{2~bits}}_{\ell ~bits} \cdots 0), or equivalently c \geq n-\ell + 2, the output never lands in the bad set \{\pm 2, \pm 1, 0\} (some thought should be given to the representation of negative numbers – we ignore that for simplicity).
  2. Otherwise, the hash function output has \ell - O(1) uniform bits. Again for simplicity, let B = \{0,1,2\}. Thus,
    \begin{aligned} \Pr [\textrm {output} \in B] \leq |B| \cdot 2^{-\ell + O(1)} \end{aligned}

    In other words, the probability of landing in any small set is small.

4 Other groups

What happens in other groups? Do we have an almost linear hash function for 2 \times 2 matrices? The answer is negative. For SL_2(q) and A_n the problem of testing equality with 1_G is hard.

We would like to rule out randomized protocols, but it is hard to reason about them directly. Instead, we are going to rule out deterministic protocols on random inputs. For concreteness our main focus will be SL_2(q).

First, for any group element g \in G we define the distribution on triples, D_g := (x,y, (x \cdot y)^{-1} g), where x,y \in G are uniformly random elements. Note the product of the elements in D_g is always g.

Towards a contradiction, suppose we have a randomized protocol P for the xyz=^? 1_G problem. In particular, we have

\begin{aligned} \Pr [P(D_1)=1] \geq \Pr [P(D_h)=1] + \frac {1}{10}. \end{aligned}

This implies a deterministic protocol with the same gap, by fixing the randomness.

We reach a contradiction by showing that for every deterministic protocols P using little communication (will quantify later), we have

\begin{aligned} | \Pr [P(D_1)=1] - \Pr [P(D_h)=1] | \leq \frac {1}{100}. \end{aligned}

We start with the following lemma, which describes a protocol using product sets.

Lemma 1. (The set of accepted inputs of) A deterministic c-bit protocol can be written as a disjoint union of 2^c “rectangles,” that is sets of the form A \times B \times C.

Proof. (sketch) For every communication transcript t, let S_t \subseteq G^3 be the set of inputs giving transcript t. The sets S_t are disjoint since an input gives only one transcript, and their number is 2^c, i.e. one for each communication transcript of the protocol. The rectangle property can be proven by induction on the protocol tree. \square

Next, we show that these product sets cannot distinguish these two distributions D_1,D_h, and for that we will use the pseudorandom properties of the group G.

Lemma 2. For all A,B,C \subseteq G and we have

\begin{aligned} |\Pr [A \times B \times C(D_1)=1] - \Pr [A \times B \times C(D_h)=1]| \leq \frac {1}{d^{\Omega (1)}} .\end{aligned}

Recall the parameter d from the previous lectures and that when the group G is SL_2(q) then d=|G|^{\Omega (1)}.

Proof. Pick any h \in G and let x,y,z be the inputs of Alice, Bob, and Charlie respectively. Then

\begin{aligned} \Pr [A \times B \times C(D_h)=1] = \Pr [ (x,y) \in A \times B ] \cdot \Pr [(x \cdot y)^{-1} \cdot h \in C | (x,y) \in A \times B] \end{aligned}

If either A or B is small, that is \Pr [x \in A] \leq \epsilon or \Pr [y \in B] \leq \epsilon , then also \Pr [P(D_h)=1] \leq \epsilon because the term \Pr [ (x,y) \in A \times B ] will be small. We will choose \epsilon later.

Otherwise, A and B are large, which implies that x and y are uniform over at least \epsilon |G| elements. Recall from Lecture 9 that this implies \lVert x \cdot y - U \rVert _2 \leq \lVert x \rVert _2 \cdot \lVert y \rVert _2 \cdot \sqrt {\frac {|G|}{d}}, where U is the uniform distribution.

By Cauchy–Schwarz we obtain,

\begin{aligned} \lVert x \cdot y - U \rVert _1 \leq |G| \cdot \lVert x \rVert _2 \cdot \lVert y \rVert _2 \cdot \sqrt {\frac {1}{d}} \leq \frac {1}{\epsilon } \cdot \frac {1}{\sqrt {d}}. \end{aligned}

The last inequality follows from the fact that \lVert x \rVert _2, \lVert y \rVert _2 \leq \sqrt {\frac {1}{\epsilon |G|}}.

This implies that \lVert (x \cdot y)^{-1} - U \rVert _1 \leq \frac {1}{\epsilon } \cdot \frac {1}{\sqrt {d}} and \lVert (x \cdot y)^{-1} \cdot h - U \rVert _1 \leq \frac {1}{\epsilon } \cdot \frac {1}{\sqrt {d}}, because taking inverses and multiplying by h does not change anything. These two last inequalities imply that,

\begin{aligned} \Pr [(x \cdot y)^{-1} \in C | (x,y) \in A \times B] = \Pr [(x \cdot y)^{-1} \cdot h \in C | (x,y) \in A \times B] \pm \frac {2}{\epsilon } \frac {1}{\sqrt {d}} \end{aligned}

and thus we get that,

\begin{aligned} \Pr [P(D_1)=1] = \Pr [P(D_h)=1] \pm \frac {2}{\epsilon } \frac {1}{\sqrt {d}}. \end{aligned}

To conclude, based on all the above we have that for all \epsilon and independent of the choice of h, it is either the case that

\begin{aligned} | \Pr [P(D_1)=1] - \Pr [P(D_h)=1] | \leq 2 \epsilon \end{aligned}


\begin{aligned} | \Pr [P(D_1)=1] - \Pr [P(D_h)=1] | \leq \frac {2}{\epsilon } \frac {1}{\sqrt {d}} \end{aligned}

and we will now choose the \epsilon to balance these two cases and finish the proof:

\begin{aligned} \frac {2}{\epsilon } \frac {1}{\sqrt {d}} = 2 \epsilon \Leftrightarrow \frac {1}{\sqrt {d}} = \epsilon ^2 \Leftrightarrow \epsilon = \frac {1}{d^{1/4}}. \end{aligned}


The above proves that the distribution D_h behaves like the uniform distribution for product sets, for all h \in G.

Returning to arbitrary deterministic protocols P, write P as a union of 2^c disjoint rectangles by the first lemma. Applying the second lemma and summing over all rectangles we get that the distinguishing advantage of P is at most 2^c/d^{1/4}. For c \leq (1/100) \log d the advantage is at most 1/100 and thus we get a contradiction on the existence of such a correct protocol. We have concluded the proof of this theorem.

Theorem 3. Let G be a group, and d be the minimum dimension of an irreducible representation of G. Consider the 3-party, number-in-hand communication protocol f : G^3 \to \{0,1\} where f(x,y,z) = 1 \Leftrightarrow x \cdot y \cdot z = 1_G. Its randomized communication complexity is \Omega (\log d).

For SL_2(q) the communication is \Omega (\log |G|). This is tight up to constants, because Alice can send her entire group element.

For the group A_n the known bounds on d yield communication \Omega ( \log \log |G|). This bound is tight for the problem of distinguishing D_1 from D_h for h\neq 1, as we show next. The identity element 1_G for the group A_n is the identity permutation. If h \neq 1_G then h is a permutation that maps some element a \in G to h(a)=b \neq a. The idea is that the parties just need to “follow” a, which is logarithmically smaller than G. Specifically, let x,y,z be the permutations that Alice, Bob and Charlie get. Alice sends x(a) \in [n]. Bob gets x(a) and sends y(x(a)) \in [n] to Charlie who checks if z(y(x(a))) = 1. The communication is O(\log n). Because the size of the group is |G|=\Theta (n!) = \Theta \left ( \left ( \frac {n}{e} \right )^n \right ), the communication is O(\log \log |G|).

This is also a proof that d cannot be too large for A_n, i.e. is at most (\log |G|)^{O(1)}.

5 More on 2-party protocols

We move to another setting where a clean answer can be given. Here we only have two parties. Alice gets x_1,x_2,\ldots ,x_n, Bob gets y_1,y_2,\ldots ,y_n, and they want to know if x_1 \cdot y_1 \cdot x_2 \cdot y_2 \cdots x_n \cdot y_n = 1_G.

When G is abelian, the elements can be reordered as to check whether (x_1 \cdot x_2 \cdots x_n) \cdot (y_1 \cdot y_2 \cdots y_n) = 1_G. This requires constant communication (using randomness) as we saw in Lecture 12, since it is equivalent to the check x \cdot y = 1_G where x=x_1 \cdot x_2 \cdots x_n and y=y_1 \cdot y_2 \cdots y_n.

We will prove the next theorem for non-abelian groups.

Theorem 1. For every non-abelian group G the communication of deciding if x_1 \cdot y_1 \cdot x_2 \cdot y_2 \cdots x_n \cdot y_n = 1_G is \Omega (n).

Proof. We reduce from unique disjointness, defined below. For the reduction we will need to encode the And of two bits x,y \in \{0,1\} as a group product. (This question is similar to a puzzle that asks how to hang a picture on the wall with two nails, such that if either one of the nails is removed, the picture will fall. This is like computing the And function on two bits, where both bits (nails) have to be 1 in order for the function to be 1.) Since G is non-abelian, there exist a,b \in G such that a \cdot b \neq b\cdot a, and in particular a \cdot b \cdot a^{-1} \cdot b^{-1} = h with h \neq 1. We can use this fact to encode And as

\begin{aligned} a^x \cdot b^y \cdot a^{-x} \cdot b^{-y}= \begin {cases} 1,~~\text {if And(x,y)=0}\\ h,~~\text {otherwise} \end {cases}. \end{aligned}

In the disjointness problem Alice and Bob get inputs x,y \in \{0,1\}^n respectively, and they wish to check if there exists an i \in [n] such that x_i \land y_i =1. If you think of them as characteristic vectors of sets, this problem is asking if the sets have a common element or not. The communication of this problem is \Omega (n). Moreover, in the variant of this problem where the number of such i’s is 0 or 1 (i.e. unique), the same lower bound \Omega (n) still applies. This is like giving Alice and Bob two sets that either are disjoint or intersect in exactly one element, and they need to distinguish these two cases.

Next, we will reduce the above variant of the set disjointness to group products. For x,y \in \{0,1\}^n we product inputs for the group problem as follows:

\begin{aligned} x & \rightarrow (a^{x_1} , a^{-x_1} , \ldots , a^{x_n}, a^{-x_n} ) \\ y & \rightarrow (b^{y_1} , b^{-y_1}, \ldots , b^{y_n}, b^{-y_n}). \end{aligned}

Now, the product x_1 \cdot y_1 \cdot x_2 \cdot y_2 \cdots x_n \cdot y_n we originally wanted to compute becomes

\begin{aligned} \underbrace {a^{x_1} \cdot b^{y_1} \cdot a^{-x_1} \cdot b^{-y_1}}_{\text {1 bit}} \cdots \cdots a^{x_n} \cdot b^{y_n} \cdot a^{-x_n} \cdot b^{-y_n}. \end{aligned}

If there isn’t an i \in [n] such that x_i \land y_i=1, then each product term a^{x_i} \cdot b^{y_i} \cdot a^{-x_i} \cdot b^{-y_i} is 1 for all i, and thus the whole product is 1.

Otherwise, there exists a unique i such that x_i \land y_i=1 and thus the product will be 1 \cdots 1 \cdot h \cdot 1 \cdots 1=h, with h being in the i-th position. If Alice and Bob can test if the above product is equal to 1, they can also solve the unique set disjointness problem, and thus the lower bound applies for the former. \square

We required the uniqueness property, because otherwise we might get a product h^c that could be equal to 1 in some groups.

by Emanuele at November 17, 2017 03:02 PM UTC

Inadequate Equilibria: Where and How Civilizations Get Stuck is a little gem of a book: wise, funny, and best of all useful (and just made available for free on the web).  Eliezer Yudkowsky and I haven’t always agreed about everything, but on the subject of bureaucracies and how they fail, his insights are gold.  This book is one of the finest things he’s written.  It helped me reflect on my own choices in life, and it will help you reflect on yours.

The book is a 120-page meditation on a question that’s obsessed me as much as it’s obsessed Yudkowsky.  Namely: when, if ever, is it rationally justifiable to act as if you know better than our civilization’s “leading experts”?  And if you go that route, then how do you answer the voices—not least, the voices in your own head—that call you arrogant, hubristic, even a potential crackpot?

Yudkowsky gives a nuanced answer.  To summarize, he argues that contrarianism usually won’t work if your goal is to outcompete many other actors in a free market for a scarce resource that they all want too, like money or status or fame.  In those situations, you really should ask yourself why, if your idea is so wonderful, it’s not already being implemented.  On the other hand, contrarianism can make sense when the “authoritative institutions” of a given field have screwed-up incentives that prevent them from adopting sensible policies—when even many of the actual experts might know that you’re right, but something prevents them from acting on their knowledge.  So for example, if a random blogger offers a detailed argument for why the Bank of Japan is pursuing an insane fiscal policy, it’s a-priori plausible that the random blogger could be right and the Bank of Japan could be wrong (as actually happened in a case Yudkowsky recounts), since even insiders who knew the blogger was right would find it difficult to act on their knowledge.  The same wouldn’t be true if the random blogger said that IBM stock was mispriced or that P≠NP is easy to prove.

The high point of the book is a 50-page dialogue between two humans and an extraterrestrial visitor.  The extraterrestrial is confused about a single point: why are thousands of babies in the United States dying every year, or suffering permanent brain damage, because (this seems actually to be true…) the FDA won’t approve an intravenous baby food with the right mix of fats in it?  Just to answer that one question, the humans end up having to take the alien on a horror tour through what’s broken all across the modern world, from politicians to voters to journalists to granting agencies, explaining Nash equilibrium after Nash equilibrium that leaves everybody worse off but that no one can unilaterally break out of.

I do have two criticisms of the book, both relatively minor compared to what I loved about it.

First, Yudkowsky is brilliant in explaining how institutions can produce terrible outcomes even when all the individuals in them are smart and well-intentioned—but he doesn’t address the question of whether we even need to invoke those mechanisms for more than a small minority of cases.  In my own experience struggling against bureaucracies that made life hellish for no reason, I’d say that about 2/3 of the time my quest for answers really did terminate at an identifiable “empty skull”: i.e., a single individual who could unilaterally solve the problem at no cost to anyone, but chose not to.  It simply wasn’t the case, I don’t think, that I would’ve been equally obstinate in the bureaucrat’s place, or that any of my friends or colleagues would’ve been.  I simply had to accept that I was now face-to-face with an alien sub-intelligence—i.e., with a mind that fetishized rules made up by not-very-thoughtful humans over demonstrable realities of the external world.

Second, I think the quality of the book noticeably declines in the last third.  Here Yudkowsky recounts conversations in which he tried to give people advice, but he redacts all the object-level details of the conversations—so the reader is left thinking that this advice would be good for some possible values of the missing details, and terrible for other possible values!  So then it’s hard to take away much of value.

In more detail, Yudkowsky writes:

“If you want to use experiment to show that a certain theory or methodology fails, you need to give advocates of the theory/methodology a chance to say beforehand what they think they predict, so the prediction is on the record and neither side can move the goalposts.”

I only partly agree with this statement (which might be my first substantive disagreement in the book…).

Yes, the advocates should be given a chance to say what they think the theory predicts, but then their answer need not be taken as dispositive.  For if the advocates are taken to have ultimate say over what their theory predicts, then they have almost unlimited room to twist themselves in pretzels to explain why, yes, we all know this particular experiment will probably yield such-and-such result, but contrary to appearances it won’t affect the theory at all.  For science to work, theories need to have a certain autonomy from their creators and advocates—to be “rigid,” as David Deutsch puts it—so that anyone can see what they predict, and the advocates don’t need to be continually consulted about it.  Of course this needs to be balanced, in practice, against the fact that the advocates probably understand how to use the theory better than anyone else, but it’s a real consideration as well.

In one conversation, Yudkowsky presents himself as telling startup founders not to bother putting their prototype in front of users, until they have a testable hypothesis that can be confirmed or ruled out by the users’ reactions.  I confess to more sympathy here with the startup founders than with Yudkowsky.  It does seem like an excellent idea to get a product in front of users as early as possible, and to observe their reactions to it: crucially, not just a binary answer (do they like the product or not), confirming or refuting a prediction, but more importantly, reactions that you hadn’t even thought to ask about.  (E.g., that the cool features of your website never even enter into the assessment of it, because people can’t figure out how to create an account, or some such.)

More broadly, I’d stress the value of the exploratory phase in science—the phase where you just play around with your system and see what happens, without necessarily knowing yet what hypothesis you want to test.  Indeed, this phase is often what leads to formulating a testable hypothesis.

But let me step back from these quibbles, to address something more interesting: what can I, personally, take from Inadequate Equilibria?  Is academic theoretical computer science broken/inadequate in the same way a lot of other institutions are?  Well, it seems to me that we have some built-in advantages that keep us from being as broken as we might otherwise be.  For one thing, we’re overflowing with well-defined problems, which anyone, including a total outsider, can get credit for solving.  (Of course, the “outsider” might not retain that status for long.)  For another, we have no Institutional Review Boards and don’t need any expensive equipment, so the cost to enter the field is close to zero.  Still, we could clearly be doing better: why didn’t we invent Bitcoin?  Why didn’t we invent quantum computing?  (We did lay some of the intellectual foundations for both of them, but why did it take people outside TCS to go the distance?)  Do we value mathematical pyrotechnics too highly compared to simple but revolutionary insights?  It’s worth noting that a whole conference, Innovations in Theoretical Computer Science, was explicitly founded to try to address that problem—but while ITCS is a lovely conference that I’ve happily participated in, it doesn’t seem to have succeeded at changing community norms much.  Instead, ITCS itself converged to look a lot like the rest of the field.

Now for a still more pointed question: am I, personally, too conformist or status-conscious?  I think even “conformist” choices I’ve made, like staying in academia, can be defended as the right ones for what I wanted to do with my life, just as Eliezer’s non-conformist choices (e.g., dropping out of high school) can be defended as the right ones for what he wanted to do with his.  On the other hand, my acute awareness of social status, and when I lacked any—in contrast to what Eliezer calls his “status blindness,” something that I see as a tremendous gift—did indeed make my life unnecessarily miserable in all sorts of ways.

Anyway, go read Inadequate Equilibria, then venture into the world and look for some $20 bills laying on the street.  And if you find any, come back and leave a comment on this post explaining where they are, so a conformist herd can follow you.

by Scott at November 16, 2017 10:19 PM UTC

Here is another video of a smashing short talk by my dear friend Itai Benjamini with beautiful conjectures proposing an important new step in the connection between percolation and conformal geometry.

Here is the link to Itai’s original paper Percolation and coarse conformal uniformization. It turns out that the missing piece for proving the conjecture is a very interesting (innocent looking) theorem of Russo-Seymour-Welsh type.

This conjecture appears also in a paper Itai wrote with me Around two theorems and a lemma by Lucio Russo. Our paper  is part of a special issue of Mathematics and Mechanics of Complex Systems (M&MoCS),  in honor of Lucio Russo, in occasion of his retirementis.  In addition to the conjectural Russo-Seymour-Welsh type theorems, we also present some developments, connections, and problems related to Russo’s lemma and Russo’s 0-1 law.

Lucio Russo

by Gil Kalai at November 16, 2017 07:40 PM UTC

The Department of Computer Science, in collaboration with the Institute for Advanced Study, offers two 3-year positions for outstanding new Ph.D.’s working in theoretical aspects of computer science, including machine learning. Combining research with teaching duties, these positions come with attractive benefits and working conditions.
Requisition No: D-18-COS-00002


by shacharlovett at November 16, 2017 07:13 PM UTC

The Department of Computer Science at Princeton University is seeking applications for postdoctoral or more senior research positions in theoretical computer science and theoretical machine learning. Positions are for one year starting in September 2018 with the possibility of renewal contingent upon satisfactory performance and continued funding.

Requisition No: D-18-COS-00001


by shacharlovett at November 16, 2017 07:09 PM UTC

About two and a half months have passed since I started working at the Gran Sasso Science Institute (GSSI). I still have much to learn about the functioning of the Italian university system (and even of the GSSI), and it is already clear that there will be many of its aspects that I will probably never learn. However, my impression so far is that the level of bureaucracy in Italy is definitely higher than that in the countries where I have previously worked.

On page 88 of this interesting article, the historian Rogers Hollingsworth writes:
"One of the factors influencing creativity at the level of the nation state is the institutional environment in which scientists conduct research. I code scientific institutional environments as ranging from weak to strong. Weak institutional environments exert only modest influence (1) on the appointment of scientific personnel of research organizations, (2) in determining whether a particular scientific discipline will exist in a research organization, (3) over the level of funding for research organizations, (4) in prescribing the level of training necessary for a scientific appointment (e.g., the habilitation), and (5) over scientific entrepreneurship (e.g., the norms of individualism that socialize young people to undertake high-risk research projects). Strong institutional environments are at the opposite end of the continuum on each of these characteristics. Weak institutional environments have tended to facilitate greater scientific creativity in a society than strong institutional environments..."
(The emphasis is mine.) According to the above description, the Italian scientific institutional environment can only be classified as strong, for good or for worse. Fortunately, the GSSI is a centre for advanced studies and an international PhD school. Even though it has to follow the Italian law, with all its quirks, it is more nimble and less rigid  than the average Italian university. Moreover, most of the bureaucracy is hidden to the junior members of our faculty, such as the assistant professors, who can largely concentrate on the scientific work.
I have always appreciated the work that Italian researchers have been carrying out over the years, but what I have seen so far has increased my esteem for them even more. In spite of the extremely rigid system within which they are forced to operate, they manage to be productive and to maintain a strong commitment to their research work, achieving a high level of scientific production, both in quality and in quantity. The GSSI hosts a number of top-class academics in its four fields of research (computer science, mathematics, physics and social sciences) and this creates a stimulating environment for faculty and students alike.

The computer science group at the GSSI is still too small, but it seems to me that it punches well above its weight. Its members are very dedicated and have done an amazing job since the beginning of the GSSI adventure. (It was a humbling learning experience for me to present their achievements to the Scientific Advisory Board of the GSSI last Monday.)

It is clear that our group needs to grow, but we want to do so well. If you work in one of the research areas that we cover, at their intersection or in sister ones, and you think you'd be interested in working in L'Aquila with our faculty, do drop one of us a line, sending your CV and an expression of interest. We are keen to recruit strong researchers at all levels who can help us build the best international research centre in computer science we can. I can vouch that the computer science group at the GSSI provides young researchers with early-career autonomy in a nurturing environment, which isn't very common in Italy (as far as I know), and with the opportunity to become involved in the supervision of doctoral students. There are also resources for inviting collaborators and organizing thematic events, amongst other things.

It will be interesting to see how the computer science group will develop at the GSSI over the coming year. 

by Luca Aceto ( at November 16, 2017 06:28 PM UTC

During my time as Area Dean for computer science, the number of computer science majors at Harvard more than doubled.  Growth has continued, and according to the Crimson, we're now clearly the second largest concentration at Harvard.  (It looks like we just barely surpassed Government last year, and now we're clearly larger.  Economics is still bigger.)

When you consider that Applied Math is the 4th largest major group, and that's a group that is largely composed of people who do some mix of Math/Computer Science/Economics, I think the numbers actually underestimate the growth in CS at Harvard. 

It's been an exciting time since I started at Harvard in 1999.  One could see what was coming.  Though, sadly, the administration still seems far from having caught up -- we (like many other CS departments) really could use some more resources coming our way.   Universities are a bit slow to respond (sometimes for good reasons), and the speed of these changes is, in context, just remarkable.  And, I'd have to say, have added to the fun of being a CS professor.  Back in 1999, CS was incorrectly perceived to be a backwater at Harvard, small and unnoticed, but where there was a lot of excitement and energy.  Fast forward less than two decades, and now we're clearly at the core of Harvard and its future. 

by Michael Mitzenmacher ( at November 16, 2017 02:02 PM UTC

In the Spring of 2018 the US News and World Report should release their latest rankings of US graduate science programs including computer science. These are the most cited of the deluge of computer science rankings we see out there. The US News rankings have a long history and since they are reputation based they roughly correspond to how we see CS departments though some argue that reputation changes slowly with the quality of a department.

US News and World Report also has a new global ranking of CS departments. The US doesn't fare that well on the list and the ranking of the US programs on the global list are wildly inconsistent with the US list. What's going on?

75% of the global ranking is based on statistics from Web of Science. Web of Science captures mainly journal articles where conferences in computer science typically have a higher reputation and more selectivity. In many European and Asian universities hiring and promotion often depend heavily on publications and citations in Web of Science encouraging their professor to publish in journals thus leading to higher ranked international departments.

The CRA rightly put out a statement urging the CS community to ignore the global rankings, though I wished they made a distinction between the two different US News rankings.

I've never been a fan of using metrics to rank CS departments but there is a relatively new site, Emery Berger's Computer Science Rankings, based on the number of publications in major venues. CS Rankings passes the smell test for both their US and global lists and is relatively consistent with the US News reputation-based CS graduate rankings.

Nevertheless I hope CS Rankings will not become the main ranking system for CS departments. Departments who wish to raise their ranking would hire faculty based mainly on their ability to publish large number of papers in major conferences. Professors and students would then focus on quantity of papers and this would in the long run discourage risk-taking long-range research, as well as innovations in improving diversity or educating graduate students.

As Goodhart's Law states, "when a measure becomes a target, it ceases to be a good measure". Paradoxically CS Rankings can lead to good rankings of CS departments as long as we don't treat it as such.

by Lance Fortnow ( at November 16, 2017 01:42 PM UTC

This work introduces a model of distributed learning in the spirit of Yao's communication complexity model. We consider a two-party setting, where each of the players gets a list of labelled examples and they communicate in order to jointly perform some learning task. To naturally fit into the framework of learning theory, we allow the players to send each other labelled examples, where each example costs one unit of communication. This model can also be thought of as a distributed version of sample compression schemes. We study several fundamental questions in this model. For example, we define the analogues of the complexity classes P, NP and coNP, and show that in this model P equals the intersection of NP and coNP. The proof does not seem to follow from the analogous statement in classical communication complexity; in particular, our proof uses different techniques, including boosting and metric properties of VC classes. This framework allows to prove, in the context of distributed learning, unconditional separations between various learning contexts, like realizable versus agnostic learning, and proper versus improper learning. The proofs here are based on standard ideas from communication complexity as well as learning theory and geometric constructions in Euclidean space. As a corollary, we also obtain lower bounds that match the performance of algorithms from previous works on distributed classification.

at November 16, 2017 02:44 AM UTC

by David Eppstein at November 15, 2017 10:07 PM UTC

If you follow the fairness in machine learning literature, you might notice after awhile that there are two distinct families of fairness definitions:

  1. Statistical definitions of fairness (sometimes also called group fairness), and
  2. Individual notions of fairness. 
The statistical definitions are far-and-away the most popular, but the individual notions of fairness offer substantially stronger semantic guarantees. In this post, I want to dive into this distinction a little bit. Along the way I'll link to some of my favorite recent fairness papers, and end with a description of a recent paper from our group which tries to get some of the best properties of both worlds.

Statistical Definitions of Fairness
At a high level, statistical definitions of fairness partition the world into some set of "protected groups", and then ask that some statistical property be approximately equalized across these groups. Typically, the groups are defined via some high level sensitive attribute, like race or gender. Then, statistical definitions of fairness ask that some relevant statistic about a classifier be equalized across those groups. For example, we could ask that the rate of positive classification be equal across the groups (this is sometimes called statistical parity). Or, we could ask that the false positive and false negative rates be equal across the groups (This is what Hardt, Price, and Srebro call "Equalized Odds"). Or, we could ask that the positive predictive value (also sometimes called calibration) be equalized across the groups.  These are all interesting definitions, and there are very interesting impossibility results that emerge if you think you might want to try and satisfy all of them simultaneously.

These definitions are attractive: first of all, you can easily check whether a classifier satisfies them, since verifying that these fairness constraints hold on a particular distribution simply involves estimating some expectations, one for each protected group. It can be computationally hard to find a classifier that satisfies them, while minimizing the usual convex surrogate losses --- but computational hardness pervades all of learning theory, and hasn't stopped the practice of machine learning. Already there are a number of practical algorithms for learning classifiers subject to these statistical fairness constraints --- this recent paper by Agarwal et al. is one of my favorites. 

But what are the semantics of these constraints, and why do they correspond to "fairness"? You can make a different argument for each one, but here is the case for equalizing false positive and negative rates across groups (equalized odds), which is one of my favorites: 

You can argue (making the big assumption that the data is not itself already corrupted with bias --- see this paper by Friedler et al. for a deep dive into these assumptions...) that a classifier potentially does harm to an individual when it misclassifies them. If the (say) false positive rate is higher amongst black people than white people, then this means that if you perform the thought experiment of imagining yourself as a uniformly random black person with a negative label, the classifier is more likely to harm you than if you imagine yourself as a uniformly random white person with a negative label. On the other hand, if the classifier equalizes false positive and negative rates across populations, then behind the "veil of ignorance", choosing whether you want to be born as a uniformly random black or white person (conditioned on having a negative label), all else being equal, you should be indifferent!

However, in this thought experiment, it is crucial that you are a uniformly random individual from the population. If you know some fact about yourself that makes you think you are not a uniformly random white person, the guarantee can lack bite. Here is a simple example:

Suppose we have a population of individuals, with two features and a label. Each individual has a race (either black or white) and a gender (either male or female). The features are uniformly random and independent. Each person also has a label (positive or negative) that is also uniformly random, and independent of the features. We can define four natural protected groups: "Black people', "White people", "Men", and "Women". 

Now suppose we have a classifier that labels an individual as positive if and only if they are either a black man or a white woman. Across these four protected groups, this classifier appears to be fair: the false positive rate and false negative rate are all exactly 50% on all four of the protected groups, so it satisfies equalized odds. But this hides a serious problem: on two subgroups (black men and white women), the false positive rate is 100%! If being labelled positive is a bad thing, you would not want to be a random (negative) member of those sub-populations.

Of course, we could remedy this problem by simply adding more explicitly protected subgroups. But this doesn't scale well: If we have d protected binary features, there are 2^(2^d) such subgroups that we can define. If the features are real valued, things are even worse, and we quickly get into issues of overfitting (see the next section).   

Individual Definitions of Fairness
At a high level, the problem with statistical definitions of fairness is that they are defined with respect to group averages, so if you are not an "average" member of your group, they don't promise you anything. Individual definitions of fairness attempt to remedy this problem by asking for a constraint that binds at the individual level. The first attempt at this was in the seminal paper "Fairness Through Awareness", which suggested that fairness should mean that "similar individuals should be treated similarly". Specifically, if the algorithm designer knows the correct metric by which individual similarity should be judged, the constraint requires that individuals who are close in that metric should have similar probabilities of any classification outcome. This is a semantically very strong definition. It hasn't gained much traction in the literature yet, because of the hard problem of specifying the similarity metric. But I expect that it will be an important definition going forward --- there are just some obstacles that have to be overcome. I've been thinking about it a lot myself. 

Another definition of individual fairness is one that my colleagues and I at Penn introduced in this paper, which we have taken to calling "Weakly Meritocratic Fairness". I have previously blogged about it here. We define it in a more general context, and in an online setting, but it makes sense to think about what it means in the special case of batch binary classification as well. In that case, it reduces to three constraints:
  1. For any two individuals with a negative label, the false positive rate on those individuals must be equal
  2. For any two individuals with a positive label, the false negative rate on those individuals must be equal
  3. For any pair of individuals A and B such that A has a negative label, and B has a positive label, the true positive rate on B can't be lower than the false positive rate on A. (Assuming that positive is the more desirable label --- otherwise this is reversed). 
Again, we are talking about false positive rates --- but now there is no distribution --- i.e. nothing to take an average over. Instead, the constraints bind on every pair of individuals. As a result, the "rate" we are talking about is calculated only over the randomness of the classifier itself. 

This constraint in particular implies that false positive rates are equal across any sub-population that you might define, even ex-post! Problem solved! Why don't we all just use this definition of fairness? Here is why:

Weakly meritocratic fairness is extremely similar (identical, except for the added condition 3) to asking for equalized odds fairness on the (potentially infinitely large) set of "protected groups" corresponding to singletons --- everyone forms their own protected group! As a result, you shouldn't hope to be able to satisfy a fairness definition like this without making some assumption on the data generating process. Even if on your sample of data, it looks like you are satisfying this definition of fairness, you are overfitting --- the complexity of this class of groups is too large, and your in-sample "fairness" won't generalize out of sample. If you are willing to make assumptions on the data generating process, then you can do it! In our original paper, we show how to do it if you assume the labels obey a linear relationship to the features. But assumptions of this sort generally won't hold exactly in practice, which makes this kind of approach difficult to implement in practical settings. 

The "Fairness Through Awareness" definition at a high level suffers from the same kind of problem: you can only use it if you make a very strong assumption (namely that you have what everyone agrees is the "right" fairness metric). And this is the crux of the problem with definitions of individual fairness: they seem to require such strong assumptions, that we cannot actually realize them in practice. 

A Middle Ground
Asking for statistical fairness across a small number of coarsely defined groups leaves us open to unfairness on structured subgroups we didn't explicitly designate as protected (what you might call "Fairness Gerrymandering").  On the other hand, asking for statistical fairness across every possible division of the data that can be defined ex-post leaves us with an impossible statistical problem. (Consider, that every imperfect classifier could be accused of being "unfair" to the subgroup that we define, ex-post, to be the set of individuals that the classifier misclassifies! But this is just overfitting...)

What if we instead ask for statistical fairness across exponentially many (or infinitely many) subgroups, defined over a set of features we think should be protected, but ask that this family of subgroups itself has bounded VC-dimension? This mitigates the statistical problem --- we can now in principle train "fair" classifiers according to this definition without worrying about overfitting, assuming we we have enough data (proportional to the VC-dimension of the set of groups we want to protect, and the set of classifiers we want to learn). And we can do this without making any assumptions about the data. It also mitigates the "Fairness Gerrymandering" problem --- at least now, we can explicitly protect an enormous number of detailed subgroups, not just coarsely defined ones. 

This is what we (Michael Kearns, Seth Neel, Steven Wu, and I) propose in our new paper. In most of the paper, we investigate the computational challenges surrounding this kind of fairness constraint, when the statistical fairness notion we want is equality of false positive rates, false negative rates, or classification rates (statistical parity):
  • First, it is no longer clear how to even check whether a fixed classifier satisfies a fairness constraint of this sort, without explicitly enumerating all of the protected groups (and recall, there might now be exponentially or even uncountably infinitely many such subgroups). We call this the Auditing problem. And indeed, in the worst case, this is a hard problem: we show that it is equivalent to weak agnostic learning, which brings with it a long list of computational hardness results from learning theory. It is hard to audit even for simple subgroup structures, definable by boolean conjunctions over features, or linear threshold functions. However, this connection also suggests an algorithmic approach to auditing. The fact that learning linear separators is NP hard in the worst case hasn't stopped us from doing this all the time, with simple heuristics like logistic regression and SVMs, as well as more sophisticated techniques. These same heuristics can be used to solve the auditing problem. 
  • Going further, lets suppose those heuristics work --- i.e. we have oracles which can optimally solve agnostic learning problems over some collection of classifiers C, and can optimally solve the auditing problem over some class of subgroups G. Then we give an algorithm that only has to maintain small state, and provably converges to the optimal distribution over classifiers in C  that equalizes false positive rates (or false negative rates, or classification rates...) over the groups G. Our algorithm draws inspiration from the Agarwal et al. paper: "A Reductions Approach to Fair Classification". 
  • And we can run these algorithms! We use a simple linear regression heuristic to implement both the agnostic learning and auditing "oracles", and run the algorithm to learn a distribution over linear threshold functions (defined over 122 attributes) that approximately equalizes false positive rates across every group definable by a linear threshold function over 18 real valued racially associated attributes, in the "Communities and Crime" dataset. It seems to work and do well! 

In Conclusion...
I've long been bothered by the seemingly irreconcilable gap between individual and group notions of fairness. The individual notions of fairness have the right semantics, but they are unimplementable. The group notions of fairness promise something too weak. Going forward, I think these two schools of thought on fairness have to be reconciled. I think of our work as a small step in that direction. 

by Aaron Roth ( at November 15, 2017 08:09 PM UTC

In this paper, we study quantum OBDD model, it is a restricted version of read-once quantum branching programs, with respect to "width" complexity. It is known that the maximal gap between deterministic and quantum complexities is exponential. But there are few examples of functions with such a gap. We present a method (called "reordering"), which allows us to transform a Boolean function $f$ into a Boolean function $f'$, such that if for $f$ we have some gap between quantum and deterministic OBDD complexities for the natural order over the variables of $f$, then for any order we have almost the same gap for the function $f'$. Using this transformation, we construct a total function REQ such that the deterministic OBDD complexity of it is at least $2^{\Omega(n / \log n)}$, and the quantum OBDD complexity of it is at most $O(n^2)$. It is the biggest known gap for explicit functions not representable by OBDDs of a linear width. We also prove the quantum OBDD width hierarchy for complexity classes of Boolean functions. Additionally, we show that shifted equality function can also give a good gap between quantum and deterministic OBDD complexities. Moreover, we prove the bounded error probabilistic OBDD width hierarchy for complexity classes of Boolean functions. And using "reordering" method we extend a hierarchy for k-OBDDs of polynomial width, for $k = o(n / \log^3 n)$. We prove a similar hierarchy for bounded error probabilistic k-OBDDs of polynomial, superpolynomial and subexponential width.

at November 15, 2017 04:24 PM UTC

Today, Shtetl-Optimized is extremely lucky to have the special guest blogger poly: the ‘adviser’ in the computational complexity class P/poly (P with polynomial-sized advice string), defined by Richard Karp and Richard Lipton in 1982.

As an adviser, poly is known for being infinitely wise and benevolent, but also for having a severe limitation: namely, she’s sensitive only to the length of her input, and not to any other information about it.  Her name comes from the fact that her advice is polynomial-size, which is the problem that prevents her from simply listing the answers to every possible question in a gigantic lookup table, the way she’d like to.

Without further ado, let’s see what advice poly is able to offer her respondents.

Dear poly,

When my husband and I first started dating, we were going at it like rabbits!  Lately, though, he seems to have no interest in sex.  That’s not normal for a guy, is it?  What can I do to spice things up in the bedroom?

Frustrated Wife

Dear Frustrated Wife,

Unfortunately, I don’t know exactly what your question is.  All I was told is that the question was 221 characters long.  But here’s something that might help: whenever you’re stuck in a rut, sometimes you can “shake things up” with the use of randomness.  So, please accept, free of charge, the following string of 221 random bits:


Well, it’s not really “random,” since everyone else with a 221-character question would’ve gotten the exact same string.  But it’s random enough for many practical purposes.  I hope it helps you somehow … good luck!


Dear poly,

I’m a 29-year-old autistic male: a former software entrepreneur currently worth about $400 million, who now spends his time donating to malaria prevention and women’s rights in the developing world.  My issue is that I’ve never been on a date, or even kissed anyone.  I’m terrified to make an advance.  All I read in the news is an endless litany of male sexual misbehavior: Harvey Weinstein, Louis C. K., Leon Wieseltier, George H. W. Bush, Roy Moore, the current president (!), you name it.  And I’m consumed by the urge not to be a pig like those guys.  Like, obviously I’m no more likely to start stripping or masturbating or something in front of some woman I just met, than I am to morph into a koala bear.  But from reading Slate, Salon, Twitter, my Facebook news feed, and so forth, I’ve gotten the clear sense that there’s nothing I could do that modern social mores would deem appropriate and non-creepy—at least, not a guy like me, who wasn’t lucky enough to be born instinctively understanding these matters.  I’m grateful to society for enabling my success, and have no desire to break any of its written or unwritten rules.  But here I genuinely don’t know what society wants me to do.  I’m writing to you because I remember you from my undergrad CS classes—and you’re the only adviser I ever encountered whose advice could be trusted unconditionally.

Yours truly,
Sensitive Nerd

Dear Sensitive Nerd,

I see your that letter is 1369 characters long.  Based on that, here are a few things I can tell you that might be helpful:

  • The Riemann Hypothesis is true.
  • ZFC set theory is consistent.
  • The polynomial hierarchy is contained in PP.

Write me a 3592-character letter the next time, and I’ll give you an even longer list of true mathematical statements!  (I actually know how to solve the halting problem—no joke!—but am condemned to drip, drip, drip out the solutions, a few per input length.)

But I confess: no sooner did I list these truths than I reflected that they, or even a longer list, might not help much with your problem, whatever it might have been.  It’s even possible to have a problem for which no amount of truth helps in solving it.  So, I dunno: maybe try not worrying so much, and write back to let me know if that helped?  (Not that I expect to understand your reply, or would be able to change any of my advice at this point even if I did.)

Good luck!

Dear poly,


Unhappy in Unary

Dear Unhappy in Unary,

Finally, someone who writes to me in a language I can understand!  Your question is 11 characters long.  I understand that to be a code expressing that you’re bankrupt, and are filing for Chapter 11 bankruptcy protection.  Financial insolvency isn’t easy for anyone.  But here’s some advice: put everything you have into Bitcoin, and sell out a year from now.  Unfortunately, I don’t know exactly when you’re writing to me, but at least at the time my responses were hardwired in, this was some damn good advice.

You’re welcome,

poly’s polynomial-sized advice column is syndicated in newspapers nationwide, and can also be accessed by simply moving your tape head across your advice tape. You’re welcome to comment on this post, but I might respond only to the lengths of the comments, rather than anything else about them. –SA

by Scott at November 15, 2017 11:22 AM UTC

Did you know that, for every embedded planar graph, it’s possible to partition the edges into two subsets, one of which forms an Eulerian subgraph of the given graph and the other of which forms an Eulerian subgraph of the dual?

Here by “Eulerian” I mean that all vertex degrees are even, but I allow disconnected subgraphs and vertices of degree zero. In fact, by Euler’s formula, there must be at least two degree-zero vertices somewhere (both in the primal, both in the dual, or split between the two graphs). The dual might have self-loops or parallel edges, but the result still holds as long as we count the degree of a self-loop as being two. I also allow either of the two sets of edges in the partition to be empty.

For example, the graph of the dodecahedron can be covered by three cycles, two of length five on two opposite faces and one of length ten between them. These three cycles form an Eulerian subgraph, and the remaining ten edges give an Eulerian subgraph of the dual consisting of two 5-cycles and two isolated vertices.

Partition of the edges of a dodecahedron into Eulerian subgraphs in the primal and dual graph

Probably this is an exercise in a graph theory textbook somewhere, but I don’t know the reference. I also don’t know a direct proof for this fact. Instead, I can show that it follows from a related Eulerian partition problem on arbitrary graphs. This time, we partition vertices instead of edges. Every graph has a partition of its vertices into two subsets such that the two subgraphs induced by these two sets are both Eulerian. For instance, here’s a partition of the dodecahedron into an induced pair of 5-cycles (two opposite faces of the dodecahedron), and a complementary induced 10-cycle (the equator midway between these two faces):

Partition of the vertices of a dodecahedron into Eulerian induced subgraphs

You can go from a vertex partition of a planar graph to an edge partition by letting the primal Eulerian subgraph of the edge partition be the union of the two induced subgraphs in the vertex partition. As you go around any face of the planar graph, the vertices must alternate between the two sides of the vertex partition, implying that the remaining edges (the ones not part of either induced subgraph) must have an even number around every face, and form an Eulerian subgraph of the dual. In the other direction, suppose you have a partition of the edges of a planar graph into Eulerian and dual-Eulerian subgraphs. Then you can 2-color the faces of the dual-Eulerian subgraph, and this 2-coloring gives you a partition of the vertices of the given graph which partitions the Eulerian side into two induced subgraphs.

Because it doesn’t talk about dual graphs, the vertex partition formulation makes sense for any graph. Most small graphs are pretty easy, but it makes an amusing puzzle to find a partition into two even-degree induced subgraphs for the Clebsch graph, the graph you get from a four-dimensional hypercube by adding its long diagonals.

Construction of the Clebsch graph from a hypercube

Here’s an algorithm for finding Eulerian vertex partitions, and an algorithmic proof that these partitions always exist.

  1. While the graph contains an odd-degree vertex , complement the neighborhood of (adding an edge between every two non-adjacent neighbors, and removing the edge between every two adjacent neighbors) and then remove from the graph.

  2. Once all remaining vertices have even degree, use the trivial vertex partition on the remaining Eulerian graph: put all vertices into one side of the partition, and make the other side be the empty set.

  3. Add back the vertices that were removed, in the reverse of the removal order. For each such vertex , after adding it back, complement the neighborhood of again (restoring the graph to the state it was in just prior to the removal of ) and then look at how the neighbors of are partitioned. Because has an odd total number of neighbors, one side of the partition must have an odd number of neighbors and the other side must have an even number. Place on the side of the partition that has an even number of neighbors.

Consider what happens to the degree of a vertex in its induced subgraph, when a vertex that has as its neighbor is added back to the graph. If is on the odd side of the partition of the neighbors of , then complementing the neighborhood of will cause an even number of vertices (the other ones on the same side of the partition) to change between being induced neighbors and non-neighbors of . Since had an even number of induced neighbors before this complement operation, it continues to have an even number of them after the operation. On the other hand, if is on the even side, complementing the neighborhood of will cause an odd number of vertices to change between being induced neighbors and non-neighbors of , changing the degree of in its induced subgraph to change from even to odd. But then, will be added to the induced subgraph as a neighbor of , causing its degree to become even again. And of course, since we added to the even subset of its neighbors, its own degree in the induced subgraph also becomes even. Because we have a valid Eulerian partition after step 2 and the partition remains valid after each successive step, the algorithm terminates with a valid partition of the whole graph.

This is vaguely reminiscent of an unpublished paper on counting spanning trees that I wrote some years ago. In that paper, I connected the spanning tree counting problem to the existence of a different kind of Eulerian vertex partition: a partition such that the subgraph of edges that connect one side of the partition to the other should have even degree everywhere. These kinds of partitions don’t always exist, but the paper showed that they exist if and only if the graph has an even number of spanning trees. It makes me wonder whether there’s a stronger connection here than just vague resemblance, and whether there’s some interesting structure to the set of all Eulerian partitions of a graph.


by David Eppstein at November 14, 2017 02:51 PM UTC

For any unsatisfiable CNF formula $F$ that is hard to refute in the Resolution proof system, we show that a gadget-composed version of $F$ is hard to refute in any proof system whose lines are computed by efficient communication protocols---or, equivalently, that a monotone function associated with $F$ has large monotone circuit complexity. Our result extends to monotone real circuits, which yields new lower bounds for the Cutting Planes proof system.

at November 14, 2017 03:01 AM UTC

Is there any ongoing project to formally verify the theorems and proofs of complexity theory using a proof assistant like Coq? Are there any boundaries to doing this?

by Samuel Schlesinger at November 13, 2017 04:25 PM UTC

The School of Mathematics welcomes applications from post-doctor, mid-career, and senior mathematicians and theoretical computer scientists, and strongly encourages applications from women and minorities. Stipends, on-campus housing and other resources are available for periods of 4-11 months for individual researchers in all mathematical subject areas. The School supports 40 post-docs each yr.


by shacharlovett at November 13, 2017 03:16 PM UTC

A Pangram is a sentence that contains every letter of the alphabet

The classic is:

                                      The quick brown fox jumps over the lazy dog.

(NOTE- I had `jumped' but a reader pointed out that there was no s, and that `jumps' is the correct word)

which is only 31 letters.

I could give a pointer to lists of such, but you can do that yourself.

My concern is:

a) are there any pangrams that have actually been uttered NOT in the context of `here is a pangram'

b) are there any that really could.

That is- which pangrams are natural?  I know this is an ill defined question.

Here are some candidates for natural pangrams

1) Pack my box with five dozen liquor jugs

2) Amazingly few discotheques provide jukeboxes

3) Watch Jeopardy! Alex Trebek's fun TV quiz game

4) Cwm fjord bank glyphs vext quiz
(Okay, maybe that one is not natural as it uses archaic words. It means
``Carved symbols in a mountain hollow on the bank of an inlet irritated an
eccentric person'  Could come up in real life. NOT. It uses every letter
exactly once.)

How can you measure how natural they are?

For the Jeopardy one I've shown it to people and asked them

``What is unusual about this new slogan for the show Jeopardy?''

and nobody gets it. more important- they believe it is the new slogan.

So I leave to the reader:

I) Are the other NATURAL pangrams?

II) How would you test naturalness of such?

Pinning down `natural' is hard. I did a guest post in 2004 before I was an official co-blogger, about when a problem (a set for us) is natural, for example the set all regular expressions with squaring (see here).

by GASARCH ( at November 13, 2017 02:48 PM UTC

Department of Computer Science, University of Copenhagen is offering associate professorship in algorithms expected to commence 1 June 2018 or as soon as possible thereafter.

Please find the full announcement at


by shacharlovett at November 13, 2017 12:21 PM UTC

Department of Computer Science, University of Copenhagen is offering tenure-track assistant professorship in algorithms expected to commence 1 June 2018 or as soon as possible thereafter.

Please find the full announcement at


by shacharlovett at November 13, 2017 10:58 AM UTC

As some of you already know there has been some movement at MSR lately, specifically for the theory group. We have now branched out into two groups, one in Machine Learning and Optimization -MLO- (with Zeyuan Allen-Zhu, myself, Ofer Dekel, Yuval Peres, Ilya Razenshteyn, Lin Xiao, and until recently postdoc Yin Tat Lee) and one in Algorithms (Nikhil Devanur, Sergey Yekhanin, postdocs Janardhan Kulkarni and Xiaorui Sun). The MLO group is part of the new MSR AI organization headed by Eric Horvitz, with grand goals at all levels of AIs (including a broad view of the foundations of AI which very much includes the type of mathematics showcased on this blog). Both groups are now hiring at all levels. From my point of view MSR is a unique place where you can devote all your time and energy to your research. The growth opportunities are truly amazing here: constant stream of world-class experts in algorithms, optimization, probability; lots of resources for travel, interns, postodcs; closeness to real-world products and possibilities to see your research directly applied. Below is the official call for application of the MLO group (the Algorithms’ one is here). Looking forward to receive your application!

Call for applications for researcher and postdoc positions in Machine Learning and Optimization

Application deadline: December 1st, 2017

The Machine Learning and Optimization Group at Microsoft Research Redmond invites applications for researcher positions at all levels (postdoc, full-time junior, full-time senior). The current full-time research members are Zeyuan Allen-Zhu, Sebastien Bubeck, Ofer Dekel, Yuval Peres, Ilya Razenshteyn, and Lin Xiao.

All applicants working on machine learning and optimization, including their algorithmic and mathematical foundations, will be considered. We expect candidates to have demonstrated excellence in research and have developed their own original research agenda. Our current areas of focus include statistical and online learning, convex and non-convex optimization, high dimensional data analysis, combinatorial optimization and its applications in AI, statistics, and probability. We are also looking to expand our domain of expertise to other areas of modern machine learning, including more applied research areas.

Microsoft Research offers wonderful resources to develop your research, opportunities for collaborations across all MSR labs and top academic institutions, and the potential to realize your ideas in products and services used worldwide. Our group is part of Microsoft Research AI, a new organization that brings together the breadth of talent across Microsoft Research to pursue game-changing advances in artificial intelligence.

Please apply directly on the Microsoft Research Careers website and include Ofer Dekel as a Microsoft Research contact. In addition, send a copy of your application to To be assured full consideration, all your materials, including at least two reference letters, should arrive by December 1st, 2017. We recommend including a brief academic research statement and links to publications.

Microsoft is an equal opportunity employer. We welcome applications from all qualified candidates, and in particular from women and underrepresented groups.

by Sebastien Bubeck at November 12, 2017 10:34 PM UTC

March 19-22, 2018 UC San Diego The 3.5-day Spring school will bring TCS researchers up to speed on the current excitement in quantum computing. The past decade had marked tremendous experimental progress, from one or two-qubit devices to dozens of qubits and more. What are the theoretical models for such devices, and what are … Continue reading Quantum Computation school

by shacharlovett at November 12, 2017 05:59 PM UTC

Perhaps in my previous post I should have explained more precisely why I think many things would be better if women were in control. I tried to summarize many statistics, papers, and books that I read through the years, but some may have found my language too sweeping. Let me try to be a bit more precise now.

First, polling conducted during the past four decades has shown that typically, men favor the United States’ going to war to resolve disputes much more than women do.

Second, women are more concerned about climate change than men, and they are more willing to make major lifestyle changes to do something about it.

Finally, it’s also a fact that women live longer, see e.g. this. The advantage is quite noticeable: about 5 years. I won’t give a statistic for what the consequences of this are, instead I’ll conduct the following mental experiment. Suppose population X has expected lifespan 1000 years, while population Y has expected lifespan 100 years. I think population X would be more interested in renewable energies, sustainable practices, et cetera.

by Emanuele at November 12, 2017 01:52 PM UTC

The next TCS+ talk will take place this coming Wednesday, November 15th at 1:00 PM Eastern Time (10:00 AM Pacific Time, 19:00 Central European Time, 18:00 UTC). Vinod Vaikuntanathan from MIT will speak about “Program Obfuscation and Random CSPs: The Love-Hate Relationship” (abstract below).

Please make sure you reserve a spot for your group to join us live by signing up on the online form. As usual, for more information about the TCS+ online seminar series and the upcoming talks, or to suggest a possible topic or speaker, please see the website.

Abstract: A recent line of work shows how to construct indistinguishability obfuscation under two assumptions: (a) that there exist k-linear maps for some constant k; and (b) that certain random O(k)-constraint satisfaction problems (CSPs) are hard in an appropriate sense. The latest of these works (by Lin and Tessaro) assumes the existence of 3-linear maps and the hardness of certain random 3-CSPs. We have 1-linear maps since the 1970s; 2-linear maps since the 1990s; but the existence of 3-linear maps is wide open. On the other hand, we do have reasonable constructions of “secure” random 3-CSPs. The first part of the talk will describe these developments.

Much more surprising was a result (from the same work of Lin and Tessaro) which showed a construction from 2-linear maps and the hardness of random 2-CSPs over a large alphabet. Overnight, the burden of existence of IO went from the question of whether 3-linear maps exist to the completely unrelated question of whether random 2-CSPs over large alphabets is hard. In a nutshell, they require the existence of pseudo-random generators G: \Sigma^n \to {0,1}^m for some poly(n)-size alphabet \Sigma where each output bit depends on at most two input alphabet symbols, and which achieve sufficiently large stretch. In the second part of the talk, we will present a polynomial-time algorithm that breaks these random CSPs.

Based on joint work with Alex Lombardi (MIT) and Rachel Lin (UCSB).

by plustcs at November 10, 2017 05:45 PM UTC

There is a faculty opening in algorithms and complexity at Department of Computer Science, University of Oxford. Applicants at any level welcome.


by shacharlovett at November 10, 2017 12:24 PM UTC

A soon-to-be professor asked me recently if I could share some ideas on on how to advise students. I started to write some notes only to realize that I had already posted on the topic in 2006.
Have students work on problems that interest them not just you. I like to hand them a proceedings of a recent conference and have them skim abstracts to find papers they enjoy. However if they stray too far from your research interests, you will have a hard time pushing them in the right directions. And don't work on their problems unless they want you to.
Keep your students motivated. Meet with them on a regular basis. Encourage students to discuss their problems and other research questions with other students and faculty. Do your best to keep their spirits high if they have trouble proving theorems or are not getting their papers into conferences. Once they lose interest in theory they won't succeed.
Feel free to have them read papers, do some refereeing and reviewing, give talks on recent great papers. These are good skills for them to learn. But don't abuse them too much.
Make sure they learn that selling their research is as important as proving the theorems. Have them write the papers and make them rewrite until the paper properly motivates the work. Make them give practice talks before conferences and do not hold back on the criticism.
Some students will want to talk about some personal issues they have. Listen as a friend and give some suggestions without being condescending. But if they have a serious emotional crisis, you are not trained for that; point them to your university counseling services.
Once it becomes clear a student won't succeed working with you, or won't succeed as a theorist or won't succeed in graduate work, cut them loose. The hardest thing to do as an advisor is to tell a student, particular one that tries hard, that they should go do something else. It's much easier to just keep them on until they get frustrated and quit, but you do no one any favors that way.
Computer science evolves dramatically but the basic principles of advising don't. This advise pretty much works now as well as it did in 2006, in the 80's when I was in the student or even the 18th century. Good advising never goes out of style.

Of course I don't and can't hand out a physical proceedings to a student to skim. Instead I point to on-line proceedings but browsing just doesn't have the same feel.

Looking back I would add some additional advice. Push your students and encourage them to take risks with their research. If they aren't failing to solve their problems, they need to try harder problems. We too often define success by having your paper accepted into a conference. Better to have an impact on what others do.

Finally remember that advising doesn't stop at the defense. It is very much a parent-child relationship that continues long after graduation. Your legacy as a researcher will eventually come to an end. Your legacy as an advisor will live on through those you advise and their students and so on to eternity.

by Lance Fortnow ( at November 09, 2017 07:08 PM UTC

In 2007, we introduced Google Street View, enabling you to explore the world through panoramas of neighborhoods, landmarks, museums and more, right from your browser or mobile device. The creation of these panoramas is a complicated process, involving capturing images from a multi-camera rig called a rosette, and then using image blending techniques to carefully stitch them all together. However, many things can thwart the creation of a "successful" panorama, such as mis-calibration of the rosette camera geometry, timing differences between adjacent cameras, and parallax. And while we attempt to address these issues by using approximate scene geometry to account for parallax and frequent camera re-calibration, visible seams in image overlap regions can still occur.
Left: A Street View car carrying a multi-camera rosette. Center: A close-up of the rosette, which is made up of 15 cameras. Right: A visualization of the spatial coverage of each camera. Overlap between adjacent cameras is shown in darker gray.
Left: The Sydney Opera House with stitching seams along its iconic shells. Right: The same Street View panorama after optical flow seam repair.
In order to provide more seamless Street View images, we’ve developed a new algorithm based on optical flow to help solve these challenges. The idea is to subtly warp each input image such that the image content lines up within regions of overlap. This needs to be done carefully to avoid introducing new types of visual artifacts. The approach must also be robust to varying scene geometry, lighting conditions, calibration quality, and many other conditions. To simplify the task of aligning the images and to satisfy computational requirements, we’ve broken it into two steps.

Optical Flow
The first step is to find corresponding pixel locations for each pair of images that overlap. Using techniques described in our PhotoScan blog post, we compute optical flow from one image to the other. This provides a smooth and dense correspondence field. We then downsample the correspondences for computational efficiency. We also discard correspondences where there isn’t enough visual structure to be confident in the results of optical flow.

The boundaries of a pair of constituent images from the rosette camera rig that need to be stitched together.
An illustration of optical flow within the pair’s overlap region.
Extracted correspondences in the pair of images. For each colored dot in the overlap region of the left image, there is an equivalently-colored dot in the overlap region of the right image, indicating how the optical flow algorithm has aligned the point. These pairs of corresponding points are used as input to the global optimization stage. Notice that the overlap covers only a small portion of each image.
Global Optimization
The second step is to warp the rosette’s images to simultaneously align all of the corresponding points from overlap regions (as seen in the figure above). When stitched into a panorama, the set of warped images will then properly align. This is challenging because the overlap regions cover only a small fraction of each image, resulting in an under-constrained problem. To generate visually pleasing results across the whole image, we formulate the warping as a spline-based flow field with spatial regularization. The spline parameters are solved for in a non-linear optimization using Google’s open source Ceres Solver.
A visualization of the final warping process. Left: A section of the panorama covering 180 degrees horizontally. Notice that the overall effect of warping is intentionally quite subtle. Right: A close-up, highlighting how warping repairs the seams.
Our approach has many similarities to previously published work by Shum & Szeliski on “deghosting” panoramas. Key differences include that our approach estimates dense, smooth correspondences (rather than patch-wise, independent correspondences), and we solve a nonlinear optimization for the final warping. The result is a more well-behaved warping that is less likely to introduce new visual artifacts than the kernel-based approach.
Left: A close-up of the un-repaired panorama. Middle: Result of kernel-based interpolation. This fixes discontinuities but at the expense of strong wobbling artifacts due to the small image overlap and limited footprint of kernels. Right: Result of our global optimization.
This is important because our algorithm needs to be robust to the enormous diversity in content in Street View’s billions of panoramas. You can see how effective the algorithm is in the following examples:
Tower Bridge, London
Christ the Redeemer, Rio de Janeiro
An SUV on the streets of Seattle
This new algorithm was recently added to the Street View stitching pipeline. It is now being used to restitch existing panoramas on an ongoing basis. Keep an eye out for improved Street View near you!

Special thanks to Bryan Klingner for helping to integrate this feature with the Street View infrastructure.

by Research Blog ( at November 09, 2017 06:30 PM UTC

The Algorithms and Randomness Center (ARC) at Georgia Tech is recruiting postdocs. These are 2-year positions with no teaching requirements. We are seeking applicants with expertise in the areas of algorithms, computational complexity, discrete mathematics, optimization, or theoretical machine learning.


by shacharlovett at November 09, 2017 02:28 PM UTC

A few summers ago, I had an opportunity to visit Copenhagen, and work with Rasmus Pagh for a month.   I (and the family) liked it so much we went back the next summer for a hashing workshop.  While algorithms and data structure people were already familiar with Denmark's Aarhus and MADALGO, it was clear to me that Copenhagen was on the rise as an algorithmics center.

That's now been formalized, thanks to a huge new grant for algorithms research centered around a collection of fantastic researchers in Copenhagen -- and they're looking for graduate students, post-docs, visitors, and faculty.  Rasmus sent me a blurb that I'm posting below, introducing the new Basic Algorithms Research Copenhagen, or BARC.

This is wonderful in so many ways.  First, it is always great generally for our community when algorithmic work is recognized and funded.  Second, this is a fantastic group of researchers -- for example, obviously I am biased, but Rasmus and Mikkel Thorup, both individually and together, have just been knocking it out of the park over and over again for years in algorithms and data structures (particularly with regard to hashing).  Third, for many, this is a marvelous opportunity.  Copenhagen is just a wonderful city, and the chance to go and work there with the top people is something many people will enjoy.  (And don't we do better work when we're someplace we enjoy being?)

With all that, let me put up the blurb on BARC, and point you to the home page at and the Facebook page at  (Note there are deadlines coming up for some of the positions.)

BARC (Basic Algorithms Research Copenhagen) is new center for foundational research in design and analysis of algorithms. The center, which will be inaugurated onDecember 1, is led by Mikkel Thorup. It aims to "attract top talent from around the world to an ambitious, creative, collaborative, and fun environment” in Copenhagen, Denmark. Other core scientists in the center are Stephen Alstrup, Thore Husfeldt, and Rasmus Pagh. The center is part of a larger effort increasing the number of faculty members in algorithms research, and making Copenhagen a hub for algorithms research. The BARC center has received funding of over 6 million USD for the next 6 years from VILLUM Fonden to employ graduate students and post-docs, and to attract a stream of long- and short-term visitors. People interested in BARC should visit the web pages at or follow @barcdk on social media — the first assistant/associate professor, post-doc, and PhD student positions are already announced, with deadlines in November and December.

by Michael Mitzenmacher ( at November 09, 2017 01:02 PM UTC