Theory of Computing Blog Aggregator

The Murder by Robot in R.U.R. (Image from Wikipedia.)

The robot that runs amok and turns on its maker has been a staple of fiction and film for at least a century. The plotline goes back to Karel Čapek’s 1921 play R.U.R., with earlier shadows of the same idea in Mary Shelley’s Frankenstein and the golem stories of Jewish folklore. Nowadays we have Arnold Schwarzenegger dressed up as The Terminator.

A number of thoughtful people (including Stephen Hawking, Nick Bostrom, Bill Gates) believe we should take the threat of AI insurrection seriously. They argue that in decades to come we could very well create some sort of conscious entity that might decide the planet would be a nicer place without us.

In the meantime there are lesser but more urgent threats—machines that would not exterminate our species but might make our lives a lot less fun. An open letter released earlier this week, at the International Joint Conference on AI, calls for an international ban on autonomous weapon systems.

Autonomous weapons select and engage targets without human intervention. They might include, for example, armed quadcopters that can search for and eliminate people meeting certain pre-defined criteria, but do not include cruise missiles or remotely piloted drones for which humans make all targeting decisions. Artificial Intelligence (AI) technology has reached a point where the deployment of such systems is — practically if not legally — feasible within years, not decades, and the stakes are high: autonomous weapons have been described as the third revolution in warfare, after gunpowder and nuclear arms.

When I last checked, the letter had 2,414 signers who identify themselves as AI/robotics researchers, and 14,078 other endorsers. I’ve added my name to the latter list.

A United Nations declaration, or even a multilateral treaty, is not going to totally prevent the development and use of such weapons. The underlying technologies are too readily accessible. The self-driving car that can deliver the kids to soccer practice can also deliver a bomb. The chip inside a digital camera that recognizes a smiling face and automatically trips the shutter might also recognize a soldier and pull the trigger. As the open letter points out:

Unlike nuclear weapons, they require no costly or hard-to-obtain raw materials, so they will become ubiquitous and cheap for all significant military powers to mass-produce. It will only be a matter of time until they appear on the black market and in the hands of terrorists, dictators wishing to better control their populace, warlords wishing to perpetrate ethnic cleansing, etc.

What would Isaac Asimov say about all this?

I was lucky enough to meet Asimov, though only once, and late in his life. He was in a hospital bed, recovering from heart surgery. He handed me his business card:


Natural Resource

No false modesty in this guy. But despite this braggadocio, he could equally well have handed out cards reading:


Gentle Soul

Asimov was a Humanist with a capital H, and he endowed the robots in his stories with humanistic ethics. They were the very opposite of killer machines. Their platinum-iridium positronic brains were hard-wired with rules that forbade harming people, and they would intervene to prevent people from harming people. Several of the stories describe robots struggling with moral dilemmas as they try to reconcile conflicts in the Three Laws of Robotics.

Asimov wanted to believe that when technology finally caught up with science fiction, all sentient robots and other artificial minds would be equipped with some version of his three laws. The trouble is, we seem to be stuck at a dangerous intermediate point along the path to such sentient beings. We know how to build machines capable of performing autonomous actions—perhaps including lethal actions—but we don’t yet know how to build machines capable of assuming moral responsibility for their actions. We can teach a robot to shoot, but not to understand what it means to kill.

Ever since the 1950s, much work on artificial intelligence and robotics has been funded by military agencies. The early money came from the Office of Naval Research (ONR) and from ARPA, which is now DARPA, the Defense Advanced Research Projects Agency. Military support continues today; witness the recently concluded DARPA Robotics Challenge. As far as I know, none of the projects currently under way in the U.S. aims to produce a “weaponized robot.” On the other hand, as far as I know, that goal has never been renounced either.

by Brian Hayes at July 31, 2015 06:36 PM UTC

Let $U_{k,N}$ denote the Boolean function which takes as input $k$ strings of $N$ bits each, representing $k$ numbers $a^{(1)},\dots,a^{(k)}$ in $\{0,1,\dots,2^{N}-1\}$, and outputs 1 if and only if $a^{(1)} + \cdots + a^{(k)} \geq 2^N.$ Let THR$_{t,n}$ denote a monotone unweighted threshold gate, i.e., the Boolean function which takes as input a single string $x \in \{0,1\}^n$ and outputs $1$ if and only if $x_1 + \cdots + x_n \geq t$. The function $U_{k,N}$ may be viewed as a monotone function that performs addition, and THR$_{t,n}$ may be viewed as a monotone function that performs counting. We refer to circuits that are composed of THR gates as monotone majority circuits. The main result of this paper is an exponential lower bound on the size of bounded-depth monotone majority circuits that compute $U_{k,N}$. More precisely, we show that for any constant $d \geq 2$, any depth-$d$ monotone majority circuit computing $U_{d,N}$ must have size $\smash{2^{\Omega(N^{1/d})}}$. Since $U_{k,N}$ can be computed by a single monotone weighted threshold gate (that uses exponentially large weights), our lower bound implies that constant-depth monotone majority circuits require exponential size to simulate monotone weighted threshold gates. This answers a question posed by Goldmann and Karpinski (STOC'93) and recently restated by Hastad (2010, 2014). We also show that our lower bound is essentially best possible, by constructing a depth-$d$, size-$2^{O(N^{1/d})}$ monotone majority circuit for $U_{d,N}$. As a corollary of our lower bound, we significantly strengthen a classical theorem in circuit complexity due to Ajtai and Gurevich (JACM'87). They exhibited a monotone function that is in AC$^0$ but requires super-polynomial size for any constant-depth monotone circuit composed of unbounded fan-in AND and OR gates. 
We describe a monotone function that is in depth-$3$ AC$^0$ but requires exponential size monotone circuits of any constant depth, even if the circuits are composed of THR gates.
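As a concrete reading of the definitions above (my own sketch, not code from the paper), the two Boolean functions $U_{k,N}$ and THR$_{t,n}$ can be evaluated directly on small inputs:

```python
# Illustrative only: direct evaluation of the abstract's two functions.

def U(numbers, N):
    """U_{k,N}: 1 iff the k numbers (each in {0,...,2^N - 1}) sum to >= 2^N."""
    assert all(0 <= a < 2**N for a in numbers)
    return int(sum(numbers) >= 2**N)

def THR(x, t):
    """THR_{t,n}: monotone unweighted threshold -- 1 iff at least t bits of x are 1."""
    return int(sum(x) >= t)

# With N = 4 (numbers in 0..15): 9 + 8 = 17 >= 16, so U outputs 1.
print(U([9, 8], 4))          # -> 1
print(U([7, 8], 4))          # -> 0  (15 < 16)
print(THR([1, 0, 1, 1], 3))  # -> 1  (three 1-bits meet threshold 3)
```

Both functions are monotone: flipping any input bit from 0 to 1 can only move the output from 0 to 1, which is what makes the monotone-circuit lower bound meaningful.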

at July 31, 2015 03:38 PM UTC

Authors: Giovanni Pighizzini
Download: PDF
Abstract: We present two restricted versions of one-tape Turing machines. Both characterize the class of context-free languages. In the first version, proposed by Hibbard in 1967 and called limited automata, each tape cell can be rewritten only in the first $d$ visits, for a fixed constant $d\geq 2$. Furthermore, for $d=2$ deterministic limited automata are equivalent to deterministic pushdown automata, namely they characterize deterministic context-free languages. Further restricting the possible operations, we consider strongly limited automata. These models still characterize context-free languages. However, the deterministic version is less powerful than the deterministic version of limited automata. In fact, there exist deterministic context-free languages that are not accepted by any deterministic strongly limited automaton.

at July 31, 2015 12:40 AM UTC

Authors: Dhiraj Madan, Sandeep Sen
Download: PDF
Abstract: We develop new techniques for rounding packing integer programs using iterative randomized rounding. It is based on a novel application of multidimensional Brownian motion in $\mathbb{R}^n$. Let $\overset{\sim}{x} \in {[0,1]}^n$ be a fractional feasible solution of a packing constraint $A x \leq 1,\ \ $ $A \in {\{0,1 \}}^{m\times n}$ that maximizes a linear objective function. The independent randomized rounding method of Raghavan-Thompson rounds each variable $x_i$ to 1 with probability $\overset{\sim}{x_i}$ and 0 otherwise. The expected value of the rounded objective function matches the fractional optimum and no constraint is violated by more than $O(\frac{\log m} {\log\log m})$. In contrast, our algorithm iteratively transforms $\overset{\sim}{x}$ to $\hat{x} \in {\{ 0,1\}}^{n}$ using a random walk, such that the expected values of $\hat{x}_i$'s are consistent with the Raghavan-Thompson rounding. In addition, it gives us intermediate values $x'$ which can then be used to bias the rounding towards a superior solution. The reduced dependencies between the constraints of the sparser system can be exploited using {\it Lovasz Local Lemma}. For $m$ randomly chosen packing constraints in $n$ variables, with $k$ variables in each inequality, the constraints are satisfied within $O(\frac{\log (mkp\log m/n) }{\log\log (mkp\log m/n)})$ with high probability where $p$ is the ratio between the maximum and minimum coefficients of the linear objective function. Further, we explore trade-offs between approximation factors and error, and present applications to well-known problems like circuit-switching, maximum independent set of rectangles and hypergraph $b$-matching. Our methods apply to the weighted instances of the problems and are likely to lead to better insights for even dependent rounding.
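For readers unfamiliar with the baseline, here is a minimal sketch of the Raghavan-Thompson independent rounding step that the abstract compares against (my own illustration; the paper's Brownian-motion walk is not shown):

```python
import random

# Baseline independent randomized rounding: each x_i becomes 1 with
# probability x_i, so E[rounded x_i] = x_i and the expected objective
# equals the fractional optimum.

def rt_round(x_frac, rng):
    """Round each fractional x_i in [0,1] to {0,1} independently."""
    return [1 if rng.random() < xi else 0 for xi in x_frac]

x_frac = [0.5, 0.25, 0.9, 0.1]
print(rt_round(x_frac, random.Random(0)))  # with this seed: [0, 0, 1, 0]
```

The paper's iterative method keeps these marginals but correlates the steps through a random walk, which is what allows the Lovász Local Lemma to be applied to the sparser residual system.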

at July 31, 2015 12:41 AM UTC

Authors: Maurice Margenstern
Download: PDF
Abstract: In this paper, we prove two results. First, there is a family of sequences of embedded quarters of the hyperbolic plane such that any sequence converges to a limit which is an end of the hyperbolic plane. Second, there is no algorithm which would allow us to check whether two given ends are equal or not.

at July 31, 2015 12:41 AM UTC

Authors: Moran Feldman, Rico Zenklusen
Download: PDF
Abstract: During the last decade, the matroid secretary problem (MSP) became one of the most prominent classes of online selection problems. Partially linked to its numerous applications in mechanism design, substantial interest arose also in the study of nonlinear versions of MSP, with a focus on the submodular matroid secretary problem (SMSP). So far, O(1)-competitive algorithms have been obtained for SMSP over some basic matroid classes. This created some hope that, analogously to the matroid secretary conjecture, one may even obtain O(1)-competitive algorithms for SMSP over any matroid. However, up to now, most questions related to SMSP remained open, including whether SMSP may be substantially more difficult than MSP; and more generally, to what extent MSP and SMSP are related.

Our goal is to address these points by presenting general black-box reductions from SMSP to MSP. In particular, we show that any O(1)-competitive algorithm for MSP, even restricted to a particular matroid class, can be transformed in a black-box way to an O(1)-competitive algorithm for SMSP over the same matroid class. This implies that the matroid secretary conjecture is equivalent to the same conjecture for SMSP. Hence, in this sense SMSP is not harder than MSP. Also, to find O(1)-competitive algorithms for SMSP over a particular matroid class, it suffices to consider MSP over the same matroid class. Using our reductions we obtain many first and improved O(1)-competitive algorithms for SMSP over various matroid classes by leveraging known algorithms for MSP. Moreover, our reductions imply an O(loglog(rank))-competitive algorithm for SMSP, thus, matching the currently best asymptotic algorithm for MSP, and substantially improving on the previously best O(log(rank))-competitive algorithm for SMSP.

at July 31, 2015 12:41 AM UTC

Authors: Muhibur Rasheed, Chandrajit Bajaj
Download: PDF
Abstract: The regular polyhedra have the highest order of 3D symmetries and are exceptionally attractive templates for (self)-assembly using minimal types of building blocks, from nano-cages and virus capsids to large scale constructions like glass domes. However, they only represent a small number of possible spherical layouts which can serve as templates for symmetric assembly. In this paper, we formalize the necessary and sufficient conditions for symmetric assembly using exactly one type of building block. All such assemblies correspond to spherical polyhedra which are edge-transitive and face-transitive, but not necessarily vertex-transitive. This describes a new class of polyhedra outside of the well-studied Platonic, Archimedean, Catalan and Johnson solids. We show that this new family, dubbed almost-regular polyhedra, can be parameterized using only two variables and provide an efficient algorithm to generate an infinite series of such polyhedra. Additionally, considering the almost-regular polyhedra as templates for the assembly of 3D spherical shell structures, we developed an efficient polynomial time shell assembly approximation algorithm for an otherwise NP-hard geometric optimization problem.

at July 31, 2015 12:41 AM UTC

Authors: Pasin Manurangsi, Dana Moshkovitz
Download: PDF
Abstract: In this paper, we present a polynomial-time algorithm that approximates sufficiently high-value Max 2-CSPs on sufficiently dense graphs to within $O(N^{\varepsilon})$ approximation ratio for any constant $\varepsilon > 0$. Using this algorithm, we also achieve similar results for free games, projection games on sufficiently dense random graphs, and the Densest $k$-Subgraph problem with sufficiently dense optimal solution. Note, however, that algorithms with similar guarantees to the last algorithm were in fact discovered prior to our work by Feige et al. and Suzuki and Tokuyama.

In addition, our idea for the above algorithms yields the following by-product: a quasi-polynomial time approximation scheme (QPTAS) for satisfiable dense Max 2-CSPs with better running time than the known algorithms.

at July 31, 2015 12:41 AM UTC

Authors: Lilla Tóthmérész
Download: PDF
Abstract: We define the analogue of linear equivalence of graph divisors for the rotor-router model, and use it to prove polynomial time computability of some problems related to rotor-routing. Using the connection between linear equivalence for chip-firing and for rotor-routing, we prove that the number of rotor-router unicycle-orbits equals the order of the Picard group. We also show that the rotor-router action of the Picard group on the set of spanning in-arborescences can be interpreted in terms of the linear equivalence.

at July 30, 2015 12:40 AM UTC

Authors: Pål Grønås Drange, Felix Reidl, Fernando Sánchez Villaamil, Somnath Sikdar
Download: PDF
Abstract: We study two clustering problems, Starforest Editing, the problem of adding and deleting edges to obtain a disjoint union of stars, and the generalization Bicluster Editing. We show that, in addition to being NP-hard, neither problem can be solved in subexponential time unless the exponential time hypothesis fails.

Misra, Panolan, and Saurabh (MFCS 2013) argue that introducing a bound on the number of connected components in the solution should not make the problem easier: In particular, they argue that the subexponential time algorithm for editing to a fixed number of clusters (p-Cluster Editing) by Fomin et al. (J. Comput. Syst. Sci., 80(7) 2014) is an exception rather than the rule. Here, p is a secondary parameter, bounding the number of components in the solution.

However, upon bounding the number of stars or bicliques in the solution, we obtain algorithms which run in time $2^{5 \sqrt{pk}} + O(n+m)$ for p-Starforest Editing and $2^{O(p \sqrt{k} \log(pk))} + O(n+m)$ for p-Bicluster Editing. We obtain a similar result for the more general case of t-Partite p-Cluster Editing. This is subexponential in k for fixed number of clusters, since p is then considered a constant.

Our results even out the number of multivariate subexponential time algorithms and give reasons to believe that this area warrants further study.

at July 30, 2015 12:40 AM UTC

Authors: Donggu Kang, James Payor
Download: PDF
Abstract: We consider flow rounding: finding an integral flow from a fractional flow. Costed flow rounding asks that we find an integral flow with no worse cost. Randomized flow rounding requires we randomly find an integral flow such that the expected flow along each edge matches the fractional flow. Both problems are reduced to cycle canceling, for which we develop an $O(m \log(n^2/m))$ algorithm.
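A toy sketch of the cycle-canceling idea (my own simplification, not the authors' $O(m \log(n^2/m))$ algorithm): by flow conservation, edges carrying fractional flow can be grouped into cycles, and pushing flow around such a cycle makes at least one edge integral without changing the net flow at any vertex. Assuming, for simplicity, that every edge is oriented the same way around the cycle:

```python
from fractions import Fraction

# Push the smallest amount of flow around a directed cycle that makes
# at least one edge integral; net flow at every vertex is unchanged
# because each vertex gains and loses the same delta.

def round_cycle(flow_on_cycle):
    """flow_on_cycle: fractional flow values along a directed cycle,
    all edges oriented with the cycle. Returns flows after the push."""
    delta = min(1 - (f % 1) for f in flow_on_cycle)  # distance to next integer
    return [f + delta for f in flow_on_cycle]

cycle = [Fraction(1, 2), Fraction(3, 4), Fraction(1, 2)]
print(round_cycle(cycle))  # -> [Fraction(3, 4), Fraction(1, 1), Fraction(3, 4)]
```

Repeating this until no fractional edges remain yields an integral flow; the paper's contribution is doing the whole sequence of cancellations within the stated time bound.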

at July 30, 2015 12:41 AM UTC

Authors: Oswin Aichholzer, Vincent Kusters, Wolfgang Mulzer, Alexander Pilz, Manuel Wettstein
Download: PDF
Abstract: Given a set $P$ of $n$ labeled points in the plane, the radial system of $P$ describes, for each $p\in P$, the radial order of the other points around $p$. This notion is related to the order type of $P$, which describes the orientation (clockwise or counterclockwise) of every ordered triple of $P$. Given only the order type of $P$, it is easy to reconstruct the radial system of $P$, but the converse is not true. Aichholzer et al. ("Reconstructing Point Set Order Types from Radial Orderings", in Proc. ISAAC 2014) defined $T(R)$ to be the set of order types with radial system $R$ and showed that sometimes $|T(R)|=n-1$. They give polynomial-time algorithms to compute $T(R)$ when only given $R$.

We describe an optimal $O(kn^2)$ time algorithm for computing $T(R)$, where $k$ is the number of order types reported by the algorithm. The reporting relies on constructing the convex hulls of all possible point sets with the given radial system, after which sidedness queries on point triples can be answered in constant time. This set of convex hulls can be constructed in linear time. Our results generalize to abstract order types.
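To make the two notions concrete (my own sketch, not the paper's reconstruction algorithm): the order type records the orientation of each ordered triple, while the radial system records, for each point, the angular order of the others around it.

```python
import math

# Illustrative computations of order type and radial system for a point set.

def orient(p, q, r):
    """+1 for counterclockwise, -1 for clockwise, 0 for collinear."""
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (d > 0) - (d < 0)

def radial_system(P):
    """For each index i, the other indices sorted by angle around P[i]."""
    sys = {}
    for i, p in enumerate(P):
        others = [j for j in range(len(P)) if j != i]
        others.sort(key=lambda j: math.atan2(P[j][1] - p[1], P[j][0] - p[0]))
        sys[i] = others
    return sys

P = [(0, 0), (4, 0), (0, 3), (1, 1)]
print(orient(P[0], P[1], P[2]))  # -> 1 (counterclockwise triple)
print(radial_system(P)[0])       # -> [1, 3, 2]
```

The order type determines the radial system (as here, by sorting with sidedness comparisons), but the abstract's point is that inverting this map is the hard direction: one radial system can be consistent with up to $n-1$ order types.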

at July 30, 2015 12:00 AM UTC

We show that for any coprime $m,r$ there is a circuit of the form $\text{MOD}_m\circ \text{AND}_{d(n)}$ whose correlation with $\text{MOD}_r$ is at least $2^{-O\left( \frac{n}{d(n)} \right) }$. This is the first correlation lower bound for arbitrary $m,r$, whereas previously lower bounds were known for prime $m$. Our motivation is the question posed by Green et al. [11] to which the $2^{-O\left( \frac{n}{d(n)} \right) }$ bound is a partial negative answer. We first show a $2^{-\Omega(n)}$ correlation upper bound that implies a $2^{\Omega(n)}$ circuit size lower bound. Then, through a reduction we obtain a $2^{-O(\frac{n}{d(n)})}$ correlation lower bound. In fact, the $2^{\Omega(n)}$ size lower bound is for $\text{MAJ}\circ\text{ANY}_{o(n)}\circ\text{AND}\circ\text{MOD}_r\circ\text{AND}_{O(1)}$ circuits, which can be of independent interest.

at July 29, 2015 11:18 AM UTC

I was one of the organizers of the 2nd workshop on Fairness, Accuracy and Transparency in Machine Learning (FATML) at ICML 2015, and in my alternate career as moderator of data mining panels, I moderated the closing panel. The panelists were Fernando Diaz from MSR New York, Sorelle Friedler from Haverford College, Mykola Pechenizkiy from Eindhoven University of Technology and Hanna Wallach from UMass-Amherst and MSR.

While my original intent was to do a review of the panel, it became clear that the panel discussion touched on themes that were bubbling up throughout the day. So what follows is organized by panel questions, but weaves in discussion from outside the panel as well.

This year's workshop, unlike the one at NIPS 2014, had a bit more of a technical focus: we had some of the early researchers in fairness work from Europe and Japan give talks on their work.  So as a counterweight, I thought I'd ask the panel to look beyond the presentations for the first question:
Question 1 
What is one thing you think we are missing (or should be paying more attention to) in our current discussions of fairness and discrimination  in terms of interaction with the social/policy/legal world OUTSIDE CS ?

Some themes that emerged:

...on the difference between Europe and the US

Mykola made an interesting observation based on his experience in educational data mining. Governments around Europe are very concerned about the use of student data (including health records and academic information) for any kind of data driven tools, and have severely regulated use of such data. As a result, the nascent field of educational data mining has been crippled by lack of access to data.

This is almost the opposite of the situation in the US, where data driven policy is the newest buzzword in town, and those of us who are interested in issues of fairness and transparency feel like we're constantly on the outside looking in (though attention to these issues is increasing).

...connecting to other communities

It's been clear from the beginning that discourse on fairness and transparency in machine learning must draw on the corresponding discourses in society at large. Which means that before we can start solving problems, we have to understand what the problems really are. This came through very strongly in the discussions. To paraphrase one of the panelists,  "Computer science likes to solve problems, and that's the problem !" (also known as  "slap a metric on it and optimize").

So what are the different communities we should be connecting to, and how ?

a) Connecting with social science

A major concern is the "prediction vs understanding" problem. For the most part, machine learning is about prediction: you classify, label, cluster, regress, rank and so on. But in the study of society and the dynamics of human interactions, the goal is not just to predict how humans might behave, but to understand their behavior. Which is to say, data analysis (or even fairness-aware analysis) has to be the first step in a larger conversation, rather than the last one.

While I don't think this issue is specific to fairness and transparency, it plays a role in understanding the sources of inequality and discrimination. It's not enough to detect examples of bias: what must happen next is an investigation of why the bias is happening.

(ed: personally, while I understand this concern, I don't think it's necessarily something computer scientists need to prioritize. This is after all what the social sciences do, and it doesn't make sense for us as computer scientists to merely try to acquire those skills. I think we need to be aware of the deeper issues of understanding a domain, but we also have strengths that we bring to the table and I'll say more about that later)

"galaxies don't care how they are studied, but people do"

Another point that was made over and over is that  issues of fairness and bias are not abstract: they affect actual people. Keeping the human in focus is important for the ethical underpinning of what we do, and even how we might design experiments.

b) connecting with journalists

Nick Diakopoulos gave a talk on "algorithmic accountability" in journalism. In addition to talking about what made research on fairness newsworthy:

  • discriminatory/unfair practices
  • mistakes that deny a service
  • censorship
  • activities that break the law or social norms
  • false prediction

he made the strong argument that (government) legitimacy comes from transparency, and talked about what that might entail in the age of data driven policy, including transparency involving data collection, the algorithms used, the inferences generated, and the humans involved in the process.

(ed: I don't think that our demands on transparency should be limited to government entities: the sad fact is that at least in the US, much of what would be considered basic internet infrastructure is controlled by private corporations, and they should be held to similar standards: if not for legitimacy, at least for fairness)

c) connecting with the law

Our fearless leader Solon Barocas made a number of interesting observations on the connection between algorithmic fairness  and the law, all the while disclaiming IANAL :). But his point (which he's made before) is worth repeating. One of the things that computer science can do well is  make precise concepts that might be defined vaguely or only indirectly through case law. And then we can get to work teasing out the relationships between different concepts (both abstractly and computationally). Indeed, the idea of a "reduction" between concepts in fairness might be one of the most useful things that computer science can uniquely contribute.

It's clear we're in a "let a thousand definitions bloom" phase in fairness research. And it's interesting to see the different reactions to this: on the social science side, there appears to be some nervousness that we're "playing games with math", but from Solon's comments this doesn't seem like a bad thing as long as we're also trying to connect the definitions together.

 Question 2 
In your view, what’s the next most pressing question we should be asking (limited to INSIDE CS to distinguish from the previous question) ?

...better definitions

It was very clear from the discussion that we need broader definitions of F-A-T beyond what's mathematically plausible. One particular example that's reminiscent of the metrics for privacy: There's a notion of "utility": how much can we make the data or the task "fair" without changing the underlying results produced by the "unfair" data/algorithm. The problem is that utility itself is not very well defined. Firstly, you might be benefiting from discriminatory policies, so your perceived "utility" itself is a problem. Trying to maintain this defeats the purpose of fairness. Secondly, even if this is not the case, the framing of the question as a tradeoff implies that these two notions are necessarily in opposition. That shortchanges the moral imperative of fairness and is different from the parallel situation in privacy. Finally, we measure utility in terms of classifier accuracy. But that's a very poor notion of overall task effectiveness. For example, is there a Bayesian perspective to bring to this ?

At any rate, since we are good at understanding tradeoffs in computer science, we should understand the different dimensions of the space of fairness preserving methods, rather than limiting ourselves to a one-dimensional false dichotomy of "fairness vs utility".

...better usable artifacts

Nick asked us the following question at the panel:

when a CEO or an agency head comes to us and asks "what should we do about this fairness stuff", what do we tell them ?

We didn't have a good response, and that was interesting. While we're beginning to explore the space of what's possible, we don't have clear examples of artifacts to hand over and say "use this".

As usual, the topic of benchmarking came up. I joke that when industry folks bring up the issue of benchmarking, I always ask "so where's the data" and they usually go very silent. But I do think there are useful data sets to be explored that come to us from the government. Some common data sets that get used are the US census data on salaries and a German data set on consumer credit. The entire data set from the Ricci court case is also available (even though it's tiny), and there are Bureau of Justice recidivism data sets to play with.

Of course this goes against the imperative coming from the social scientists to look at specific domains and ask meaningful questions in that domain. And I think we need to look more at the literature on fairness and bias over the decades and extract data that people have studied.

...better problems

For the most part, researchers have been considering binary classification as the suspect task. But of course there are much more general tasks that we could be considering: what about unsupervised learning ? what about structured prediction ? Is there a way to define fairness when you don't have a simple binary response variable and binary attributes ?

One final question I asked was this:
 Question 3 
do we have to solve the causality problem in order to talk about fairness ? 

This question was possibly not as well-posed as I would have liked, but it led to interesting discussions.

The law deals with intent, because the goal of the law is to assign responsibility. Algorithms are not agents and can't exhibit intent. Causality is a proxy for intent, in that if we can say that something caused something else, we can assign blame in a different way. In fact there were two talks at the workshop that talked about causality directly in the context of fairness.

But causality is a very hard problem. It's extremely subtle (if you doubt this, read through some of the examples Judea Pearl discusses in his book), and extremely controversial: different camps have their own view of how to mechanize causal inference, and the battles there make frequentists and Bayesians look like life-long friends.

In the discussion that followed, it became clear that there were really two ways of thinking about causality as it relates to fairness. The first way is to think about the underlying causal mechanisms that might lead to otherwise innocent features leading to biased outcomes: that is, how might zip code correlate with racial identity for example. The second way, which is closer to what I had in mind, is to think about the behavior of an algorithm causally: the use of these inputs or this algorithm *caused* a particular decision to be made. This second idea is not as far-fetched as it seems: some work in the database community has looked at trying to find which tuples "caused" a certain output to be generated from a query.

If you think it's not important to understand causality as it comes to automated methods, you might not want to drive a self-driving car or fly a plane. But as Solon suggested in the discussion, one way of getting around causality is to think about negligence with respect to algorithms: can we design reasonable best practices for predictive tools and argue that a failure to use these methods is negligence ? The legal ramifications of this idea have been explored in the context of robotics (article, and response) but more work is yet to be done.

...back to narratives

Another comment by Nick D, again connecting to the journalism perspective: narratives and story telling are a powerful way to explain the results of data mining. I haven't talked much about interpretability, which is an important part of the larger discussion of transparency and accountability. But one way to communicate the results of (say) a fairness audit would be to provide a human-interpretable linkage between the problematic attributes being used for prediction and the protected attribute. For more on this, see Michael Nielsen's very timely new Quanta article on machine-generated explanations.

It's clear from all the discussion that there's a lot of work to be done and a small but active community of people interested in pushing these issues forward. Fairness, and algorithmic bias, are hot topics in the news nowadays, and it's a good time to take advantage of this burst of interest.

by Suresh Venkatasubramanian at July 29, 2015 05:57 AM UTC

In the last post I had the following scenario:

Larry, Moe, and Curly are on Jeopardy.

Going into Final Jeopardy:

Larry has $50,000, Moe has $10,000, Curly has $10,000

Larry bets $29,999, Moe bets $10,000, Curly bets $10,000

These bets are ALL RATIONAL and ALL MATTER independent of what the category is. For example, these bets make sense whether the category is THE THREE STOOGES or CIRCUIT LOWER BOUNDS.

Explain why this is.

EXPLANATION: You were probably thinking of ordinary Jeopardy, where the winner gets whatever he gets and the losers' take-home is based ONLY on their rank ($2,000 for second place, $1,000 for third place). Hence Larry's bet seems risky since he may lose $29,999, and Moe's and Curly's bets seem irrelevant (or barely relevant: they both want to finish in second).

BUT these are Larry, Moe, and Curly: The Three Stooges. This is CELEBRITY JEOPARDY! The rules for money are different. First place gets the MAX of what he wins and $50,000, so Larry has NOTHING TO LOSE by betting $29,999. Second and third place BOTH get the MAX of what they win and $10,000, so Moe and Curly have NOTHING TO LOSE by betting $10,000. (I suspect they do this because the money goes to a charity chosen by the celebrity.)
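A small sanity check of the payout logic described above (the rules here are my reading of the post, not an official source):

```python
# Celebrity Jeopardy payouts as described: first place gets max(winnings, 50000),
# everyone else gets max(winnings, 10000). Betting big costs Larry nothing.

def payouts(scores, bets, correct):
    """scores/bets/correct are parallel lists; returns final cash per player."""
    finals = [s + b if c else s - b for s, b, c in zip(scores, bets, correct)]
    ranked = sorted(range(len(finals)), key=lambda i: -finals[i])
    cash = [0] * len(finals)
    for rank, i in enumerate(ranked):
        floor = 50000 if rank == 0 else 10000
        cash[i] = max(finals[i], floor)
    return cash

# Worst case: everyone misses Final Jeopardy. Larry drops to $20,001 but
# still takes home the $50,000 floor; Moe and Curly hit zero but get $10,000.
print(payouts([50000, 10000, 10000], [29999, 10000, 10000],
              [False, False, False]))  # -> [50000, 10000, 10000]
```

Note that Larry caps his bet at $29,999 rather than betting everything: that guarantees he finishes first even if he misses and the others double up, since $20,001 > $20,000.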

SIDE NOTE: I saw Celebrity Jeopardy and wanted to verify the above before posting, so I looked on the web for the rules for Celebrity Jeopardy. THEY WERE NOWHERE TO BE FOUND! A friend of mine finally found a very brief YouTube clip of Penn Jillette winning Celebrity Jeopardy, with a VERY BRIEF look at the final scores and how much money everyone actually got. That's how I verified what I thought were the rules for Celebrity Jeopardy.

If I am looking up a theorem in Recursive Ramsey theory and can't find it on the web, I am NOT surprised at all, since that would be somewhat obscure (9 times out of 10 when I look up something in Ramsey Theory it points to one of the Ramsey Theory websites that I maintain. Usually it's there!). But the rules for Final Jeopardy? I would think those are not so obscure. I am rather surprised they were not on the web.

by GASARCH at July 29, 2015 03:44 AM UTC

Authors: N. S. Narayanaswamy, C. S. Rahul
Download: PDF
Abstract: Given an undirected graph G = (V, E) with n vertices, and a function f : V -> N, we consider the problem of finding a connected f-factor in G. In this work we design an algorithm to check for the existence of a connected f-factor for the case where f(v) >= n/g(n) for all v in V, where g(n) is polylogarithmic in n. The running time of our algorithm is O(n^{2g(n)}). As a consequence of this algorithm we conclude that, for the case we consider, connected f-factor is unlikely to be NP-complete unless the Exponential Time Hypothesis (ETH) is false. Secondly, under the assumption of the ETH, we show that it is also unlikely to be in P for g(n) in O((log n)^{1+eps}) for any eps > 0. Therefore, our results show that for all eps > 0, connected f-factor for f(v) >= n/O((log n)^{1+eps}) is NP-intermediate unless the ETH is false. Further, for any constant c > 0, when g(n) = c, our algorithm for connected f-factor runs in polynomial time. Finally, we extend our algorithm to compute a minimum weight connected f-factor in edge-weighted graphs in the same asymptotic time bounds.

at July 29, 2015 12:40 AM UTC

Authors: Eric J. Friedman, Adam S. Landsberg
Download: PDF
Abstract: We present an iterative algorithm for solving a class of nonlinear Laplacian systems of equations in $\tilde{O}(k^2m \log(kn/\epsilon))$ iterations, where $k$ is a measure of nonlinearity, $n$ is the number of variables, $m$ is the number of nonzero entries in the graph Laplacian $L$, $\epsilon$ is the solution accuracy and $\tilde{O}()$ neglects (non-leading) logarithmic terms. This algorithm is a natural nonlinear extension of the one of Kelner et al., which solves a linear Laplacian system of equations in nearly linear time. Unlike the linear case, in the nonlinear case each iteration takes $\tilde{O}(n)$ time, so the total running time is $\tilde{O}(k^2mn \log(kn/\epsilon))$. For sparse graphs where $m = O(n)$ and fixed $k$, this nonlinear algorithm runs in $\tilde{O}(n^2 \log(n/\epsilon))$ time, which is slightly faster than standard methods for solving linear equations, which require approximately $O(n^{2.38})$ time. Our analysis relies on the construction of a nonlinear "energy function" and a nonlinear extension of the duality analysis of Kelner et al. to the nonlinear case, without any explicit references to spectral analysis or electrical flows. These new insights and results provide tools for more general extensions to spectral theory and nonlinear applications.

at July 29, 2015 12:41 AM UTC

Authors: Konrad Simon, Sameer Sheorey, David Jacobs, Ronen Basri
Download: PDF
Abstract: We suggest a novel shape matching algorithm for three-dimensional surface meshes of disk or sphere topology. The method is based on the physical theory of nonlinear elasticity and can hence handle large rotations and deformations. Deformation boundary conditions that supplement the underlying equations are usually unknown. Given an initial guess, these are optimized such that the mechanical boundary forces that are responsible for the deformation are of a simple nature. We show a heuristic way to approximate the nonlinear optimization problem by a sequence of convex problems using finite elements. The deformation cost, i.e., the forces, is measured on a coarse scale while ICP-like matching is done on the fine scale. We demonstrate the plausibility of our algorithm on examples taken from different datasets.

at July 29, 2015 12:42 AM UTC

Authors: Matúš Mihalák, Przemysław Uznański, Pencho Yordanov
Download: PDF
Abstract: We study the problem of enumerating all rooted directed spanning trees (arborescences) of a directed graph (digraph) $G=(V,E)$ of $n$ vertices. An arborescence $A$ consisting of edges $e_1,\ldots,e_{n-1}$ can be represented as a monomial $e_1\cdot e_2 \cdots e_{n-1}$ in variables $e \in E$. All arborescences $\mathsf{arb}(G)$ of a digraph then define the Kirchhoff polynomial $\sum_{A \in \mathsf{arb}(G)} \prod_{e\in A} e$. We show how to compute a compact representation of the Kirchhoff polynomial -- its prime factorization, and how it relates to combinatorial properties of digraphs such as strong connectivity and vertex domination. In particular, we provide digraph decomposition rules that correspond to factorization steps of the polynomial, and also give necessary and sufficient primality conditions of the resulting factors expressed by connectivity properties of the corresponding decomposed components. Thereby, we obtain a linear time algorithm for decomposing a digraph into components corresponding to factors of the initial polynomial, and a guarantee that no finer factorization is possible. The decomposition serves as a starting point for a recursive deletion-contraction algorithm, and also as a preprocessing phase for iterative enumeration algorithms. Both approaches produce a compressed output and retain some structural properties in the resulting polynomial. This proves advantageous in practical applications such as calculating steady states on digraphs governed by Laplacian dynamics, or computing the greatest common divisor of Kirchhoff polynomials. Finally, we initiate the study of a class of digraphs which allow for a practical enumeration of arborescences. Using our decomposition rules we observe that various digraphs from real-world applications fall into this class or are structurally similar to it.

at July 29, 2015 12:41 AM UTC

Authors: Kenneth Lange, Kevin L. Keys
Download: PDF
Abstract: The MM principle is a device for creating optimization algorithms satisfying the ascent or descent property. The current survey emphasizes the role of the MM principle in nonlinear programming. For smooth functions, one can construct an adaptive interior point method based on scaled Bregmann barriers. This algorithm does not follow the central path. For convex programming subject to nonsmooth constraints, one can combine an exact penalty method with distance majorization to create versatile algorithms that are effective even in discrete optimization. These proximal distance algorithms are highly modular and reduce to set projections and proximal mappings, both very well-understood techniques in optimization. We illustrate the possibilities in linear programming, binary piecewise-linear programming, nonnegative quadratic programming, $\ell_0$ regression, matrix completion, and inverse sparse covariance estimation.

at July 29, 2015 12:41 AM UTC

Authors: Shant Boodaghians, Adrian Vetta
Download: PDF
Abstract: Given a consumer data-set, the axioms of revealed preference proffer a binary test for rational behaviour. A natural (non-binary) measure of the degree of rationality exhibited by the consumer is the minimum number of data points whose removal induces a rationalisable data-set. We study the computational complexity of the resultant consumer rationality problem in this paper. We explain how to formulate this problem in terms of a directed revealed preference graph and show, for markets with a large number of commodities, that it is equivalent (in terms of approximation) to the directed feedback vertex set problem. Our main result is to obtain an exact threshold on the number of commodities that separates easy cases and hard cases. Specifically, for two-commodity markets the consumer rationality problem is polynomial time solvable; we prove this via a reduction to the vertex cover problem on perfect graphs. For three-commodity markets, however, the problem is NP-complete; we prove this using a reduction from planar 3-sat that is based upon oriented-disc drawings.

at July 29, 2015 12:40 AM UTC

The summer edition of SIGecom Exchanges has a new feature: an article profiling the CS-Econ 2016 junior job market candidates (cf. the call for profiles). The goal of this article is to (a) reduce the information inefficiency of the market and (b) present the candidates as a cohesive community. For best results, use the keyword index at the end of the article. The most obvious statistic to report: There are 30 job market candidates for 2016! Many thanks are due to Vasilis Gkatzelis for his efforts in making this happen.

Wordle created from candidate provided keywords.

by Jason Hartline at July 28, 2015 03:28 PM UTC

A new kind of ‘liar’ puzzle using Freestyle chess

By permission of Vierrae (Katerina Suvorova), source

Raymond Smullyan is probably the world’s greatest expert on the logic of lying and the logic of chess. He is still writing books well into his 10th decade. Last year he published a new textbook, A Beginner’s Guide to Mathematical Logic, and in 2013 a new puzzle book named for Kurt Gödel. His 1979 book The Chess Mysteries of Sherlock Holmes introduced retrograde analysis—taking back moves from positions to tell how they could possibly have arisen—to a wide public.

Today Ken and I wish to talk about whether we can ever play perfect chess—or at least better chess than any one chess program—by combining output from multiple programs that sometimes might “lie.”

We will start with Smullyan-style puzzles today, but they are prompted by an amazing and serious fact. Even though human players have been outclassed by computers for over a decade, humans judging between multiple programs have been able to beat those programs playing alone. This happens even when the human player is far from being a master—someone who would get crushed in minutes by programs available on smartphones. We want to know, how can this be?

By coincidence, yesterday’s New York Times Magazine has a feature on Terry Tao that likens discovering and proving theorems to “playing chess with the devil”—quoting Charles Fefferman:

The devil is vastly superior at chess, but […] you may take back as many moves as you like, and the devil may not. … If you are sufficiently wily, you will eventually discover a move that forces the devil to shift strategy; you still lose, but—aha!—you have your first clue.

On this blog we have previously likened perfect programs to “playing chess against God”—this was quoting Ken Thompson about endgames where perfect tables have been computed. Since the programs we consider here occasionally err—one can say lie—we will reserve the “devil” term in yesterday’s Times for them.

Freestyle Chess

We, that is I and my Tech colleague Dr. Kathryn Farley, just visited Ken at his home base in Buffalo and had a great time. Of course the weather up there is near perfect this time of year and his family was wonderful to us. Plus we got to visit Wegmans—our pick for the greatest supermarket chain in the world.

One afternoon I was honored to sit in on a video conference in which Ken presented some of his research on using computer programs to evaluate the play of humans. Joining him via the Internet were two experts in so-called freestyle chess where humans are allowed access to multiple chess programs during a game. One-on-one the programs totally dominate the humans—even on laptops programs such as Stockfish and Komodo have Elo ratings well above 3100 whereas the best humans struggle to reach even 2900—but the human+computer “Centaurs” had better results than the computers alone. In the audience were representatives of defense and industrial systems that involve humans and computers.

Ken got into freestyle chess not as a player but because of his work on chess cheating—see this for example. Freestyle chess says “go ahead and cheat, and let’s see what happens…” The audience was not interested in cheating but rather in how combining humans and computers changes the game. While chess programs are extremely strong players, they may have weaknesses that humans can help avoid. Thus, the whole point of freestyle chess is:

Are humans + computers > computers alone?

That is the central question. Taken out of the chess context it becomes a vital question as computers move more and more into our jobs and our control systems. The chess context attracts interest because it involves extreme performance that can be precisely quantified, and at least until recently, the answer has been a clear "Yes."

At the beginning of the video conference Ken spoke about the history of computer chess, giving his usual clear and precise presentation, and then reviewed his joint paper showing the good results for human-computer teams were no fluke—they really made better moves. Ken used some of the slides from the end of his TEDxBuffalo talk. Then two Freestyle experts spoke, including a winner of three tournaments who doesn’t even have a human chess rating, and an interesting discussion followed on how they actually used the computer chess programs.

I must admit at first I was a skeptic: how could weak players, humans, help strong players, computers? Part of it was that when two or more programs disagreed on the best move the human could make the choice. This kind-of means saying the programs whose move you don’t choose are wrong.

As Ken and I mulled over the idea of freestyle chess we realized it raises some interesting puzzles. I wrote a first draft, then Ken took over adding more detail and tricks to what follows. Let’s take a look at the puzzles now.

Puzzle I

Suppose Alice has one program {X} that is perfect except that there is one position {Q} in which {X} makes an error. To simplify, let's suppose the only outcomes are win ({W}) or loss ({L}). An error means that {Q} is a winning position for the player to move—it could be Bob not Alice—but {X} chooses a move {m_1} leading to a position {R} that is losing for that player.

Let Alice start in a position {P}. She wants to play perfect chess using {X} as guide. Can she do so? The bad position {Q} might be reached in a game played from {P}, indeed might be {P} itself.

Alice of course cannot tell by herself whether a given position has value {W} or {L}, unless the position is right near the end of a game. But she has one advantage that {X} lacks. She can play {X} against itself from any position {R}. If {R} is beyond {Q}—at least if {Q} is not reachable in a correctly-played sequence of moves from {R}—then Alice will get the correct value {V(R)} of {R}.

This is like the power of a human centaur to try a program deeper on the moves it is suggesting. In symbols, Alice executes {X^*(R)}, which generates a game sequence

\displaystyle  G_R = (R,R_1,R_2,\dots,R_n)

of positions where {R_n} is checkmate for one side. The cost of this is {\Theta(n)}. You might think we could let {X} do the same thing, but {X} is not like “Satan” in Smullyan’s story, “Satan, Cantor, and Infinity.” {X} is not trying to out-psych Alice or correct itself; {X} is just given as-is and by-hook-or-by-crook makes that error in some position {Q}.

So Puzzle I is:

Can the “centaur” Alice + X play perfectly, even though neither Alice nor X plays perfectly alone? And at what cost compared to X?

Answer to Puzzle I

The answer is, yes she can. Let the legal moves in {P} be {m_1,\dots,m_\ell} going to positions {Q_1,\dots,Q_{\ell}} with {m_1} the move recommended by {X}. Her algorithm exemplifies a key idea that bridges to the more interesting puzzles.

  • Alice runs {X^*(Q_1),\dots,X^*(Q_{\ell})}.

  • If {X^*(Q_1) = W}, or if all values are {L}, then Alice plays {m_1}.

  • If {X^*(Q_1) = L} and {X^*(Q_i) = W} for some other {i} then Alice plays {m_i}.

We claim this algorithm, whose running time {O(n\ell)} we count as linear in {n}, plays perfectly in any {P}. If all values are {L} but {m_1} is wrong then some other {Q_i} is really a winning position. But then {X} is wrong both at {P} and at some position in the game from {Q_i} which is a contradiction. If {X^*(Q_1) = W} but {m_1} is wrong anyway then {X} is wrong both at {P} and somewhere at {Q_1} or beyond.

Finally if the “switch” {m_i} is wrong then {X} errs somewhere beyond {Q_i} and {X} either errs beyond {Q_1} or {X} erred by choosing {m_1} in {P} after all. (If {X^*(Q_i)} is wrong and all other {Q_j} are losing then {V_P = L} and so {m_i} wasn’t a mistake by Alice since it didn’t matter.)

There is one loophole that needs attention. {X^*(Q_1)} and {X^*(Q_i)} could both be wrong because their games go to a common position {R} in which {X} makes an error. However, that lone error cannot simultaneously flip a true {W} to {L} in {Q_1} and a true {L} to {W} in {Q_i}, because {Q_1} and {Q_i} have the same player (Bob) to move. There is also the possibility that play from {Q_i} could go through {Q_1} (perhaps via {P}) or through some other {Q_j}, which we leave you to analyze. We intend to rule out the latter, and we could also rule out the former by insisting that the "game tree" from positions {P} really be a tree not a DAG.

Boiled down, the idea is that {X^*(Q_1) = L \;\wedge\; X^*(Q_i) = W} where {X} goes to {Q_1} is an “error signature” that Alice can recognize. If {X} errs at {P} then the signature definitely happens because {X} is perfect at each {Q_i} and one of the {Q_i} must be winning. If the signature happens yet {X} did not err at {P} then {P} must be a losing position. Hence once the signature happens then Alice can trust {X} completely. The only way the error could possibly lie in her future is if she was losing at {P} but lucks into a winning position—but then the error happened on Bob’s move not hers.

We claim also that all this logic is unaffected if “draw” is a possible outcome. Indeed, we could play with the old Arabian rules that giving stalemate is a win—counting it 0.8 points say—and that occupying the center with one’s king after leaving the opponent with a bare king is a win—worth say 0.7 points. It all works with “{W}” being the true value of position {P} (aside from a total loss) and “{L}” being any inferior value.
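Alice's three-case rule can be exercised on a toy example. The following is only a sketch, not anything from the puzzle itself: the tiny game tree, the leaf values, and the simulation of the flawed program {X} by consulting true minimax values are all invented for illustration. Values are {W}/{L} for the player to move, and {X^*} outcomes are reported from Alice's side.

```python
# Toy game: Alice moves at P; Bob moves at Q1 and Q2; Alice moves at R1, R2.
TREE = {'P': ['Q1', 'Q2'], 'Q2': ['R1', 'R2']}   # position -> legal successors
LEAF = {'Q1': 'W', 'R1': 'W', 'R2': 'W'}          # value for the player to move
ERROR_AT = 'P'                                    # the one position where X errs

def true_value(pos):
    """Minimax: the mover wins iff some move leads to a position lost for the mover there."""
    if pos in LEAF:
        return LEAF[pos]
    return 'W' if any(true_value(c) == 'L' for c in TREE[pos]) else 'L'

def X_move(pos):
    """Simulate X: perfect everywhere except ERROR_AT, where it throws away a win."""
    moves = TREE[pos]
    good = [c for c in moves if true_value(c) == 'L']   # winning moves
    bad = [c for c in moves if true_value(c) == 'W']    # losing moves
    if pos == ERROR_AT and good and bad:
        return bad[0]                                   # the single error
    return (good or moves)[0]

def X_star(pos, alice_to_move=False):
    """Self-play X against itself from pos; return the outcome from Alice's side."""
    while pos not in LEAF:
        pos = X_move(pos)
        alice_to_move = not alice_to_move
    v = LEAF[pos]
    return v if alice_to_move else ('W' if v == 'L' else 'L')

def alice(pos):
    """Alice's rule: trust m1 unless the signature X*(m1)=L, X*(mi)=W appears."""
    moves = TREE[pos]
    m1 = X_move(pos)
    vals = {c: X_star(c) for c in moves}
    if vals[m1] == 'W' or all(v == 'L' for v in vals.values()):
        return m1
    return next(c for c in moves if vals[c] == 'W')
```

Here {X} errs at {P} itself, recommending the losing move to {Q_1}; Alice sees the signature {X^*(Q_1) = L}, {X^*(Q_2) = W} and switches to the winning move {Q_2}.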

Puzzle II

Now suppose {X} is allowed to make errors in two positions. Can Alice modify her algorithm to still play perfectly?

First suppose the errors are related. Call two positions {Q,R} related if {R} can occur in a game played from {Q} where all intervening moves are not errors. Per above we will always suppose that two positions {Q_i,Q_j} reached in one move from {P} are unrelated (else we would say the two options {m_i,m_j} are not fully distinct). Related errors are ones that occur in related positions.

If Alice knows this about {X}, then we claim she can solve Puzzle II. She plays the game {X^*(P)} through {Q_1} as before, but now she looks for the error signature at all nodes in the game. If she never finds it then she plays {m_1}. If she does, then she lets {R} be the last position at which it occurs. Then she knows that either {X} errs at {R} or {X} errs somewhere beyond {R}. Either way, she can use this knowledge to whittle down the possibilities at {P}. Or at least we think she can.

Notice, however, what has happened to Alice’s complexity. She is now running {X^*} at every node in a length-{n} game path. Her time is now quadratic in {n}. This is still not terrible, not an exponential blowup of backtracking. But in honor of what Alberto Apostolico cared about, we should care about it here. So there is really a second puzzle: can Alice play perfectly in time {O(n)}?

If the errors are unrelated then we would like Alice to carry out the same algorithm as for Puzzle I. The logic is not airtight, however, because of the case where there were unrelated errors at {Q_1} and {Q_i}. Worse, what if Alice doesn’t know whether the errors are related?

Puzzle III: Two Programs

Here comes the “Freestyle” idea of using multiple programs. Let us have two programs, {X} and {Y}. Suppose one of them can make up to {2} errors but the other is perfect. Alice does not know which is which. Now can she play perfectly—in {O(n)} time?

If the errors are related then she can localize to {P} and have the same linear time complexity as before. For simplicity let’s suppose there are just {2} legal moves, i.e., {\ell = 2}. Here is her algorithm:

  • Run both {X^*} and {Y^*} on the positions {Q_1} and {Q_2}, getting two pairs {(V^X(Q_1),V^X(Q_2))} and {(V^Y(Q_1),V^Y(Q_2))} of values.

  • If the values for {Q_1} are both {W} then one of them comes from the perfect program, so Alice plays to {Q_1}. Likewise if they are both {W} for {Q_2} then Alice goes to {Q_2}.

  • If the values are both {L} for {Q_1} then again by perfection the true value of {Q_1} is {L}, so Alice cannot err by going to {Q_2}. If they are both {L} for {Q_2} she goes to {Q_1}.

The remaining case is that one pair is {(W,L)} and the other is {(L,W)}. This cannot happen, because it means that one of the programs is making two unrelated errors.
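The case analysis can be written out as a small decision function. This is only a sketch: the value dictionaries, keyed by the two successor positions and given from Alice's perspective, stand in for the four {X^*} and {Y^*} runs, and the assumptions are as above (one program perfect, the other making at most two related errors).

```python
def alice_two_programs(vx, vy):
    """vx, vy: {'Q1': 'W'/'L', 'Q2': 'W'/'L'} verdicts from X* and Y*,
    reported from Alice's side. Returns the position Alice moves to."""
    # Agreement on a win: one verdict belongs to the perfect program, so it is true.
    for q in ('Q1', 'Q2'):
        if vx[q] == vy[q] == 'W':
            return q
    # Agreement on a loss: q truly loses, so moving to the other cannot be an error.
    for q, other in (('Q1', 'Q2'), ('Q2', 'Q1')):
        if vx[q] == vy[q] == 'L':
            return other
    # Split (W,L)/(L,W) verdicts would mean one program made two unrelated errors.
    raise ValueError("impossible under the related-errors assumption")
```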

This is the idea Dick originally had after the videoconference on Freestyle chess. It shows the advantage of using multiple programs to check on each other like the centaur players do. But what happens if the errors are unrelated? Call that Puzzle III.

Puzzles IV and V

Now let’s allow both programs {X} and {Y} to make up to {k} errors. Can Alice still play perfectly? We venture yes, but we hesitate to make this a formal claim because Puzzles II and III are already proving harder than expected.

How much does having a third program {Z} that is perfect help?—of course not knowing which of the three programs is perfect. If instead {Z} too can make up to {k} errors, how much worse is that? Even if Alice can still play perfectly in polynomial time, we wonder if the exponent of {n} will depend on {k}. Call all of this Puzzle IV.

We can add a further wrinkle that matters even for {k = 1}: we can consider related errors to be just one error. This makes sense in chess terms because an error in a position {R} that is reachable from a position {Q} can affect the search by a program {X} in {Q}. Thus the error at {R} knocks-on and makes the play at all nodes between {R} and the root unreliable. Let {E} be the set of all positions at which {X} makes errors. Then we can define {k(E)} to be the minimum {k} such that there are positions {R_1,\dots,R_k \in E} such that {E} is contained in the union of the positions from which some {R_j} is reachable. This is well-defined even when the positions form a DAG not a tree.
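If reachability is taken to be reflexive, {k(E)} seems to admit a direct computation on a DAG: a "deepest" error (one from which no other error is reachable) can be covered only by choosing itself, and every error reaches some deepest error by following error-to-error reachability, so {k(E)} should equal the number of deepest errors. A small sketch, with the DAG of positions and the error set invented for illustration:

```python
# Hypothetical position DAG (node -> successors) and hypothetical error set.
DAG = {'P': ['A', 'B'], 'A': ['C'], 'B': ['C'], 'C': []}

def reachable(u):
    """All positions reachable from u, including u itself."""
    seen, stack = set(), [u]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(DAG[v])
    return seen

def k_of_E(errors):
    """Count the 'deepest' errors: those from which no *other* error is reachable."""
    return sum(1 for e in errors if not (reachable(e) - {e}) & errors)
```

For the DAG above, errors at {P}, {A}, and {C} all lie on lines through {C}, so they count as one branch error, while errors at {A} and {B} sit on incomparable branches and count as two.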

Thus programs could err in multiple positions but still count as having a single branch error if those positions are all on the same branch. Anything with branch errors counts as Puzzle V. This is where our error model is starting to get realistic, but as we often find in theory, there is a lot of already-challenging ground to cover before we get there. It is time to call it a day—or a post.

Open Problems

Our puzzles have some of Smullyan’s flavor. In a typical logic puzzle of his, Alice would be confronted by {X} and {Y} that have different behaviors in telling the truth to arbitrary questions. The solutions in his case rely on the ability to ask questions like:

If you are a person of type … , then what would you say to the question … ?

Our situations seem different, but perhaps there are further connections between our puzzles and Smullyan’s. What do you think?

Can you solve the puzzles of kinds II or III or higher? If you or we find a clear principle behind them then this will go into a followup post.

Update (7/31/15) The artist Vierrae, Katerina Suvorova of Russia, has graciously contributed two new portraits of Smullyan in oil. I have used her new version of Smullyan in a ‘Magus’ robe at the top. Here is her portrait of him in more formal wear, as if he were a dinner guest at an Oxford High Table.


The originals in higher resolution are viewable on her DeviantArt page. Our great thanks to her.

by Pip at July 28, 2015 05:35 AM UTC

Authors: Gorav Jindal, Pavel Kolev
Download: PDF
Abstract: We propose the first unified framework that yields an improved sparsification algorithm for computing in nearly linear time a spectral sparsifier $D-\widehat{M}_{N}$ for any matrix-polynomial of the form \[ D-\widehat{M}_{N}\approx_{\epsilon}D-D\sum_{i=0}^{N}\gamma_{i}(D^{-1} M)^i \] where $B=D-M\in\mathbb{R}^{n\times n}$ is either Laplacian or SDDM matrix with $m_B$ non-zero entries, and the vector $\gamma$ is induced by a mixture of discrete Binomial distributions (MDBD).

Our sparsification algorithm runs in time $\widetilde{O}(\epsilon^{-2}m_B\log^{3}n\cdot\log^{2}N+\epsilon^{-4}\cdot nN\cdot\log^{4}n\cdot\log^{5}N)$ and exhibits a significant improvement over the result of Cheng et al. [CCLPT15], whose running time is $\widetilde{O}(\epsilon^{-2}\cdot m_B N^2\cdot\mathrm{poly}(\log n))$. Although their algorithm handles general probability distributions, we show that MDBDs approximate a large class of continuous probability distributions. Moreover, we propose a nearly linear time algorithm that exactly recovers the coefficients of any convex combination of $N+1$ distinct Binomial distributions (MDBD), given as input the target probability distribution $\gamma$ that is induced by a MDBD.

In addition, we show that a simple preprocessing step speeds up the run time of Spielman and Peng's [PS14] SDDM Solver from $\widetilde{O}(m_B\log^{3}n\cdot\log^{2}\kappa_B)$ to $\widetilde{O}(m_{B}\log^{2}n + n\log^{4}n\cdot\log^{5}\kappa_B)$, where $\kappa_B$ is the condition number of the input SDDM matrix $B$.

When $B$ is a Laplacian matrix, our result leads to a faster algorithm for analyzing the long term behaviour of random walks induced by transition matrices of the form $\sum_{i=0}^{N}\gamma_{i}(D^{-1} A)^i$. We hope that our sparsification algorithm for SDDM matrices can be useful in the context of Machine Learning.

at July 28, 2015 12:41 AM UTC

Authors: David A. Meyer, Asif Shakeel
Download: PDF
Abstract: We define a Hidden Markov Model (HMM) in which each hidden state has time-dependent $\textit{activity levels}$ that drive transitions and emissions, and show how to estimate its parameters. Our construction is motivated by the problem of inferring human mobility on sub-daily time scales from, for example, mobile phone records.

at July 28, 2015 12:41 AM UTC

Authors: Tim Oosterwijk
Download: PDF
Abstract: Set packing is a fundamental problem that generalises some well-known combinatorial optimization problems and has many applications. It is equivalent to hypergraph matching and is strongly related to the maximum independent set problem. In this thesis we study the k-set packing problem, where given a universe U and a collection C of subsets over U, each of cardinality k, one needs to find the maximum collection of mutually disjoint subsets. Local search techniques have proved to be successful in the search for approximation algorithms, both for the unweighted and the weighted version of the problem, where every subset in C is associated with a weight and the objective is to maximise the sum of the weights. We survey these approaches and give some background and intuition behind them. In particular, we simplify the algebraic proof of the main lemma for the currently best weighted approximation algorithm of Berman ([Ber00]) into a proof that reveals more intuition about what is really happening behind the math. The main result is a new bound of k/3 + 1 + epsilon on the integrality gap for a polynomially sized LP relaxation for k-set packing by Chan and Lau ([CL10]) and the natural SDP relaxation [NOTE: see page iii]. We provide detailed proofs of lemmas needed to prove this new bound and treat some background on related topics like semidefinite programming and the Lovasz Theta function. Finally we have an extended discussion in which we suggest some possibilities for future research. We discuss how the current results on the weighted approximation algorithms and the LP and SDP relaxations might be improved, the strong relation between set packing and the independent set problem, and the difference between the weighted and the unweighted version of the problem.

at July 28, 2015 12:40 AM UTC

Authors: Antares Chen, David G. Harris, Aravind Srinivasan
Download: PDF
Abstract: We consider positive covering integer programs, which generalize set cover and which have attracted a long line of research developing (randomized) approximation algorithms. Srinivasan (2006) gave a rounding algorithm based on the FKG inequality for systems which are "column-sparse." This algorithm may return an integer solution in which the variables get assigned large (integral) values; Kolliopoulos & Young (2005) modified this algorithm to limit the solution size, at the cost of a worse approximation ratio. We develop a new rounding scheme based on the Partial Resampling variant of the Lov\'{a}sz Local Lemma developed by Harris \& Srinivasan (2013). This achieves an approximation ratio of $1 + \frac{\ln (\Delta_1+1)}{a} + O(\sqrt{ \frac{\log (\Delta_1+1)}{a}} )$, where $a$ is the minimum covering constraint and $\Delta_1$ is the maximum $\ell_1$-norm of any column of the covering matrix (whose entries are scaled to lie in $[0,1]$); we also show nearly-matching inapproximability and integrality-gap lower bounds.

Our approach improves asymptotically, in several different ways, over known results. First, it replaces $\Delta_0$, the maximum number of nonzeroes in any column (from the result of Srinivasan) by $\Delta_1$ which is always -- and can be much -- smaller than $\Delta_0$; this is the first such result in this context. Second, our algorithm automatically handles multi-criteria programs; we achieve improved approximation ratios compared to the algorithm of Srinivasan, and give, for the first time when the number of objective functions is large, polynomial-time algorithms with good multi-criteria approximations. We also significantly improve upon the upper-bounds of Kolliopoulos & Young when the integer variables are required to be within $(1 + \epsilon)$ of some given upper-bounds, and show nearly-matching inapproximability.

at July 28, 2015 12:41 AM UTC

Authors: Chien-Chung Huang, Sebastian Ott
Download: PDF
Abstract: Makespan minimization in restricted assignment $(R|p_{ij}\in \{p_j, \infty\}|C_{\max})$ is a classical problem in the field of machine scheduling. In a landmark paper in 1990 [7], Lenstra, Shmoys, and Tardos gave a 2-approximation algorithm and proved that the problem cannot be approximated within 1.5 unless P=NP. The upper and lower bounds of the problem have been essentially unimproved in the intervening 25 years, despite several remarkable successful attempts in some special cases of the problem [1,3,10] recently.

In this paper, we consider a special case called graph-balancing with light hyper edges: jobs with weights in the range of $(\beta W, W]$ can be assigned to at most two machines, where $4/7\leq \beta < 1$, and $W$ is the largest job weight, while there is no restriction on the number of machines a job can be assigned to if its weight is in the range of $(0,\beta W]$. Our main contribution is a $5/3+\beta/3$ approximation algorithm, thus breaking the barrier of 2 in this special case. Unlike the several recent works [1,3,10] that make use of (configuration)-LPs, our algorithm is purely combinatorial.

In addition, we consider another special case where jobs have only two possible weights $\{w, W\}$, and $w < W$. Jobs with weight $W$ can be assigned to only two machines, while there is no restriction on the others. For this special case, we give a 1.5-approximation algorithm, thus matching the general lower bound (indeed the current 1.5 lower bound is established in an even more restricted setting). Interestingly, depending on the specific values of $w$ and $W$, sometimes our algorithm guarantees sub-1.5 approximation ratios.

at July 28, 2015 12:43 AM UTC

Authors: Nader H. Bshouty, Ariel Gabizon
Download: PDF
Abstract: Roughly speaking, an $(n,(r,s))$-Cover Free Family (CFF) is a small set of $n$-bit strings such that: "in any $d:=r+s$ indices we see all patterns of weight $r$". CFFs have been of interest for a long time both in discrete mathematics as part of block design theory, and in theoretical computer science where they have found a variety of applications, for example, in parametrized algorithms where they were introduced in the recent breakthrough work of Fomin, Lokshtanov and Saurabh under the name `lopsided universal sets'.

In this paper we give the first explicit construction of cover-free families of optimal size up to lower order multiplicative terms, {for any $r$ and $s$}. In fact, our construction time is almost linear in the size of the family. Before our work, such a result existed only for $r=d^{o(1)}$ and $r=\omega(d/(\log\log d\log\log\log d))$. As a sample application, we improve the running times of parameterized algorithms from the recent work of Gabizon, Lokshtanov and Pilipczuk.
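To make the quoted definition concrete, here is a brute-force checker for the CFF property. The function name `is_cff` and the bit-string interface are illustrative assumptions, and the exhaustive search is exponential in $n$, so it is useful only on tiny examples — the paper's point is constructing near-optimal families explicitly, not verifying them.

```python
from itertools import combinations

def is_cff(strings, r, s):
    """Brute-force check of the (n, (r, s))-CFF property: for every
    choice of d = r + s indices and every way of marking r of them,
    some string in the family is 1 on the r marked indices and 0 on
    the remaining s. Exponential time -- illustration only."""
    n = len(strings[0])
    d = r + s
    for idx in combinations(range(n), d):
        for ones in combinations(idx, r):
            zeros = set(idx) - set(ones)
            if not any(all(x[i] == '1' for i in ones) and
                       all(x[i] == '0' for i in zeros)
                       for x in strings):
                return False
    return True
```

For instance, the three weight-1 strings of length 3 form a $(3,(1,1))$-CFF, while dropping any one of them breaks the property.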

at July 28, 2015 12:42 AM UTC

Authors: Kenji Kume, Naoko Nose-Togawa
Download: PDF
Abstract: Singular spectrum analysis (SSA) is a nonparametric and adaptive spectral decomposition of a time series. The singular value decomposition of the trajectory matrix, followed by anti-diagonal averaging, leads to a time-series decomposition. The algorithm involves a single free parameter, the window length $K$, which is the FIR filter length for the time series. There is no generally accepted criterion for the proper choice of the window length $K$; moreover, the proper window length depends on the specific problem of interest. Thus, it is important to monitor the spectral structure of the SSA decomposition and its window-length dependence in detail in practical applications. In this paper, based on the filtering interpretation of SSA, it is shown that the power spectrum of the original time series can be decomposed with the filters constructed from the eigenvectors of the lagged-covariance matrix. This provides insight into the spectral structure of the SSA decomposition and helps guide the proper choice of the window length in practical applications of SSA.
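The pipeline the abstract describes — trajectory matrix, SVD, anti-diagonal averaging — can be sketched in a few lines. This is a minimal textbook SSA, not the paper's filter-based spectral analysis; the function name and interface are assumptions.

```python
import numpy as np

def ssa_decompose(x, K):
    """Basic SSA sketch: decompose a 1-D series `x` using window
    length K (the single free parameter the abstract discusses).
    Returns K component series that sum back to `x` exactly."""
    N = len(x)
    L = N - K + 1
    # Trajectory (Hankel) matrix: K lagged copies of the series.
    X = np.column_stack([x[i:i + L] for i in range(K)])  # L x K
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    comps = []
    for k in range(len(sv)):
        Xk = sv[k] * np.outer(U[:, k], Vt[k])  # rank-1 piece of X
        # Anti-diagonal (Hankel) averaging back to a length-N series.
        comp = np.zeros(N)
        count = np.zeros(N)
        for i in range(L):
            for j in range(K):
                comp[i + j] += Xk[i, j]
                count[i + j] += 1
        comps.append(comp / count)
    return comps
```

Because the SVD reconstructs the trajectory matrix exactly and the averaging is linear, the components always sum to the original series; the window-length question the paper addresses is how the signal splits *between* components.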

at July 28, 2015 12:42 AM UTC

Authors: Shahar Dobzinski, Ami Mor
Download: PDF
Abstract: The problem of maximizing a non-negative submodular function was introduced by Feige, Mirrokni, and Vondrak [FOCS'07], who provided a deterministic local-search based algorithm that guarantees an approximation ratio of $\frac 1 3$, as well as a randomized $\frac 2 5$-approximation algorithm. An extensive line of research followed and various algorithms with improved approximation ratios were developed, all of them randomized. Finally, Buchbinder et al. [FOCS'12] presented a randomized $\frac 1 2$-approximation algorithm, which is the best possible.

This paper gives the first deterministic algorithm for maximizing a non-negative submodular function that achieves an approximation ratio better than $\frac 1 3$. The approximation ratio of our algorithm is $\frac 2 5$. Our algorithm is based on recursive composition of solutions obtained by the local search algorithm of Feige et al. We show that the $\frac 2 5$ approximation ratio can be guaranteed when the recursion depth is $2$, and leave open the question of whether the approximation ratio improves as the recursion depth increases.
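The local-search building block of Feige et al. that the paper composes recursively can be sketched as follows. This toy version assumes a set-function oracle `f` and makes exact-improvement moves; it illustrates the $\frac 1 3$ guarantee of a single local search, not the paper's recursive $\frac 2 5$ algorithm.

```python
def local_search_submodular(f, V):
    """Sketch of the Feige-Mirrokni-Vondrak local search for
    unconstrained non-negative submodular maximization.
    `f` maps a frozenset to a non-negative value."""
    V = frozenset(V)
    S = frozenset()
    improved = True
    while improved:
        improved = False
        for v in V:
            if v not in S and f(S | {v}) > f(S):
                S = S | {v}; improved = True   # add improves f
            elif v in S and f(S - {v}) > f(S):
                S = S - {v}; improved = True   # remove improves f
    # At a local optimum, the better of S and its complement
    # is a 1/3-approximation to the optimum.
    return S if f(S) >= f(V - S) else V - S
```

A natural sanity check uses a graph cut function (which is submodular): on any small graph the returned set's cut value is at least a third of the maximum cut.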

at July 28, 2015 12:41 AM UTC

Authors: Yitong Yin, Chihao Zhang
Download: PDF
Abstract: We propose a notion of contraction function for a family of graphs and establish its connection to strong spatial mixing for spin systems. More specifically, we show that for the anti-ferromagnetic Potts model on families of graphs characterized by a specific contraction function, the model exhibits strong spatial mixing, and if, further, the graphs exhibit a certain local sparsity, which is very natural and easily satisfied by typical sparse graphs, then we also have an FPTAS for computing the partition function.

This new characterization of strong spatial mixing of multi-spin systems does not require the maximum degree of the graphs to be bounded; instead it relates the decay of correlation of the model to a notion of effective average degree, measured by the contraction of a function on the family of graphs. It also generalizes other notions of effective average degree that may determine strong spatial mixing, such as the connective constant, whose connection to strong spatial mixing is known only for very simple models and is not extendable to general spin systems.

As direct consequences: (1) we obtain an FPTAS for the partition function of the $q$-state anti-ferromagnetic Potts model with activity $0\le\beta<1$ on graphs of maximum degree bounded by $d$ when $q> 3(1-\beta)d+1$, improving the previous best bound and asymptotically approaching the inapproximability threshold $q=(1-\beta)d$, and (2) we obtain an efficient sampler (in the same sense of fully polynomial-time almost uniform sampler, FPAUS) for the Potts model on the Erd\H{o}s-R\'enyi random graph $\mathcal{G}(n,d/n)$ with sufficiently large constant $d$, provided that $q> 3(1-\beta)d+4$. In particular, when $\beta=0$, the sampler becomes an FPAUS for proper $q$-coloring in $\mathcal{G}(n,d/n)$ with $q> 3d+4$, improving the current best bound $q> 5.5d$ for an FPAUS for $q$-coloring in $\mathcal{G}(n,d/n)$.

at July 28, 2015 12:41 AM UTC

Authors: Anke van Zuylen
Download: PDF
Abstract: We show improved approximation guarantees for the traveling salesman problem on cubic graphs, and cubic bipartite graphs. For cubic bipartite graphs with n nodes, we improve on recent results of Karp and Ravi (2014) by giving a simple "local improvement" algorithm that finds a tour of length at most 5/4 n - 2. For 2-connected cubic graphs, we show that the techniques of Moemke and Svensson (2011) can be combined with the techniques of Correa, Larre and Soto (2012), to obtain a tour of length at most (4/3-1/8754)n.

at July 28, 2015 12:42 AM UTC

Authors: Djamal Belazzougui, Simon J. Puglisi
Download: PDF
Abstract: The Lempel-Ziv parsing of a string (LZ77 for short) is one of the most important and widely-used algorithmic tools in data compression and string processing. We show that the Lempel-Ziv parsing of a string of length $n$ on an alphabet of size $\sigma$ can be computed in $O(n\log\log\sigma)$ time ($O(n)$ time if we allow randomization) using $O(n\log\sigma)$ bits of working space; that is, using space proportional to that of the input string in bits. The previous fastest algorithm using $O(n\log\sigma)$ space takes $O(n(\log\sigma+\log\log n))$ time. We also consider the important rightmost variant of the problem, where the goal is to associate with each phrase of the parsing its most recent occurrence in the input string. We solve this problem in $O(n(1 + \log\sigma/\sqrt{\log n}))$ time, using the same working space as above. The previous best solution for rightmost parsing uses $O(n(1+\log\sigma/\log\log n))$ time and $O(n\log n)$ space. As a bonus, in our solution for rightmost parsing we provide a faster construction method for efficient 2D orthogonal range reporting, which is of independent interest.
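For reference, the parsing itself (ignoring all efficiency concerns) can be computed naively. This quadratic sketch just pins down what a phrase is — the longest prefix of the remaining suffix that occurs earlier, or a single fresh character — whereas the paper's contribution is computing the same parsing in near-linear time and compact space.

```python
def lz77_parse(s):
    """Naive reference LZ77 parsing. Each phrase is either a literal
    character with no earlier occurrence (source -1), or the longest
    prefix of s[i:] starting at some earlier position j (possibly
    overlapping position i, the standard self-referential variant).
    O(n^2) time -- illustration only."""
    phrases = []
    i = 0
    while i < len(s):
        best_len, best_src = 0, -1
        for j in range(i):
            l = 0
            while i + l < len(s) and s[j + l] == s[i + l]:
                l += 1
            if l > best_len:          # '>=' here would give the
                best_len, best_src = l, j  # rightmost source instead
        if best_len == 0:
            phrases.append((s[i], -1))            # literal phrase
            i += 1
        else:
            phrases.append((s[i:i + best_len], best_src))
            i += best_len
    return phrases
```

Concatenating the phrase texts reconstructs the input, and switching the comparison as noted in the comment yields the rightmost variant the abstract discusses.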

at July 28, 2015 12:43 AM UTC

Authors: Ali Alatabbi, Jacqueline W. Daykin, M. Sohel Rahman, W. F. Smyth
Download: PDF
Abstract: $V$-order is a global order on strings related to Unique Maximal Factorization Families (UMFFs), which are themselves generalizations of Lyndon words. $V$-order has recently been proposed as an alternative to lexicographical order in the computation of suffix arrays and in the suffix-sorting induced by the Burrows-Wheeler transform. Efficient $V$-ordering of strings thus becomes a matter of considerable interest. In this paper we present new and surprising results on $V$-order in strings, then go on to explore the algorithmic consequences.

at July 28, 2015 12:40 AM UTC

Ponder the following:

Larry, Moe, and Curly are on Jeopardy.

Going into Final Jeopardy:

Larry has $50,000, Moe has $10,000, Curly has $10,000

Larry bets $29,999, Moe bets $10,000, Curly bets $10,000

These bets are ALL RATIONAL, independent of what the category is. For example, these bets make sense whether the category is THE THREE STOOGES or CIRCUIT LOWER BOUNDS.

Explain why this is.

I'll answer in my next post or in the comments of this one
depending on... not sure what it depends on.

by GASARCH at July 27, 2015 10:22 PM UTC

Authors: Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan
Download: PDF
Abstract: We introduce and analyze stochastic optimization methods where the input to each gradient update is perturbed by bounded noise. We show that this framework forms the basis of a unified approach to analyze asynchronous implementations of stochastic optimization algorithms. In this framework, asynchronous stochastic optimization algorithms can be thought of as serial methods operating on noisy inputs. Using our perturbed iterate framework, we provide new analyses of the Hogwild! algorithm and asynchronous stochastic coordinate descent that are simpler than earlier analyses, remove many assumptions of previous models, and in some cases yield improved upper bounds on the convergence rates. We proceed to apply our framework to develop and analyze KroMagnon: a novel, parallel, sparse stochastic variance-reduced gradient (SVRG) algorithm. We demonstrate experimentally on a 16-core machine that the sparse and parallel version of SVRG is in some cases more than four orders of magnitude faster than the standard SVRG algorithm.
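The perturbed-iterate viewpoint can be illustrated with a toy one-dimensional SGD in which every read of the iterate is corrupted by bounded noise, modeling a stale read under asynchrony. This is a hypothetical sketch of the modeling idea only — not the paper's Hogwild! or KroMagnon analysis — and all names and parameters are assumptions.

```python
import random

def perturbed_sgd(grad, x0, steps, lr, noise):
    """Toy perturbed-iterate SGD: each update uses the gradient
    evaluated at a noise-corrupted copy of the iterate, standing in
    for an asynchronous (stale) read of shared state."""
    x = x0
    for _ in range(steps):
        x_hat = x + random.uniform(-noise, noise)  # noisy read
        x = x - lr * grad(x_hat)                   # serial-style update
    return x
```

With a small noise bound the iterates still converge to a neighborhood of the minimizer whose radius scales with the noise, which is the flavor of guarantee the perturbed-iterate analysis formalizes.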

at July 27, 2015 12:42 AM UTC

Authors: Parinya Chalermsook, Mayank Goswami, Laszlo Kozma, Kurt Mehlhorn, Thatchaphol Saranurak
Download: PDF
Abstract: The dynamic optimality conjecture is perhaps the most fundamental open question about binary search trees (BST). It postulates the existence of an asymptotically optimal online BST, i.e. one that is constant factor competitive with any BST on any input access sequence. The two main candidates for dynamic optimality in the literature are splay trees [Sleator and Tarjan, 1985], and Greedy [Lucas, 1988; Munro, 2000; Demaine et al. 2009] [..]

Dynamic optimality is trivial for almost all sequences: the optimum access cost of most length-n sequences is Theta(n log n), achievable by any balanced BST. Thus, the obvious missing step towards the conjecture is an understanding of the "easy" access sequences. [..] The difficulty of proving dynamic optimality is witnessed by highly restricted special cases that remain unresolved; one prominent example is the traversal conjecture [Sleator and Tarjan, 1985], which states that preorder sequences (whose optimum is linear) are linear-time accessed by splay trees; no online BST is known to satisfy this conjecture.

In this paper, we prove two different relaxations of the traversal conjecture for Greedy: (i) Greedy is almost linear for preorder traversal, (ii) if a linear-time preprocessing is allowed, Greedy is in fact linear. These statements are corollaries of our more general results that express the complexity of access sequences in terms of a pattern avoidance parameter k. [..] To our knowledge, these are the first upper bounds for Greedy that are not known to hold for any other online BST. To obtain these results we identify an input-revealing property of Greedy. Informally, this means that the execution log partially reveals the structure of the access sequence. This property facilitates the use of rich technical tools from forbidden submatrix theory.


at July 27, 2015 12:47 AM UTC

Authors: François Le Gall, Shogo Nakajima
Download: PDF
Abstract: This paper presents a quantum algorithm for triangle finding over sparse graphs that improves over the previous best quantum algorithm for this task by Buhrman et al. [SIAM Journal on Computing, 2005]. Our algorithm is based on the recent $\tilde O(n^{5/4})$-query algorithm given by Le Gall [FOCS 2014] for triangle finding over dense graphs (here $n$ denotes the number of vertices in the graph). We show in particular that triangle finding can be solved with $O(n^{5/4-\epsilon})$ queries for some constant $\epsilon>0$ whenever the graph has at most $O(n^{2-c})$ edges for some constant $c>0$.

at July 27, 2015 12:00 AM UTC