One Thursday afternoon last month, dozens of fires and explosions rocked three towns along the Merrimack River in Massachusetts. By the end of the day 131 buildings were damaged or destroyed, one person was killed, and more than 20 were injured. Suspicion focused immediately on the natural gas system. It looked like a pressure surge in the pipelines had driven gas into homes where stoves, heaters, and other appliances were not equipped to handle the excess pressure. Earlier this week the National Transportation Safety Board released a brief preliminary report supporting that hypothesis.
A house in Lawrence, Mass., burned on September 13, 2018, as a result of a natural gas pressure surge. (Image from the NTSB report.)
I had believed such a catastrophe was all but impossible. The natural gas industry has many troubles, including chronic leaks that release millions of tons of methane into the atmosphere, but I had thought that pressure regulation was a solved problem. Even if someone turned the wrong valve, failsafe mechanisms would protect the public. Evidently not. (I am not an expert on natural gas. While working on my book Infrastructure, I did some research on the industry and the technology, toured a pipeline terminal, and spent a day with a utility crew installing new gas mains in my own neighborhood. The pages of the book that discuss natural gas are online here.)
The hazards of gas service were already well known in the 19th century, when many cities built their first gas distribution systems. Gas in those days was not “natural” gas; it was a product manufactured by roasting coal, or sometimes the tarry residue of petroleum refining, in an atmosphere depleted of oxygen. The result was a mixture of gases, including methane and other hydrocarbons but also a significant amount of carbon monoxide. Because of the CO content, leaks could be deadly even if the gas didn’t catch fire.
Every city needed its own gasworks, because there were no long-distance pipelines. The output of the plant was accumulated in a gasholder, a gigantic tank that confined the gas at low pressure—less than one pound per square inch above atmospheric pressure (a unit of measure known as pounds per square inch gauge, or psig). The gas was gently wafted through pipes laid under the street to reach homes at a pressure of 1/4 or 1/2 psig. Overpressure accidents were unlikely because the entire system worked at the same modest pressure. As a matter of fact, the greater risk was underpressure. If the flow of gas was interrupted even briefly, thousands of pilot lights would go out; then, when the flow resumed, unburned toxic gas would seep into homes. Utility companies worked hard to ensure that would never happen.
Gas technology has evolved a great deal since the gaslight era. Long-distance pipelines carry natural gas across continents at pressures of 1,000 psig or more. At the destination, the gas is stored in underground cavities or as a cryogenic liquid. It enters the distribution network at pressures in the neighborhood of 100 psig. The higher pressures allow smaller diameter pipes to serve larger territories. But the pressure must still be reduced to less than 1 psig before the gas is delivered to the customer. Having multiple pressure levels complicates the distribution system and requires new safeguards against the risk of high-pressure gas going where it doesn’t belong. Apparently those safeguards didn’t work last month in the Merrimack valley.
The gas system in that part of Massachusetts is operated by Columbia Gas, a subsidiary of a company called NiSource, with headquarters in Indiana. At the time of the conflagration, contractors for Columbia were upgrading distribution lines in the city of Lawrence and in two neighboring towns, Andover and North Andover. The two-tier system had older low-pressure mains—including some cast-iron pipes dating back to the early 1900s—fed by a network of newer lines operating at 75 psig. Fourteen regulator stations handled the transfer of gas between systems, maintaining a pressure of 1/2 psig on the low side.
The NTSB preliminary report gives this account of what happened around 4 p.m. on September 13:
The contracted crew was working on a tie-in project of a new plastic distribution main and the abandonment of a cast-iron distribution main. The distribution main that was abandoned still had the regulator sensing lines that were used to detect pressure in the distribution system and provide input to the regulators to control the system pressure. Once the contractor crews disconnected the distribution main that was going to be abandoned, the section containing the sensing lines began losing pressure.
As the pressure in the abandoned distribution main dropped about 0.25 inches of water column (about 0.01 psig), the regulators responded by opening further, increasing pressure in the distribution system. Since the regulators no longer sensed system pressure they fully opened allowing the full flow of high-pressure gas to be released into the distribution system supplying the neighborhood, exceeding the maximum allowable pressure.
When I read those words, I groaned. The cause of the accident was not a leak or an equipment failure or a design flaw or a worker turning the wrong valve. The pressure didn’t just creep up beyond safe limits while no one was paying attention; the pressure was driven up by the automatic control system meant to keep it in bounds. The pressure regulators were “trying” to do the right thing. Sensor readings told them the pressure was falling, and so the controllers took corrective action to keep the gas flowing to customers. But the feedback loop the regulators relied on was not in fact a loop. They were measuring pressure in one pipe and pumping gas into another.
The NTSB’s preliminary report offers no conclusions or recommendations, but it does note that the contractor in Lawrence was following a “work package” prepared by Columbia Gas, which did not mention moving or replacing the pressure sensors. Thus if you’re looking for someone to blame, there’s a hint about where to point your finger. The clue is less useful, however, if you’re hoping to understand the disaster and prevent a recurrence. “Make sure all the parts are connected” is doubtless a good idea, but better still is building a failsafe system that will not burn the city down when somebody goofs.
Suppose you’re taking a shower, and the water feels too warm. You nudge the mixing valve toward cold, but the water gets hotter still. When you twist the valve a little further in the same direction, the temperature rises again, and the room fills with steam. In this situation, you would surely not continue turning the knob until you were scalded. At some point you would get out of the shower, shut off the water, and investigate. Maybe the controls are mislabeled. Maybe the plumber transposed the pipes.
Since you do so well controlling the shower, let’s put you in charge of regulating the municipal gas service. You sit in a small, windowless room, with your eyes on a pressure gauge and your hand on a valve. The gauge has a pointer indicating the measured pressure in the system, and a red dot (called a bug) showing the desired pressure, or set point. If the pointer falls below the bug, you open the valve a little to let in more gas; if the pointer drifts up too high, you close the valve to reduce the flow. (Of course there’s more to it than just open and close. For a given deviation from the set point, how far should you twist the valve handle? Control theory answers this question.)
It’s worth noting that you could do this job without any knowledge of what’s going on outside the windowless room. You needn’t give a thought to the nature of the “plant,” the system under control. What you’re controlling is the position of the needle on the gauge; the whole gas distribution network is just an elaborate mechanism for linking the valve you turn with the gauge you watch. Many automatic control systems operate in exactly this mindless mode. And they work fine—until they don’t.
As a sentient being, you do in fact have a mental model of what’s happening outside. Just as the control law tells you how to respond to changes in the state of the plant, your model of the world tells you how the plant should respond to your control actions. For example, when you open the valve to increase the inflow of gas, you expect the pressure to increase. (Or, in some circumstances, to decrease more slowly. In any event, the sign of the second derivative should be positive.) If that doesn’t happen, the control law would call for making an even stronger correction, opening the valve further and forcing still more gas into the pipeline. But you, in your wisdom, might pause to consider the possible causes of this anomaly. Perhaps pressure is falling because a backhoe just ruptured a gas main. Or, as in Lawrence last month, maybe the pressure isn’t actually falling at all; you’re looking at sensors plugged into the wrong pipes. Opening the valve further could make matters worse.
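To make the failure mode concrete, here is a toy simulation of my own devising (a sketch, not Columbia Gas's actual control law): an integral controller holds pressure at a 0.5 psig set point, with a crude one-line model of the plant. When the sensing line is cut, the controller reads zero pressure, ramps the valve fully open, and the real pressure climbs without bound.

```python
# Toy pressure regulator (a sketch, not any real gas controller).
# The controller integrates the error between the set point and the *sensed*
# pressure. If the sensing line is disconnected, the sensed value drops to
# zero and the controller drives the valve wide open -- the runaway above.

def simulate(steps, sensor_connected):
    pressure = 0.5   # psig, actual pressure in the low-pressure main
    setpoint = 0.5   # psig, desired pressure (the "bug" on the gauge)
    valve = 0.5      # valve position, fraction open in [0, 1]
    for _ in range(steps):
        sensed = pressure if sensor_connected else 0.0
        error = setpoint - sensed
        valve = min(1.0, max(0.0, valve + 0.1 * error))  # integral action
        # crude plant: inflow proportional to valve opening, constant offtake
        pressure = max(0.0, pressure + 0.2 * valve - 0.1)
    return pressure, valve

ok_pressure, ok_valve = simulate(200, sensor_connected=True)
runaway_pressure, runaway_valve = simulate(200, sensor_connected=False)
```

With the sensor connected, the system sits quietly at its set point; with the sensing line cut, the valve saturates at fully open and the pressure never stops rising. All the numbers here are arbitrary illustrative choices.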
Could we build an automatic control system with this kind of situational awareness? Control theory offers many options beyond the simple feedback loop. We might add a supervisory loop that essentially controls the controller and sets the set point. And there is an extensive literature on predictive control, where the controller has a built-in mathematical model of the plant, and uses it to find the best trajectory from the current state to the desired state. But neither of these techniques is commonly used for the kind of last-ditch safety measures that might have saved those homes in the Merrimack Valley. More often, when events get too weird, the controller is designed to give up, bail out, and leave it to the humans. That’s what happened in Lawrence.
Minutes before the fires and explosions occurred, the Columbia Gas monitoring center in Columbus, Ohio [probably a windowless room], received two high-pressure alarms for the South Lawrence gas pressure system: one at 4:04 p.m. and the other at 4:05 p.m. The monitoring center had no control capability to close or open valves; its only capability was to monitor pressures on the distribution system and advise field technicians accordingly. Following company protocol, at 4:06 p.m., the Columbia Gas controller reported the high-pressure event to the Meters and Regulations group in Lawrence. A local resident made the first 9-1-1 call to Lawrence emergency services at 4:11 p.m.
Columbia Gas shut down the regulator at issue by about 4:30 p.m.
I admit to a morbid fascination with stories of technological disaster. I read NTSB accident reports the way some people consume murder mysteries. The narratives belong to the genre of tragedy. In using that word I don’t mean just that the loss of life and property is very sad. These are stories of people with the best intentions and with great skill and courage, who are nonetheless overcome by forces they cannot master. The special pathos of technological tragedies is that the engines of our destruction are machines that we ourselves design and build.
Looking on the sunnier side, I suspect that technological tragedies are more likely than Oedipus Rex or Hamlet to suggest a practical lesson that might guide our future plans. Let me add two more examples that seem to have plot elements in common with the Lawrence gas disaster.
First, the meltdown at the Three Mile Island nuclear power plant in 1979. In that event, a maintenance mishap was detected by the automatic control system, which promptly shut down the reactor, just as it was supposed to do, and started emergency pumps to keep the uranium fuel rods covered with cooling water. But in the following minutes and hours, confusion reigned in the control room. Because of misleading sensor readings, the crowd of operators and engineers believed the water level in the reactor was too high, and they struggled mightily to lower it. Later they realized the reactor had been running dry all along.
Second, the crash of Air France 447, an overnight flight from Rio de Janeiro to Paris, in 2009. In this case the trouble began when ice at high altitude clogged pitot tubes, the sensors that measure airspeed. With inconsistent and implausible speed inputs, the autopilot and flight-management systems disengaged and sounded an alarm, basically telling the pilots “You’re on your own here.” Unfortunately, the pilots also found the instrument data confusing, and formed the erroneous opinion that they needed to pull the nose up and climb steeply. The aircraft entered an aerodynamic stall and fell tail-first into the ocean with the loss of all on board.
In these events no mechanical or physical fault made an accident inevitable. In Lawrence the pipes and valves functioned normally, as far as I can tell from press reports and the NTSB report. Even the sensors were working; they were just in the wrong place. At Three Mile Island there were multiple violations of safety codes and operating protocols; nevertheless, if either the automatic or the human controllers had correctly diagnosed the problem, the reactor would have survived. And the Air France aircraft over the Atlantic was airworthy to the end. It could have flown on to Paris if only there had been the means to level the wings and point it in the right direction.
All of these events feel like unnecessary disasters—if we were just a little smarter, we could have avoided them—but the fires in Lawrence are particularly tormenting in this respect. With an aircraft 35,000 feet over the ocean, you can’t simply press Pause when things don’t go right. Likewise a nuclear reactor has no safe-harbor state; even after you shut down the fission chain reaction, the core of the reactor generates enough heat to destroy itself. But Columbia Gas faced no such constraints in Lawrence. Even if the pressure-regulating system is not quite as simple as I have imagined it, there is always an escape route available when parameters refuse to respond to control inputs. You can just shut it all down. Safeguards built into the automatic control system could do that a lot more quickly than phone calls from Ohio. The service interruption would be costly for the company and inconvenient for the customers, but no one would lose their home or their life.
Control theory and control engineering are now embarking on their greatest adventure ever: the design of self-driving cars and trucks. Next year we may see the first models without a steering wheel or a brake pedal—there goes the option of asking the driver (passenger?) to take over. I am rooting for this bold undertaking to succeed. I am also reminded of a term that turns up frequently in discussions of Athenian tragedy: hubris.
[Guest post by Thomas Orton who presented a lecture on this in our physics and computation seminar –Boaz]
This blog post is a continuation of the CS229R lecture series. Last week, we saw how certain computational problems like 3SAT exhibit a thresholding behavior, similar to a phase transition in a physical system. In this post, we’ll continue to look at this phenomenon by exploring a heuristic method, belief propagation (and the cavity method), which has been used to make hardness conjectures, and which also has thresholding properties. In particular, we’ll start by looking at belief propagation for approximate inference on sparse graphs as a purely computational problem. After doing this, we’ll switch perspectives and see belief propagation motivated in terms of Gibbs free energy minimization for physical systems. With these two perspectives in mind, we’ll then try to use belief propagation to do inference on the stochastic block model. We’ll see some heuristic techniques for determining when BP succeeds and fails at inference, as well as some numerical simulation results of belief propagation for this problem. Lastly, we’ll talk about where this all fits into what is currently known about efficient algorithms and information-theoretic barriers for the stochastic block model.
Suppose someone gives you a probabilistic model $P(x_1, \dots, x_n)$ on $\chi^n$ (think of $\chi$ as a discrete set) which can be decomposed in a special way, say

$P(x) = \frac{1}{Z} \prod_a \psi_a(x_{\partial a})$

where each $\psi_a$ only depends on the variables $x_{\partial a}$. Recall from last week that we can express constraint satisfaction problems in these kinds of models, where each $\psi_a$ is associated with a particular constraint. For example, given a 3SAT formula $\varphi$, we can let $\psi_a(x_{\partial a}) = 1$ if the $a$-th clause of $\varphi$ is satisfied, and 0 otherwise. Then each $\psi_a$ only depends on 3 variables, and $P$ only has support on satisfying assignments of $\varphi$.
A central problem in computer science is trying to find satisfying assignments to constraint satisfaction problems, i.e. finding values in the support of $P$. Suppose that we knew that the marginal probability $P(x_1 = 1)$ were positive. Then we would know that there exists some satisfying assignment where $x_1 = 1$. Using this knowledge, we could recursively try to find $x_i$’s in the support of $P$, and iteratively come up with a satisfying assignment to our constraint satisfaction problem. In fact, we could even sample uniformly from the distribution as follows: randomly assign $x_1$ to $1$ with probability $P(x_1 = 1)$, and assign it to $0$ otherwise. Now iteratively sample $x_2$ from $P$ for the model where $x_1$ is fixed to the value we assigned to it, and repeat until we’ve assigned values to all of the $x_i$. A natural question is therefore the following: when can we efficiently compute the marginals

$P(x_i = v) = \sum_{x : x_i = v} P(x)$

for each $i$?
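As a quick sanity check, the sequential procedure above really does sample from $P$. Here is a sketch on a hypothetical 3-variable model (the clauses are made up, and the marginals are computed by brute-force summation rather than by any clever algorithm):

```python
import itertools
import random
from collections import Counter

# Hypothetical toy CSP over {0,1}^3 with clauses (x0 OR x1), (x1 OR NOT x2).
# P is uniform over the satisfying assignments.
states = list(itertools.product([0, 1], repeat=3))
weight = {x: 1.0 if (x[0] or x[1]) and (x[1] or not x[2]) else 0.0
          for x in states}
Z = sum(weight.values())
P = {x: w / Z for x, w in weight.items()}

def marginal(i, fixed):
    """P(x_i = 1 | assignments in `fixed`), by brute-force summation."""
    num = sum(w for x, w in weight.items()
              if x[i] == 1 and all(x[j] == v for j, v in fixed.items()))
    den = sum(w for x, w in weight.items()
              if all(x[j] == v for j, v in fixed.items()))
    return num / den

def sample(rng):
    # Fix x_0 from its marginal, then x_1 conditioned on x_0, and so on.
    fixed = {}
    for i in range(3):
        fixed[i] = 1 if rng.random() < marginal(i, fixed) else 0
    return tuple(fixed[i] for i in range(3))

rng = random.Random(1)
counts = Counter(sample(rng) for _ in range(20000))
```

The empirical frequencies match $P$ up to sampling noise, which is the point: if we can compute marginals (here by brute force, later by BP), we can both find and sample satisfying assignments.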
A well-known efficient algorithm for this problem, belief propagation (BP), exists when the corresponding graphical model of $P$ (more on this in the next section) is a tree. Even though belief propagation is only guaranteed to work exactly for trees, we might hope that if our factor graph is “tree like”, then BP will still give a useful answer. We might even go further than this, and try to analyze exactly when BP fails for a random constraint satisfaction problem. For example, you can do this for $k$-SAT when $k$ is large, and then learn something about the solution threshold for $k$-SAT. It therefore might be natural to try and study when BP succeeds and fails for different kinds of problems.
We will start by making two simplifying assumptions on our model $P$.
First, we will assume that $P$ can be written in the form

$P(x) = \frac{1}{Z} \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j) \prod_i \psi_i(x_i)$

for some functions $\psi_{ij}, \psi_i$ and some “edge set” $E$ (where the edges are undirected). In other words, we will only consider pairwise constraints. We will see later that this naturally corresponds to a physical interpretation, where each of the $n$ “particles” interact with each other via pairwise forces. Belief propagation actually still works without this assumption (which is why we can use it to analyze $k$-SAT for $k > 2$), but the pairwise case is all we need for the stochastic block model.
For the second assumption, notice that there is a natural correspondence between $P$ and the graphical model $G$ on $n$ vertices, where $(i, j)$ forms an edge in $G$ iff $(i, j) \in E$. In other words, edges in $G$ correspond to factors of the form $\psi_{ij}(x_i, x_j)$ in $P$, and vertices in $G$ correspond to variables $x_i$ in $P$. Our second assumption is that the graphical model $G$ is a tree.
Now, suppose we’re given such a tree which represents our probabilistic model. How do we compute the marginals? Generally speaking, when computer scientists see trees, they begin to get very excited [reference]. “I know! Let’s use recursion!” shouts the student in CS124, their heart rate noticeably rising. Imagine that we arbitrarily rooted our tree at vertex $r$. Perhaps, if we could somehow compute the marginals of the children of $r$, we could somehow stitch them together to compute the marginal $P(x_r)$. In other words, we should think about computing the marginals of roots of subtrees in our graphical model. A quick check shows that the base case is easy: suppose we’re given a graphical model which is a tree consisting of a single node $i$. This corresponds to some PDF $P(x_i) \propto \psi_i(x_i)$. So to compute $P(x_i)$, all we have to do is compute the marginalizing constant $Z = \sum_{x_i} \psi_i(x_i)$, and then we have $P(x_i) = \psi_i(x_i)/Z$. With the base case out of the way, let’s try to solve the induction step: given a graphical model which is a tree rooted at $v$, and where we’re given the marginals of the subtrees rooted at the children of $v$, how do we compute the marginal of the tree rooted at $v$? Take a look at figure 2 to see what this looks like graphically. To formalize the induction step, we’ll define some notation that will also be useful to us later on. The main pieces of notation are $T_{u \to v}$, which is the subtree rooted at $u$ with parent $v$, and the “messages” $\nu_{u \to v}(x_u)$, defined as the marginal of $x_u$ in the model restricted to $T_{u \to v}$, which can be thought of as information which is passed from the child subtrees of $v$ to the vertex $v$ in order to compute the marginals correctly.
Phew! That was a lot of notation. Now that we have that out of the way, let’s see how we can express the marginal of the root of a tree as a function of the marginals of its subtrees. Suppose we’re considering the subtree $T_{v \to w}$, so that vertex $v$ has children $u_1, \dots, u_k$. Then we can compute the marginal $\nu_{v \to w}(x_v)$ directly:

$\nu_{v \to w}(x_v) \propto \psi_v(x_v) \sum_{x_{T_{u_1 \to v}}, \dots, x_{T_{u_k \to v}}} \prod_{t=1}^{k} \psi_{u_t v}(x_{u_t}, x_v)\, P_{u_t}(x_{T_{u_t \to v}}) = \psi_v(x_v) \prod_{t=1}^{k} \sum_{x_{u_t}} \psi_{u_t v}(x_{u_t}, x_v)\, \nu_{u_t \to v}(x_{u_t}),$

where $P_{u_t}$ denotes the distribution of the subtree $T_{u_t \to v}$.
The non-obvious step in the above is that we’re able to switch around summations and products: we’re able to do this because the subtrees $T_{u_t \to v}$ are functions on disjoint sets of variables. So we’re able to express $\nu_{v \to w}$ as a function of the children’s values $\nu_{u_t \to v}$. Looking at the update formula we have derived, we can now see why the $\nu_{u \to v}$ are called “messages” to vertex $v$: they send information about the child subtrees to their parent $v$.
The above discussion is a purely algebraic way of deriving belief propagation. A more intuitive way to get this result is as follows: imagine fixing the value of $x_v$ in the subtree $T_{v \to w}$, and then drawing from each of the marginals of the children of $v$ conditioned on the value $x_v$. We can consider the marginals of each of the children independently, because the children are independent of each other when conditioned on the value of $x_v$. Converting words to equations, this means that if $v$ has children $u_1, \dots, u_k$, then the marginal probability of $x_v$ in the subtree $T_{v \to w}$ is proportional to $\psi_v(x_v)$ times, for each child $u_t$, the total weight that the subtree $T_{u_t \to v}$ assigns to configurations compatible with $x_v$. We can then write

$\nu_{v \to w}(x_v) \propto \psi_v(x_v) \prod_{t=1}^{k} \sum_{x_{u_t}} \psi_{u_t v}(x_{u_t}, x_v)\, \nu_{u_t \to v}(x_{u_t}).$
And we get back what we had before. We’ll call this last equation our “update” or “message passing” equation. The key assumption we used was that if we condition on $x_v$, then the children of $v$ are independent. It’s useful to keep this assumption in mind when thinking about how BP behaves on more general graphs.
A similar calculation yields that we can calculate the marginal $P(x_v)$ of our original probability distribution as the marginal of the subtree with no parent, i.e.

$P(x_v) \propto \psi_v(x_v) \prod_{u \in \partial v} \sum_{x_u} \psi_{uv}(x_u, x_v)\, \nu_{u \to v}(x_u),$

where $\partial v$ denotes the neighbours of $v$.
Great! So now we have an algorithm for computing marginals: recursively compute $\nu_{u \to v}$ for each directed edge $(u, v)$ in a dynamic-programming fashion with the “message passing” equations we have just derived. Then, compute $P(x_v)$ for each $v$. If the diameter of our tree is $d$, then the recursion depth of our algorithm is at most $d$.
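Here is a minimal sketch of the recursion on a toy tree. The particular $\psi_{ij}$ tables are made up for illustration, and node factors $\psi_i$ are taken to be uniform and folded away; the BP marginals are checked against brute-force enumeration:

```python
import itertools
from collections import defaultdict

# Toy tree: edges 0-1, 0-2, 1-3; each variable takes values in {0, 1}.
# The psi tables below are arbitrary illustrative choices.
q = 2
edges = [(0, 1), (0, 2), (1, 3)]
psi = {e: [[1.0, 0.5], [0.5, 2.0]] for e in edges}

nbrs = defaultdict(list)
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

def pair(i, j, xi, xj):
    # Look up the undirected pairwise factor in either orientation.
    return psi[(i, j)][xi][xj] if (i, j) in psi else psi[(j, i)][xj][xi]

def message(u, v):
    """nu_{u->v}(x_u): normalized marginal of x_u in the subtree rooted at u
    with parent v, computed recursively from u's children."""
    m = []
    for xu in range(q):
        val = 1.0
        for k in nbrs[u]:
            if k == v:
                continue
            mk = message(k, u)
            val *= sum(pair(k, u, xk, xu) * mk[xk] for xk in range(q))
        m.append(val)
    s = sum(m)
    return [x / s for x in m]

def bp_marginal(i):
    # P(x_i), treating i as the root: combine messages from all neighbours.
    b = []
    for xi in range(q):
        val = 1.0
        for k in nbrs[i]:
            mk = message(k, i)
            val *= sum(pair(k, i, xk, xi) for xk in range(q)
                       if False) or sum(pair(k, i, xk, xi) * mk[xk]
                                        for xk in range(q))
        b.append(val)
    s = sum(b)
    return [x / s for x in b]

def brute_marginal(i):
    # Exact marginal by summing over all 2^4 configurations.
    p = [0.0] * q
    for x in itertools.product(range(q), repeat=4):
        w = 1.0
        for (a, b) in edges:
            w *= pair(a, b, x[a], x[b])
        p[x[i]] += w
    s = sum(p)
    return [v / s for v in p]
```

On a tree the two agree exactly (up to floating-point error), which is the content of the correctness argument above.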
However, instead of computing every message $\nu_{u \to v}$ neatly with recursion, we might try something else: let’s instead randomly initialize each message with anything we want. Then, let’s update each message in parallel with our update equations. We will keep doing this in successive steps until each $\nu_{u \to v}$ has converged to a fixed value. By looking at belief propagation as a recursive algorithm, it’s easy to see that all of the $\nu_{u \to v}$’s will have their correct values after at most $d$ steps, where $d$ is the diameter of the tree. This is because (after arbitrarily rooting our tree at any vertex) the leaves of our recursion will initialize to the correct value after 1 step. After two steps, the parents of the leaves will be updated correctly as functions of the leaves, and so they will have the correct values as well. Specifically:
Proposition: Suppose we initialize the messages arbitrarily, and update them in parallel according to our update equations. If $G$ has diameter $d$, then after $d$ steps each message $\nu_{u \to v}$ converges, and we recover the correct marginals.
Why would anyone want to do things in this way? In particular, by computing everything in parallel in $d$ steps instead of recursively, we’re computing a lot of “garbage” updates which we never use. However, the advantage of doing things in this way is that this procedure is now well defined for general graphs. In particular, suppose $P$ violated assumption (2), so that the corresponding graph $G$ were not a tree. Then we could still try to compute the messages with parallel updates. We are also able to do this in a local “message passing” kind of way, which some people may find physically intuitive. Maybe if we’re lucky, the messages will converge after a reasonable number of iterations. Maybe if we’re even luckier, they will converge to something which gives us information about the marginals $P(x_i)$. In fact, we’ll see that just such a thing happens in the stochastic block model. More on that later. For now, let’s shift gears and look at belief propagation from a physics perspective.
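The parallel (“flooding”) schedule is easy to sketch. As before the $\psi$ tables are made-up illustrative values and node factors are uniform; note that the update loop never asks whether the graph is a tree, which is exactly why it generalizes:

```python
import itertools
from collections import defaultdict

def parallel_bp(edges, psi, q, rounds):
    """Flooding BP: start all directed messages uniform (an arbitrary
    choice) and update every message simultaneously each round. On a tree
    of diameter d the messages are exact after d rounds; the same loop is
    still well defined on a loopy graph."""
    nbrs = defaultdict(list)
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def pair(i, j, xi, xj):
        return psi[(i, j)][xi][xj] if (i, j) in psi else psi[(j, i)][xj][xi]

    nu = {(u, v): [1.0 / q] * q for u in nbrs for v in nbrs[u]}
    for _ in range(rounds):
        new = {}
        for (u, v) in nu:
            m = []
            for xu in range(q):
                val = 1.0
                for k in nbrs[u]:
                    if k != v:
                        val *= sum(pair(k, u, xk, xu) * nu[(k, u)][xk]
                                   for xk in range(q))
                m.append(val)
            s = sum(m)
            new[(u, v)] = [x / s for x in m]
        nu = new

    marginals = {}
    for v in nbrs:
        b = []
        for xv in range(q):
            val = 1.0
            for k in nbrs[v]:
                val *= sum(pair(k, v, xk, xv) * nu[(k, v)][xk]
                           for xk in range(q))
            b.append(val)
        s = sum(b)
        marginals[v] = [x / s for x in b]
    return marginals

# Path 0-1-2-3: a tree of diameter 3, so 3 rounds suffice.
edges = [(0, 1), (1, 2), (2, 3)]
psi = {e: [[2.0, 1.0], [1.0, 3.0]] for e in edges}
marg = parallel_bp(edges, psi, 2, rounds=3)

def brute_marginal(i):
    p = [0.0, 0.0]
    for x in itertools.product(range(2), repeat=4):
        w = 1.0
        for (a, b) in edges:
            w *= psi[(a, b)][x[a]][x[b]]
        p[x[i]] += w
    s = sum(p)
    return [v / s for v in p]
```

Running the same `parallel_bp` on a graph with cycles would simply iterate and hope for convergence; nothing in the code changes.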
We’ve just seen a statistical/algorithmic view of how to compute marginals in a graphical model. It turns out that there’s also a physical way to think about this, which leads to a qualitatively similar algorithm. Recall from last week that another interpretation of a pairwise-factorable PDF is that of particles interacting with each other via pairwise forces. In particular, we can imagine each particle $i$ interacting with each neighbouring particle $j$ via a force of strength

$J_{ij}(x_i, x_j)$

and in addition, interacting with an external field

$h_i(x_i).$

We imagine that each of our particles takes values from a discrete set $\{1, \dots, q\}$. When $q = 2$, we recover the Ising model, and in general we have a Potts model. The energy function of this system is then

$E(x) = -\sum_{(i,j) \in E} J_{ij}(x_i, x_j) - \sum_i h_i(x_i)$

with probability distribution given by

$P(x) = \frac{e^{-E(x)/T}}{Z},$

where $T$ is the temperature and $Z$ is the normalizing constant (the partition function).
Now, for $q = 2$ (with spins $x_i \in \{-1, +1\}$), computing the marginals corresponds to the equivalent physical problem of computing the “magnetizations” $m_i = \langle x_i \rangle = \sum_x x_i P(x)$.
How does this setup relate to the previous section, where we thought about constraint satisfaction problems and probability distributions? If we could set $J_{ij}(x_i, x_j) = T \log \psi_{ij}(x_i, x_j)$ and $h_i(x_i) = T \log \psi_i(x_i)$, we would recover exactly the probability distribution from the previous section. From a constraint satisfaction perspective, if we set $J_{ij}(x_i, x_j) = 0$ if constraint $(i, j)$ is satisfied and $J_{ij}(x_i, x_j) = -1$ otherwise, then as $T \to 0$ (our system becomes colder), $P$’s probability mass becomes concentrated only on the satisfying assignments of the constraint satisfaction problem.
We’re now going to try a different approach to computing the marginals: let’s define a distribution $b(x)$, which we will hope to be a good approximation to $P(x)$. If you like, you can think about the marginal $b_i(x_i)$ as being the “belief” about the state of variable $i$. We can measure the “distance” between $b$ and $P$ by the KL divergence

$D(b \,\|\, P) = \sum_x b(x) \log \frac{b(x)}{P(x)},$

which equals 0 iff the two distributions are equal. Let’s define the Gibbs free energy as

$G(b) = U(b) - T\,S(b), \qquad U(b) = \sum_x b(x) E(x), \qquad S(b) = -\sum_x b(x) \log b(x).$

Since $D(b \,\|\, P) = (G(b) + T \log Z)/T$, the minimum value of $G(b)$ is $F = -T \log Z$, which is called the Helmholtz free energy; $U(b)$ is the “average energy” and $S(b)$ is the “entropy”.
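These identities are easy to check numerically on a toy two-spin Ising system (the coupling $J$ and temperature $T$ below are arbitrary choices): the Boltzmann distribution attains $G(b) = F$, and any other belief pays a penalty of $T \cdot D(b \,\|\, P)$.

```python
import itertools
import math

# Two spins with energy E(x) = -J * x1 * x2 at temperature T.
J, T = 1.0, 1.0
states = list(itertools.product([-1, 1], repeat=2))
energy = {s: -J * s[0] * s[1] for s in states}
Z = sum(math.exp(-energy[s] / T) for s in states)
p = {s: math.exp(-energy[s] / T) / Z for s in states}  # Boltzmann dist.

def gibbs(b):
    """Gibbs free energy G(b) = U(b) - T * S(b)."""
    U = sum(b[s] * energy[s] for s in states)          # average energy
    S = -sum(b[s] * math.log(b[s]) for s in states)    # entropy
    return U - T * S

F = -T * math.log(Z)            # Helmholtz free energy, the minimum of G
uniform = {s: 0.25 for s in states}
```

Evaluating `gibbs(p)` recovers `F` exactly, while `gibbs(uniform)` exceeds it by $T \cdot D(\text{uniform} \,\|\, p) > 0$.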
Now for the “free energy minimization” part. We want to choose $b$ to minimize $G(b)$, so that we can have that $b$ is a good approximation of $P$. If this happens, then maybe we can hope to “read out” the marginals of $b$ directly. How do we do this in a way which makes it easy to “read out” the marginals? Here’s one idea: let’s try to write $G(b)$ as a function of only the one-node and two-node marginals $b_i(x_i)$ and $b_{ij}(x_i, x_j)$ of $b$. If we could do this, then maybe we could try to minimize $G$ by only optimizing over values for the “variables” $b_i$ and $b_{ij}$. However, we need to remember that $b_i$ and $b_{ij}$ are actually meant to represent marginals for some real probability distribution $b$. So at the very least, we should add the consistency constraints $\sum_{x_j} b_{ij}(x_i, x_j) = b_i(x_i)$ and $\sum_{x_i} b_i(x_i) = 1$ to our optimization problem. We can then think of $b_i$ and $b_{ij}$ as “pseudo-marginals” which obey degree-2 Sherali–Adams constraints.
Recall that we’ve written $G(b) = U(b) - T\,S(b)$ as a sum of both the average energy $U(b)$ and the entropy $S(b)$. It turns out that we can actually write $U(b)$ as only a function of the pairwise marginals of $b$:

$U(b) = -\sum_{(i,j) \in E} \sum_{x_i, x_j} b_{ij}(x_i, x_j)\, J_{ij}(x_i, x_j) - \sum_i \sum_{x_i} b_i(x_i)\, h_i(x_i),$

which follows just because the sums over the full configuration $x$ marginalize out the variables which don’t form part of the pairwise interactions.
This is good news: since $E$ only depends on pairwise interactions, the average energy component of $G(b)$ only depends on $b_i$ and $b_{ij}$. However, it is not so clear how to express the entropy $S(b)$ as a function of one-node and two-node beliefs. But maybe we can try to pretend that our model is really a “tree”. In this case, the following is true:
Claim: If our model is a tree, and $b_i$ and $b_{ij}$ are the associated marginals of our probabilistic model $P$, then we have

$S(b) = -\sum_{(i,j) \in E} \sum_{x_i, x_j} b_{ij}(x_i, x_j) \log b_{ij}(x_i, x_j) + \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i),$

where $d_i$ is the degree of vertex $i$ in the tree.
It’s not too difficult to see why this is the case: imagine a tree rooted at $r$, with children $u_1, \dots, u_k$. We can think of sampling from this tree as first sampling $x_r$ via its marginal $b_r$, and then by recursively sampling the children conditioned on $x_r$. Associate $P_{u_t}$ with the subtrees of the children of $r$, i.e. $P_{u_t}$ is equal to the probability of the occurrence on the probabilistic model of the tree rooted at vertex $u_t$. Then we have

$P(x) = b_r(x_r) \prod_{t=1}^{k} \frac{b_{r u_t}(x_r, x_{u_t})}{b_r(x_r)\, b_{u_t}(x_{u_t})}\, P_{u_t}(x_{T_{u_t}})$

and, taking $-\sum_x P(x) \log(\cdot)$ of both sides,

$S(P) = -\sum_{x_r} b_r \log b_r - \sum_{t=1}^{k} \Big( \sum_{x_r, x_{u_t}} b_{r u_t} \log b_{r u_t} - \sum_{x_r} b_r \log b_r - \sum_{x_{u_t}} b_{u_t} \log b_{u_t} \Big) + \sum_{t=1}^{k} S(P_{u_t}),$

where the last term expands inductively, since each $P_{u_t}$ only sees edges of $T_{u_t}$, and the telescoping sum gives the claim.
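The claim is also easy to verify numerically on a small tree (the path and $\psi$ tables below are arbitrary choices): the Bethe combination of one- and two-node entropies reproduces the exact entropy of the joint distribution.

```python
import itertools
import math
from collections import defaultdict

# A path 0-1-2: a tree in which vertex 1 has degree 2. The psi tables are
# arbitrary illustrative values.
edges = [(0, 1), (1, 2)]
psi = {(0, 1): [[2.0, 1.0], [1.0, 3.0]], (1, 2): [[1.0, 2.0], [2.0, 1.0]]}

# Build the exact joint distribution by enumeration.
weights = {}
for x in itertools.product(range(2), repeat=3):
    w = 1.0
    for (i, j) in edges:
        w *= psi[(i, j)][x[i]][x[j]]
    weights[x] = w
Z = sum(weights.values())
p = {x: w / Z for x, w in weights.items()}

def H(dist):
    """Shannon entropy of a distribution given as a dict of probabilities."""
    return -sum(v * math.log(v) for v in dist.values() if v > 0)

def marginal(vars_):
    m = defaultdict(float)
    for x, pr in p.items():
        m[tuple(x[v] for v in vars_)] += pr
    return m

deg = defaultdict(int)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

exact_entropy = H(p)
# Bethe entropy: sum of edge entropies minus (d_i - 1) * vertex entropies.
bethe = sum(H(marginal([i, j])) for (i, j) in edges) \
      - sum((deg[i] - 1) * H(marginal([i])) for i in deg)
```

On a tree the two quantities agree exactly; on a loopy graph the same formula would only be an approximation.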
If we make the assumption that our model is a tree, then we can write the Bethe approximation entropy as

$S_{\text{Bethe}}(b) = -\sum_{(i,j) \in E} \sum_{x_i, x_j} b_{ij}(x_i, x_j) \log b_{ij}(x_i, x_j) + \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i),$

where the $d_i$’s are the degrees of the variables in the graphical model defined by $P$. We then define the Bethe free energy as $G_{\text{Bethe}}(b) = U(b) - T\,S_{\text{Bethe}}(b)$. The Bethe free energy is in general not an upper bound on the true free energy. Note that if we make the assignments $b_i = P_i$ and $b_{ij} = P_{ij}$ (the true marginals), then we can rewrite the entropy of any tree model as

$S(P) = -\sum_{(i,j) \in E} \sum_{x_i, x_j} P_{ij} \log P_{ij} + \sum_i (d_i - 1) \sum_{x_i} P_i \log P_i,$

which is similar in form to the Bethe approximation entropy. In general, we can optimize

$G_{\text{Bethe}}(b) = U(b) - T\,S_{\text{Bethe}}(b)$

over pseudo-marginals on any graph, and when the associated graph is a tree this is exactly the Gibbs free energy. Since BP gives the correct marginals on trees, we can say that the BP beliefs are the global minima of the Bethe free energy. However, the following is also true:
Proposition: A set of beliefs $\{b_i, b_{ij}\}$ gives a BP fixed point in any graph (not necessarily a tree) iff they correspond to stationary points of the Bethe free energy.
(For a proof, see e.g. page 20 of [4])
So trying to minimize the Bethe free energy is in some sense the same thing as doing belief propagation. Apparently, one typically finds that when belief propagation fails to converge on a graph, the optimization program which is trying to minimize $G_{\text{Bethe}}$ also runs into problems in similar parameter regions, and vice versa.
Now that we’ve seen Belief Propagation from two different perspectives, let’s try to apply this technique of computing marginals to analyzing the behavior of the stochastic block model. This section will heavily follow the paper [2].
The stochastic block model is designed to capture a variety of interesting problems, depending on its settings of parameters. The question we’ll be looking at is the following: suppose we generate a random graph on $N$ vertices, where each vertex of the graph comes from one of $q$ groups, group $a$ being chosen with probability $n_a$. We add an edge between vertices in groups $a$ and $b$ respectively with probability $p_{ab}$. For sparse graphs, we define $c_{ab} = N p_{ab}$, where we think of $c_{ab}$ as $O(1)$ as $N \to \infty$. The problem is the following: given such a random graph, can you label the vertices so that, up to permutation, the labels you choose have high correlation with the true hidden labels which were used to generate the graph? Different settings of the parameters $q$, $n_a$, and $c_{ab}$ correspond to different problems of this type.
We’ll concern ourselves with the case where our graph is sparse, and we need to try and come up with an assignment for the vertices such that we have high correlation with the true labeling of vertices. How might we measure how well we solve this task? Ideally, a labeling which is identical to the true labeling (up to permutation) should get a score of 1. Conversely, a labeling which naively guesses that every vertex comes from the largest group should get a score of 0. Here’s one metric which satisfies these properties: if we come up with a labeling $\hat{g}$, and the true labeling is $g$, then we’ll measure our performance by the overlap

$\mathrm{overlap}(\hat{g}, g) = \max_{\pi} \frac{\frac{1}{N} \sum_i \delta_{\pi(\hat{g}_i), g_i} - \max_a n_a}{1 - \max_a n_a},$

where we maximize over all permutations $\pi$ of the group labels. When we choose a labeling which (up to permutation) agrees with the true labeling, then the numerator of the overlap will equal the denominator, and $\mathrm{overlap} = 1$. Likewise, when we trivially guess that every vertex belongs to the largest group, then the numerator of the overlap is $0$ and $\mathrm{overlap} = 0$.
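Here is a small sketch of the model and the overlap score. The function names `sample_sbm` and `overlap` are my own, and the parameters below (two equal groups, affinities $c_{\text{in}} = 5$, $c_{\text{out}} = 1$) are arbitrary illustrative choices:

```python
import random
from collections import Counter
from itertools import permutations

def sample_sbm(N, q, c_in, c_out, rng):
    """Sparse SBM: q uniform groups, within-group edge probability c_in/N,
    between-group c_out/N."""
    labels = [rng.randrange(q) for _ in range(N)]
    edges = []
    for i in range(N):
        for j in range(i + 1, N):
            c = c_in if labels[i] == labels[j] else c_out
            if rng.random() < c / N:
                edges.append((i, j))
    return labels, edges

def overlap(guess, truth, q):
    """The overlap score defined above: max over label permutations of
    (agreement fraction - largest group fraction) / (1 - largest group
    fraction). Equals 1 for a perfect labeling, 0 for the trivial guess."""
    N = len(truth)
    n_max = max(Counter(truth).values()) / N
    best = max(sum(1 for g, t in zip(guess, truth) if pi[g] == t) / N
               for pi in permutations(range(q)))
    return (best - n_max) / (1 - n_max)

rng = random.Random(0)
truth, edges = sample_sbm(200, 2, 5.0, 1.0, rng)
```

The labeling recovered by any inference algorithm can be plugged straight into `overlap` to see how far above the trivial baseline it lands.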
Given the parameters $(q, \{n_a\}, \{c_{ab}\})$ and a set of observed edges $E$, we can write down the probability of a labeling $\{t_i\}$ as

$$P(\{t_i\} \mid G) \propto \prod_i n_{t_i} \prod_{(i,j) \in E} \frac{c_{t_i t_j}}{N} \prod_{(i,j) \notin E} \left(1 - \frac{c_{t_i t_j}}{N}\right).$$
How might we try to infer a labeling with maximum correlation (up to permutation) with the true labeling? It turns out that the answer is to maximize the marginal distribution of each vertex's label, up to a caveat. In particular, we should label vertex $i$ with the value $t_i$ that maximizes the marginal $P(t_i \mid G)$. The caveat comes in when $P$ is invariant under permutations of the labelings, so that each marginal is actually the uniform distribution. For example, this happens in community detection when all the group sizes are equal. In this case, the correct thing to do is still to use the marginals, but only after we have "broken the symmetry" of the problem by randomly fixing certain vertices to have particular labels. There's actually a way belief propagation does this implicitly: recall that we start belief propagation by randomly initializing the messages. This random initialization can be interpreted as "symmetry breaking" of the problem, in a way that we'll see shortly.
We’ve just seen from the previous section that in order to maximize the correlation of the labeling we come up with, we should pick the labelings which maximize the marginals of . So we have some marginals that we want to compute. Let’s proceed by applying BP to this problem in the “sparse” regime where (other algorithms, like approximate message passing, can be used for “dense” graph problems). Suppose we’re given a random graph with edge list . What does does graph associated with our probabilistic model look like? Well, in this case, every variable is actually connected to every other variable because includes a factor for every , so we actually have a complete graph. However, some of the connections between variables are much weaker than others. In full, our BP update equations are
Likewise, the BP estimate of the marginal at vertex $i$ is

$$\psi^{i}_{t_i} = \frac{1}{Z^{i}}\, n_{t_i} \prod_{k \neq i} \sum_{t_k} \psi^{k \to i}_{t_k} \left(\frac{c_{t_i t_k}}{N}\right)^{A_{ik}} \left(1 - \frac{c_{t_i t_k}}{N}\right)^{1 - A_{ik}}.$$
What we want to do is approximate these equations so that we only have to pass messages along the edges $E$, instead of the complete graph. This will make our analysis simpler, and also allow the belief propagation algorithm to run more efficiently. The first observation is the following: suppose we have two nodes $j, k$ such that $(i,j), (i,k) \notin E$. Then $\psi^{i \to j} \approx \psi^{i \to k}$, since the only difference between these two messages is two factors of size $1 - O(1/N)$ which appear in the BP equations. Thus, $i$ sends essentially the same message to all of its non-neighbours in our random graph. In general, we have

$$\psi^{i \to j}_{t_i} \approx \frac{1}{Z^{i \to j}}\, n_{t_i}\, e^{-h_{t_i}} \prod_{k \in \partial i \setminus j} \sum_{t_k} c_{t_i t_k} \psi^{k \to i}_{t_k}.$$
The first approximation comes from dropping the non-edge factors from the product (they get absorbed into the external field below), and is reasonable because we expect the number of neighbours of $i$ to be constant. We've also defined a variable

$$h_{t} = \frac{1}{N} \sum_{k} \sum_{t_k} c_{t\, t_k} \psi^{k}_{t_k},$$
and we’ve used the approximation for small . We think of the term as defining an “auxiliary external field”. We’ll use this approximate BP equation to find solutions for our problem. This has the advantage that the computation time is instead of , so we can deal with large sparse graphs computationally. It also allows us to see how a large dense graphical model with only sparse strong connections still behaves like a sparse tree-like graphical model from the perspective of Belief Propagation. In particular, we might have reason to hope that the BP equations will actually converge and give us good approximations to the marginals.
From now on, we'll only consider factored block models, which in some sense represent a "hard" setting of parameters. These are models which satisfy the condition that each group has the same average degree $c$. In particular, we require

$$\sum_{a} n_a c_{ab} = c \qquad \text{for all } b.$$
An important observation for this setting of parameters is that

$$\psi^{i \to j}_{t_i} = n_{t_i}$$
is always a fixed point of our BP equations, which is known as a factored fixed point (this can be seen by plugging the claimed fixed point into the belief propagation equations we derived). Whenever BP reaches such a fixed point, the marginals carry no information about the hidden labels, we get $Q = 0$, and the algorithm fails. However, we might hope that if we randomly initialize the messages, then BP might converge to some non-trivial fixed point which gives us some information about the original labeling of the vertices.
Now that we have our BP equations, we can run numerical simulations to try and get a feel for when BP works. Let's consider the problem of community detection. In particular, we'll set our parameters with all group sizes being equal ($n_a = 1/q$), with $c_{ab} = c_{\mathrm{in}}$ for $a = b$ and $c_{ab} = c_{\mathrm{out}}$ otherwise, and vary the ratio $c_{\mathrm{out}}/c_{\mathrm{in}}$, and see when BP finds solutions which are correlated "better than guessing" with the original labeling used to generate the graph. When we do this, we get images which look like this:
It should be mentioned that the point at which the dashed red line occurs depends on the parameters of the stochastic block model. We get a few interesting observations from numerical experiments:
How might we analytically try to determine when BP fails for certain settings of $c_{ab}$ and $n_a$? One way we might heuristically try to do this is to calculate the stability of the factored fixed point. If the fixed point is stable, this suggests that BP will converge to a factored point. If however it is unstable, then we might hope that BP converges to something informative. In particular, suppose we run BP, and we converge to a factored fixed point, so we have $\psi^{i \to j}_{t} = n_{t}$ for all our messages. Suppose we now add a small amount of noise to some of the messages (maybe think of this as injecting a small amount of additional information about the true marginals). We (heuristically) claim that if we now continue to run more steps of BP, either the messages will converge back to the factored fixed point, or they will diverge to something else, and which of the two happens depends on the largest eigenvalue of a certain matrix of partial derivatives.
Following this idea, here's a heuristic way of calculating the stability of the factored fixed point. Let's pretend that our BP equations occur on a tree, which is a reasonable approximation in the sparse graph case. Let our tree be rooted at node $k_0$ and have depth $d$. Let's try to approximately calculate the influence on $k_0$ of perturbing a leaf from its factored fixed point. In particular, let the path from the leaf to the root be $k_d, k_{d-1}, \ldots, k_1, k_0$. We're going to apply a perturbation $\epsilon^{k_d}_{t}$ to the leaf message for each group label $t$. In vector notation, this looks like $\psi^{k_d} = n + \epsilon^{k_d}$, where $\epsilon^{k_d}$ is a column vector. The next thing we'll do is define the matrix of partial derivatives

$$T_{ab} = \left.\frac{\partial \psi^{k_{i} \to k_{i-1}}_{a}}{\partial \psi^{k_{i+1} \to k_{i}}_{b}}\right|_{\psi = n},$$

evaluated at the factored fixed point.
Up to first order (and ignoring normalizing constants), the perturbation effect on $k_0$ is then (by chain rule) $\epsilon^{k_0} \approx T^{d} \epsilon^{k_d}$. Since $T$ does not depend on the position along the path, for large $d$ we can write this as $\epsilon^{k_0} \approx \lambda^{d} \epsilon^{k_d}$, where $\lambda$ is the largest eigenvalue of $T$. Now, on a random tree, we have approximately $c^{d}$ leaves. If we assume that the perturbation effect from each leaf is independent, and that $\epsilon$ has mean $0$, then the net mean perturbation from all the leaves will be $0$. The variance will be

$$\left\langle \left(\epsilon^{k_0}\right)^2 \right\rangle \approx c^{d} \lambda^{2d} \left\langle \left(\epsilon^{k_d}\right)^2 \right\rangle,$$
if we assume that the cross terms vanish in expectation.
(Aside: You might want to ask: why are we assuming that $\epsilon$ has mean zero, and that (say) the noise at each of the leaves is independent, so that the cross terms vanish? If we want to maximize the variance, then maybe choosing the $\epsilon$'s to be correlated or have non-zero mean would give us a better bound. The problem is that we're neglecting the effects of normalizing constants in this analysis: if we perturbed all the messages in the same direction (e.g. non-zero mean), our normalization conditions would cancel out our perturbations.)
We therefore end up with the stability condition $c\lambda^2 = 1$. When $c\lambda^2 > 1$, a small perturbation will be magnified as we move up the tree, leading to the messages moving away from the factored fixed point after successive iterations of BP (the fixed point is unstable). If $c\lambda^2 < 1$, the effect of a small perturbation will vanish as we move up the tree, and we expect the factored fixed point to be stable. If we restrict our attention to graphs with $c_{ab} = c_{\mathrm{in}}$ for $a = b$ and $c_{ab} = c_{\mathrm{out}}$ otherwise, with all groups of equal size, then the relevant eigenvalue of $T$ is known to be $\lambda = (c_{\mathrm{in}} - c_{\mathrm{out}})/(qc)$. The stability threshold then becomes $|c_{\mathrm{in}} - c_{\mathrm{out}}| = q\sqrt{c}$. This condition is known as the Almeida–Thouless local stability condition for spin glasses, and the Kesten–Stigum bound on reconstruction on trees. It is also observed empirically that BP and MCMC succeed above this threshold, and converge to factored fixed points below it. The eigenvalues of $T$ are related to the belief propagation equations and the non-backtracking matrix. For more details, see [3].
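For equal-sized groups, a standard computation gives the relevant eigenvalue $\lambda = (c_{\mathrm{in}} - c_{\mathrm{out}})/(qc)$, which makes the stability check a two-liner. A small sketch (the function name is my own):

```python
def kesten_stigum_detectable(c_in, c_out, q):
    """True when (c_in, c_out) lies above the Kesten-Stigum threshold
    |c_in - c_out| > q * sqrt(c) for q equal-sized groups, where
    c = (c_in + (q - 1) * c_out) / q is the average degree."""
    c = (c_in + (q - 1) * c_out) / q
    lam = (c_in - c_out) / (q * c)   # relevant eigenvalue of T
    return c * lam ** 2 > 1          # same as (c_in - c_out)^2 > q^2 * c
```

For example, with two groups, $c_{\mathrm{in}} = 10$, $c_{\mathrm{out}} = 1$ lies above the threshold, while $c_{\mathrm{in}} = 6$, $c_{\mathrm{out}} = 4$ lies below it.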
We’ve just seen a threshold for when BP is able to solve the community detection problem. Specifically, when , BP doesn’t do better than chance. It’s natural to ask whether this is because BP is not powerful enough, or whether there really isn’t enough information in the random graph to recover the true labeling. For example, if is very close to , it might be impossible to distinguish between group boundaries up to random fluctuations in the edges.
It turns out that for $q = 2$, there is not enough information below the threshold to find a labeling which is correlated with the true labeling [3]. However, it can be shown information-theoretically [1] that for larger $q$ the threshold at which one can find a correlated labeling falls below the Kesten–Stigum bound. In particular, when $q \geq 5$, there exist exponential-time algorithms which recover a correlated labeling below the Kesten–Stigum threshold. This is interesting, because it suggests an information-computation gap: we observe empirically that heuristic belief propagation seems to perform as well as any other efficient inference algorithm at finding a correlated labeling for the stochastic block model. However, belief propagation fails at a "computational" threshold which sits above the information-theoretic threshold for this problem. We'll talk more about these kinds of information-computation gaps in the coming weeks.
[1] Jess Banks, Cristopher Moore, Joe Neeman, Praneeth Netrapalli,
Information-theoretic thresholds for community detection in sparse
networks. JMLR: Workshop and Conference Proceedings vol 49:1–34, 2016.
Link
[2] Aurelien Decelle, Florent Krzakala, Cristopher Moore, Lenka Zdeborová,
Asymptotic analysis of the stochastic block model for modular networks and its
algorithmic applications. 2013.
Link
[3] Cristopher Moore, The Computer Science and Physics
of Community Detection: Landscapes, Phase Transitions, and Hardness. 2017.
Link
[4] Jonathan Yedidia, William Freeman, Yair Weiss, Understanding Belief Propagation and its Generalizations
Link
[5] Afonso Bandeira, Amelia Perry, Alexander Wein, Notes on computational-to-statistical gaps: predictions using statistical physics. 2018.
Link
[6] Stephan Mertens, Marc Mézard, Riccardo Zecchina, Threshold Values of Random K-SAT from the Cavity Method. 2005.
Link
[7] Andrea Montanari, Federico Ricci-Tersenghi, Guilhem Semerjian, Clusters of solutions and replica symmetry breaking in random k-satisfiability. 2008.
Link
A big thanks to Tselil for all the proofreading and recommendations, and to both Boaz and Tselil for their really detailed post-presentation feedback.
Authors: Maria-Florina Balcan, Yi Li, David P. Woodruff, Hongyang Zhang
Download: PDF
Abstract: We show that for the problem of testing if a matrix $A \in F^{n \times n}$
has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to
have rank at most $d$, there is a non-adaptive query algorithm making
$\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$.
This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and
bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the
algorithm is required to read a submatrix. Our algorithm is the first such
algorithm which does not read a submatrix, and instead reads a carefully
selected non-adaptive pattern of entries in rows and columns of $A$. We
complement our algorithm with a matching query complexity lower bound for
non-adaptive testers over any field. We also give tight bounds of
$\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access
comes in the form of $\langle X_i, A\rangle:=tr(X_i^\top A)$; perhaps
surprisingly these bounds do not depend on $\epsilon$.
We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
Authors: Spyros Angelopoulos, Christoph Dürr, Shendan Jin
Download: PDF
Abstract: In search problems, a mobile searcher seeks to locate a target that hides in
some unknown position of the environment. Such problems are typically
considered to be of an on-line nature, in that the input is unknown to the
searcher, and the performance of a search strategy is usually analyzed by means
of the standard framework of the competitive ratio, which compares the cost
incurred by the searcher to an optimal strategy that knows the location of the
target. However, one can argue that even for simple search problems,
competitive analysis fails to distinguish between strategies which,
intuitively, should have different performance in practice.
Motivated by the above, in this work we introduce and study measures supplementary to competitive analysis in the context of search problems. In particular, we focus on the well-known problem of linear search, informally known as the cow-path problem, for which there is an infinite number of strategies that achieve an optimal competitive ratio equal to 9. We propose a measure that reflects the rate at which the line is being explored by the searcher, and which can be seen as an extension of the bijective ratio over an uncountable set of requests. Using this measure we show that a natural strategy that explores the line aggressively is optimal among all 9-competitive strategies. This provides, in particular, a strict separation from the competitively optimal doubling strategy, which is much more conservative in terms of exploration. We also provide evidence that this aggressiveness is requisite for optimality, by showing that any optimal strategy must mimic the aggressive strategy in its first few explorations.
Authors: Marco Gaboardi, Ryan Rogers, Or Sheffet
Download: PDF
Abstract: This work provides tight upper- and lower-bounds for the problem of mean
estimation under $\epsilon$-differential privacy in the local model, when the
input is composed of $n$ i.i.d. drawn samples from a normal distribution with
variance $\sigma$. Our algorithms result in a $(1-\beta)$-confidence interval
for the underlying distribution's mean $\mu$ of length $\tilde O\left(
\frac{\sigma \sqrt{\log(\frac 1 \beta)}}{\epsilon\sqrt n} \right)$. In
addition, our algorithms leverage binary search using local differential
privacy for quantile estimation, a result which may be of separate interest.
Moreover, we prove a matching lower-bound (up to poly-log factors), showing
that any one-shot (each individual is presented with a single query) local
differentially private algorithm must return an interval of length
$\Omega\left( \frac{\sigma\sqrt{\log(1/\beta)}}{\epsilon\sqrt{n}}\right)$.
Authors: Nicolas Champseix, Esther Galby, Bernard Ries
Download: PDF
Abstract: We show that for any $k \geq 0$, there exists a planar graph which is
$B_{k+1}$-CPG but not $B_k$-CPG. As a consequence, we obtain that $B_k$-CPG is
a strict subclass of $B_{k+1}$-CPG.
Authors: Saeed Akhoondian Amiri, Alexandru Popa, Golnoosh Shahkarami, Hossein Vahidi
Download: PDF
Abstract: The anti-Ramsey numbers are a fundamental notion in graph theory, introduced
in 1978, by Erd\"os, Simonovits and S\'os. For given graphs $G$ and $H$ the
anti-Ramsey number $\textrm{ar}(G,H)$ is defined to be the maximum number $k$
such that there exists an assignment of $k$ colors to the edges of $G$ in which
every copy of $H$ in $G$ has at least two edges with the same color. Precolored
version of the problem is defined in a similar way except that the input graph
is given with some fixed colors on some of the edges.
Usually, combinatorialists study extremal values of anti-Ramsey numbers for various classes of graphs. In this paper we study the complexity of computing the anti-Ramsey number $\textrm{ar}(G,P_k)$, where $P_k$ is a path of length $k$. First we observe the hardness of the problem when $k$ is not fixed, and we study the exact complexity of the precolored version, showing that there is no subexponential algorithm for the problem unless ETH fails, already for $k=3$. We show that computing $\textrm{ar}(G,P_3)$ is hard to approximate to a factor of $n^{-1/2 - \epsilon}$ even in $3$-partite graphs, unless $NP = ZPP$. On the positive side we provide a polynomial time algorithm for trees and we show approximability of the problem on special classes of graphs.
Authors: Zhiyi Huang, Binghui Peng, Zhihao Gavin Tang, Runzhou Tao, Xiaowei Wu, Yuhao Zhang
Download: PDF
Abstract: Huang et al.~(STOC 2018) introduced the fully online matching problem, a
generalization of the classic online bipartite matching problem in that it
allows all vertices to arrive online and considers general graphs. They showed
that the ranking algorithm by Karp et al.~(STOC 1990) is strictly better than
$0.5$-competitive and the problem is strictly harder than the online bipartite
matching problem in that no algorithms can be $(1-1/e)$-competitive.
This paper pins down two tight competitive ratios of classic algorithms for the fully online matching problem. For the fractional version of the problem, we show that a natural instantiation of the water-filling algorithm is $2-\sqrt{2} \approx 0.585$-competitive, together with a matching hardness result. Interestingly, our hardness result applies to arbitrary algorithms in the edge-arrival models of the online matching problem, improving the state-of-the-art $\frac{1}{1+\ln 2} \approx 0.5906$ upper bound. For integral algorithms, we show a tight competitive ratio of $\approx 0.567$ for the ranking algorithm on bipartite graphs, matching a hardness result by Huang et al. (STOC 2018).
Authors: Michael B. Cohen, Yin Tat Lee, Zhao Song
Download: PDF
Abstract: This paper shows how to solve linear programs of the form $\min_{Ax=b,x\geq0}
c^\top x$ with $n$ variables in time
$$O^*((n^{\omega}+n^{2.5-\alpha/2}+n^{2+1/6}) \log(n/\delta))$$ where $\omega$
is the exponent of matrix multiplication, $\alpha$ is the dual exponent of
matrix multiplication, and $\delta$ is the relative accuracy. For the current
value of $\omega\sim2.37$ and $\alpha\sim0.31$, our algorithm takes
$O^*(n^{\omega} \log(n/\delta))$ time. When $\omega = 2$, our algorithm takes
$O^*(n^{2+1/6} \log(n/\delta))$ time.
Our algorithm utilizes several new concepts that we believe may be of independent interest:
$\bullet$ We define a stochastic central path method.
$\bullet$ We show how to maintain a projection matrix $\sqrt{W}A^{\top}(AWA^{\top})^{-1}A\sqrt{W}$ in sub-quadratic time under $\ell_{2}$ multiplicative changes in the diagonal matrix $W$.
Authors: Shi Li, Xiangyu Guo
Download: PDF
Abstract: In this paper, we consider the $k$-center/median/means clustering with
outliers problems (or the $(k, z)$-center/median/means problems) in the
distributed setting. Most previous distributed algorithms have their
communication costs linearly depending on $z$, the number of outliers. Recently
Guha et al. overcame this dependence issue by considering bi-criteria
approximation algorithms that output solutions with $2z$ outliers. For the case
where $z$ is large, the extra $z$ outliers discarded by the algorithms might be
too large, considering that the data gathering process might be costly. In this
paper, we improve the number of outliers to the best possible $(1+\epsilon)z$,
while maintaining the $O(1)$-approximation ratio and independence of
communication cost on $z$. The problems we consider include the $(k, z)$-center
problem, and $(k, z)$-median/means problems in Euclidean metrics.
Implementation of our algorithm for $(k, z)$-center shows that it
outperforms many previous algorithms, both in terms of the communication cost
and quality of the output solution.
Authors: David Eppstein, Bruce Reed
Download: PDF
Abstract: We consider decomposing a 3-connected planar graph $G$ using laminar
separators of size three. We show how to find a maximal set of laminar
3-separators in such a graph in linear time. We also discuss how to find
maximal laminar set of 3-separators from special families. For example we
discuss non-trivial cuts, i.e., cuts which split $G$ into two components of size
at least two. For any vertex $v$, we also show how to find a maximal set of
3-separators disjoint from $v$ which are laminar and satisfy: every vertex in a
separator $X$ has two neighbours not in the unique component of $G-X$
containing $v$. In all cases, we show how to construct a corresponding tree
decomposition of adhesion three. Our new algorithms form an important component
of recent methods for finding disjoint paths in nonplanar graphs.
Authors: Joe Cheriyan, Jack Dippel, Fabrizio Grandoni, Arindam Khan, Vishnu V. Narayan
Download: PDF
Abstract: We present a $\frac74$ approximation algorithm for the matching augmentation
problem (MAP): given a multi-graph with edges of cost either zero or one such
that the edges of cost zero form a matching, find a 2-edge connected spanning
subgraph (2-ECSS) of minimum cost.
We first present a reduction of any given MAP instance to a collection of well-structured MAP instances such that the approximation guarantee is preserved. Then we present a $\frac74$ approximation algorithm for a well-structured MAP instance. The algorithm starts with a min-cost 2-edge cover and then applies ear-augmentation steps. We analyze the cost of the ear-augmentations using an approach similar to the one proposed by Vempala and Vetta for the (unweighted) min-size 2-ECSS problem (`Factor 4/3 approximations for minimum 2-connected subgraphs,' APPROX 2000, LNCS 1913, pp.262-273).
Authors: Sarah J. Berkemer, Christian Höner zu Siederdissen, Peter F. Stadler
Download: PDF
Abstract: Alignments, i.e., position-wise comparisons of two or more strings or ordered
lists are of utmost practical importance in computational biology and a host of
other fields, including historical linguistics and emerging areas of research
in the Digital Humanities. The problem is well-known to be computationally hard
as soon as the number of input strings is not bounded. Due to its practical
importance, a huge number of heuristics have been devised, which have proved
very successful in a wide range of applications. Alignments nevertheless have
received hardly any attention as formal, mathematical structures. Here, we
focus on the compositional aspects of alignments, which underlie most algorithmic approaches to computing alignments. We also show that the concepts
naturally generalize to finite partially ordered sets and partial maps between
them that in some sense preserve the partial orders.
Authors: Jose Blanchet, Arun Jambulapati, Carson Kent, Aaron Sidford
Download: PDF
Abstract: In this work, we provide faster algorithms for approximating the optimal
transport distance, e.g. earth mover's distance, between two discrete
probability distributions $\mu, \nu \in \Delta^n$. Given a cost function $C :
[n] \times [n] \to \mathbb{R}_{\geq 0}$ where $C(i,j)$ quantifies the penalty
of transporting a unit of mass from $i$ to $j$, we show how to compute a
coupling $X$ between $\mu$ and $\nu$ in time $\widetilde{O}\left(n^2 /\epsilon
\right)$ whose expected transportation cost is within an additive $\epsilon$ of
optimal, where we have assumed that the largest entry of $C$ is bounded by a
constant. This improves upon the previously best known running time for this
problem of $\widetilde{O}\left(\min\left\{ n^{9/4}/\epsilon, n^2/\epsilon^2
\right\}\right)$.
We achieve our results by providing reductions from optimal transport to canonical optimization problems for which recent algorithmic efforts have provided nearly-linear time algorithms. Leveraging nearly linear time algorithms for solving packing linear programs and for solving the matrix balancing problem, we obtain two separate proofs of our stated running time.
Moreover, we show that further algorithmic improvements to our result would be surprising in the sense that any improvement would yield an $o(n^{2.5})$ algorithm for \textit{maximum cardinality bipartite matching}, for which currently the only known algorithms for achieving such a result are based on fast-matrix multiplication.
For chess and science: a cautionary tale about decision models
Clarke Chronicler blog source
Marmaduke Wyvill was a British chess master and Member of Parliament in the 1800s. He was runner-up in what is considered the first major international chess tournament, London 1851, but never played in a comparable tournament again. He promoted chess and helped organize and sponsor the great London 1883 chess tournament. Here is a fount of information on the name and the man, including that he once proposed marriage to Florence Nightingale, who became a pioneer of statistics.
Today we use Wyvill’s London 1883 tournament to critique statistical models. Our critique extends to ask, how extensively are models cross-checked?
London is about to take center stage again in chess. The World Championship match between the current world champion, Magnus Carlsen of Norway, and his American challenger Fabiano Caruana will begin there on November 9. This is the first time since 1972 that an American will play for the title. The organizer is WorldChess (previously Agon Ltd.) in partnership with the World Chess Federation (FIDE).
The London 1883 tournament had two innovations. It was the first to use chess clocks. The second was that in the event of a game being drawn, the players had to play another game, twice if needed. Only after three draws would the point be considered halved, and this happened only seven times in 182 meetings. Chess clocks have been used in virtually every competition since, but the second experiment has never been repeated—the closest to it will come next year. Two scientific imports were that the time to decide on a move was regulated and games without critical action were set aside.
Chess has long been considered a (or “the”) “game of science.” It has been the focus of numerous scientific studies. Here we emphasize how it is a copious source of scientific data. Millions of games—every top-level game and nowadays many games at lower levels—have been preserved in databases. Except that the time taken by players to choose a move at each turn is recorded only sporadically, we have easy access to full information about each player’s choices of moves.
What we also have now is authoritative judgment on the true values of those choices via analysis by strong computer chess programs. Those programs, called “engines,” can beat even Carlsen and Caruana with regularity, so we humans have no standing to doubt their judgments. The programs’ values for moves are a robust quality metric, and correlate supremely with the Elo Rating, which provides a robust skill metric.
The move values are the only chess-specific input to my statistical choice model. I have covered it several times before, but not yet in the sense of going “back to square one” to say how it originated—where it fits among decision models.
This year I have overhauled the model’s 28,000+ lines of C++ code. More exactly I have “underhauled” it by chopping out stale features, removing assumptions, and simplifying operations. I widened the equations to accommodate multiple alternative models and fitting methods, besides the ones I’ve deployed to judge allegations of cheating on behalf of FIDE and other chess bodies. The main alternative discussed here is one I did already program and reject nine years ago, but having recently tried multiple other possibilities reinforces the points about models that I am making here. So let’s first see one general form of decision model and how the chess application fits the framework.
The general goal is to project the probabilities of certain decision choices or event outcomes in terms of data about the situations and attributes of the decision makers. An index $i$ can refer to multiple actors and/or multiple situations; we will suppress it when intent is clear. The index $j$ refers to multiple alternatives ($j = 1, \dots, \ell$) in any situation and their probabilities $p_j$. The goal for any $j$ is to infer $p_j$ as a function of the situation data, the attributes, and internal model parameters. The function doing so is "the" model.
The models we consider all incorporate the data into a function that takes the situation and attributes and outputs quantities $u_j$—which we will speak of as single numbers but which could be vectors over a separate index. In many settings, $u_j$ represents the utility of outcome $j$ for the actor or situation, which the actor wants to maximize or at least gain enough of to satisfy needs. Insofar as $u_j$ depends on the actor, it is distinct from a neutral notion of "objective value" $v_j$. Such a distinction was already observed in the early 1700s.
The multinomial logit model, and log-linear models in general, represent the logarithms of the probabilities as linear functions of the other elements. Using the utility function this means setting

$$\log p_j = \beta u_j + \gamma \qquad (1)$$
where we have suppressed $i$, and there could be multiple linear terms $\beta_1 u_{j,1} + \beta_2 u_{j,2} + \cdots$. This makes

$$p_j = e^{\gamma} e^{\beta u_j} \qquad (2)$$
for all $j$. Then $e^{\gamma}$ becomes a normalization constant to ensure that the probabilities sum to $1$, dropping out to give the final equations

$$p_j = \frac{e^{\beta u_j}}{\sum_k e^{\beta u_k}}. \qquad (3)$$
Fitting $\beta$ thus yields all the probabilities. Note that putting a difference of log-probabilities on the left-hand side of (1), which is the log of the ratio of the probabilities, leads to the same model and normalization (up to the sign of $\beta$). The function of normalizing exponentiated quantities is so common it has its own pet name, softmax.
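Numerically, the softmax is a one-liner; subtracting the maximum utility before exponentiating is a standard stability trick that cancels in the ratio. A minimal sketch (names are my own):

```python
import numpy as np

def multinomial_logit(u, beta):
    """Multinomial logit / softmax: p_j = exp(beta*u_j) / sum_k exp(beta*u_k).
    Subtracting max(u) avoids overflow and leaves the probabilities unchanged."""
    z = beta * (np.asarray(u, dtype=float) - np.max(u))
    e = np.exp(z)
    return e / e.sum()
```

At $\beta = 0$ the model is indifferent (uniform probabilities); as $\beta$ grows, probability concentrates on the highest-utility alternative.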
These last three equations were known already in 1883 via the physicists Josiah Gibbs and Ludwig Boltzmann, with $\beta$ coming out in units of inverse temperature, the denominator of (3) representing the partition function of a physical system, and the numerator the Boltzmann factor. It seems curious that apart from some contemporary references by Charles Peirce they were not used in wider contexts until the World War II era. Equation (3) essentially appears as equation (1) in the 2000 Economics Nobel lecture by Daniel McFadden, who calls it "the" multinomial logit model (see also this) and traces it to work by Duncan Luce in 1959. Such pan-scientific heft makes its failure in chess all the more surprising.
In chess tournaments we have multiple players, but only one is involved in deciding each move. So we focus on one player but can treat multiple players as a group. Instead, what we represent with the index $i$ is the player(s) facing multiple positions. To a large extent we can treat those decisions as independent. Even if the player executes a plan over a few moves, the covariance is still sparse, and often players realize they have to revise their plans on the next turn. Thus we replace $i$ by an index $t$ signifying "turn" or "time" for each position faced by the player. Clearly we want to fit by regression over multiple turns $t$, so $\beta$ and any other fitted parameters will not depend on $t$, which again we sometimes suppress.
Each possible move at each turn is given a value by the chess engine(s) used to analyze the games. We order the moves by those values, so the engine's first-listed move has the optimal value $v_1$. In just over 90% of positions there is a unique optimal move. There are four salient ways to define the utility $u_j$ of move $j$ from these values—prefatory to involving model parameters describing the player:
Option (d) automatically scales down differences in positions where one side is significantly ahead. The same small slip that would halve one's chances in a balanced position might only reduce 90% to 85% in a strong position, or be nearly irrelevant in a losing one. I remove the most extremely unbalanced positions from samples anyway. For (c) I use a "non-sliding" scale function whose efficacy I detailed here, but I can easily generate results without it. Note that if I were to cut the sample down only to balanced positions—value near $0$—of which there are a lot, then (a) and (b) become respectively equivalent to (c) and (d) anyway, up to signs which are handled by flipping the sign of $\beta$.
My primary model parameter, called s for "sensitivity," is just a divisor of the values and so gets absorbed into the utility. I have a second main parameter c for "consistency," but more on it later. Having s is enough to fulfill the dictates of the multinomial logit model in the simplest manners.
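For concreteness, here is a minimal Python sketch (an illustration, not my production code) of how the simplest version turns engine values into move probabilities via (1)–(3), assuming the plain unscaled utility of option (c) with only the sensitivity s:

```python
import math

def move_probabilities(values, s):
    """Multinomial-logit (softmax) probabilities for the legal moves.

    values: engine evaluations in pawns, listed best-first (values[0] = v_1).
    s: the 'sensitivity' divisor; larger s flattens the distribution.
    Uses the plain unscaled utility u_i = -(v_1 - v_i)/s.
    """
    u = [-(values[0] - v) / s for v in values]
    m = max(u)                        # subtract max for numerical stability
    w = [math.exp(x - m) for x in u]
    z = sum(w)                        # the 'partition function' denominator
    return [x / z for x in w]
```

Smaller s sharpens the distribution toward the optimal move; for example, `move_probabilities([0.30, 0.10, -0.20], 0.05)` concentrates more mass on the first move than `move_probabilities([0.30, 0.10, -0.20], 0.10)` does.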
Criteria for fitting log-linear models are also a general issue. For linear regressions, least-squares fitting is distinguished by being equivalent to maximum-likelihood estimation (MLE) under Gaussian error, but L1 or other distances can be minimized instead. With log-linear regressions the flex is wider, and MLE competes with criteria that minimize various discrepancies between quantities projected from the fitted probabilities and their actual values in the sample. Here are four of them—we will see more:
The reason for the first three in particular is that they create my three main tests for possible cheating, so I want to fit them on my training data (which now encompasses every rating level from Elo 1025 to Elo 2800 in steps of 25) so that they are unbiased estimators. Besides those and MLE—which here means maximizing the projected likelihood of the moves that were observed to be played (or alternately the likelihood of the observed first-move match/non-match sequence, and various others)—my code allows composing a loss function from myriad components and weighting them ad-lib. Components unused in the fitting become the cross-checks.
Ideally, we’d like all the fitting criteria to produce similar fits—that is, close sets of fitted values for s and the other parameters on the same data. The code also implements other modeling equations besides multinomial logit—and we’d like their results to agree too. But let’s first see how multinomial logit performs.
I analyzed the 168 official played games of London 1883 (one competitor left just after the halfway point), and separately, the 76 rejected draws, using Stockfish 7 to high depth. The former give 10,289 analyzed game turns after applying the extreme-value cutoff and a few others. Using the simple unscaled version (c) of utility and fitting according to move-matching gives these results for the first three fitting criteria:
Test Name        ProjVal   Actual    Proj%    Actual%   z-score
MoveMatch        4871.02   4871.00   47.34%   47.34%    z =  -0.00
EqValueMatch     5228.95   5201.00   50.82%   50.55%    z =  -0.73
ExpectationLoss   259.20    297.34   0.0252   0.0289    z = -10.58
This is actual output from my code, except that to avoid crowding I have elided some columns, including the standard deviations on which the z-scores are based. The z-scores give a uniform way to judge goodness of fit. The first one is exactly zero because that was the criterion expressly fitted. The fitted model generates a projection for the second one that is higher than what actually happened at London 1883, but only slightly: the z-score is within one standard deviation. The third, however, is under-projected by more than 10 standard deviations. In absolute terms it doesn’t look so bad—259 is only 13% smaller than 297—but the large z-score reflects our having a lot of data. Well, there’s large and there’s huge:
Test Name        ProjVal   Actual    Proj%   Actual%   z-score
AvgDifference    1493.46   2780.07   0.145   0.270     z = -60.9638
The projection is only half of what it should be. The z-score is inconceivable.
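For readers wanting to reproduce z-scores of this kind, here is a minimal sketch, under the assumption (my reading of the setup; the real code's variance bookkeeping may differ) that each game turn is an independent Bernoulli trial for the tested event:

```python
import math

def projection_z(per_turn_probs, actual_count):
    """z-score comparing an actual count to its model projection.

    per_turn_probs: the model's probability, at each game turn, that the
    tested event occurs there (e.g., matching the engine's first move).
    Treating turns as independent Bernoulli trials, the projection is the
    sum of the probabilities and the variance is the sum of p*(1 - p).
    """
    proj = sum(per_turn_probs)
    sigma = math.sqrt(sum(p * (1 - p) for p in per_turn_probs))
    return proj, (actual_count - proj) / sigma
```

Note that the sign conventions in the output tables vary between tests; the magnitude is what matters for judging goodness of fit.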
For more cross-checks, there are the projected versus actual frequencies of playing the i-th best move for each i. Here is the table for ranks 1–10:
Rk   ProjVal   Actual    Proj%    Actual%   z-score
 1   4871.02   4871.00   47.34%   47.34%    z = -0.00
 2   1416.72   1729.00   13.80%   16.85%    z = +9.44
 3    761.84    951.00    7.47%    9.32%    z = +7.33
 4    523.25    593.00    5.19%    5.88%    z = +3.20
 5    401.63    410.00    4.03%    4.11%    z = +0.43
 6    325.30    295.00    3.29%    2.99%    z = -1.73
 7    272.12    247.00    2.77%    2.51%    z = -1.57
 8    232.05    197.00    2.37%    2.01%    z = -2.36
 9    200.88    169.00    2.06%    1.73%    z = -2.30
10    175.95    104.00    1.81%    1.07%    z = -5.54
The first row was the one fitted. Then the projections are off by three percentage points for i = 2 and almost two for i = 3. For ranks 5–9 they are tantalizingly close, but by rank 10 they have clearly overshot—as they must for the probabilities to add to 1.
There are yet more cross-checks of even greater importance. They are the frequencies with which players make errors of a given range of magnitude: a small slip, a mistake, a serious misstep, a blunder. Those results are too gruesome to show here. Fitting by MLE helps in some places but throws off the fit entirely elsewhere.
The huge gaps in these and especially in the “AvgDifference” test (AD for short) rule out any patch to the log-linear model with the one parameter s. I have tried adding other linear terms representing features such as a move turning an advantage into a disadvantage (v_1 > 0 but v_i < 0). They give haywire results unless the nonlinearity described next is introduced.
This is to define the utility function using a new parameter c, a power applied to the scaled differences in value. Without scaling, this is just u_i = -((v_1 - v_i)/s)^c; one can also use the scaled differences of option (d) or put the power on the values separately. In forming the ratio it does not matter whether the fitted parameter is represented as the divisor s or as a multiplier 1/s. Using the notation δ_i = v_1 - v_i, so that dividing by s cancels the “pawn units” of both δ_i and s, this means that without loss of generality we can write

u_i = -(δ_i/s)^c.
This makes clear that the quantity being powered is dimensionless. The motivation for c is that in any quantity of the form (δ/s)^c, the marginal influence of c becomes greater for large δ than that of s. Thus c can be said to govern the propensity for making large mistakes while s governs the perception of small differences in value. Higher c and lower s correspond to higher skill. The former connotes the ability to navigate tactical minefields, the latter the strategic skill of amassing small advantages.
Thus I regard c < 1 as natural in chess. In my results, c usually fits with values that rise as the Elo level increases, staying in the rough neighborhood of square-root (c near 0.5) and definitely apart from c = 1. It also changes the calculus on a property called “independence from irrelevant alternatives,” which McFadden cites from Luce but which has issues discussed e.g. here.
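A quick numerical illustration of this division of labor, with assumed values for s, c, and the pawn differences: nudging c moves the utility of a blunder far more than that of a tiny inaccuracy.

```python
def utility(delta, s, c):
    """Utility u = -(delta/s)**c for a move delta pawns below optimal."""
    return -((delta / s) ** c)

# Illustrative (assumed) values: a tiny inaccuracy vs. a blunder.
small, large = 0.05, 2.0        # pawn differences delta
s, c = 0.10, 0.5                # sensitivity and consistency

# The same small change in c barely moves the utility of the small
# inaccuracy but swings the utility of the blunder substantially:
du_small = utility(small, s, c + 0.05) - utility(small, s, c)
du_large = utility(large, s, c + 0.05) - utility(large, s, c)
```

With these numbers the blunder's utility shifts by roughly thirty times as much as the inaccuracy's, which is the sense in which c "governs the propensity for making large mistakes."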
Since c is part of the revised utility function, the model is still log-linear in the utility and the probabilities are still obtained via the procedure (1)–(3). The end-product is that having both s and c allows fitting two criteria exactly to yield them as unbiased estimators. Here are the results of fitting move-match and AD in what is now a “log-radical(c)” model:
Test Name        ProjVal   Actual    Proj%    Actual%   z-score
MoveMatch        4870.99   4871.00   47.34%   47.34%    z =  +0.00
AvgDifference    2780.10   2780.07   0.2702   0.2702    z =  +0.00
EqValueMatch     5261.35   5201.00   51.14%   50.55%    z =  -1.44
ExpectationLoss   413.42    297.35   0.0402   0.0289    z = +15.89
The equal-optimal projection remains OK. The expectation loss, however, flips from an under-projection to a vast over-projection. The cross-checks from the move ranks give further bad news:
Rk   ProjVal   Actual    Proj%    Actual%   z-score
 1   4870.99   4871.00   47.34%   47.34%    z =  +0.00
 2   1123.22   1729.00   10.94%   16.85%    z = +19.88
 3    633.30    951.00    6.21%    9.32%    z = +13.27
 4    459.83    593.00    4.56%    5.88%    z =  +6.44
 5    370.58    410.00    3.72%    4.11%    z =  +2.11
 6    311.98    295.00    3.16%    2.99%    z =  -0.99
 7    270.56    247.00    2.75%    2.51%    z =  -1.46
 8    239.36    197.00    2.44%    2.01%    z =  -2.79
 9    214.30    169.00    2.19%    1.73%    z =  -3.15
10    193.93    104.00    1.99%    1.07%    z =  -6.57
The discrepancy in the second-best move has doubled to six percentage points while the third-best move is off by more than three.
Maximum-likelihood fitting makes the gaps even worse. No re-jiggering of fitting methods nor of the formula for the utility comes anywhere close to coherence. The inconsistency in the second-best move kills everything. The fault must be tied all the way to the log-linear model for the probabilities.
As we have noted, taking the difference of logs, and inverting so that signs stay positive, like so:

ln(1/p_i) - ln(1/p_1) = (δ_i/s)^c,

does not change the model. The likelihoods are still normalized arithmetically as in the Gibbs equations. Taking a difference of double logarithms, however, yields something different:

ln ln(1/p_i) - ln ln(1/p_1) = (δ_i/s)^c.
With the utility still defined as u_i = -(δ_i/s)^c, this creates a triple stack of exponentials on the right-hand side: solving for the probabilities gives p_i = p_1^(e_i) with e_i = exp((δ_i/s)^c). This all looks really unnatural, but see the results it gives, now also showing the interval and large-error tests that were “too gruesome” before:
Test Name        ProjVal   Actual    Proj%    Actual%   z-score
MoveMatch        4871.02   4871.00   47.34%   47.34%    z = -0.00
AvgScaledDiff    1142.61   1142.59   0.111    0.111     z = +0.00
EqValueMatch     5251.90   5201.00   51.04%   50.55%    z = -1.10
ExpectationLoss   333.20    334.46   0.0324   0.0325    z = -0.19

Rk   ProjVal   Sigma    Actual    Proj%    Actual%   z-score
 1   4871.02   47.02   4871.00    47.34%   47.34%    z = -0.00
 2   1786.89   37.32   1729.00    17.41%   16.85%    z = -1.55
 3    929.87   28.60    951.00     9.11%    9.32%    z = +0.74
 4    589.93   23.29    593.00     5.85%    5.88%    z = +0.13
 5    419.35   19.84    410.00     4.21%    4.11%    z = -0.47
 6    315.24   17.32    295.00     3.19%    2.99%    z = -1.17
 7    246.68   15.39    247.00     2.51%    2.51%    z = +0.02
 8    198.71   13.85    197.00     2.03%    2.01%    z = -0.12
 9    161.54   12.52    169.00     1.65%    1.73%    z = +0.60
10    134.18   11.43    104.00     1.38%    1.07%    z = -2.64
11    111.41   10.43     97.00     1.15%    1.00%    z = -1.38
12     93.90    9.59     99.00     0.97%    1.02%    z = +0.53
13     77.94    8.75     76.00     0.81%    0.79%    z = -0.22
14     65.40    8.02     78.00     0.68%    0.82%    z = +1.57
15     55.13    7.37     62.00     0.58%    0.65%    z = +0.93

Selec. Test    ProjVal   Actual   Proj%   Actual%   z-score
Delta01-10      656.08   645.00   6.38%   6.27%     z = -0.56
Delta11-30      800.75   824.00   7.78%   8.01%     z = +1.02
Delta31-70      596.51   607.00   5.80%   5.90%     z = +0.50
Delta71-150     295.70   290.00   2.87%   2.82%     z = -0.38
Error>=050      709.46   675.00   6.90%   6.56%     z = -1.55
Error>=100      331.88   300.00   3.23%   2.92%     z = -2.02
Error>=200      141.56   114.00   1.38%   1.11%     z = -2.62
Error>=400       68.76    35.00   0.67%   0.34%     z = -4.61
Only the first two lines have been fitted. The other lines follow like obedient ducks—and this persists through all tournaments that I have run.
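Here is a minimal sketch of how the double-log model can be normalized to produce probabilities. The bisection on the pivot p_1 is my own illustrative device for making the probabilities sum to 1, not necessarily how the real code proceeds:

```python
import math

def loglog_probabilities(deltas, s, c):
    """Probabilities under the 'log-log' model:
       ln ln(1/p_i) - ln ln(1/p_1) = (delta_i/s)**c,
    equivalently p_i = p_1 ** exp((delta_i/s)**c).
    The pivot p_1 is found by bisection so the probabilities sum to 1.
    deltas: nonnegative value differences from the optimal move (delta_1 = 0).
    """
    exps = [math.exp((d / s) ** c) for d in deltas]   # e_1 = 1
    lo, hi = 0.0, 1.0
    for _ in range(200):                # bisection on p_1: the sum of
        mid = (lo + hi) / 2             # p_1**e_i is increasing in p_1
        if sum(mid ** e for e in exps) > 1.0:
            hi = mid
        else:
            lo = mid
    p1 = (lo + hi) / 2
    return [p1 ** e for e in exps]
```

Since each exponent e_i is at least 1, the worse moves come out as higher powers of the pivot, i.e., smaller probabilities, which is the "powers rather than multiples" behavior discussed below.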
There are some wobbles that also persist: the second-best move is somewhat over-projected and the third-best slightly under—but the remaining indices are off by small amounts whose signs seem random. So are the interval tests at the end, except that large errors are over-projected. The match to moves of equal-optimal worth tends to be over-projected regardless of the patch described here. Nevertheless, the overall fidelity under so much cross-validation is an amazing change from the log-linear cases.
The most particular issue I see grants that the original log-linear formulation could be fine for a one-shot purpose, say if the match-to-optimal-move cheating test were the only thing cared about. The concern is that in the absence of validation beyond what is needed for that, “mission creep” could extend the usage unknowingly into flawed territory. It is important to me that a model should score well on a larger slate of pertinent phenomena. Do other models have as rich a field of data and cross-checks as in chess?
Is there extensive literature on modeling the double logarithms of probabilities—and on representing probabilities as powers rather than multiples of the “pivot” p_1? We have seen scant references. The term “log-log model” instead refers to having a logarithm on both sides, e.g., ln y = a + b·ln x. Alternatives to log-linear models need to be more conscious of error terms in the utility functions, so perhaps uncertainty needs a more express representation in my formulas.
The form p_i = p_1^(e_i), where e_i = exp((δ_i/s)^c), does have the general issue that when p_1 should be very close to 1—as for a completely obvious move in chess—there is strain on getting the exponents large enough to make the other p_i tiny. The over-projection of large errors (the projected probabilities of bad moves coming out too high) is a symptom of this. Some of my past posts give my thinking on this, but the implementations have been hard to control, so I would be grateful to hear reader thoughts.
Authors: Georgia Avarikioti, Yuyi Wang, Roger Wattenhofer
Download: PDF
Abstract: Payment networks, also known as channels, are among the most promising solutions to
the throughput problem of cryptocurrencies. In this paper we study the design
of capital-efficient payment networks, offline as well as online variants. We
want to know how to compute an efficient payment network topology, how capital
should be assigned to the individual edges, and how to decide which
transactions to accept. Towards this end, we present a flurry of interesting
results, basic but generally applicable insights on the one hand, and hardness
results and approximation algorithms on the other hand.
Authors: Georgia Avarikioti, Gerrit Janssen, Yuyi Wang, Roger Wattenhofer
Download: PDF
Abstract: Payment channels are the most prominent solution to the blockchain
scalability problem. We introduce the problem of network design with fees for
payment channels from the perspective of a Payment Service Provider (PSP).
Given a set of transactions, we examine the optimal graph structure and fee
assignment to maximize the PSP's profit. A customer prefers to route
transactions through the PSP's network if the cheapest path from sender to
receiver is financially interesting, i.e., if the path costs less than the
blockchain fee. When the graph structure is a tree, and the PSP facilitates all
transactions, the problem can be formulated as a linear program. For a path
graph, we present a polynomial time algorithm to assign optimal fees. We also
show that the star network, where the center is an additional node acting as an
intermediary, is a near-optimal solution to the network design problem.
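The abstract's routing rule, that a customer uses the PSP's network exactly when the cheapest channel path costs less than the on-chain fee, can be sketched as a shortest-path check. The graph encoding and names here are assumptions for illustration, not the paper's formalism:

```python
import heapq

def accepts_via_psp(fees, sender, receiver, blockchain_fee):
    """Does the customer route through the PSP's channel network?

    fees: dict mapping node -> list of (neighbor, fee) channel edges.
    The customer takes the cheapest PSP path iff its total fee is
    strictly below the on-chain (blockchain) fee.
    """
    dist = {sender: 0}
    heap = [(0, sender)]
    while heap:                          # Dijkstra over channel fees
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue
        if u == receiver:
            return d < blockchain_fee
        for v, f in fees.get(u, []):
            nd = d + f
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return False                         # no PSP path at all
```

In the star topology the abstract highlights, every sender-receiver path runs through the central intermediary, so the path fee is just the sum of the two incident channel fees.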
Authors: Kilian Grage, Klaus Jansen, Kim Manuel Klein
Download: PDF
Abstract: Machine scheduling is a fundamental optimization problem in computer science.
The task of scheduling a set of jobs on a given number of machines and
minimizing the makespan is well studied and among other results, we know that
EPTASes for machine scheduling on identical machines exist. Das and Wiese
initiated the research on a generalization of makespan minimization that
includes so-called bag constraints. In this variation of machine scheduling the
given set of jobs is partitioned into subsets, so-called bags. Given this
partition a schedule is only considered feasible when on any machine there is
at most one job from each bag.
Das and Wiese showed that this variant of machine scheduling admits a PTAS. We will improve on this result by giving the first EPTAS for the machine scheduling problem with bag-constraints. We achieve this result by using new insights on this problem and restrictions given by the bag-constraints. We show that, to gain an approximate solution, we can relax the bag-constraints and ignore some of the restrictions. Our EPTAS uses a new instance transformation that will allow us to schedule large and small jobs independently of each other for a majority of bags. We also show that it is sufficient to respect the bag-constraint only among a constant number of bags, when scheduling large jobs. With these observations our algorithm will allow for some conflicts when computing a schedule and we show how to repair the schedule in polynomial-time by swapping certain jobs around.
Authors: Niv Buchbinder, Anupam Gupta, Marco Molinaro, Joseph Naor
Download: PDF
Abstract: We consider the $k$-server problem on trees and HSTs. We give an algorithm
based on the convex-programming primitive of Bregman projections. This
algorithm has competitive ratios that match some of the recent results given
by Bubeck et al. (STOC 2018), whose algorithm was based on mirror-descent-based
continuous dynamics prescribed via a differential inclusion.
Authors: Paweł Gawrychowski, Florin Manea, Radosław Serafin
Download: PDF
Abstract: For $k\geq 3$, a k-rollercoaster is a sequence of numbers whose every maximal
contiguous subsequence, that is increasing or decreasing, has length at least
$k$; $3$-rollercoasters are called simply rollercoasters. Given a sequence of
distinct numbers, we are interested in computing its maximum-length (not
necessarily contiguous) subsequence that is a $k$-rollercoaster. Biedl et al.
[ICALP 2018] have shown that each sequence of $n$ distinct real numbers
contains a rollercoaster of length at least $\lceil n/2\rceil$ for $n>7$, and
that a longest rollercoaster contained in such a sequence can be computed in
$O(n\log n)$-time. They have also shown that every sequence of $n\geq
(k-1)^2+1$ distinct real numbers contains a $k$-rollercoaster of length at
least $\frac{n}{2(k-1)}-\frac{3k}{2}$, and gave an $O(nk\log n)$-time algorithm
computing a longest $k$-rollercoaster in a sequence of length $n$.
In this paper, we give an $O(nk^2)$-time algorithm computing the length of a longest $k$-rollercoaster contained in a sequence of $n$ distinct real numbers; hence, for constant $k$, our algorithm computes the length of a longest $k$-rollercoaster in optimal linear time. The algorithm can be easily adapted to output the respective $k$-rollercoaster. In particular, this improves the results of Biedl et al. [ICALP 2018], by showing that a longest rollercoaster can be computed in optimal linear time. We also present an algorithm computing the length of a longest $k$-rollercoaster in $O(n \log^2 n)$-time, that is, subquadratic even for large values of $k\leq n$. Again, the rollercoaster can be easily retrieved. Finally, we show an $\Omega(n \log k)$ lower bound for the number of comparisons in any comparison-based algorithm computing the length of a longest $k$-rollercoaster.
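The defining property is straightforward to check directly; here is a sketch (the function name and edge-case conventions are mine, not the paper's):

```python
def is_k_rollercoaster(seq, k=3):
    """Check the k-rollercoaster property: every maximal contiguous
    increasing or decreasing run of the sequence has length >= k.
    (seq is assumed to consist of distinct numbers.)"""
    if len(seq) < k:
        return False
    dirs = [1 if b > a else -1 for a, b in zip(seq, seq[1:])]
    run = 1                              # length of current run, in steps
    for prev, cur in zip(dirs, dirs[1:]):
        if cur == prev:
            run += 1
        else:                            # direction flipped: run ended
            if run + 1 < k:              # a run of m steps has m+1 elements
                return False
            run = 1
    return run + 1 >= k
```

Note that consecutive maximal runs share their turning point, e.g., [1, 4, 6, 2, 0] splits into the runs [1, 4, 6] and [6, 2, 0], both of length 3, so it is a (3-)rollercoaster.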
Authors: Tanuj Mathur, Tian An Wong
Download: PDF
Abstract: We outline a general algorithm for verifying whether a subset of the integers
is a more sums than differences (MSTD) set, also known as a sum-dominated set,
and give estimates on its computational complexity. We conclude with some
numerical results on large MSTD sets and MSTD subsets of $[1,N]\cap \mathbb Z$
for $N$ up to 160.
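The MSTD property itself is a one-liner to test by enumerating sums and differences; a sketch, using the classical eight-element example usually attributed to Conway:

```python
def is_mstd(S):
    """A finite set of integers is MSTD (sum-dominated) iff |S+S| > |S-S|."""
    S = set(S)
    sums = {a + b for a in S for b in S}
    diffs = {a - b for a in S for b in S}
    return len(sums) > len(diffs)
```

This brute-force check costs O(|S|^2) set insertions, which is why verifying large MSTD sets efficiently, as the paper studies, needs more care.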
Authors: Bahman Kalantari
Download: PDF
Abstract: A fundamental problem in linear programming, machine learning, and
computational geometry is the {\it Convex Hull Membership} (CHM): Given a point
$p$ and a subset $S$ of $n$ points in $\mathbb{R}^m$, is $p \in conv(S)$? The
{\it Triangle Algorithm} (TA) computes $p' \in conv(S)$ so that, either $\Vert
p'- p \Vert \leq \varepsilon R$, $R= \max \{\Vert p -v \Vert: v\in S\}$; or
$p'$ is a {\it witness}, i.e. the orthogonal bisector of $pp'$ separates $p$
from $conv(S)$. By the {\it Spherical}-CHM we mean a CHM, where $p=0$, $\Vert v
\Vert=1$, $\forall v \in S$. First, we prove the equivalence of exact and
approximate versions of CHM and Spherical-CHM. On the one hand, this makes it
possible to state a simple $O(1/\varepsilon^2)$ iteration TA, each taking
$O(n+m)$ time. On the other hand, using this iteration complexity we prove if
for each $p' \in conv(S)$ with $\Vert p \Vert > \varepsilon$ that is not a
witness there is $v \in S$ with $\Vert p' - v \Vert \geq \sqrt{1+
\varepsilon}$, the iteration complexity of TA reduces to $O(1/\varepsilon)$.
This matches the complexity of Nesterov's fast-gradient method. The analysis also
suggests a strategy for when the property does not hold at an iterate. Lastly,
as an application of TA, we show how to solve strict LP feasibility as a dual
of CHM. In summary, TA and the Spherical-CHM provide a convenient geometric
setting for efficient solution to large-scale CHM and related problems, such as
computing all vertices of $conv(S)$.
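Here is a sketch of the Triangle Algorithm's basic iteration as described in the abstract: repeatedly find a pivot v with ||p' - v|| >= ||p - v||, move p' to the point of segment [p', v] nearest p, and declare a witness if no pivot exists. The parameter choices and iteration cap are illustrative:

```python
import math

def triangle_algorithm(p, S, eps, max_iter=200000):
    """Triangle Algorithm sketch for Convex Hull Membership.

    Returns ('inside', pp) with ||pp - p|| <= eps*R for a point pp in
    conv(S), or ('witness', pp): every v in S is closer to pp than to p,
    so the orthogonal bisector of p-pp separates p from conv(S).
    """
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    R = max(dist(p, v) for v in S)
    pp = min(S, key=lambda v: dist(p, v))     # start at nearest point of S
    for _ in range(max_iter):
        if dist(pp, p) <= eps * R:
            return 'inside', pp
        # a pivot is any v in S with ||pp - v|| >= ||p - v||
        pivot = next((v for v in S if dist(pp, v) >= dist(p, v)), None)
        if pivot is None:
            return 'witness', pp
        # move pp to the point of the segment [pp, pivot] nearest to p
        d = [a - b for a, b in zip(pivot, pp)]
        t = sum((a - b) * c for a, b, c in zip(p, pp, d)) / sum(x * x for x in d)
        t = max(0.0, min(1.0, t))
        pp = [b + t * x for b, x in zip(pp, d)]
    return 'inside', pp   # iteration cap reached; pp is the best point found
```

For a point inside the hull this converges within the O(1/eps^2) iteration bound the abstract cites; for a point outside, the pivot search eventually fails and a witness is reported.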
Authors: M. Kaloorazi, R. C. de Lamare
Download: PDF
Abstract: Low-rank matrix approximations play a fundamental role in numerical linear
algebra and signal processing applications. This paper introduces a novel
rank-revealing matrix decomposition algorithm termed Compressed Randomized UTV
(CoR-UTV) decomposition along with a CoR-UTV variant aided by the power method
technique. CoR-UTV is primarily developed to compute an approximation to a
low-rank input matrix by making use of random sampling schemes. Given a large
and dense matrix of size $m\times n$ with numerical rank $k$, where $k \ll
\text{min} \{m,n\}$, CoR-UTV requires a few passes over the data, and runs in
$O(mnk)$ floating-point operations. Furthermore, CoR-UTV can exploit modern
computational platforms and, consequently, can be optimized for maximum
efficiency. CoR-UTV is simple and accurate, and outperforms reported
alternative methods in terms of efficiency and accuracy. Simulations with
synthetic data as well as real data in image reconstruction and robust
principal component analysis applications support our claims.
Authors: Francesco Pelosin
Download: PDF
Abstract: We are living in a world which is getting more and more interconnected and,
as a physiological effect, the interaction between the entities produces more and
more information. This high throughput generation calls for techniques able to
reduce the volume of the data, but still able to preserve the carried
knowledge. Data compression and summarization techniques are one of the
possible approaches to face such problems. The aim of this thesis is to devise
a new pipeline for compressing and decompressing a graph by exploiting
Szemer\'edi's Regularity Lemma. In particular, a procedure called CoDec
(Compression-Decompression) has been developed, based on Alon et al.'s
constructive version of the Regularity Lemma. We provide an extensive
experimental evaluation to measure how robust the framework is as we both
corrupt the structures carried by the graph and add noisy edges among them. The
experimental results make us confident that our method can be effectively used
as a graph compression technique able to preserve meaningful patterns of the
original graph.
Authors: Jayesh Choudhari, Manoj Gupta, Shivdutt Sharma
Download: PDF
Abstract: We design a space-efficient algorithm for performing depth-first search
traversal (DFS) of a graph in $O(m+n\log^* n)$ time using $O(n)$ bits of space.
While a normal DFS algorithm results in a DFS-tree (in case the graph is
connected), our space bounds do not permit us even to store such a tree.
However, our algorithm correctly outputs all edges of the DFS-tree.
The previous best algorithm (which used $O(n)$ working space) took $O(m \log n)$ time (Asano, Izumi, Kiyomi, Konagaya, Ono, Otachi, Schweitzer, Tarui, Uehara (ISAAC 2014) and Elmasry, Hagerup, Krammer (STACS 2015)). The main open question left behind in this area was to design a faster algorithm for DFS using $O(n)$ bits of space. Our algorithm answers this open question as it has a nearly optimal running time (as DFS takes $O(m+n)$ time even if there is no space restriction).
I was recently asked to chair the 4th Highlights of Algorithms conference (HALG 2019). HALG 2019 will be held in Copenhagen on June 14-16, 2019. First, I wish to see many of you there. HALG works very well, and for me it has been one of the best algorithmic conferences I have attended in recent years. You can read more in a report in the EATCS bulletin. Second, I hope we will have a great program thanks to the help of the following people on the PC:
Susanne Albers (Technical University of Munich)
Edith Cohen (Google & Tel Aviv University)
Shiri Chechik (Tel Aviv University)
Fabian Kuhn (University of Freiburg)
Seffi Naor (Technion)
Marcin Pilipczuk (University of Warsaw)
Piotr Sankowski (chair - University of Warsaw)
David Shmoys (Cornell University)
Ola Svensson (EPFL)
Mikkel Thorup (University of Copenhagen)
Gregory Valiant (Stanford University)
Ryan Williams (MIT)
The first batch of invitations will be out soon together with the call for nominations, so do not be surprised if you get one.
Authors: Therese Biedl
Download: PDF
Abstract: A segment representation of a graph is an assignment of line segments in 2D
to the vertices in such a way that two segments intersect if and only if the
corresponding vertices are adjacent. Not all graphs have such segment
representations, but they exist, for example, for all planar graphs.
In this note, we study the resolution that can be achieved for segment representations, presuming the ends of segments must be on integer grid points. We show that any planar graph (and more generally, any graph that has a so-called $L$-representation) has a segment representation in a grid of width and height $4^n$.
Authors: Esther Galby, Andrea Munaro, Bernard Ries
Download: PDF
Abstract: A semitotal dominating set of a graph $G$ with no isolated vertex is a
dominating set $D$ of $G$ such that every vertex in $D$ is within distance two
of another vertex in $D$. The minimum size $\gamma_{t2}(G)$ of a semitotal
dominating set of $G$ is squeezed between the domination number $\gamma(G)$ and
the total domination number $\gamma_{t}(G)$.
\textsc{Semitotal Dominating Set} is the problem of finding, given a graph $G$, a semitotal dominating set of $G$ of size $\gamma_{t2}(G)$. In this paper, we continue the systematic study of the computational complexity of this problem when restricted to special graph classes. In particular, we show that it is solvable in polynomial time for the class of graphs with bounded mim-width by a reduction to \textsc{Total Dominating Set} and we provide several approximation lower bounds for subclasses of subcubic graphs. Moreover, we obtain complexity dichotomies in monogenic classes for the decision versions of \textsc{Semitotal Dominating Set} and \textsc{Total Dominating Set}.
Finally, we show that it is $\mathsf{NP}$-complete to recognise the graphs such that $\gamma_{t2}(G) = \gamma_{t}(G)$ and those such that $\gamma(G) = \gamma_{t2}(G)$, even if restricted to be planar and with maximum degree at most $4$, and we provide forbidden induced subgraph characterisations for the graphs hereditarily satisfying either of these two equalities.
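The definition in the first paragraph can be checked directly; a sketch with an assumed adjacency-dict graph encoding:

```python
def is_semitotal_dominating(adj, D):
    """Check whether D is a semitotal dominating set of the graph given by
    adjacency dict adj (vertex -> set of neighbors): D must dominate every
    vertex, and every vertex of D must be within distance two of another
    vertex of D."""
    D = set(D)
    if any(v not in D and not (adj[v] & D) for v in adj):
        return False                       # not even a dominating set
    for u in D:
        ball2 = set(adj[u]) | {w for v in adj[u] for w in adj[v]}
        if not (ball2 & (D - {u})):
            return False                   # u has no D-partner within distance 2
    return True
```

On the path 0-1-2-3, for instance, {1, 2} is semitotal dominating, while {0, 3} dominates every vertex but fails the distance-two condition, illustrating how the semitotal number sits between the domination and total domination numbers.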
Authors: Marek Cygan, Paweł Komosa, Daniel Lokshtanov, Michał Pilipczuk, Marcin Pilipczuk, Saket Saurabh
Download: PDF
Abstract: The randomized contractions technique, introduced by Chitnis et al. in 2012,
is a robust framework for designing parameterized algorithms for graph
separation problems. On high level, an algorithm in this framework recurses on
balanced separators while possible, and in the leaves of the recursion uses
high connectivity of the graph at hand to highlight a solution by color coding.
In 2014, a subset of the current authors showed that, given a graph $G$ and a budget $k$ for the cut size in the studied separation problem, one can compute a tree decomposition of $G$ with adhesions of size bounded in $k$ and with bags exhibiting the same high connectivity properties with respect to cuts of size at most $k$ as in the leaves of the recursion in the randomized contractions framework. This led to an FPT algorithm for the Minimum Bisection problem.
In this paper, we provide a new construction algorithm for a tree decomposition with the aforementioned properties, by using the notion of lean decompositions of Thomas. Our algorithm is not only arguably simpler than the one from 2014, but also gives better parameter bounds; in particular, we provide best possible high connectivity properties with respect to edge cuts. This allows us to provide $2^{O(k \log k)} n^{O(1)}$-time parameterized algorithms for Minimum Bisection, Steiner Cut, and Steiner Multicut.
Authors: Stefan Kratsch, Shaohua Li, Dániel Marx, Marcin Pilipczuk, Magnus Wahlström
Download: PDF
Abstract: We study multi-budgeted variants of the classic minimum cut problem and graph
separation problems that turned out to be important in parameterized
complexity: Skew Multicut and Directed Feedback Arc Set. In our generalization,
we assign colors $1,2,...,\ell$ to some edges and give separate budgets
$k_{1},k_{2},...,k_{\ell}$. Let $E_{i}$ be the set of edges of color $i$. The
solution $C$ for the multi-budgeted variant of a graph separation problem not
only needs to satisfy the usual separation requirements, but also needs to
satisfy that $|C\cap E_{i}|\leq k_{i}$ for every $i\in \{1,...,\ell\}$.
Contrary to the classic minimum cut problem, the multi-budgeted variant turns out to be NP-hard even for $\ell = 2$. We propose FPT algorithms parameterized by $k=k_{1}+...+k_{\ell}$ for all three problems. To this end, we develop a branching procedure for the multi-budgeted minimum cut problem that measures the progress of the algorithm not by reducing $k$ as usual, but by elevating the capacity of some edges and thus increasing the size of maximum source-to-sink flow. Using the fact that a similar strategy is used to enumerate all important separators of a given size, we merge this process with the flow-guided branching and show an FPT bound on the number of (appropriately defined) important multi-budgeted separators. This allows us to extend our algorithm to the Skew Multicut and Directed Feedback Arc Set problems.
Furthermore, we show connections of the multi-budgeted variants with weighted variants of the directed cut problems and the Chain $\ell$-SAT problem, whose parameterized complexity remains an open problem. We show that these problems admit a bounded-in-parameter number of "maximally pushed" solutions (in a similar spirit as important separators are maximally pushed), giving somewhat weak evidence towards their tractability.
Authors: Alex Nowak-Vila, Francis Bach, Alessandro Rudi
Download: PDF
Abstract: The problem of devising learning strategies for discrete losses (e.g.,
multilabeling, ranking) is currently addressed with methods and theoretical
analyses ad-hoc for each loss. In this paper we study a least-squares framework
to systematically design learning algorithms for discrete losses, with
quantitative characterizations in terms of statistical and computational
complexity. In particular we improve existing results by providing explicit
dependence on the number of labels for a wide class of losses and faster
learning rates in low-noise conditions. Theoretical results are complemented
with experiments on real datasets, showing the effectiveness of the proposed
general approach.
Authors: Josh Alman
Download: PDF
Abstract: The Light Bulb Problem is one of the most basic problems in data analysis.
One is given as input $n$ vectors in $\{-1,1\}^d$, which are all independently
and uniformly random, except for a planted pair of vectors with inner product
at least $\rho \cdot d$ for some constant $\rho > 0$. The task is to find the
planted pair. The most straightforward algorithm leads to a runtime of
$\Omega(n^2)$. Algorithms based on techniques like Locality-Sensitive Hashing
achieve runtimes of $n^{2 - O(\rho)}$; as $\rho$ gets small, these approach
quadratic.
Building on prior work, we give a new algorithm for this problem which runs in time $O(n^{1.582} + nd),$ regardless of how small $\rho$ is. This matches the best known runtime due to Karppa et al. Our algorithm combines techniques from previous work on the Light Bulb Problem with the so-called `polynomial method in algorithm design,' and has a simpler analysis than previous work. Our algorithm is also easily derandomized, leading to a deterministic algorithm for the Light Bulb Problem with the same runtime of $O(n^{1.582} + nd),$ improving previous results.
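For orientation, here is the straightforward quadratic-time baseline that scans all pairs, run on an assumed toy instance with an exactly planted correlated pair (the instance parameters are mine, chosen so the planted pair dominates with overwhelming margin):

```python
import random

def find_planted_pair(vectors):
    """Quadratic-time baseline for the Light Bulb Problem: return the
    pair of indices with the largest inner product."""
    best, best_pair = None, None
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            ip = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            if best is None or ip > best:
                best, best_pair = ip, (i, j)
    return best_pair, best

# Assumed toy instance: n random +/-1 vectors with a planted pair.
random.seed(1)
n, d = 40, 400
vecs = [[random.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
vecs[7] = vecs[3][:]        # plant: copy vector 3 ...
flips = 60                  # ... and flip exactly 60 coordinates, giving
for k in range(flips):      # inner product d - 2*flips = 280 = 0.7*d
    vecs[7][k] = -vecs[7][k]
```

A random pair has inner product concentrated around 0 with standard deviation sqrt(d) = 20, so the planted inner product of 280 stands out; the faster algorithms in the paper aim to find it without examining all n^2 pairs.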
Authors: Andrzej Grzesik, Tereza Klimošová, Marcin Pilipczuk, Michał Pilipczuk
Download: PDF
Abstract: In the classic Maximum Weight Independent Set problem we are given a graph
$G$ with a nonnegative weight function on vertices, and the goal is to find an
independent set in $G$ of maximum possible weight. While the problem is NP-hard
in general, we give a polynomial-time algorithm working on any $P_6$-free
graph, that is, a graph that has no path on $6$ vertices as an induced
subgraph. This improves the polynomial-time algorithm on $P_5$-free graphs of
Lokshtanov et al. (SODA 2014), and the quasipolynomial-time algorithm on
$P_6$-free graphs of Lokshtanov et al (SODA 2016). The main technical
contribution leading to our main result is enumeration of a polynomial-size
family $\mathcal{F}$ of vertex subsets with the following property: for every
maximal independent set $I$ in the graph, $\mathcal{F}$ contains all maximal
cliques of some minimal chordal completion of $G$ that does not add any edge
incident to a vertex of $I$.
The Department of Computer Science at Purdue University solicits applications for at least two tenure-track or tenured positions. We are particularly interested in candidates whose work focuses on the design and analysis of algorithms, randomness in computation, graph algorithms, and quantum computing. Highly qualified applicants in other areas will be considered.
Website: https://www.cs.purdue.edu/hiring/index.html
Email: fac-search@cs.purdue.edu
Caltech’s Center for the Mathematics of Information (CMI) announces openings in the CMI Postdoctoral Fellowship Program, for positions beginning in Fall 2019. The CMI is dedicated to fundamental mathematical research with an eye to the roles of information and computation throughout science and engineering.
Website: http://cms.caltech.edu/about/cmi
Email: sydney@cms.caltech.edu
Authors: Daniel J. Saunders
Download: PDF
Abstract: We consider the power of Boolean circuits with MOD$_{6}$ gates. First, we
introduce a few basic notions of computational complexity, and describe the
standard models with which we study the complexity of problems. We then define
the model of Boolean circuits, equate a restricted class of circuits with an
algebraic model, and present some results from working with this algebra.
Authors: Daniel Lemire, Melissa E. O'Neill
Download: PDF
Abstract: L'Ecuyer & Simard's Big Crush statistical test suite has revealed statistical
flaws in many popular random number generators including Marsaglia's XorShift
generators. Vigna recently proposed some 64-bit variations on the Xorshift
scheme that are further scrambled (i.e., Xorshift1024*, Xorshift1024+,
Xorshift128+, Xoroshiro128+). Unlike their unscrambled counterparts, they pass
Big Crush when interleaving blocks of 32 bits for each 64-bit word (most
significant, least significant, most significant, least significant, etc.). We
report that these scrambled generators systematically fail Big
Crush---specifically the linear-complexity and matrix-rank tests that detect
linearity---when taking the 32 lowest-order bits in reverse order from each
64-bit word.
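For readers who want to replicate the setup, here is a sketch of xorshift128+ together with the reversed-low-bits transformation the paper tests. The shift constants (23, 18, 5) are those of Vigna's published reference implementation; earlier published variants used different constants, so treat this as illustrative:

```python
MASK64 = (1 << 64) - 1

class Xorshift128Plus:
    """Vigna's xorshift128+ generator (shift constants 23, 18, 5)."""
    def __init__(self, s0, s1):
        assert (s0, s1) != (0, 0), "state must be nonzero"
        self.s0, self.s1 = s0 & MASK64, s1 & MASK64

    def next64(self):
        s1, s0 = self.s0, self.s1
        result = (s0 + s1) & MASK64      # the '+' scrambling of the output
        self.s0 = s0
        s1 = (s1 ^ (s1 << 23)) & MASK64
        self.s1 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5)
        return result

def low32_reversed(x):
    """The 32 lowest-order bits of x in reverse bit order: the
    transformation under which the paper reports Big Crush failures."""
    return int(format(x & 0xFFFFFFFF, '032b')[::-1], 2)
```

Feeding `low32_reversed(g.next64())` into a test battery exposes the linear structure in the low-order bits that the additive scrambling hides in the usual bit order.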