Theory of Computing Blog Aggregator

There will be a postdoctoral position starting any time in 2016, funded by the ANR project CompA. The project concerns algebraic complexity, in particular lower bounds and structural properties of arithmetic circuits. The position is for 12 months, with the possibility of renewal if it starts early enough.



by theorycsjobs at October 13, 2015 02:51 PM UTC

A simple idea that everyone missed, and more?


—A myth of a myth of a myth?

Joshua Miller and Adam Sanjurjo (MS) have made a simple yet striking insight about the so-called hot hand fallacy.

Today Ken and I want to discuss their insight, suggest an alternate fix, and reflect on what it means for research more broadly.

I believe what is so important about their result is: It uses the most elementary of arguments from probability theory, yet seems to shed new light on a well-studied, much-argued problem. This achievement is wonderful, and gives me hope that there are still simple insights out there, simple insights that we all have missed—yet insights that could dramatically change the way we look at some of our most important open problems.

I would like to point out that I first saw their paper mentioned in the technical journal WSJ—the Wall Street Journal.

The Fallacy

Our friends at Wikipedia make it very clear that they believe there is no hot-hand at all:

The hot-hand fallacy (also known as the “hot hand phenomenon” or “hot hand”) is the fallacious belief that a person who has experienced success with a random event has a greater chance of further success in additional attempts. The concept has been applied to gambling and sports, such as basketball.

The fallacy started as a serious study back in 1985, when Thomas Gilovich, Amos Tversky, and Robert Vallone (GTV) published a paper titled “The Hot Hand in Basketball: On the Misperception of Random Sequences” (see also this followup). In many sports, especially basketball, players often seem to get hot. For example, an NBA player who normally shoots under 50% from the field may make five shots in a row in a short time. This suggests that he is “in the zone” or has a “hot hand.” His teammates may pass him the ball more often in expectation of his staying “hot.”

Yet GTV and many subsequent studies say that this is wrong. They seem to show rather that shooting is essentially a series of independent events: that each shot is independent of previous ones, and that this is equally true of free throws. Of course, given the nature of this problem, there is no way to prove mathematically that hot hands do not exist. There are now many papers, websites, and a continuing debate about whether hot hands are real or not.

The Fallacy Of The Fallacy

Let me start by saying that Ken and I are mostly on the fence about the fallacy. I can imagine many situations that make it mathematically possible. For one, what if a basketball player is not doing independent trials but rather executing a Markov process? That is, of course, a fancy way of saying that the player has “state.” Over the long term they will make about {75\%} of their free-throws, but there are times when they will shoot at a much higher percentage. And, of course, times when they will shoot at a much lower percentage—a “cold hand” is more obvious with free throws and may lead the other team to choose to foul that player.

I personally once experienced something that seems to suggest that there is a hot hand. I was a poor player of stick-ball when I was young, but one day played like the best in the neighborhood. Oh well. Let me explain this in more detail another day.

Ken chimes in: This folds into a larger question that is not fallacious, and that concerns me as I monitor chess tournaments:

What is the time or game unit of being in form?

In chess I think the tournament, more than the game or move, is the “unit of form,” though I am still a long way from rigorously testing this. In baseball the saying is, basically, “Momentum stops with tomorrow’s opposing starting pitcher.”

Nevertheless, I (Ken) wonder how far “hot hand” has been tested in fantasy baseball. In my leagues on Yahoo! one can exchange players at any time. Last month I dropped the slumping Carlos Gomez in two leagues even though Yahoo! still ranked him the 8th-best player by potential. In one league I picked up Jake Marisnick, another Houston outfielder who had been hot. Marisnick soon went cold with the bat but stole 6 bases to stay within the top 100 performers anyway. The leagues ended with the regular season, but had they continued into the playoffs I definitely would have picked up Houston’s Colby Rasmus after 3 homers in 3 games. While I’ve been writing the last three main sections of this post, Rasmus hit another homer today.

The New Insight

MS have a new paper with the long title, “Surprised by the Gambler’s and Hot Hand Fallacies? A Truth in the Law of Small Numbers.” Their paper is interesting enough that it has already stimulated another paper by Yosef Rinott and Maya Bar-Hillel, who do a great job of explaining what is going on. The statistician and social science author Andrew Gelman explained the basic point in even simpler terms on his blog in July, and we will follow his lead.

For the simplest case consider sequences of {n} coin tosses. We will generate lots of these sequences at random. We will look at times the coin comes up heads ({H}) and say it is “hot” if the next throw is also {H}. If the sequence is all tails, however:

\displaystyle  TT \dots TT,

then we must discard it since there is no {H} to look at. This is our first hint of a lurking bias. Likewise if only the last throw is heads,

\displaystyle  TT\dots TH,

there is no chance for a hot hand since the “game is over.” Hence it seems natural to use the following sampling procedure:

  1. Generate a sequence {s} at random.

  2. If it has no head in the first {n-1} places, discard it.

  3. Else pick a head at random from those places. That is, pick {i \leq n-1} such that {s_i = H}.

  4. Score a “hit” if {s_{i+1} = H}, else “miss.”

Let HHS be the number of hits after {M} trials—not counting discarded ones—divided by {M}. That is our “hot hand score.” We expect HHS to converge to {0.5} as {M} gets large. Indeed, there seems to be a simple way to prove it: Consider any head {H} that we chose in step {3}. The next flip {s_{i+1}} is equally likely to be {H} or {T}. If it is {H} we score a hit, else a miss. Hence our expected hit ratio will be 50%. Q.E.D. And wrong.

To see why, consider {n = 3}. The sequences {TTT} and {TTH} are discarded, so we can skip them. Here are the other six sequences and their expected contribution to our hit score. Remember only a head in the first two places is selected.

\displaystyle  \begin{array}{c|c|c} Seq. & Prob. & Score\\ \hline THT & 0/1 & 0\\ THH & 1/1 & 1\\ HTT & 0/1 & 0\\ HTH & 0/1 & 0\\ HHT & 1/2 & 0.5\\ HHH & 2/2 & 1\\ \hline \end{array}

This gives HHS {= 2.5/6 \approx 0.417}, not {0.5}. What happened?
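The table's computation can be checked, and extended to larger {n}, by brute-force enumeration. Here is a short Python sketch (ours, not from the MS paper; the function name is our own), using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

def expected_hhs(n):
    """Exact expected 'hot hand score' for fair-coin sequences of
    length n under the sampling procedure above: discard sequences
    with no H in the first n-1 places; otherwise pick such an H
    uniformly at random and score a hit if the next flip is H."""
    total, kept = Fraction(0), 0
    for s in product('HT', repeat=n):
        heads = [i for i in range(n - 1) if s[i] == 'H']
        if not heads:
            continue  # discarded: nothing to condition on
        kept += 1
        hits = sum(1 for i in heads if s[i + 1] == 'H')
        total += Fraction(hits, len(heads))  # per-sequence expected score
    return total / kept

print(expected_hhs(3))  # 5/12  (= 2.5/6)
print(expected_hhs(4))  # 17/42
print(expected_hhs(5))  # 49/120 (= 147/360)
```

The values for {n = 4} and {n = 5} match the fractions quoted in the next section.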

A Statistical Heads-Up

What happened is that the heads are not truly chosen uniformly at random. Look again at the six rows. There are 8 heads total in the first two characters. Choose one of them at random over the whole data set. Then you see 4 heads are followed by {H} and the other 4 by {T}. So we get the expected 50%. The flaw in our previous sampling is that it was shortchanging the two heads-rich sequences, weighting each {1/6} instead of {1/4}. This is the bias.

You might think the bias would quickly lessen as the sequence length grows but that too is wrong. Try {n = 4}. You get the weird fraction {\frac{17}{42}}. This equals {0.405\dots}, which has gone down. For {n = 5} we get {\frac{147}{360} = 0.408\dots}. As {n \rightarrow \infty} the expectation does come back up to {1/2} but for {n = 20}, which is a typical high-end for shots by a player in an NBA game, it is still under {0.474}. Rinott and Bar-Hillel derive the {k = 1} case with probabilities {p} of heads and {q = 1-p} of tails and {m = n-1} as

\displaystyle  \frac{p}{m} + \frac{p}{1-q^{m}} - \frac{1}{m} = \frac{p}{1-q^{m}} - \frac{1-p}{m} = \frac{1-q}{1-q^{m}} - \frac{q}{m}.
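As a sanity check (our own sketch, not from the papers), the closed form agrees with brute-force enumeration for any rational {p}:

```python
from fractions import Fraction
from itertools import product

def rb_formula(p, n):
    """The Rinott-Bar-Hillel closed form above, with q = 1-p, m = n-1."""
    q, m = 1 - p, n - 1
    return p / (1 - q**m) - (1 - p) / m

def enum_bias(p, n):
    """Same expectation by enumeration: weight each sequence by its
    probability under independent flips with P(H) = p, conditioning
    on at least one H in the first n-1 places."""
    num = den = Fraction(0)
    for s in product('HT', repeat=n):
        heads = [i for i in range(n - 1) if s[i] == 'H']
        if not heads:
            continue
        w = p**s.count('H') * (1 - p)**s.count('T')
        num += w * Fraction(sum(s[i + 1] == 'H' for i in heads), len(heads))
        den += w
    return num / den

p = Fraction(7, 10)  # e.g. a 70% shooter
assert rb_formula(p, 5) == enum_bias(p, 5)
print(float(rb_formula(Fraction(1, 2), 20)))  # about 0.4737, still under 0.474
```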

The bias persists if we condition on {k} heads in a row. “Appendix B” in the MS paper attempts to find a formula like that above for {k > 1} but has to settle for partial results that imply, among other things, growth with {k}. We can give some concrete numbers: For {k = 2} and {n = 4} the chance of seeing heads next is the same {2.5/6 \approx 0.417}. For {k=2} and {n=6} it has dipped to {3/8 = 0.375} with 17 of 32 sequences being thrown out. The larger point, however, is that these numbers represent the expectation for any one sequence {s} that you generate, provided you follow the policy of selecting at random from all the {H^k} subsequences in {s} through place {n-1} and testing whether the next place is {H}.

Did GTV really follow this sampling strategy? It seems so. It comes about if you follow the letter of conditioning on the event that the previous {k} flips were heads. Rinott and Bar-Hillel quote words to that effect from the GTV paper, words that also say they compared against the probability conditioned on the last {k} shots being misses. The gist of what they and MS are saying is:

If you did your hot-hand study as above and got 50%, then chances are the process from which you drew the data really was “hot.”

And since the way of conditioning on misses has a bias toward {0.6}, if you got {0.5} again then your data might really also show a “Cold Hand.” It should be noted, however, that the GTV study of free throws is not affected by this issue and indeed already abides by our ‘grid’ suggestion below—see also the discussion by MS beginning at page 10 here.

Subtler Bias Still

MS give some other explanations of the bias, one roughly as follows: If a sequence starts with {k} heads and is followed by {T}, then you had no choice but to select those {k} heads. But if they are followed by another {H}, then we do have a choice: we could select the latter {k} heads instead. This has a chance of failure, so the positive case is not as rich as the negative one. Getting positive results—more heads—also gives more ways to do selections that yield negative results.

This suggests a fix, but the fix doesn’t work: Let us only condition on sequences of {k} heads that come after a tail or start at the beginning of the game. The above issue goes away, but now there is a bias in the other direction. It shows first with {k = 1} and {n = 4}; we have grouped sequences differing on the last bit:

\displaystyle  \begin{array}{c|c|c} Seq. & Prob. & Score\\ \hline TTTT,\;TTTH & - & -\\ TTHT,\;TTHH & 1/2 & 0.5\\ THTT,\;THTH & 0/2 & 0\\ THHT, \; THHH & 2/2 & 1\\ HTTT, \; HTTH & 0/2 & 0\\ HTHT, \; HTHH & 1/4 & 0.25\\ HHTT, \; HHTH & 2/2 & 1\\ HHHT, \; HHHH & 2/2 & 1\\ \hline \end{array}

This gives {3.75/7 \approx 0.536}. The main culprit again is the selection within the sequence {HTH*}, with the discarding of {TTT*} also a factor.
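Enumerating this “fresh runs only” policy in the same way confirms the number (again our own sketch, not from MS):

```python
from fractions import Fraction
from itertools import product

def expected_fresh(n, k=1):
    """Expected score for a fair coin when we select uniformly among
    runs of k heads that start the sequence or follow a tail, with
    the run ending within the first n-1 places; sequences with no
    eligible run are discarded."""
    total, kept = Fraction(0), 0
    for s in product('HT', repeat=n):
        runs = [i for i in range(n - k)
                if all(s[i + j] == 'H' for j in range(k))
                and (i == 0 or s[i - 1] == 'T')]
        if not runs:
            continue
        kept += 1
        total += Fraction(sum(s[i + k] == 'H' for i in runs), len(runs))
    return total / kept

print(expected_fresh(4))  # 15/28, i.e. 3.75/7, about 0.536
```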

A Grid Idea

This train of thought led me (Ken) to an ironclad fix, different from a test combining conditionals that MS refer to in their 2014 working paper. It simply allows only one selection from each non-discarded sequence, namely testing the bit after the first {k} consecutive heads. The line {HTH*} then gives {0/2} and makes the whole table {3.5/7 = 0.5} overall.

That this is unbiased is easy to prove: Given {s} in the set {S} of non-discarded sequences, define {s'} to be the sequence obtained by flipping the bit after the first {k} consecutive heads. This is invertible—{(s')' = s} back again—and flips the result. Hence the next-bit expectation is {0.5}. This works similarly for any probability {p} of heads and does not care whether the overall sequence length {n} is kept constant between samples. Hence our proposal is:

Break up the data into subsequences {s_j} of any desired lengths {n_j}. The {n_j} may depend on the lengths of the given sequences (e.g., how many shots each player surveyed took) but of course not on the data bits themselves. For every {s_j} that has a run of {k} consecutive “hits” in the first {n_j - 1} places, choose the first such run and test whether the next bit is “hit” or “miss.”

The downsides to putting this kind of grid on the data are sacrificing some samples—that is, discarding cases of {H^k} that cross gridlines or come second in an {s_j}—and the arbitrary nature of choosing rules for {n_j}. But it eliminates the bias. We can also re-sample by randomly choosing other grid divisions. The samples would be correlated but estimation of the means would be valid, and now every sequence of {H^k} might be used as a condition. The re-sampling fixes the original bias in choosing every such sequence singly as the condition.
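The unbiasedness of the first-run rule can also be confirmed by exhaustive enumeration (our sketch; with a biased coin the estimate comes out exactly {p} rather than {0.5}, as the involution argument predicts):

```python
from fractions import Fraction
from itertools import product

def first_run_estimate(n, k, p):
    """P(bit after the first run of k consecutive heads is H | such a
    run ends within the first n-1 places), for independent flips with
    P(H) = p.  By the flipping involution this equals p exactly."""
    num = den = Fraction(0)
    for s in product('HT', repeat=n):
        pos = next((i for i in range(n - k)
                    if all(s[i + j] == 'H' for j in range(k))), None)
        if pos is None:
            continue  # discarded: no testable run of k heads
        w = p**s.count('H') * (1 - p)**s.count('T')
        den += w
        if s[pos + k] == 'H':
            num += w
    return num / den

for n, k in [(4, 1), (6, 2), (7, 3)]:
    assert first_run_estimate(n, k, Fraction(1, 2)) == Fraction(1, 2)
    assert first_run_estimate(n, k, Fraction(1, 3)) == Fraction(1, 3)
```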

A larger lesson is that hidden bias might be rooted out by making explicit the algorithms by which data are collated. Researchers count as part of the public addressed in articles on how not to go astray, such as this and this, both of which highlight pitfalls of conditional probability.

Open Problems

Has our grid idea been tried? Does it show a “hot hand”?

by RJLipton+KWRegan at October 13, 2015 04:05 AM UTC

I just recalled a nice mathematical riddle. I can’t remember where I originally read it, but it was likely in one of the blogs that are in the blogroll to the right. Riddle. A game takes place where person A and person B are on the same team while person Z is their adversary. There […]

by Adam Sheffer at October 13, 2015 12:37 AM UTC

An early version of Moore's law is as follows:

HARDWARE: The number of transistors on an integrated circuit doubles every 18 MONTHS.

Moore's law is  often used to refer to the following:

SPEED: The speed of computers due to hardware doubles every 18 MONTHS.

There are other versions as well. I've heard that some versions of it no longer hold (it couldn't go on forever). But what about the gains made NOT by hardware? Is there a Moore's Law of Code Optimization?

There is! It's called Proebsting's law

SPEED: The speed of computers due to code opt doubles every  18 YEARS.

The paper pointed to gives evidence for this law.
So is Code Opt worth it? I've heard the following:

1) Some of the Code Opts eventually go into the hardware, so it's not quite fair to say that Code Opts improve speed that slowly.

2) Any improvement is worth having.

3) Being forced to think about such issues leads to other benefits.

by GASARCH at October 12, 2015 03:52 AM UTC

Authors: Bruce Hajek, Yihong Wu, Jiaming Xu
Download: PDF
Abstract: The stochastic block model for one community with parameters $n, K, p,$ and $q$ is considered: $K$ out of $n$ vertices are in the community; two vertices are connected by an edge with probability $p$ if they are both in the community and with probability $q$ otherwise, where $p > q > 0$ and $p/q$ is assumed to be bounded. An estimator based on observation of the graph $G=(V,E)$ is said to achieve weak recovery if the mean number of misclassified vertices is $o(K)$ as $n \to \infty$. A critical role is played by the effective signal-to-noise ratio $\lambda=K^2(p-q)^2/((n-K)q).$ In the regime $K=\Theta(n)$, a na\"{i}ve degree-thresholding algorithm achieves weak recovery in $O(|E|)$ time if $\lambda \to \infty$, which coincides with the information theoretic possibility of weak recovery.

The main focus of the paper is on weak recovery in the sublinear regime $K=o(n)$ and $np = n^{o(1)}.$ It is shown that weak recovery is provided by a belief propagation algorithm running for $\log^\ast(n)+O(1) $ iterations, if $\lambda > 1/e,$ with the total time complexity $O(|E| \log^*n)$. Conversely, no local algorithm with radius $t$ of interaction satisfying $t = o(\frac{\log n}{\log(2+np)})$ can asymptotically outperform trivial random guessing if $\lambda \leq 1/e.$ By analyzing a linear message-passing algorithm that corresponds to applying power iteration to the non-backtracking matrix of the graph, we provide evidence to suggest that spectral methods fail to provide weak recovery if $\lambda \leq 1.$

at October 12, 2015 12:40 AM UTC

Authors: Ran Duan, Jugal Garg, Kurt Mehlhorn
Download: PDF
Abstract: We present an improved combinatorial algorithm for the computation of equilibrium prices in the linear Arrow-Debreu model. For a market with $n$ agents and integral utilities bounded by $U$, the algorithm runs in $O(n^7 \log^3 (nU))$ time. This improves upon the previously best algorithm of Ye by a factor of $\tilde{\Omega}(n)$. The algorithm refines the algorithm described by Duan and Mehlhorn and improves it by a factor of $\tilde{\Omega}(n^3)$. The improvement comes from a better understanding of the iterative price adjustment process, the improved balanced flow computation for nondegenerate instances, and a novel perturbation technique for achieving nondegeneracy.

at October 12, 2015 12:42 AM UTC

Authors: Carlos Barrón-Romero
Download: PDF
Abstract: This paper presents a complete algorithmic study of the decision Boolean Satisfiability Problem under the classical computation and quantum computation theories. The paper depicts deterministic and probabilistic algorithms and propositions of their properties; the main result is that the problem has no efficient algorithm (NP is not P). Novel quantum algorithms and propositions depict that the complexity of solving the Boolean Satisfiability Problem, or any NP problem, by the quantum computation approach is linear time.

at October 12, 2015 12:41 AM UTC

Authors: Patrizio Angelini, Giordano Da Lozzo, Giuseppe Di Battista, Valentino Di Donato, Philipp Kindermann, Günter Rote, Ignaz Rutter
Download: PDF
Abstract: Given a planar graph $G(V,E)$ and a partition of the neighbors of each vertex $v \in V$ in four sets $UR(v)$, $UL(v)$, $DL(v)$, and $DR(v)$, the problem Windrose Planarity asks to decide whether $G$ admits a windrose-planar drawing, that is, a planar drawing in which (i) each neighbor $u \in UR(v)$ is above and to the right of $v$, (ii) each neighbor $u \in UL(v)$ is above and to the left of $v$, (iii) each neighbor $u \in DL(v)$ is below and to the left of $v$, (iv) each neighbor $u \in DR(v)$ is below and to the right of $v$, and (v) edges are represented by curves that are monotone with respect to each axis. By exploiting both the horizontal and the vertical relationship among vertices, windrose-planar drawings allow one to simultaneously visualize two partial orders defined by means of the edges of the graph.

Although the problem is NP-hard in the general case, we give a polynomial-time algorithm for testing whether there exists a windrose-planar drawing that respects a combinatorial embedding that is given as part of the input. This algorithm is based on a characterization of the plane triangulations admitting a windrose-planar drawing. Furthermore, for any embedded graph admitting a windrose-planar drawing we show how to construct one with at most one bend per edge on an $O(n) \times O(n)$ grid. The latter result contrasts with the fact that straight-line windrose-planar drawings may require exponential area.

at October 12, 2015 12:45 AM UTC

Authors: Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter
Download: PDF
Abstract: We give efficient algorithms for ranking Lyndon words of length $n$ over an alphabet of size $\sigma$. The rank of a Lyndon word is its position in the sequence of lexicographically ordered Lyndon words of the same length. The outputs are integers of exponential size, and the complexity of arithmetic operations on such large integers cannot be ignored. Our model of computation is the word-RAM, in which basic arithmetic operations on (large) numbers of size at most $\sigma^n$ take $O(n)$ time. Our algorithm for ranking Lyndon words makes $O(n^2)$ arithmetic operations (this would imply directly cubic time on the word-RAM). However, using an algebraic approach we are able to reduce the total time complexity on the word-RAM to $O(n^2 \log \sigma)$. We also present an $O(n^3 \log^2 \sigma)$-time algorithm that generates the Lyndon word of a given length and rank in lexicographic order. Finally, we use the connections between Lyndon words and lexicographically minimal de Bruijn sequences (theorem of Fredricksen and Maiorana) to develop the first polynomial-time algorithm for decoding a minimal de Bruijn sequence of any rank $n$ (it determines the position of an arbitrary word of length $n$ within the de Bruijn sequence).

at October 12, 2015 12:41 AM UTC

Authors: David Avis, Charles Jordan
Download: PDF
Abstract: We report some preliminary computational results comparing parallel and sequential codes for vertex/facet enumeration problems for convex polyhedra. The problems chosen span the range from simple to highly degenerate polytopes. For most problems tested, the results clearly show the advantage of using the parallel implementation mplrs of the reverse search based code lrs.

at October 12, 2015 12:47 AM UTC

Authors: Md. Jawaherul Alam, Michael Kaufmann, Stephen G. Kobourov
Download: PDF
Abstract: We study two variants of the problem of contact representation of planar graphs with axis-aligned boxes. In a cube-contact representation we realize each vertex with a cube, while in a proportional box-contact representation each vertex is an axis-aligned box with a prespecified volume. We present algorithms for constructing cube-contact representation and proportional box-contact representation for several classes of planar graphs.

at October 12, 2015 12:44 AM UTC

The 2-D Tucker search problem is shown to be PPA-hard under many-one reductions; therefore it is complete for PPA. The same holds for $k$-D Tucker for all $k\ge 2$. This corrects a claim in the literature that the Tucker search problem is in PPAD.

at October 11, 2015 05:35 AM UTC

Constant time is the absolute low end of time complexity. One may wonder: is there anything nontrivial that can be computed in constant time? If we stick to the Turing machine model, then not much can be done, since the answer can only depend on a constant length initial segment of the input, as farther parts of the input cannot even be reached in constant time.

On the other hand, if we adopt the somewhat more powerful (and more realistic) unit-cost RAM model, in which elementary operations on $O(\log n)$-bit numbers are counted as single steps, then we may be able to solve nontrivial tasks, even in constant time. Here is an example:

Instance: Integers $n, k, l, d$, each given in binary format by $O(\log n)$ bits.

Question: Does there exist an $n$-vertex graph, such that its vertex connectivity is $k$, its edge connectivity is $l$, and its minimum degree is $d$?

Note that from the definition it is not even obvious that the problem is in NP. The reason is that the natural witness (the graph) may need an $\Omega(n^2)$-bit description, while the input is given by only $O(\log n)$ bits. On the other hand, the following theorem (see Extremal Graph Theory by B. Bollobás) comes to the rescue.

Theorem: Let $n, k, l, d$ be integers. There exists an $n$-vertex graph with vertex connectivity $k$, edge connectivity $l$, and minimum degree $d$, if and only if one of the following conditions is satisfied:

  • $0\leq k\leq l \leq d <\lfloor n/2 \rfloor$,
  • $1\leq 2d+2-n\leq k\leq l = d< n-1$,
  • $k=l=d=n-1$.

Since these conditions can be checked in constant time (in the unit-cost RAM model), the Theorem leads to a constant time algorithm in this model.
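In code the whole algorithm is three chained comparisons (a sketch of ours; it assumes the inputs are nonnegative integers that fit in machine words, per the unit-cost RAM model):

```python
def realizable(n, k, l, d):
    """Constant-time test: does some n-vertex graph have vertex
    connectivity k, edge connectivity l, and minimum degree d?
    A direct transcription of the three conditions in the Theorem."""
    return (0 <= k <= l <= d < n // 2
            or 1 <= 2 * d + 2 - n <= k <= l == d < n - 1
            or k == l == d == n - 1)

print(realizable(4, 3, 3, 3))  # True  (the complete graph K_4)
print(realizable(5, 1, 1, 1))  # True
print(realizable(4, 3, 2, 1))  # False (every condition needs k <= l <= d)
```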

Question: What are some other nontrivial examples of constant time algorithms?

by Andras Farago at October 10, 2015 11:51 PM UTC

Our own Ken is featured in the Wall Street Journal

Christopher Chabris just wrote a wonderful piece on cheating titled “High-Tech Chess Cheaters Charge Ahead.” Chabris is a research psychologist who is well known for his book The Invisible Gorilla, written with Daniel Simons.

Today I want to point out that the piece is in this Saturday’s review section of the Wall Street Journal.

Often this section is reserved for commentary on politics or the economy or some other issue of the week. But this week our own Ken Regan is featured as the expert on chess cheating. Wonderful. Here is a short part of the article—see the article on-line for the rest.


Ken Regan, an international chess master and computer scientist, has developed a software tool that automates the process of comparing human and computer moves, and flags suspicious cases. The approach is sophisticated: It doesn’t suggest that, say, current world champion Magnus Carlsen is a cheater just because his moves often match those of a computer. That’s to be expected. Mr. Regan instead finds cases in which players matched computer moves much more often than expected, given their skill levels and the situations on the board.

Open Problems

Read not just the article but some of the comments. One offers a way to try and stop the problem:

I’m no electronics expert, but wouldn’t it be possible for a location holding a major tournament to install some sort of jamming device that would interfere with signals?

Jamming is usually illegal, and I do not think it would solve the problem anyway. But it does raise the question: is there some way to stop chess cheating? For now we have Ken to thank for at least a post-facto method of detecting cheaters.

Again congratulations to Ken.

by rjlipton at October 10, 2015 06:36 PM UTC

Penn Engineering is embarking on a period of substantial growth, and the computer science department plans to hire lots of people over the next several years. (You can see a draft of the Penn Engineering plan for growth here. "Data Science and Computation" and Security are both priority areas).

This year, we are planning to hire for multiple positions, both junior and senior, with a focus on (among other things), machine learning. 

So if you are on the market (or thinking about it), send us your application -- apply here: 

(Unless you want to apply for an endowed chair in computer graphics -- then apply here: )

Edit: I forgot to mention! Interest in machine learning is university wide -- both statistics and ESE also plan to hire in machine learning.
Here is the link for applying to the statistics position:
And here is the link for ESE:

by Aaron Roth ( at October 10, 2015 05:12 PM UTC

We show that the Graph Automorphism problem is ZPP-reducible to MKTP, the problem of minimizing time-bounded Kolmogorov complexity. MKTP has previously been studied in connection with the Minimum Circuit Size Problem (MCSP) and is often viewed as essentially a different encoding of MCSP. All prior reductions to MCSP have applied equally well to MKTP, and vice-versa, and all such reductions have relied on the fact that functions computable in polynomial time can be inverted with high probability relative to MCSP and MKTP. Our reduction uses a different approach, and consequently yields the first example of a problem in ZPP$^{\rm MKTP}$ that is not known to lie in NP $\cap$ coNP. We also show that this approach can be used to provide a reduction of the Graph Isomorphism problem to MKTP.

at October 09, 2015 02:02 AM UTC

Authors: Therese Biedl, Jens M. Schmidt
Download: PDF
Abstract: It is well-known that every graph with maximum degree 4 has an orthogonal drawing with area at most $\frac{49}{64} n^2+O(n) \approx 0.76n^2$. In this paper, we show that if the graph is 3-connected, then the area can be reduced even further to $\frac{9}{16}n^2+O(n) \approx 0.56n^2$. The drawing uses the 3-canonical order for (not necessarily planar) 3-connected graphs, which is a special Mondshein sequence and can hence be computed in linear time. To our knowledge, this is the first application of a Mondshein sequence in graph drawing.

at October 09, 2015 12:41 AM UTC

Authors: Ethan R. Elenberg, Karthikeyan Shanmugam, Michael Borokhovich, Alexandros G. Dimakis
Download: PDF
Abstract: We present a novel distributed algorithm for counting all four-node induced subgraphs in a big graph. These counts, called the $4$-profile, describe a graph's connectivity properties and have found several uses ranging from bioinformatics to spam detection. We also study the more complicated problem of estimating the local $4$-profiles centered at each vertex of the graph. The local $4$-profile embeds every vertex in an $11$-dimensional space that characterizes the local geometry of its neighborhood: vertices that connect different clusters will have different local $4$-profiles compared to those that are only part of one dense cluster.

Our algorithm is a local, distributed message-passing scheme on the graph and computes all the local $4$-profiles in parallel. We rely on two novel theoretical contributions: we show that local $4$-profiles can be calculated using compressed two-hop information and also establish novel concentration results that show that graphs can be substantially sparsified and still retain good approximation quality for the global $4$-profile.

We empirically evaluate our algorithm using a distributed GraphLab implementation that we scaled up to $640$ cores. We show that our algorithm can compute global and local $4$-profiles of graphs with millions of edges in a few minutes, significantly improving upon the previous state of the art.

at October 09, 2015 12:40 AM UTC

Authors: Ante Ćustić, Abraham P. Punnen
Download: PDF
Abstract: We investigate special cases of the quadratic minimum spanning tree problem (QMSTP) on a graph $G=(V,E)$ that can be solved as a linear minimum spanning tree problem. Characterizations of such problems on graphs with special properties are given. These include complete graphs, complete bipartite graphs, and cactuses, among others. Our characterization can be verified in $O(|E|^2)$ time. In the case of complete graphs, and when the cost matrix is given in factored form, we show that our characterization can be verified in $O(|E|)$ time. Related open problems are also indicated.

at October 09, 2015 12:40 AM UTC

Authors: Xiaorui Sun, John Wilmes
Download: PDF
Abstract: Coherent configurations (CCs) are highly regular colorings of the set of ordered pairs of a "vertex set"; each color represents a "constituent digraph." A CC is primitive (PCC) if all its constituent digraphs are connected.

We address the problem of classifying PCCs with large automorphism groups. This project was started in Babai's 1981 paper in which he showed that only the trivial PCC admits more than $\exp(\tilde{O}(n^{1/2}))$ automorphisms. (Here, $n$ is the number of vertices and the $\tilde{O}$ hides polylogarithmic factors.)

In the present paper we classify all PCCs with more than $\exp(\tilde{O}(n^{1/3}))$ automorphisms, making the first progress on Babai's conjectured classification of all PCCs with more than $\exp(n^{\epsilon})$ automorphisms.

A corollary to Babai's 1981 result solved a then 100-year-old problem on primitive but not doubly transitive permutation groups, giving an $\exp(\tilde{O}(n^{1/2}))$ bound on their order. In a similar vein, our result implies an $\exp(\tilde{O}(n^{1/3}))$ upper bound on the order of such groups, with known exceptions. This improvement of Babai's result was previously known only through the Classification of Finite Simple Groups (Cameron, 1981), while our proof, like Babai's, is elementary and almost purely combinatorial.

Our result also has implications to the complexity of the graph isomorphism problem. PCCs arise naturally as obstacles to combinatorial partitioning approaches to the problem. Our results give an algorithm for deciding isomorphism of PCCs in time $\exp(\tilde{O}(n^{1/3}))$, the first improvement over Babai's $\exp(\tilde{O}(n^{1/2}))$ bound.

at October 09, 2015 12:40 AM UTC

Authors: Zhi-Hong Deng, Shulei Ma, He Liu
Download: PDF
Abstract: In this paper, we propose a novel data structure called PUN-list, which maintains both the utility information about an itemset and a utility upper bound, facilitating the mining of high utility itemsets. Based on PUN-lists, we present a method, called MIP (Mining high utility Itemset using PUN-Lists), for fast mining of high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, PUN-list, which avoids costly, repeated utility computation. Second, the utility of an itemset can be efficiently calculated by scanning its PUN-list, and the PUN-lists of long itemsets can be constructed quickly from the PUN-lists of short itemsets. Third, by employing the utility upper bound in the PUN-lists as the pruning strategy, MIP directly discovers high utility itemsets in the search space, called the set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that PUN-list is very effective, since MIP is on average at least an order of magnitude faster than recently reported algorithms.

at October 09, 2015 12:41 AM UTC

Our next talk will take place this coming Wednesday, October 14th at 1:00 PM Eastern Time (10:00 AM Pacific Time, 19:00 Central European Time, 17:00 UTC). Ryan O’Donnell from CMU will speak about his work on “How to refute a random CSP”, to appear in the next FOCS (abstract below).

Please make sure you reserve a spot for your group to join us live by signing up on the online form. As usual, for more information about the TCS+ online seminar series and the upcoming talks, please see the website.

Let P be a k-ary predicate and let H be a uniformly random constraint satisfaction problem with n variables and m = m(n) constraints, each of type P applied to k literals. For example, if P is the 3-ary OR predicate, then H is a classic random instance of 3SAT.

For each P there is a critical constant \alpha such that H will be satisfiable (with high probability) if m < \alpha n and will be unsatisfiable (whp) if m > \alpha n. In the former case, the natural algorithmic task is to find a satisfying assignment to the variables.

In the latter case, the natural algorithmic task is to find a refutation; i.e., a proof of unsatisfiability. This task becomes algorithmically easier the larger m is. As an example, in the case of 3SAT, it is known that efficient refutation algorithms exist provided m \gg n^{3/2}. We will discuss the refutation problem in general, focusing on how the predicate, P, affects the number of constraints, m, required for efficient refutation. We will also describe the applications to hardness of learning.

by plustcs at October 08, 2015 05:40 PM UTC

Authors: Nicolás Alvarez, Verónica Becher
Download: PDF
Abstract: Among the currently known constructions of absolutely normal numbers, the one given by Mordechay Levin in 1979 achieves the lowest discrepancy bound. In this work we analyze this construction in terms of computability and computational complexity. We show that, under basic assumptions, it yields a computable real number. The construction does not give the digits of the fractional expansion explicitly, but it gives a sequence of increasing approximations whose limit is the announced absolutely normal number. The $n$-th approximation has an error less than $2^{2^{-n}}$. To obtain the $n$-th approximation the construction requires, in the worst case, a number of mathematical operations that is double exponential in $n$. We consider variants on the construction that reduce the computational complexity at the expense of an increment in discrepancy.

at October 08, 2015 12:40 AM UTC

Authors: Adam Kurpisz, Samuli Leppänen, Monaldo Mastrolilli
Download: PDF
Abstract: The Lasserre/Sum-of-Squares (SoS) hierarchy is a systematic procedure for constructing a sequence of increasingly tight semidefinite relaxations. It is known that the hierarchy converges to the 0/1 polytope in n levels and captures the convex relaxations used in the best available approximation algorithms for a wide variety of optimization problems.

In this paper we characterize the set of 0/1 integer linear problems and unconstrained 0/1 polynomial optimization problems that can still have an integrality gap at level n-1. These problems are the hardest for the Lasserre hierarchy in this sense.

at October 08, 2015 12:00 AM UTC

Authors: Oswin Aichholzer, Nieves Atienza, Ruy Fabila-Monroy, Pablo Perez-Lantero, Jose M. Díaz-Báñez, David Flores-Peñaloza, Birgit Vogtenhuber, Jorge Urrutia
Download: PDF
Abstract: Let $P$ be a set of $n$ points in general position in the plane, $r$ of which are red and $b$ of which are blue. In this paper we prove that there exist: for every $\alpha \in \left [ 0,\frac{1}{2} \right ]$, a convex set containing exactly $\lceil \alpha r\rceil$ red points and exactly $\lceil \alpha b \rceil$ blue points of $P$; a convex set containing exactly $\left \lceil \frac{r+1}{2}\right \rceil$ red points and exactly $\left \lceil \frac{b+1}{2}\right \rceil$ blue points of $P$. Furthermore, we present polynomial time algorithms to find these convex sets. In the first case we provide an $O(n^4)$ time algorithm and an $O(n^2\log n)$ time algorithm in the second case. Finally, if $\lceil \alpha r\rceil+\lceil \alpha b\rceil$ is small, that is, not much larger than $\frac{1}{3}n$, we improve the running time to $O(n \log n)$.

at October 08, 2015 12:42 AM UTC

Authors: Kai Zhu, Lei Ying
Download: PDF
Abstract: Information diffusion in networks can be used to model many real-world phenomena, including rumor spreading on online social networks, epidemics in human beings, and malware on the Internet. Informally speaking, the source localization problem is to identify a node in the network that provides the best explanation of the observed diffusion. Despite significant efforts and successes over last few years, theoretical guarantees of source localization algorithms were established only for tree networks due to the complexity of the problem. This paper presents a new source localization algorithm, called the Short-Fat Tree (SFT) algorithm. Loosely speaking, the algorithm selects the node such that the breadth-first search (BFS) tree from the node has the minimum depth but the maximum number of leaf nodes. Performance guarantees of SFT under the independent cascade (IC) model are established for both tree networks and the Erdos-Renyi (ER) random graph. On tree networks, SFT is the maximum a posterior (MAP) estimator. On the ER random graph, the following fundamental limits have been obtained: $(i)$ when the infection duration $<\frac{2}{3}t_u,$ SFT identifies the source with probability one asymptotically, where $t_u=\left\lceil\frac{\log n}{\log \mu}\right\rceil+2$ and $\mu$ is the average node degree, $(ii)$ when the infection duration $>t_u,$ the probability of identifying the source approaches zero asymptotically under any algorithm; and $(iii)$ when infection duration $<t_u,$ the BFS tree starting from the source is a fat tree. Numerical experiments on tree networks, the ER random graphs and real world networks with different evaluation metrics show that the SFT algorithm outperforms existing algorithms.

at October 08, 2015 12:41 AM UTC

Authors: Arthur Flajolet, Patrick Jaillet
Download: PDF
Abstract: Achievable regret bounds for Multi-Armed Bandit problems are now well-documented. They can be classified into two categories based on the dependence on the time horizon $T$: (1) small, distribution-dependent, bounds of order of magnitude $\ln(T)$ and (2) robust, distribution-free, bounds of order of magnitude $\sqrt{T}$. The Bandits with Knapsacks theory, an extension of the framework that allows modeling resource consumption, lacks this duality. While several algorithms have been shown to yield asymptotically optimal distribution-free bounds on regret, there has been little progress toward the development of small distribution-dependent regret bounds. We partially bridge the gap by designing a general-purpose algorithm which we show enjoys asymptotically optimal regret bounds in several cases that encompass many practical applications, including dynamic pricing with limited supply and online bidding in ad auctions.

at October 08, 2015 12:41 AM UTC

Let n be the following number
135066410865995223349603216278805969938881475605667027524485143851526510604859533833940287150571909441798207282164471551373680419703964191743046496589274256239341020864383202110372958725762358509643110564073501508187510676594629205563685529475213500852879416377328533906109750544334999811150056977236890927563 (RSA 1024)

What is the probability that the fifth least significant bit of the smallest prime factor of n is a one? This bit is fully defined--the probability is either one or zero. But if you gave me better than 50/50 odds one way or the other, I would take the bet, unless you worked for RSA Labs.
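To make the thought experiment concrete, here is a small sketch (mine, not from the post) using a toy semiprime whose factorization we actually can compute; the number and the helper function are illustrative stand-ins for RSA-1024, which no one can factor this way.

```python
def smallest_prime_factor(n):
    """Trial division; fine for toy n, hopeless for a 1024-bit RSA modulus."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

n = 101 * 197          # a toy semiprime standing in for RSA-1024
p = smallest_prime_factor(n)
bit = (p >> 4) & 1     # fifth least significant bit of p
print(p, bit)          # 101 = 0b1100101, so bit 4 is 0
```

The point of the example: once n is fixed, the bit is a fixed 0 or 1; there is no residual randomness to have odds about.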

How does this jibe with a Frequentist or Bayesian view of probability? No matter how often you factor the number you will always get the same answer. No matter what randomized process was used to generate the original primes, conditioned on n being the number above, the bit is determined.

Whether we flip a coin, shuffle cards, choose lottery balls or use a PRG, we are not creating truly random bits, just a complex process whose unpredictability is, we hope, indistinguishable from true randomness. We know from Impagliazzo-Wigderson, and its many predecessors and successors, that any sufficiently complex process can be converted to a PRG indistinguishable from true randomness. Kolmogorov complexity tells us we can treat a single string with no short description as a "random string".

That's how we ground randomness in complexity: Randomness is not some distribution over something not determined, just something we cannot determine.

by Lance Fortnow at October 07, 2015 12:53 PM UTC

Our strongest interest is in candidates with research interests in all areas related to data science, but we also welcome applicants with research interests in design and analysis of algorithms. Position available starting in September 2016. Ph.D. or equivalent required by September 2016. Undergraduate teaching only.



by theorycsjobs at October 07, 2015 04:13 AM UTC

When and why do people stop thinking?


Tamal Biswas has been my graduate partner on my research model of decision-making at chess for over four years. He has, I believe, solved the puzzle of how best to incorporate depth into the model. This connects to ideas of the inherent difficulty of decisions, levels of thinking, and stopping rules by which to convert thought into action.

Today I describe his work on the model and the surprise of being able to distinguish skill solely on cases where people make mistakes. This is shown in two neat animations, one using data from the Stockfish 6 chess program, the other with the Komodo 9 program, whose elements are explained below.

Tamal presented this work on the Doctoral Consortium day of the 2015 Algorithmic Decision Theory conference in Lexington, Kentucky. The conference was organized and hosted by Judy Goldsmith, whom I have known since our undergraduate days at Princeton. Our full paper, “Measuring Level-K Reasoning, Satisficing, and Human Error in Game-Play Data,” will be presented at the 2015 IEEE Conference on Machine Learning and its Applications in Miami. Tamal presented our predecessor paper, “Quantifying Depth and Complexity of Thinking and Knowledge,” at the 2015 International Conference on Agents and AI in Lisbon last January.

Although we have blogged about the chess research several times before, I haven’t yet described details of my model here. After doing so we’ll see why depth has been tricky to incorporate and what our new discoveries mean.

The Data

The only connection to chess is that chess has alternating turns involving decisions whose options can be given numerical utility values that are objective and reliable but difficult for the players to discern. The numbers come from powerful chess programs (commonly called engines) whose rated playing skill long surpasses that of human players and is arguably measurably close to perfection. The Elo rating system is scaled so that a difference of 100 rating points derives from and predicts a 64%-to-36% points margin for the stronger player. The World Chess Federation lists over 20,000 human players above the 2200 threshold for “master” but only 45 players above 2700 and just four above 2800 including world champion Magnus Carlsen at 2850—while the best engines are rated above 3200 and my model currently suggests a ceiling below 3500.
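The 64%-to-36% figure for a 100-point gap follows from the standard Elo expected-score curve. A minimal sketch; the logistic form with the 400-point scale is the usual Elo convention, which the post states only by its consequence:

```python
# Standard Elo expected-score formula (an assumption here -- the post only
# states the 64%/36% consequence of a 100-point rating gap).
def elo_expected(diff):
    """Expected score for the stronger player at rating difference `diff`."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

print(round(elo_expected(100), 2))  # ~0.64, matching the 64%-to-36% margin
print(round(elo_expected(0), 2))    # 0.5 for equally rated players
```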

The move values are updated by the program in rounds of increasing search depth {d = 1,2,3,\dots} often reaching {20} and beyond, by which point they have most often stabilized. The highest-valued option—or the first listed move in case of tied values—is the engine’s best move in the search, and its final value is the overall position value.

The numbers come from chess factors—beginning with basic values for the pieces such as pawn = 1, knight = 3, and so on—but they are governed by a powerful regularity. Position values are centered on 0.00 meaning “even chances,” positive for advantage to the side to move, and negative for disadvantage. The regularity is that when the average score (counting 1 per win and 0.5 per draw) achieved by players from positions of value {v} is plotted against {v}, the result is almost perfectly a logistic curve

\displaystyle  L(av) = \frac{1}{1 + \exp(-av)}.

The factor {a} offsets the scale of the engine’s evaluation function—for a simple instance, whether it counts a queen as 9 or 10. Curiously the curve flattens only a little for two players of lower Elo rating. When there is a substantial difference in rating between two players, however, it undergoes a strong horizontal shift:


A 2012 paper by Amir Ban not only observes this relationship (also separating out wins and draws) but argues that generating evaluations within the search that follow it optimizes the strength of the engine. We henceforth argue that the utility values have organic import beyond chess, and that the problem addressed by our model is a general one of “converting utilities into probabilities.”
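As a quick numerical sketch of the logistic relationship above: the scale factor a is engine-specific, so the value a = 1 per pawn-unit below is an illustrative assumption, not a fitted constant.

```python
import math

# L(av) = 1 / (1 + exp(-a*v)): expected points from a position of value v.
def expected_score(v, a=1.0):
    """v in pawn units, positive favoring the side to move; a is an
    assumed engine-specific scale factor."""
    return 1.0 / (1.0 + math.exp(-a * v))

print(round(expected_score(0.0), 2))   # 0.5 -- "even chances" at value 0.00
print(round(expected_score(1.0), 2))   # ~0.73 -- roughly a pawn ahead
```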

By the way: Ban is famous for being a co-creator of both the Deep Junior chess program and the USB flash drive. That is right: the flash drive that we all use, every day, to store and transfer information—probably a bit more impactful than anything to do with chess, but amazing that he did both.

The Model and Depth

The model uses the move values together with parameters denoting skills of the player to generate probabilities {p_i} for each legal move {m_i}. Up to now I’ve used two parameters called {s} for sensitivity and {c} for consistency. They govern a term

\displaystyle  u_i = \exp(-(\frac{\delta(v_1,v_i)}{s})^c),

where {\delta(v_1,v_i)} is a scaling adjustment on the raw difference {v_1 - v_i}. Lower {s} and higher {c} both reduce {u_i} and ultimately the probability {p_i} of an inferior move.

Chess has a broad divide between “positional” and “tactical” skill. I’ve regarded {s} as measuring positional skill via the ability to discern small differences in value between moves, and {c} as avoidance of large errors which crop up most often in tactical games. Neither uses any chess-specific features. Without introducing any dependence on chess, I’ve desired to enrich the model with other player parameters. Chief among them has been modeling depth of thinking but it is tricky.
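The two-parameter model can be sketched as follows. This is a simplification: the scaling adjustment δ(v1, vi) is reduced to the raw difference v1 − vi, and the normalization of the u_i into probabilities is my assumption about how the model closes the loop.

```python
import math

def move_probs(values, s=0.1, c=2.0):
    """values[0] is the best move's value (pawn units); s = sensitivity,
    c = consistency.  Returns one probability per legal move."""
    v1 = values[0]
    # u_i = exp(-((v1 - v_i)/s)^c); deltas are nonnegative since v1 is best
    u = [math.exp(-(((v1 - vi) / s) ** c)) for vi in values]
    total = sum(u)
    return [ui / total for ui in u]

# Illustrative values: best move 0.50, a near-equal alternative, a blunder.
probs = move_probs([0.50, 0.45, 0.20], s=0.1, c=2.0)
print([round(p, 3) for p in probs])  # near-tied moves split the mass;
                                     # the large error gets almost none
```

Lowering s or raising c shrinks the u_i of inferior moves, which is exactly the sense in which the post says both reduce the probability of an error.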

My first idea was to model depth as a random variable with a player-specific distribution. As a player I’ve had the experience of sometimes not seeing two moves ahead and other times seeing “everything.” A discrete distribution {W = (w_1,\dots,w_D)} specified by a mean {d} for a player’s “habitual depth” and a shape parameter {v} seemed to fit the bill. The basic idea was to use the engine’s values {v_{i,j}} for {m_i} at all depths {j} up to the limit depth {D} of the search in an expression of the form

\displaystyle  u_i = \sum_{j=1}^D w_j \exp(-(\frac{\delta(v_{1,j},v_{i,j})}{s})^c).

A related idea was to compute probabilities {p_{i,j}} from the values at each depth {j} and finally set {p_i = \sum_j w_j p_{i,j}}.
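That second variant can be sketched directly. The per-depth probabilities and the depth weights below are made-up illustrative numbers, not fitted ones; the point is only the blending step p_i = Σ_j w_j p_{i,j}.

```python
def mix_over_depths(per_depth_probs, weights):
    """per_depth_probs[j][i]: probability of move i computed at depth j;
    weights[j]: player-specific weight on depth j.  Returns mixed probs."""
    n_moves = len(per_depth_probs[0])
    return [sum(w * pj[i] for w, pj in zip(weights, per_depth_probs))
            for i in range(n_moves)]

# Two moves, two depths: the deeper search reverses the shallow preference.
p = mix_over_depths([[0.4, 0.6], [0.7, 0.3]], weights=[0.25, 0.75])
print(p)  # ~[0.625, 0.375]: a depth-weighted player mostly trusts depth 2
```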

Doing so changed the regression from one over {s,c} to one over {s,c,d} and possibly {v}. Whereas the former almost always has one clear minimum, the latter proved to be pockmarked with local minima having no global drift we could discern. Indeed, for many initial points, {d} would drift toward {0} with wild {s} and {c} values regardless of the skill of the games being analyzed. Perhaps this was caused by noisy values at the very lowest depths giving a good random match. I stayed with the two-parameter model in my official use and followed other priorities.

The Depth is in the Data

Looking back, I think my idea was wrong because depth is not primarily a human property. We don’t first (randomly) choose each time whether to be deep or shallow—to be a Fischer or a “fish” for a move. Instead, aspects of the position influence how deep we go before being satisfied enough to stop. One aspect which I had conceived within my {d,v} scheme involves moves whose values swing as the depth increases.

A move involving a sacrifice might look poor until quite far into the search, and then “swing up” to become the best move when its latent value emerges. A “trap” set by the opponent might look tempting enough to be valued best at first even by the engines, but they will eventually “swing down” the value as the point of the trap is plumbed. The sublime trap that did much to win the 2008 world championship match for Viswanathan Anand over Vladimir Kramnik shows both phenomena:


Left: Position before Kramnik’s 29.Nxd4. Right: Anand’s winning 34…Nxe3! riposte. Below: Stockfish 6 values x100 at depths 1–19.


Kramnik’s 29.Nxd4 capture might be dismissed by a beginner since Black’s queen can just take the knight, but a seasoned player will see the followup 30.Rd1 skewering Black’s queen and knight on d7. Evidently this looked good to Kramnik through 30…Nf6 31.Rxd4 Nxg4 32.Rd7+ Kf6 33.Rxb7 Rc1+ 34.Bf1 coming out a pawn ahead, and Stockfish turns from negative to positive throughout depths 9–14. But then it sees the shocking 34…Ne3!!, when after 35.fxe3 fxe3 Black’s passed pawn is unstoppable. Kramnik never saw these moves coming and resigned then and there.

In my old view, Kramnik pulled a number between 9 and 14 out of his thinking cap and paid the piper to Anand who had rolled 16 or higher at his previous turn. Based on the good fits to my two-parameter model, the reality that most positions are clear-cut enough even for beginners to find the best move over 40% of the time, and reasoning about randomness, I pegged the phenomenon as secondary and likely offset by cases where the played move “swung up.”

In Tamal’s view the whole plateau of 9–14 put friction on Kramnik’s search. Once Kramnik was satisfied the advantage was holding, he stopped and played the fateful capture. Of various ways to quantify “swing,” Tamal chose one that emphasizes plateaus. Then he showed larger effects than I ever expected. Between the 0.0–1.0 and 4.0–5.0 ranges of “swing up” for the best move in his measure, the frequency of 2700-level players finding it plummets from 70% to 30%. This cannot be ignored.

Depth of Satisficing

This result presaged the effectiveness of several other measures formulated in our ICAART 2015 paper to represent the complexity and difficulty of a decision. We introduced some ideas relating to test-taking in an earlier post but now we’ve settled on metrics. The most distinctive of Tamal’s ideas blends our investigation of depth with Herbert Simon’s notion of satisficing going back to the years after World War II.

Simon coined satisficing from “satisfy” and “suffice” and contrasted it to optimizing during a search. It is often treated as a decision policy of meeting needs and constraints, but Simon also had in mind the kind of limitation from bounded rationality (another term he coined) that arises in searches when information is hard to find and values are hard to discern. In a zero-sum setting like chess, except for cases where one is so far ahead that safety trumps maximizing, any return short of optimum indicates an objective mistake. We address why players stop thinking when their move is (or is not) a mistake. We use the search depth {d} to indicate both time and effectiveness, gauging the latter by comparing the values given by the engine at depth {d} and at the supreme depth {D}.

Not all inferior moves {m_i} have negative swing—their inferiority {\delta_{i,d}} might lessen with {d}—and not all moves of negative swing were valued best at some lower depths. For those that were valued best we can identify the depth {d} at which the move is exposed as an error by the engine: {d = 15} in our example above. We judge that the player failed to look deeper than {d-1} and call {d-1} the depth of satisficing for that move.

We can extend the notion to include the many negative-swing cases where the played move {m_i} does not actually cross with the ultimate best move {m_1}. For each {d}, plot the average over selected positions of {\delta_{i,d}} and {\delta_{1,d}} (note {\delta_{1,D} = 0} by definition, but {\delta_{1,d} > 0} if some other move is better at depth {d}). Over all positions the curves tend to be relatively close at low depths, but over positions with played move {m_{i}} of positive swing the curve for {m_{i}} stays significantly apart from that for {m_1}, roughly paralleling it. This explains our persistent observation at all skill levels that over the negative-swing moves the curves do cross at some depth {d}. Indeed, figures in our paper show that {m_1} (green line below) is often significantly inferior to {m_i} (red) at low depths. The crossover depth of the averages becomes the depth of satisficing for the aggregate.

Using UB’s Center for Computational Research (CCR) to run the engines Stockfish 6 and Komodo 9, we have analyzed every published game played in 2010–2014 in which both players were within 10 Elo points of a century or half-century mark (widened to 15 or 20 for Elo 1300 to 1650, 2750, and 2800 to keep up the sample size). Combining Elo 1300 and 1350, 1400 and 1450, etc., shows that the crossover depth grows steadily with rating in both the GIF for Stockfish (left) and the GIF for Komodo (right).


There we have it: a strong correlate of skill derived only from moves where players erred. The inference is that better players make mistakes whose flaws require more time and depth to expose, as measured objectively by the engines. A further surprise is that the satisficing depths go clear down near zero for novice players—that we got anything regular from values at depths below 4 (which many engines omit) is noteworthy. That the best players’ figures barely poke above depth 10—which computers reach in milliseconds—is also sobering.

Depth and Time

Chess is played with a time budget, much as one has when taking a standardized test. The first budget is almost always for turns 1–40, and although exhausting it before completing move 40 loses the game immediately, players often use the bulk of it by move 30 or even 25. The following moves are then played under “time pressure.” Segregating turns 9–25 from 26–40 (the opening turns 1–8 are skipped in the analysis) shows a clear effect of free time enhancing the depth and time pressure lowering it:


All of these games—this data set has over 1.08 million analyzed moves from over 17,500 games—come from recent real competitions, no simulations or subject waivers needed, in which skill levels are reliably known via the Elo system. Data this large, in a model that promotes theoretical formulations for practical phenomena, should provide a good standard of comparison for other applications. Tamal and I have many further ideas to pursue.

Open Problems

What parallels do you see between the thought processes revealed in chess and decision behaviors in other walks of life?

by KWRegan at October 07, 2015 12:44 AM UTC

Authors: Alexandros Avdis, Christian T. Jacobs, Jon Hill, Matthew D. Piggott, Gerard J. Gorman
Download: PDF
Abstract: Due to the fractal nature of the domain geometry in geophysical flow simulations, a completely accurate description of the domain in terms of a computational mesh is frequently deemed infeasible. Shoreline and bathymetry simplification methods are used to remove small scale details in the geometry, particularly in areas away from the region of interest. To that end, a novel method for shoreline and bathymetry simplification is presented. Existing shoreline simplification methods typically remove points if the resultant geometry satisfies particular geometric criteria. Bathymetry is usually simplified using traditional filtering techniques, that remove unwanted Fourier modes. Principal Component Analysis (PCA) has been used in other fields to isolate small-scale structures from larger scale coherent features in a robust way, underpinned by a rigorous but simple mathematical framework. Here we present a method based on principal component analysis aimed towards simplification of shorelines and bathymetry. We present the algorithm in detail and show simplified shorelines and bathymetry in the wider region around the North Sea. Finally, the methods are used in the context of unstructured mesh generation aimed at tidal resource assessment simulations in the coastal regions around the UK.

at October 07, 2015 12:43 AM UTC

Authors: Akanksha Agrawal, Daniel Lokshtanov, Amer E. Mouawad, Saket Saurabh
Download: PDF
Abstract: Given a family of graphs $\mathcal{F}$, a graph $G$, and a positive integer $k$, the $\mathcal{F}$-Deletion problem asks whether we can delete at most $k$ vertices from $G$ to obtain a graph in $\mathcal{F}$. $\mathcal{F}$-Deletion generalizes many classical graph problems such as Vertex Cover, Feedback Vertex Set, and Odd Cycle Transversal. A graph $G = (V, \cup_{i=1}^{\alpha} E_{i})$, where the edge set of $G$ is partitioned into $\alpha$ color classes, is called an $\alpha$-edge-colored graph. A natural extension of the $\mathcal{F}$-Deletion problem to edge-colored graphs is the $\alpha$-Simultaneous $\mathcal{F}$-Deletion problem. In the latter problem, we are given an $\alpha$-edge-colored graph $G$ and the goal is to find a set $S$ of at most $k$ vertices such that each graph $G_i \setminus S$, where $G_i = (V, E_i)$ and $1 \leq i \leq \alpha$, is in $\mathcal{F}$. In this work, we study $\alpha$-Simultaneous $\mathcal{F}$-Deletion for $\mathcal{F}$ being the family of forests. In other words, we focus on the $\alpha$-Simultaneous Feedback Vertex Set ($\alpha$-SimFVS) problem. Algorithmically, we show that, like its classical counterpart, $\alpha$-SimFVS parameterized by $k$ is fixed-parameter tractable (FPT) and admits a polynomial kernel, for any fixed constant $\alpha$. In particular, we give an algorithm running in $2^{O(\alpha k)}n^{O(1)}$ time and a kernel with $O(\alpha k^{3(\alpha + 1)})$ vertices. The running time of our algorithm implies that $\alpha$-SimFVS is FPT even when $\alpha \in o(\log n)$. We complement this positive result by showing that for $\alpha \in O(\log n)$, where $n$ is the number of vertices in the input graph, $\alpha$-SimFVS becomes W[1]-hard. Our positive results answer one of the open problems posed by Cai and Ye (MFCS 2014).

at October 07, 2015 12:40 AM UTC

Authors: Amir Ali Ahmadi, Georgina Hall
Download: PDF
Abstract: We consider the problem of decomposing a multivariate polynomial as the difference of two convex polynomials. We introduce algebraic techniques which reduce this task to linear, second order cone, and semidefinite programming. This allows us to optimize over subsets of valid difference of convex decompositions (dcds) and find ones that speed up the convex-concave procedure (CCP). We prove, however, that optimizing over the entire set of dcds is NP-hard.

at October 07, 2015 12:42 AM UTC

Authors: Anirban Dasgupta, Kevin Lang, Lee Rhodes, Justin Thaler
Download: PDF
Abstract: Given $m$ distributed data streams $A_1, \dots, A_m$, we consider the problem of estimating the number of unique identifiers in streams defined by set expressions over $A_1, \dots, A_m$. We identify a broad class of algorithms for solving this problem, and show that the estimators output by any algorithm in this class are perfectly unbiased and satisfy strong variance bounds. Our analysis unifies and generalizes a variety of earlier results in the literature. To demonstrate its generality, we describe several novel sampling algorithms in our class, and show that they achieve a novel tradeoff between accuracy, space usage, update speed, and applicability.

at October 07, 2015 12:42 AM UTC

UIUC CS is hiring in theory this year. Apply now while supply lasts – we have at least 5 gazillion slots this year in all aspects of computer science, including computational mythology…

CS at Aarhus is also hiring – they are looking for a machine learning person.

by Sariel at October 06, 2015 10:42 PM UTC

Authors: Damianos Gavalas, Charalampos Konstantopoulos, Grammati Pantziou
Download: PDF
Abstract: Vehicle (bike or car) sharing represents an emerging transportation scheme which may comprise an important link in the green mobility chain of smart city environments. This chapter offers a comprehensive review of algorithmic approaches for the design and management of vehicle sharing systems. Our focus is on one-way vehicle sharing systems (wherein customers are allowed to pick up a vehicle at any location and return it to any other station), which best suit typical urban journey requirements. Along this line, we present methods dealing with the so-called asymmetric demand-offer problem (i.e. the unbalanced offer and demand of vehicles) typically experienced in one-way sharing systems, which severely affects their economic viability, as it implies that considerable human (and financial) resources should be engaged in relocating vehicles to satisfy customer demand. The chapter covers all planning aspects that affect the effectiveness and viability of vehicle sharing systems: the actual system design (e.g. number and location of vehicle station facilities, vehicle fleet size, vehicle distribution among stations); customer incentivisation schemes to motivate customer-based distribution of bicycles/cars (such schemes offer meaningful incentives to users to leave their vehicle at a station different from that originally intended and satisfy future user demand); and cost-effective solutions to schedule operator-based repositioning of bicycles/cars (by employees explicitly enrolled in vehicle relocation) based on current and future (predicted) demand patterns (operator-based and customer-based relocation may be thought of as complementary methods to achieve the intended distribution of vehicles among stations).

at October 06, 2015 12:42 AM UTC

Authors: Hokky Situngkir
Download: PDF
Abstract: The carved and painted decorations in traditional Batak houses and buildings, gorga, are the source of their exoticism. There are no identical patterns of the ornaments within Batak houses, and the drawings are closely related to the way the ancient Batak captured the dynamics of the growing 'tree of life', one of the central things within their cosmology and mythology. A survey of ornaments of Batak houses and buildings in Northern Sumatera, Indonesia, has made it possible for us to observe the complex patterns. The fractal dimensions of the geometrical shapes in gorga are calculated and conjectured to lie between 1.5 and 1.6, between the dimension of a line and that of a plane. The way gorga is drawn is captured by using some modifications to the turtle geometry of the L-System model, a popular model of the dynamics of growing plants. The result is a proposal to see Bataknese gorga as a piece of traditional heritage that may enrich studies of generative art.

at October 06, 2015 12:43 AM UTC

Authors: Haim Kaplan, Wolfgang Mulzer, Liam Roditty, Paul Seiferth
Download: PDF
Abstract: Let $S$ be a set of $n$ sites in the plane. The unit disk graph UD$(S)$ on $S$ has vertex set $S$ and an edge between two distinct sites $s,t \in S$ if and only if $s$ and $t$ have Euclidean distance $|st| \leq 1$.

A routing scheme $R$ for UD$(S)$ assigns to each site $s \in S$ a label $\ell(s)$ and a routing table $\rho(s)$. For any two sites $s, t \in S$, the scheme $R$ must be able to route a packet from $s$ to $t$ in the following way: given a current site $r$ (initially, $r = s$), a header $h$ (initially empty), and the target label $\ell(t)$, the scheme $R$ may consult the current routing table $\rho(r)$ to compute a new site $r'$ and a new header $h'$, where $r'$ is a neighbor of $r$. The packet is then routed to $r'$, and the process is repeated until the packet reaches $t$. The resulting sequence of sites is called the routing path. The stretch of $R$ is the maximum ratio of the (Euclidean) length of the routing path produced by $R$ and the shortest path in UD$(S)$, over all pairs of distinct sites in $S$.

For any given $\varepsilon > 0$, we show how to construct a routing scheme for UD$(S)$ with stretch $1+\varepsilon$ using labels of $O(\log n)$ bits and routing tables of $O(\varepsilon^{-5}\log^2 n \log^2 D)$ bits, where $D$ is the (Euclidean) diameter of UD$(S)$. The header size is $O(\log n \log D)$ bits.

at October 06, 2015 12:00 AM UTC

Authors: Akshar Varma
Download: PDF
Abstract: The rooted tree is an important data structure, and the height, depth and subtree size are naturally defined attributes of every node. We consider the problem of the realization of a tree given a sequence of one of these attributes. We also consider the problem of the existence of a tree when multiple sequences of attributes are given or when a sequence of tuples of attributes is given. Our most significant results are the NP-Completeness of problems related to subtree size sequences, which we prove by non-trivial reductions from Numerical Matching with Target Sums. By contrast we give polynomial time algorithms for depth and height sequences. Even for the subtree size problem, we show that the realization of some classes of trees can be solved in O(log(n)) time.

at October 06, 2015 12:41 AM UTC

Authors: Yonghui Guan
Download: PDF
Abstract: I find new equations for Chow varieties, their secant varieties, and an additional variety that arises in the study of depth 5 circuits by flattenings and Koszul Young flattenings. This enables a new lower bound for symmetric border rank of $x_1x_2\cdots x_d$ when $d$ is odd, and a lower bound on the size of depth 5 circuits that compute the permanent.

at October 06, 2015 12:40 AM UTC