Raymond Smullyan is probably the world’s greatest expert on the logic of lying and the logic of chess. He is still writing books well into his 10th decade. Last year he published a new textbook, *A Beginner’s Guide to Mathematical Logic*, and in 2013 a new puzzle book named for Kurt Gödel. His 1979 book *The Chess Mysteries of Sherlock Holmes* introduced *retrograde analysis*—taking back moves from positions to tell how they could possibly have arisen—to a wide public.

Today Ken and I wish to talk about whether we can ever play perfect chess—or at least better chess than any one chess program—by combining output from multiple programs that sometimes might “lie.”

We will start with Smullyan-style puzzles today, but they are prompted by an amazing and serious fact. Even though human players have been outclassed by computers for over a decade, humans judging between multiple programs have been able to beat those programs playing alone. This happens even when the human player is far from being a master—someone who would get crushed in minutes by programs available on smartphones. We want to know, how can this be?

By coincidence, yesterday’s New York Times Magazine has a feature on Terry Tao that likens discovering and proving theorems to “playing chess with the devil”—quoting Charles Fefferman:

The devil is vastly superior at chess, but […] you may take back as many moves as you like, and the devil may not. … If you are sufficiently wily, you will eventually discover a move that forces the devil to shift strategy; you still lose, but—aha!—you have your first clue.

On this blog we have previously likened perfect programs to “playing chess against God”—this was quoting Ken Thompson about endgames where perfect tables have been computed. Since the programs we consider here occasionally err—one can say lie—we will reserve the “devil” term in yesterday’s Times for them.

## Freestyle Chess

We, that is I and my Tech colleague Dr. Kathryn Farley, just visited Ken at his home base in Buffalo and had a great time. Of course the weather up there is near perfect this time of year and his family was wonderful to us. Plus we got to visit Wegmans—our pick for the greatest supermarket chain in the world.

One afternoon I was honored to sit in on a video conference in which Ken presented some of his research on using computer programs to evaluate the play of humans. Joining him via the Internet were two experts in so-called *freestyle chess* where humans are allowed access to multiple chess programs during a game. One-on-one the programs totally dominate the humans—even on laptops programs such as Stockfish and Komodo have Elo ratings well above 3100 whereas the best humans struggle to reach even 2900—but the human+computer “Centaurs” had better results than the computers alone. In the audience were representatives of defense and industrial systems that involve humans and computers.

Ken got into freestyle chess not as a player but because of his work on chess *cheating*—see this for example. Freestyle chess says “go ahead and cheat, and let’s see what happens…” The audience was not interested in cheating but rather in how combining humans and computers changes the game. While chess programs are extremely strong players, they may have weaknesses that humans can help avoid. Thus, the whole point of freestyle chess is:

Are humans + computers > computers alone?

That is the central question. Taken out of the chess context it becomes a vital question as computers move more and more into our jobs and our control systems. The chess context attracts interest because it involves extreme performance that can can be precisely quantified, and at least until recently, the answer has been a clear “Yes.”

At the beginning of the video conference Ken spoke about the history of computer chess, giving his usual clear and precise presentation, and then reviewed his joint paper showing the good results for human-computer teams were no fluke—they really made better moves. Ken used some of the slides from the end of his TEDxBuffalo talk. Then two Freestyle experts spoke, including a winner of three tournaments who doesn’t even have a human chess rating, and an interesting discussion followed on how they actually *used* the computer chess programs.

I must admit at first I was a skeptic: how could weak players, humans, help strong players, computers? Part of it was that when two or more programs disagreed on the best move the human could make the choice. This kind-of means saying the programs whose move you don’t choose are *wrong*.

As Ken and I mulled over the idea of freestyle chess we realized it raises some interesting puzzles. I wrote a first draft, then Ken took over adding more detail and tricks to what follows. Let’s take a look at the puzzles now.

## Puzzle I

Suppose Alice has one program that is perfect except that there is *one* position in which makes a error. To simplify, let’s suppose the only outcomes are win () or loss (). An **error** means that is a winning position for the player to move—it could be Bob not Alice—but chooses a move leading to a position that is losing for that player.

Let Alice start in a position . She wants to play perfect chess using as guide. Can she do so? The bad position might be reached in a game played from , indeed might be itself.

Alice of course cannot tell by herself whether a given position has value or , unless the position is right near the end of a game. But she has one advantage that lacks. She can play against itself from any position . If is beyond —at least if is not reachable in a correctly-played sequence of moves from —then Alice will get the correct value of .

This is like the power of a human centaur to try a program deeper on the moves it is suggesting. In symbols, Alice executes , which generates a game sequence

of positions where is checkmate for one side. The cost of this is . You might think we could let do the same thing, but is not like “Satan” in Smullyan’s story, “Satan, Cantor, and Infinity.” is not trying to out-psych Alice or correct itself; is just given as-is and by-hook-or-by-crook makes that error in some position .

So Puzzle I is:

Can the “centaur” Alice + *X* play perfectly, even though neither Alice nor *X* plays perfectly alone? And at what cost compared to *X*?

## Answer to Puzzle I

The answer is, yes she can. Let the legal moves in be going to positions with the move recommended by . Her algorithm exemplifies a key idea that bridges to the more interesting puzzles.

We claim this algorithm, whose running time we count as linear in , plays perfectly in any . If all values are but is wrong then some other is really a winning position. But then is wrong both at and at some position in the game from which is a contradiction. If but is wrong anyway then is wrong both at and somewhere at or beyond.

Finally if the “switch” is wrong then errs somewhere beyond **and** either errs beyond or erred by choosing in after all. (If is wrong and all other are losing then and so wasn’t a mistake by Alice since it didn’t matter.)

There is one loophole that needs attention. and could both be wrong because their games go to a common position in which makes an error. However, that lone error cannot simultaneously flip a true to in and a true to in , because and have the same player (Bob) to move. There is also the possibility that play from could go through (perhaps via or through some other , which we leave you to analyze. We intend to rule out the latter, and we *could* also rule out the former by insisting that the “game tree” from positions really be a tree not a DAG.

Boiled down, the idea is that where goes to is an “error signature” that Alice can recognize. If errs at then the signature definitely happens because is perfect at each and one of the must be winning. If the signature happens yet did not err at then must be a losing position. Hence once the signature happens then Alice can trust completely. The only way the error could possibly lie in her future is if she was losing at but lucks into a winning position—but then the error happened on Bob’s move not hers.

We claim also that all this logic is unaffected if “draw” is a possible outcome. Indeed, we could play with the old Arabian rules that giving stalemate is a win—counting it 0.8 points say—and that occupying the center with one’s king after leaving the opponent with a bare king is a win—worth say 0.7 points. It all works with “” being the true value of position (aside from a total loss) and “” being any inferior value.

## Puzzle II

Now suppose is allowed to make errors in *two* positions. Can Alice modify her algorithm to still play perfectly?

First suppose the errors are related. Call two positions **related** if can occur in a game played from where all intervening moves are not errors. Per above we will always suppose that two positions reached in one move from are unrelated (else we would say the two options are not fully distinct). Related errors are ones that occur in related positions.

If Alice knows this about , then we claim she can solve Puzzle II. She plays the game through as before, but now she looks for the error signature at **all** nodes in the game. If she never finds it then she plays . If she does, then she lets be the last position at which it occurs. Then she knows that either errs at or errs somewhere beyond . Either way, she can use this knowledge to whittle down the possibilities at . Or at least we think she can.

Notice, however, what has happened to Alice’s complexity. She is now running at every node in a length- game path. Her time is now quadratic in . This is still not terrible, not an exponential blowup of backtracking. But in honor of what Alberto Apostolico cared about, we should care about it here. So there is really a second puzzle: can Alice play perfectly in time ?

If the errors are unrelated then we would like Alice to carry out the same algorithm as for Puzzle I. The logic is not airtight, however, because of the case where there were unrelated errors at and . Worse, what if Alice doesn’t know whether the errors are related?

## Puzzle III: Two Programs

Here comes the “Freestyle” idea of using multiple programs. Let us have two programs, and . Suppose one of them can make up to errors but the other is perfect. Alice does not know which is which. Now can she play perfectly—in time?

If the errors are related then she can localize to and have the same linear time complexity as before. For simplicity let’s suppose there are just legal moves, i.e., . Here is her algorithm:

The remaining case is that one pair is and the other is . This *cannot happen*, because it means that one of the programs is making two *unrelated* errors.

This is the idea Dick originally had after the videoconference on Freestyle chess. It shows the advantage of using multiple programs to check on each other like the centaur players do. But what happens if the errors are unrelated? Call that Puzzle III.

## Puzzles IV and V

Now let’s allow both programs and to make up to errors. Can Alice still play perfectly? We venture yes, but we hesitate to make this a formal claim because Puzzles II and III are already proving harder than expected.

How much does having a third program that is perfect help?—of course not knowing which of the three programs is perfect. If instead too can make up to errors, how much worse is that? Even if Alice can still play perfectly in polynomial time, we wonder if the exponent of will depend on . Call all of this Puzzle IV.

We can add a further wrinkle that matters even for : we can consider related errors to be just one error. This makes sense in chess terms because an error in a position that is reachable from a position can affect the search by a program in . Thus the error at knocks-on and makes the play at all nodes between and the root unreliable. Let be the set of all positions at which makes errors. Then we can define to be the minimum such that there are positions such that is contained in the union of the positions from which some is reachable. This is well-defined even when the positions form a DAG not a tree.

Thus programs could err in multiple positions but still count as having a single **branch error** if those positions are all on the same branch. Anything with branch errors counts as Puzzle V. This is where our error model is starting to get realistic, but as we often find in theory, there is a lot of already-challenging ground to cover before we get there. It is time to call it a day—or a post.

## Open Problems

Our puzzles have some of Smullyan’s flavor. In a typical logic puzzle of his, Alice would be confronted by and that have different behaviors in telling the truth to arbitrary questions. The solutions in his case rely on the ability to ask questions like:

If you are a person of type … , then what would you say to the question … ?

Our situations seem different, but perhaps there are further connections between our puzzles and Smullyan’s. What do you think?

Can you solve the puzzles of kinds II or III or higher? If you or we find a clear principle behind them then this will go into a followup post.

**Update (7/31/15)** The artist Vierrae, Katerina Suvorova of Russia, has graciously contributed two new portraits of Smullyan in oil. I have used her new version of Smullyan in a ‘Magus’ robe at the top. Here is her portrait of him in more formal wear, as if he were a dinner guest at an Oxford High Table.

The originals in higher resolution are viewable on her DeviantArt page . Our great thanks to her.