Virginia Vassilevska Williams stunned the world of computer science by presenting the first improvement to the matrix multiplication constant in twenty years (later it was found that a more modest progress had been obtained independently and slightly earlier by Andrew James Stothers). The current world record is held by François Le Gall, who showed that by perfecting the methods of Vassilevska Williams. For an exposition of this line of work, check out my recent lecture notes. For the full details, I recommend Le Gall’s paper and the journal version of Stothers’ thesis (with his advisor A. M. Davie). The recent progress heavily builds on classic work by Coppersmith and Winograd. Briefly, they came up with a basic identity known as the Coppersmith–Winograd identity. Applying Strassen’s laser method with their own ingenious construction (relying on Salem–Spencer sets), they obtained the bound . Applying their method to the tensor square of the basic identity, they obtained the improved bound . That’s where things had been standing since 1990, until Stothers managed to perform the computations for the fourth tensor power, obtaining the bound in 2010. A year later (but independently), Vassilevska Williams analyzed the eighth tensor power and obtained the bound . Further progress was obtained by Le Gall, who analyzed the sixteenth and thirty-second tensor powers, obtaining the current world record stated above. Although taking ever higher tensor powers improves the bound on , the improvements are diminishing, and it seems safe to conjecture that taking the th tensor power does not yield the conjectured as . However, how can we be certain? Perhaps the improvements slow down, but like a slowly divergent series, eventually go all the way down to ? In the rest of the blog post, we describe a recent result of Andris Ambainis and myself, which shows that the best bound that this method can produce is , for any value of . In fact, our result allows for a wider class of methods which utilize the Coppersmith–Winograd identity, and hint to a new technique which can potentially lead to better algorithms. Very recently, Le Gall was able to use our methods to show that the best bound that can be obtained by taking an th tensor power of the Coppersmith–Winograd identity is , and so the current analysis of the identity is quite tight. The rest of this post assumes that the reader is versed in the basics of the theory of matrix multiplication algorithms. Specifically, we expect you to know the statement of Schönhage’s asymptotic sum inequality, which is described in my earlier blog post. The laser method Schönhage’s asymptotic sum inequality allows us to obtain an upper bound on given an upper bound on the border rank of a disjoint sum of matrix multiplication tensors . Strassen’s laser method is a framework for achieving the same for non-disjoint sums of matrix multiplication tensors. It is best illustrated with the following example from the paper of Coppersmith and Winograd:
Here is some integer parameter. The superscripts on the variables partition them into groups. For example, the two groups of x-variables are and . We can write this identity succinctly as
The superscripts have the same meaning, for example the first constituent tensor involves the x-variables from group , the y-variables from group , and the z-variables from group . Each constituent tensor uses all variables from each group, and this convention gives a unique interpretation of the short form of the identity.
Denote the tensor whose border rank is estimated by . This tensor is a sum of three matrix multiplication tensors, but we cannot apply the asymptotic sum inequality since these constituent tensors are not disjoint: for example, the first two constituent tensors share the z-variables. Strassen’s idea was to take a high tensor power of the original identity, and zero variables so that the remaining constituent tensors are disjoint (Strassen actually considered a different operation, degeneration, instead of zeroing, but Coppersmith and Winograd preferred to use the simpler operation of zeroing). The th tensor power of has constituent tensors, and border rank at most . For example, when , we obtain
The partition of the x-variables of into two groups naturally corresponds to a partition of the x-variables of into four groups, indexed by vectors of length . These indices are used above to form the index triples appearing as superscripts. In fact, the dimensions of any of the constituent matrix multiplication tensors can be recovered from its index triple, so we can “summarize” by giving a list of all nine index triples:
When taking the th tensor power, there will be index triples, each consisting of three vectors of length . Our task is to zero out some groups of variables so that the surviving index triples correspond to disjoint matrix multiplication tensors, and as many of these as possible. Put differently, if we denote by the list of index triples and by the remaining variables, then we want to consist of index triples in which no x-index repeats, no y-index repeats, and no z-index repeats. If we manage to accomplish this, generating surviving index triples, then the asymptotic sum inequality gives the bound
(Since each of the constituent tensors in satisfies , each constituent tensor in satisfies .)
Let be the maximum of over all legal choices of . It turns out that , and so the asymptotic sum inequality shows that . When , this gives the bound . We call the limit the capacity.
Upper bounds on the capacity
Coppersmith and Winograd used an ingenious construction to show that , but they neglected to mention a matching upper bound showing that their construction is optimal. This upper bound, whose philosophy is essential to our approach, appears curiously enough in the paper by H. Cohn, R. Kleinberg, B. Szegedy and C. Umans on group-theoritic algorithms for matrix multiplication. Their basic idea is to use the fact that after zeroing out variables, no x-index repeats, and so is at most the number of x-indices. Since there are at most x-indices, this gives and so , a bound not as good as the one we had hoped for ().
To improve on this trivial bound, we divide the surviving index triples into types. An index triple has type if it results from copies of , copies of and copies of . Within each given type, the number of distinct x-indices depends on the type. For example, when , the number of distinct x-indices is . It is not hard to check that for each choice of , either the number of distinct x-indices is at most , or the same is true for either the number of distinct y-indices or the number of distinct z-indices. Since there are different types, we obtain the bound
Stirling’s approximation then shows that ; note that the number of types disappears when computing the limit since is polynomial rather than exponential in .
The Coppersmith–Winograd identity
The Coppersmith–Winograd identity improves on the previous identity by computing three more products at no cost in the border rank:
This identity is the fount and source of all improvements on since 1989. The two groups of x-variables appearing in the previous identity are joined by a new group consisting of a single variable. The curious reader can try to spell out the actual identity on her own, or she can look it up in my lecture notes. We denote the corresponding tensor by .
The laser method can be applied to this identity in much the same way as before. After taking the th tensor power, we are left with different constituent tensors , each represented by an index triple . If the index triple contains factors of one of the forms then . Our task is to zero out some variables so that the remaining constituent tensors are disjoint, and the quantity is maximized. Since we don’t know , we maximize instead for some parameter . If is the maximum of this quantity for given , then the asymptotic sum inequality shows that .
It turns out that for each , the limit exists, and the asymptotic sum inequality shows that . This reduces the analysis of to the determination of its associated capacity . Coppersmith and Winograd show that
where is the base 2 entropy function. Amazingly, the method of Cohn, Kleinberg, Szegedy and Umans shows that this lower bound is tight! When and , one checks that , and since and is increasing in , this shows that .
Powers of the Coppersmith–Winograd identity
Coppersmith and Winograd managed to get an even better bound on by considering the tensor square whose border rank is at most . If we retain the old partition into groups (so there are now nine groups of each of the x-variables, y-variables and z-variables), then a quick consideration of the laser method reveals that we should obtain the exact same bound on . Instead, Coppersmith and Winograd merge the original nine groups into five groups by putting the original group into the new group . This corresponds to the following partition of the original groups:
The original 36 constituent tensors are now grouped to 15 constituent tensors:
- 3 terms of the form .
- 6 terms of the form resulting from the merger of and into a single matrix multiplication tensor.
- 3 terms of the form resulting from the merger of three original constituent tensors: , and .
- 3 terms whose index triple is of the form that are not matrix multiplication tensors, one of which results from merging , , and .
The idea is now to apply the same method, but we encounter a problem: there are three tensors which are not matrix multiplication tensors. However, each of these tensors is itself a sum of matrix multiplication tensors, and so can be analyzed using the same method. In this recursive fashion, Coppersmith and Winograd were able to analyze this second power, and obtained the bound . What Stothers and Vassilevska Williams did was to codify this recursive application of the laser method, and using this codification they were able to handle larger powers, using a computer for the calculations. Their method has one shortcoming: when analyzing higher powers, the capacity problems encountered are too hard to solve exactly. Instead, they employ costly numerical methods that produce a good lower bound on the relevant capacities. Le Gall came up with a more efficient method which produces slightly inferior bounds but is applicable to higher tensor powers. A different viewpoint: the laser method with merging Our main innovation is a different way of looking at the analysis of powers of . The recipe of the laser method consists of taking a high tensor power, zeroing groups of variables so that the remaining constituent tensors are disjoint, and applying the asymptotic sum inequality. We add an additional step: after zeroing groups of variables, we allow merging sets of constituent tensors, as long as each merged set forms a matrix multiplication tensor, and the resulting merged constituent tensors are all disjoint (before the merging step, the surviving constituent tensors don’t need to be disjoint). After the merging step, we apply the asymptotic sum inequality as before. We call this scheme the laser method with merging. We already saw some examples of the merging step in the analysis of : for example, the two constituent tensors and are merged into the matrix multiplication tensor . Coppersmith and Winograd’s analysis of is an example of the laser method with merging applied to the original tensor . To see this, let us examine their analysis. They first merge some tensors in , and are left with a sum of matrix multiplication tensors and three “complicated” tensors, which we denote by , each of which is by itself a sum of matrix multiplication tensors. They proceed to take an th tensor power of , and zero some groups of variables (each group corresponds to several original groups) so that the resulting constituent tensors are disjoint. Each surviving constituent tensor contains a factor of the form (the powers of are always the same in their construction), and by zeroing out groups of variables internal to this factor, they reduce it to a disjoint sum of matrix multiplication tensors. After this second step of zeroing, we are left with a disjoint sum of matrix multiplication tensors, to which the asymptotic sum inequality is finally applied. Instead of first merging the constituent tensors of , taking an th tensor power, and then zeroing variables in two steps, we can reverse the order of steps. Starting with , we first zero out variables, and then do the appropriate merging. We end up with the exact same disjoint sum of matrix multiplication tensors as before. This hopefully convinces the reader that the analysis of Coppersmith and Winograd can be put in our framework. The analysis of higher power by Stothers, Vassilevska Williams and Le Gall adds more levels of recursion but is otherwise the same, so these also fall into our framework. If we could prove an upper bound on the corresponding notion of capacity, we would obtain a limit on the upper bound on provable using this framework. Limits on the laser method with merging Let be the maximum of over all direct sums which can be obtained from by zeroing out groups of variables and merging the surviving constituent tensors in sets. The asymptotic sum inequality shows that . It turns out that the limit exists (essentially due to the easy fact ), and so the bound obtained by the method is . Our goal in this section is to obtain an upper bound on the capacity , which correspond to a lower bound on the solution of the equation . The first step is to understand which mergers are possible — which sets of constituent tensors of can be merged to form a matrix multiplication tensor. Let’s take a look at the mergers taking place in Coppersmith and Winograd’s analysis of :
In both cases, the x-indices of the merged index triples consist entirely of zeroes.This is not the most general possible form of merging. Indeed, consider the following example for :
This results from the tensor product of and its rotation . In this example, the first two positions are always zero in the x-indices, and the last two positions are always zero in the y-indices.
Generalizing, we say that a set of constituent tensors of is merged on zeroes if for each of the positions, either all the x-indices are zero at the position, or all the y-indices are zero at the position, or all the z-indices are zero at the position. Using this nomenclature, we see that Coppersmith and Winograd’s analysis of always results in sets of constituent tensors merged on zeroes. Looking closely at the works of Stothers, Vassilevska Williams and Le Gall, we see that the same is true for their analyses. Indeed, we managed to prove that this is always the case.
Lemma. If then any set of constituent tensors of which merge to a matrix multiplication tensor is merged on zeroes.
The interested reader can look at the proof in our paper (Lemma 4.1 on page 11).
Given the lemma, our upper bound on the capacity broadly follows the ideas in the upper bound of Cohn, Kleinberg, Szegedy and Umans. Given , consider a collection of disjoint tensors resulting from starting with , zeroing out some groups of variables, and merging some of the surviving tensors. We call each merged set of tensors a line. Each line is associated with a line type , where is the number of positions at which the x-indices are always zero. Each constituent tensor inside each line has a tensor type which corresponds to how many of the original six constituent tensors were used to form it. Since the number of types is polynomial, we can focus our attention on an arbitrary line type and tensor type.
Consider for simplicity the case where the line type is and the tensor type is of the form (i.e., it gives the same weight to constituent tensors of similar shape). We can calculate the number of distinct x-indices and the number of distinct x-indices which can appear in any given line; since some of the coordinates are fixed at zero. Each constituent tensor in each line has the same dimensions , where the value of depends on the parameters . Convexity considerations show that it is advantageous for all lines to contain the same number of distinct x,y,z-indices. The total number of x-variables, y-variables and z-variables in each line is of each (since each x-index corresponds to x-variables), and so the line is isomorphic to . The total number of lines is at most (since different lines have disjoint x-indices), and so the total contribution to is .
We can similarly bound the contributions of other line types and tensor types, and through this we obtain an upper bound on and so on .
Theorem. For and , the solution to satisfies .
We obtain similar results for other values of . Curiously, the results deteriorate as gets smaller.
Le Gall used our technique to analyze the merged version of the tensor . For this tensor, he obtained a significantly improved bound.
Theorem (Le Gall). For and the merged version of , the solution to satisfies .
His result encompasses the analyses of à la Stothers, Vassilevska Williams and himself for all values of ; no value of can prove the bound .
We have seen that the Coppersmith–Winograd identity cannot be used to prove , at least not using current techniques. Anybody who wishes to prove should therefore either look for a different identity, or develop a new technique. The second avenue has been tried by Cohn and Umans, who suggested a technique based on groups. Cohn, Kleinberg, Szegedy and Umans showed how to match the bound obtained by Coppersmith and Winograd using the group-theoretic approach; their construction heavily relies on the methods of Coppersmith and Winograd, and in particular they need to use Coppersmith and Winograd’s lower bound on the capacity, which is the difficult part of both proofs. It is fair to say that so far, the group-theoretic techniques have not yielded any better upper bounds on . It thus seems that the most promising avenue for major improvement is finding better identities replacing the Coppersmith–Winograd identity, which has served us for 25 years. Perhaps computers can be enlisted for the search?
Until we find a new identity or a new technique, here is another suggestion for obtaining an improved upper bound on , albeit not . Let be the best bound that can be obtained by analysing according to the previous techniques, and let . The bound is obtained as the solution to , where is the restriction of in which each merged set of tensors is a cartesian product of tensors of width . The best bound obtained by the laser method with merging applied to , in contrast, is the solution to . It could be that
and in that case . In other words, it could be the case that allowing the merging width to grow with results in a better bound on . Unfortunately, so far we haven’t been able to obtain an upper bound on along these lines.