- 1 Recap
- 2 Where we’re headed
- 3 Algebraisation of **3-SAT**: from **3-SAT** to **QUADEQ**
- 4 Modifying the witness / proof format of **QUADEQ**
- 5 Test, Test, Test … The only test that matters
- 6 Diving into the weeds: relative Hamming distance between maps and distance to linear maps
- 6.1 (Normalized) Hamming distance between maps
- 6.2 How the set of Linear functions sits inside the space of functions \(\mathbb{F}_2^n\to\mathbb{F}_2\)
- 6.3 Uniform Separation
- 6.4 Hamming distance of a map to the set of linear maps
- 6.5 What \(B_\delta(\mathrm{Lin})\) looks like for different values of \(\delta\)
- 6.6 \(\rho\)-Quasi linearity
- 6.7 How to picture the set of \(\rho\)-quasi-linear maps within \(B_\delta(\mathrm{Lin})\)

- 7 Test #1 : \(\rho\)-quasi-linearity test for functions
- 8 Test #2 : probabilistically testing if two implicitly defined vectors \(u\) and \(\mathbf{U}\) satisfy \(\mathbf{U}=u\otimes u\)
- 9 Test #3 : probabilistically testing if an implicitly defined vector \(\mathbf{U}\) satisfies the linear system \(A\mathbf{U}=b\)
- 10 Tying everything together: the Verifier Test

In the first post in this series we gave basic definitions: the complexity classes **NP** and **PCP**, and what it takes for a decision problem to have probabilistically checkable proofs (PCPs) and thus lie in a PCP complexity class. We also teased the statement of the weak PCP theorem:

\[ \mathbf{NP} = \mathbf{PCP}_{1,\frac12}\big(\mathrm{poly}(n),\, O(1)\big) \]

where the subscripts \(1\) and \(\frac12\) are the completeness and soundness parameters, and \(\mathrm{poly}(n)\) and \(O(1)\) bound the Verifier’s randomness and query complexity respectively.

In this second post we sketch a proof of this result. We warn the reader that proving it requires a bit of linear algebra and (unsurprisingly) some basic probability theory. But the general gist of the proof isn’t all that complicated (although it *is* ingenious). As is usual in these kinds of proofs, the hard part is establishing *soundness*. The proof is taken from this excellent Introduction to the PCP theorem by Min Jae Song.

To simplify notation, write **PCP** for the complexity class **PCP**\(_{1,\frac12}(O(\ln(n)), O(1))\) that appears in the statement of the strong PCP theorem, and **wPCP** for the complexity class **PCP**\(_{1,\frac12}(\mathrm{poly}(n), O(1))\) that appears in the statement of the weak PCP theorem.

There will be a lot of ground to cover, so we shall divide the work up into smaller chunks and provide the reader with a bird’s-eye view of what’s to come.

The bulk of this post will be about constructing PCPs for **3-SAT** using the decision problem known as **QUADEQ** as an intermediary. We will thus see how to reduce **3-SAT** to **QUADEQ** (along the way we shall of course describe **QUADEQ**, i.e. its instances and witnesses). This first transformation from **3-SAT** to **QUADEQ** is very lean: hardly any complexity gets added. In doing so, we move from a typical computer science problem to one that has more of a *linear algebra* flavour to it.

The next step is again to change the problem to an equivalent one, however, this transformation comes at a **tremendous cost**: the solution/witness space grows exponentially larger in size. In other words, this modification brutally impacts the witness / PCP proof-format size for **QUADEQ**.

Thankfully, this proof-format is quite concrete and explicit: rather than a 0/1-assignment as in **3-SAT** or a \(\{0, 1\}\)-vector for **QUADEQ**, proofs in the PCP format are pairs \((f,g)\) of \(\{0, 1\}\)-valued functions given by their values \(\Gamma_f\) and \(\Gamma_g\). When we say that a function \(f\) is *given by its values*, what we mean is that we are given a complete description of all its values. For instance, we might describe an integer valued map \(f:\{-5,-3,-2,-1,0,4\}\to\mathbb{N}\) as a list of tuples, say \[
\Gamma_f=[(-5,26), (-3,10), (-2,5), (-1,2), (0,1), (4,17)]
\] where each tuple \((x,y)\in\Gamma_f\) represents an input value \(x\) of \(f\), and the output value \(y=f(x)\). This is to be contrasted with algorithmic definitions of functions, where functions are described by means of a recipe that allows one to compute their values for different inputs, say \[
f:\left\{
\begin{array}{rcl}
\mathscr{D}_f & \longrightarrow & \mathbb{N}\\
x & \mapsto & x^2+1
\end{array}
\right.
\] with domain \(\mathscr{D}_f=\{-5,-3,-2,-1,0,4\}\).
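The contrast between the two representations is easy to render in a few lines of Python (the names `gamma_f`, `table` and `f` are ours, chosen to mirror the example above):

```python
# A function "given by its values": a complete table of (input, output) pairs,
# mirroring the list Gamma_f from the text.
gamma_f = [(-5, 26), (-3, 10), (-2, 5), (-1, 2), (0, 1), (4, 17)]
table = dict(gamma_f)  # look up f(x) by reading the table

# The same function given algorithmically: a recipe for computing values.
def f(x):
    return x ** 2 + 1

# Both representations agree on the domain {-5, -3, -2, -1, 0, 4}.
assert all(table[x] == f(x) for x, _ in gamma_f)
```

The table representation is what a PCP proof looks like: the Verifier can query any single entry without evaluating, or even knowing, a formula for the function.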

Returning to our discussion of the PCP proof format, correct proofs are special pairs of *linear* functions (Walsh–Hadamard codewords) that are *derived from a witness*. The series of conversions

\[ (\mathbf{3}\textbf{-}\mathbf{SAT}\text{-witness}) \Rightarrow (\mathbf{QUADEQ}\text{-witness}) \Rightarrow (\mathbf{QUADEQ}\text{ PCP proof}) \]

is handled by the Prover and involves only trivial computations (dot products), albeit a great many of them. It requires as its only input a witness \(u\) of the specific **3-SAT** instance, and cannot be precomputed. In fact, it involves so many computations that producing such PCP proofs is already totally out of the question for instances of **3-SAT** with 10 Boolean variables. We stress that this is not meant to be a practical PCP scheme. At this point we will have described the proof format, and how a Prover constructs a proof \((\Gamma_f, \Gamma_g)\) from witness data \(u\).
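The precise proof format is described later in the post, but the size blow-up can already be illustrated with a toy sketch (the helper name is ours): the Walsh–Hadamard table of a witness \(u\in\mathbb{F}_2^n\) records *every* dot product \(\langle u\mid x\rangle\) over \(\mathbb{F}_2\), one entry per \(x\in\mathbb{F}_2^n\).

```python
from itertools import product

# Hedged sketch: the Walsh-Hadamard table of a witness u in F_2^n records the
# dot product <u, x> mod 2 for ALL x in F_2^n -- that is 2**n entries.
def walsh_hadamard_table(u):
    n = len(u)
    return {x: sum(ui * xi for ui, xi in zip(u, x)) % 2
            for x in product((0, 1), repeat=n)}

u = (1, 0, 1)                      # toy witness, n = 3
gamma_f = walsh_hadamard_table(u)  # 2**3 = 8 entries, each a trivial dot product
assert len(gamma_f) == 8
```

Each entry is trivial to compute, but the number of entries is exponential, and the second table \(\Gamma_g\) lives over \(\mathbb{F}_2^{n^2}\): already \(2^9 = 512\) entries for \(n=3\), and an astronomical \(2^{100}\) for \(n=10\).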

The job of the Verifier is very easy. It applies three different tests, all of which are instances of a single simple master test. Concretely, the Verifier is handed a “proof” \(\pi=(\Gamma_f,\Gamma_g)\), and its job is

- to toss a polynomial number of random coins,
- to perform simple polynomial-time computations using only bits, outputting either 1 or 0.

The purpose of the random coin tosses is to come up with random locations within the domains of the maps \(\Gamma_f\) and \(\Gamma_g\) at which to perform queries. Thus the Verifier reads a constant number of randomly (and independently) sampled bits (\(a,b,c,d,e,A,B,C,D,E,F,G\) below) from the proof \((\Gamma_f, \Gamma_g)\) and uses them as inputs for computations. To give the reader an idea of just how simple the actual computations are, we list them here:

1. fetch \(a,b,c\longleftarrow\Gamma_{f}\), check if \(a+b\overset?=c\);
2. fetch \(A,B,C\longleftarrow\Gamma_{g}\), check if \(A+B\overset?=C\);
3. fetch \(d,e\longleftarrow\Gamma_{f}\) and \(D,E\longleftarrow\Gamma_{g}\), and check if \(d\cdot e\overset?=D+E\);
4. fetch \(F,G\longleftarrow\Gamma_{g}\) and sample \(\vec{x}\longleftarrow\mathbb{F}_2^m\), compute the dot product \(p=\langle \vec{b}\mid \vec{x}\rangle\) and check if \(F+G\overset?=p\).

(In the last point, \(\vec{b}\) is a vector that is part of the problem description.) Of course there is some structure to these queries which we gloss over for now, but we insist on the fact that the actual computations are incredibly tame. To be precise, operations of type 1., 2., 3. and the check of 4. are constant-time operations on bits \(a, b, c, d, e, A, B, C, D, E, F, G \in\{0,1\}\). Computing the dot product \(\langle \vec{b}\mid \vec{x}\rangle\) has linear complexity in the vector size.

This whole ordeal is repeated \(Q\) times (a few hundred times, say), and if all tests pass, the Verifier outputs \(\texttt{Accept}\).
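As a toy illustration of just how tame these checks are, here is a hedged sketch of the first one, *assuming* (as in the standard BLR linearity test, whose role is explained later in the post) that the bits \(a, b, c\) are the table's values at random points \(x\), \(y\) and \(x+y\):

```python
import random
from itertools import product

# Hedged sketch of check 1, assuming a, b, c are the values of Gamma_f at
# random x, y, and x + y (the standard BLR linearity-test pattern).
def check_additivity(gamma_f, n):
    x = tuple(random.randint(0, 1) for _ in range(n))
    y = tuple(random.randint(0, 1) for _ in range(n))
    xy = tuple((xi + yi) % 2 for xi, yi in zip(x, y))  # x + y over F_2
    a, b, c = gamma_f[x], gamma_f[y], gamma_f[xy]      # three queries
    return (a + b) % 2 == c                            # constant-time bit check

# A genuinely linear table, here f(x) = x1 + x3 mod 2, passes every time.
n = 3
gamma_f = {x: (x[0] + x[2]) % 2 for x in product((0, 1), repeat=n)}
assert all(check_additivity(gamma_f, n) for _ in range(100))
```

Note that the check reads only three bits of the table, however large the table is; repeating it \(Q\) times still makes only \(3Q\) queries.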

Good “tests” are the central pillar of the **PCP** Verifier. One might claim, somewhat facetiously, that we really only use the previously mentioned simple-minded master test. All the subtlety resides in how to use it. Making good use of some properties of linear and bilinear maps is front and center in this respect. The whole point of the very expensive transformation alluded to earlier is so that we may use these. The core benefit we extract is that this new proof format brings **uniform separation** to the solution space where previously potential solutions could be arbitrarily close.

It will be quite clear from that construction that proofs which were correctly constructed by an honest Prover in possession of correct witness data have an acceptance rate of 100%, so **Completeness** is a given. Where things get interesting is with **Soundness**, i.e. in showing that a dishonest Prover can’t come up with fake proofs for **NO** instances of **QUADEQ** that would get accepted \(\geq 50\%\) of the time. This requires some real ingenuity, and this is when the testing we talked about earlier becomes important. Careful analysis of this protocol shows that this Verifier has the soundness required of a weak PCP Verifier, and so **3-SAT** has PCPs. But **3-SAT** is but one decision problem in the infinite collection of decision problems that is **NP**: do we have to start all over for other decision problems in **NP**? No: if **wPCP** contains one NP-complete problem, then it contains all of **NP**. Since **3-SAT** (and **QUADEQ** for that matter) is NP-complete, the weak PCP theorem is proven :)

We describe how to convert an instance of **3-SAT**, a decision problem from the realm of Boolean algebra and Boolean constraint systems, into an equivalent problem formulated in terms of linear algebra over the field with two elements \(\mathbb{F}_2\).

All the algebra will happen over the field with two elements \(\mathbb{F}_2\). We quickly recall its arithmetic. As a set, \(\mathbb{F}_2=\{0,1\}\). Addition and multiplication are defined by

\[ 0+0=0, \quad 0+1=1+0=1, \quad 1+1=0 \]

\[ 0\times 0=0\times 1=1\times 0=0, \quad 1\times 1=1. \]

Note that for all \(x\in\mathbb{F}_2\), \(x^2=x\). The arithmetic of \(\mathbb{F}_2\) is that of the usual integers if all we care about is parity, i.e. even vs. odd numbers. Indeed, we can pass from the integers \(\mathbb{Z}\) to \(\mathbb{F}_2\) by sending an arbitrary integer \(n\in\mathbb{Z}\) to \(p(n) = 0\) if \(n\) is even, and to \(p(n) = 1\) if \(n\) is odd. Then, since the sum of two integers is even *iff* the two integers have the same parity and the product of two integers is odd *iff* both are odd, we see that for arbitrary integers \(n,m\)

\[ p(n+m) = p(n) + p(m) \quad\text{and}\quad p(n\times m) = p(n) \times p(m) \]

where the symbols \(+\) and \(\times\) on the right-hand side of each equation are those of \(\mathbb{F}_2\).
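The compatibility of the parity map \(p\) with \(+\) and \(\times\) is easy to spot-check exhaustively on a range of integers:

```python
# The parity map p : Z -> F_2; p(n) = 0 if n is even, 1 if n is odd.
def p(n):
    return n % 2  # Python's % returns 0 or 1 even for negative n

# Spot-check that p turns integer arithmetic into F_2 arithmetic:
# p(n + m) = p(n) + p(m) and p(n * m) = p(n) * p(m), with + and * taken mod 2.
for n in range(-10, 11):
    for m in range(-10, 11):
        assert p(n + m) == (p(n) + p(m)) % 2
        assert p(n * m) == (p(n) * p(m)) % 2
```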

We can express all Boolean operations by using the field operations \(+\) and \(\times\) of \(\mathbb{F}_2\).

The correspondence between basic Boolean operations and algebraic operations is contained in the following table: