Sidon sets and sum-product phenomena

Happy new year to everyone who is still following this blog! For this new post (coming after a nontrivial break), I thought I should perhaps try something different, and so I decided that it might be a good idea to discuss an open problem which I like, and which has been on the back of my mind lately.

Long story short, it’s a problem that Oleksiy Klurman and I came up with together a few years ago, and which we excitedly shared with some of our friends at various workshops and conferences, but we never really got a chance to write it down anywhere (mostly because we couldn’t prove anything particularly interesting about it..). In the meantime, one of us started a blog, so we thought the other day that we might finally have an opportunity to say some things about this problem in a sufficiently formal way and perhaps voice some of our frustrations.

To put this into context, let me start by introducing some notation and discussing some classical results about Sidon sets. For a set of real numbers A, let s_{+}(A) denote the size of the largest subset A' of A with the property that there are no nontrivial solutions to the equation x+y=z+w, where x,y,z,w \in A' are pairwise distinct from each other. Such a set A' is called an additive Sidon subset of A, so s_{+}(A) thus denotes the size of the largest additive Sidon subset of A. One of the most classical problems in additive combinatorics is to determine s_{+}(\left\{1,\ldots,N\right\}) for sufficiently large positive integers N. For brevity, we will denote this function in this case by s_{+}(N).

Let’s discuss a few things about this s_{+}(N). First, since all the pairwise sums live in the interval \left\{1,\ldots,2N\right\}, it is easy to see that {s_{+}(N) \choose 2} \leq 2N, which implies that s_{+}(N) = O(N^{1/2}). In 1941, Erdos and Turan refined this simple observation and proved that s_{+}(N) \leq N^{1/2} + O(N^{1/4}). This was further sharpened by Lindstrom, who established the inequality

\displaystyle s_{+}(N) < N^{1/2} + N^{1/4} + 1,

which is the best known upper bound for s_{+}(N). In order to produce a good lower bound for s_{+}(N), one needs to construct a large subset of \left\{1,\ldots,N\right\} with all of its two element subset sums distinct. In their paper already cited above, Erdos and Turan proceeded as follows. Fix a prime p, and let (k^2) be the unique integer in \left\{1,\ldots,p-1\right\} congruent to k^2 modulo p. It is not hard to check that \left\{2pk + (k^2): 1 \leq k < p\right\} is a Sidon set contained in the interval \left\{2p+1,\ldots,2p(p-1)+1\right\}. In particular, for N = \lceil 2p(p-1)+1\rceil and p sufficiently large this construction already leads to the fact that

\displaystyle s_{+}(N) \geq \frac{N^{1/2}}{\sqrt{2}}(1+o(1))

holds for all N sufficiently large. This is essentially a Freiman isomorphism from the y=x^2 parabola in \mathbb{F}_{p}^{2}. Three different constructions due to Singer, Bose, and Ruzsa, all with a similar algebraic flavor, lead to a slightly better lower bound for s_{+}(N) than the one above, which is of the form

\displaystyle s_{+}(N) \geq N^{1/2}(1+o(1)).

While all of these results are quite classical, determining whether the correct size of s_{+}(N) is closer to the upper bound or to the lower bound (i.e. if the lower order term in Lindstrom’s upper bound is required) and what all these extremal Sidon sets might look like are still major open problems. We refer the interested reader to this consult this nice survey of O’Bryant and this beautiful blog post of Gowers for more information.

Similarly, one can play the same game with multiplication. For a set of real numbers A, let us now define s_{\times}(A) to be the size of the largest subset A' of A with the property that there are no nontrivial solutions to the equation xy=zw, where x,y,z,w \in A' are pairwise distinct from each other. Naturally, we will call A' a multiplicative Sidon subset of A, so like before s_{\times}(A) simply stands for the size of the largest multiplicative Sidon subset of A.

When A = \left\{1,\ldots,N\right\}, the story of s_{\times}(N):=s_{\times}(\left\{1,\ldots,N\right\}) turns out to be a bit more pleasant than the story of s_{+}(N). At first glance, this is perhaps not too surprising. First of all, it is easy to see that s_{\times}(N) is much larger than s_{+}(N). For example, one can consider A' to be the set of prime numbers inside the interval \left\{1,\ldots,N\right\}. This is a multiplicative Sidon set and, by the Prime number theorem, |A'| = \Theta(N/\log N); therefore, we know that s_{\times}(N) = \Omega(N/\log N), which is somewhat promising. Indeed, in 1938 (three years before his additive Sidon paper with Turan), Erdos had in fact already started the study this quantity and managed to further improve upon this simple construction by showing that

\displaystyle s_{\times}(N) \geq \pi(N) + c\frac{N^{3/4}}{(\log N)^{3/2}},

for some absolute constant c >0, where \pi(N) stands as usual for the number of primes in \left\{1,\ldots,N\right\}. Moreover, in the same paper he also proved that s_{\times}(N) \leq \pi(N) + O(N^{3/4}), upper bound which 31 years later he himself refined to s_{\times}(N) \leq \pi(N) + O\left(\frac{N^{3/4}}{(\log N)^{3/2}}\right), thus establishing the correct order of magnitude of the lower order term, namely

\displaystyle s_{\times}(N) = \pi(N) + \Theta\left(\frac{N^{3/4}}{(\log N)^{3/2}}\right).

So, to (abruptly) summarize: while for s_{+}(N) we are still fairly troubled regarding the correct order of magnitude and the nature of the largest additive Sidon subsets in \left\{1,\ldots,N\right\}, for the multiplicative Sidon story we know quite well what s_{\times}(N) is, and we also know that these extremal sets are intimately connected with the primes.

But what do we know about s_{+}(A) and s_{\times}(A) for other sets of reals A? When A is multiplicatively structured, say A = \left\{2,2^{2},\ldots,2^{n}\right\}, we have that s_{+}(A) = n and s_{\times}(A) = s_{+}(\left\{1,\ldots,n\right\}) = \Theta(n^{1/2}), so the roles are in some sense reversed. Needless to say, this situation resembles quite well the dichotomy that motivated the celebrated sum-product conjecture of Erdos and Szemeredi, which roughly states that no set has a polynomial saving for both the size of the sum set and the size of the product set, over the trivial quadratic upper bound. So perhaps, one can formulate an analogous conjecture.. (and, of course, that’s what we did!).

Conjecture. (KP) For every set of reals A, is there an absolute constant c > 0 such that

\displaystyle \max\left\{s_{+}(A),s_{\times}(A)\right\} \geq c|A|^{1-\epsilon}

for every \epsilon > 0?

It is perhaps important to mention that we couldn’t find this anywhere in the literature, but that we did not look terribly hard for a reference (as I said, we more or less just asked around!). So, not only that we’d be (very) excited if someone can solve it, but we’d also be quite grateful if anyone knows a reference where something along these lines has been already asked.

In the remaining time/space, I’d like to draw attention towards a particularly tantalizing special case, which seems already quite tricky. Suppose A is a set of reals with small doubling, namely |A+A| = K|A|, where K may possibly depend on |A|. We know that there are such sets with s_{+}(A) = \Theta(|A|^{1/2}), so the question here is: can one prove that we always have s_{\times}(A) = \Omega(|A|^{1-\epsilon}) for every \epsilon > 0?

By using Solymosi’s theorem that every set A satisfies E^{\times}(A) = O\left(|A+A|^2 \log |A|\right), where E^{\times}(A) stands for the number quadruples (a,b,c,d) \in A^4 with ab=cd and a, b, c, d pairwise distinct, it is not difficult to prove that every set A with |A+A| = K|A| contains a multiplicative Sidon subset of size \Omega_{K}(|A|^{2/3-\epsilon}) for every \epsilon > 0. Indeed, one can take a random subset B of A, obtained by choosing each element of A to be part of B with probability p \in [0,1] independently, and then removing one element from each multiplicative quadruple in B. The end result is some smaller (random) set A' \subset B with no multiplicative quadruples, which on average will have size

\mathbb{E}[|A'|] \geq p|A| - c p^{4} |A+A|^{2}\log |A|,

for some absolute constant c > 0. Thus, there must exist a multiplicative Sidon subset of A of at least that size. Using the fact that |A+A| = K|A| and optimizing for p in terms of |A| then proves the claim. And surprise, surprise, this is more or less the only nontrivial thing that we could say! We think that one can maybe hope to use ideas from the proof of Solymosi’s sum-product theorem more directly rather than the result itself as a blackbox in order to do better than |A|^{2/3}, but we couldn’t find any such argument. Even when K is small enough to allow the application of Freiman’s theorem, we didn’t know how to use the generalized arithmetic progression structure of A to do any better (except perhaps in some situations where one adds some further assumptions on A). So, any ideas would be very welcome!

That being said, I’d also like to add that a friendlier direction which seems to be still worth investigating is the “dual” regime where |AA|=K|A|, where K is an absolute positive constant. By the multiplicative variant of Freiman’s theorem, A must in this case lie in a multiplicative subgroup of \mathbb{C}^{*} of small rank r = r(K) (bounded only in terms of K), so by quantitative variants of Schmidt’s subspace theorem such as the one due to Amoroso and Viada it is also not too difficult to find “almost Sidon sets” of size \Omega(|A|^{1-\epsilon}). To be more precise, for every \epsilon > 0, there exists some positive integer k=k(\epsilon) and a subset A' of A of size \Omega(|A|^{1-\epsilon}) with the property that each sum of two distinct elements in A' has multiplicity at most k. This also follows from a probabilistic deletion argument in the style of the one above, although it is a bit more technical. Nevertheless, it seems likely that finite subsets of multiplicative subgroups of \mathbb{C}^{*} of small rank might actually have linear size additive Sidon subsets for some “easier” reasons (e.g. the primordial subset \left\{2,2^2,\ldots,2^n\right\}, which is contained in a rank 1 multiplicative subgroup, is Sidon itself due to the uniqueness of binary representations).

Later edit: In a comment, Oliver Roche-Newton already provided a simple construction which shows that the original conjecture cannot be true as stated for any \epsilon < 1/4: there exists a set A with no additive/multiplicative Sidon subsets of size greater than |A|^{3/4}. This set however does not have small additive or multiplicative doubling, so the two particular cases described above could still be true. It also still remains to see whether there exists \alpha > 0 such that

\displaystyle \max\left\{s_{+}(A),s_{\times}(A)\right\} \geq c|A|^{1/2+\alpha}

for some absolute constant c>0. This would already be quite interesting (and in the spirit of the sum-product theorem of Erdos and Szemeredi). That being said, it is perhaps also useful to add that

\displaystyle \min\left\{s_{+}(A),s_{\times}(A)\right\} = \Omega\left(|A|^{1/2}\right)

follows from a more general result of Komlos, Sulyok and Szemeredi which implies that s_{+}(A) = \Omega(s_{+}\left(\left\{1,\ldots,|A|\right\}\right) holds for all sets of integers A. It is not difficult to see that once we have this over the integers, s_{+}(A) = \Omega(|A|^{1/2}) also holds for all sets of reals A (by using a diophantine approximation argument, for example). Since s_{\times}(A)=s_{+}\left(\left\{\log a\ :\ a \in A\right\}\right) (ignoring small issues), this also implies s_{\times}(A) = \Omega(|A|^{1/2}).

Later edit 2: In the meantime, I’ve also become aware of improved constructions by Oliver Roche-Newton+Audie Warren and (independently) Ben Green+Sarah Peluse which show that the original conjecture cannot be true as stated for any \epsilon < 1/3: there exists a set A with no additive/multiplicative Sidon subsets of size greater than |A|^{2/3}. Moreover, these constructions also have small additive doubling, so they also show the (rather curious) fact that the 2/3 exponent obtained via probabilistic deletion and Solymosi’s theorem which I sketched (for the size of the largest multiplicative Sidon subset in A when |A+A|=K|A|) is actually sharp. More updates to come. (I hope!)

Discrete Mathematics lecture notes

It’s been a while since I posted any mathematics on this blog, but, in my defense, it has been a crazy year! I hope everyone who reads this is well and that all of your friends and families are safe and healthy.

In a first attempt to revive the situation, I’d like to post some lectures notes I finished putting together recently for a somewhat atypical introductory discrete mathematics course I’ve been teaching (online) at Yale this semester. They are inspired by the material in the lovely book of Jiri Matousek and Jaroslav Nesetril entitled “Invitation to Discrete Mathematics” (2nd ed.), but they do not quite follow the same structure and contain many different twists, so I am hoping they will be of interest even to people who already have the book. That being said, it is perhaps important to also add that these lecture notes are not proofread very seriously and more often than not skip various stories and proofs (as an incentive for my students to show up to class!), but it is very likely that I will be updating these in the near future. In any case, I hope you enjoy!

Chapter 1: Introduction to combinatorics

Chapter 2: The Principle of Inclusion-Exclusion

Chapter 3: Intersecting sets and poset madness

Chapter 4: Introduction to graph theory

Chapter 5: Extremal graph / Ramsey theory

Chapter 6: The Probabilistic Method

If, like me, you prefer to see such materials altogether in one single file, you can do this by clicking here. Have fun!

If anyone also happens to be interested in using them for various purposes (e.g. to teach a similar course at some point) and would like to have the source files and other relevant things (e.g. the homeworks), I’d be happy to provide them. Just shoot me an e-mail!

Configurations with many incidences and no triangles

Given a set \mathcal{P} of points and a set \mathcal{L} of lines, both in \mathbb{R}^{2}, an incidence is a pair (p,\ell) \in \mathcal{P} \times \mathcal{L}. Answering a question of Erdos and Purdy, Szemeredi and Trotter proved the following important theorem:

\displaystyle I(\mathcal{P},\mathcal{L}) = O\left(|\mathcal{P}|^{2/3}|\mathcal{L}|^{2/3}+|\mathcal{P}|+|\mathcal{L}|\right).

The Szemeredi-Trotter theorem is one of the most remarkable results in incidence geometry not only because it is extremely useful in applications all across combinatorics, but also because it is one of the few optimal incidence theorems available. Indeed, there are two somewhat different constructions achieving the equality (up to constants): one due to Elekes and another one, less well-known, due to Erdos himself.

In this blog post, we will discuss an interesting result of Solymosi, which showcases something that these two constructions have in common (and as a matter of fact, something that all configurations with \Theta\left(|\mathcal{P}|^{2/3}|\mathcal{L}|^{2/3}+|\mathcal{P}|+\mathcal{L}\right) incidences have in common).

Theorem. (Solymosi) Let \mathcal{P} be a set of points and let \mathcal{L} be a set of lines, both in \mathbb{R}^{2}. Suppose that

\displaystyle I(\mathcal{P},\mathcal{L})=\Theta\left(|\mathcal{P}|^{2/3}|\mathcal{L}|^{2/3}+|\mathcal{P}|+|\mathcal{L}|\right).

Then, there must exist three non-collinear points in \mathcal{P}, with the property that every pair of points among them determines a line which is from the set \mathcal{L}.

From now on, we will also no-so-surprisingly refer to such local configurations simply as triangles, since that’s what they actually are in the plane, however note that this could easily become confusing if one is not careful, since graph theoretically these configurations actually represent copies of a cycle of length six in the incidence graph \mathcal{P} \times \mathcal{L}. Solymosi’s theorem can be therefore rephrased as saying that for every set of points \mathcal{P} and every set of lines \mathcal{L} which do not determine any triangles, we have that

\displaystyle I(\mathcal{P},\mathcal{L})=o\left(|\mathcal{P}|^{2/3}|\mathcal{L}|^{2/3}+|\mathcal{P}|+|\mathcal{L}|\right).

We will actually not dwell too much on the proof of Solymosi’s theorem, since in this post I would like to instead focus on a lower bound construction. Nonetheless, I’d like to at least record the strategy, since the idea is very simple and interesting. As a starting point, recall that the Szemeredi-Trotter theorem can be proved in a few ways by amplifying the trivial bounds I(\mathcal{P},\mathcal{L}) \leq |\mathcal{P}| + |\mathcal{L}|^{2} and I(\mathcal{P},\mathcal{L}) \leq |\mathcal{P}| + |\mathcal{L}|^{2}. Most famously, this has been done in an absolutely beautiful indirect manner by Szekely via the so-called crossing number inequality; however, more modernly, this has also been achieved in various other more practical ways by using partitioning theorems (such as the Guth-Katz polynomial partitioning theorem but also using older partitioning results such as the simplicial cutting theorem of Matousek (used also by Solymosi); see for instance these lecture notes of Guth or this blog post of Tao). Solymosi’s main observation is that under the additional assumption that the incidence graph \mathcal{P} \times \mathcal{L} contains no copy of a cycle of length six, one can improve on the trivial inequalities above even before the amplification step. More precisely, one can show for instance that I(\mathcal{P},\mathcal{L}) \leq 2|\mathcal{P}| + o(|\mathcal{L}|^{2}) as follows.

First, let \mathcal{P}' be the set of points in \mathcal{P} which lie on at most two lines from \mathcal{L}; note that

\displaystyle I(\mathcal{P},\mathcal{L}) = I(\mathcal{P}',\mathcal{L}) + I(\mathcal{P} \backslash \mathcal{P}',\mathcal{L}) \leq 2|\mathcal{P}| + I(\mathcal{P} \backslash \mathcal{P}',\mathcal{L}).

Then, consider the graph G(\mathcal{L}) with vertex set \mathcal{L} and with edges defined as follows: for each point x \in \mathcal{P} \backslash \mathcal{P}', consider the set of lines \mathcal{L}(x) which contain of x, and cover up the vertices corresponding to \mathcal{L}(x) in G(\mathcal{L}) by edge-disjoint triangles (up to a small error). The number of edges in G(\mathcal{L}) thus constructed is then roughly the same as the number of incidences I(\mathcal{P} \backslash \mathcal{P}',\mathcal{L}), so if for some reason I(\mathcal{P} \backslash \mathcal{P}',\mathcal{L}) = \Omega(|\mathcal{L}|^{2}), this would mean that G(\mathcal{L}) has \Omega(|\mathcal{L}|^2) edge-disjoint triangles. By the triangle removal lemma, having this many edge-disjoint triangles would further imply that our graph G(\mathcal{L}) must actually have \Omega(|\mathcal{L}|^3) triangles in total, and so some of these triangles would most definitely have to come from different points in \mathcal{P}, by design. It is easy to see that this would immediately be in conflict with the hypothesis that there are no cycles of length six in the incidence graph \mathcal{P} \times \mathcal{L}. Therefore, it follows that I(\mathcal{P} \backslash \mathcal{P}',\mathcal{L}) = o(|\mathcal{L}|^{2}), which gives the claim above. Amplfying this bound instead of the trivial bound, one can then recover Solymosi’s theorem.

We will now move on to describe a construction of a set points and a set lines with many incidences and no triangles, which shows that Solymosi’s theorem is in some sense sharp in the regime when |\mathcal{P}| \gg |\mathcal{L}|^{2-\epsilon} or |\mathcal{P}| \gg |\mathcal{L}|^{2-\epsilon} for every \epsilon > 0. The construction is in the style of Elekes’s construction for the original Szemeredi-Trotter theorem, but with an additive combinatorics twist to force out triangles.

A set of positive integers A is called a non-averaging set of order k if for each 1 \leq u,v \leq k, the equation

\displaystyle u x_{1} + v x_{2} = (u+v)x_{3}

has no solutions with x_{1}, x_{2}, x_{3} all in A and all pairwise distinct. Furthermore, let s_{k}(n) denote the size of the largest non-averaging set of order t inside the interval \left\{1,\ldots,n\right\}. In particular, note that s_{1}(n) \leq r_{3}(n), where the latter denotes as usual the size of the largest three-term progression free set inside \left\{1,\ldots,n\right\}. It is well-known that there exists some absolute constant c > 0 such that for every sufficiently large n one has

\displaystyle r_{3}(n) \geq n \exp{(-c \sqrt{\log n})},

thanks to a celebrated construction due to Behrend. Interestingly, it is not too hard to modify this construction to show that \left\{1,\ldots,n\right\} also contains very large non-averaging sets of order k, for every k which is reasonably small compared to n. Indeed, the following nice observation of Stanchescu holds.

Theorem (Stanchescu). For every k \geq 1, there exists some absolute constant c>0 such that

\displaystyle s_{k}(n) \geq n \exp{(-c \log k \sqrt{\log n})}

for every sufficiently large n.

We can now finally get to the construction. Let r and s denote some large positive integers, with a precise relationship between them to be specified later. For now, we shall only think of r as being much larger than s. Let A then be a non-averaging set of order s inside \left\{1,\ldots,r\right\} and satisfying

\displaystyle |A| \geq r \exp{(-c \log s \sqrt{\log r})}

for some absolute constant c > 0. Define the set of points \mathcal{P} \subset \mathbb{R}^{2} by

\displaystyle \mathcal{P} = \left\{(x,y):\ x \in A, y \in \left\{1,\ldots,2rs\right\}\right\}.

Let \ell_{m,n} be the line in \mathbb{R}^{2} defined by the equation y=mx+n, and let \mathcal{L} be the set of lines defined by

\displaystyle \mathcal{L} = \left\{\ell_{m,n}: m \in \left\{1,\ldots,s\right\}, n \in \left\{1,\ldots,rs\right\}\right\}.

Note that |\mathcal{P}| = 2 |A|rs, whereas |\mathcal{L}| = rs^{2}. Furthermore, note that for every line \ell_{m,n} in \mathcal{L} and every x \in A, there is a unique element mx + n in \left\{1,\ldots,2rs\right\} such that (x,mx+n) lies on \ell_{m,n}. In particular, each line \ell_{m,n} determines at least |A| incidences with P, so I(P,L) \geq |A|rs^2.

In particular, if say s \ll \exp((\log r)^{c}) for some c < 1/2, then |A| \gg r^{1-\epsilon} for every \epsilon > 0, and

\displaystyle I(\mathcal{P},\mathcal{L}) \geq |A|rs^{2} \gg r^{2-\epsilon}s^{2} \gg (r^{3}s^{3})^{\frac{2}{3}-\frac{\epsilon}{3}} \geq (|\mathcal{P}||\mathcal{L}|)^{\frac{2}{3}-\frac{\epsilon}{3}}.

We next verify that in this configuration there are no non-collinear three points in \mathcal{P}, with each pair lying on a line from \mathcal{L}. We argue this by contradiction. Suppose there are three lines \ell_{1},\ell_{2},\ell_{3} in \mathcal{L} and three points p_{1},p_{2},p_{3} in \mathcal{P} such that their vertices induce a cycle of length six in \mathcal{P} \times \mathcal{L}, and the points p_1, p_2, p_3 are not collinear. Without loss of generality, suppose the lines \ell_{1} and \ell_{2} meet at p_{1} := (x_1,y_1),, lines \ell_{2} and \ell_{3} meet at p_{2} := (x_2,y_2), and the lines \ell_{3} and \ell_{1} meet at p_{3} := (x_3,y_3).

Also, for each i \in \left\{1,2,3\right\}, let \ell_{i} be described by the equation y=m_{i}x+n_{i}, and without loss of generality assume that m_{1} \geq m_{2} \geq m_{3}. It is easy to check that by definition of p_{1}, we have that (m_{1}-m_{2})x_{1} + n_{1}-n_{2}=0, and we also have similar equations for p_{2} and p_{3}.

Adding up these equations, it follows that

\displaystyle (m_{1}-m_{2})x_{1} + (m_{2}-m_{3})x_{2}=(m_{1}-m_{3})x_{3}.

But x_{1},x_{2},x_{3} are all in A and recall A is a non-averaging set of order s, so this would imply that x_{1}=x_{2}=x_{3}. This is a contradiction, since we assumed that the points p_1, p_2, p_3 are not collinear.

It would interesting to see in the future (maybe a future blog post!) whether the triangle removal lemma could be combined instead with the crossing number inequality proof of Szemeredi-Trotter (rather than the cutting method proof) to give an alternate proof for

\displaystyle I(\mathcal{P},\mathcal{L})=o\left(|\mathcal{P}|^{2/3}|\mathcal{L}|^{2/3}+|\mathcal{P}|+|\mathcal{L}|\right).

Another natural question also seems to be whether one can improve quantitatively on the bound above by proving new bounds for the triangle removal lemma which are specific to this geometric setting. This of course may be difficult since the only data point where this type of thing has been accomplished so far is the arithmetic setup in vector spaces over finite fields.

On the Caro-Tuza-Wei inequality

I’d like to start off this blog with a post that is written jointly with my good friend Fedor Petrov.

To set things up, recall that a k-uniform hypergraph H represents a pair (V(H),E(H)) where V(H) is a finite set of vertices and E is a family of k-element subsets of V(H). When k=2, H is simply called a graph. For v \in V(H), its degree in H, which we will denote as \deg_{H}(v), is defined to be the cardinality of the set of edges in E(H) which contain v. A subset S\subset V(H) is an independent set of the hypergraph H if there is no edge in E(H) which has all of its k elements that describe it as part of S. Equivalently, this is sometimes written as {S \choose k} \cap E(H) = \emptyset. Furthermore, the independence number of H, denoted by \alpha(H), represents the maximum size of an independent set in H, and lower bounding it in terms of the degrees of the vertices in H is the main subject of this post.

Estimating the independence number of (hyper)graphs under various hypotheses is a classical and important topic in graph theory, since many extremal combinatorics problems can be rephrased as inequalities about \alpha(H) for carefully defined hypergraphs H. We will not aim to give a complete background for this general problem, so we refer the reader to The Probabilistic by Alon and Spencer or a nice paper of Dutta, Mubayi, and Subramanian for more appropriate accounts. The story I’d like to tell here begins with the beautiful result of Caro and Wei, who independently proved that for every graph G,

\displaystyle \alpha(G) \geq \sum_{v \in V} \frac{1}{\deg(v) + 1}.

In particular, after an application of the Cauchy-Schwarz inequality, this implies the well-known Turan’s theorem about the largest number of edges a graph without a complete subgraph of a fixed size can have. Before the Caro-Wei theorem, Spencer extended Turan’s theorem to k-uniform hypergraphs, so extending the above inequality to k-uniform hypergraphs (and thus improving Spencer’s theorem) became a natural next step. Indeed, Caro and Tuza were able to prove the following generalization, which I will take the liberty to refer to as the Caro-Tuza-Wei inequality.

Theorem 1 (Caro-Tuza). Let k \geq 1 be an integer and let H=(V,E) be a (k+1)-uniform hypergraph. Then,

\displaystyle \alpha(H)\geqslant \sum_{v\in V} \frac{1}{{\deg(v)+1/k\choose\deg(v)}}.

For k=1 this recovers the Caro-Wei theorem, but the proof is much more involved. In this post, we will give a short probabilistic proof of this theorem, by extending the celebrated proof of the Caro-Wei theorem from The Probabilistic by Alon and Spencer [Theorem 1, page 91]. It is important to mention that this is not the first time when this kind of idea is attempted. In fact, in the paper I mentioned above, Dutta, Mubayi and Subramanian generalized the proof of Caro-Wei from Alon and Spencer in an interesting (but rather complicated) way to show that there exists some positive constant d_{k} such that for every k+1 uniform hypergraph H, we have

\displaystyle \alpha(H) \geq d_{k} \sum_{v \in V(H)} \frac{1}{(\deg(v) + 1)^{1/k}}.

This is an easy (and valuable) consequence of the Caro-Tuza theorem, but it seems that the method of Dutta, Mubayi and Subramanian does not quite yield the precise version of Theorem 1. We present an alternative (simpler) approach which achieves this feature.

Probabilistic proof of the Caro-Tuza-Wei Inequality

For each v \in V, let \xi_v be an independent uniform random variable in [0,1]. Let \xi denote the |V|-tuple (\xi_{v})_{v \in V} \in [0,1]^{|V|}. For every edge e\in E in our k-uniform hypergraph H, remove a vertex v from e for which \xi_v=\max_{u\in e} \xi_u. Let S_{\xi} denote the remaining set of vertices. It is easy to check that S_{\xi} is an independent set in H. We claim that the expected value of S_{\xi} over all choices of \xi = (\xi_{v})_{v \in V} in [0,1]^{|V|} is at least the RHS expression from Theorem 1.

Let’s check this. For each vertex v \in V, we estimate the probability that v remains in S_{\xi}. Denote the degree of v by m. For a fixed value \tau \in [0,1] of \xi_v, and a given edge e containing v, the probability that e does not remove v is equal to the probability that there exists u\in e with \xi_u>\tau, which equals 1-\tau^k. Clearly these events for different edges have positive correlation: if some of them hold, it may only help the others to hold. For a linear (k+1)-uniform hypergraph H (which is a (k+1)-uniform hypergraph with the extra property that every two edges have at most one common vertex) they are truly independent. Thus the probability that they hold simultaneously is not less than (1-\tau^k)^{m}, with equality in the linear case. Therefore, the probability that v survives in S_{\xi} is at least the integral \int_0^1 (1-\tau^k)^m d\tau, which we can compute as follows. After two changes of variables, note that

\displaystyle \int_0^1 (1-\tau^k)^m d\tau = \frac1t\int_0^1 (1-\theta)^m\theta^{1/k-1}d\theta = \frac1t \cdot B(m+1,1/k),

where the function B(x,y) stands for the usual beta function

\displaystyle B(x,y) = \int_{0}^{1} \Theta^{x-1}(1-\Theta)^{y-1} d\Theta,

defined for all complex numbers x and y with positive real parts. If

\displaystyle \Gamma(z) = \int_{0}^{\infty} \Theta^{z-1}e^{-\Theta} d\Theta

denotes the standard gamma function, then it is well-known and easy to check that

\displaystyle B(x,y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}.

Indeed, one can write

\displaystyle \Gamma(x)\Gamma(y)= \int_{0}^{\infty} \Theta^{x-1}e^{-\Theta}d\Theta \int_{0}^{\infty} \theta^{y-1}e^{-\theta}d\theta = \int_{0}^{\infty}\int_{0}^{\infty}\Theta^{x-1}\theta^{y-1}e^{-(\Theta+\theta)}d\Theta d\theta.

Using the substitutions \Theta = vt and \theta=v(1-t), we have

\displaystyle \Gamma(x)\Gamma(y) = \int_{0}^{1}t^{x-1}(1-t)^{y-1}dt \int_{0}^{\infty} v^{x+y-1}e^{-v}dv.

The latter expression is precisely B(x,y)\Gamma(x+y). It thus follows that

\displaystyle \frac{1}{t} \cdot B(m+1,1/t) = \frac{1}{t} \cdot \frac{\Gamma(m+1)\Gamma(1/t)}{\Gamma(m+1+1/t)} = \frac{1}{{m+1/t \choose m}}.

Putting these together, we conclude that for each v \in V,

\displaystyle \mathbb{P}\left(v\ \text{survives in}\ S_{\xi}\right) \geq \int_0^1 (1-\tau^k)^m d\tau = \frac1{{m+1/t\choose m}}.

By linearity of expectation, it then follows that

\displaystyle \mathbb{E}_{\xi \in \mathbb{T}^{|V|}}[|S_{\xi}|] = \sum_{v \in V} \mathbb{P}\left(v\ \text{survives in}\ S_{\xi}\right) \geq \sum_{v\in V} \frac{1}{{\deg(v)+1/k\choose\deg(v)}},

so there is some choice of \xi = (\xi_{v})_{v \in V} \in [0,1]^{|V|} for which the corresponding independent set S:=S_{\xi} has size at least \sum_{v\in V} \frac{1}{{\deg(v)+1/k\choose\deg(v)}}. This concludes the proof.