Apply Hoeffding to figure out the probability a poll of 1000 people is more than 5% off in its estimate. Then specify the rule of thumb for picking a sample size to get within a certain error.
\(\mathbb{P}[|X-\mu| \ge 50] \le 2e^{-\frac{2 \cdot 50^2}{1000}} = 2e^{-5} \approx 0.013\), so we are within 5% additive error with probability around 99%. A good rule of thumb is that if you have \(n\) independent binary random variables, you should expect an error on the estimate of the expected value of around \(\frac{1}{\sqrt{n}}\) (as a fraction, i.e. \(\frac{100}{\sqrt{n}}\) percent). So, to get an error of 5%, solve \(\frac{1}{\sqrt{n}} = 0.05\), i.e. \(\sqrt{n}=20\), \(n=400\), then make \(n\) a bit bigger, so a sample of around 500 suffices. In general it's good to triple the value you get, but for a large allowed error (5% is large), it's OK to just make it a little bigger. ------ (Note that Chernoff bounds can give tighter results when \(\mathbb{E}[X_i]\) is closer to 0 or 1, but this is essentially the best you can do when the expected value is around 0.5.) |
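As a sanity check, here is a minimal simulation sketch (assuming numpy is available and a 50/50 population) that estimates how often a 1000-person poll is off by more than 5% and compares it to the Hoeffding bound \(2e^{-5}\); the names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, trials = 1000, 0.5, 100_000

# Each trial: poll n people, record the empirical fraction answering "yes".
polls = rng.binomial(n, p, size=trials) / n

# Fraction of polls whose estimate is off by more than 5% (additive).
empirical = np.mean(np.abs(polls - p) > 0.05)

hoeffding = 2 * np.exp(-2 * n * 0.05**2)  # equals 2 e^{-2 (0.05 n)^2 / n} = 2 e^{-5}
print(f"empirical failure probability: {empirical:.4f}")  # noticeably smaller
print(f"Hoeffding bound:               {hoeffding:.4f}")  # ~0.0135
```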
Average Case Analysis
For example, we showed that Max Cut in the \(G(n,\frac{1}{2})\) model has an average-case \(1-o(1)\) approximation. In particular, we showed that there is an algorithm which gets a guarantee of \(1-o(1)\) with probability close to 1 on a graph sampled from \(G(n,\frac{1}{2})\). |
Carathéodory's Theorem
If a point \(x \in \mathbb{R}^n\) lies in the convex hull of a set of points, then \(x\) can be written as a convex combination of at most \(n+1\) of those points \(v_1,\dots,v_{n+1}\). In other words, we have \(x = \sum_{i=1}^{n+1} \lambda_i v_i\) where \(\sum_{i=1}^{n+1} \lambda_i = 1\) and \(\lambda_i \ge 0\) for all \(i\). |
Chebyshev's Inequality
\(\mathbb{P}[|X-\mathbb{E}[X]| \ge t] \le \frac{\text{Var}(X)}{t^2}\) for any \(t > 0\). A slightly easier version to remember: \(\mathbb{P}[|X-\mathbb{E}[X]| \ge k\cdot \sigma] \le \frac{1}{k^2}\), where \(\sigma\) is the standard deviation of \(X\). In words: the probability \(X\) is more than \(k\) standard deviations away from its mean is at most \(\frac{1}{k^2}\). |
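A quick numerical illustration (a sketch assuming numpy; the binomial parameters are arbitrary) comparing the Chebyshev bound \(\frac{1}{k^2}\) with the actual tail probability:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 0.5
mean, sigma = n * p, np.sqrt(n * p * (1 - p))  # mean 50, standard deviation 5

samples = rng.binomial(n, p, size=200_000)
for k in (1, 2, 3):
    empirical = np.mean(np.abs(samples - mean) >= k * sigma)
    print(f"k={k}: empirical {empirical:.3f} <= Chebyshev bound {1/k**2:.3f}")
```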
Christofides' Algorithm for Metric TSP
Find a minimum spanning tree \(T\) of the graph: this gives connectivity. Then, compute the minimum-cost perfect matching on the nodes with odd degree in \(T\) and add it to \(T\): this makes all the vertices even degree. Finally, find an Eulerian circuit of the resulting multigraph and shortcut repeated vertices (valid by the triangle inequality) to obtain a tour, as sketched below. This is a \(\frac{3}{2}\)-approximation. |
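For concreteness, here is a hedged sketch of the whole pipeline using networkx (assumed available); it assumes the input is a complete graph whose 'weight' attribute satisfies the triangle inequality, the function name `christofides_sketch` is ours, and it is illustrative rather than an optimized implementation.

```python
import networkx as nx

def christofides_sketch(G):
    """Sketch of Christofides' 3/2-approximation on a complete metric graph G."""
    # 1. Minimum spanning tree gives connectivity.
    T = nx.minimum_spanning_tree(G, weight="weight")

    # 2. Min-cost perfect matching on the odd-degree vertices of T.
    odd = [v for v in T.nodes if T.degree(v) % 2 == 1]
    M = nx.min_weight_matching(G.subgraph(odd), weight="weight")

    # 3. Tree + matching has all degrees even, so it has an Eulerian circuit.
    H = nx.MultiGraph(T)
    H.add_edges_from(M)
    circuit = nx.eulerian_circuit(H, source=next(iter(G.nodes)))

    # 4. Shortcut repeated vertices (valid by the triangle inequality).
    tour, seen = [], set()
    for u, _ in circuit:
        if u not in seen:
            seen.add(u)
            tour.append(u)
    return tour + tour[:1]  # close the tour by returning to the start
```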
Convex function
A function \(f\) is convex if \(f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y)\) for all \(x,y\) and \(\lambda \in [0,1]\). Another useful characterization: a twice-differentiable function is convex if and only if its Hessian is positive semi-definite everywhere. Remember the Hessian \(\nabla^2 f\) is the matrix of second partial derivatives with \((\nabla^2 f)_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}\). |
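As a small illustration (a sketch assuming numpy; the matrix \(Q\) is an arbitrary example), one can check convexity of \(f(x) = \frac{1}{2}x^TQx + b^Tx\) by testing whether its Hessian \(Q\) has any negative eigenvalues:

```python
import numpy as np

# f(x) = 0.5 x^T Q x + b^T x has Hessian exactly Q.
Q = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # symmetric
eigenvalues = np.linalg.eigvalsh(Q)
print(eigenvalues)                               # [1. 3.]
print("convex:", np.all(eigenvalues >= -1e-9))   # PSD Hessian => convex
```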
Convex set
A set \(S\) is convex if for all \(x,y \in S\) and \(\lambda \in [0,1]\), the point \(\lambda x + (1-\lambda)y\) is also in \(S\); that is, \(S\) contains the line segment between any two of its points. In this class, the primary example of a convex set will be a polyhedron: the intersection of finitely many half-spaces. |
Covariance matrix
The covariance matrix of a random vector \(x \in \mathbb{R}^n\) is \[\mathbb{E}[(x-\mathbb{E}[x])(x-\mathbb{E}[x])^T]\] If the expected value of \(x\) is the 0 vector, this simplifies to \[\mathbb{E}[xx^T]\] |
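A minimal numpy sketch estimating \(\mathbb{E}[xx^T]\) from samples of a mean-zero random vector (the covariance matrix below is an assumed example; `np.cov` computes essentially the same quantity):

```python
import numpy as np

rng = np.random.default_rng(2)
true_cov = np.array([[2.0, 0.5],
                     [0.5, 1.0]])

# Draw mean-zero samples with the given covariance; rows are samples.
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=100_000)

# Empirical E[x x^T]: average of the outer products x x^T.
empirical_cov = samples.T @ samples / len(samples)
print(np.round(empirical_cov, 2))  # close to true_cov
```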
Definition of Approximation Algorithm
An \(\alpha\)-approximation algorithm runs in polynomial time and, for every input, produces a solution whose cost is within a factor of \(\alpha\) of the optimum. A randomized \(\alpha\)-approximation instead produces a solution of expected cost within a factor of \(\alpha\) of the optimum for every input. |
Eigenvalue and eigenvector
An eigenvector of \(A\) is a nonzero vector \(x \in \mathbb{R}^n\) such that \(Ax = \lambda x\) for some \(\lambda \in \mathbb{R}\). We naturally say \(\lambda,x\) form an eigenvalue, eigenvector pair if \(Ax=\lambda x\) (with \(x \not= 0\)). |
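A quick numpy check of the definition (a sketch with an arbitrary symmetric matrix): `np.linalg.eigh` returns eigenvalue/eigenvector pairs, and we can verify \(Av = \lambda v\) directly.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)  # columns are eigenvectors

lam, v = eigenvalues[0], eigenvectors[:, 0]
print(np.allclose(A @ v, lam * v))  # True: Av = lambda v
```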
Ellipsoid method and separation oracle
The ellipsoid method is a polynomial time algorithm that can optimize over any convex set for which we have a separation oracle. (In particular, it can solve any LP whose feasible region we have a separation oracle for.) In this course, \(P\) will typically be a convex polyhedron defined by a (possibly exponential size) collection of hyperplanes, and our separation oracle will return one of these hyperplanes that the query point violates. |
Expected Value and Linearity of Expectation
\[\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega) \cdot \mathbb{P}(\omega)\]Alternatively, \(\mathbb{E}[X] = \sum_{x \in R_X} x \cdot \mathbb{P}[X=x]\), where \(R_X\) is the range of \(X\). This is often more useful. For a continuous random variable with probability density function \(f\), \[\mathbb{E}[X] = \int_{-\infty}^\infty x \cdot f(x) dx\]Linearity of expectation means that for any random variables \(X_1,\dots,X_n\) (not necessarily independent), we have \[\mathbb{E}\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n \mathbb{E}[X_i]\] |
Hoeffding's Inequality
For independent random variables \(X_1,\dots,X_n\) with \(a_i \le X_i \le b_i\) and \(X = \sum_{i=1}^n X_i\): \[\mathbb{P}[|X - \mathbb{E}[X]| \ge t] \le 2e^{-\frac{2t^2}{\sum_{i=1}^n(b_i-a_i)^2}}\] |
Integer Linear Program and Linear Program Relaxation
An integer linear program is a linear program with the added constraint that the variables must take integer values. Its LP relaxation drops the integrality constraints; the relaxation's optimal value is at least as good as the integer program's, and it can be solved in polynomial time. A natural extension is to convex programs: here we only need the feasible region to be a convex set and the objective function to be convex. |
Integrality Gap
The integrality gap of an LP relaxation is the worst-case ratio, over all instances, between the optimal value of the integer program and the optimal value of its LP relaxation. An algorithm analyzed by comparing its solution only to the LP optimum cannot prove an approximation ratio better than the integrality gap. |
Law of Large Numbers
Let \(X_1, X_2, \dots\) be i.i.d. random variables with mean \(\mu\), and let \(\overline{X_n} = \frac{1}{n}\sum_{i=1}^n X_i\); then \(\overline{X_n}\) converges to \(\mu\) as \(n \to \infty\). There are two ways to state this convergence. One is the weak law, which is the only one you need to know. This says that for all \(\epsilon > 0\), \(\lim_{n \to \infty} \mathbb{P}[|\overline{X_n}-\mu| > \epsilon] = 0\). ------ In case you're curious, the other is the strong law, which is in fact slightly different. It can be stated \(\mathbb{P}[\lim_{n \to\infty} \overline{X_n} = \mu] = 1\). This means that the probability that the sequence of averages converges to \(\mu\) is 1. There can be some realizations that do not converge to \(\mu\), for example the all-zeros sequence \(0,0,0,\dots\) for independent Bernoullis with success probability strictly between 0 and 1, but the strong law says the probability of obtaining such a realization is 0. The weak law does not preclude the possibility that the realization of the sequence \(X_1,X_2,\dots\) you obtain fails to have \(\mu\) as the limit of its averages: the average could jump to values much larger than \(\mu\) infinitely many times. The strong law says that with probability 1 this does not occur. |
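A short simulation sketch of the weak law for fair coin flips (numpy assumed; the choices of \(\epsilon\) and the sample sizes are illustrative): the running average \(\overline{X_n}\) concentrates around \(\mu = 0.5\) as \(n\) grows.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, eps = 0.5, 0.01

for n in (100, 10_000, 1_000_000):
    # 200 independent runs of n fair coin flips each; record each run's average.
    averages = rng.binomial(n, mu, size=200) / n
    print(n, np.mean(np.abs(averages - mu) > eps))  # tends to 0 as n grows
```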
Linear algebraic view of a vertex of a polyhedron
A point \(x\) is a vertex of the polyhedron \(\{x : Ax \ge b\}\) if and only if there is a set of \(n\) linearly independent rows of \(A\), forming a submatrix \(\tilde{A}\) (with corresponding right-hand side \(\tilde{b}\)), such that \(x\) is the unique solution to \(\tilde{A}x = \tilde{b}\). It is useful to notice that the subset of rows picked in \(\tilde{A}\) must be tight constraints. |
Markov's Inequality
For a non-negative random variable \(X\) and any \(t > 0\), \(\mathbb{P}[X \ge t] \le \frac{\mathbb{E}[X]}{t}\). Another slightly more memorable way to write this is \(\mathbb{P}[X \ge k \cdot \mathbb{E}[X]] \le \frac{1}{k}\), or in words, the probability a non-negative random variable is at least \(k\) times its expectation is at most \(\frac{1}{k}\). |
Metric space
A metric space is a set \(M\) together with a distance function \(d : M \times M \to \mathbb{R}\) satisfying:
1. \(d(x,x) = 0\) for all \(x \in M\): all points have distance 0 to themselves.
2. If \(x \not= y\) then \(d(x,y) > 0\) (distances between distinct points can be arbitrarily small, and this condition is sometimes relaxed to allow a distance of 0).
3. \(d(x,y) = d(y,x)\) for all \(x,y \in M\).
4. \(d\) obeys the triangle inequality: \(d(x,z) \le d(x,y) + d(y,z)\) for all \(x,y,z \in M\).
A standard example of a metric space is Euclidean space, where points are tuples \((x_1,x_2) \in \mathbb{R}^2\) and \(d((x_1,x_2),(y_1,y_2)) = \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2}\). |
Minimum spanning tree
A minimum spanning tree (MST) can be found using a greedy procedure called Kruskal's algorithm: repeatedly add the cheapest remaining edge to your MST as long as it does not create a cycle. Stop when every remaining edge would create a cycle, at which point you must have a spanning tree if the original graph was connected. |
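A self-contained sketch of Kruskal's algorithm with a union-find structure (plain Python; the function name `kruskal_mst` and the edge-list format are our own choices for illustration):

```python
def kruskal_mst(num_vertices, edges):
    """edges: list of (weight, u, v); returns the list of MST edges."""
    parent = list(range(num_vertices))

    def find(x):                          # find root with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for weight, u, v in sorted(edges):    # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                      # adding (u, v) creates no cycle
            parent[ru] = rv
            mst.append((weight, u, v))
    return mst

# Example: a 4-cycle with one heavy diagonal.
print(kruskal_mst(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (10, 0, 2)]))
# [(1, 0, 1), (2, 1, 2), (3, 2, 3)]
```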
PSD Matrix
A symmetric matrix \(A \in \mathbb{R}^{n \times n}\) is positive semi-definite (PSD) if any (equivalently, all) of the following hold: (1) \(A\) has no negative eigenvalues; (2) \(x^TAx \ge 0\) for all \(x \in \mathbb{R}^n\); (3) \(A\) has a square root \(A^{1/2}\) so that \(A^{1/2}(A^{1/2})^T = A\). |
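A numpy sketch checking that the three conditions agree on a small example matrix (an assumed illustration; the square root here is built from the eigendecomposition):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# (1) no negative eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(A)
print("eigenvalues:", eigenvalues)                   # [1. 3.]

# (2) x^T A x >= 0 for many random x
xs = np.random.default_rng(4).normal(size=(1000, 2))
quad = np.einsum('ij,jk,ik->i', xs, A, xs)           # each entry is x^T A x
print("all quadratic forms nonnegative:", np.all(quad >= -1e-9))

# (3) square root: A^{1/2} = V diag(sqrt(lambda)) V^T, so A^{1/2}(A^{1/2})^T = A
root = eigenvectors @ np.diag(np.sqrt(eigenvalues)) @ eigenvectors.T
print(np.allclose(root @ root.T, A))                 # True
```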
PTAS, FPTAS, and APX
A polynomial time approximation scheme (PTAS) is an algorithm that, for every fixed \(\epsilon > 0\), gives a \((1+\epsilon)\)-approximation in time polynomial in the input size (the exponent may depend on \(\epsilon\), e.g. \(n^{1/\epsilon}\)). A fully polynomial time approximation scheme (FPTAS) is a \((1+\epsilon)\)-approximation for a problem that runs in time polynomial in the input size and \(\frac{1}{\epsilon}\). For example, \(\frac{n^3}{\epsilon}\). For maximization problems the \(1+\epsilon\) would be replaced with \(1-\epsilon\). A problem is in APX ("approximable") if it has a constant-factor approximation algorithm. A problem is APX-Hard if it has no PTAS unless P=NP. |
Polyhedron and polytope and their matrix form
We will typically write a polyhedron as the set of points \(x \in \mathbb{R}^n\) obeying the inequalities \(Ax \ge b\) for some matrix \(A \in \mathbb{R}^{m \times n}\) and vector \(b \in \mathbb{R}^m\). Each row of \(A\), together with the corresponding entry of \(b\), represents one of the halfspaces defining the polyhedron. A polytope is a bounded polyhedron. |
Proof that every PSD matrix \(Y\) has a distribution with covariance matrix \(Y\)
Let \(Y^{1/2}\) be a square root of \(Y\) (which exists since \(Y\) is PSD), let \(r\) be a vector of independent entries with mean 0 and variance 1 (so \(\mathbb{E}[rr^T] = I\)), and set \(x = Y^{1/2}r\). Then \[\mathbb{E}[xx^T] = \mathbb{E}[Y^{1/2}rr^T(Y^{1/2})^T]=Y^{1/2}\,\mathbb{E}[rr^T]\,(Y^{1/2})^T=Y^{1/2}(Y^{1/2})^T=Y\] where we used that \(\mathbb{E}[rr^T] = I\). |
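The proof translates directly into a sampling sketch (numpy assumed; the matrix \(Y\) below is an arbitrary PSD example): draw \(r\) with independent standard normal entries, set \(x = Y^{1/2}r\), and verify the empirical covariance is close to \(Y\).

```python
import numpy as np

Y = np.array([[3.0, 1.0],
              [1.0, 2.0]])                     # a PSD matrix

# Y^{1/2} built from the eigendecomposition of Y.
eigenvalues, eigenvectors = np.linalg.eigh(Y)
Y_half = eigenvectors @ np.diag(np.sqrt(eigenvalues)) @ eigenvectors.T

rng = np.random.default_rng(5)
r = rng.normal(size=(2, 100_000))              # columns are samples with E[rr^T] = I
x = Y_half @ r                                 # x = Y^{1/2} r

empirical = x @ x.T / x.shape[1]               # empirical E[xx^T]
print(np.round(empirical, 2))                  # close to Y
```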
Pseudopolynomial time
An algorithm runs in pseudopolynomial time if its running time is polynomial in the numeric values appearing in the input (e.g. the magnitudes of the weights), rather than in the number of bits needed to write them down. |
Rank-Nullity Theorem
For a matrix \(A\) with \(n\) columns, \[\text{rank}(A) + \text{nullity}(A) = n\]where the rank of a matrix is the maximum number of linearly independent rows (or columns) and the nullity is the maximum number of linearly independent vectors \(x\) for which \(Ax=0\) (in other words, the dimension of the null space of \(A\)). |
Relax and round framework
Formulate the problem as an integer program, relax it to a linear program, solve the LP optimally in polynomial time, and then round the (possibly fractional) LP solution to an integral solution while bounding the cost lost in rounding. This can also be applied to more general convex programs. |
Spectral Norm of a Symmetric Matrix
For a symmetric matrix \(A\), the spectral norm is \(||A||_2 = \max_i |\lambda_i|\), the largest eigenvalue in absolute value. We also can show: \[||A||_2 = \max_{x \in \mathbb{R}^n, x \not= 0} \frac{|x^TAx|}{x^Tx}\]so, for all \(x \in \mathbb{R}^n\), \[|x^TAx| \le ||A||_2 \cdot ||x||_2^2\] |
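A short numpy sketch (with an arbitrary symmetric example matrix) checking that \(||A||_2\) is the largest \(|\lambda_i|\) and that \(|x^TAx| \le ||A||_2 \cdot ||x||_2^2\) on random vectors:

```python
import numpy as np

A = np.array([[0.0, 2.0],
              [2.0, 1.0]])                       # symmetric

spectral_norm = np.max(np.abs(np.linalg.eigvalsh(A)))
print(np.isclose(spectral_norm, np.linalg.norm(A, 2)))   # True: matches the 2-norm

xs = np.random.default_rng(6).normal(size=(1000, 2))
quad = np.abs(np.einsum('ij,jk,ik->i', xs, A, xs))       # |x^T A x| for each row x
print(np.all(quad <= spectral_norm * (xs**2).sum(axis=1) + 1e-9))  # True
```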
Spectral Theorem
Every symmetric matrix \(A \in \mathbb{R}^{n \times n}\) has \(n\) real eigenvalues \(\lambda_1,\dots,\lambda_n\) with corresponding orthonormal eigenvectors \(v_1,\dots,v_n\), and can be written as \[A = \sum_{i=1}^n \lambda_i v_iv_i^T\] where recall \(v_1,\dots,v_n\) is orthonormal if \(\langle v_i, v_j\rangle = 0\) for \(i \not= j\) and \(\langle v_i, v_i \rangle = 1\). |
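A numpy sketch verifying the decomposition for a small symmetric matrix (an assumed example): `np.linalg.eigh` gives orthonormal eigenvectors, and summing \(\lambda_i v_iv_i^T\) recovers \(A\).

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])                 # symmetric

eigenvalues, V = np.linalg.eigh(A)              # columns of V are orthonormal eigenvectors
reconstruction = sum(lam * np.outer(V[:, i], V[:, i])
                     for i, lam in enumerate(eigenvalues))

print(np.allclose(V.T @ V, np.eye(3)))          # orthonormality
print(np.allclose(reconstruction, A))           # A = sum_i lambda_i v_i v_i^T
```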
Variance and Variance Properties
\(\text{Var}(X) = \mathbb{E}[(X-\mathbb{E}[X])^2] = \mathbb{E}[X^2] - \mathbb{E}[X]^2\). We have \(\text{Var}(\sum_{i=1}^n X_i) = \sum_{i=1}^n \text{Var}(X_i)\) if \(X_1,\dots,X_n\) are pairwise independent. Also, \(\text{Var}(aX+b) = a^2\text{Var}(X)\) for \(a,b \in \mathbb{R}\). |