Game Theory Note
Static Game
Strategy form game matrix
Strategies 1 (col player) | Strategies 2 (col player) | |
Strategies 1 (row player, “you”) | row player / column player payoff | … |
Strategies 2 (row player, “you”) | … | … |
e.g., the prisoner’s dilemma
Don’t confess (col player) | confess (col player) | |
Don’t confess (row player, “you”) | -1 (row), -1 (col) | -10, 0 |
confess (row player, “you”) | 0, -10 | -4, -4 |
- Strictly dominant strategy for player p: No matter what strategy the other play choose, player p should choose this strategy
- e.g., in prisoner’s dilemma,
- if your partner choose not to confess (see column 1), you should choose confess (0 > -1)
- if your partner choose to confess (see column 2), you should still choose confess (-4 > -10)
- Therefore confessing is a strictly dominant strategy for you.
- Similarly, confessing is also a strictly dominant strategy for the other player.
- e.g., in prisoner’s dilemma,
- Best response to strategy S: Assume that the other play choose strategy S, the strategy that has the largest payoff for you is the best response to S.
- A dominate strategy for player p is a best response to every strategy of the other player (it is all possible that no player, one player and both two players have dominate strategies)
- If one player has a dominate strategy, the other player can assume that this player will play the dominate strategy.
- Nash equilibrium: if player p has a strategy S and player q has a strategy T, we say (S, T) is a Nash equilibrium if S is a best response to T and T is a best response to S.
- At Nash equilibrium, no one want to unilaterally (done only by one player, not both) switch to another strategy.
- A game can have no Nash equilibrium or multiple Nash equilibrium
- Game:
- A set of players $i \in \{1, \cdots, N\}$
- Each player $i$ have a set of strategies $s_i \in S_i$, $s \in S = \Pi_i S_i$ for all players.
- Payoff/utility function: For each combination of strategies (e.g., pair $(s_1, s_2)$ in two-player game), each player $i$ receives a payoff $u_i(s_1, s_2)$ (e.g., for pair $(\text{confess}, \text{not confess})$, player 1 may receive payoff 0, $u_1(\text{confess}, \text{not confess}) = 0$ and player 2 may receive payoff -10, $u_2(\text{confess}, \text{not confess}) = -10$). $u_i$ is called the utility function of player $i$.
- Can be extended to n-player game
- Pure & Mixed Strategy: a player $i$ choose a strategy $s$ deterministically or with probability $\sigma_i(s)$
- $\Sigma_i$ denotes all the possible probability distribution of $S_i$, $\sigma_i \in \Sigma_i$ for player $i$. $\sigma \in \Sigma = \Pi_i \Sigma_i$ for all players ($\sigma$ is called mixed strategy profile).
- Use expected value of payoffs to evaluate other player’s strategy (to find the best response of mixed strategy). Extend the payoff function from $u_i(s)$ to $u_i(\sigma) = \Pi_{s \in S} u_i(s)\sigma(s)$
- $B_i(\sigma_{-i})$: Best Response of player $i$ to other players’ strategy $\sigma_{-i}$
- For player $i$, given other players’ strategy $\sigma_{-i}$ fixed, $B_i(\sigma_{-i}) = \arg\max_{\sigma_i} u_i(\sigma_i, \sigma_{-i})$ is a set of mixed strategy $\sigma_i$ for player $i$ that maximize payoff function $u_i(\sigma_i, \sigma_{-i})$.
- Nash equilibrium for mixed strategy
- A mixed strategy profile $\sigma^* = (\sigma^\ast_1, \cdots, \sigma^\ast_N)$ is a Nash equilibrium if for all $i \in \{1, \cdots, N\}$, $\sigma^\ast_i \in B_i(\sigma^\ast_{-i})$.
- We can write it in a more compact way: denote $B(\sigma) = [B_1(\sigma_{-1}) \times \cdots \times B_N(\sigma_{-N})]$, then a mixed strategy profile $\sigma^\ast = (\sigma^\ast_1, \cdots, \sigma^\ast_N)$ is a Nash equilibrium if $\sigma^\ast \in B(\sigma^\ast)$.
- Theorem: Every finite game has a mixed strategy Nash equilibrium
- Kakutani’s Fixed Point Theorem
- Pareto optimality
- A strategy profile is Pareto optimal if no player can gain higher payoff without sacrificing others’ payoffs.
- Social optimality: maximizes the sum of the players‘ payoffs
- Solution as a result of learning
- Assumption: expects that his opponent’s output in the future will be the same as it is now
- Asymptotically stable if converges to a particular steady state (a single NE) for all initial state close to it
- Global stable if from every starting state
- Not always converge – circling effect
Extensive Game
Example:
- player 1 has choice C and D
- player 2 has choice E (2, 1) / F (3, 0) for C
- player 2 has choice G (0, 2) / H (1, 3) for D.
EG | EH | FG | FH | |
C | 2, 1 | 2, 1 | 3, 0 | 3, 0 |
D | 0, 2 | 1, 3 | 0, 2 | 1, 3 |
- EG ← (E or F) and (G or H)
- if you choose C, I choose E, if you choose D, I choose G
- if player 1 have 3 choices C, D and I, then we need to use three letters to denote a strategy of player 2
- Subgame Perfect (Nash) Equilibrium
- A strategy profile is a subgame perfect equilibrium if it is an equilibrium for every subgame
- subgame perfection will remove noncredible threats
Potential Game
Use one global function $\Phi(s): S \rightarrow R $ to replace $N$ payoff functions $u_i(s)$ for each player $i$.
$\Phi$ assigns a real value for every $s \in S$
- ordinal potential function: $u_i(x, s_{−i}) − u_i(z, s_{−i}) > 0 \Leftrightarrow \Phi(x, s_{−i}) − Φ(z, s_{−i}) > 0, \forall x, z \in S_i$
- exact potential function: $u_i(x, s_{−i}) − u_i(z, s_{−i}) = \Phi(x, s_{−i}) − Φ(z, s_{−i}) > 0, \forall x, z \in S_i$
- Every finite ordinal potential game has at least one pure strategy Nash equilibrium.
- The global maximum of an ordinal potential function is a pure strategy Nash equilibrium
- there may also be other pure strategy Nash equilibria corresponding to local maxima
- Path / improvement path
- In every finite ordinal potential game, every improvement path is finite.
- Congestion Games
- Every congestion game is a potential game
Learning Nash Equilibria
2-person zero-sum games ($u_1(s_1, s_2) == -u_2(s_1, s_2)$)
M (prob: $y_1$) | T (prob: $y_2$) | |
E (prob: $x_1$) | 3, -3 | -1, 1 |
S (prob: $x_2$) | -2, 2 | 1, -1 |
- Minimax Game: minimize other player’s best response (max) payoff
- For column player, expected payoff $V(M) = -3 x_1 + 2 x_2$, $V(T) = x_1 - x_2$
- So if $x_1, x_2$ is given, column player will choose the larger one of $V(M)$ and $V(T)$ as best response, his payoff will be $\max(-3 x_1 + 2 x_2, x_1 - x_2)$
- Row player should choose $x_1$ and $x_2$ so that column player’s payoff $\max(-3 x_1 + 2 x_2, x_1 - x_2)$ is minimized, that is, $\arg\min_{x_1, x_2} \max(-3 x_1 + 2 x_2, x_1 - x_2)$
- Identially, maximize his own payoff $-\max(-3 x_1 + 2 x_2, x_1 - x_2) = \min(3 x_1 - 2 x_2, - x_1 + x_2)$
- Min/max operation can be represented by linear programming by using an auxilary variable $z$ and constraint all components larger/smaller than $z$.
-
So the linear programming problem is (note that $z$ should be unbounded so we need to add auxilary variable $z’$ so that $z - z’$ is unbounded, $x_1 + x_2 = 1$ is equalty constraint so we need to transit it to two inequality constraint $x_1 + x_2 \leq 1$ and $x_1 + x_2 \geq 1$)
\[\begin{align*} \max \quad & z \\ s.t. \quad 3 x_1 - 2 x_2 &\geq z \\ - x_1 + x_2 &\geq z \\ x_1 + x_2 &= 1 \\ x_1, x_2 &\geq 0 \\ \max \quad & (0, 1, 1, -1, 0, 0)(x_1, x_2, z, z', s_1, s_2)^T \\ s.t. \quad &\begin{pmatrix} 3 & -2 & -1 & 1 & -1 & 0 \\ -1 & 1 & -1 & 1 & 0 & -1\\ 1 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}\begin{pmatrix}x_1\\x_2\\z\\z'\\s_1\\s_2\end{pmatrix} = \begin{pmatrix}0\\0\\1\end{pmatrix}\\ & (x_1, x_2, z, z', s_1, s_2) \geq 0 \\ \max \quad & (0, 1, 1, -1)(x_1, x_2, z, z')^T \\ s.t. \quad &\begin{pmatrix} 3 & -2 & -1 & 1 \\ -1 & 1 & -1 & 1 \\ 1 & 1 & 0 & 0 \\ -1 & -1 & 0 & 0 \\ \end{pmatrix}\begin{pmatrix}x_1\\x_2\\z\\z'\end{pmatrix} \geq \begin{pmatrix}0\\0\\1\\1\end{pmatrix}\\ & (x_1, x_2, z, z') \geq 0 \end{align*}\] -
Similarly, for the column player, we calculate $\arg\min_{y_1, y_2} \max(3 y_1 - y_2, -2 y_1 + y_2)$
\[\begin{align*} \max \quad & w \\ s.t. \quad -3 y_1 + y_2 &\geq w \\ 2 y_1 - y_2 &\geq w \\ y_1 + y_2 &= 1 \\ y_1, y_2 &\geq 0 \\ \max \quad & (0, 0, 1) (y_1, y_2, w)^T \\ s.t. \quad & \begin{pmatrix} 3 & -1 & 1 \\ -2 & 1 & 1 \\ -1 & -1 & 0 \\ 1 & 1 & 0 \\ -1 & 0 & 0 \\ - & -1 & 0 \end{pmatrix} \begin{pmatrix}y_1\\y_2\\w\end{pmatrix} \leq 0\\ \max \quad & (0, 0, 1, -1) (y_1, y_2, w, w')^T \\ s.t. \quad & \begin{pmatrix} 3 & -1 & 1 & -1 \\ -2 & 1 & 1 & -1 \\ -1 & -1 & 0 & 0\\ 1 & 1 & 0 & 0\\ \end{pmatrix} \begin{pmatrix}y_1\\y_2\\w\\w'\end{pmatrix} \leq 0\\ & (y_1, y_2, w, w') \geq 0 \end{align*}\] - Duality
- General-sum game: linear complementarity problem
- Algorithm to solve LCP: the Lemke–Howson algorithm