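Definition 1
Let \(D \subset \mathbb{R}^n\) be an open set, \(f: D \to \mathbb{R}\) a function, and \(\boldsymbol{x}_0 \in D\). If there exists a punctured ball \(\check{B}_r(\boldsymbol{x}_0) = B_r(\boldsymbol{x}_0) \setminus \{\boldsymbol{x}_0\} \subset D\) such that \(f(\boldsymbol{x}) \ge f(\boldsymbol{x}_0)\) (respectively \(f(\boldsymbol{x}) > f(\boldsymbol{x}_0)\)) for every \(\boldsymbol{x} \in \check{B}_r(\boldsymbol{x}_0)\), then \(\boldsymbol{x}_0\) is called a (strict) local minimum point of \(f\), and \(f(\boldsymbol{x}_0)\) is called a (strict) local minimum value of \(f\). (Strict) local maximum points and (strict) local maximum values are defined analogously. Local minima and local maxima are collectively called extrema.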
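The example functions below are chosen here to illustrate the difference between strict and non-strict extrema:
\[
f(x,y) = x^2 + y^2, \qquad g(x,y) = x^2.
\]
The origin is a strict local minimum point of \(f\), since \(f(x,y) > 0 = f(0,0)\) for every \((x,y) \ne (0,0)\). The origin is also a local minimum point of \(g\), but not a strict one, since \(g(0,y) = 0 = g(0,0)\) for every \(y\).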
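Definition 2
Let \(D \subset \mathbb{R}^n\) be an open set, \(f: D \to \mathbb{R}\) a function, and \(\boldsymbol{x}_0 \in D\). If the partial derivatives \(\displaystyle \frac{\partial f}{\partial x_i} (\boldsymbol{x}_0)\ (i=1,2,\cdots,n)\) all exist and equal \(0\), then \(\boldsymbol{x}_0\) is called a stationary point of \(f\).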
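Theorem 1
Suppose a function \(f\) of \(n\) variables attains an extremum at \(\boldsymbol{a} = (a_1,\cdots,a_n)\), and the partial derivatives \(\displaystyle \frac{\partial f}{\partial x_i}(\boldsymbol{a})\ (i=1,2,\cdots,n)\) all exist. Then \(\boldsymbol{a}\) is a stationary point of \(f\).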
Proof. Without loss of generality, suppose \(f\) attains a local minimum at \(\boldsymbol{a}\). Then there is a ball \(B_{r}(\boldsymbol{a})\) such that for every \(\boldsymbol{x} \in B_r(\boldsymbol{a})\),
\[
f(\boldsymbol{x}) \ge f(\boldsymbol{a})
\]
Fix \(i \in \{1,2,\cdots,n\}\) and consider the function of the single variable \(t\)
\[
\varphi(t) = f(a_1,\cdots,a_{i-1},t,a_{i+1},\cdots,a_n)
\]
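Let \(t\) satisfy \(|t - a_i| < r\), and put \(\boldsymbol{x} = (a_1,\cdots,a_{i-1},t,a_{i+1},\cdots,a_n)\). Then \(\Vert \boldsymbol{x} - \boldsymbol{a}\Vert = |t - a_i| < r\), so \(\boldsymbol{x} \in B_r(\boldsymbol{a})\). Hence \(f(\boldsymbol{x}) \ge f(\boldsymbol{a})\), that is, \(\varphi(t) \ge \varphi(a_i)\), and so \(\varphi\) attains a local minimum at \(a_i\). Moreover, \(\varphi^\prime(a_i) = \displaystyle \frac{\partial f}{\partial x_i}(\boldsymbol{a})\) exists by hypothesis. Fermat's theorem therefore gives \(\varphi^\prime(a_i) = 0\), that is, \(\displaystyle \frac{\partial f}{\partial x_i}(\boldsymbol{a}) = 0\). Since \(i\) was arbitrary, \(\boldsymbol{a}\) is a stationary point of \(f\).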
Q.E.D.
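The proof reduces each partial derivative to the derivative of the one-variable restriction \(\varphi\). The Python sketch below carries out that reduction numerically. It freezes every coordinate except the \(i\)-th and estimates \(\varphi^\prime(a_i)\) with a central difference.

Several choices here are assumptions made for illustration:
- the test function f;
- the step size h;
- the helper name coordinate_derivative.

A finite-difference estimate of a vanishing derivative is only approximately zero.

```python
def coordinate_derivative(f, a, i, h=1e-5):
    """Central-difference estimate of phi'(a[i]), where phi(t) is f with
    every coordinate frozen at a except the i-th, which is replaced by t."""
    def phi(t):
        x = list(a)
        x[i] = t  # vary only the i-th coordinate, as in the proof
        return f(x)
    return (phi(a[i] + h) - phi(a[i] - h)) / (2 * h)


def f(x):
    # Illustrative function with a strict minimum at (1, -3):
    # u^2 + u*v + 2*v^2 is a positive definite quadratic form.
    u, v = x[0] - 1, x[1] + 3
    return u * u + u * v + 2 * v * v


a = [1.0, -3.0]
partials = [coordinate_derivative(f, a, i) for i in range(len(a))]
print(partials)  # both entries should be close to 0
```

Theorem 1 is only a necessary condition. A vanishing estimate here does not by itself show that a is an extremum point.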
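Theorem 2
Let \(\boldsymbol{x}_0\) be a stationary point of a function \(f\) of \(n\) variables, and suppose \(f\) has continuous second-order partial derivatives in some neighborhood of \(\boldsymbol{x}_0\).
(1) If the Hessian matrix \(Hf(\boldsymbol{x}_0)\) is positive (negative) definite, then \(\boldsymbol{x}_0\) is a strict local minimum (maximum) point of \(f\).
(2) If the Hessian matrix \(Hf(\boldsymbol{x}_0)\) is indefinite, then \(\boldsymbol{x}_0\) is not an extremum point of \(f\).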
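Proof. (1) Suppose \(Hf(\boldsymbol{x}_0)\) is positive definite. Since \(f\) has continuous second-order partial derivatives in a neighborhood of \(\boldsymbol{x}_0\), Theorem 5 of Part XII of these notes on derivatives of functions (the second-order Taylor formula) gives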
\[
f(\boldsymbol{x}_0 + \boldsymbol{h}) = f(\boldsymbol{x}_0) +
Jf(\boldsymbol{x}_0)\boldsymbol{h} + \frac{1}{2}\boldsymbol{h}^T
Hf(\boldsymbol{x}_0) \boldsymbol{h} + o(\Vert \boldsymbol{h} \Vert^2)
\quad (\boldsymbol{h} \to \boldsymbol{0})
\]
Since \(\boldsymbol{x}_0\) is a stationary point of \(f\), we have \(Jf(\boldsymbol{x}_0) = \boldsymbol{0}\), so the formula above becomes
\[
f(\boldsymbol{x}_0 + \boldsymbol{h}) - f(\boldsymbol{x}_0) =
\frac{1}{2} \boldsymbol{h}^T Hf(\boldsymbol{x}_0) \boldsymbol{h} +
o(\Vert \boldsymbol{h} \Vert^2) \tag {1}
\]
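Consider the vectors \(\boldsymbol{y}\) with \(\Vert \boldsymbol{y} \Vert = 1\); together they form the unit sphere \(\partial B_1(\boldsymbol{0})\). Since \(Hf(\boldsymbol{x}_0)\) is positive definite, every such \(\boldsymbol{y}\) satisfies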
\[
(y_1,\cdots,y_n) Hf(\boldsymbol{x}_0) \begin{bmatrix} y_1 \\ \vdots
\\ y_n \end{bmatrix} = \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i
\partial x_j}(\boldsymbol{x}_0) y_iy_j > 0
\]
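The left-hand side is a continuous function of \(\boldsymbol{y}\) on the unit sphere, and the unit sphere is a bounded closed set. The function therefore attains its minimum at some point of the sphere. Denote this minimum by \(m\); since the function is positive at that point, \(m > 0\). Hence, for every \(\boldsymbol{y}\) on the unit sphere,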
\[
\boldsymbol{y}^T Hf(\boldsymbol{x}_0) \boldsymbol{y} \ge m > 0
\]
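Consequently, for every \(\boldsymbol{h} \ne \boldsymbol{0}\), taking \(\boldsymbol{y} = \boldsymbol{h}/\Vert \boldsymbol{h} \Vert\) gives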
\[
\frac{1}{2} \boldsymbol{h}^T Hf(\boldsymbol{x}_0) \boldsymbol{h} =
\frac{1}{2} \Vert \boldsymbol{h} \Vert^2 \left(
\frac{\boldsymbol{h}^T}{\Vert \boldsymbol{h}\Vert} Hf(\boldsymbol{x}_0)
\frac{\boldsymbol{h}}{\Vert \boldsymbol{h} \Vert}\right) \ge \frac{m}{2}
\Vert \boldsymbol{h} \Vert^2
\]
Substituting this into (1) gives
\[
f(\boldsymbol{x}_0 + \boldsymbol{h}) - f(\boldsymbol{x}_0) \ge \Vert
\boldsymbol{h} \Vert^2 \left(\frac{m}{2} + o(1) \right) \quad (\boldsymbol{h} \to \boldsymbol{0})
\]
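Since \(\frac{m}{2} + o(1) > 0\) once \(\Vert \boldsymbol{h} \Vert\) is small enough, we get \(f(\boldsymbol{x}_0 + \boldsymbol{h}) > f(\boldsymbol{x}_0)\) for all sufficiently small \(\boldsymbol{h} \ne \boldsymbol{0}\). Thus \(\boldsymbol{x}_0\) is a strict local minimum point of \(f\). If \(Hf(\boldsymbol{x}_0)\) is negative definite, the same argument applied to \(-f\) shows that \(\boldsymbol{x}_0\) is a strict local maximum point.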
(2) Since \(Hf(\boldsymbol{x}_0)\) is indefinite, there exist \(\boldsymbol{p},\boldsymbol{q} \in \mathbb{R}^n\) such that
\[
\boldsymbol{p}^T Hf(\boldsymbol{x}_0) \boldsymbol{p} < 0 <
\boldsymbol{q}^T Hf(\boldsymbol{x}_0) \boldsymbol{q}
\]
Taking \(\boldsymbol{h} = \varepsilon \boldsymbol{p}\) and \(\boldsymbol{h} = \varepsilon \boldsymbol{q}\) in (1) gives, as \(\varepsilon \to 0\),
\[
\begin{aligned}
f(\boldsymbol{x}_0 + \varepsilon \boldsymbol{p}) -
f(\boldsymbol{x}_0) & = \frac{1}{2} (\boldsymbol{p}^T
Hf(\boldsymbol{x}_0) \boldsymbol{p}) \varepsilon^2 + o(\varepsilon^2) \\
&= \left( \frac{1}{2} \boldsymbol{p}^T Hf(\boldsymbol{x}_0)
\boldsymbol{p} + o(1) \right) \varepsilon^2 \\
f(\boldsymbol{x}_0 + \varepsilon \boldsymbol{q}) -
f(\boldsymbol{x}_0) &= \left( \frac{1}{2} \boldsymbol{q}^T
Hf(\boldsymbol{x}_0) \boldsymbol{q} + o(1) \right) \varepsilon^2
\end{aligned}
\]
Hence, for every sufficiently small \(\varepsilon \ne 0\),
\[
f(\boldsymbol{x}_0 + \varepsilon \boldsymbol{p}) <
f(\boldsymbol{x}_0) < f(\boldsymbol{x}_0 + \varepsilon
\boldsymbol{q})
\]
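For small enough \(\varepsilon\), the points \(\boldsymbol{x}_0 + \varepsilon\boldsymbol{p}\) and \(\boldsymbol{x}_0 + \varepsilon\boldsymbol{q}\) lie in any given neighborhood of \(\boldsymbol{x}_0\). So arbitrarily close to \(\boldsymbol{x}_0\), \(f\) takes values both smaller and larger than \(f(\boldsymbol{x}_0)\), and \(\boldsymbol{x}_0\) is not an extremum point of \(f\).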
Q.E.D.
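Theorem 2 turns the classification of a stationary point into a question about the eigenvalues of the symmetric matrix \(Hf(\boldsymbol{x}_0)\):
- all eigenvalues positive means positive definite;
- all negative means negative definite;
- mixed signs mean indefinite.

The Python sketch below illustrates this under several assumptions:
- the Hessian is approximated by central differences;
- the eigenvalues come from the classical cyclic Jacobi method;
- the function names, the step h, and the tolerance eps are hypothetical choices.

Results close to the tolerance should not be trusted.

```python
import math


def hessian(f, x0, h=1e-4):
    """Central-difference approximation of the Hessian matrix Hf(x0)."""
    n = len(x0)

    def value(j, dj, k, dk):
        # f evaluated at x0 shifted by dj along axis j and by dk along axis k
        x = list(x0)
        x[j] += dj
        x[k] += dk
        return f(x)

    H = [[(value(j, h, k, h) - value(j, h, k, -h)
           - value(j, -h, k, h) + value(j, -h, k, -h)) / (4 * h * h)
          for k in range(n)] for j in range(n)]
    # The exact Hessian is symmetric; average away rounding asymmetry.
    return [[(H[j][k] + H[k][j]) / 2 for k in range(n)] for j in range(n)]


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by the cyclic Jacobi method."""
    n = len(a)
    a = [list(row) for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Plane rotation chosen so that the new (p, q) entry is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P: update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A: update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def classify_stationary_point(H, eps=1e-6):
    """Apply Theorem 2 to a (numerical) Hessian at a stationary point."""
    eig = jacobi_eigenvalues(H)
    if all(v > eps for v in eig):
        return "strict local minimum"  # positive definite
    if all(v < -eps for v in eig):
        return "strict local maximum"  # negative definite
    if any(v > eps for v in eig) and any(v < -eps for v in eig):
        return "not an extremum"       # indefinite
    return "inconclusive"              # (near-)semidefinite: Theorem 2 is silent


# Illustrative functions, each with a stationary point at the origin.
def bowl(x):
    return x[0] ** 2 + x[1] ** 2 + x[2] ** 2 - x[0] * x[1] - x[1] * x[2]


def saddle(x):
    return x[0] ** 2 - x[1] ** 2


def quartic(x):
    return x[0] ** 4 + x[1] ** 4


for name, g, origin in [("bowl", bowl, [0.0] * 3), ("saddle", saddle, [0.0] * 2),
                        ("quartic", quartic, [0.0] * 2)]:
    print(name, classify_stationary_point(hessian(g, origin)))
```

The function \(x^4 + y^4\) has a zero Hessian at the origin, so Theorem 2 gives no conclusion there, even though the origin is a strict minimum. The sketch therefore reports semidefinite cases as inconclusive rather than guessing.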
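Theorem 3
Let \((x_0,y_0)\) be a stationary point of a function \(f\) of two variables, and suppose \(f\) has continuous second-order partial derivatives in some neighborhood of \((x_0,y_0)\). Set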
\[ a = \frac{\partial^2 f}{\partial x^2}(x_0, y_0) \quad b=\frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) \quad c=\frac{\partial^2 f}{\partial y^2}(x_0, y_0) \]
Then:
(1) if \(ac-b^2 > 0\) and \(a > 0\), then \(f\) has a strict local minimum at \((x_0,y_0)\);
(2) if \(ac-b^2 > 0\) and \(a < 0\), then \(f\) has a strict local maximum at \((x_0,y_0)\);
(3) if \(ac-b^2 < 0\), then \(f\) has no extremum at \((x_0,y_0)\).
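Proof. The second-order partial derivatives are continuous, so the mixed partials agree and
\[ Hf(x_0,y_0) = \begin{bmatrix} a & b \\ b & c \end{bmatrix}, \]
whose determinant is \(ac-b^2\). By Sylvester's criterion, this matrix is positive definite if and only if \(a>0\) and \(ac-b^2>0\), and negative definite if and only if \(a<0\) and \(ac-b^2>0\). If \(ac-b^2<0\), the product of its two eigenvalues is negative, so the matrix is indefinite. All three cases now follow from Theorem 2.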
Q.E.D.
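The criterion of Theorem 3 is easy to apply by hand or in code. As a worked example (the function is chosen here for illustration), take \(f(x,y) = x^3 + y^3 - 3xy\). Then \(f_x = 3x^2 - 3y\) and \(f_y = 3y^2 - 3x\), so the stationary points are \((0,0)\) and \((1,1)\), and \(a = 6x\), \(b = -3\), \(c = 6y\).

The small Python function below (its name is hypothetical) encodes the three cases of Theorem 3. It also has an explicit "inconclusive" branch for \(ac - b^2 = 0\), which Theorem 3 does not cover.

```python
def second_derivative_test(a, b, c):
    """Theorem 3 with a = f_xx, b = f_xy, c = f_yy at a stationary point."""
    d = a * c - b * b
    if d > 0 and a > 0:
        return "strict local minimum"
    if d > 0 and a < 0:
        return "strict local maximum"
    if d < 0:
        return "no extremum"
    return "inconclusive"  # ac - b^2 = 0 is not covered by Theorem 3


def second_partials(x, y):
    # f(x, y) = x**3 + y**3 - 3*x*y  (illustrative example)
    return 6 * x, -3, 6 * y  # f_xx, f_xy, f_yy


for point in [(0, 0), (1, 1)]:
    print(point, second_derivative_test(*second_partials(*point)))
# (0, 0) no extremum
# (1, 1) strict local minimum
```

At \((0,0)\), \(ac - b^2 = -9 < 0\), so there is no extremum. At \((1,1)\), \(ac - b^2 = 27 > 0\) and \(a = 6 > 0\), so \(f\) has a strict local minimum there.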
Definition 3 (Conditional extrema)
Let \(D\) be an open set in \(\mathbb{R}^{n+m}\), and let
\[ f(x_1,\cdots,x_n,y_1,\cdots,y_m) \tag{2} \]
be a function defined on \(D\). Suppose now that the variables \(x_1,\cdots,x_n,y_1,\cdots,y_m\) are required to satisfy the following \(m\) constraints:
\[ \left\{ \begin{aligned} & \Phi_1(x_1,\cdots,x_n,y_1,\cdots,y_m) = 0 \\ & \cdots \\ & \Phi_m(x_1, \cdots,x_n,y_1,\cdots,y_m) = 0 \end{aligned} \right. \tag{3} \]
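The extrema of the function (2) subject to the constraints (3), that is, its extrema among the points of \(D\) that satisfy (3), are called conditional extrema.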
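Here is an example added for illustration, with \(n = m = 1\). Take \(f(x,y) = x + y\) and the single constraint \(\Phi_1(x,y) = x^2 + y^2 - 2 = 0\).

On all of \(\mathbb{R}^2\), \(f\) has no extremum, since its partial derivatives never vanish (Theorem 1). Restricted to the circle \(x^2 + y^2 = 2\), however, the Cauchy-Schwarz inequality gives
\[
|x + y| \le \sqrt{2}\,\sqrt{x^2 + y^2} = 2,
\]
with equality exactly at \((1,1)\) and \((-1,-1)\). So the conditional maximum is \(f(1,1) = 2\), and the conditional minimum is \(f(-1,-1) = -2\).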
Theorem 4
Let \(D \subset \mathbb{R}^{n+m}\) be an open set, \(f: D \to \mathbb{R}\) a function, and \(\boldsymbol{\Phi} = (\Phi_1,\cdots,\Phi_m): D \to \mathbb{R}^m\) a mapping such that:
(a) \(f,\boldsymbol{\Phi} \in C^1(D)\);
(b) there is a point \(\boldsymbol{z}_0 = (\boldsymbol{x}_0, \boldsymbol{y}_0) \in D\) with \(\boldsymbol{\Phi}(\boldsymbol{z}_0) = \boldsymbol{0}\), where \(\boldsymbol{x}_0 = (a_1,\cdots,a_n)\) and \(\boldsymbol{y}_0=(b_1,\cdots,b_m)\);
(c) \(\det J_{\boldsymbol{y}}\boldsymbol{\Phi}(\boldsymbol{z}_0) \ne 0\).
If \(f\), subject to the constraints (3), attains an extremum at \(\boldsymbol{z}_0\), then there exists a row vector \(\boldsymbol{\lambda} = (\lambda_1,\cdots,\lambda_m) \in \mathbb{R}^m\) such that
\[ Jf(\boldsymbol{z}_0) + \boldsymbol{\lambda} J\boldsymbol{\Phi}(\boldsymbol{z}_0) = \boldsymbol{0} \]
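Proof. Since \(\boldsymbol{\Phi}\) satisfies conditions (a), (b) and (c), the implicit mapping theorem gives a neighborhood \(U = G \times H\) of \(\boldsymbol{z}_0=(\boldsymbol{x}_0, \boldsymbol{y}_0)\), where \(G\) and \(H\) are neighborhoods of \(\boldsymbol{x}_0\) and \(\boldsymbol{y}_0\) respectively, with the following property: the equation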
\[
\boldsymbol{\Phi}(\boldsymbol{x}, \boldsymbol{y}) = \boldsymbol{0}
\]
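has, for every \(\boldsymbol{x} \in G\), a unique solution \(\boldsymbol{y} = \boldsymbol{\varphi}(\boldsymbol{x})\) in \(H\). Moreover, \(\boldsymbol{\varphi}\) is continuously differentiable on \(G\), \(\boldsymbol{y}_0 = \boldsymbol{\varphi}(\boldsymbol{x}_0)\), and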
\[
J\boldsymbol{\varphi}(\boldsymbol{x}_0) =
-(J_y\boldsymbol{\Phi}(\boldsymbol{z}_0))^{-1}J_x\boldsymbol{\Phi}(\boldsymbol{z}_0)
\tag{4}
\]
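By hypothesis, \(\boldsymbol{z}_0\) is an extremum point of \(f\) subject to the constraints (3). The points of \(U\) satisfying (3) are exactly the points \((\boldsymbol{x}, \boldsymbol{\varphi}(\boldsymbol{x}))\) with \(\boldsymbol{x} \in G\). Hence \(\boldsymbol{x}_0\) is an extremum point of the function \(f(\boldsymbol{x}, \boldsymbol{\varphi}(\boldsymbol{x}))\) on \(G\). By Theorem 1, \(\boldsymbol{x}_0\) is a stationary point of this function, so the chain rule gives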
\[
J_xf(\boldsymbol{z}_0) +
J_yf(\boldsymbol{z}_0)J\boldsymbol{\varphi}(\boldsymbol{x}_0) =
\boldsymbol{0}
\]
Substituting (4) into this gives
\[
J_xf(\boldsymbol{z}_0) -
J_yf(\boldsymbol{z}_0)(J_y\boldsymbol{\Phi}(\boldsymbol{z}_0))^{-1}J_x\boldsymbol{\Phi}(\boldsymbol{z}_0)
= \boldsymbol{0} \tag{5}
\]
Set
\[
\boldsymbol{\lambda} =
-J_yf(\boldsymbol{z}_0)(J_y\boldsymbol{\Phi}(\boldsymbol{z}_0))^{-1}
\tag{6}
\]
This is an \(m\)-dimensional row vector, and with it, (5) becomes
\[
J_xf(\boldsymbol{z}_0) +
\boldsymbol{\lambda}J_x\boldsymbol{\Phi}(\boldsymbol{z}_0) =
\boldsymbol{0} \tag{7}
\]
Multiplying (6) on the right by \(J_y\boldsymbol{\Phi}(\boldsymbol{z}_0)\) and rearranging gives
\[
J_yf(\boldsymbol{z}_0) +
\boldsymbol{\lambda}J_y\boldsymbol{\Phi}(\boldsymbol{z}_0) =
\boldsymbol{0} \tag{8}
\]
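Since \(Jf = (J_xf, J_yf)\) and \(J\boldsymbol{\Phi} = (J_x\boldsymbol{\Phi}, J_y\boldsymbol{\Phi})\), combining (7) and (8) yields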
\[
Jf(\boldsymbol{z}_0) +
\boldsymbol{\lambda}J\boldsymbol{\Phi}(\boldsymbol{z}_0) =
\boldsymbol{0}
\]
Q.E.D.
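Formula (6) gives the multiplier explicitly. The Python sketch below evaluates it numerically in three steps:
- it approximates the Jacobians by central differences;
- it solves \(\boldsymbol{\lambda}\, J_y\boldsymbol{\Phi}(\boldsymbol{z}_0) = -J_yf(\boldsymbol{z}_0)\) by Gaussian elimination instead of forming the inverse;
- it checks the residual of \(Jf(\boldsymbol{z}_0) + \boldsymbol{\lambda} J\boldsymbol{\Phi}(\boldsymbol{z}_0)\).

The example problem and all helper names are assumptions made for illustration. The problem has \(n = 1\), \(m = 2\), \(f = x^2 + y_1^2 + y_2^2\), \(\Phi_1 = x + y_1 + y_2 - 3\), \(\Phi_2 = y_1 - y_2\), with a conditional minimum at \(\boldsymbol{z}_0 = (1,1,1)\).

```python
def jacobian(F, z, h=1e-6):
    """Central-difference Jacobian of F: R^N -> R^M as an M x N nested list."""
    columns = []
    for k in range(len(z)):
        zp, zm = list(z), list(z)
        zp[k] += h
        zm[k] -= h
        columns.append([(u - v) / (2 * h) for u, v in zip(F(zp), F(zm))])
    return [list(row) for row in zip(*columns)]  # transpose columns into rows


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def lagrange_multipliers(f, Phi, z0, n):
    """Formula (6): lambda = -J_y f(z0) (J_y Phi(z0))^{-1}.
    The first n coordinates of z0 are x; the remaining m are y."""
    Jf = jacobian(lambda z: [f(z)], z0)[0]  # gradient row of length n + m
    JPhi = jacobian(Phi, z0)                # m x (n + m)
    m = len(JPhi)
    # lambda J_y Phi = -J_y f  <=>  (J_y Phi)^T lambda^T = -(J_y f)^T
    A = [[JPhi[i][n + j] for i in range(m)] for j in range(m)]
    lam = solve(A, [-Jf[n + j] for j in range(m)])
    # Residual of Jf + lambda J Phi: the x-part is (7), the y-part is (8).
    residual = [Jf[k] + sum(lam[i] * JPhi[i][k] for i in range(m))
                for k in range(len(z0))]
    return lam, residual


# Illustrative problem: n = 1, m = 2, conditional minimum at z0 = (1, 1, 1).
def f(z):
    x, y1, y2 = z
    return x ** 2 + y1 ** 2 + y2 ** 2


def Phi(z):
    x, y1, y2 = z
    return [x + y1 + y2 - 3, y1 - y2]


lam, residual = lagrange_multipliers(f, Phi, [1.0, 1.0, 1.0], n=1)
print(lam, residual)  # lam close to (-2, 0), residual close to 0
```

By hand, \(J_yf(\boldsymbol{z}_0) = (2,2)\) and \(J_y\boldsymbol{\Phi}(\boldsymbol{z}_0) = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\), so (6) gives \(\boldsymbol{\lambda} = (-2, 0)\). The \(x\)-component of the residual is then \(2 - 2 = 0\), which is exactly (7). The \(\boldsymbol{y}\)-components vanish by construction, as in (8).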
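Theorem 5
Let \(\boldsymbol{z}_0\) be a point satisfying the constraints (3), and suppose \(f\) and \(\boldsymbol{\Phi}\) have continuous second-order partial derivatives in a neighborhood of \(\boldsymbol{z}_0\). Suppose further that, for some constants \(\lambda_1,\cdots,\lambda_m\), \(\boldsymbol{z}_0\) is a stationary point of the auxiliary function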
\[ F(\boldsymbol{z}) = f(\boldsymbol{z}) + \sum_{i=1}^m \lambda_i \Phi_i(\boldsymbol{z}) \]
where \(\boldsymbol{z} = (z_1,\cdots,z_{n+m}) = (x_1,\cdots,x_n,y_1,\cdots,y_m)\). Set
\[ HF(\boldsymbol{z}_0) = \left( \frac{\partial^2 F}{\partial z_j \partial z_k}(\boldsymbol{z}_0) \right)_{1\le j,k \le m+n} \]
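(1) If \(HF(\boldsymbol{z}_0)\) is positive definite, then \(f\), subject to the constraints (3), attains a strict conditional minimum at \(\boldsymbol{z}_0\);
(2) if \(HF(\boldsymbol{z}_0)\) is negative definite, then \(f\), subject to the constraints (3), attains a strict conditional maximum at \(\boldsymbol{z}_0\).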
Proof. Let \(E\) be the set of all points satisfying the constraints (3), that is,
\[
E = \left\{ \boldsymbol{z} \in D:
\boldsymbol{\Phi}(\boldsymbol{z}) = \boldsymbol{0} \right\}
\]
We know that \(\boldsymbol{z}_0 \in E\). Now take a point \(\boldsymbol{z}_0 + \boldsymbol{h} \in E\) near \(\boldsymbol{z}_0\). Since
\[
\boldsymbol{\Phi}(\boldsymbol{z}_0) = \boldsymbol{0}, \quad
\boldsymbol{\Phi}(\boldsymbol{z}_0 + \boldsymbol{h}) = \boldsymbol{0}
\]
we have
\[
F(\boldsymbol{z}_0) = f(\boldsymbol{z}_0), \quad F(\boldsymbol{z}_0
+ \boldsymbol{h}) = f(\boldsymbol{z}_0 + \boldsymbol{h})
\]
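Applying Taylor's formula to \(F\), and using the fact that \(\boldsymbol{z}_0\) is a stationary point of \(F\), we obtain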
\[
\begin{aligned}
f(\boldsymbol{z}_0 + \boldsymbol{h}) - f(\boldsymbol{z}_0) &=
F(\boldsymbol{z}_0 + \boldsymbol{h}) - F(\boldsymbol{z}_0) \\
&= \sum_{j=1}^{m+n} \frac{\partial F}{\partial
z_j}(\boldsymbol{z}_0)h_j + \frac{1}{2}\sum_{j,k=1}^{m+n}
\frac{\partial^2 F}{\partial z_j \partial z_k}(\boldsymbol{z}_0)h_jh_k +
o(\Vert \boldsymbol{h} \Vert^2) \\
&= \frac{1}{2} \boldsymbol{h}^T HF(\boldsymbol{z}_0)
\boldsymbol{h} + o(\Vert \boldsymbol{h} \Vert^2)
\end{aligned}
\]
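The rest of the argument is the same as in the proof of Theorem 2 (1). If \(HF(\boldsymbol{z}_0)\) is positive definite, there is \(\mu > 0\) with \(\boldsymbol{h}^T HF(\boldsymbol{z}_0)\boldsymbol{h} \ge \mu \Vert \boldsymbol{h} \Vert^2\). Hence \(f(\boldsymbol{z}_0 + \boldsymbol{h}) > f(\boldsymbol{z}_0)\) for every sufficiently small \(\boldsymbol{h} \ne \boldsymbol{0}\) with \(\boldsymbol{z}_0 + \boldsymbol{h} \in E\). The negative definite case follows by replacing \(f\) and \(\lambda_1,\cdots,\lambda_m\) with \(-f\) and \(-\lambda_1,\cdots,-\lambda_m\).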
Q.E.D.
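As an illustration (the problem is chosen here, not taken from the original), consider \(f(x,y) = x + y\) subject to \(\Phi(x,y) = x^2 + y^2 - 2 = 0\), with \(F = f + \lambda\Phi\). Solving \(\nabla F = \boldsymbol{0}\) together with the constraint gives two stationary points on the constraint set:
- \((1,1)\) with \(\lambda = -\tfrac12\);
- \((-1,-1)\) with \(\lambda = \tfrac12\).

In both cases \(HF = 2\lambda I\).

The Python sketch below checks the hypotheses of Theorem 5 with a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite. Negative definiteness is tested on \(-HF\). The hand-computed derivatives and all function names are assumptions of this example.

```python
def is_positive_definite(A):
    """Cholesky test: a symmetric matrix A is positive definite exactly
    when the factorization A = L L^T runs with positive pivots."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


# Illustrative problem: f(x, y) = x + y subject to Phi(x, y) = x^2 + y^2 - 2 = 0.
# Derivatives of F = f + lam * Phi are computed by hand.
def grad_F(z, lam):
    x, y = z
    return [1.0 + 2.0 * lam * x, 1.0 + 2.0 * lam * y]


def hess_F(lam):
    # Hf = 0 and HPhi = 2I, so HF = 2 * lam * I for this example.
    return [[2.0 * lam, 0.0], [0.0, 2.0 * lam]]


def apply_theorem_5(z, lam):
    x, y = z
    if abs(x * x + y * y - 2.0) > 1e-12 or any(abs(g) > 1e-12 for g in grad_F(z, lam)):
        raise ValueError("z must satisfy the constraint and be a stationary point of F")
    H = hess_F(lam)
    if is_positive_definite(H):
        return "strict conditional minimum"
    if is_positive_definite([[-v for v in row] for row in H]):
        return "strict conditional maximum"
    return "Theorem 5 does not apply"


for z, lam in [((1.0, 1.0), -0.5), ((-1.0, -1.0), 0.5)]:
    print(z, apply_theorem_5(z, lam))
# (1.0, 1.0) strict conditional maximum
# (-1.0, -1.0) strict conditional minimum
```

Theorem 5 gives only a sufficient condition. When \(HF(\boldsymbol{z}_0)\) is neither positive nor negative definite, the sketch reports that the theorem does not apply instead of drawing a conclusion.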