
Derivatives of Functions, Part 13: Extrema

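Definition 1

Let \(D \subset \mathbb{R}^n\) be an open set, \(f: D \to \mathbb{R}\) a function, and \(\boldsymbol{x}_0 \in D\) a point. If there exists a punctured ball \(B_r(\boldsymbol{\check x}_0) \subset D\) such that every \(\boldsymbol{x} \in B_r(\boldsymbol{\check x}_0)\) satisfies \(f(\boldsymbol{x}) \ge f(\boldsymbol{x}_0)\) (respectively \(f(\boldsymbol{x}) > f(\boldsymbol{x}_0)\)), then \(\boldsymbol{x}_0\) is called a (strict) local minimum point of \(f\), and \(f(\boldsymbol{x}_0)\) is called a (strict) local minimum of \(f\). (Strict) local maximum points and (strict) local maxima are defined analogously. Local minima and local maxima are collectively called extrema.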

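Definition 2

Let \(D \subset \mathbb{R}^n\) be an open set, \(f: D \to \mathbb{R}\) a function, and \(\boldsymbol{x}_0 \in D\) a point. If the partial derivatives \(\displaystyle \frac{\partial f}{\partial x_i} (\boldsymbol{x}_0)\) \((i=1,2,\cdots,n)\) all exist and equal \(0\), then \(\boldsymbol{x}_0\) is called a stationary point of \(f\).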

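Theorem 1

Suppose the function \(f\) of \(n\) variables attains an extremum at \(\boldsymbol{a} = (a_1,\cdots,a_n)\), and the partial derivatives \(\displaystyle \frac{\partial f}{\partial x_i}(\boldsymbol{a})\) \((i=1,2,\cdots,n)\) all exist. Then \(\boldsymbol{a}\) is a stationary point of \(f\).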

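Proof: Without loss of generality, suppose \(f\) attains a local minimum at \(\boldsymbol{a}\). Then there exists a ball \(B_{r}(\boldsymbol{a})\) such that for every \(\boldsymbol{x} \in B_{r}(\boldsymbol{a})\),
\[ f(\boldsymbol{x}) \ge f(\boldsymbol{a}) \]
Fix an index \(i\) and consider the function of the single variable \(t\)
\[ \varphi(t) = f(a_1,\cdots,a_{i-1},t,a_{i+1},\cdots,a_n) \]
When \(t\) satisfies \(|t - a_i| < r\), take \(\boldsymbol{x} = (a_1,\cdots,a_{i-1},t,a_{i+1},\cdots,a_n)\). Then \(\Vert \boldsymbol{x} - \boldsymbol{a}\Vert = |t - a_i| < r\), that is, \(\boldsymbol{x} \in B_{r}(\boldsymbol{a})\), so \(f(\boldsymbol{x}) \ge f(\boldsymbol{a})\), which says \(\varphi(t) \ge \varphi(a_i)\). Hence \(\varphi\) attains a local minimum at \(a_i\). Since \(\varphi^\prime(a_i) = \displaystyle \frac{\partial f}{\partial x_i} (\boldsymbol{a})\) exists, the one-variable result gives \(\varphi^\prime(a_i) = 0\), that is, \(\displaystyle \frac{\partial f}{\partial x_i} (\boldsymbol{a}) = 0\). As \(i\) was arbitrary, \(\boldsymbol{a}\) is a stationary point.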

Q.E.D.
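
Theorem 1 can be checked numerically on a concrete function. The sketch below is only an illustration: the function `f`, the helper names `partial` and `gradient`, and the step size are assumptions made for this example and are not part of the text. It estimates each partial derivative by a central difference and confirms that all of them are close to zero at a known minimum point, while they are not at another point.

```python
def partial(f, point, i, step=1e-6):
    """Central-difference estimate of the i-th partial derivative of f at point."""
    forward = list(point)
    backward = list(point)
    forward[i] += step
    backward[i] -= step
    return (f(forward) - f(backward)) / (2 * step)


def gradient(f, point, step=1e-6):
    """List of estimated partial derivatives of f at point."""
    return [partial(f, point, i, step) for i in range(len(point))]


def f(x):
    # Hypothetical example: f has a strict minimum at a = (1, -2).
    return (x[0] - 1) ** 2 + (x[1] + 2) ** 2


# At the minimum point every partial derivative should be (numerically) zero.
grad_at_min = gradient(f, [1.0, -2.0])

# At a point that is not an extremum the partial derivatives need not vanish.
grad_elsewhere = gradient(f, [0.0, 0.0])
```

The converse direction is not tested here: a vanishing gradient only identifies a candidate, which is why the second-order conditions are needed.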

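Theorem 2

Let \(\boldsymbol{x}_0\) be a stationary point of the function \(f\) of \(n\) variables, and suppose \(f\) has continuous second-order partial derivatives in some neighborhood of \(\boldsymbol{x}_0\).
(1) If the Hessian matrix \(Hf(\boldsymbol{x}_0)\) is strictly positive (negative) definite, then \(\boldsymbol{x}_0\) is a strict local minimum (maximum) point of \(f\).
(2) If the Hessian matrix \(Hf(\boldsymbol{x}_0)\) is indefinite, then \(\boldsymbol{x}_0\) is not an extremum point of \(f\).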

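Proof: (1) Suppose \(Hf(\boldsymbol{x}_0)\) is strictly positive definite. Since \(f\) has continuous second-order partial derivatives in some neighborhood of \(\boldsymbol{x}_0\), Theorem 5 of Derivatives of Functions, Part 12 gives
\[ f(\boldsymbol{x}_0 + \boldsymbol{h}) = f(\boldsymbol{x}_0) + Jf(\boldsymbol{x}_0)\boldsymbol{h} + \frac{1}{2}\boldsymbol{h}^T Hf(\boldsymbol{x}_0) \boldsymbol{h} + o(\Vert \boldsymbol{h} \Vert^2) \quad (\boldsymbol{h} \to \boldsymbol{0}) \]
Because \(\boldsymbol{x}_0\) is a stationary point of \(f\), we have \(Jf(\boldsymbol{x}_0) = \boldsymbol{0}\), so the formula above can be written as
\[ f(\boldsymbol{x}_0 + \boldsymbol{h}) - f(\boldsymbol{x}_0) = \frac{1}{2} \boldsymbol{h}^T Hf(\boldsymbol{x}_0) \boldsymbol{h} + o(\Vert \boldsymbol{h} \Vert^2) \tag {1} \]
Let \(\Vert \boldsymbol{y} \Vert = 1\); the set of all such \(\boldsymbol{y}\) is the unit sphere \(\partial B_1(\boldsymbol{0})\). Since \(Hf(\boldsymbol{x}_0)\) is strictly positive definite,
\[ (y_1,\cdots,y_n) Hf(\boldsymbol{x}_0) \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\boldsymbol{x}_0) y_iy_j > 0 \]
This is a continuous function of \(\boldsymbol{y}\) on the unit sphere, and the unit sphere is a bounded closed set, so the function attains its minimum at some point of the sphere. Denote this minimum by \(m\); by the inequality above, \(m > 0\). Hence for every \(\boldsymbol{y}\) on the unit sphere,
\[ \boldsymbol{y}^T Hf(\boldsymbol{x}_0) \boldsymbol{y} \ge m > 0 \]
Therefore, for \(\boldsymbol{h} \ne \boldsymbol{0}\),
\[ \frac{1}{2} \boldsymbol{h}^T Hf(\boldsymbol{x}_0) \boldsymbol{h} = \frac{1}{2} \Vert \boldsymbol{h} \Vert^2 \left( \frac{\boldsymbol{h}^T}{\Vert \boldsymbol{h}\Vert} Hf(\boldsymbol{x}_0) \frac{\boldsymbol{h}}{\Vert \boldsymbol{h} \Vert}\right) \ge \frac{m}{2} \Vert \boldsymbol{h} \Vert^2 \]
Substituting this into (1) gives
\[ f(\boldsymbol{x}_0 + \boldsymbol{h}) - f(\boldsymbol{x}_0) \ge \Vert \boldsymbol{h} \Vert^2 \left(\frac{m}{2} + o(1) \right) > 0 \]
That is, when \(\Vert \boldsymbol{h} \Vert\) is sufficiently small and \(\boldsymbol{h} \ne \boldsymbol{0}\), we have \(f(\boldsymbol{x}_0 + \boldsymbol{h}) > f(\boldsymbol{x}_0)\), so \(\boldsymbol{x}_0\) is a strict local minimum point. The strictly negative definite case follows by applying this argument to \(-f\).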
(2) Since \(Hf(\boldsymbol{x}_0)\) is indefinite, there exist \(\boldsymbol{p},\boldsymbol{q} \in \mathbb{R}^n\) such that
\[ \boldsymbol{p}^T Hf(\boldsymbol{x}_0) \boldsymbol{p} < 0 < \boldsymbol{q}^T Hf(\boldsymbol{x}_0) \boldsymbol{q} \]
Taking \(\boldsymbol{h}\) in (1) to be \(\varepsilon \boldsymbol{p}\) and \(\varepsilon \boldsymbol{q}\) in turn, we obtain
\[ \begin{aligned} f(\boldsymbol{x}_0 + \varepsilon \boldsymbol{p}) - f(\boldsymbol{x}_0) & = \frac{1}{2} (\boldsymbol{p}^T Hf(\boldsymbol{x}_0) \boldsymbol{p}) \varepsilon^2 + o(\varepsilon^2) \\ &= \left( \frac{1}{2} \boldsymbol{p}^T Hf(\boldsymbol{x}_0) \boldsymbol{p} + o(1) \right) \varepsilon^2 \\ f(\boldsymbol{x}_0 + \varepsilon \boldsymbol{q}) - f(\boldsymbol{x}_0) &= \left( \frac{1}{2} \boldsymbol{q}^T Hf(\boldsymbol{x}_0) \boldsymbol{q} + o(1) \right) \varepsilon^2 \end{aligned} \]
Hence, as soon as \(\varepsilon \ne 0\) is sufficiently small,
\[ f(\boldsymbol{x}_0 + \varepsilon \boldsymbol{p}) < f(\boldsymbol{x}_0) < f(\boldsymbol{x}_0 + \varepsilon \boldsymbol{q}) \]
Every neighborhood of \(\boldsymbol{x}_0\) therefore contains points where \(f\) is smaller and points where \(f\) is larger than \(f(\boldsymbol{x}_0)\), which shows that \(\boldsymbol{x}_0\) is not an extremum point of \(f\).

Q.E.D.
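
The test in Theorem 2 can be sketched in code for small matrices. The version below is a minimal illustration under stated assumptions, not a robust numerical routine. It decides definiteness with Sylvester's criterion (signs of the leading principal minors), a technique that is not used in the proof above. The names `det` and `classify_hessian` are hypothetical, and the input is assumed to be a symmetric Hessian given as a list of rows. A symmetric matrix with nonzero determinant that is neither positive nor negative definite must be indefinite; when the determinant is zero the sketch simply reports the case as inconclusive.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def classify_hessian(h):
    """Classify a stationary point from its symmetric Hessian h (list of rows)."""
    n = len(h)
    # Leading principal minors D_1, ..., D_n.
    minors = [det([row[:k] for row in h[:k]]) for k in range(1, n + 1)]
    if all(d > 0 for d in minors):
        return "strict local minimum"      # positive definite
    if all((-1) ** k * d > 0 for k, d in enumerate(minors, start=1)):
        return "strict local maximum"      # negative definite
    if minors[-1] != 0:
        return "not an extremum"           # nonsingular and indefinite
    return "inconclusive"                  # singular Hessian: this sketch gives up


# Hessians at the origin of x^2 + y^2, -(x^2 + y^2), x*y and x^2.
print(classify_hessian([[2.0, 0.0], [0.0, 2.0]]))
print(classify_hessian([[-2.0, 0.0], [0.0, -2.0]]))
print(classify_hessian([[0.0, 1.0], [1.0, 0.0]]))
print(classify_hessian([[2.0, 0.0], [0.0, 0.0]]))
```

Exact comparisons with zero are used for simplicity; with floating-point Hessians a tolerance would be needed, and its choice is left open here.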

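Theorem 3

Let \((x_0,y_0)\) be a stationary point of the function \(f\) of two variables, and suppose \(f\) has continuous second-order partial derivatives in some neighborhood of \((x_0,y_0)\). Write
\[ a = \frac{\partial^2 f}{\partial x^2}(x_0, y_0), \quad b=\frac{\partial^2 f}{\partial x \partial y}(x_0, y_0), \quad c=\frac{\partial^2 f}{\partial y^2}(x_0, y_0) \]
Then:
(1) if \(ac-b^2 > 0\) and \(a > 0\), then \(f\) has a strict local minimum at \((x_0,y_0)\);
(2) if \(ac-b^2 > 0\) and \(a < 0\), then \(f\) has a strict local maximum at \((x_0,y_0)\);
(3) if \(ac-b^2 < 0\), then \(f\) has no extremum at \((x_0,y_0)\).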

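Proof: This follows from Theorem 2. The Hessian matrix here is \(\begin{bmatrix} a & b \\ b & c \end{bmatrix}\), whose determinant \(ac - b^2\) is the product of its two eigenvalues. It is strictly positive definite exactly when \(a > 0\) and \(ac - b^2 > 0\), strictly negative definite exactly when \(a < 0\) and \(ac - b^2 > 0\), and indefinite exactly when \(ac - b^2 < 0\).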

Q.E.D.
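
As an illustration of how the three cases of Theorem 3 are applied, here is a short worked example. The function is chosen only for illustration and does not come from the text above. Take
\[ f(x,y) = x^3 + y^3 - 3xy \]
Its stationary points solve
\[ \frac{\partial f}{\partial x} = 3x^2 - 3y = 0, \quad \frac{\partial f}{\partial y} = 3y^2 - 3x = 0 \]
so \(y = x^2\) and \(x = y^2\), which gives \(x^4 = x\) and hence the two real stationary points \((0,0)\) and \((1,1)\). The second-order partial derivatives are
\[ \frac{\partial^2 f}{\partial x^2} = 6x, \quad \frac{\partial^2 f}{\partial x \partial y} = -3, \quad \frac{\partial^2 f}{\partial y^2} = 6y \]
At \((0,0)\) we get \(a = 0\), \(b = -3\), \(c = 0\), so \(ac - b^2 = -9 < 0\) and by case (3) there is no extremum. At \((1,1)\) we get \(a = 6\), \(b = -3\), \(c = 6\), so \(ac - b^2 = 27 > 0\) with \(a > 0\), and by case (1) \(f\) has a strict local minimum there, with value \(f(1,1) = -1\).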

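Definition 3: Conditional extremum

Let \(D\) be an open set in \(\mathbb{R}^{n+m}\), and let
\[ f(x_1,\cdots,x_n,y_1,\cdots,y_m) \tag{2} \]
be a function defined on \(D\). Now suppose the variables \(x_1,\cdots,x_n,y_1,\cdots,y_m\) satisfy the following \(m\) constraints:
\[ \left\{ \begin{aligned} & \Phi_1(x_1,\cdots,x_n,y_1,\cdots,y_m) = 0 \\ & \cdots \\ & \Phi_m(x_1, \cdots,x_n,y_1,\cdots,y_m) = 0 \end{aligned} \right. \tag{3} \]
Then an extremum of the function (2) under the constraints (3) is called a conditional extremum.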

Theorem 4

Let \(D \subset \mathbb{R}^{n+m}\) be an open set, \(f: D \to \mathbb{R}\) a function, and \(\boldsymbol{\Phi}: D \to \mathbb{R}^m\) a mapping. Suppose \(f\) and \(\boldsymbol{\Phi}\) satisfy the following conditions:
(a) \(f,\boldsymbol{\Phi} \in C^1(D)\);
(b) there exists \(\boldsymbol{z}_0 = (\boldsymbol{x}_0, \boldsymbol{y}_0) \in D\) with \(\boldsymbol{\Phi}(\boldsymbol{z}_0) = \boldsymbol{0}\), where \(\boldsymbol{x}_0 = (a_1,\cdots,a_n)\) and \(\boldsymbol{y}_0=(b_1,\cdots,b_m)\);
(c) \(\det J_y\boldsymbol{\Phi}(\boldsymbol{z}_0) \ne 0\).
If \(f\), subject to the constraints (3), attains an extremum at \(\boldsymbol{z}_0\), then there exists a row vector \(\boldsymbol{\lambda} \in \mathbb{R}^m\) such that
\[ Jf(\boldsymbol{z}_0) + \boldsymbol{\lambda} J\boldsymbol{\Phi}(\boldsymbol{z}_0) = \boldsymbol{0} \]

Proof: Since \(\boldsymbol{\Phi}\) satisfies conditions (a), (b) and (c), the implicit mapping theorem yields a neighborhood \(U = G \times H\) of \(\boldsymbol{z}_0=(\boldsymbol{x}_0, \boldsymbol{y}_0)\), where \(G\) and \(H\) are neighborhoods of \(\boldsymbol{x}_0\) and \(\boldsymbol{y}_0\) respectively, such that the equation
\[ \boldsymbol{\Phi}(\boldsymbol{x}, \boldsymbol{y}) = \boldsymbol{0} \]
has, for every \(\boldsymbol{x} \in G\), a unique solution \(\boldsymbol{\varphi}(\boldsymbol{x})\) in \(H\). Moreover, \(\boldsymbol{y}_0 = \boldsymbol{\varphi}(\boldsymbol{x}_0)\) and
\[ J\boldsymbol{\varphi}(\boldsymbol{x}_0) = -(J_y\boldsymbol{\Phi}(\boldsymbol{z}_0))^{-1}J_x\boldsymbol{\Phi}(\boldsymbol{z}_0) \tag{4} \]
Since \(\boldsymbol{z}_0\) is an extremum point of \(f\) under the constraints (3), \(\boldsymbol{x}_0\) is an extremum point in \(G\) of the function \(f(\boldsymbol{x}, \boldsymbol{\varphi}(\boldsymbol{x}))\). By Theorem 1, \(\boldsymbol{x}_0\) must be a stationary point of \(f(\boldsymbol{x}, \boldsymbol{\varphi}(\boldsymbol{x}))\), so by the chain rule
\[ J_xf(\boldsymbol{z}_0) + J_yf(\boldsymbol{z}_0)J\boldsymbol{\varphi}(\boldsymbol{x}_0) = \boldsymbol{0} \]
Substituting (4) into this equation gives
\[ J_xf(\boldsymbol{z}_0) - J_yf(\boldsymbol{z}_0)(J_y\boldsymbol{\Phi}(\boldsymbol{z}_0))^{-1}J_x\boldsymbol{\Phi}(\boldsymbol{z}_0) = \boldsymbol{0} \tag{5} \]
Set
\[ \boldsymbol{\lambda} = -J_yf(\boldsymbol{z}_0)(J_y\boldsymbol{\Phi}(\boldsymbol{z}_0))^{-1} \tag{6} \]
This is an \(m\)-dimensional row vector, and (5) becomes
\[ J_xf(\boldsymbol{z}_0) + \boldsymbol{\lambda}J_x\boldsymbol{\Phi}(\boldsymbol{z}_0) = \boldsymbol{0} \tag{7} \]
Multiplying (6) on the right by \(J_y\boldsymbol{\Phi}(\boldsymbol{z}_0)\) rewrites it as
\[ J_yf(\boldsymbol{z}_0) + \boldsymbol{\lambda}J_y\boldsymbol{\Phi}(\boldsymbol{z}_0) = \boldsymbol{0} \tag{8} \]
Combining (7) and (8) gives
\[ Jf(\boldsymbol{z}_0) + \boldsymbol{\lambda}J\boldsymbol{\Phi}(\boldsymbol{z}_0) = \boldsymbol{0} \]

Q.E.D.
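
The construction of \(\boldsymbol{\lambda}\) in formulas (6), (7) and (8) can be traced on a small example. The sketch below is only an illustration with \(n = m = 1\). The objective `f`, the constraint `phi`, and the point `(x0, y0)` are assumptions chosen for this example, and the partial derivatives are entered by hand rather than computed. It evaluates \(\lambda\) from (6), checks that the residuals of (7) and (8) vanish, and samples the constraint curve near the point to confirm that it is a conditional maximum.

```python
import math


def f(x, y):
    # Hypothetical objective: f(x, y) = x + y.
    return x + y


def phi(x, y):
    # Hypothetical constraint: the unit circle, Phi(x, y) = x^2 + y^2 - 1 = 0.
    return x * x + y * y - 1.0


# Candidate point z0 = (x0, y0) on the constraint curve.
x0 = y0 = 1.0 / math.sqrt(2.0)

# Partial derivatives at z0, worked out by hand for this example.
fx, fy = 1.0, 1.0                  # J_x f, J_y f
phix, phiy = 2.0 * x0, 2.0 * y0    # J_x Phi, J_y Phi (phiy != 0, condition (c))

# Formula (6): lambda = -J_y f * (J_y Phi)^(-1); scalars here since m = 1.
lam = -fy / phiy

# Residuals of formulas (7) and (8); both should be zero up to rounding.
residual_x = fx + lam * phix
residual_y = fy + lam * phiy

# Sample the circle near z0 (angle pi/4) to check that f is smaller nearby.
theta0 = math.pi / 4.0
nearby = [f(math.cos(theta0 + d), math.sin(theta0 + d))
          for d in (-0.1, -0.01, 0.01, 0.1)]
is_conditional_max = all(value < f(x0, y0) for value in nearby)
```

For larger \(m\) the division in (6) becomes a matrix inverse, which this scalar sketch does not cover.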

Theorem 5

Let \(\boldsymbol{z}_0\), satisfying the constraints (3), be a stationary point of the auxiliary function
\[ F(\boldsymbol{z}) = f(\boldsymbol{z}) + \sum_{i=1}^m \lambda_i \Phi_i(\boldsymbol{z}) \]
where \(\boldsymbol{z} = (z_1,\cdots,z_{n+m}) = (x_1,\cdots,x_n,y_1,\cdots,y_m)\), and write
\[ HF(\boldsymbol{z}_0) = \left( \frac{\partial^2 F}{\partial z_j \partial z_k}(\boldsymbol{z}_0) \right)_{1\le j,k \le m+n} \]
(1) If \(HF(\boldsymbol{z}_0)\) is strictly positive definite, then \(f\) attains a strict conditional minimum at \(\boldsymbol{z}_0\);
(2) If \(HF(\boldsymbol{z}_0)\) is strictly negative definite, then \(f\) attains a strict conditional maximum at \(\boldsymbol{z}_0\).

Proof: Let \(E\) be the set of all points satisfying the constraints (3), that is,
\[ E = \left\{ \boldsymbol{z} \in \mathbb{R}^{m+n}: \boldsymbol{\Phi}(\boldsymbol{z}) = \boldsymbol{0} \right\} \]
We know that \(\boldsymbol{z}_0 \in E\). Take a point \(\boldsymbol{z}_0 + \boldsymbol{h} \in E\) near \(\boldsymbol{z}_0\). Since
\[ \boldsymbol{\Phi}(\boldsymbol{z}_0) = \boldsymbol{0}, \quad \boldsymbol{\Phi}(\boldsymbol{z}_0 + \boldsymbol{h}) = \boldsymbol{0} \]
we have
\[ F(\boldsymbol{z}_0) = f(\boldsymbol{z}_0), \quad F(\boldsymbol{z}_0 + \boldsymbol{h}) = f(\boldsymbol{z}_0 + \boldsymbol{h}) \]
Applying Taylor's formula to \(F\) (which requires \(F\) to have continuous second-order partial derivatives near \(\boldsymbol{z}_0\)) and using that \(\boldsymbol{z}_0\) is a stationary point of \(F\), we get
\[ \begin{aligned} f(\boldsymbol{z}_0 + \boldsymbol{h}) - f(\boldsymbol{z}_0) &= F(\boldsymbol{z}_0 + \boldsymbol{h}) - F(\boldsymbol{z}_0) \\ &= \sum_{j=1}^{m+n} \frac{\partial F}{\partial z_j}(\boldsymbol{z}_0)h_j + \frac{1}{2}\sum_{j,k=1}^{m+n} \frac{\partial^2 F}{\partial z_j \partial z_k}(\boldsymbol{z}_0)h_jh_k + o(\Vert \boldsymbol{h} \Vert^2) \\ &= \frac{1}{2} \boldsymbol{h}^T HF(\boldsymbol{z}_0) \boldsymbol{h} + o(\Vert \boldsymbol{h} \Vert^2) \end{aligned} \]
The rest of the proof proceeds exactly as in Theorem 2 and is omitted.

Q.E.D.
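
A short worked example may help to see how Theorem 5 is used. The objective and constraint below are assumptions chosen for illustration, with \(n = m = 1\). Take \(f(x,y) = x + y\) under the constraint \(\Phi(x,y) = x^2 + y^2 - 1 = 0\), so that
\[ F(x,y) = x + y + \lambda (x^2 + y^2 - 1) \]
The stationary point conditions for \(F\) together with the constraint read
\[ 1 + 2\lambda x = 0, \quad 1 + 2\lambda y = 0, \quad x^2 + y^2 = 1 \]
which gives \(x = y = \pm \dfrac{1}{\sqrt{2}}\) with \(\lambda = \mp \dfrac{1}{\sqrt{2}}\). In both cases
\[ HF = \begin{bmatrix} 2\lambda & 0 \\ 0 & 2\lambda \end{bmatrix} \]
At \(\left(\dfrac{1}{\sqrt{2}}, \dfrac{1}{\sqrt{2}}\right)\) we have \(\lambda = -\dfrac{1}{\sqrt{2}}\), so \(HF\) is strictly negative definite and \(f\) attains a strict conditional maximum there, with value \(\sqrt{2}\). At \(\left(-\dfrac{1}{\sqrt{2}}, -\dfrac{1}{\sqrt{2}}\right)\) we have \(\lambda = \dfrac{1}{\sqrt{2}}\), so \(HF\) is strictly positive definite and \(f\) attains a strict conditional minimum there, with value \(-\sqrt{2}\).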