
Derivatives of Functions (12): Higher-Order Partial Derivatives and Taylor's Formula

Definition 1

Suppose the function \(f\) has partial derivatives at every point of an open set \(D \subset \mathbb{R}^n\):
\[ D_if(\boldsymbol{x}) = \frac{\partial f}{\partial x_i}(\boldsymbol{x}) \quad (i=1,2,\cdots,n) \]
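These are called the first-order partial derivative functions of \(f\). If they can in turn be differentiated partially, the results are the second-order partial derivatives of \(f\); third-order and higher-order partial derivatives are defined successively in the same way. For second-order partial derivatives, differentiating the first-order partial derivative \(\displaystyle \frac{\partial f}{\partial x_j}\) with respect to \(x_i\) gives \(\displaystyle \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right)\), which is written \(\displaystyle \frac{\partial^2 f}{\partial x_i \partial x_j}\); here \(i,j\) range independently from \(1\) to \(n\). If \(i=j\), we write \(\displaystyle \frac{\partial^2 f}{\partial x_i \partial x_i}\) as \(\displaystyle \frac{\partial^2 f}{\partial x_i^2}\) \((i=1,2,\cdots,n)\); if \(i\ne j\), the second-order partial derivative is called a mixed partial derivative.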
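
As a quick illustration of the notation (an example chosen here, not taken from the text above), let \(f(x,y) = x^2y^3\) on \(\mathbb{R}^2\). Differentiating once and then once more gives
\[ \begin{aligned} \frac{\partial f}{\partial x} &= 2xy^3, & \frac{\partial f}{\partial y} &= 3x^2y^2, \\ \frac{\partial^2 f}{\partial x^2} &= 2y^3, & \frac{\partial^2 f}{\partial y^2} &= 6x^2y, \\ \frac{\partial^2 f}{\partial x \partial y} &= \frac{\partial}{\partial x}\left(3x^2y^2\right) = 6xy^2, & \frac{\partial^2 f}{\partial y \partial x} &= \frac{\partial}{\partial y}\left(2xy^3\right) = 6xy^2. \end{aligned} \]
For this particular \(f\) the two mixed partial derivatives coincide.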

Theorem 1

Let \(D \subset \mathbb{R}^2\) be an open set and \(f: D \to \mathbb{R}\). If \(\displaystyle \frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial^2 f}{\partial y \partial x}\) exist in some neighborhood of \((x_0,y_0)\), and \(\displaystyle \frac{\partial^2 f}{\partial y\partial x}\) is continuous at \((x_0,y_0)\), then \(\displaystyle \frac{\partial^2 f}{\partial x \partial y}\) exists at \((x_0,y_0)\), and
\[ \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) = \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \]

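Proof. For \(h, k \ne 0\) small enough that all points below lie in the neighborhood, set
\[ \varphi(h, k) = f(x_0 + h, y_0 + k) - f(x_0+h, y_0) - f(x_0, y_0+k) + f(x_0, y_0) \]
and
\[ g(x) = f(x, y_0+k) - f(x, y_0) \]
By the mean value theorem, applied first to \(g\) and then to \(y \mapsto \frac{\partial f}{\partial x}(x_0 + \theta_1 h, y)\), there exist \(\theta_1, \theta_2 \in (0, 1)\) such that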
\[ \begin{aligned} \varphi(h, k) & = g(x_0 + h) - g(x_0) \\ & = g^\prime(x_0 + \theta_1 h)h \\ & = \left(\frac{\partial f}{\partial x}(x_0 + \theta_1 h, y_0 + k) - \frac{\partial f}{\partial x}(x_0 + \theta_1 h, y_0) \right)h \\ & = \frac{\partial^2 f}{\partial y \partial x}(x_0 + \theta_1h, y_0 + \theta_2k)hk \end{aligned} \]
Since \(\displaystyle \frac{\partial^2 f}{\partial y \partial x}\) is continuous at \((x_0, y_0)\), it follows that
\[ \lim \limits_{h\to 0, k \to 0} \frac{\varphi(h ,k)}{hk} = \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \]
On the other hand, for each fixed \(h \ne 0\),
\[ \lim \limits_{k \to 0} \frac{\varphi(h ,k)}{hk} = \lim \limits_{k \to 0} \frac{1}{h} \left( \frac{f(x_0+h, y_0+k) - f(x_0+h, y_0)}{k} - \frac{f(x_0, y_0+k) - f(x_0, y_0)}{k}\right) = \frac{1}{h}\left( \frac{\partial f}{\partial y}(x_0 +h, y_0) - \frac{\partial f}{\partial y}(x_0, y_0)\right) \]
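Since the double limit exists and the inner limit as \(k \to 0\) exists for each fixed \(h \ne 0\), the iterated limit exists and equals the double limit, so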
\[ \lim \limits_{h \to 0, k \to 0} \frac{\varphi(h ,k)}{hk} = \lim \limits_{h \to 0} \frac{1}{h}\left( \frac{\partial f}{\partial y}(x_0 +h, y_0) - \frac{\partial f}{\partial y}(x_0, y_0)\right) = \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) \]
Therefore \(\displaystyle \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0)\) exists, and
\[ \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) = \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \]

Q.E.D.
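
The continuity assumption should not be dropped lightly. A standard counterexample (quoted here as a sketch; it is not part of the text above) is
\[ f(x,y) = \begin{cases} \dfrac{xy(x^2-y^2)}{x^2+y^2}, & (x,y) \ne (0,0), \\ 0, & (x,y) = (0,0). \end{cases} \]
Directly from the definition of the partial derivative, \(\displaystyle \frac{\partial f}{\partial x}(0,y) = -y\) and \(\displaystyle \frac{\partial f}{\partial y}(x,0) = x\), so
\[ \frac{\partial^2 f}{\partial y \partial x}(0,0) = -1 \ne 1 = \frac{\partial^2 f}{\partial x \partial y}(0,0). \]
Both mixed partial derivatives exist at the origin, but neither is continuous there, so Theorem 1 does not apply.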

Theorem 2

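Let \(f\) be a differentiable function defined on a convex region \(D \subset \mathbb{R}^n\). Then for any two points \(\boldsymbol{a}, \boldsymbol{b} \in D\) there exists a point \(\boldsymbol{\xi}\) on the line segment joining \(\boldsymbol{a}\) and \(\boldsymbol{b}\) such that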
\[ f(\boldsymbol{b}) - f(\boldsymbol{a}) = Jf(\boldsymbol{\xi})(\boldsymbol{b} - \boldsymbol{a}) \]

Proof. The points of the line segment joining \(\boldsymbol{a}\) and \(\boldsymbol{b}\) can be written as \(\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a})\) with \(t \in [0, 1]\). Let
\[ \varphi(t) = f(\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a})) \]
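Then \(\varphi\) is a differentiable function on \([0, 1]\). By the single-variable mean value theorem, there exists \(\theta \in (0,1)\) such that
\[ \varphi(1) - \varphi(0) = \varphi^\prime(\theta) \]
By the chain rule, \(\varphi^\prime(\theta) = Jf(\boldsymbol{a} + \theta (\boldsymbol{b} - \boldsymbol{a}))(\boldsymbol{b} - \boldsymbol{a})\), that is,
\[ f(\boldsymbol{b}) - f(\boldsymbol{a}) = Jf(\boldsymbol{a} + \theta (\boldsymbol{b} - \boldsymbol{a}))(\boldsymbol{b} - \boldsymbol{a}) \]
Setting \(\boldsymbol{\xi} = \boldsymbol{a} + \theta(\boldsymbol{b} - \boldsymbol{a})\) proves the claim.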

Q.E.D.
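
For a concrete instance (an illustrative choice, not from the text above), take \(f(x,y) = x^2 + y^2\) on \(\mathbb{R}^2\) with \(\boldsymbol{a} = (0,0)\) and \(\boldsymbol{b} = (1,1)\). Then \(f(\boldsymbol{b}) - f(\boldsymbol{a}) = 2\) and \(Jf(x,y) = (2x, 2y)\). At a point \(\boldsymbol{\xi} = (\theta, \theta)\) of the segment,
\[ Jf(\boldsymbol{\xi})(\boldsymbol{b} - \boldsymbol{a}) = 2\theta \cdot 1 + 2\theta \cdot 1 = 4\theta, \]
so the equality of Theorem 2 holds with \(\theta = \frac{1}{2}\), that is, \(\boldsymbol{\xi} = \left(\frac{1}{2}, \frac{1}{2}\right)\).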

Theorem 3

Let \(D\) be a region in \(\mathbb{R}^n\) and \(f: D \to \mathbb{R}\). If for every \(\boldsymbol{x} \in D\) we have
\[ \frac{\partial f}{\partial x_1}(\boldsymbol{x}) = \cdots = \frac{\partial f}{\partial x_n}(\boldsymbol{x}) = 0 \]
then \(f\) is constant on \(D\).

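Proof. Since the partial derivatives of \(f\) vanish identically, they are continuous, so \(f\) is differentiable on \(D\) with \(Jf = \boldsymbol{0}\). If \(D\) is a convex region, the conclusion follows immediately from Theorem 2. If \(D\) is not convex, pick any \(\boldsymbol{x}_0 \in D\) and let
\[ \begin{aligned} A &= \{ \boldsymbol{x} \in D: f(\boldsymbol{x}) = f(\boldsymbol{x}_0)\} \\ B &= \{ \boldsymbol{x} \in D: f(\boldsymbol{x}) \ne f(\boldsymbol{x}_0) \} \end{aligned} \]
Clearly \(A\) is nonempty, \(A \cap B = \varnothing\), and \(D=A \cup B\). Since \(D\) is a connected open set, if we can show that \(A\) and \(B\) are both open, then Theorem 3 of Part 6 on limits of point sequences gives \(B = \varnothing\), which proves the claim. To show that \(A\) is open, take any \(\boldsymbol{a} \in A \subset D\). Since \(D\) is open, there is a ball \(B_r(\boldsymbol{a}) \subset D\). Because \(B_r(\boldsymbol{a})\) is a convex region, \(f\) is constant on \(B_r(\boldsymbol{a})\), so for every \(\boldsymbol{x} \in B_r(\boldsymbol{a})\),
\[ f(\boldsymbol{x}) = f(\boldsymbol{a}) = f(\boldsymbol{x}_0) \]
Hence \(B_r(\boldsymbol{a}) \subset A\), which shows that \(A\) is open. The same argument shows that \(B\) is open: for \(\boldsymbol{b} \in B\), \(f\) is constant on some ball \(B_r(\boldsymbol{b}) \subset D\), so \(f(\boldsymbol{x}) = f(\boldsymbol{b}) \ne f(\boldsymbol{x}_0)\) there. The claim now follows from the discussion above.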

Q.E.D.
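
Connectedness of \(D\) is essential. As a simple illustration (added here, with \(n = 1\)), let \(D = (-1, 0) \cup (0, 1)\), an open set that is not connected, and
\[ f(x) = \begin{cases} -1, & -1 < x < 0, \\ 1, & 0 < x < 1. \end{cases} \]
Then \(f^\prime(x) = 0\) at every point of \(D\), yet \(f\) is not constant on \(D\). In the notation of the proof, with \(x_0 \in (0,1)\), the sets \(A = (0,1)\) and \(B = (-1,0)\) are both open and nonempty, which is possible only because \(D\) is disconnected.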

Theorem 4

Let \(k,n\) be two positive integers. Then
\[ (x_1 + \cdots + x_n)^k = \sum_{\alpha_1 + \cdots + \alpha_n = k} \frac{k!}{\alpha_1!\cdots\alpha_n!}x_1^{\alpha_1} \cdots x_n^{\alpha_n} \]
Here \(\alpha_1,\cdots,\alpha_n\) are nonnegative integers. If we write \(\boldsymbol{\alpha} = (\alpha_1, \cdots, \alpha_n)\), \(\boldsymbol{x}=(x_1,\cdots,x_n)\), and
\[ \begin{aligned} |\boldsymbol{\alpha}| &= \alpha_1 + \cdots + \alpha_n \\ \boldsymbol{\alpha}! &= \alpha_1!\cdots\alpha_n! \\ \boldsymbol{x}^{\boldsymbol{\alpha}} &= x_1^{\alpha_1} \cdots x_n^{\alpha_n} \end{aligned} \]
then the formula above can be abbreviated as
\[ (x_1 + \cdots + x_n)^k = \sum_{|\boldsymbol{\alpha}|=k}\frac{k!}{\boldsymbol{\alpha}!}\boldsymbol{x}^{\boldsymbol{\alpha}} \]

Proof. We use induction on the number \(n\) of summands. The case \(n=1\) is trivial, and for \(n=2\) the statement is the binomial theorem. Now suppose the statement holds for \(n-1\) summands. Then for \(n\) summands we have
\[ \begin{aligned} (x_1 + \cdots + x_n)^k &= ((x_1 + \cdots + x_{n-1}) + x_n)^k \\ &= \sum_{\alpha_n=0}^k \frac{k!}{\alpha_n!(k-\alpha_n)!}(x_1+\cdots+x_{n-1})^{k-\alpha_n}x_n^{\alpha_n} \\ &= \sum_{\alpha_n=0}^k \frac{k!}{\alpha_n!(k-\alpha_n)!} \sum_{\alpha_1 + \cdots + \alpha_{n-1}=k-\alpha_n}\frac{(k-\alpha_n)!}{\alpha_1!\cdots\alpha_{n-1}!} x_1^{\alpha_1} \cdots x_{n-1}^{\alpha_{n-1}} x_n^{\alpha_n} \\ & = \sum_{\alpha_1 + \cdots + \alpha_n = k} \frac{k!}{\alpha_1!\cdots\alpha_n!}x_1^{\alpha_1} \cdots x_n^{\alpha_n} \end{aligned} \]

Q.E.D.
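
To see the coefficients at work, consider the case \(n = 3\), \(k = 2\) (an illustrative case added here). The multi-indices with \(|\boldsymbol{\alpha}| = 2\) are \((2,0,0), (0,2,0), (0,0,2)\), each with coefficient \(\frac{2!}{2!\,0!\,0!} = 1\), and \((1,1,0), (1,0,1), (0,1,1)\), each with coefficient \(\frac{2!}{1!\,1!\,0!} = 2\), so
\[ (x_1 + x_2 + x_3)^2 = x_1^2 + x_2^2 + x_3^2 + 2x_1x_2 + 2x_1x_3 + 2x_2x_3. \]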

Theorem 5: Taylor's formula

Let \(D \subset \mathbb{R}^n\) be a convex region, \(f \in C^{m+1}(D)\), and let \(\boldsymbol{a}=(a_1,\cdots,a_n)\) and \(\boldsymbol{a}+\boldsymbol{h} = (a_1+h_1,\cdots,a_n+h_n)\) be two points of \(D\). Then there exists \(\theta \in (0, 1)\) such that
\[ f(\boldsymbol{a} + \boldsymbol{h}) = \sum_{k=0}^m \sum_{|\boldsymbol{\alpha}|=k} \frac{D^{\boldsymbol{\alpha}}f(\boldsymbol{a})}{\boldsymbol{\alpha}!} \boldsymbol{h}^{\boldsymbol{\alpha}} + R_m \]
where
\[ D^{\boldsymbol{\alpha}}f(\boldsymbol{a}) = \frac{\partial^{\alpha_1+\cdots+\alpha_n}f}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}(\boldsymbol{a}) \]
and
\[ R_m = \sum_{|\boldsymbol{\alpha}|=m+1} \frac{D^{\boldsymbol{\alpha}}f(\boldsymbol{a} + \theta \boldsymbol{h})}{\boldsymbol{\alpha}!} \boldsymbol{h}^{\boldsymbol{\alpha}} \]
is called the Lagrange remainder.

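Proof. Fix \(\boldsymbol{a},\boldsymbol{h}\) and consider the function \(\varphi(t) = f(\boldsymbol{a} + t \boldsymbol{h})\) for \(t \in [0,1]\). Since \(D\) is convex, \(\boldsymbol{a} + t\boldsymbol{h} \in D\) for all \(t \in [0,1]\), and since \(f \in C^{m+1}(D)\), \(\varphi\) has continuous derivatives up to order \(m+1\) on \([0,1]\). Applying the single-variable Taylor formula to \(\varphi\) gives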
\[ \varphi(1) = \varphi(0) + \varphi^\prime(0) + \frac{1}{2!}\varphi^{\prime\prime}(0) + \cdots + \frac{1}{m!}\varphi^{(m)}(0) + \frac{1}{(m+1)!}\varphi^{(m+1)}(\theta) \tag{1} \]
where \(\theta \in (0, 1)\). Clearly \(\varphi(1) = f(\boldsymbol{a} + \boldsymbol{h})\) and \(\varphi(0) = f(\boldsymbol{a})\). By the chain rule,
\[ \varphi^\prime(t) = \frac{\partial f}{\partial x_1}(\boldsymbol{a} + t \boldsymbol{h})h_1 + \cdots + \frac{\partial f}{\partial x_n}(\boldsymbol{a} + t \boldsymbol{h})h_n = \left(h_1\frac{\partial}{\partial x_1} + \cdots + h_n\frac{\partial}{\partial x_n}\right)f(\boldsymbol{a} + t \boldsymbol{h}) \]
Repeating the differentiation gives
\[ \begin{aligned} \varphi^{\prime\prime}(t) &= \left(h_1\frac{\partial}{\partial x_1} + \cdots + h_n\frac{\partial}{\partial x_n}\right)^2f(\boldsymbol{a} + t\boldsymbol{h}) \\ &\;\;\vdots \\ \varphi^{(m)}(t) &= \left(h_1\frac{\partial}{\partial x_1} + \cdots + h_n\frac{\partial}{\partial x_n}\right)^mf(\boldsymbol{a} + t\boldsymbol{h}) \end{aligned} \]
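Because \(f \in C^{m+1}(D)\), the partial derivatives of order up to \(m+1\) do not depend on the order of differentiation (Theorem 1), so the operators \(h_i\frac{\partial}{\partial x_i}\) commute and Theorem 4 applies. For \(k = 1, \cdots, m+1\) this gives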
\[ \varphi^{(k)}(t) = \sum_{|\boldsymbol{\alpha}|=k}\frac{k!}{\boldsymbol{\alpha}!}\frac{\partial^{\alpha_1}}{\partial x_1^{\alpha_1}}\cdots \frac{\partial ^{\alpha_n}}{\partial x_n^{\alpha_n}}f(\boldsymbol{a} + t\boldsymbol{h})\boldsymbol{h}^{\boldsymbol{\alpha}} = \sum_{|\boldsymbol{\alpha}|=k} \frac{k!}{\boldsymbol{\alpha}!}D^{\boldsymbol{\alpha}}f(\boldsymbol{a} + t\boldsymbol{h}) \boldsymbol{h}^{\boldsymbol{\alpha}} \]
Therefore
\[ \varphi^{(k)}(0) = \sum_{|\boldsymbol{\alpha}|=k} \frac{k!}{\boldsymbol{\alpha}!}D^{\boldsymbol{\alpha}}f(\boldsymbol{a}) \boldsymbol{h}^{\boldsymbol{\alpha}} \]
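Substituting this for \(k = 0, 1, \cdots, m\), together with the expression for \(\varphi^{(m+1)}(\theta)\) given by the preceding formula at \(t = \theta\), into (1) yields the stated formula.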

Q.E.D.
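
It may help to unpack the multi-index sum in the smallest nontrivial case (a sketch for \(n = 2\), \(k = 2\), added for illustration). The multi-indices with \(|\boldsymbol{\alpha}| = 2\) are \((2,0)\), \((1,1)\), \((0,2)\), with \(\boldsymbol{\alpha}! = 2, 1, 2\) respectively, so
\[ \sum_{|\boldsymbol{\alpha}|=2} \frac{D^{\boldsymbol{\alpha}}f(\boldsymbol{a})}{\boldsymbol{\alpha}!} \boldsymbol{h}^{\boldsymbol{\alpha}} = \frac{1}{2}\frac{\partial^2 f}{\partial x_1^2}(\boldsymbol{a})h_1^2 + \frac{\partial^2 f}{\partial x_1 \partial x_2}(\boldsymbol{a})h_1h_2 + \frac{1}{2}\frac{\partial^2 f}{\partial x_2^2}(\boldsymbol{a})h_2^2. \]
The middle term carries the coefficient \(1\) rather than \(\frac{1}{2}\) because it collects the two equal mixed terms \(h_1h_2\) and \(h_2h_1\).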

In particular

Written out, the first three terms of Taylor's formula are
\[ f(\boldsymbol{a} + \boldsymbol{h}) = f(\boldsymbol{a}) + \frac{\partial f}{\partial x_1}(\boldsymbol{a})h_1 + \cdots + \frac{\partial f}{\partial x_n}(\boldsymbol{a})h_n + \frac{1}{2} \sum_{i,j=1}^n\frac{\partial^2 f}{\partial x_i \partial x_j}(\boldsymbol{a})h_ih_j + \cdots \]
If we write
\[ Hf(\boldsymbol{a}) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2}(\boldsymbol{a}) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(\boldsymbol{a}) \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(\boldsymbol{a}) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(\boldsymbol{a}) \end{bmatrix} \]
then the formula above can be written as
\[ f(\boldsymbol{a} + \boldsymbol{h}) = f(\boldsymbol{a}) + Jf(\boldsymbol{a})\boldsymbol{h} + \frac{1}{2}\boldsymbol{h}^T Hf(\boldsymbol{a}) \boldsymbol{h} + \cdots \]
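Here \(Hf\) is called the Hessian matrix of \(f\). It is a symmetric square matrix of order \(n\), since by Theorem 1 the continuous mixed second-order partial derivatives do not depend on the order of differentiation.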
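
As a worked example (an illustrative choice, not from the text above), take \(f(x,y) = e^x \cos y\) and \(\boldsymbol{a} = (0,0)\). Then \(f(\boldsymbol{a}) = 1\) and
\[ Jf(\boldsymbol{a}) = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad Hf(\boldsymbol{a}) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \]
so the expansion begins
\[ e^{h_1}\cos h_2 = 1 + h_1 + \frac{1}{2}\left(h_1^2 - h_2^2\right) + \cdots \]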

Theorem 6

Let \(D \subset \mathbb{R}^n\) be a convex region, \(f \in C^m(D)\), and let \(\boldsymbol{a}\) and \(\boldsymbol{a}+\boldsymbol{h}\) be two points of \(D\). Then
\[ f(\boldsymbol{a} + \boldsymbol{h}) = \sum_{k=0}^m \sum_{|\boldsymbol{\alpha}|=k} \frac{D^{\boldsymbol{\alpha}}f(\boldsymbol{a})}{\boldsymbol{\alpha}!} \boldsymbol{h}^{\boldsymbol{\alpha}} + o(\Vert \boldsymbol{h} \Vert^m) \quad (\boldsymbol{h} \to \boldsymbol{0}) \]

Proof. By Theorem 5, applied with \(m-1\) in place of \(m\),
\[ f(\boldsymbol{a} + \boldsymbol{h}) = \sum_{k=0}^{m-1} \sum_{|\boldsymbol{\alpha}|=k} \frac{D^{\boldsymbol{\alpha}}f(\boldsymbol{a})}{\boldsymbol{\alpha}!} \boldsymbol{h}^{\boldsymbol{\alpha}} + \sum_{|\boldsymbol{\alpha}|=m}\frac{D^{\boldsymbol{\alpha}}f(\boldsymbol{a} + \theta \boldsymbol{h})}{\boldsymbol{\alpha}!} \boldsymbol{h}^{\boldsymbol{\alpha}} \tag{2} \]
where \(\theta \in (0, 1)\). Since the partial derivatives of \(f\) of order \(m\) are continuous,
\[ \lim \limits_{\boldsymbol{h} \to \boldsymbol{0}} D^{\boldsymbol{\alpha}}f(\boldsymbol{a} + \theta \boldsymbol{h}) = D^{\boldsymbol{\alpha}}f(\boldsymbol{a}) \quad (|\boldsymbol{\alpha}|=m) \]
Hence
\[ D^{\boldsymbol{\alpha}}f(\boldsymbol{a} + \theta \boldsymbol{h}) = D^{\boldsymbol{\alpha}}f(\boldsymbol{a}) + o(1) \quad (\boldsymbol{h} \to 0) \]
and so
\[ \frac{D^{\boldsymbol{\alpha}}f(\boldsymbol{a} + \theta \boldsymbol{h})}{\boldsymbol{\alpha}!} \boldsymbol{h}^{\boldsymbol{\alpha}} = \frac{D^{\boldsymbol{\alpha}}f(\boldsymbol{a})}{\boldsymbol{\alpha}!} \boldsymbol{h}^{\boldsymbol{\alpha}} + o(\boldsymbol{h}^{\boldsymbol{\alpha}}) \quad (\boldsymbol{h} \to \boldsymbol{0}) \]
When \(|\boldsymbol{\alpha}|=m\), since \(|h_i| \le \Vert \boldsymbol{h} \Vert\) for each \(i\), we have
\[ |\boldsymbol{h}^{\boldsymbol{\alpha}}| = |h_1^{\alpha_1} \cdots h_n^{\alpha_n}| = |h_1|^{\alpha_1} \cdots |h_n|^{\alpha_n} \le \Vert \boldsymbol{h} \Vert^{m} \]
Therefore
\[ \sum_{|\boldsymbol{\alpha}| = m}\frac{D^{\boldsymbol{\alpha}}f(\boldsymbol{a} + \theta \boldsymbol{h})}{\boldsymbol{\alpha}!} \boldsymbol{h}^{\boldsymbol{\alpha}} = \sum_{|\boldsymbol{\alpha}| = m}\frac{D^{\boldsymbol{\alpha}}f(\boldsymbol{a})}{\boldsymbol{\alpha}!} \boldsymbol{h}^{\boldsymbol{\alpha}} + o(\Vert \boldsymbol{h} \Vert^{m}) \quad (\boldsymbol{h} \to \boldsymbol{0}) \]
Substituting this into (2) proves the claim.

Q.E.D.
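
For instance (an illustrative computation with \(m = 2\), added here), let \(f(x,y) = \ln(1 + x + y)\) on the half-plane \(D = \{(x,y): 1 + x + y > 0\}\), which is convex, and let \(\boldsymbol{a} = (0,0)\). At \(\boldsymbol{a}\) we have \(f = 0\), both first-order partial derivatives equal \(1\), and all second-order partial derivatives equal \(-1\), so
\[ \ln(1 + h_1 + h_2) = (h_1 + h_2) - \frac{1}{2}(h_1 + h_2)^2 + o(\Vert \boldsymbol{h} \Vert^2) \quad (\boldsymbol{h} \to \boldsymbol{0}), \]
which is consistent with the single-variable expansion \(\ln(1+s) = s - \frac{1}{2}s^2 + o(s^2)\) for \(s = h_1 + h_2\).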

Theorem 7: Quasi mean value theorem

Let \(\boldsymbol{f}: [a,b] \to \mathbb{R}^m\) be a continuous map on \([a,b]\) that is differentiable on the open interval \((a,b)\). Then there exists a point \(\xi \in (a,b)\) such that
\[ \Vert \boldsymbol{f}(b) - \boldsymbol{f}(a) \Vert \le \Vert J\boldsymbol{f}(\xi) \Vert (b-a) \]

Proof. Let \(\boldsymbol{u} = \boldsymbol{f}(b) - \boldsymbol{f}(a)\), and use the inner product on \(\mathbb{R}^m\) to define the function
\[ \varphi(t) = \left<\boldsymbol{u}, \boldsymbol{f}(t)\right> \quad (a \le t \le b) \]
Clearly \(\varphi\) is continuous on \([a,b]\) and differentiable on the open interval \((a,b)\). Applying the mean value theorem to \(\varphi\), there exists a point \(\xi \in (a,b)\) such that
\[ \varphi(b) - \varphi(a) = (b-a)\varphi^\prime(\xi) = (b-a)\left<\boldsymbol{u}, J\boldsymbol{f}(\xi) \right> \]
On the other hand,
\[ \varphi(b) - \varphi(a) = \left< \boldsymbol{u}, \boldsymbol{f}(b) \right> - \left< \boldsymbol{u}, \boldsymbol{f}(a) \right> = \left< \boldsymbol{u}, \boldsymbol{f}(b) - \boldsymbol{f}(a)\right> = \left< \boldsymbol{u}, \boldsymbol{u} \right> = \Vert \boldsymbol{u} \Vert^2 \]
By the Cauchy-Schwarz inequality,
\[ \Vert \boldsymbol{u} \Vert^2 = (b-a)\left< \boldsymbol{u}, J\boldsymbol{f}(\xi) \right> \le (b-a)\Vert \boldsymbol{u} \Vert \Vert J\boldsymbol{f}(\xi) \Vert \]
When \(\boldsymbol{u} \ne \boldsymbol{0}\), cancelling \(\Vert \boldsymbol{u} \Vert\) from both sides gives the claim; if \(\boldsymbol{u} = \boldsymbol{0}\), the claim holds trivially.

Q.E.D.
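
The inequality cannot in general be upgraded to an equality. A standard illustration (added here as a sketch) is \(\boldsymbol{f}(t) = (\cos t, \sin t)\) on \([0, 2\pi]\). Here \(\boldsymbol{f}(2\pi) - \boldsymbol{f}(0) = \boldsymbol{0}\), while \(J\boldsymbol{f}(t) = (-\sin t, \cos t)\) has norm \(1\) for every \(t\), so
\[ \boldsymbol{f}(2\pi) - \boldsymbol{f}(0) = \boldsymbol{0} \ne 2\pi\, J\boldsymbol{f}(\xi) \quad \text{for every } \xi \in (0, 2\pi), \]
whereas the inequality of Theorem 7 reads \(0 \le 2\pi\) and holds.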

Theorem 8

Let \(D \subset \mathbb{R}^n\) be a convex region and let the map \(\boldsymbol{f}: D \to \mathbb{R}^m\) be differentiable on \(D\). Then for any \(\boldsymbol{a},\boldsymbol{b} \in D\) there is a point \(\boldsymbol{\xi}\) on the line segment joining \(\boldsymbol{a}\) and \(\boldsymbol{b}\) such that
\[ \Vert \boldsymbol{f}(\boldsymbol{b}) - \boldsymbol{f}(\boldsymbol{a}) \Vert \le \Vert J\boldsymbol{f}(\boldsymbol{\xi}) \Vert \Vert \boldsymbol{b} - \boldsymbol{a}\Vert \]

Proof. The line segment joining \(\boldsymbol{a}\) and \(\boldsymbol{b}\) can be written as
\[ \boldsymbol{r}(t) = \boldsymbol{a} + t (\boldsymbol{b} - \boldsymbol{a}) \quad (0 \le t \le 1) \]
Let
\[ \boldsymbol{g}(t) = \boldsymbol{f} \circ \boldsymbol{r}(t) \]
The map \(\boldsymbol{g}\) is continuous on \([0,1]\) and differentiable in \((0,1)\), and by the chain rule
\[ J\boldsymbol{g}(t) = J\boldsymbol{f}(\boldsymbol{r}(t))(\boldsymbol{b} - \boldsymbol{a}) \]
By Theorem 7 there exists \(\tau \in (0, 1)\) such that
\[ \Vert \boldsymbol{g}(1) - \boldsymbol{g}(0)\Vert \le \Vert J\boldsymbol{g}(\tau)\Vert \]
that is,
\[ \Vert \boldsymbol{f}(\boldsymbol{b}) - \boldsymbol{f}(\boldsymbol{a}) \Vert \le \Vert J\boldsymbol{f}(\boldsymbol{r}(\tau))(\boldsymbol{b} - \boldsymbol{a}) \Vert \]
Setting \(\boldsymbol{\xi} = \boldsymbol{r}(\tau)\), we obtain
\[ \Vert \boldsymbol{f}(\boldsymbol{b}) - \boldsymbol{f}(\boldsymbol{a}) \Vert \le \Vert J\boldsymbol{f}(\boldsymbol{\xi})(\boldsymbol{b} - \boldsymbol{a}) \Vert \le \Vert J\boldsymbol{f}(\boldsymbol{\xi}) \Vert \Vert \boldsymbol{b} - \boldsymbol{a} \Vert \]

Q.E.D.
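
A typical use of Theorem 8 is a Lipschitz estimate. As a sketch, assume that \(\Vert \cdot \Vert\) on matrices denotes the operator norm induced by the Euclidean norm (an assumption made here; the text above does not fix the matrix norm), and consider \(\boldsymbol{f}(x,y) = (\sin x, \cos y)\) on \(\mathbb{R}^2\). Then
\[ J\boldsymbol{f}(x,y) = \begin{bmatrix} \cos x & 0 \\ 0 & -\sin y \end{bmatrix}, \qquad \Vert J\boldsymbol{f}(x,y) \Vert = \max\{|\cos x|, |\sin y|\} \le 1, \]
so Theorem 8 gives \(\Vert \boldsymbol{f}(\boldsymbol{b}) - \boldsymbol{f}(\boldsymbol{a}) \Vert \le \Vert \boldsymbol{b} - \boldsymbol{a} \Vert\) for all \(\boldsymbol{a}, \boldsymbol{b} \in \mathbb{R}^2\).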