" - $W$ is initialized to random values and then scaled by $\\sqrt{\\frac{2}{\\text{dimension of the previous layer}}}$, while $b$ is initialized to 0. This scheme is called \"He initialization\". It keeps the weights of every layer from growing too large, which effectively reduces the chance of exploding gradients, and it works especially well with the ReLU activation function.\n",
" \n",
"**4. Xavier initialization:** \n",
"\n",
" - Note: \"Xavier initialization\" scales the random weights by $\\sqrt{\\frac{1}{\\text{dimension of the previous layer}}}$ instead.\n",
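"\n",
" - The two schemes above can be sketched with NumPy as follows (the helper name `init_layers` and its `numerator` switch are illustrative assumptions, not from the text): pass `numerator=2.0` for He initialization or `numerator=1.0` for Xavier initialization.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def init_layers(dims, numerator=2.0, seed=0):\n",
"    # dims: layer sizes, e.g. [n_x, n_h1, ..., n_y]\n",
"    # numerator=2.0 -> He initialization; numerator=1.0 -> Xavier\n",
"    rng = np.random.default_rng(seed)\n",
"    layers = []\n",
"    for n_prev, n_curr in zip(dims[:-1], dims[1:]):\n",
"        # W ~ N(0, 1) scaled by sqrt(numerator / n_prev); b starts at 0\n",
"        W = rng.standard_normal((n_curr, n_prev)) * np.sqrt(numerator / n_prev)\n",
"        b = np.zeros((n_curr, 1))\n",
"        layers.append((W, b))\n",
"    return layers\n",
"```\n",
"\n",
" - With He scaling, each entry of $W$ has variance about $\\frac{2}{n_{prev}}$, which keeps activation magnitudes roughly stable across ReLU layers.\n",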