Imported from upstream.

GitOrigin-RevId: 6da410412a6553831e89376f8d5a486edf768000

713e3194 · Megvii Engine Team
39 changed files
--- a/.gitignore
+++ b/.gitignore
+build
+gen_cpp_docs/xml
+source/autogen
+source/cpp_api
+source/locale
+*.pyc
+*.state_dict
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
+{
+    "restructuredtext.confPath": "${workspaceFolder}/source"
+}
\ No newline at end of file
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
+# Contributor Covenant Code of Conduct
+## Our Pledge
+In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
+## Our Standards
+Examples of behavior that contributes to a positive environment for our community include:
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+Examples of unacceptable behavior include:
+* The use of sexualized language or imagery, and sexual attention or advances of any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others’ private information, such as a physical or email address, without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a professional setting
+All MegEngine Documents forums and spaces are meant for professional interactions, and any behavior which could reasonably be considered inappropriate in a professional setting is unacceptable.
+## Our Responsibilities
+Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
+Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
+## Scope
+This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
+## Enforcement
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at megengine@megvii.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
+Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
+## Attribution
+This Code of Conduct is adapted from the Contributor Covenant, version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
--- a/CONTRIBUTOR_LICENSE_AGREEMENT.md
+++ b/CONTRIBUTOR_LICENSE_AGREEMENT.md
+# MegEngine Documents Contributor License Agreement
+In order to clarify the intellectual property license granted with Contributions from any person or entity, the open source project MegEngine Documents ("MegEngine Documents") must have a Contributor License Agreement (CLA) on file that has been signed by each Contributor, indicating agreement to the license terms below. This license is for your protection as a Contributor as well as the protection of MegEngine Documents and its users; it does not change your rights to use your own Contributions for any other purpose.
+This Agreement allows an individual or an entity to submit Contributions to MegEngine Documents, to authorize Contributions submitted by its designated employees to MegEngine Documents, and to grant copyright and patent licenses thereto.
+You accept and agree to the following terms and conditions for Your present and future Contributions submitted to MegEngine Documents. Except for the license granted herein to MegEngine Documents and recipients of software distributed by MegEngine Documents, You reserve all right, title, and interest in and to Your Contributions.
+1. **Definitions**. "You" (or "Your") shall mean the copyright owner or legal entity authorized by the copyright owner that is making this Agreement with MegEngine Documents. For legal entities, the entity making a Contribution and all other entities that control, are controlled by, or are under common control with that entity are considered to be a single Contributor.
+For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
+"Contribution" shall mean the code, documentation or any original work of authorship, including any modifications or additions to an existing work, that is intentionally submitted by You to MegEngine Documents for inclusion in, or documentation of, any of the products owned or managed by MegEngine Documents (the "Work").
+For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to MegEngine Documents or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, MegEngine Documents for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by You as "Not a Contribution."
+2. **Grant of Copyright License**. Subject to the terms and conditions of this Agreement, You hereby grant to MegEngine Documents and to recipients of software distributed by MegEngine Documents a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Your Contributions and such derivative works.
+3. **Grant of Patent License**. Subject to the terms and conditions of this Agreement, You hereby grant to MegEngine Documents and to recipients of software distributed by MegEngine Documents a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by You that are necessarily infringed by Your Contribution(s) alone or by combination of Your Contribution(s) with the Work to which such Contribution(s) was submitted. If any entity institutes patent litigation against You or any other entity (including a crossclaim or counterclaim in a lawsuit) alleging that Your Contribution, or the Work to which You have contributed, constitutes direct or contributory patent infringement, then any patent licenses granted to that entity under this Agreement for that Contribution or Work shall terminate as of the date such litigation is filed.
+4. You represent that You are legally entitled to grant the above license. If You are an entity, You represent further that each of Your employees designated by You is authorized to submit Contributions on behalf of You. If You are an individual and Your employer(s) has rights to intellectual property that You create that includes Your Contributions, You represent further that You have received permission to make Contributions on behalf of that employer, that Your employer has waived such rights for Your Contributions to MegEngine Documents, or that Your employer has executed a separate CLA with MegEngine Documents.
+5. If you do post content or submit material on MegEngine Documents and unless we indicate otherwise, you grant MegEngine Documents a nonexclusive, royalty-free, perpetual, irrevocable, and fully sublicensable right to use, reproduce, modify, adapt, publish, perform, translate, create derivative works from, distribute, and display such content throughout the world in any media. You grant MegEngine Documents and sublicensees the right to use your GitHub Public Profile, including but not limited to name, that you submit in connection with such content. You represent and warrant that you own or otherwise control all of the rights to the content that you post; that the content is accurate; that use of the content you supply does not violate this policy and will not cause injury to any person or entity; and that you will indemnify MegEngine Documents for all claims resulting from content you supply. MegEngine Documents has the right but not the obligation to monitor and edit or remove any activity or content. MegEngine Documents takes no responsibility and assumes no liability for any content posted by you or any third party.
+6. You represent that each of Your Contributions is Your original creation. Should You wish to submit work that is not Your original creation, You may submit it to MegEngine Documents separately from any Contribution, identifying the complete details of its source and of any license or other restriction (including, but not limited to, related patents, trademarks, and license agreements) of which You are personally aware, and conspicuously marking the work as "Submitted on behalf of a third party: [named here]".
+7. You are not expected to provide support for Your Contributions, except to the extent You desire to provide support. You may provide support for free, for a fee, or not at all. Unless required by applicable law or agreed to in writing, You provide Your Contributions on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE.
+8. You agree to notify MegEngine Documents of any facts or circumstances of which You become aware that would make these representations inaccurate in any respect.
+9. The effective date of this Contributor License Agreement is 2020/3/23. MegEngine Documents reserves the right to update or change this Agreement at any time, by posting the most current version of the Agreement on MegEngine Documents, with a new effective date. All such changes in the Agreement are effective from the effective date. Your continued use of MegEngine Documents after we post any such changes signifies your agreement to those changes. If you do not agree to the then-current Agreement, you must immediately discontinue using MegEngine Documents.
--- a/LICENSE
+++ b/LICENSE
+MegEngine Documents is Licensed under the Apache License, Version 2.0 (the "License")
+Copyright (c) 2014-2020 Megvii Inc. All rights reserved.
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+Apache License
+Version 2.0, January 2004
+http://www.apache.org/licenses/
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+1. Definitions.
+"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
+"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
+"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
+"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
+"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
+"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
+"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
+"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
+"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
+"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
+2. Grant of Copyright License.
+Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
+3. Grant of Patent License.
+Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
+4. Redistribution.
+You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
+You must give any other recipients of the Work or Derivative Works a copy of this License; and
+You must cause any modified files to carry prominent notices stating that You changed the files; and
+You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
+If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
+You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
+5. Submission of Contributions.
+Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
+6. Trademarks.
+This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
+7. Disclaimer of Warranty.
+Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
+8. Limitation of Liability.
+In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
+9. Accepting Warranty or Additional Liability.
+While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
+END OF TERMS AND CONDITIONS
--- a/README.md
+++ b/README.md
+# MegEngine Documents
+## Prerequisites
+- Install `sphinx>=2.0` and related dependencies by:
+    ```
+    pip3 install -U sphinx sphinx-autodoc-typehints sphinx-serve sphinxcontrib-jupyter nbsphinx jieba
+    pip3 install git+https://github.com/pandas-dev/pydata-sphinx-theme.git@master
+    ```
+- reStructuredText (RST) is used for document writing. HTML files can be generated from the RST files for document visualization.
+    For more information about RST, please visit https://sphinx-doc-zh.readthedocs.io/en/latest/rest.html.
+## Generate API document
+1. Make sure you have installed [MegEngine](https://github.com/MegEngine/MegEngine).
+    ```bash
+    pip3 install megengine -f https://megengine.org.cn/whl/mge.html
+    ```
+2. Run [gen_python_docs/gendoc.sh](gen_python_docs/gendoc.sh) to generate HTML files.
+    The script takes the Python `site-packages` directory that contains MegEngine as its argument.
+    The default is `~/.local/lib/python3.6/site-packages`.
+    Note that the RST files generated from the Python docstrings are placed under `source/autogen`.
+    ```bash
+    ./gen_python_docs/gendoc.sh ~/.local/lib/python3.6/site-packages
+    ```
+3. Start local sphinx service by:
+    ```bash
+    sphinx-serve -b build -p 8000
+    ```
+## Write python API document
+* How documentation is generated for Python code
+    1. Write comments following docstring rules.
+    2. Run the Sphinx tools to generate RST files from the Python docstrings.
+    3. Generate HTML files from RST.
+    Refer to [gen_python_docs/gendoc.sh](gen_python_docs/gendoc.sh) for more details.
+* Example python docstring: see [gen_python_docs/example/example.py](gen_python_docs/example/example.py).
+## Run doctest in API document
+API docstrings also contain examples written for [doctest](https://docs.python.org/3/library/doctest.html). Run the tests with:
+```
+gen_python_docs/gendoc.sh ~/.local/lib/python3.6/site-packages
+sphinx-build -b doctest source build/doctest
+```
+If all tests pass, you should see output similar to the following:
+```
+Doctest summary
+===============
+   16 tests
+    0 failures in tests
+    0 failures in setup code
+    0 failures in cleanup code
+build succeeded.
+```
+Otherwise, please fix any failed test or warning.
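Note on the doctest workflow described in the README above: the following is a minimal sketch of what a doctest check does, using only the Python standard library rather than `sphinx-build -b doctest`. The function `add` and the helper `run_examples` are hypothetical names introduced for illustration; they are not part of this repository or of MegEngine.

```python
import doctest


def add(a, b):
    """Return the sum of ``a`` and ``b``.

    >>> add(2, 3)
    5
    """
    return a + b


def run_examples(func):
    """Run the interactive examples in ``func``'s docstring.

    Returns a ``(failed, attempted)`` tuple, mirroring the counts that a
    doctest summary reports.
    """
    parser = doctest.DocTestParser()
    # Build a DocTest from the docstring; the function itself is the only
    # name made available to the examples.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    failed, attempted = run_examples(add)
    print(f"{attempted} tests, {failed} failures")
```

A docstring example passes only when the text printed by the interpreter matches the expected text exactly, which is why examples in API docstrings should avoid output that varies between runs.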
--- a/gen_python_docs/example/__init__.py
+++ b/gen_python_docs/example/__init__.py
--- a/gen_python_docs/example/example.py
+++ b/gen_python_docs/example/example.py
+from typing import Tuple, Union
+class ExampleClass:
+    r"""brief information here
+    A new paragraph needs a blank line before it. If the docstring contains
+    backslashes, add 'r' at the beginning.
+    Indent examples:
+        1. simple list (indent is 2! A blank line is needed between parent and child lists):
+        Args:
+        * item1
+          * subitem1
+        * item2
+        * item3
+        2. definition list(no blank line between head and content):
+        Args:
+            item1
+            item2
+            item3
+        3. block quotes(need a blank line between head and content):
+        Args:
+            item1
+            item2
+            item3
+    Math examples:
+        1. single line math equation:
+        .. math::
+            Z_z = \sum_{(x, y) \in F_z} X_x Y_y
+        2. inline math symbols:
+        We can denote :math:`F` by :math:`(x,y)\mapsto z`, then
+        :math:`R_1((x,y)\mapsto z) = (y, z) \mapsto x`, and
+        :math:`R_2((x,y)\mapsto z) = (z, x) \mapsto y`; it follows
+        :math:`R_1(R_2((x,y)\mapsto z)) = R_2(R_1((x,y)\mapsto z)) =
+        (x,y)\mapsto z`.
+    Note examples:
+        .. note::
+            ``W`` and ``b`` can be provided in the ``kwargs`` to specify the filter
+            and bias.
+    Cross reference examples:
+        1. class, attribute, examples:
+        class example is :class:`~.DataType`. attribute example is :attr:`~.ExampleClass.data_type`.
+        2. reference example:
+        you need to define a reference ahead of a title first and then reference it.
+        see :ref:`example_reference` for more details.
+    Code examples(https://www.sphinx-doc.org/en/1.5/markup/code.html):
+        1. default code rendered as python code(remember the blank line and indent):
+        ::
+            import numpy
+            print(numpy.random.normal(size=(10, 10)))
+        2. show other languages' code:
+        .. code-block:: bash
+            cd $(dirname $0)
+            rm -rf build
+        3. add ``caption`` and reference it using ``name``:
+        .. code-block::
+            :caption: this.py
+            :name: this-py
+            print('Explicit is better than implicit.')
+        4. include other source file(path is relative to your rst file rather than py file):
+        .. literalinclude:: ../conf.py
+            :language: python
+            :lines: 20-22
+    Doctest examples(https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html):
+        1. use ``testsetup`` to import packages or define variables; this block is hidden in the built pages.
+        .. testsetup::
+            import datetime
+        2. use ``testcode`` and ``testoutput`` for complete code block test.
+        .. testcode::
+            datetime.date(1994, 11, 1)         # this will give no output!
+            print(datetime.date(1994, 11, 1))  # this will give output
+        .. testoutput::
+            1994-11-01
+        3. or use ``doctest`` for interactive code block.
+        .. doctest::
+            >>> import datetime
+            >>> datetime.date(1994, 11, 1)
+            datetime.date(1994, 11, 1)
+    """
+    class DataType:
+        FLOAT = "FLOAT"
+        """
+        input/output both float32/float16
+        """
+        INT8x8x16 = "INT8x8x16"
+        INT8x8x32 = "INT8x8x32"
+        FLOAT_IO16xC32 = "FLOAT_IO16xC32"
+        """
+        input/output both float16, the internal compute is float32
+        """
+        QUINT8x8x32 = "QUINT8x8x32"
+        """
+        input QuantizedAsymm8, output QuantizedS32
+        """
+        INT8x8xX = "INT8x8xX"
+        """
+        input int8, output specified by tensor DType
+        """
+        QUINT4x4x32 = "QUINT4x4x32"
+        """
+        input QuantizedAsymm4, output QuantizedS32
+        """
+    _meta_data_type_type = DataType
+    __hyperparam_spec__ = (
+        ('data_type', 'cvt', _meta_data_type_type),
+        ('dilate_shape', 'cvt', _meta_data_type_type),
+        ('compute_mode', 'cvt', _meta_data_type_type),
+    )
+    data_type = _meta_data_type_type.FLOAT
+    """input/output data type"""
+    dilate_shape = (1, 1)
+    _group_spec = None
+    def __init__(self, params: dict):
+        pass
+    def example_func(
+            self,
+            kernel_shape: Union[int, Tuple[int, int]] = None,
+            output_nr_channel: int = None,
+            group: Union[int, str] = None,
+            **kwargs
+    ):
+        r"""a brief function description here.
+        :param kernel_shape: shape of the convolution kernel; it can be omitted
+            only when *W* is given as a :class:`.VarNode`
+        :param output_nr_channel: total number of channels for output; it can
+            be omitted only when *W* is given as a :class:`.VarNode`
+        :param group: divide the input, output and filter tensors into groups
+            to form a sparse connection; in such case, the filter would have an
+            extra first dimension as the number of groups, and filter layout is
+            ``[group, output_channel_per_group, input_channel_per_group,
+            spatial_dims...]``. Valid values include:
+            * ``None``: Do not use grouped convolution;
+            * an ``int`` value: Specify the number of groups directly;
+            * ``'chan'``: Channel-wise convolution: number of groups equals
+              to number of channels in the input tensor. In such case,
+              ``output_nr_channel`` can be omitted, and it would be set to
+              number of input channels if it is indeed omitted.
+        """
+        self._group_spec = group
+        super().__init__(kernel_shape, output_nr_channel, **kwargs)
--- a/gen_python_docs/gendoc.sh
+++ b/gen_python_docs/gendoc.sh
+#!/bin/bash -e
+cd $(dirname $0)
+rm -rf ../build/html
+AUTOGEN=../source/autogen
+rm -rf $AUTOGEN
+if [ -z "$1" ]; then
+    ROOT_PATH=~/.local/lib/python3.6/site-packages
+else
+    ROOT_PATH=$1
+fi
+#if [ ! -f "$ROOT_PATH/megengine/example.py" ]; then
+    #ln -s $PWD/example/example.py $ROOT_PATH/megengine/
+#fi
+export SPHINX_APIDOC_OPTIONS="members,undoc-members,show-inheritance"
+for i in megengine
+do
+    sphinx-apidoc -t templates -M -o $AUTOGEN $(realpath $ROOT_PATH)/$i
+done
+tail -n +4 $AUTOGEN/megengine.data.transform.rst >> $AUTOGEN/megengine.data.rst
+rm $AUTOGEN/megengine.data.transform.rst
+tail -n +4 $AUTOGEN/megengine.data.transform.vision.rst >> $AUTOGEN/megengine.data.rst
+rm $AUTOGEN/megengine.data.transform.vision.rst
+tail -n +4 $AUTOGEN/megengine.data.dataset.rst >> $AUTOGEN/megengine.data.rst
+rm $AUTOGEN/megengine.data.dataset.rst
+tail -n +4 $AUTOGEN/megengine.data.dataset.vision.rst >> $AUTOGEN/megengine.data.rst
+rm $AUTOGEN/megengine.data.dataset.vision.rst
+# add contents on each page
+# sed -e '9i.. contents::\n' $AUTOGEN/* -i
+# add imported-members on each module
+# sed -e '/:members:/a\ \ \ \ :imported-members:' $AUTOGEN/* -i
+# fix title level
+# sed -e '/ module$/ {n; s/-/^/g}' $AUTOGEN/* -i
+# to avoid warning for unreferenced file
+rm -f $AUTOGEN/modules.rst
+cd ..
+# sphinx-build -b doctest source build/doctest
+sphinx-build source build/html
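Note on `gendoc.sh` above: each `tail -n +4 child.rst >> megengine.data.rst` followed by `rm child.rst` folds a generated sub-package page into the parent page, skipping the first three lines of the child file. A rough Python equivalent of that merge step is sketched below. The function name `merge_rst` and its parameters are hypothetical and only illustrate the logic; the script itself uses the shell commands shown.

```python
from pathlib import Path
from typing import Iterable


def merge_rst(autogen: Path, parent: str, children: Iterable[str],
              skip_lines: int = 3) -> None:
    """Append each child RST file to ``parent`` and delete the child.

    ``tail -n +4`` prints a file starting at line 4, so the first
    ``skip_lines`` (3) lines of every child are dropped.
    """
    parent_path = autogen / parent
    for child in children:
        child_path = autogen / child
        lines = child_path.read_text(encoding="utf-8").splitlines(keepends=True)
        # Open in append mode, matching the shell's ">>" redirection.
        with parent_path.open("a", encoding="utf-8") as out:
            out.writelines(lines[skip_lines:])
        child_path.unlink()  # matches "rm child.rst"


# Example call, assuming the directory layout used by gendoc.sh:
# merge_rst(Path("source/autogen"), "megengine.data.rst",
#           ["megengine.data.transform.rst", "megengine.data.dataset.rst"])
```

Expressing the step as a function makes the list of merged pages explicit, which may be easier to maintain than four repeated `tail`/`rm` pairs.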
--- a/gen_python_docs/gendoc_zh.sh
+++ b/gen_python_docs/gendoc_zh.sh
+#!/bin/bash -e
+cd $(dirname $0)
+rm -rf ../build/html
+AUTOGEN=../source/autogen
+rm -rf $AUTOGEN
+if [ -z "$1" ]; then
+    ROOT_PATH=~/.local/lib/python3.6/site-packages
+else
+    ROOT_PATH=$1
+fi
+#if [ ! -f "$ROOT_PATH/megengine/example.py" ]; then
+    #ln -s $PWD/example/example.py $ROOT_PATH/megengine/
+#fi
+export SPHINX_APIDOC_OPTIONS="members,undoc-members,show-inheritance"
+for i in megengine
+do
+    sphinx-apidoc -t templates -M -o $AUTOGEN $(realpath $ROOT_PATH)/$i
+done
+tail -n +4 $AUTOGEN/megengine.data.transform.rst >> $AUTOGEN/megengine.data.rst
+rm $AUTOGEN/megengine.data.transform.rst
+tail -n +4 $AUTOGEN/megengine.data.transform.vision.rst >> $AUTOGEN/megengine.data.rst
+rm $AUTOGEN/megengine.data.transform.vision.rst
+tail -n +4 $AUTOGEN/megengine.data.dataset.rst >> $AUTOGEN/megengine.data.rst
+rm $AUTOGEN/megengine.data.dataset.rst
+tail -n +4 $AUTOGEN/megengine.data.dataset.vision.rst >> $AUTOGEN/megengine.data.rst
+rm $AUTOGEN/megengine.data.dataset.vision.rst
+# add contents on each page
+# sed -e '9i.. contents::\n' $AUTOGEN/* -i
+# add imported-members on each module
+# sed -e '/:members:/a\ \ \ \ :imported-members:' $AUTOGEN/* -i
+# fix title level
+# sed -e '/ module$/ {n; s/-/^/g}' $AUTOGEN/* -i
+# to avoid warning for unreferenced file
+rm -f $AUTOGEN/modules.rst
+cd ..
+sphinx-build -D language="zh_CN" source build/html/zh_CN
--- a/gen_python_docs/templates/package.rst_t
+++ b/gen_python_docs/templates/package.rst_t
+{%- macro automodule(modname, options) -%}
+.. automodule:: {{ modname }}
+{%- for option in options %}
+   :{{ option }}:
+{%- endfor %}
+{%- endmacro %}
+{%- macro toctree(docnames) -%}
+.. toctree::
+   :maxdepth: 1
+   :hidden:
+{% for docname in docnames %}
+   {{ docname }}
+{%- endfor %}
+{%- endmacro %}
+{%- if is_namespace %}
+{{- [pkgname, "namespace"] | join(" ") | e | heading }}
+{% else %}
+{{- [pkgname, "package"] | join(" ") | e | heading }}
+{% endif %}
+{%- if modulefirst and not is_namespace %}
+{{ automodule(pkgname, automodule_options) }}
+{% endif %}
+{%- if submodules %}
+{% if separatemodules %}
+{{ toctree(submodules) }}
+{%- else %}
+{%- for submodule in submodules %}
+{% if show_headings %}
+{{- submodule | e | heading(2) }}
+{% endif %}
+{{ automodule(submodule, automodule_options) }}
+{% endfor %}
+{%- endif %}
+{% endif %}
+{%- if not modulefirst and not is_namespace %}
+Module contents
+---------------
+{{ automodule(pkgname, automodule_options) }}
+{% endif %}
--- a/source/advanced/deployment.rst
+++ b/source/advanced/deployment.rst
+.. _deployment:
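+Model Deployment
+==============================
+One of MegEngine's core strengths is its unified training and inference workflow: training is done in a Python environment, while inference here refers specifically to running the trained model in a C++ environment. Moving a model into an environment that does not depend on Python, so that it can run inference correctly, is called **deployment**. The goal of deployment is to strip away every dependency that inference does not strictly need and to make inference as fast as possible. Face recognition on a phone, for example, calls for millisecond-level optimization, which is only achievable in a C++ environment.
+Starting from a trained XOR network model (see the `MegStudio project <https://studio.brainpp.com/public-project/53>`_), this chapter explains how to deploy it to run on a CPU (x86). The process consists of the following steps:
+1. Serialize the model and export it to a file;
+2. Write a C++ program that loads the model;
+3. Compile the C++ program into an executable.
+Model Serialization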
+------------------------------
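+To deploy a model, we first need to make it independent of the Python environment; this step is called **serialization**. Serialization only supports static graphs, because detaching the model from Python requires a fixed, immutable network structure, which in turn relies on compilation in static graph mode (see :ref:`dynamic_and_static_graph`). In addition, the graph optimizations performed during compilation are themselves a necessary step of deployment.
+In MegEngine, the serialization interface is :meth:`~.trace.dump`. For a trained network, we serialize it with the following code: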
+.. code-block::
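+    import megengine.functional as F
+    from megengine.jit import trace
+    # Decorate the function with trace; see the "Dynamic and Static Graphs" and "Two Modes of Static Graphs" chapters
+    # After decoration, pred_fun is an instance of the trace class rather than a plain function
+    @trace(symbolic=True)
+    def pred_fun(data, *, net):
+        net.eval()
+        pred = net(data)
+        pred_normalized = F.softmax(pred)
+        return pred_normalized
+    # Call the trace method of the trace class to compile directly, without running
+    pred_fun.trace(data, net=xor_net)
+    # Call the dump method of the trace class to serialize the model for deployment
+    pred_fun.dump("xornet_deploy.mge", arg_names=["data"])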
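+A few more words on compilation and serialization. Compilation treats the function decorated with :class:`~.trace` (here ``pred_fun``) as the complete computation graph. The inputs of the graph are exactly the positional arguments of ``pred_fun`` (the parameters before the asterisk ``*`` in the parameter list, here the ``data`` variable), and the outputs of the graph are exactly the return values of the function (here ``pred_normalized``). This also determines the inputs and outputs of the deployed model: running it requires an input in the format of ``data`` and returns a value in the format of ``pred_normalized``.
+To make it easy to feed input data to the serialized model from C++ code, we need to give each input a name, which is what the ``arg_names`` argument in the code does. Since ``pred_fun`` in this example has only one positional argument, that is, the graph has only one input, the list passed to ``arg_names`` needs only one string. The name can be arbitrary and is used to refer to the input from C++ code; see the next section for details.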
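+The split between positional arguments (graph inputs) and the arguments after ``*`` can be checked with plain Python. The sketch below uses only the standard library; ``pred_fun`` here is a stand-in with the same signature, and the helper names are hypothetical, not part of MegEngine.

```python
import inspect


def pred_fun(data, *, net):
    """Stand-in with the same signature as the traced function (not the real one)."""
    return data


def positional_names(func):
    """Names of parameters that may be passed positionally (those before the bare `*`)."""
    kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    params = inspect.signature(func).parameters
    return [name for name, p in params.items() if p.kind in kinds]


def keyword_only_names(func):
    """Names of parameters that must be passed by keyword (those after the bare `*`)."""
    params = inspect.signature(func).parameters
    return [name for name, p in params.items()
            if p.kind is inspect.Parameter.KEYWORD_ONLY]


print(positional_names(pred_fun))    # ['data']
print(keyword_only_names(pred_fun))  # ['net']
```

+Under this reading, the list passed to ``arg_names`` needs one entry per name returned by ``positional_names``.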
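+In summary, a model trained in static graph mode can be serialized directly with :meth:`~.trace.dump`, without any changes to the model code. This is what "unified training and inference" means.
+Writing a C++ Program to Load the Model
+----------------------------------------
+Next we write a C++ program that implements what we want to do on the deployment platform. Based on the XOR network exported above, we implement the simplest possible feature: given two floating-point numbers, output the probability that their XOR is 0 and the probability that it is 1.
+Before that, to use the low-level MegEngine C++ API, build and install MegEngine from source as described in :ref:`installation`, and run ``make install`` to make sure the MegEngine C++ files are installed correctly.
+The example C++ code for the XOR computation above is as follows (taken from `xor-deploy.cpp <https://github.com/MegEngine/MegEngine/blob/master/sdk/xor-deploy/xor-deploy.cpp>`_):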
+.. literalinclude:: src/xornet_deploy.cpp
+    :language: cpp
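+In brief, the code first loads the model with ``serialization::GraphLoader``, then looks up the model's input tensor through ``tensor_map`` using the input name ``data`` specified in the previous section, and writes the inputs ``x`` and ``y`` given at run time into it. Next it compiles the model into a callable with ``network.graph->compile`` and executes it. Finally it prints the result ``predict``, whose two values are the probabilities that the XOR result is 0 and 1, respectively.
+Compiling and Running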
+------------------------------
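+To realize "unified training and inference" more completely, the same C++ program should also be cross-compilable to different platforms without code changes. One code base can serve multiple platforms because the underlying operator library (internally called MegDNN) wraps the platform-specific interfaces and, at compile time, automatically selects the ones compatible with the specified target platform.
+.. note::
+    The current release supports CPU (x86, x64) and GPU (CUDA) platforms; support for ARM will be opened up later.
+Here we take the CPU platform as an example and compile directly with gcc or clang (denoted by ``$CXX``):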
+.. code-block:: bash
+    $CXX -o xor_deploy -I$MGE_INSTALL_PATH/include xor_deploy.cpp -L$MGE_INSTALL_PATH/lib64/ -lmegengine
+``$MGE_INSTALL_PATH`` above is the install path specified with ``CMAKE_INSTALL_PREFIX`` when building and installing. After compilation, run the program with the following command:
+.. code-block:: bash
+    LD_LIBRARY_PATH=$MGE_INSTALL_PATH:$LD_LIBRARY_PATH ./xor_deploy xornet_deploy.mge 0.6 0.9
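+Here ``$MGE_INSTALL_PATH`` is added to the ``LD_LIBRARY_PATH`` environment variable so that the dynamic linker can find the MegEngine library at run time. The command produces the following output: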
+.. code-block:: none
+    Predicted: 0.999988 1.2095e-05
+This completes the deployment workflow from a Python model to a C++ executable.
--- a/source/advanced/distributed.rst
+++ b/source/advanced/distributed.rst
+.. _distributed:
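+Distributed Training
+==============================
+This chapter describes how to use multiple GPUs efficiently for distributed training in MegEngine. Distributed training means using the GPUs of one or more machines for parallel computation. In deep learning, the most common form of parallelism is at the data level: each GPU handles a portion of the data and runs the complete training and inference pipeline. This approach is called **data parallelism**.
+The interfaces currently exposed by MegEngine support data parallelism on a single machine with multiple GPUs and on multiple machines with multiple GPUs.
+Single Machine, Multiple GPUs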
+------------------------------
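+A single machine with multiple GPUs, such as four or eight, is the most common setup and is sufficient for training most models. This section covers the following topics in order:
+#. The communication mechanism between processes
+#. How to initialize distributed training
+#. The data processing pipeline
+#. How training state is synchronized across processes
+#. How to save and load models in a multi-process environment
+Communication Mechanism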
+''''''''''''''''''''''''''''''
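+In MegEngine, multiple GPUs are managed through Python's built-in multi-process library :py:mod:`multiprocessing`. Suppose a machine has 8 GPUs; we then create 8 processes with :py:class:`multiprocessing.Process`, one per GPU. For these 8 independent processes to train a model together, we need to manage the communication between them.
+First, each process is assigned a rank from 0 to 7 that serves as its identity. The ``target`` argument of :py:class:`multiprocessing.Process` specifies the function that all processes execute, and each process receives its own rank as a function argument. In this way all processes run the same code yet divide the work and carry out non-overlapping tasks, as the following code shows: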
+.. code-block::
+    import multiprocessing as mp
+    for rank in range(num_devices):
+        p = mp.Process(
+            target=run,
+            args=(
+                num_devices, rank, # ... more arguments omitted
+            )
+        )
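+Besides letting each process know its identity, we also need to specify a communication endpoint; MegEngine uses an IP address and port number. With multiple machines, one machine must be designated in advance as the master node, and its IP address and communication port are given to all machines so that every machine can reach the master node and communicate. With a single machine, we simply set the master node to the local address ``localhost``.
+With identities and a communication channel in place, the communication mechanism is essentially complete.
+Initializing Distributed Training
+''''''''''''''''''''''''''''''''''
+In MegEngine, distributed training is initialized with :func:`~.init_process_group`, which takes the following parameters:
+* ``master_ip`` (str) – IP address of the master node;
+* ``master_port`` (int) – port used for communication by all processes;
+* ``world_size`` (int) – total number of processes taking part in the computation;
+* ``rank`` (int) – rank of the current process;
+* ``dev`` (int) – ID, on the local machine, of the GPU device bound to the current process.
+We only need to call this function inside the target function executed by each process, passing the arguments that match that process, to enable communication between the processes, as shown below: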
+.. code-block::
+    import megengine.distributed as dist
+    def run(num_devices, rank, server, port):
+        # With a single machine, devices and processes correspond one to one, so the rank equals the device ID
+        dist.init_process_group(
+            master_ip=server,
+            master_port=port,
+            world_size=num_devices,
+            rank=rank,
+            dev=rank
+        )
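+Data Processing
+''''''''''''''''''''''''''''''
+Once the distributed training environment is initialized, training proceeds as usual. However, because each process must handle different data, some extra work is needed on the data side.
+Here we take loading MNIST as an example to show how to split the data so that each process gets a non-overlapping part. We load the whole dataset into memory and then split it. This is inefficient and only illustrates the principle; for a more efficient approach see :ref:`dist_dataloader`.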
+.. code-block::
+        mnist_datasets = load_mnist_datasets() # download and read the MNIST dataset, see the "Data Loading" documentation
+        data_train, label_train = mnist_datasets['train'] # training data and labels
+        size = ceil(len(data_train) / num_devices) # shard size when splitting the data into num_devices parts (ceil is math.ceil)
+        l = size * rank # start index of this process's shard
+        r = min(size * (rank + 1), len(data_train)) # end index (exclusive) of this process's shard
+        data_train = data_train[l:r, :, :, :] # data for this process
+        label_train = label_train[l:r] # labels for this process
+Each process now has its own non-overlapping portion of the data.
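+The index arithmetic of the split can be tried out without MegEngine or MNIST. The following is a minimal sketch in plain Python; ``shard_bounds`` is a hypothetical helper, and clamping the start index is an addition of this sketch so that an empty shard is returned as ``(n, n)``.

```python
from math import ceil


def shard_bounds(num_samples, num_devices, rank):
    """Return the [left, right) index range handled by the process with this rank."""
    size = ceil(num_samples / num_devices)          # samples per shard, rounded up
    left = min(size * rank, num_samples)            # clamp so late ranks get an empty range
    right = min(size * (rank + 1), num_samples)     # the last shard may be shorter
    return left, right


num_samples, num_devices = 10, 4
bounds = [shard_bounds(num_samples, num_devices, r) for r in range(num_devices)]
print(bounds)  # [(0, 3), (3, 6), (6, 9), (9, 10)]
```

+Consecutive ranges share their boundary, so the shards do not overlap and together cover every sample; the last shard is simply shorter when the size does not divide evenly.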
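+Synchronizing Training State
+''''''''''''''''''''''''''''''
+Inside the target function, each process's training loop is no different from single-machine, single-GPU training. This is possible because MegEngine hides the synchronization of parameter state across processes inside :class:`~.Optimizer`.
+Specifically, :class:`~.Optimizer` learns from :func:`~.util.is_distributed` that it is running in distributed mode, and automatically synchronizes parameters across processes in its constructor and in :meth:`~.Optimizer.step` by calling :func:`~.distributed.functional.bcast_param`.
+As a result, when a process constructs an :class:`~.Optimizer` and when it calls :meth:`~.Optimizer.step` in each iteration to update parameters, the current parameter values are broadcast automatically, so that all processes share the same training state at the start of training and after every iteration.
+Saving and Loading Models
+''''''''''''''''''''''''''''''
+Thanks to the synchronization mechanism above, all processes stay in a consistent state, which makes saving and loading models straightforward.
+Because parameters are synchronized when the optimizer is constructed, we only need to load the model parameters in the main process (the rank 0 process) before constructing the optimizer; the other processes are then updated to the loaded parameters automatically.
+Likewise, parameters only need to be saved after :meth:`~.Optimizer.step` has run in an iteration, which guarantees that the saved state is identical across all processes.
+The following example code can serve as a reference: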
+.. code-block::
+        # load model parameters
+        if rank == 0:
+            net.load_state_dict(checkpoint['net'])
+        opt = SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
+        # ... some code omitted
+        # save model parameters
+        opt.step()
+        if rank == 0:
+            checkpoint = {
+                'net': net.state_dict(),
+                'acc': best_acc,
+            }
+            mge.save(checkpoint, path)
+.. _dist_dataloader:
+Loading Data with DataLoader
+-----------------------------------------
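+In the previous section we loaded the whole dataset into memory for simplicity. In practice, :class:`~.dataloader.DataLoader` loads data more efficiently. For its basic usage, see the :ref:`data_load` part of the basic tutorial.
+:class:`~.dataloader.DataLoader` automatically handles the data-related issues of distributed training, so the data loading code can be the same as for single-GPU training. Specifically:
+* Every sampler :class:`~.sampler.Sampler` automatically performs a split similar to the one above, so that all processes receive non-overlapping data.
+* The :class:`~.dataloader.DataLoader` of each process also automatically calls distributed interfaces to share memory, which avoids unnecessary memory usage and significantly speeds up data reading.
+In summary, in distributed training you do not need to change how you use :class:`~.dataloader.DataLoader` at all; everything switches over seamlessly. For a complete example, see `MegEngine/models <https://github.com/MegEngine/models/blob/master/official/vision/classification/resnet/train.py>`_.
+Multiple Machines, Multiple GPUs
+--------------------------------
+In MegEngine, the single-machine code above is easily adapted to multiple machines: only the total number of processes ``world_size`` and the current process rank ``rank`` passed to :func:`~.init_process_group` need to change. That is, when computing the rank of each process on each machine, take the machine's node ID (``node_id``) into account. In addition, choose one machine as the master node and give its IP address and communication port to all machines.
+First, change the arguments of the target function:
+* new ``num_nodes``: total number of machines;
+* new ``node_id``: ID of the current machine;
+* ``num_devices`` -> ``devs_per_node``: number of GPUs on each machine;
+* ``rank`` -> ``local_rank``: rank of the current process on the current machine;
+* ``server`` -> ``master_ip``: changes from the local address (localhost) to the intranet IP address of the master node;
+* ``port`` -> ``master_port``: port the master node uses for communication.
+Then compute the global rank of the process (global_rank), as shown below: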
+.. code-block::
+    import megengine.distributed as dist
+    def run(num_nodes, node_id, devs_per_node, local_rank, master_ip, master_port):
+        world_size = num_nodes * devs_per_node
+        global_rank = devs_per_node * node_id + local_rank
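+        dist.init_process_group(master_ip, master_port, world_size, global_rank, local_rank)
+Everything else is identical to the single-machine version. Finally, run the same Python program on every machine to perform multi-machine, multi-GPU distributed training.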
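+The rank arithmetic can be verified on its own. The sketch below is plain Python with a hypothetical helper ``global_rank`` and an assumed cluster of 2 machines with 4 GPUs each; it only illustrates that the formula yields each rank exactly once.

```python
def global_rank(node_id, devs_per_node, local_rank):
    """Global process rank from the machine ID and the rank within that machine."""
    return devs_per_node * node_id + local_rank


num_nodes, devs_per_node = 2, 4          # assumed cluster layout for illustration
world_size = num_nodes * devs_per_node
ranks = [
    global_rank(node_id, devs_per_node, local_rank)
    for node_id in range(num_nodes)
    for local_rank in range(devs_per_node)
]
print(world_size, ranks)  # 8 [0, 1, 2, 3, 4, 5, 6, 7]
```

+Every process therefore receives a distinct rank in ``range(world_size)``, while ``local_rank`` still selects the GPU on its own machine.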
\ No newline at end of file
--- a/source/advanced/index.rst
+++ b/source/advanced/index.rst
+.. _advanced:
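+Introduction
+==============================
+In this part you will learn about some advanced uses of MegEngine.
+To follow this part, you should be familiar with the :ref:`basic tutorial <basic>`.
+This part consists of five sections that are largely independent of one another; read them selectively according to your interests and needs.
+1. :ref:`distributed`: how to train models in a distributed setting.
+2. :ref:`parameter_more_setting`: finer-grained settings for parameter optimization.
+3. :ref:`sublinear`: MegEngine's sublinear memory optimization.
+4. :ref:`two_static_mode`: the two modes of static graphs in MegEngine.
+5. :ref:`deployment`: how to run MegEngine models in a C++ environment.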
+.. toctree::
+    :maxdepth: 2
+    :hidden:
+    distributed
+    parameter_more_setting
+    sublinear
+    two_static_mode
+    deployment
--- a/source/advanced/load_pytorch.rst
+++ b/source/advanced/load_pytorch.rst
+.. _load_pytorch:
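+Embedding PyTorch Subgraphs in MegEngine (Experimental)
+========================================================
+MegEngine supports embedding PyTorch modules while building a network.
+This makes it easy to port existing PyTorch modules to MegEngine.
+Install the Python libraries required for this chapter:
+.. code-block:: bash
+    pip install torch torchvision ninja --user
+Given an existing PyTorch module, we can use the :class:`~.pytorch.PyTorchModule` provided by MegEngine to wrap it into a module compatible with MegEngine's :class:`~.Module`.
+For demonstration, suppose there is a ready-made feature extraction module ``LeNetExtractor`` implemented in PyTorch (without the classification layer of the LeNet architecture). In MegEngine, we wrap this PyTorch module and only need to add a linear classifier on top to complete the LeNet network.
+The code is as follows: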
+.. code-block::
+    import megengine.module as M 
+    from megengine.module.pytorch import PyTorchModule
+    from megengine.core.graph import get_default_device
+    class LeNet(M.Module):
+        def __init__(self, lenet_extractor):
+            super(LeNet, self).__init__()
+            # wrap the PyTorch module
+            self.lenet_extractor_wrap = PyTorchModule(lenet_extractor, get_default_device())
+            # build a linear classifier with MegEngine
+            self.mge_classifier = M.Linear(84, 10)
+        def forward(self, x):
+            x = self.lenet_extractor_wrap(x)
+            x = self.mge_classifier(x)
+            return x
+    # assume lenet_extractor already exists
+    lenet = LeNet(lenet_extractor)
+    # training and testing code omitted
+    # ...
+The PyTorch-based LeNetExtractor is defined as follows:
+.. code-block::
+    import torch
+    import torch.nn as nn
+    # a PyTorch version of the LeNet feature extraction module
+    class LeNetExtractor(nn.Module):
+        def __init__(self):
+            super(LeNetExtractor, self).__init__()
+            # single-channel image, two layers of 5x5 convolution + ReLU + pooling
+            self.conv1 = nn.Conv2d(1, 6, 5)
+            self.relu1 = nn.ReLU()
+            self.pool1 = nn.MaxPool2d(2, 2)
+            self.conv2 = nn.Conv2d(6, 16, 5)
+            self.relu2 = nn.ReLU()
+            self.pool2 = nn.MaxPool2d(2, 2)
+            # two fully connected layers + ReLU
+            self.fc1 = nn.Linear(16 * 5 * 5, 120)
+            self.relu3 = nn.ReLU()
+            self.fc2 = nn.Linear(120, 84)
+            self.relu4 = nn.ReLU()
+        def forward(self, x):
+            x = self.pool1(self.relu1(self.conv1(x)))
+            x = self.pool2(self.relu2(self.conv2(x)))
+            # flatten the [C, H, W] dimensions
+            x = x.view(-1, 16*5*5)
+            x = self.relu3(self.fc1(x))
+            x = self.relu4(self.fc2(x))
+            return x
--- a/source/advanced/parameter_more_setting.rst
+++ b/source/advanced/parameter_more_setting.rst
+.. _parameter_more_setting:
+Finer-Grained Parameter Optimization Settings
+==============================================
+In :ref:`train_and_evaluation`, the network is trained with the following optimizer:
+.. testcode::
+    import megengine.optimizer as optim
+    optimizer = optim.SGD(
+        le_net.parameters(), # parameter list; binds the given parameters to the optimizer
+        lr=0.05,  # learning rate
+    )
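+This optimizer uses the same learning rate for all parameters. This chapter shows how to use different learning rates for different parameters.
+We reuse the ``LeNet`` created in :ref:`network_build`; the optimizer code below can replace the corresponding code in :ref:`train_and_evaluation`.
+Using Different Learning Rates for Different Parameters
+--------------------------------------------------------
+:class:`~.Optimizer` supports dividing the network's parameters into groups, and different groups can be trained with different learning rates. A parameter group is represented by a dictionary that must contain the key-value pair ``'params': param_list``, which specifies the parameters in the group. The dictionary may also contain ``'lr': learning_rate`` to set the learning rate of this group; if this pair is omitted, the group uses the learning rate given to the optimizer. The dictionaries of all parameter groups to be optimized form a list that is passed as the first argument when instantiating :class:`~.Optimizer`.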
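+The fallback rule for the learning rate can be sketched with plain dictionaries. This is only an illustration of the rule described above, not MegEngine's implementation: ``effective_learning_rates`` is a hypothetical helper, and parameter names stand in for real parameter tensors.

```python
def effective_learning_rates(param_groups, default_lr):
    """Learning rate each group ends up with: its own 'lr' if set, else the default."""
    return [group.get('lr', default_lr) for group in param_groups]


# Parameter names stand in for real parameter tensors in this sketch.
param_groups = [
    {'params': ['conv1.weight', 'conv1.bias']},           # no 'lr': uses the default
    {'params': ['fc1.weight', 'fc1.bias'], 'lr': 0.01},   # custom learning rate
]
print(effective_learning_rates(param_groups, default_lr=0.05))  # [0.05, 0.01]
```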
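+To illustrate parameter groups, we first group the network parameters using :meth:`~.Module.named_parameters` provided by :class:`~.Module`. This method iterates over all parameters of the network, yielding pairs of a parameter name and the corresponding parameter variable: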
+.. testcode::
+    for (name, param) in le_net.named_parameters():
+        print(name, param.shape) # print the parameter name and the shape of its tensor
+.. testoutput::
+    classifer.bias (10,)
+    classifer.weight (10, 84)
+    conv1.bias (1, 6, 1, 1)
+    conv1.weight (6, 1, 5, 5)
+    conv2.bias (1, 16, 1, 1)
+    conv2.weight (16, 6, 5, 5)
+    fc1.bias (120,)
+    fc1.weight (120, 400)
+    fc2.bias (84,)
+    fc2.weight (84, 120)
+Based on the parameter names, we can put all convolution parameters of ``LeNet`` into one group and all fully connected parameters into another:
+.. testcode::
+    conv_param_list = []
+    fc_param_list = []
+    for (name, param) in le_net.named_parameters():
+        # convolution parameters form one group, fully connected parameters the other
+        if 'conv' in name:
+            conv_param_list.append(param)
+        else:
+            fc_param_list.append(param)
+After grouping, different learning rates can be set for the groups as follows:
+.. testcode::
+    import megengine.optimizer as optim
+    optimizer = optim.SGD(
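+        # the list of parameter groups (param_groups); each group may set its own learning rate or fall back to the optimizer's
+        [
+            {'params': conv_param_list},  # group of convolution parameters, no custom learning rate
+            {'params': fc_param_list, 'lr': 0.01} # group of fully connected parameters, custom learning rate 0.01
+        ],
+        lr=0.05,  # applies to groups in the list without their own learning rate, here the convolution parameters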
+    )
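+The list of parameter groups set in the optimizer is available as the :attr:`~.Optimizer.param_groups` attribute, through which we can read the learning rate of each group.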
+.. testcode::
+    # print the number of parameters and the learning rate of each group
+    print(len(optimizer.param_groups[0]['params']), optimizer.param_groups[0]['lr'])
+    print(len(optimizer.param_groups[1]['params']), optimizer.param_groups[1]['lr'])
+.. testoutput::
+    4 0.05
+    6 0.01
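+Changing the Learning Rate During Training
+''''''''''''''''''''''''''''''''''''''''''''
+MegEngine also supports changing the learning rate during training. For example, once some parameters have been trained far enough that they no longer need optimizing, the learning rate of their group can be set to zero. We modify the training code from :ref:`train_and_evaluation` to demonstrate this. The modified code trains for four epochs; at the end of the second epoch we set the learning rate of all fully connected parameters to zero, and in every epoch we print some of the fully connected parameters of ``LeNet`` to show whether they are updated.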
+.. testcode::
+    import megengine as mge
+    data = mge.tensor()
+    label = mge.tensor(dtype="int32") # labels for the cross-entropy loss must be an integer type
+    # print the initial parameter values
+    print("original parameter: {}".format(optimizer.param_groups[1]['params'][0]))
+    for epoch in range(4):
+        for step, (batch_data, batch_label) in enumerate(dataloader):
+            data.set_value(batch_data)
+            label.set_value(batch_label)
+            optimizer.zero_grad() # zero the parameter gradients
+            logits = le_net(data)
+            loss = F.cross_entropy_with_softmax(logits, label)
+            optimizer.backward(loss) # backpropagate to compute gradients
+            optimizer.step()  # update parameters using the gradients
+        # print some of the fully connected parameters of LeNet
+        print("epoch: {}, parameter: {}".format(epoch, optimizer.param_groups[1]['params'][0]))
+        if epoch == 1:
+            # set the learning rate of all fully connected parameters to 0.0
+            optimizer.param_groups[1]['lr'] = 0.0
+            print("\nset lr zero\n")
+.. testoutput::
+    original parameter: Tensor([0. 0. 0. 0. 0. 0. 0. 0. 0. 0.])
+    epoch: 0, parameter: Tensor([-0.0037  0.0245 -0.0075 -0.0002 -0.0063  0.007   0.0036  0.0009 -0.0128 -0.0053])
+    epoch: 1, parameter: Tensor([-0.0028  0.0246 -0.0083 -0.0007 -0.0068  0.007   0.0033  0.0001 -0.0116 -0.0047])
+    set lr zero
+    epoch: 2, parameter: Tensor([-0.0028  0.0246 -0.0083 -0.0007 -0.0068  0.007   0.0033  0.0001 -0.0116 -0.0047])
+    epoch: 3, parameter: Tensor([-0.0028  0.0246 -0.0083 -0.0007 -0.0068  0.007   0.0033  0.0001 -0.0116 -0.0047])
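+The output shows that the parameter values keep changing before the learning rate is set to 0 and stop changing afterwards.
+Most networks also reduce the learning rate gradually during training. The following code shows how to decrease the learning rate linearly during training in MegEngine: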
+.. testcode::
+    total_epochs = 10
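+    learning_rate = 0.05 # initial learning rate
+    for epoch in range(total_epochs):
+        # set the learning rate for the current epoch
+        for param_group in optimizer.param_groups: # param_groups contains all parameters updated by this optimizer
+            # linear decay of the learning rate, adjusted once per epoch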
+            param_group["lr"] = learning_rate * (1-float(epoch)/total_epochs)
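+To see the values this schedule produces, the formula can be evaluated on its own. The sketch below is plain Python; ``linear_decay`` is a hypothetical helper that applies the same expression as the loop above.

```python
def linear_decay(initial_lr, epoch, total_epochs):
    """Learning rate for the given epoch under the linear schedule used above."""
    return initial_lr * (1 - epoch / total_epochs)


total_epochs = 10
schedule = [linear_decay(0.05, epoch, total_epochs) for epoch in range(total_epochs)]
print(schedule[0], schedule[5])  # 0.05 0.025
```

+Because ``epoch`` runs from 0 to ``total_epochs - 1``, the learning rate never reaches zero during training: the last epoch still uses one tenth of the initial value in this example.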
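+Freezing Some Parameters
+------------------------------
+Besides putting the parameters that should not be trained into a group with zero learning rate, MegEngine offers another way to freeze parameters: bind only the parameters that need optimizing to the optimizer. In the code below, only the convolution parameters of ``LeNet`` are optimized: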
+.. testcode::
+    import megengine.optimizer as optim
+    le_net = LeNet()
+    param_list = []
+    for (name, param) in le_net.named_parameters():
+        if 'conv' in name: # train only the convolution parameters of LeNet
+            param_list.append(param)
+    optimizer = optim.SGD(
+        param_list, # parameters
+        lr=0.05,  # learning rate
+    )
+The code below applies this setting in an actual training run, which makes the difference in gradients between parameters easier to see:
+.. testcode::
+    learning_rate = 0.05
+    data = mge.tensor()
+    label = mge.tensor(dtype="int32") # labels for the cross-entropy loss must be an integer type
+    total_epochs = 1 # to keep the output short, train for only one epoch
+    for epoch in range(total_epochs):
+        # set the learning rate for the current epoch
+        for param_group in optimizer.param_groups:
+            param_group["lr"] = learning_rate * (1-float(epoch)/total_epochs)
+        total_loss = 0
+        for step, (batch_data, batch_label) in enumerate(dataloader):
+            data.set_value(batch_data)
+            label.set_value(batch_label)
+            optimizer.zero_grad() # zero the parameter gradients
+            logits = le_net(data)
+            loss = F.cross_entropy_with_softmax(logits, label)
+            optimizer.backward(loss) # backpropagate to compute gradients
+            optimizer.step()  # update parameters using the gradients
+            total_loss += loss.numpy().item()
+        # print the gradient of every parameter
+        for (name, param) in le_net.named_parameters():
+            if param.grad is None:
+                print(name, param.grad)
+            else:
+                print(name, param.grad.sum())
+.. testoutput::
+    classifer.bias None
+    classifer.weight None
+    conv1.bias Tensor([0.1187])
+    conv1.weight Tensor([-0.8661])
+    conv2.bias Tensor([-0.0737])
+    conv2.weight Tensor([-27.0589])
+    fc1.bias None
+    fc1.weight None
+    fc2.bias None
+    fc2.weight None
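+The output shows that only the convolution parameters have gradients; the remaining parameters have none and are therefore not updated.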
--- a/source/advanced/src/xornet_deploy.cpp
+++ b/source/advanced/src/xornet_deploy.cpp
+#include <stdlib.h>
+#include <iostream>
+#include "megbrain/serialization/serializer.h"
+using namespace mgb;
+cg::ComputingGraph::OutputSpecItem make_callback_copy(SymbolVar dev,
+                                                      HostTensorND& host) {
+    auto cb = [&host](DeviceTensorND& d) { host.copy_from(d); };
+    return {dev, cb};
+}
+int main(int argc, char* argv[]) {
+    // To run the compiled program, pass the model file name and the two values to XOR (x and y)
+    std::cout << " Usage: ./xornet_deploy model_name x_value y_value"
+              << std::endl;
+    if (argc != 4) {
+        std::cout << " Wrong argument" << std::endl;
+        return 1;
+    }
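+    // Open the model file given on the command line
+    std::unique_ptr<serialization::InputFile> inp_file =
+            serialization::InputFile::make_fs(argv[1]);
+    // Read the input values given on the command line
+    float x = atof(argv[2]);
+    float y = atof(argv[3]);
+    // Use GraphLoader to turn the model file into a LoadResult, which holds the computation graph, its inputs and related information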
+    auto loader = serialization::GraphLoader::make(std::move(inp_file));
+    serialization::GraphLoadConfig config;
+    serialization::GraphLoader::LoadResult network =
+            loader->load(config, false);
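+    // Look up the input Tensor by the name specified at dump time
+    auto data = network.tensor_map["data"];
+    // Assign values to the input Tensor
+    float* data_ptr = data->resize({1, 2}).ptr<float>();
+    data_ptr[0] = x;
+    data_ptr[1] = y;
+    // Compile the network into an asynchronously executed function
+    // output_var_map holds key-value pairs; ->second takes the output variable of the first entry, and its value is copied into predict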
+    HostTensorND predict;
+    std::unique_ptr<cg::AsyncExecutable> func =
+            network.graph->compile({make_callback_copy(
+                    network.output_var_map.begin()->second, predict)});
+    func->execute();
+    func->wait();
+    // The output holds the probabilities of the two classes, XOR result 0 and XOR result 1
+    float* predict_ptr = predict.ptr<float>();
+    std::cout << " Predicted: " << predict_ptr[0] << " " << predict_ptr[1]
+              << std::endl;
+}
--- a/source/advanced/sublinear.rst
+++ b/source/advanced/sublinear.rst
+.. _sublinear:
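+Sublinear Memory Optimization
+==============================
+Training with a large batch size often improves the performance of deep learning models. However, limited GPU memory frequently cannot accommodate large-batch training. To alleviate this, MegEngine provides a sublinear memory optimization that reduces the memory footprint of network training. The technique is based on the `gradient checkpointing <https://arxiv.org/abs/1604.06174>`_ algorithm: it searches in advance for the best computation graph nodes to serve as checkpoints for forward and backward propagation and does not store the other intermediate results, which greatly reduces (GPU) memory usage.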
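+Why checkpointing yields sublinear memory can be illustrated with a deliberately simplified cost model. The sketch below assumes a plain chain of equally sized layers split into equal segments, with one checkpoint kept per segment and one segment recomputed at a time; MegEngine's actual search works on general computation graphs, so this is an illustration of the idea only, and ``stored_activations`` is a hypothetical helper.

```python
from math import ceil


def stored_activations(num_layers, segment_len):
    """Activations held at once in the simplified model: one checkpoint per
    segment plus the activations of the single segment being recomputed."""
    num_checkpoints = ceil(num_layers / segment_len)
    return num_checkpoints + segment_len


num_layers = 100  # without checkpointing, all 100 activations are kept
best_segment = min(range(1, num_layers + 1),
                   key=lambda s: stored_activations(num_layers, s))
print(best_segment, stored_activations(num_layers, best_segment))  # 10 20
```

+In this model the best segment length is about the square root of the number of layers, so memory grows with the square root of the depth rather than linearly, at the price of recomputing each segment once during backpropagation.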
+Sublinear memory optimization is enabled through the following environment variables:
+.. testcode::
+    import os
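+    # MGB_COMP_GRAPH_OPT sets options of the computation graph.
+    # Setting enable_sublinear_memory_opt=1 turns on sublinear memory optimization
+    os.environ["MGB_COMP_GRAPH_OPT"] = "enable_sublinear_memory_opt=1"
+    # The number of iterations of the checkpoint search algorithm must be specified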
+    num_iterations = "50"
+    os.environ["MGB_SUBLINEAR_MEMORY_GENETIC_NR_ITER"] = num_iterations
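+Sublinear memory optimization only applies to MegEngine's static graph mode. It adds a small time overhead when compiling the computation graph and training the model. Below we use `ResNet50 <https://arxiv.org/abs/1512.03385>`_ as an example to show that it substantially reduces GPU memory usage during training.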
+.. testcode::
+    import os
+    import megengine as mge
+    from megengine.jit import trace
+    import megengine.hub as hub
+    import megengine.optimizer as optim
+    import megengine.functional as F
+    import numpy as np
+    def train_resnet_demo(batch_size, enable_sublinear):
+        os.environ["MGB_COMP_GRAPH_OPT"] = "enable_sublinear_memory_opt={}".format(enable_sublinear)
+        os.environ["MGB_SUBLINEAR_MEMORY_GENETIC_NR_ITER"] = '50'
+        # load a resnet50 model from megengine hub
+        resnet = hub.load("megengine/models", "resnet50")
+        optimizer = optim.SGD(
+            resnet.parameters(),
+            lr=0.1,
+        )
+        data = mge.tensor()
+        label = mge.tensor(dtype="int32")
+        # for the symbolic argument, see "Two Modes of Static Graphs"
+        @trace(symbolic=True)
+        def train_func(data, label, *, net, optimizer):
+            pred = net(data)
+            loss = F.cross_entropy_with_softmax(pred, label)
+            optimizer.backward(loss)
+        resnet.train()  # put the network in training mode
+        for i in range(10):
+            # use fake data; labels are int32 to match the dtype of the label tensor
+            batch_data = np.random.randn(batch_size, 3, 224, 224).astype(np.float32)
+            batch_label = np.random.randint(1000, size=(batch_size,)).astype(np.int32)
+            data.set_value(batch_data)
+            label.set_value(batch_label)
+            optimizer.zero_grad()
+            train_func(data, label, net=resnet, optimizer=optimizer)
+            optimizer.step()
+    # use a single GPU with 11 GB of memory
+    mge.set_default_device('gpux')
+    # without sublinear memory optimization, the maximum batch_size is around 100
+    train_resnet_demo(100, enable_sublinear=0)
+    # with sublinear memory optimization, the maximum batch_size is around 200
+    train_resnet_demo(200, enable_sublinear=1)
--- a/source/advanced/two_static_mode.rst
+++ b/source/advanced/two_static_mode.rst
+.. _two_static_mode:
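+Two Modes of Static Graphs
+=======================================
+The earlier chapter :ref:`dynamic_and_static_graph` introduced the advantages of static graphs and how to use :class:`~.trace` to convert between dynamic and static graphs. This section goes further and describes the two modes of static graphs.
+When decorating a training (or testing) function with :class:`~.trace`, you can specify the ``symbolic`` argument, for example: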
+.. code-block::
+    @trace(symbolic=True) # use static graph mode
+    def train_func(data, label, *, opt, net):
+        pass
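+``symbolic`` is either True or False, with the following meanings:
+1. True means "static construction", or "construction from symbols". All data nodes (tensors) in the computation graph are treated as symbols (hence ``symbolic``). They serve only as placeholders, with no actual memory allocated and no actual values. Compilation of the computation graph then depends entirely on the graph structure and not on the concrete values of the tensors, making it truly "static".
+2. False means "dynamic construction", or "construction from values". The first time a function decorated with :class:`~.trace` is called, it performs one computation on the input data, which builds a dynamic graph. This dynamic graph is then compiled into a static graph. All subsequent calls run this static graph, and its structure no longer depends on the values passed in. This mode can be summarized as "build dynamically once, then run statically". **MegEngine uses this mode by default.** It is also the mode used by the trace feature in PyTorch.
+The example below illustrates how graph construction differs between the two modes.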
+.. code-block::
+    from megengine.jit import trace
+    # @trace(symbolic=False) # "dynamic construction"
+    @trace(symbolic=True) # "static construction"
+    def train_func(data, label, *, opt, net):
+        logits = net(data)
+        print(logits[0]) # the network has many outputs, so only part is printed
+        loss = F.cross_entropy_with_softmax(logits, label)
+        opt.backward(loss)
+        return logits, loss
+.. testoutput::
+    Tensor(None)
+As shown above, with ``symbolic=True`` the output Tensor of the network has no value assigned. If we change ``symbolic`` to False and run the code again, we get:
+.. testoutput::
+    Tensor([-0.2423  0.0192  0.3368  0.5445 -0.1023  0.3589 -0.5626 -0.472  -0.4287 0.2468])
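+Now the output Tensor does hold a value; in other words, the computation graph was actually built and executed.
+In the vast majority of cases the static graphs built in the two modes are identical and behave the same in use. There are, however, a few subtle differences to keep in mind.
+With ``symbolic=False``, the first run and the construction of the computation graph depend on the input, which provides a degree of "dynamic flexibility": different static graphs can be built depending on the information available in the first run. ``symbolic=True`` cannot offer this flexibility. For example, the network code may contain statements such as "if a condition holds, take branch 1, otherwise take branch 2". Note that if such a conditional sits inside a loop, the static graph built in the first iteration is fixed and does not change, even if the result of the condition changes in later iterations. This is a common source of problems and misunderstanding.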
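+The pitfall with conditionals can be reproduced with a toy model in plain Python. ``ToyTrace`` below is a hypothetical stand-in that merely mimics "build on the first call, replay afterwards"; it is not how MegEngine's :class:`~.trace` is implemented.

```python
def choose_op(x):
    """Data-dependent Python control flow: pick an operation from the input value."""
    if x > 0:
        return lambda v: v * 2   # branch 1
    return lambda v: v - 1       # branch 2


class ToyTrace:
    """Toy tracer: evaluate the Python branch once, then replay the recorded path."""

    def __init__(self, build):
        self.build = build
        self.compiled = None

    def __call__(self, x):
        if self.compiled is None:
            # First call: the condition is evaluated on the actual input value
            self.compiled = self.build(x)
        # Later calls: the recorded branch is reused, the condition is not re-evaluated
        return self.compiled(x)


traced = ToyTrace(choose_op)
print(traced(3))    # 6: the first call takes branch 1 and records it
print(traced(-3))   # -6: branch 1 is replayed although the condition is now false
```

+An untraced call ``choose_op(-3)(-3)`` would give -4; the traced version keeps returning results from the branch recorded on the first call.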
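+A drawback of ``symbolic=False`` is that the first run executes in dynamic graph mode and cannot benefit from the memory optimizations of static graphs, so it usually consumes more memory. A network that would run fine in static graph mode may therefore fail in its first run for lack of memory.
+By contrast, ``symbolic=True`` has all the advantages and disadvantages of static graphs: it is always efficient but lacks flexibility. If the network contains conditionals that can only be evaluated with dynamic runtime information, this mode fails.
+In practice, choose the mode that fits your situation.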
--- a/source/api.rst
+++ b/source/api.rst
+.. _api:
+API Reference
+=============
+.. _example_reference:
+.. toctree::
+    :maxdepth: 2
+    autogen/megengine.core
+    autogen/megengine.functional
+    autogen/megengine.module
+    autogen/megengine.module.pytorch
+    autogen/megengine.optimizer
+    autogen/megengine.data
+    autogen/megengine.jit
+    autogen/megengine.distributed
+    autogen/megengine.random
+    autogen/megengine.utils
+    autogen/megengine.hub
--- a/source/basic/basic_concepts.rst
+++ b/source/basic/basic_concepts.rst
+.. _basic_concepts:
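+Basic Concepts
+==============================
+MegEngine is a deep neural network learning framework based on computation graphs.
+This section briefly introduces computation graphs and related basic concepts, and how they are implemented in MegEngine.
+Computation Graph
+------------------------------
+We introduce the basic concepts of a computation graph through a simple mathematical expression :math:`y = (w * x) + b`, shown in the figure below: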
+.. figure::
+    ./fig/computer_graph.png
+    :scale: 60%
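+    Figure 1
+The figure shows that a computation graph contains:
+* Data nodes (solid circles in the figure): the input data :math:`x`, :math:`w` and :math:`b`, the intermediate result :math:`p`, and the final output :math:`y`;
+* Computation nodes (hollow circles in the figure): * and + denote the computation nodes **multiplication** and **addition**, which are operations applied to data nodes;
+* Edges (arrows in the figure): they indicate the direction of data flow and express the dependencies between data nodes and computation nodes.
+This is a simple example of a computation graph. A computation graph is a directed graph (cyclic or acyclic) consisting of data nodes and computation nodes,
+and is a visual representation of a mathematical expression. In deep learning, any deep neural network, however complex, can in essence be represented as a computation graph.
+**Forward propagation** is the process of evaluating the mathematical expression represented by a computation graph. In Figure 1, the variables :math:`x` and :math:`w` enter from the left and are first multiplied to give the intermediate result :math:`p`;
+then :math:`p` and the input variable :math:`b` are added to give the final output :math:`y` on the right. This is a complete forward pass.
+In MegEngine, tensors (Tensor) represent the data nodes of a computation graph, and operators (Operator) implement the operations between data nodes.
+Tensor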
+------------------------------
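+Like PyTorch, TensorFlow and other deep learning frameworks, MegEngine uses tensors (Tensor) to represent the data in a computation graph.
+A Tensor can be thought of as a NumPy array; it can be a scalar, a vector, a matrix or a multi-dimensional array.
+A Tensor can be created from a NumPy array or a Python list.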
+.. testcode::
+    import numpy as np
+    import megengine as mge
+    # create an ndarray of shape (2, 5) and convert it to a MegEngine Tensor
+    # note: MegEngine Tensors currently do not support float64, so we set the dtype of the ndarray explicitly
+    a = mge.tensor(np.random.random((2,5)).astype('float32'))
+    print(a)
+    # create a list of length 3 and convert it to a Tensor
+    b = mge.tensor([1., 2., 3.])
+    print(b)
+Output:
+.. testoutput::
+    Tensor([[0.2976 0.4078 0.5957 0.3945 0.9413]
+     [0.7519 0.3313 0.0913 0.3345 0.3256]])
+    Tensor([1. 2. 3.])
+The value of a Tensor can be changed with :meth:`~.Tensor.set_value`.
+.. testcode::
+    c = mge.tensor()
+    # the Tensor is not initialized yet, so its value is None
+    print(c)
+    c.set_value(np.random.random((2,5)).astype("float32"))
+    # Tensor c has now been assigned a value
+    print(c)
+Output:
+.. testoutput::
+    Tensor(None)
+    Tensor([[0.68   0.9126 0.7312 0.3037 0.8082]
+     [0.1965 0.0413 0.395  0.6975 0.9103]])
+The :meth:`dtype <.Tensor.dtype>` attribute gives the data type of a Tensor;
+the :meth:`~.Tensor.astype` method creates a copy with the specified data type as a new Tensor and leaves the original Tensor unchanged.
+.. testcode::
+    print(c.dtype)
+    d = c.astype("float16")
+    print(d.dtype)
+Output:
+.. testoutput::
+    <class 'numpy.float32'>
+    <class 'numpy.float16'>
+The :meth:`shape <.Tensor.shape>` attribute gives the shape of a Tensor:
+.. testcode::
+    print(c.shape)
+The output is a tuple:
+.. testoutput::
+    (2, 5)
+The :meth:`~.Tensor.numpy` method converts a Tensor to a numpy.ndarray:
+.. testcode::
+    a = mge.tensor(np.random.random((2,5)).astype('float32'))
+    print(a)
+    b = a.numpy()
+    print(b)
+Output:
+.. testoutput::
+    Tensor([[0.2477 0.9139 0.8685 0.5265 0.341 ]
+     [0.6463 0.0599 0.555  0.1881 0.4283]])
+    [[0.2477342  0.9139376  0.8685143  0.526512   0.34099308]
+     [0.64625365 0.05993681 0.5549845  0.18809062 0.42833906]]
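+Operator
+-----------------------------------------
+MegEngine represents computation with operators (Operator).
+As in NumPy, MegEngine operators support the common mathematical operations and manipulations on Tensors.
+Here are a few simple examples:
+Tensor addition: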
+.. testcode::
+    a = mge.tensor(np.random.random((2,5)).astype('float32'))
+    print(a)
+    b = mge.tensor(np.random.random((2,5)).astype('float32'))
+    print(b)
+    print(a + b)
+Output:
+.. testoutput::
+    Tensor([[0.119  0.5816 0.5693 0.3495 0.4687]
+     [0.4559 0.524  0.3877 0.0287 0.9086]])
+    Tensor([[0.2488 0.5017 0.0975 0.2759 0.3443]
+     [0.8404 0.7221 0.5179 0.5839 0.1876]])
+    Tensor([[0.3678 1.0833 0.6667 0.6254 0.813 ]
+     [1.2963 1.2461 0.9056 0.6126 1.0962]])
+Tensor slicing:
+.. testcode::
+    print(a[1, :])
+Output:
+.. testoutput::
+    Tensor([0.4559 0.524  0.3877 0.0287 0.9086])
+Changing the shape of a Tensor:
+.. testcode::
+    print(a.reshape(5, 2))
+Output:
+.. testoutput::
+    Tensor([[0.4228 0.2097]
+     [0.9081 0.5133]
+     [0.2152 0.7341]
+     [0.0468 0.5756]
+     [0.3852 0.2363]])
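+The arguments of :meth:`~.Tensor.reshape` may leave a single dimension unspecified, written as -1, in which case reshape infers the size of that dimension automatically:
+.. testcode::
+    # the original shape is (2, 5); with one dimension given as -1, reshape infers its size to be 10
+    a = a.reshape(1, -1)
+    print(a.shape)
+Output: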
+.. testoutput::
+    (1, 10)
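+The inference of the -1 dimension is simple arithmetic: the total number of elements must stay the same. The sketch below shows the rule in plain Python; ``infer_shape`` is a hypothetical helper written for illustration, not a MegEngine function.

```python
def infer_shape(old_shape, new_shape):
    """Resolve a single -1 in new_shape so that the element count matches old_shape."""
    total = 1
    for dim in old_shape:
        total *= dim
    if new_shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    known = 1
    for dim in new_shape:
        if dim != -1:
            known *= dim
    if -1 in new_shape:
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the missing dimension")
        return tuple(total // known if dim == -1 else dim for dim in new_shape)
    if known != total:
        raise ValueError("element count does not match")
    return tuple(new_shape)


print(infer_shape((2, 5), (1, -1)))  # (1, 10)
print(infer_shape((2, 5), (-1, 2)))  # (5, 2)
```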
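+MegEngine's :mod:`~.megengine.functional` provides many more operators, such as the matrix multiplication and convolution operations common in deep learning.
+Matrix multiplication of Tensors: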
+.. testcode::
+    import megengine.functional as F
+    a = mge.tensor(np.random.random((2,3)).astype('float32'))
+    print(a)
+    b = mge.tensor(np.random.random((3,2)).astype('float32'))
+    print(b)
+    c = F.matrix_mul(a, b)
+    print(c)
+Output:
+.. testoutput::
+    Tensor([[0.8021 0.5511 0.7935]
+     [0.6992 0.9318 0.8736]])
+    Tensor([[0.6989 0.3184]
+     [0.5645 0.0286]
+     [0.2932 0.2545]])
+    Tensor([[1.1044 0.4731]
+     [1.2708 0.4716]])
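+For more operators, see the documentation of :mod:`~.megengine.functional`.
+Tensors on Different Devices
+----------------------------
+A Tensor may be created on different devices, depending on the current environment.
+The :meth:`device <.Tensor.device>` attribute reports the device a Tensor resides on.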
+.. testcode::
+    print(a.device)
+Output:
+.. testoutput::
+    # if you are running in a GPU environment
+    gpu0:0
+:meth:`~.Tensor.to` creates a copy of the current Tensor on another device. For example, to move the Tensor ``a`` created on the GPU above to the CPU:
+.. testcode::
+    # Whether the code below runs correctly depends on your current environment
+    b = a.to("cpu0")
+    print(b.device)
+Output:
+.. testoutput::
+    cpu0:0
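+Backpropagation and automatic differentiation
+---------------------------------------------
+**Backpropagation** Neural networks are usually optimized with stochastic gradient descent. Starting from the output of the computational graph, we apply the chain rule to compute the gradient of every intermediate data node. This process is called backpropagation.
+For example, to obtain the gradient of the output :math:`y` in Figure 1 with respect to the input :math:`w`, backpropagation proceeds as shown in the figure below: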
+.. figure::
+    ./fig/back_prop.png
+    :scale: 60%
+    Figure 2
+First, :math:`y = p + b`, so :math:`\partial y / \partial p = 1`.
+Tracing backwards, :math:`p = w * x`, so :math:`\partial p / \partial w = x`.
+By the chain rule, :math:`\partial y / \partial w = (\partial y / \partial p) * (\partial p / \partial w)`,
+so the gradient of :math:`y` with respect to the input :math:`w` is :math:`x`.
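+The chain-rule result can be checked numerically. The sketch below is a scalar illustration in plain Python, not MegEngine code: it compares the analytic gradient :math:`x` with a central finite-difference estimate. The function names and the sample values are assumptions chosen for the example.

```python
def forward(w, x, b):
    """Compute y = p + b with p = w * x, for scalar inputs."""
    p = w * x
    return p + b


def numerical_grad_w(w, x, b, eps=1e-6):
    """Central finite-difference estimate of dy/dw."""
    return (forward(w + eps, x, b) - forward(w - eps, x, b)) / (2 * eps)


w, x, b = 0.5, 3.0, -1.0
analytic = 1.0 * x  # (dy/dp) * (dp/dw) = 1 * x
numeric = numerical_grad_w(w, x, b)
print(analytic, numeric)
assert abs(analytic - numeric) < 1e-6
```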
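+**Automatic differentiation** MegEngine provides automatic differentiation for the tensors in a computational graph. Using the figure above as an example,
+assume that :math:`x` is a tensor of shape (1, 3), :math:`w` is a tensor of shape (3, 1),
+and :math:`b` is a scalar.
+Computing :math:`y = x * w + b` with MegEngine looks like this: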
+.. testcode::
+    import megengine.functional as F
+    x = mge.tensor(np.random.normal(size=(1, 3)).astype('float32'))
+    w = mge.tensor(np.random.normal(size=(3, 1)).astype('float32'))
+    b = mge.tensor(np.random.normal(size=(1, )).astype('float32'))
+    p = F.matrix_mul(x, w)
+    y = p + b
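+We can call :func:`~.graph.grad` directly to compute the partial derivative of the output :math:`y` with respect to :math:`w`, :math:`\partial y / \partial w`.
+.. testcode::
+    import megengine.functional as F
+    # The first argument (target) of F.grad() must be a scalar. y has shape (1, 1), so indexing with y[0] reduces it to shape (1, ), which F.grad() accepts
+    # use_virtual_grad relates to the dynamic/static graph mechanism and can be ignored for now
+    grad_w = F.grad(y[0], w, use_virtual_grad=False)
+    print(grad_w)
+Output: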
+.. testoutput::
+    Tensor([[-1.5197]
+     [-1.1563]
+     [ 1.0447]])
+As you can see, the computed gradient is itself a Tensor.
\ No newline at end of file
--- a/source/basic/data_load.rst
+++ b/source/basic/data_load.rst
+.. _data_load:
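+Data loading and processing
+==========================================
+Loading and preprocessing data often takes considerable effort when training and testing networks.
+MegEngine provides a set of interfaces that standardize this work.
+Wrapping a dataset with ``Dataset``
+-----------------------------------------
+A dataset is a collection of data, for example image datasets such as MNIST and CIFAR-10.
+:class:`~.meta_dataset.Dataset` is the abstract class that represents a dataset in MegEngine.
+A custom dataset class should inherit from :class:`~.meta_dataset.Dataset` and override the following methods:
+* :meth:`~.MapDataset.__init__`: usually reads the data source files; any other necessary setup can also go here.
+* :meth:`~.MapDataset.__getitem__`: returns one sample of the dataset by index, which also makes it possible to iterate over the whole dataset with a for loop.
+* :meth:`~.MapDataset.__len__`: returns the size of the dataset.
+Here is a simple example.
+We create a :class:`~.meta_dataset.Dataset` from the binary classification data shown in the figure below.
+Each sample is a point in the 2D plane whose x and y coordinates lie in [-1, 1]. There are two class labels (the blue * and red + in Figure 1): points labeled 0 lie in the first and third quadrants, and points labeled 1 lie in the second and fourth quadrants.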
+.. figure::
+    ./fig/dataset.png
+    :scale: 60%
+    Figure 1
+The dataset is created as follows:
+* :meth:`~.MapDataset.__init__` uses NumPy to generate a random ndarray as the data.
+* :meth:`~.MapDataset.__getitem__` returns one sample from the ndarray.
+* :meth:`~.MapDataset.__len__` returns the number of samples in the whole dataset.
+.. code-block::
+    import numpy as np
+    from typing import Tuple
+    # Import the Dataset class to inherit from
+    from megengine.data.dataset import Dataset
+    class XORDataset(Dataset):
+        def __init__(self, num_points):
+            """
+            Generate the binary classification dataset shown in Figure 1, with num_points samples
+            """
+            super().__init__()
+            # Initialize a NumPy array of shape (num_points, 2).
+            # Each row is a data point (x, y) whose x and y coordinates both fall in [-1, 1]
+            self.data = np.random.rand(num_points, 2).astype(np.float32) * 2 - 1
+            # Build the labels for the array above. A row (x, y) gets label 1 if x*y < 0, and label 0 otherwise
+            self.label = np.zeros(num_points, dtype=np.int32)
+            for i in range(num_points):
+                self.label[i] = 1 if np.prod(self.data[i]) < 0 else 0
+        # Return a single sample of the dataset
+        def __getitem__(self, index: int) -> Tuple:
+            return self.data[index], self.label[index]
+        # Return the length of the dataset
+        def __len__(self) -> int:
+            return len(self.data)
+    np.random.seed(2020)
+    # Build a training dataset with 30000 points
+    xor_train_dataset = XORDataset(30000)
+    print("The length of train dataset is: {}".format(len(xor_train_dataset)))
+    # Iterate over the samples of the dataset with a for loop
+    for cor, tag in xor_train_dataset:
+        print("The first data point is: {}, {}".format(cor, tag))
+        break
+    print("The second data point is: {}".format(xor_train_dataset[1]))
+Output:
+.. testoutput::
+    The length of train dataset is: 30000
+    The first data point is: [0.97255366 0.74678389], 0
+    The second data point is: (array([ 0.01949105, -0.45632857]), 1)
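+MegEngine also ships some ready-made dataset classes that already inherit from :class:`~.meta_dataset.Dataset`, such as :class:`~.meta_dataset.ArrayDataset`.
+:class:`~.meta_dataset.ArrayDataset` is initialized from one or more NumPy arrays. Internally it works as follows:
+* :meth:`~.ArrayDataset.__init__`: checks that all the NumPy arrays passed in have the same length; if they do not, creation fails.
+* :meth:`~.ArrayDataset.__getitem__`: builds a tuple from the elements at the same index of each NumPy array and returns it.
+* :meth:`~.ArrayDataset.__len__`: returns the size of the dataset.
+For the dataset in Figure 1, we can build an :class:`~.meta_dataset.ArrayDataset` directly from the coordinate array and the label array, without defining our own dataset class.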
+.. code-block::
+    from megengine.data.dataset import ArrayDataset
+    # Prepare data and label as NumPy arrays
+    np.random.seed(2020)
+    num_points = 30000
+    data = np.random.rand(num_points, 2).astype(np.float32) * 2 - 1
+    label = np.zeros(num_points, dtype=np.int32)
+    for i in range(num_points):
+        label[i] = 1 if np.prod(data[i]) < 0 else 0
+    # Create a dataset with ArrayDataset
+    xor_dataset = ArrayDataset(data, label)
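+The three behaviours described for ``ArrayDataset`` can be sketched in a few lines of plain Python. ``SimpleArrayDataset`` is a hypothetical stand-in written for illustration only; it is not MegEngine's implementation and works on any sequences, not just NumPy arrays.

```python
class SimpleArrayDataset:
    """Hypothetical stand-in mirroring the three behaviours listed above."""

    def __init__(self, *arrays):
        if not arrays:
            raise ValueError("at least one array is required")
        # Refuse to build the dataset when the arrays disagree in length.
        if any(len(array) != len(arrays[0]) for array in arrays):
            raise ValueError("all arrays must have the same length")
        self.arrays = arrays

    def __getitem__(self, index):
        # One element from each array, combined into a tuple.
        return tuple(array[index] for array in self.arrays)

    def __len__(self):
        return len(self.arrays[0])


points = [(0.5, 0.5), (-0.5, 0.5), (-0.5, -0.5)]
labels = [0, 1, 0]
dataset = SimpleArrayDataset(points, labels)
print(len(dataset))  # 3
print(dataset[1])    # ((-0.5, 0.5), 1)
```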
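+Sampling from a Dataset with a Sampler
+-----------------------------------------
+A :class:`~.dataset.Dataset` can only be traversed in one fixed order (given by its ``__getitem__`` implementation),
+whereas a :class:`~.sampler.Sampler` lets us draw samples from a :class:`~.dataset.Dataset` in whatever way we want, producing the minibatches used for training and testing.
+A :class:`~.sampler.Sampler` is essentially an iterator over the indices of a dataset. It is initialized with a :class:`~.dataset.Dataset` instance and a batch size (batch_size).
+MegEngine provides the common samplers, such as :class:`~.sampler.RandomSampler` (typically used for training) and :class:`~.sampler.SequentialSampler` (typically used for testing).
+We use these two to introduce the basic usage of :class:`~.sampler.Sampler`:
+.. code-block::
+    # Import a sampler from MegEngine
+    from megengine.data import RandomSampler
+    # Create a random sampler
+    random_sampler = RandomSampler(dataset=xor_dataset, batch_size=4)
+    # Inspect the dataset indices returned by each iteration of the sampler
+    for indices in random_sampler:
+        print(indices)
+        break
+Output: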
+.. testoutput::
+    [19827, 2614, 8788, 8641]
+With batch_size set to 4, each iteration of the sampler returns a list of length 4 whose elements are randomly sampled data indices.
+If you create a sequential sampler, :class:`~.sampler.SequentialSampler`, each iteration returns consecutive indices instead.
+.. code-block::
+    from megengine.data import SequentialSampler
+    sequential_sampler = SequentialSampler(dataset=xor_dataset, batch_size=4)
+    # Inspect the dataset indices returned by each iteration of the sampler
+    for indices in sequential_sampler:
+        print(indices)
+        break
+Output:
+.. testoutput::
+    [0, 1, 2, 3]
+You can also subclass Sampler to define a custom sampler; this is not covered here.
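+To make the idea of a sampler as an iterator over indices concrete, here is a minimal sketch in plain Python. The generator names are hypothetical, and the random variant assumes sampling without replacement (a shuffled permutation cut into batches); it does not reproduce the exact indices shown in the output above.

```python
import random


def sequential_batches(num_samples, batch_size):
    """Yield consecutive index batches: [0, 1, ...], then the next slice, and so on."""
    indices = list(range(num_samples))
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]


def random_batches(num_samples, batch_size, seed=None):
    """Yield index batches taken from a shuffled permutation (no replacement)."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]


print(next(sequential_batches(30000, 4)))  # [0, 1, 2, 3]
first_random = next(random_batches(30000, 4, seed=0))
print(len(first_random))  # 4
```

+Both generators visit every index exactly once per pass; only the order differs.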
+Generating batches with DataLoader
+------------------------------------------
+In MegEngine, a :class:`~.dataloader.DataLoader` is essentially an iterator that produces minibatches from a :class:`~.meta_dataset.Dataset` and a :class:`~.sampler.Sampler`.
+The code below uses a for loop to fetch the data of each minibatch.
+.. code-block::
+    from megengine.data import DataLoader
+    # Create a DataLoader, specifying the dataset and the sequential sampler
+    xor_dataloader = DataLoader(
+        dataset=xor_dataset,
+        sampler=sequential_sampler,
+    )
+    print("The length of the xor_dataloader is: {}".format(len(xor_dataloader)))
+    # Iterate over the DataLoader to fetch each batch
+    for idx, (cor, tag) in enumerate(xor_dataloader):
+        print("iter %d : " % (idx), cor, tag)
+        break
+Output:
+.. testoutput::
+    The length of the xor_dataloader is: 7500
+    iter 0 :  [[ 0.97255366  0.74678389]
+     [ 0.01949105 -0.45632857]
+     [-0.32616254 -0.56609147]
+     [-0.44704571 -0.31336881]] [0 1 0 0]
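+The reported length of 7500 is simply the number of batches: 30000 samples divided into batches of 4. The helper below is a hypothetical sketch of that arithmetic in plain Python, including the effect of a ``drop_last`` option like the one passed to ``RandomSampler`` elsewhere in this tutorial. It assumes the loader length equals the number of batches, which matches the output above.

```python
import math


def num_batches(num_samples, batch_size, drop_last=False):
    """Number of minibatches produced in one pass over the data."""
    if drop_last:
        # An incomplete final batch is discarded.
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(num_batches(30000, 4))                    # 7500
print(num_batches(60000, 64, drop_last=True))   # 937
print(num_batches(60000, 64, drop_last=False))  # 938
```

+The last two lines use the 60000 images of the MNIST training set with a batch size of 64.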
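+Data transforms in DataLoader (Transform)
+-------------------------------------------
+When training deep learning models we often need to transform the data, for example by normalizing it or applying various forms of data augmentation.
+:class:`~.meta_transform.Transform` is the base class for data transforms, and its subclasses provide the common transformations.
+The :class:`~.dataloader.DataLoader` constructor accepts a :class:`~.meta_transform.Transform` argument,
+which is applied to the data while each minibatch is built.
+Next we use the MNIST dataset (MegEngine provides an MNIST Dataset) to get familiar with Transform.
+First we build an MNIST DataLoader without any Transform and visualize the first minibatch.
+.. code-block::
+    # Import the MNIST dataset from MegEngine
+    from megengine.data.dataset import MNIST
+    # If this is the first time you download MNIST, set download to True
+    # If you have already downloaded MNIST, point root at the directory holding the raw MNIST files
+    # Set train=True/False to get the training set or the test set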
+    mnist_train_dataset = MNIST(root="./dataset/MNIST", train=True, download=True)
+    # mnist_test_dataset = MNIST(root="./dataset/MNIST", train=False, download=True)
+    sequential_sampler = SequentialSampler(dataset=mnist_train_dataset, batch_size=4)
+    mnist_train_dataloader = DataLoader(
+        dataset=mnist_train_dataset,
+        sampler=sequential_sampler,
+    )
+    for i, batch_sample in enumerate(mnist_train_dataloader):
+        batch_image, batch_label = batch_sample[0], batch_sample[1]
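+        # batch_image and batch_label could now be passed to the network for training; omitted here
+        # training code ...
+        # stop after the first batch
+        break
+    print("The shape of minibatch is: {}".format(batch_image.shape))
+    # Import the visualization library; install it if it is missing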
+    import matplotlib.pyplot as plt
+    def show(batch_image, batch_label):
+        for i in range(4):
+            plt.subplot(1, 4, i+1)
+            plt.imshow(batch_image[i][:,:,-1], cmap='gray')
+            plt.xticks([])
+            plt.yticks([])
+            plt.title("label: {}".format(batch_label[i]))
+        plt.show()
+    # Visualize the data
+    show(batch_image, batch_label)
+Output:
+.. testoutput::
+    The shape of minibatch is: (4, 28, 28, 1)
+The first batch of MNIST data, visualized:
+.. figure::
+    ./fig/mnist_batch.png
+    :scale: 60%
+    Figure 2
+Next we build an MNIST DataLoader that applies the :class:`~.vision.transform.RandomResizedCrop` transform and look at the images of the first minibatch.
+.. code-block::
+    # Import one of the data augmentation operations MegEngine supports
+    from megengine.data.transform import RandomResizedCrop
+    dataloader = DataLoader(
+        mnist_train_dataset,
+        sampler=sequential_sampler,
+        # Output size of the image after random cropping
+        transform=RandomResizedCrop(output_size=28),
+    )
+    for i, batch_sample in enumerate(dataloader):
+        batch_image, batch_label = batch_sample[0], batch_sample[1]
+        break
+    show(batch_image, batch_label)
+The first batch, visualized:
+.. figure::
+    ./fig/mnist_aug.png
+    :scale: 60%
+    Figure 3
+The images have now been randomly cropped and resized back to the original size.
+Composing transforms (Compose Transform)
+`````````````````````````````````````````````
+We often need to apply a series of transforms, for example:
+* Normalization: available through the :class:`~.vision.transform.Normalize` class, a subclass of :class:`~.meta_transform.Transform`.
+* Pad: pads every side of the image with zeros to enlarge it, through the :class:`~.transform.Pad` class.
+* Layout conversion: converts a minibatch of layout (Batch-size, Height, Width, Channel) to (Batch-size, Channel, Height, Width), the data layout MegEngine supports, through the :class:`~.vision.transform.ToMode` class.
+* Other transforms.
+For convenience, the :class:`~.vision.transform.Compose` class in MegEngine combines several Transforms so that they can be passed to the transform argument of :class:`~.dataloader.DataLoader`.
+Below we use :class:`~.vision.transform.Compose` to combine the earlier :class:`~.vision.transform.RandomResizedCrop` with :class:`~.vision.transform.Normalize`, :class:`~.vision.transform.Pad` and :class:`~.vision.transform.ToMode`,
+so that several transforms are applied together. Run the following code to see the shape of the transformed minibatch.
+.. code-block::
+    from megengine.data.transform import RandomResizedCrop, Normalize, ToMode, Pad, Compose
+    # Combine several Transform operations with Compose
+    dataloader = DataLoader(
+        mnist_train_dataset,
+        sampler=sequential_sampler,
+        transform=Compose([
+            RandomResizedCrop(output_size=28),
+            # mean and std are the mean and standard deviation of the MNIST data; pixel values range from 0 to 255
+            Normalize(mean=0.1307*255, std=0.3081*255),
+            Pad(2),
+            ToMode('CHW'),
+        ])
+    )
+    for i, batch_sample in enumerate(dataloader):
+        batch_image, batch_label = batch_sample[0], batch_sample[1]
+        break
+    print("The shape of the batch is now: {}".format(batch_image.shape))
+Output:
+.. testoutput::
+    The shape of the batch is now: (4, 1, 32, 32)
+The channel dimension of the minibatch has moved, and the image size is now 32 (28 plus 2 pixels of padding on each side).
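+The per-image shape change can be traced with a small sketch. The functions below are hypothetical and operate on shape tuples only, not on image data; they are not MegEngine transforms. Normalization is left out because it does not change the shape.

```python
from functools import reduce


def compose(*steps):
    """Chain single-argument functions from left to right."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def resized_crop(output_size):
    # Input and output shapes are (H, W, C); the crop is resized to a square.
    return lambda shape: (output_size, output_size, shape[2])


def pad(size):
    # Padding adds `size` pixels on each side of both spatial dimensions.
    return lambda shape: (shape[0] + 2 * size, shape[1] + 2 * size, shape[2])


def to_chw(shape):
    height, width, channels = shape
    return (channels, height, width)


pipeline = compose(resized_crop(28), pad(2), to_chw)
print(pipeline((28, 28, 1)))  # (1, 32, 32)
```

+With a batch size of 4 this per-image shape corresponds to the minibatch shape (4, 1, 32, 32) printed above.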
+For the other parameters of :class:`~.dataloader.DataLoader`, see the :class:`~.dataloader.DataLoader` documentation.
--- a/source/basic/dynamic_and_static_graph.rst
+++ b/source/basic/dynamic_and_static_graph.rst
+.. _dynamic_and_static_graph:
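+Dynamic and static graphs
+==============================
+The network in :ref:`train_and_evaluation` is based on a **dynamic computational graph**, whose key characteristic is that the graph is built and computed at the same time (define by run). When a Tensor is defined in the graph, its value has already been computed. This mode is convenient for debugging because intermediate results are available immediately. However, because every node has to be kept and remain accessible, it is hard to optimize the graph as a whole.
+Static graphs
+------------------------------
+MegEngine also supports a **static computational graph** mode, which separates building the graph from computing it (define and run). In the build phase, MegEngine uses the complete computation flow to optimize and adjust the original graph (the dynamic graph above) into one that uses less memory and less computation; this process is called **compilation**. After compilation the structure of the graph no longer changes, hence "static". In the compute phase, MegEngine executes the compiled graph on the input data to produce the result.
+In static graph mode only the final result is guaranteed to match the dynamic graph. The intermediate process is a black box to the user, and intermediate results cannot be fetched at will as they can in a dynamic graph.
+The following example illustrates the memory and computation optimizations that static graph compilation may perform: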
+.. figure::
+    ./fig/op_fuse.png
+    :scale: 50%
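+In the computational graph on the left of the figure above, storing the five variables ``x``, ``w``, ``p``, ``b`` and ``y`` takes ``40`` bytes in a dynamic graph (assuming each variable occupies 8 bytes). In a static graph only the result ``y`` is needed, so ``y`` can reuse the memory of the intermediate variable ``p``, which amounts to an in-place modification. This reduces the memory used by the static graph to ``32`` bytes.
+MegEngine also reduces computation overhead through **operator fusion** (Operator Fuse). In the figure above, the multiplication and the addition can be fused into a single ternary **multiply-add** operation (assuming the hardware supports it).
+Note that these optimizations are only possible once the complete computation flow is known.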
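+The difference between the unfused and fused computation can be illustrated in plain Python. This is only a conceptual sketch with hypothetical function names: it shows that fusing removes the intermediate ``p`` and the second pass over the data, and says nothing about how MegEngine implements fusion.

```python
def unfused(x, w, b):
    """Two passes: the intermediate list p is materialised in memory."""
    p = [xi * wi for xi, wi in zip(x, w)]
    y = [pi + bi for pi, bi in zip(p, b)]
    return y


def fused(x, w, b):
    """One pass: multiply and add per element, with no intermediate list."""
    return [xi * wi + bi for xi, wi, bi in zip(x, w, b)]


x = [1.0, 2.0, 3.0]
w = [0.5, 0.25, 2.0]
b = [1.0, 1.0, 1.0]
print(unfused(x, w, b))  # [1.5, 1.5, 7.0]
print(fused(x, w, b))    # [1.5, 1.5, 7.0]
```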
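+Converting a dynamic graph to a static graph
+--------------------------------------------
+MegEngine makes it easy to convert between dynamic and static graphs, with almost no code changes. As in :ref:`train_and_evaluation`, the training code for a dynamic graph is: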
+.. testcode::
+    data = mge.tensor()
+    label = mge.tensor(dtype="int32") # labels for the cross entropy loss must be of integer type
+    total_epochs = 10
+    for epoch in range(total_epochs):
+        total_loss = 0
+        for step, (batch_data, batch_label) in enumerate(dataloader):
+            optimizer.zero_grad() # reset the parameter gradients to zero
+            # The next five lines are the network computation and optimization; they are handled when converting to a static graph
+            data.set_value(batch_data)
+            label.set_value(batch_label)
+            logits = le_net(data)
+            loss = F.cross_entropy_with_softmax(logits, label)
+            optimizer.backward(loss) # backpropagate to compute the gradients
+            optimizer.step()  # update the parameter values from the gradients
+            total_loss += loss.numpy().item()
+        print("epoch: {}, loss {}".format(epoch, total_loss/len(dataloader)))
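+The dynamic graph above can be converted to a static graph in two steps:
+1. Extract the network computation and optimization code inside the loop (5 lines) into a separate training function that returns whatever results you need (such as the output of the graph and the loss value), like ``train_func`` in the example below.
+2. Decorate this function with the :class:`~.trace` `decorator <https://docs.python.org/zh-cn/3/glossary.html#term-decorator>`_ from the :mod:`~.megengine.jit` package, which turns the code inside it into static graph code.
+The code is as follows: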
+.. code-block::
+    from megengine.jit import trace
+    @trace
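+    def train_func(data, label, *, opt, net): # arguments before * are positional, arguments after * are keyword-only
+        # data and label no longer need to be created as tensors and filled with set_value; trace does this internally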
+        logits = net(data)
+        loss = F.cross_entropy_with_softmax(logits, label)
+        opt.backward(loss)
+        return logits, loss
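+Some further explanation of the code above:
+* **jit**: short for `just-in-time compilation <https://zh.wikipedia.org/wiki/%E5%8D%B3%E6%99%82%E7%B7%A8%E8%AD%AF>`_, used here as the name of the package that holds everything related to static graphs.
+* **trace**: one way of obtaining a static graph, see `tracing <https://en.wikipedia.org/wiki/Tracing_just-in-time_compilation>`_. It traces back the network structure that the outputs (such as the loss value or the predictions) depend on to obtain the whole computational graph, and then compiles it.
+* **Parameter list**: when compiling the static graph, :class:`~.trace` treats positional and keyword arguments differently. Positional arguments carry the inputs of the network, such as data and labels; keyword arguments carry other objects, such as the network and the optimizer.
+.. note::
+    In general, static graphs do not support conditional statements that depend on runtime information.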
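+Why runtime-dependent conditions are a problem can be seen in a toy tracer. ``ToyTrace`` below is a hypothetical, heavily simplified illustration in plain Python and not how MegEngine's ``trace`` is implemented: it runs the function once to record which operations are used, then replays that recording on every call.

```python
class ToyTrace:
    """Hypothetical tracer: runs `func` once, records the ops, then replays them."""

    def __init__(self, func):
        self.func = func
        self.ops = None  # recorded list of single-argument functions

    def __call__(self, value):
        if self.ops is None:
            self.ops = []
            # "Compile": run the Python code once and record the chosen ops.
            self.func(value, self.ops.append)
        # "Run": replay the recorded ops without re-running the Python code.
        for op in self.ops:
            value = op(value)
        return value


def double(v):
    return v * 2


def negate(v):
    return -v


@ToyTrace
def branchy(value, record):
    # This condition is evaluated only during the first (tracing) call.
    if value > 0:
        record(double)
    else:
        record(negate)


print(branchy(3))   # 6: traced with a positive input, so `double` is recorded
print(branchy(-3))  # -6: replays `double`; the eager code would have negated to 3
```

+The branch taken during the first call is baked into the recording, so later inputs cannot select the other branch.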
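+Converting a static graph back to a dynamic graph
+--------------------------------------------------
+Static graph code decorated with :class:`~.trace` can be turned back into dynamic graph code by disabling :class:`~.trace`. There are two ways to do this:
+1. Set an environment variable: when running a complete ``.py`` file, MegEngine recommends controlling this through an environment variable, so that you can **switch freely between dynamic and static graphs without modifying the code**: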
+.. code-block:: bash
+    export MGE_DISABLE_TRACE=1
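+2. Set the class attribute of :class:`~.trace`: in environments such as notebooks, where changing environment variables is awkward, set the :attr:`~.trace.enabled` attribute of trace to False before calling the function decorated with trace:
+.. code-block::
+    trace.enabled = False # disable trace
+Complete training example
+------------------------------
+The code below converts the training code from :ref:`train_and_evaluation` to static graph mode: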
+.. testcode::
+    from megengine.data import DataLoader
+    from megengine.data.transform import ToMode, Pad, Normalize, Compose
+    from megengine.data import RandomSampler
+    from megengine.data.dataset import MNIST
+    # Load the training data and preprocess it
+    mnist_train_dataset = MNIST(root="./dataset/MNIST", train=True, download=True)
+    dataloader = DataLoader(
+        mnist_train_dataset,
+        transform=Compose([
+            Normalize(mean=0.1307*255, std=0.3081*255),
+            Pad(2),
+            ToMode('CHW'),
+        ]),
+        sampler=RandomSampler(dataset=mnist_train_dataset, batch_size=64, drop_last=True), # RandomSampler is normally used in training to shuffle the data
+    )
+    # Create the network and the optimizer
+    le_net = LeNet()
+    optimizer = optim.SGD(
+        le_net.parameters(), # parameter list
+        lr=0.05,  # learning rate
+    )
+    trace.enabled = True # enable trace to use static graph mode
+    total_epochs = 10
+    for epoch in range(total_epochs):
+        total_loss = 0
+        for step, (data, label) in enumerate(dataloader):
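+            optimizer.zero_grad() # reset the parameter gradients to zero
+            label = label.astype('int32') # labels for the cross entropy loss must be int32
+            # Call the function decorated with trace
+            logits, loss = train_func(data, label, opt=optimizer, net=le_net)
+            optimizer.step()  # update the parameter values from the gradients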
+            total_loss += loss.numpy().item()
+        print("epoch: {}, loss {}".format(epoch, total_loss/len(dataloader)))
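+Testing with a static graph
+------------------------------
+Testing a network in static graph mode likewise requires extracting the test procedure into a separate test function and decorating it with :class:`~.trace`. The test function below takes the test data and the network as arguments and returns the network output: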
+.. code-block::
+    @trace
+    def eval_func(data, *, net): # arguments before * are positional, arguments after * are keyword-only
+        logits = net(data)
+        return logits
+The code below converts the test code from :ref:`train_and_evaluation` to static graph mode:
+.. testcode::
+    import megengine as mge
+    # Load the test data and preprocess it
+    mnist_test_dataset = MNIST(root="./dataset/MNIST", train=False, download=True)
+    dataloader_test = DataLoader(
+        mnist_test_dataset,
+        transform=Compose([
+            Normalize(mean=0.1307*255, std=0.3081*255),
+            Pad(2),
+            ToMode('CHW'),
+        ]),
+    )
+    trace.enabled = True # enable trace to use static graph mode
+    le_net.eval() # put the network in evaluation mode
+    data = mge.tensor()
+    label = mge.tensor(dtype="int32")
+    correct = 0
+    total = 0
+    for idx, (batch_data, batch_label) in enumerate(dataloader_test):
+        data.set_value(batch_data)
+        label.set_value(batch_label)
+        logits = eval_func(data, net=le_net) # the test function
+        predicted = F.argmax(logits, axis=1)
+        correct += (predicted==label).sum().numpy().item()
+        total += label.shape[0]
+    print("correct: {}, total: {}, accuracy: {}".format(correct, total, float(correct)/total))
--- a/source/basic/fig/back_prop.png
+++ b/source/basic/fig/back_prop.png
--- a/source/basic/fig/computer_graph.png
+++ b/source/basic/fig/computer_graph.png
--- a/source/basic/fig/dataset.png
+++ b/source/basic/fig/dataset.png
--- a/source/basic/fig/lenet.jpg
+++ b/source/basic/fig/lenet.jpg
--- a/source/basic/fig/loss_curve.png
+++ b/source/basic/fig/loss_curve.png
--- a/source/basic/fig/mnist_aug.png
+++ b/source/basic/fig/mnist_aug.png
--- a/source/basic/fig/mnist_batch.png
+++ b/source/basic/fig/mnist_batch.png
--- a/source/basic/fig/op_fuse.png
+++ b/source/basic/fig/op_fuse.png
--- a/source/basic/fig/plot_xor.jpg
+++ b/source/basic/fig/plot_xor.jpg
--- a/source/basic/fig/resnet-block.svg
+++ b/source/basic/fig/resnet-block.svg
--- a/source/basic/fig/xor.png
+++ b/source/basic/fig/xor.png
--- a/source/basic/index.rst
+++ b/source/basic/index.rst
+.. _basic:
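+Introduction
+==============================
+In this part you will learn the basic concepts of MegEngine and how to use it.
+To follow this part, you need to:
+* Have MegEngine installed successfully by following :ref:`installation`.
+* Have basic knowledge of Python and NumPy.
+You do not need to:
+* Know other deep learning frameworks, such as PyTorch.
+* Have a background in machine learning or deep neural networks.
+This part consists of five sections that build on each other and should be read in order.
+#. :ref:`basic_concepts`: introduces computational graphs, tensors and operators.
+#. :ref:`network_build`: introduces the two ways of building networks in MegEngine.
+#. :ref:`data_load`: introduces how to load and process data in MegEngine.
+#. :ref:`train_and_evaluation`: introduces how to train and test a model.
+#. :ref:`dynamic_and_static_graph`: introduces dynamic and static graphs, their pros and cons, and how to use them. In practice, static graphs are recommended for larger networks.
+Users who already know this material well (for example PyTorch users) are still encouraged to skim this part, because it is the basis for the :ref:`advanced topics <advanced>`.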
+.. toctree::
+    :maxdepth: 2
+    :hidden:
+    basic_concepts
+    network_build
+    data_load
+    train_and_evaluation
+    dynamic_and_static_graph
--- a/source/basic/network_build.rst
+++ b/source/basic/network_build.rst
+.. _network_build:
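+Building networks
+==============================
+In :ref:`basic_concepts` we introduced computational graphs, tensors and operators; a neural network can be viewed as a computational graph. In MegEngine, a network is built by connecting tensors and operators according to the topology of the computational graph. MegEngine offers two ways to build networks, one based on :mod:`~.megengine.functional` and one based on :class:`~.Module`. :mod:`~.megengine.functional` provides only the basic operator functions, and connecting the data is left entirely to the user. :class:`~.Module` further encapsulates network modules (basic units consisting of several operators and their parameters), which makes code easier to reuse and maintain.
+Building networks with :mod:`~.megengine.functional`
+------------------------------------------------------------
+The :mod:`~.megengine.functional` package provides the commonly used operator functions (such as :func:`~.functional.nn.conv2d` and :func:`~.functional.nn.linear`). These functions take the tensors involved in the computation and return the result. The tensors usually fall into two groups: the input data and the parameters of the operator itself, the latter being the variables the network has to learn. For example, a 2D convolution ( :func:`~.functional.nn.conv2d` ) takes a multi-channel 2D image as input data and the convolution kernels as parameters, and outputs the multi-channel 2D image produced by the convolution.
+The input and output data of an operator are of type :class:`~.Tensor`. The parameters of an operator are usually represented by the :class:`~.Parameter` class. :class:`~.Parameter` is a subclass of :class:`~.Tensor` whose instances (the network parameters) can be updated by an optimizer. See :ref:`train_and_evaluation` for more.
+The example below implements a two-layer convolutional network (using `ReLU <https://en.wikipedia.org/wiki/Rectifier_(neural_networks)>`_ as the activation function):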
+.. testcode::
+    import megengine as mge
+    import megengine.functional as F
+    import numpy as np
+    def two_layer_conv(x):
+        # (8, 3, 3, 3) stands for (output channels, input channels, kernel height, kernel width)
+        conv_weight = mge.Parameter(np.random.randn(8, 3, 3, 3).astype(np.float32))
+        # One bias for each of the 8 convolution kernels
+        conv_bias = mge.Parameter(np.zeros((1, 8, 1, 1), dtype=np.float32))
+        x = F.conv2d(x, conv_weight, conv_bias)
+        x = F.relu(x)
+        conv_weight = mge.Parameter(np.random.randn(16, 8, 3, 3).astype(np.float32))
+        conv_bias = mge.Parameter(np.zeros((1, 16, 1, 1), dtype=np.float32))
+        x = F.conv2d(x, conv_weight, conv_bias)
+        x = F.relu(x)
+        return x
+    # Input tensor of shape (2, 3, 32, 32)
+    x = mge.tensor(np.random.randn(2, 3, 32, 32).astype(np.float32))
+    out = two_layer_conv(x)
+    print(out.shape)  # prints: (2, 16, 28, 28)
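+Building networks with :class:`~.Module`
+------------------------------------------------------------
+In the code above, the network parameters of every operator that needs them have to be defined separately. Because the "conv + relu" combination appears twice, the code is bloated. For more complex networks, code written this way is hard to read, reuse and maintain.
+To encapsulate and reuse operators more easily, MegEngine provides the :mod:`~.megengine.module` package on top of :mod:`~.megengine.functional`.
+The :mod:`megengine.module` package defines :class:`~.Module`, the abstract base class for network modules. It is the basic unit for constructing networks and can be combined and stacked. It defines the basic interfaces and attributes of a network module, such as forward propagation. Every :class:`~.Module` subclass needs to implement the following two methods defined by :class:`~.Module`:
+* :class:`__init__() <.Module>`: the constructor creates the module, which includes defining the network parameters and constructing and connecting its submodules.
+* :meth:`~.Module.forward`: defines the forward computation. It takes the input data and returns the result of forward propagation. Note that a :class:`~.Module` object is callable, and calling it runs :meth:`~.Module.forward`.
+The :mod:`megengine.module` package provides the commonly used basic network modules, such as :class:`~.conv.Conv2d` and :class:`~.linear.Linear`. Taking :class:`~.conv.Conv2d` as an example, its :class:`__init__() <.conv.Conv2d>` method defines and initializes the convolution kernel parameters, and its :meth:`~.conv.Conv2d.forward` method performs the convolution.
+With these common network modules it is easy to build very complex networks. For example, the network from the previous example can be simplified as follows: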
+.. testcode::
+    import megengine.module as M
+    # For demonstration we define a simple convolution module here. Note: MegEngine already provides a more general Conv2d module.
+    class ConvReLU(M.Module):
+        def __init__(self, in_channels, out_channels):
+            # Call the parent initializer first
+            super().__init__()
+            # Define the convolution weight and bias as module parameters
+            self.conv_weight = mge.Parameter(np.random.randn(out_channels, in_channels, 3, 3).astype(np.float32))
+            self.conv_bias = mge.Parameter(np.zeros((1, out_channels, 1, 1), dtype=np.float32))
+            # Use the ReLU activation as a submodule
+            self.relu = M.ReLU()
+        def forward(self, x):
+            x = F.conv2d(x, self.conv_weight, self.conv_bias)
+            x = self.relu(x)
+            return x
+    # Define a two-layer convolutional network on top of ConvReLU
+    class TwoLayerConv(M.Module):
+        def __init__(self):
+            super().__init__()
+            self.conv_relu1 = ConvReLU(3, 8)
+            self.conv_relu2 = ConvReLU(8, 16)
+        def forward(self, x):
+            x = self.conv_relu1(x)
+            x = self.conv_relu2(x)
+            return x
+    # Input tensor of shape (2, 3, 32, 32)
+    x = mge.tensor(np.random.randn(2, 3, 32, 32).astype(np.float32))
+    two_layer_conv_module = TwoLayerConv()
+    out = two_layer_conv_module(x)
+    print(out.shape)  # prints: (2, 16, 28, 28)
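+Compared with :mod:`~.megengine.functional`, a network defined with :class:`~.Module` encapsulates more of the internal implementation and is easier to reuse, and its uniform interface makes the code easier to maintain. We recommend building networks with :class:`~.Module`.
+Other commonly used :class:`~.Module` methods are:
+* :meth:`~.Module.parameters`: returns an iterator over the network parameters.
+* :meth:`~.Module.named_parameters`: returns an iterator over parameter names and the corresponding network parameters.
+* :meth:`~.Module.state_dict`: returns an ordered dictionary that maps parameter names to network parameters, which can be used to save a trained model. For example, for the ``ConvReLU`` module defined above, we can print the ``state_dict`` of one of its instances: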
+.. testcode::
+    conv_relu = ConvReLU(2, 3)
+    print(conv_relu.state_dict())
+The printed parameters are the convolution weight ``'conv_weight'`` and the bias ``'conv_bias'``:
+.. testoutput::
+    OrderedDict([('conv_bias', array([[[[0.]],
+            [[0.]],
+            [[0.]]]], dtype=float32)), ('conv_weight', array([[[[-0.53457755,  0.2799128 , -0.6624546 ],
+            [-0.9222688 ,  1.2226251 , -0.5591961 ],
+            [-0.45538583, -0.95166504,  1.1570141 ]],
+            [[-0.89926094,  0.09956062, -0.7329557 ],
+            [-0.67284465,  0.34817234,  0.6731445 ],
+            [ 0.61970276,  1.8007269 ,  1.6130987 ]]],
+        [[[ 1.7108068 , -1.7188625 , -0.52539474],
+            [-0.04049037,  0.03099988, -1.4271212 ],
+            [-0.9138133 ,  0.3976046 , -1.1582668 ]],
+            [[-1.2193677 ,  0.24107741, -0.50833786],
+            [ 0.9088649 , -0.2747458 , -0.1261102 ],
+            [ 0.00594431,  0.65737075,  1.5280651 ]]],
+        [[[ 0.24874896, -1.3824748 ,  2.2161844 ],
+            [-0.6629168 ,  1.0220655 , -0.53007567],
+            [ 0.37829646,  1.1993718 ,  1.0667052 ]],
+            [[-0.66264534, -0.6392335 , -0.41280702],
+            [ 1.7417566 ,  0.75295806, -0.4228349 ],
+            [-0.94973356,  2.4136777 , -0.06665667]]]], dtype=float32))])
+Finally, we build a more complex, classic network, `LeNet <http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf>`_, whose structure is shown below:
+.. figure::
+    ./fig/lenet.jpg
+    :scale: 60%
+    Figure 1 LeNet ( http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf )
+The code for building LeNet with :class:`~.Module` is as follows:
+.. testcode::
+    class LeNet(M.Module):
+        def __init__(self):
+            super(LeNet, self).__init__()
+            # Single-channel images; two layers of 5x5 convolution + ReLU + pooling
+            self.conv1 = M.Conv2d(1, 6, 5)
+            self.relu1 = M.ReLU()
+            self.pool1 = M.MaxPool2d(2, 2)
+            self.conv2 = M.Conv2d(6, 16, 5)
+            self.relu2 = M.ReLU()
+            self.pool2 = M.MaxPool2d(2, 2)
+            # Two fully connected layers + ReLU
+            self.fc1 = M.Linear(16 * 5 * 5, 120)
+            self.relu3 = M.ReLU()
+            self.fc2 = M.Linear(120, 84)
+            self.relu4 = M.ReLU()
+            # Classifier
+            self.classifier = M.Linear(84, 10)
+        def forward(self, x):
+            x = self.pool1(self.relu1(self.conv1(x)))
+            x = self.pool2(self.relu2(self.conv2(x)))
+            # F.flatten flattens the tensor x of shape (N, C, H, W) into one dimension starting from axis 1 (i.e. C),
+            # giving a new tensor of shape (N, C*H*W). Equivalent to the reshape: x = x.reshape(x.shape[0], -1)
+            x = F.flatten(x, 1)
+            x = self.relu3(self.fc1(x))
+            x = self.relu4(self.fc2(x))
+            x = self.classifier(x)
+            return x
+    # Input tensor of shape (2, 1, 32, 32)
+    x = mge.tensor(np.random.randn(2, 1, 32, 32).astype(np.float32))
+    le_net = LeNet()
+    # Calling the network runs the forward method of le_net and returns the network output
+    out = le_net(x)
+    print(out.shape)  # prints: (2, 10)
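+The input size ``16 * 5 * 5`` of the first fully connected layer follows from the spatial sizes after each layer. The sketch below traces them in plain Python. The helper names are hypothetical, and the formula assumes stride 1 and no padding for the convolutions, as in the ``M.Conv2d(…, …, 5)`` calls above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of a pooling layer along one dimension."""
    return (size - kernel) // stride + 1


size = 32
size = conv_out(size, kernel=5)            # conv1: 32 -> 28
size = pool_out(size, kernel=2, stride=2)  # pool1: 28 -> 14
size = conv_out(size, kernel=5)            # conv2: 14 -> 10
size = pool_out(size, kernel=2, stride=2)  # pool2: 10 -> 5
flattened = 16 * size * size               # 16 channels after conv2
print(size, flattened)  # 5 400

# The two 3x3 convolutions of the earlier example: 32 -> 30 -> 28
print(conv_out(conv_out(32, kernel=3), kernel=3))  # 28
```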
--- a/source/basic/train_and_evaluation.rst
+++ b/source/basic/train_and_evaluation.rst
+.. _train_and_evaluation:
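+Training and testing a network
+==============================
+In this chapter we use the ``LeNet`` from :ref:`network_build` to introduce network training and testing. ``LeNet`` is instantiated as follows:
+.. testcode::
+    # Instantiate the network
+    le_net = LeNet()
+Training and saving a network
+------------------------------
+Here we load the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ data in the same way as in :ref:`data_load`. The code below is essentially the same as before, with the comments removed and ``RandomResizedCrop`` dropped (MNIST usually does not need this augmentation).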
+.. testcode::
+    from megengine.data import DataLoader
+    from megengine.data.transform import ToMode, Pad, Normalize, Compose
+    from megengine.data import RandomSampler
+    from megengine.data.dataset import MNIST
+    # Load the training data and preprocess it
+    mnist_train_dataset = MNIST(root="./dataset/MNIST", train=True, download=True)
+    dataloader = DataLoader(
+        mnist_train_dataset,
+        transform=Compose([
+            Normalize(mean=0.1307*255, std=0.3081*255),
+            Pad(2),
+            ToMode('CHW'),
+        ]),
+        sampler=RandomSampler(dataset=mnist_train_dataset, batch_size=64, drop_last=True), # RandomSampler is normally used in training to shuffle the data
+    )
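+Loss function
+``````````````````````````````
+Once we have data, forward propagation gives the output of the network. We use a **loss function** to measure the gap between the network output and the training target.
+MegEngine provides the common loss functions; see the :mod:`~.functional.loss` section of the API documentation. For example, classification tasks often use the :func:`cross entropy loss <.cross_entropy>`, while regression tasks typically use the :func:`square loss <.square_loss>`. Here we use the cross entropy loss as the example.
+Let :math:`p(x)` denote the true data distribution and :math:`q(x)` the distribution output by the network. The cross entropy loss is computed as: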
+.. math::
+    Loss_{\text{cross-entropy}} = -\sum_{x} p(x)\log(q(x))
+The following code shows how to use the cross entropy loss:
+.. testcode::
+    import megengine as mge
+    import megengine.functional as F
+    data = mge.tensor()
+    label = mge.tensor(dtype="int32") # labels for the cross entropy loss must be of integer type
+    for step, (batch_data, batch_label) in enumerate(dataloader):
+        data.set_value(batch_data)
+        label.set_value(batch_label)
+        logits = le_net(data)
+        # logits is the output of the network; label is the ground-truth label, i.e. the training target
+        loss = F.cross_entropy_with_softmax(logits, label) # cross entropy loss
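+For a single sample with an integer label, the true distribution :math:`p(x)` is one-hot, so the cross entropy reduces to the negative log of the probability that softmax assigns to the correct class. The sketch below works this out in plain Python. The function names are hypothetical, and it handles one sample only; how ``F.cross_entropy_with_softmax`` reduces over a batch is not covered here.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution q(x)."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """-sum_x p(x) log q(x), with p one-hot at `label`."""
    q = softmax(logits)
    return -math.log(q[label])


loss_good = cross_entropy([4.0, 0.0, 0.0], label=0)  # confident and correct
loss_bad = cross_entropy([4.0, 0.0, 0.0], label=1)   # confident and wrong
print(loss_good, loss_bad)
assert loss_good < loss_bad
```

+A network that cannot tell the classes apart (all logits equal) gets a loss of :math:`\log K` for :math:`K` classes.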
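+Optimizer
+``````````````````````````````
+**Network training** is the process of minimizing the loss function by updating the network parameters. In MegEngine this is done by an **optimizer**.
+The optimizer first obtains, through backpropagation, the gradient of the loss function with respect to every network parameter, and then updates the parameters according to its optimization strategy and the gradient values.
+MegEngine provides optimizers based on the common optimization strategies, such as :class:`~.adam.Adam` and :class:`~.sgd.SGD`. They all inherit from the :class:`~.Optimizer` base class, whose two main methods compute the parameter gradients ( :meth:`~.Optimizer.backward` ) and update the parameters ( :meth:`~.Optimizer.step` ).
+We illustrate this with the simplest optimization strategy, whose parameter update rule is: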
+.. math::
+    weight = weight - learning\_rate * gradient
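+Here ``learning_rate`` is the learning rate, which controls how much the parameters change in each update. The optimizer that implements this update rule in MegEngine is :class:`~.sgd.SGD`. We first create an optimizer: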
+.. testcode::
+    import megengine.optimizer as optim
+    optimizer = optim.SGD(
+        le_net.parameters(), # parameter list; binds the given parameters to the optimizer
+        lr=0.05,  # learning rate
+    )
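+The update rule itself can be tried on a one-dimensional problem. The sketch below is plain Python and independent of MegEngine; ``sgd_step`` is a hypothetical helper, and the quadratic loss is an assumption chosen because its gradient is known in closed form.

```python
def sgd_step(weight, gradient, learning_rate):
    """One update: weight = weight - learning_rate * gradient."""
    return weight - learning_rate * gradient


# Minimise loss(weight) = (weight - 3)^2, whose gradient is 2 * (weight - 3).
weight = 0.0
for step in range(100):
    gradient = 2 * (weight - 3.0)
    weight = sgd_step(weight, gradient, learning_rate=0.05)
print(weight)  # close to the minimiser 3.0
```

+A larger learning rate moves faster but can overshoot; for this loss, any value above 1.0 makes the iteration diverge.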
+We then read the training data once through ``dataloader`` and use the optimizer to update the network parameters. One such pass over the data is called an epoch:
+.. testcode::
+    data = mge.tensor()
+    label = mge.tensor(dtype="int32") # labels for the cross entropy loss must be of integer type
+    for step, (batch_data, batch_label) in enumerate(dataloader):
+        data.set_value(batch_data)
+        label.set_value(batch_label)
+        optimizer.zero_grad() # reset the parameter gradients to zero
+        logits = le_net(data)
+        loss = F.cross_entropy_with_softmax(logits, label)
+        optimizer.backward(loss) # backpropagate to compute the gradients
+        optimizer.step()  # update the parameter values from the gradients
+Training example
+``````````````````````````````
+A complete training run usually takes several epochs. The code is as follows:
+.. testcode::
+    import megengine as mge
+    import megengine.optimizer as optim
+    # Create the network and the optimizer
+    le_net = LeNet()
+    optimizer = optim.SGD(
+        le_net.parameters(), # parameter list
+        lr=0.05,  # learning rate
+    )
+    data = mge.tensor()
+    label = mge.tensor(dtype="int32") # labels for the cross entropy loss must be of integer type
+    total_epochs = 10
+    for epoch in range(total_epochs):
+        total_loss = 0
+        for step, (batch_data, batch_label) in enumerate(dataloader):
+            data.set_value(batch_data)
+            label.set_value(batch_label)
+            optimizer.zero_grad() # reset the parameter gradients to zero
+            logits = le_net(data)
+            loss = F.cross_entropy_with_softmax(logits, label)
+            optimizer.backward(loss) # backpropagate to compute the gradients
+            optimizer.step()  # update the parameter values from the gradients
+            total_loss += loss.numpy().item()
+        print("epoch: {}, loss {}".format(epoch, total_loss/len(dataloader)))
+The training output is:
+.. testoutput::
+    epoch: 0, loss 0.22623900164399877
+    epoch: 1, loss 0.07118050173928966
+    epoch: 2, loss 0.050515039509092044
+    epoch: 3, loss 0.0389270530823056
+    epoch: 4, loss 0.0309853484441587
+    epoch: 5, loss 0.025080320053271498
+    epoch: 6, loss 0.02029314023363145
+    epoch: 7, loss 0.016173969717602186
+    epoch: 8, loss 0.013455517796447727
+    epoch: 9, loss 0.010755786676661053
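+Switching between GPU and CPU
+``````````````````````````````
+When both a GPU and a CPU are available, MegEngine trains on the GPU by default. Call :func:`~.core.device.set_default_device` to set the default compute device as needed.
+The following code sets the default device to the CPU:
+.. testcode::
+    import megengine as mge
+    # Use the CPU by default
+    mge.set_default_device('cpux')
+The following code sets the default device to the GPU:
+.. testcode::
+    # Use the GPU by default
+    mge.set_default_device('gpux')
+For more usage, see the :func:`~.core.device.set_default_device` API documentation.
+If you prefer not to modify the code, the default compute device can also be set through the ``MGE_DEFAULT_DEVICE`` environment variable: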
+.. code-block:: bash
+    # Use the CPU by default
+    export MGE_DEFAULT_DEVICE='cpux'
+    # Use the GPU by default
+    export MGE_DEFAULT_DEVICE='gpux'
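+How a program might combine an explicit device setting with the ``MGE_DEFAULT_DEVICE`` environment variable can be sketched as follows. The helper ``resolve_device``, its precedence order (explicit argument first, then the environment variable, then a fallback), and the ``'gpux'`` fallback are assumptions made for illustration; they are not taken from MegEngine's source.

```python
import os


def resolve_device(explicit=None, fallback="gpux"):
    """Pick a device string: explicit argument, then env var, then fallback.

    Hypothetical helper; the precedence order is an assumption of this sketch.
    """
    if explicit is not None:
        return explicit
    return os.environ.get("MGE_DEFAULT_DEVICE", fallback)


os.environ["MGE_DEFAULT_DEVICE"] = "cpux"
from_env = resolve_device()           # taken from the environment variable
from_code = resolve_device("gpux")    # explicit choice wins in this sketch
print(from_env, from_code)
```

+Keeping the lookup in one place makes it easy to see which setting takes effect when both are present.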
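+Saving the network
+``````````````````````````````
+Once training is complete, the network needs to be saved for later use. In the earlier :ref:`network_build` section we introduced :meth:`~.Module.state_dict` of the network module class Module: it traverses all parameters of the network, collects them into an ordered dictionary, and returns it. We save these network parameters with :func:`~.serialization.save` from MegEngine.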
+.. testcode::
+    path = "lenet.mge"  # by convention, the ".mge" extension denotes a MegEngine model file
+    mge.save(le_net.state_dict(), path)
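+Loading and testing the network
+------------------------------
+Loading the network
+``````````````````````````````
+At test time, we can read ``lenet.mge`` with :func:`~.serialization.load`, which returns the :meth:`~.Module.state_dict` dictionary holding the module names in the model and their corresponding parameters. We can then load this dictionary into the ``le_net`` model with the :meth:`~.Module.load_state_dict` method of Module.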
+.. testcode::
+    state_dict = mge.load("lenet.mge")
+    # Load the parameters into the network
+    le_net.load_state_dict(state_dict)
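+The save-then-load round trip can be pictured with plain Python. This is a minimal sketch, assuming the state dict is simply an ordered mapping from parameter names to values and using the standard ``pickle`` module as a stand-in for serialization; the parameter names and the file name are hypothetical, and nothing here describes MegEngine's actual file format.

```python
import os
import pickle
import tempfile
from collections import OrderedDict

# Hypothetical stand-in for a state dict: parameter names mapped to values.
state = OrderedDict([("conv1.weight", [0.1, 0.2]), ("conv1.bias", [0.0])])

path = os.path.join(tempfile.mkdtemp(), "toy_state.pkl")
with open(path, "wb") as f:
    pickle.dump(state, f)       # analogue of saving the state dict

with open(path, "rb") as f:
    restored = pickle.load(f)   # analogue of loading it back

print(list(restored.keys()))
print(restored == state)
```

+Because only names and values are stored, the network class itself must be constructed first; the loaded dictionary then fills in its parameters by name.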
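+:meth:`~.Module.eval` and :meth:`~.Module.train`
+``````````````````````````````````````````````````
+A few operators behave differently during training and testing, for example :class:`~.module.dropout.Dropout` and :class:`~.module.batchnorm.BatchNorm2d`. During training, :class:`~.module.dropout.Dropout` sets part of the outputs of the given layer to zero with a certain probability, whereas during testing it leaves the outputs unchanged. :class:`~.module.batchnorm.BatchNorm2d` keeps updating the running statistics (mean and variance) of the corresponding tensor during training and does not update them during testing.
+To ensure correct behavior in both phases, MegEngine uses :meth:`~.Module.eval` and :meth:`~.Module.train` to set the state of the operators. Networks in MegEngine are in training mode by default, which is why the training code above does not call :meth:`~.Module.train`.
+Here we take :class:`~.module.dropout.Dropout` as an example to show the effect of these two methods: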
+.. testcode::
+    import numpy as np
+    import megengine as mge
+    from megengine.module import Dropout
+    dropout = Dropout(drop_prob=0.2)  # create a Dropout instance; each value is zeroed with probability 0.2
+    data = mge.tensor(np.random.randn(10).astype('float32'))  # original data
+    print("origin:", data)
+    dropout.train()  # training mode
+    print("\ntrain:", dropout(data))
+    dropout.eval()  # evaluation mode
+    print("\neval:", dropout(data))
+.. testoutput::
+    origin: Tensor([ 0.1939 -0.1846 -1.1319 -0.8897  0.7057  1.3106  1.6901 -0.8686 -0.2685 -0.6046])
+    train: Tensor([ 0.2423 -0.2307 -0. -1.1121  0.8821  1.6383  2.1127 -0. -0.3357 -0.7557])
+    eval: Tensor([ 0.1939 -0.1846 -1.1319 -0.8897  0.7057  1.3106  1.6901 -0.8686 -0.2685 -0.6046])
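+The output shows that in training mode :class:`~.module.dropout.Dropout` set 20% of the original values (two of the ten) to 0 and multiplied the remaining values by 1.25 ( :math:`\frac{1}{1-0.2}` ); in evaluation mode :class:`~.module.dropout.Dropout` left the original data unchanged.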
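+The scaling by :math:`\frac{1}{1-0.2}` is the usual "inverted dropout" rule. Below is a minimal pure-Python sketch of that rule, assuming each element is dropped independently; the ``dropout`` function here is a hypothetical illustration, not MegEngine's implementation.

```python
import random


def dropout(values, drop_prob, training):
    """Inverted dropout on a list of floats (illustrative sketch only)."""
    if not training or drop_prob == 0.0:
        return list(values)              # evaluation mode: pass-through
    scale = 1.0 / (1.0 - drop_prob)      # for drop_prob=0.2 this is 1.25
    # Each element is zeroed independently with probability drop_prob;
    # survivors are scaled up so the expected value stays unchanged.
    return [0.0 if random.random() < drop_prob else v * scale for v in values]


random.seed(0)
data = [0.5, -1.0, 2.0, 4.0]
train_out = dropout(data, drop_prob=0.2, training=True)
eval_out = dropout(data, drop_prob=0.2, training=False)
print(train_out)
print(eval_out)
```

+Since every element is dropped independently, the number of zeroed values varies from call to call; exactly 20% being zeroed in one particular run is a coincidence rather than a guarantee.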
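+Test code example
+``````````````````````````````
+Here we evaluate the trained network on the MNIST test set. The test code is shown below; compared with the training code, the main difference is that the optimizer-related code is removed: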
+.. testcode::
+    import megengine as mge
+    # Load the test data and preprocess it
+    mnist_test_dataset = MNIST(root="./dataset/MNIST", train=False, download=True)
+    dataloader_test = DataLoader(
+        mnist_test_dataset,
+        transform=Compose([
+            Normalize(mean=0.1307*255, std=0.3081*255),
+            Pad(2),
+            ToMode('CHW'),
+        ]),
+    )
+    le_net.eval()  # switch to evaluation mode
+    data = mge.tensor()
+    label = mge.tensor(dtype="int32")
+    correct = 0
+    total = 0
+    for idx, (batch_data, batch_label) in enumerate(dataloader_test):
+        data.set_value(batch_data)
+        label.set_value(batch_label)
+        logits = le_net(data)
+        predicted = F.argmax(logits, axis=1)
+        correct += (predicted==label).sum().numpy().item()
+        total += label.shape[0]
+    print("correct: {}, total: {}, accuracy: {}".format(correct, total, float(correct)/total))
+The test output is shown below; the trained ``LeNet`` reaches 98.84% accuracy on the MNIST test set:
+.. testoutput::
+    correct: 9884, total: 10000, accuracy: 0.9884
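+The accuracy bookkeeping in the test loop (take the arg-max of the logits, compare with the labels, count the matches) can be sketched in plain Python. The ``argmax`` and ``accuracy`` helpers and the toy logits below are hypothetical and only illustrate the arithmetic.

```python
def argmax(row):
    """Index of the largest entry in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    correct = sum(1 for row, y in zip(logits, labels) if argmax(row) == y)
    return correct / len(labels)


# Four toy samples with three classes each (made-up numbers).
logits = [[0.1, 2.3, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [0.4, 0.3, 0.2]]
labels = [1, 0, 2, 1]
print(accuracy(logits, labels))  # 3 of 4 predictions match -> 0.75
```

+Applied to the figures above, the same ratio is 9884 / 10000 = 0.9884.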
--- a/source/conf.py
+++ b/source/conf.py
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+# -- Path setup --------------------------------------------------------------
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+# -- Project information -----------------------------------------------------
+project = 'MegEngine Documents'
+copyright = '2020, Megvii'
+author = 'Megvii'
+# -- General configuration ---------------------------------------------------
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+    'sphinx.ext.autodoc',
+    'sphinx.ext.todo',
+    'sphinx.ext.mathjax',
+    'sphinx.ext.viewcode',
+    'sphinx.ext.autosummary',
+    'sphinx.ext.doctest',
+    'sphinxcontrib.jupyter',
+    'sphinx_autodoc_typehints',
+    'nbsphinx',
+]
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+locale_dirs = ['locale/']
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#
+# This is also used if you do content translation via gettext catalogs.
+# Usually you set "language" from the command line for these cases.
+language = 'en'
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = []
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+# If true, `todo` and `todoList` produce output, else they produce nothing.
+todo_include_todos = True
+# -- Options for HTML output -------------------------------------------------
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'pydata_sphinx_theme'
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+# --  Options for jupyter output ---------------------------------------------------
+jupyter_kernels = {
+    'python3': {
+        'kernelspec': {
+            'display_name': 'Python',
+            'language': 'python3',
+            'name': 'python3'
+        },
+        'file_extension': '.py'
+    },
+}
+# -- Options for skipping specific docs --------------------------------------
+skip_blacklist = frozenset(
+    [
+        '__weakref__', '__module__', '__doc__', '__abstractmethods__',
+        '__hyperparam_spec__', '__hyperparam_trans_dict__', '__param_init_spec__'
+    ]
+)
+skip_whitelist = frozenset(['', ''])
+def handle_skip(app, what, name, obj, skip, options):
+    if name.startswith('_abc_') or name in skip_blacklist:
+        return True
+    if name.startswith('__testcase'):
+        return False
+    if (name.startswith('__') and name.endswith('__') and
+            getattr(obj, '__doc__', None)):
+        return False
+    return skip
+# -- Options for doctest -----------------------------------------------------
+doctest_global_setup = '''
+import numpy as np
+np.random.seed(0)
+import megengine as mge
+np.set_printoptions(precision=4)
+'''
+def setup(app):
+    app.connect("autodoc-skip-member", handle_skip)
--- a/source/index.rst
+++ b/source/index.rst
+Welcome to MegEngine
+==============================
+.. toctree::
+    :maxdepth: 2
+    :includehidden:
+    :hidden:
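+Introduction to MegEngine
+------------------------------
+MegEngine is a deep learning framework developed entirely in-house by Megvii. Its Chinese name is “天元” (Tianyuan). It is a key part of Megvii's AI strategy, covering the “algorithm” element among the three elements of AI (algorithms, computing power, and data). Development of MegEngine began in 2014, and it is used by everyone inside Megvii. Today, all of Megvii's algorithms are trained and run for inference on MegEngine.
+MegEngine is an industrial-grade deep learning framework with an advanced architecture, excellent performance, and strong portability. It emphasizes productization capability and, on that basis, keeps the research and development process fast and convenient.
+MegEngine has several distinguishing features. The first is unified training and inference. MegEngine supports multiple hardware platforms (CPU, GPU, ARM). The inference frameworks on different hardware connect seamlessly with the MegEngine training framework. No extra model conversion is needed at deployment, and speed and accuracy stay consistent with training, which effectively addresses the difficulty of deploying AI when the deployment environment differs from the training environment.
+The second is unified dynamic and static graphs. Dynamic graphs are easy to debug, and static graphs are easy to deploy; getting the best of both is a core demand on modern deep learning frameworks. Built on static graphs, MegEngine is gradually adding full dynamic graph support. Development is accelerated in dynamic mode, and the same model code can be switched to static mode for deployment without modification, which benefits both researchers and algorithm engineers.
+The third is compatibility and inclusiveness. MegEngine's top-level API is based on Python and follows a PyTorch-like style. It is simple, direct, easy to pick up, and convenient for porting or integrating existing projects. To better support learning and practice, MegEngine also provides the out-of-the-box online deep learning tool `MegStudio <https://studio.brainpp.com/>`_ and `Model Hub <https://megengine.org.cn/model-hub/>`_ , a collection of pretrained models that gathers leading algorithms and models.
+The fourth is flexibility and efficiency. MegEngine's underlying high-performance operator library is deeply adapted and optimized for different hardware architectures and provides an efficient sublinear memory optimization strategy, delivering strong performance across the wide variety of computing devices found in production. An efficient and easy-to-use distributed training implementation supports elastic large-scale training.
+These features make MegEngine one of the frameworks best suited to industrial-grade research and development. More features are under active development, and more developers are welcome to join.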
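+Learning MegEngine
+------------------------------
+The official documentation is divided into two parts: :ref:`Basics <basic>` and :ref:`Advanced <advanced>`.
+The basics part introduces the fundamental concepts and usage of MegEngine step by step: starting from computational graphs, tensors, and operators, it covers building networks, loading and processing data, training and testing networks, and dynamic and static graphs. Knowledge of Python is all that is needed to follow this part. Readers with experience in other deep learning frameworks (such as PyTorch) will find it easy.
+The advanced part covers various advanced usages and topics in MegEngine. Its chapters are relatively independent and are intended as a reference for experienced developers. It currently includes distributed training and model deployment in a C++ environment, among others. More advanced content will be added over time.
+For detailed programming interface descriptions, see :ref:`api` .
+Readers are encouraged to use the online deep learning tool `MegStudio <https://studio.brainpp.com/>`_ for a more convenient learning experience.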
+.. _installation:
+Installation
+------------------------------
+You can install MegEngine with the pip package manager:
+.. code-block:: bash
+    pip3 install megengine -f https://megengine.org.cn/whl/mge.html
+Then import megengine in Python to verify that the installation succeeded:
+.. code-block:: python
+    import megengine as mge
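+The MegEngine package currently bundles the CUDA 10.1 environment required to run code on a GPU, so there are no separate CPU and GPU versions. If you want to run GPU programs, make sure the machine has an NVIDIA graphics card and that the `driver <https://developer.nvidia.com/cuda-toolkit-archive>`_ version is higher than 418.x.
+For most users, the prebuilt MegEngine installed through the package manager covers all needs. If you need recently added features that have not yet been released, you may need to build and install from source. You also need to know how to install from source if you need :ref:`deployment` or want to take part in MegEngine core development. For details, see the `README <https://github.com/MegEngine/MegEngine/blob/master/README.md>`_ .
+.. note::
+    MegEngine currently supports installation on Linux only, with Python 3.5 or later (Python 2 is not supported).
+    Windows 10 users can try it out by installing `WSL (Windows Subsystem for Linux) <https://docs.microsoft.com/en-us/windows/wsl/install-win10>`_ . However, because WSL does not yet support GPU access, MegEngine programs can currently only run on the CPU there.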
+.. toctree::
+    :hidden:
+    :maxdepth: 2
+    :includehidden:
+    :titlesonly:
+    Home <self>
+    Basics <basic/index>
+    Advanced <advanced/index>
+    api
+.. footer::
+    Current version |release| @ |today|