- en: Broadcasting semantics
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 广播语义
- en: 原文:[https://pytorch.org/docs/stable/notes/broadcasting.html](https://pytorch.org/docs/stable/notes/broadcasting.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: 原文:[https://pytorch.org/docs/stable/notes/broadcasting.html](https://pytorch.org/docs/stable/notes/broadcasting.html)
- en: Many PyTorch operations support NumPy’s broadcasting semantics. See [https://numpy.org/doc/stable/user/basics.broadcasting.html](https://numpy.org/doc/stable/user/basics.broadcasting.html)
for details.
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: 许多PyTorch操作支持NumPy的广播语义。有关详细信息,请参阅[https://numpy.org/doc/stable/user/basics.broadcasting.html](https://numpy.org/doc/stable/user/basics.broadcasting.html)。
- en: In short, if a PyTorch operation supports broadcast, then its Tensor arguments
can be automatically expanded to be of equal sizes (without making copies of the
data).
id: totrans-3
prefs: []
type: TYPE_NORMAL
zh: 简而言之,如果PyTorch操作支持广播,则其张量参数可以自动扩展为相等大小(而不会复制数据)。
- en: General semantics
id: totrans-4
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 一般语义
- en: 'Two tensors are “broadcastable” if the following rules hold:'
id: totrans-5
prefs: []
type: TYPE_NORMAL
zh: 如果满足以下规则,则两个张量是“可广播的”:
- en: Each tensor has at least one dimension.
id: totrans-6
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 每个张量至少有一个维度。
- en: When iterating over the dimension sizes, starting at the trailing dimension,
the dimension sizes must either be equal, one of them is 1, or one of them does
not exist.
id: totrans-7
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 在迭代维度大小时,从尾部维度开始,维度大小必须要么相等,要么其中一个为1,要么其中一个不存在。
- en: 'For Example:'
id: totrans-8
prefs: []
type: TYPE_NORMAL
zh: 例如:
- en: '[PRE0]'
id: totrans-9
prefs: []
type: TYPE_PRE
zh: '[PRE0]'
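The `[PRE0]` block is collapsed in this diff; as a non-authoritative sketch of the kind of example these two rules call for (all shapes below are chosen purely for illustration):

```py
import torch

x = torch.empty(5, 7, 3)
y = torch.empty(5, 7, 3)
# same shapes are always broadcastable (all dimension sizes are equal)

x = torch.empty(5, 3, 4, 1)
y = torch.empty(   3, 1, 1)
# x and y are broadcastable, lining up trailing dimensions:
#  1st trailing dimension: both have size 1
#  2nd trailing dimension: y has size 1
#  3rd trailing dimension: x size == y size
#  4th trailing dimension: y's dimension does not exist

x = torch.empty(5, 2, 4, 1)
y = torch.empty(   3, 1, 1)
# NOT broadcastable: in the 3rd trailing dimension 2 != 3
```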
- en: 'If two tensors `x`, `y` are “broadcastable”, the resulting tensor size is calculated
as follows:'
id: totrans-10
prefs: []
type: TYPE_NORMAL
zh: 如果两个张量`x`,`y`是“可广播的”,则结果张量大小计算如下:
- en: If the number of dimensions of `x` and `y` are not equal, prepend 1 to the dimensions
of the tensor with fewer dimensions to make them equal length.
id: totrans-11
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 如果`x`和`y`的维度数不相等,则在维度较少的张量的形状前面补1,使它们长度相等。
- en: Then, for each dimension size, the resulting dimension size is the max of the
sizes of `x` and `y` along that dimension.
id: totrans-12
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 然后,对于每个维度大小,结果维度大小是沿该维度的`x`和`y`的大小的最大值。
- en: 'For Example:'
id: totrans-13
prefs: []
type: TYPE_NORMAL
zh: 例如:
- en: '[PRE1]'
id: totrans-14
prefs: []
type: TYPE_PRE
zh: '[PRE1]'
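`[PRE1]` is likewise collapsed; a hedged sketch of the size arithmetic just described (shapes are illustrative):

```py
import torch

x = torch.empty(5, 1, 4, 1)
y = torch.empty(   3, 1, 1)
print((x + y).size())  # torch.Size([5, 3, 4, 1])

x = torch.empty(1)
y = torch.empty(3, 1, 7)
print((x + y).size())  # torch.Size([3, 1, 7]); 1s are prepended to x's shape

x = torch.empty(5, 2, 4, 1)
y = torch.empty(3, 1, 1)
# x + y raises a RuntimeError: sizes 2 and 3 clash at non-singleton dimension 1
```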
- en: In-place semantics
id: totrans-15
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 就地语义
- en: One complication is that in-place operations do not allow the in-place tensor
to change shape as a result of the broadcast.
id: totrans-16
prefs: []
type: TYPE_NORMAL
zh: 一个复杂之处在于就地操作不允许就地张量由于广播而改变形状。
- en: 'For Example:'
id: totrans-17
prefs: []
type: TYPE_NORMAL
zh: 例如:
- en: '[PRE2]'
id: totrans-18
prefs: []
type: TYPE_PRE
zh: '[PRE2]'
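`[PRE2]` is collapsed; a minimal sketch of the in-place restriction, assuming the shapes below:

```py
import torch

x = torch.empty(5, 3, 4, 1)
y = torch.empty(   3, 1, 1)
x.add_(y)  # fine: broadcasting y leaves x's shape at torch.Size([5, 3, 4, 1])

x = torch.empty(1, 3, 1)
y = torch.empty(3, 1, 7)
# x.add_(y) raises a RuntimeError: the broadcast result would have shape
# (3, 3, 7), and an in-place op may not change the shape of x
```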
- en: Backwards compatibility
id: totrans-19
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 向后兼容性
- en: Prior versions of PyTorch allowed certain pointwise functions to execute on
tensors with different shapes, as long as the number of elements in each tensor
was equal. The pointwise operation would then be carried out by viewing each tensor
as 1-dimensional. PyTorch now supports broadcasting and the “1-dimensional” pointwise
behavior is considered deprecated and will generate a Python warning in cases
where tensors are not broadcastable, but have the same number of elements.
id: totrans-20
prefs: []
type: TYPE_NORMAL
zh: PyTorch的先前版本允许某些逐点函数在具有不同形状的张量上执行,只要每个张量中的元素数量相等即可。然后,逐点操作将通过将每个张量视为1维来执行。PyTorch现在支持广播,而“1维”逐点行为被视为已弃用,并且在张量不可广播但具有相同数量的元素的情况下会生成Python警告。
- en: 'Note that the introduction of broadcasting can cause backwards incompatible
changes in the case where two tensors do not have the same shape, but are broadcastable
and have the same number of elements. For Example:'
id: totrans-21
prefs: []
type: TYPE_NORMAL
zh: 请注意,在两个张量形状不同、但可广播且元素数量相同的情况下,引入广播可能导致向后不兼容的变化。例如:
- en: '[PRE3]'
id: totrans-22
prefs: []
type: TYPE_PRE
zh: '[PRE3]'
- en: 'would previously produce a Tensor with size: torch.Size([4,1]), but now produces
a Tensor with size: torch.Size([4,4]). In order to help identify cases in your
code where backwards incompatibilities introduced by broadcasting may exist, you
may set torch.utils.backcompat.broadcast_warning.enabled to True, which will generate
a python warning in such cases.'
id: totrans-23
prefs: []
type: TYPE_NORMAL
zh: 以前会产生一个大小为torch.Size([4,1])的张量,但现在会产生一个大小为torch.Size([4,4])的张量。为了帮助识别代码中可能存在的由广播引入的向后不兼容性,您可以将torch.utils.backcompat.broadcast_warning.enabled设置为True,在这种情况下会生成一个Python警告。
- en: 'For Example:'
id: totrans-24
prefs: []
type: TYPE_NORMAL
zh: 例如:
- en: '[PRE4]'
id: totrans-25
prefs: []
type: TYPE_PRE
zh: '[PRE4]'
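`[PRE4]` is collapsed; a sketch of how the flag described above might be exercised. Note that `torch.utils.backcompat.broadcast_warning` dates from the release that introduced broadcasting and may be absent from current PyTorch versions:

```py
import torch

# May raise AttributeError on recent PyTorch builds where the
# backcompat flag has been removed.
torch.utils.backcompat.broadcast_warning.enabled = True
torch.add(torch.ones(4, 1), torch.ones(4))
# UserWarning: self and other do not have the same shape, but are
# broadcastable, and have the same number of elements.
```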
- en: CPU threading and TorchScript inference
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: CPU 线程和 TorchScript 推理
- en: 原文:[https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: '原文:[https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)'
- en: 'PyTorch allows using multiple CPU threads during TorchScript model inference.
The following figure shows different levels of parallelism one would find in a
typical application:'
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: PyTorch 允许在 TorchScript 模型推理期间使用多个 CPU 线程。以下图显示了在典型应用程序中可能找到的不同级别的并行性:
- en: '[![../_images/cpu_threading_torchscript_inference.svg](../Images/8df78fa0159321538b2e2a438f6cae52.png)](../_images/cpu_threading_torchscript_inference.svg)'
id: totrans-3
prefs: []
type: TYPE_NORMAL
zh: '[![../_images/cpu_threading_torchscript_inference.svg](../Images/8df78fa0159321538b2e2a438f6cae52.png)](../_images/cpu_threading_torchscript_inference.svg)'
- en: 'One or more inference threads execute a model’s forward pass on the given inputs.
Each inference thread invokes a JIT interpreter that executes the ops of a model
inline, one by one. A model can utilize a `fork` TorchScript primitive to launch
an asynchronous task. Forking several operations at once results in a task that
is executed in parallel. The `fork` operator returns a `Future` object which can
be used to synchronize on later, for example:'
id: totrans-4
prefs: []
type: TYPE_NORMAL
zh: 一个或多个推理线程在给定输入上执行模型的前向传递。每个推理线程调用 JIT 解释器,逐个执行模型的操作。模型可以利用 `fork` TorchScript
原语启动一个异步任务。一次分叉多个操作会导致并行执行的任务。`fork` 操作符返回一个 `Future` 对象,可以稍后用于同步,例如:
- en: '[PRE0]'
id: totrans-5
prefs: []
type: TYPE_PRE
zh: '[PRE0]'
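This document's `[PRE0]` is collapsed; a self-contained sketch of `fork`/`wait` using the public `torch.jit.fork` and `torch.jit.wait` APIs (the `branch` helper and the input size are made up for illustration):

```py
import torch

@torch.jit.script
def branch(x):
    return torch.mm(x, x)

@torch.jit.script
def forward(x):
    # launch an asynchronous task on the inter-op thread pool
    fut = torch.jit.fork(branch, x)
    # this op runs in parallel with the forked task
    y = torch.mm(x, x)
    # synchronize on the Future returned by fork
    return y + torch.jit.wait(fut)

print(forward(torch.randn(4, 4)))
```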
- en: PyTorch uses a single thread pool for the inter-op parallelism, this thread
pool is shared by all inference tasks that are forked within the application process.
id: totrans-6
prefs: []
type: TYPE_NORMAL
zh: PyTorch 使用一个线程池来进行操作间的并行处理,这个线程池被应用程序进程中的所有分叉推理任务共享。
- en: In addition to the inter-op parallelism, PyTorch can also utilize multiple threads
within the ops (intra-op parallelism). This can be useful in many cases, including
element-wise ops on large tensors, convolutions, GEMMs, embedding lookups and
others.
id: totrans-7
prefs: []
type: TYPE_NORMAL
zh: 除了操作间的并行性,PyTorch 还可以利用操作内的多个线程(操作内的并行性)。这在许多情况下都很有用,包括大张量的逐元素操作、卷积、GEMM、嵌入查找等。
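A minimal sketch of intra-op parallelism (the tensor size is arbitrary; the effective thread count depends on the build and the machine):

```py
import torch

print(torch.get_num_threads())  # size of the intra-op thread pool

x = torch.randn(8192, 8192)
y = x.sigmoid()  # a single large element-wise op; ATen may split the
                 # work across up to get_num_threads() intra-op threads
```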
- en: Build options
id: totrans-8
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 构建选项
- en: PyTorch uses an internal ATen library to implement ops. In addition to that,
PyTorch can also be built with support of external libraries, such as [MKL](https://software.intel.com/en-us/mkl)
and [MKL-DNN](https://github.com/intel/mkl-dnn), to speed up computations on CPU.
id: totrans-9
prefs: []
type: TYPE_NORMAL
zh: PyTorch 使用内部的 ATen 库来实现操作。除此之外,PyTorch 还可以构建支持外部库,如 [MKL](https://software.intel.com/en-us/mkl)
和 [MKL-DNN](https://github.com/intel/mkl-dnn),以加速 CPU 上的计算。
- en: 'ATen, MKL and MKL-DNN support intra-op parallelism and depend on the following
parallelization libraries to implement it:'
id: totrans-10
prefs: []
type: TYPE_NORMAL
zh: ATen、MKL 和 MKL-DNN 支持操作内的并行性,并依赖以下并行化库来实现:
- en: '[OpenMP](https://www.openmp.org/) - a standard (and a library, usually shipped
with a compiler), widely used in external libraries;'
id: totrans-11
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[OpenMP](https://www.openmp.org/) - 一个标准(同时也是一个库,通常随编译器一起提供),在外部库中被广泛使用;'
- en: '[TBB](https://github.com/intel/tbb) - a newer parallelization library optimized
for task-based parallelism and concurrent environments.'
id: totrans-12
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[TBB](https://github.com/intel/tbb) - 一个针对任务并行性和并发环境进行了优化的较新的并行化库。'
- en: OpenMP historically has been used by a large number of libraries. It is known
for a relative ease of use and support for loop-based parallelism and other primitives.
id: totrans-13
prefs: []
type: TYPE_NORMAL
zh: OpenMP 历史上被许多库使用。它以相对易用和支持基于循环的并行性和其他原语而闻名。
- en: TBB is used to a lesser extent in external libraries, but, at the same time,
is optimized for the concurrent environments. PyTorch’s TBB backend guarantees
that there’s a separate, single, per-process intra-op thread pool used by all
of the ops running in the application.
id: totrans-14
prefs: []
type: TYPE_NORMAL
zh: TBB 在外部库中的使用较少,但同时针对并发环境进行了优化。PyTorch 的 TBB 后端保证,应用程序中运行的所有操作共享同一个单独的、每进程唯一的操作内(intra-op)线程池。
- en: Depending on the use case, one might find one or another parallelization library a better choice in their application.
id: totrans-15
prefs: []
type: TYPE_NORMAL
zh: 根据具体用例,开发者可能会发现其中一种并行化库更适合自己的应用程序。
- en: 'PyTorch allows selecting of the parallelization backend used by ATen and other
libraries at the build time with the following build options:'
id: totrans-16
prefs: []
type: TYPE_NORMAL
zh: PyTorch 允许在构建时选择 ATen 和其他库使用的并行化后端,具体的构建选项如下:
- en: '| Library | Build Option | Values | Notes |'
id: totrans-17
prefs: []
type: TYPE_TB
zh: '| 库 | 构建选项 | 取值 | 备注 |'
- en: '| --- | --- | --- | --- |'
id: totrans-18
prefs: []
type: TYPE_TB
zh: '| --- | --- | --- | --- |'
- en: '| ATen | `ATEN_THREADING` | `OMP` (default), `TBB` | |'
id: totrans-19
prefs: []
type: TYPE_TB
zh: '| ATen | `ATEN_THREADING` | `OMP`(默认),`TBB` | |'
- en: '| MKL | `MKL_THREADING` | (same) | To enable MKL use `BLAS=MKL` |'
id: totrans-20
prefs: []
type: TYPE_TB
zh: '| MKL | `MKL_THREADING` | (相同) | 要启用 MKL,请使用 `BLAS=MKL` |'
- en: '| MKL-DNN | `MKLDNN_CPU_RUNTIME` | (same) | To enable MKL-DNN use `USE_MKLDNN=1`
|'
id: totrans-21
prefs: []
type: TYPE_TB
zh: '| MKL-DNN | `MKLDNN_CPU_RUNTIME` | (相同) | 要启用 MKL-DNN,请使用 `USE_MKLDNN=1` |'
- en: It is recommended not to mix OpenMP and TBB within one build.
id: totrans-22
prefs: []
type: TYPE_NORMAL
zh: 建议不要在一个构建中混合使用 OpenMP 和 TBB。
- en: 'Any of the `TBB` values above require `USE_TBB=1` build setting (default: OFF).
A separate setting `USE_OPENMP=1` (default: ON) is required for OpenMP parallelism.'
id: totrans-23
prefs: []
type: TYPE_NORMAL
zh: 上述任何 `TBB` 值都需要 `USE_TBB=1` 构建设置(默认为 OFF)。OpenMP 并行性需要单独设置 `USE_OPENMP=1`(默认为
ON)。
- en: Runtime API
id: totrans-24
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 运行时 API
- en: 'The following API is used to control thread settings:'
id: totrans-25
prefs: []
type: TYPE_NORMAL
zh: 以下 API 用于控制线程设置:
- en: '| Type of parallelism | Settings | Notes |'
id: totrans-26
prefs: []
type: TYPE_TB
zh: '| 并行性类型 | 设置 | 备注 |'
- en: '| --- | --- | --- |'
id: totrans-27
prefs: []
type: TYPE_TB
zh: '| --- | --- | --- |'
- en: '| Inter-op parallelism | `at::set_num_interop_threads`, `at::get_num_interop_threads`
(C++)`set_num_interop_threads`, `get_num_interop_threads` (Python, [`torch`](../torch.html#module-torch
"torch") module) | Default number of threads: number of CPU cores. |'
id: totrans-28
prefs: []
type: TYPE_TB
zh: '| 操作间的并行性 | `at::set_num_interop_threads`,`at::get_num_interop_threads`(C++)`set_num_interop_threads`,`get_num_interop_threads`(Python,[`torch`](../torch.html#module-torch
"torch") 模块) | 默认线程数:CPU 核心数。 |'
- en: '| Intra-op parallelism | `at::set_num_threads`, `at::get_num_threads` (C++)
`set_num_threads`, `get_num_threads` (Python, [`torch`](../torch.html#module-torch
"torch") module)Environment variables: `OMP_NUM_THREADS` and `MKL_NUM_THREADS`
|'
id: totrans-29
prefs: []
type: TYPE_TB
zh: '| 操作内的并行性 | `at::set_num_threads`,`at::get_num_threads`(C++)`set_num_threads`,`get_num_threads`(Python,[`torch`](../torch.html#module-torch "torch") 模块)环境变量:`OMP_NUM_THREADS` 和 `MKL_NUM_THREADS` |'
- en: For the intra-op parallelism settings, `at::set_num_threads`, `torch.set_num_threads`
always take precedence over environment variables, `MKL_NUM_THREADS` variable
takes precedence over `OMP_NUM_THREADS`.
id: totrans-30
prefs: []
type: TYPE_NORMAL
zh: 对于操作内并行设置,`at::set_num_threads`、`torch.set_num_threads`始终优先于环境变量;而在环境变量中,`MKL_NUM_THREADS`优先于`OMP_NUM_THREADS`。
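A hedged sketch of the runtime API from the table above, assuming a fresh process in which no parallel work has started yet:

```py
import torch

# Inter-op pool (used by tasks launched with torch.jit.fork). Must be set
# before any inter-op parallel work runs, or PyTorch raises a RuntimeError.
torch.set_num_interop_threads(4)
print(torch.get_num_interop_threads())

# Intra-op pool (threads used inside a single op). This call takes
# precedence over the OMP_NUM_THREADS / MKL_NUM_THREADS variables.
torch.set_num_threads(4)
print(torch.get_num_threads())
```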
- en: Tuning the number of threads[](#tuning-the-number-of-threads "Permalink to this
heading")
id: totrans-31
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 调整线程数量[](#tuning-the-number-of-threads "跳转到此标题的永久链接")
- en: 'The following simple script shows how a runtime of matrix multiplication changes
with the number of threads:'
id: totrans-32
prefs: []
type: TYPE_NORMAL
zh: 以下简单脚本展示了矩阵乘法的运行时间如何随线程数量变化:
- en: '[PRE1]'
id: totrans-33
prefs: []
type: TYPE_PRE
zh: '[PRE1]'
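`[PRE1]` is collapsed; the benchmark it refers to presumably resembles this sketch (matrix size, thread range, and iteration count are assumptions):

```py
import timeit
import torch

for t in [1] + list(range(2, 25, 2)):
    torch.set_num_threads(t)
    r = timeit.timeit(
        setup="import torch; x = torch.randn(1024, 1024); y = torch.randn(1024, 1024)",
        stmt="torch.mm(x, y)",
        number=100,
    )
    print(f"{t:2d} intra-op threads: {r:.3f} s per 100 matmuls")
```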
- en: 'Running the script on a system with 24 physical CPU cores (Xeon E5-2680, MKL
and OpenMP based build) results in the following runtimes:'
id: totrans-34
prefs: []
type: TYPE_NORMAL
zh: 在具有24个物理CPU核心的系统(基于Xeon E5-2680、MKL和OpenMP构建)上运行脚本会产生以下运行时间:
- en: '[![../_images/cpu_threading_runtimes.svg](../Images/50cb089741be0ac4482f410e4d719b4b.png)](../_images/cpu_threading_runtimes.svg)'
id: totrans-35
prefs: []
type: TYPE_NORMAL
zh: '[![../_images/cpu_threading_runtimes.svg](../Images/50cb089741be0ac4482f410e4d719b4b.png)](../_images/cpu_threading_runtimes.svg)'
- en: 'The following considerations should be taken into account when tuning the number
of intra- and inter-op threads:'
id: totrans-36
prefs: []
type: TYPE_NORMAL
zh: 调整操作内(intra-op)和操作间(inter-op)线程数量时,应考虑以下因素:
- en: When choosing the number of threads one needs to avoid oversubscription (using
too many threads, leads to performance degradation). For example, in an application
that uses a large application thread pool or heavily relies on inter-op parallelism,
one might find disabling intra-op parallelism as a possible option (i.e. by calling
`set_num_threads(1)`);
id: totrans-37
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 在选择线程数量时,需要避免过度订阅(使用过多线程会导致性能下降)。例如,在使用大型应用程序线程池或严重依赖操作间并行性的应用程序中,可以考虑禁用操作内并行(即调用`set_num_threads(1)`);
- en: In a typical application one might encounter a trade off between latency (time
spent on processing an inference request) and throughput (amount of work done
per unit of time). Tuning the number of threads can be a useful tool to adjust
this trade off. For example, in latency critical applications one might want
to increase the number of intra-op threads to process each request as fast as
possible. At the same time, parallel implementations of ops may add an extra
overhead that increases the amount of work done per single request and thus
reduces the overall throughput.
id: totrans-38
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 在典型应用程序中,可能需要在延迟(处理一个推理请求所花费的时间)和吞吐量(单位时间内完成的工作量)之间进行权衡。调整线程数量是调节这种权衡的有用工具。例如,在对延迟敏感的应用程序中,可能希望增加操作内线程的数量,以尽可能快地处理每个请求。同时,操作的并行实现可能会带来额外开销,增加单个请求的工作量,从而降低整体吞吐量。
- en: Warning
id: totrans-39
prefs: []
type: TYPE_NORMAL
zh: 警告
- en: OpenMP does not guarantee that a single per-process intra-op thread pool is
going to be used in the application. On the contrary, two different application
or inter-op threads may use different OpenMP thread pools for intra-op work. This
might result in a large number of threads used by the application. Extra care
in tuning the number of threads is needed to avoid oversubscription in multi-threaded
applications in OpenMP case.
id: totrans-40
prefs: []
type: TYPE_NORMAL
zh: OpenMP不保证应用程序会使用单一的、每进程唯一的操作内线程池。相反,两个不同的应用程序线程或操作间线程可能会使用不同的OpenMP线程池来执行操作内工作。这可能导致应用程序使用大量线程。在使用OpenMP的情况下,需要特别小心地调整线程数量,以避免多线程应用程序中的过度订阅。
- en: Note
id: totrans-41
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: Pre-built PyTorch releases are compiled with OpenMP support.
id: totrans-42
prefs: []
type: TYPE_NORMAL
zh: 预构建的PyTorch发行版均已启用OpenMP支持。
- en: Note
id: totrans-43
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: '`parallel_info` utility prints information about thread settings and can be
used for debugging. Similar output can be also obtained in Python with `torch.__config__.parallel_info()`
call.'
id: totrans-44
prefs: []
type: TYPE_NORMAL
zh: '`parallel_info`实用程序打印有关线程设置的信息,可用于调试。在Python中也可以通过`torch.__config__.parallel_info()`调用获得类似的输出。'
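A minimal usage sketch:

```py
import torch

# prints ATen/Parallel build and runtime details: intra-/inter-op thread
# counts, OpenMP/MKL versions, and the parallel backend in use
print(torch.__config__.parallel_info())
```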