24 Jul, 2021 · 40 commits
    • [1/n] Update testing lib*.so path (#61960) · 8152433d
      Committed by tktrungna
      Summary:
      ### Issue
      
      Build PyTorch wheel packages during build stage for pull requests and install during test stage.
      
      ### Fix
      Update all tests that call lib*.so under the `./build` folder to instead call lib*.so in `{ent}/pytorch/lib/python3.8/site-packages/torch`
      
      ### Diff
      This diff starts by updating test_fx, test_backend, and test_torchbind to check whether the current CI passes
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61960
      
      Test Plan: check that all CI workflows pass
      
      Reviewed By: malfet, saketh-are
      
      Differential Revision: D29823235
      
      Pulled By: tktrungna
      
      fbshipit-source-id: e7f652def698e303d4843fbaedf4859f5eca2fd9
    • fix a typo (#61061) · 956f1c98
      Committed by Nikolay Korovaiko
      Summary:
      Fixes #{issue number}
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61061
      
      Reviewed By: navahgar, Gamrix
      
      Differential Revision: D29495806
      
      Pulled By: Krovatkin
      
      fbshipit-source-id: 510de724e3108c52af1b25b8ab53ae3c895b55f9
    • Modernize override (#61744) · ee44d73e
      Committed by Richard Barnes
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61744
      
      Test Plan: Sandcastle
      
      Reviewed By: malfet
      
      Differential Revision: D29717320
      
      fbshipit-source-id: 6eea4295ee2e5572ab337620be412376fcc2f3cc
    • [fx2trt] Add support for explicit batch dimension (#62110) · d2e03dc4
      Committed by Shiyan Deng
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62110
      
      Add an option to opt in to an explicit batch dimension. Extend unit tests to cover both scenarios (implicit and explicit). Fixed some converters that didn't work with an explicit batch dimension before.
      
      Add broadcast support and a generic function for adding elementwise binary ops.
      
      Follow ups:
      1. Add dynamic shape support in explicit batch dimension mode, to at least allow differing batch dimensions.
      2. Extend layer_norm plugin `PluginV2Ext` to make it work in explicit batch dimension.
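The broadcast support mentioned above can be sketched shape-wise in plain Python (illustrative only; the real converter manipulates TensorRT layers, and the helper name is not the actual API):

```python
def broadcast_shapes(a, b):
    # Right-align the two shapes, pad the shorter one with leading 1s,
    # then apply the usual elementwise-broadcast rule per dimension:
    # equal sizes pass through, a size of 1 stretches to the other size.
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"cannot broadcast {x} vs {y}")
        out.append(max(x, y))
    return tuple(out)

print(broadcast_shapes((3, 1, 5), (4, 5)))  # (3, 4, 5)
```

This is the same alignment an elementwise binary-op converter must perform before emitting the fused layer.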
      
      Test Plan: unit tests
      
      Reviewed By: jackm321
      
      Differential Revision: D29798239
      
      fbshipit-source-id: 91d47c6155d2473ed4a6f8d2816715a32c61b869
    • [bc-breaking][quant][graphmode][fx] Add observer/fake_quant for copy nodes (#61687) · cc263ef7
      Committed by Jerry Zhang
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61687
      
      Previously we did not insert observer/fake_quant for output copy nodes (e.g. maxpool).
      But to produce reference patterns we need to insert observer/fake_quant for the output and later convert that to a quantize
      node.
      
      Model:
      ```
      class M(torch.nn.Module):
          def __init__(self):
              super().__init__()
              self.maxpool2d = torch.nn.MaxPool2d(kernel_size=3)
      
          def forward(self, x):
              x = self.maxpool2d(x)
              return x
      ```
      result of prepare:
      
      Before:
      ```
      def forward(self, x):
          x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
          maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
          return maxpool2d
      ```
      
      After:
      ```
      def forward(self, x):
          x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
          maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
          maxpool2d_activation_post_process_0 = self.maxpool2d_activation_post_process_0(maxpool2d);  maxpool2d = None
          return maxpool2d_activation_post_process_0
      ```
      
      Test Plan: Imported from OSS
      
      Reviewed By: vkuzo
      
      Differential Revision: D29715566
      
      fbshipit-source-id: 817df9b2933a35cad5331d8d8ce1c5ba0752e9df
    • [Static Runtime] Remove wrappers for aten::cat (#62067) · 78f7d8cc
      Committed by Hao Lu
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62067
      
      The wrapper for aten::cat is no longer needed after the variadic cat change in D29565344 (https://github.com/pytorch/pytorch/commit/ae58a4c45de4cb16949934b33a93e79d3aca4350) .
      Also added a simple test to test dynamic shapes, i.e., input tensors in args2 are larger than in args1.
      
      Reviewed By: navahgar, mikeiovine
      
      Differential Revision: D29864600
      
      fbshipit-source-id: 44a712c2e776815c09e0bf5631412149b81274b2
    • [torch deploy] add support for Python C extension modules (#58117) · 7c09de83
      Committed by Zachary DeVito
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/58117
      
      Previously it was not possible to load C extension modules with deploy because extension
      modules need to link against the Python.h API functions. Since
      each libtorchdeploy_interpreter.so had its own copy of these functions, there was no way
      to tell dlopen to resolve symbols in a loaded SO from one of these libraries without exposing
      its symbols globally.
      
      This patch adds a custom ELF loader that attaches C extension libraries
      to the Python API of the interpreter that loaded the shared library. Simple use of the numpy and regex modules appears to work.
      
      This diff has some limitations:
      
      * 64-bit Linux only. OSX and Windows use different formats for shared libraries. 32-bit ELF files are not supported.
      * Debug info is not immediately available to debuggers. A script for lldb is provided which can be loaded
      so that lldb knows about the libraries as they are loaded.
      * Shared libraries can directly use the Python API, but libraries they depend on
        (via DT_NEEDED entries in their dynamic segment) may not use Python. In the future, we can
        try to detect whether a sub-library uses the Python API and load it with our custom loader.
      * TLS initialization and library initialization may occur in a different order than what would happen with dlopen,
        potentially leading to some issues running destructors in TLS segments. Use of these C++ features is relatively rare.
      
      Test Plan: Imported from OSS
      
      Reviewed By: suo
      
      Differential Revision: D28435305
      
      Pulled By: zdevito
      
      fbshipit-source-id: 10f046053dd1d250e3c73f2cce8eb945eeba31b6
    • [Model Averaging] Refactor averagers to accept parameters instead of a module (#62105) · e856a452
      Committed by Yi Wang
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62105
      
      This prepares for wrapping the averager as an optimizer, which can only accept parameters rather than a module.
      
      Proposal: https://github.com/pytorch/pytorch/issues/59699
      ghstack-source-id: 134213572
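A minimal, framework-free sketch of what "averaging parameters instead of a module" looks like (the class name `PeriodicAverager` and the rank-simulation are illustrative, not the actual distributed API, which averages across ranks with collectives):

```python
# Sketch: an averager that accepts an iterable of parameters rather than
# a module. The real implementation averages across distributed ranks;
# here peer ranks are simulated as plain lists of parameter values.
class PeriodicAverager:
    def __init__(self, params, period):
        self.params = list(params)   # any iterable of parameters works
        self.period = period
        self.step_count = 0

    def average(self, peer_params_by_rank):
        """Average self.params with same-position params of other ranks."""
        self.step_count += 1
        if self.step_count % self.period != 0:
            return
        world = [self.params] + [list(p) for p in peer_params_by_rank]
        world_size = len(world)
        for i in range(len(self.params)):
            self.params[i] = sum(rank[i] for rank in world) / world_size

avg = PeriodicAverager([1.0, 3.0], period=1)
avg.average(peer_params_by_rank=[[3.0, 5.0]])
print(avg.params)  # [2.0, 4.0]
```

Because the constructor takes parameters directly, the same object can later be driven by an optimizer wrapper that only ever sees parameter lists.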
      
      Test Plan:
      buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager
      
      buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_average_parameters
      
      Reviewed By: rohan-varma
      
      Differential Revision: D29883693
      
      fbshipit-source-id: 474ba924a0b05068b12f163fb74582bccf314964
    • [profiler][refactor] Avoid using legacy event in profiler (#61721) · 41f7a9da
      Committed by Ilia Cherniavskii
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61721
      
      Remove dependency on LegacyEvent from the profiler
      
      Test Plan:
      python test/test_profiler.py -v
      
      Imported from OSS
      
      Reviewed By: kimishpatel, gdankel
      
      Differential Revision: D29716769
      
      fbshipit-source-id: 2c2b48f2ee096adcbde09821e0cc7c0fcb94d19f
    • [android] Lite interpreter module to load from assets (#61609) · 06a3b239
      Committed by Ivan Kobzarev
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61609
      
      Test Plan: Imported from OSS
      
      Reviewed By: cccclai
      
      Differential Revision: D29688641
      
      Pulled By: IvanKobzarev
      
      fbshipit-source-id: 7857bad51e91eae7c90a1218d463f3767f4fae15
    • [nnc] Rename IRSimplifierBase with PolynomialBase (#60686) · 643e5846
      Committed by Hui Guo
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60686
      
      Test Plan: Imported from OSS
      
      Reviewed By: navahgar, soulitzer
      
      Differential Revision: D29373316
      
      Pulled By: huiguoo
      
      fbshipit-source-id: bd44bff60455076d1c5291273989e9939a428f9a
    • [6/N] Nnapi Backend Delegate: Comprehensive OSS Tests (#61782) · 046272f3
      Committed by Amy He
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61782
      
      This PR depends on https://github.com/pytorch/pytorch/pull/61787
      
      ### Summary:
      Added more comprehensive tests for Android NNAPI delegate.
      Previously, there was only one basic test for lowering a PReLU module with the NNAPI delegate. Now, more tests are inherited from `test_nnapi.py`, the file for testing NNAPI conversion and execution without the delegate.
      
      **test_backend_nnapi.py**
      Test file for Android NNAPI delegate.
      - `TestNnapiBackend` class inherits tests from `test_nnapi.py` and overrides the model conversion to use the delegate API.
      - Includes an extra test for passing input arguments as Tensors and Tensor Lists.
      - Has extra setup for loading the NNAPI delegate library and changing the default dtype from float64 to float32 (dtype is typically float32 by default, but not in delegate backend unit tests)
      
      **test_nnapi.py**
      Test file for Android NNAPI without the delegate.
      - Some code was refactored to allow override of only the NNAPI conversion call.
      - An extra function was added to allow the NNAPI delegate unit test to turn off the model execution step. Once the NNAPI delegate's execution implementation is complete, this may no longer be necessary.
      
      ### Test Plan:
      I ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` to run both test files.
      
      Test Plan: Imported from OSS
      
      Reviewed By: raziel, iseeyuan
      
      Differential Revision: D29772005
      
      fbshipit-source-id: 5d14067a4f6081835699b87a2ece5bd6bed00c6b
    • ENH Updates docs and tests for regression modules that already support no-batch-dims (#61461) · f03e7170
      Committed by Thomas J. Fan
      Summary:
      Towards https://github.com/pytorch/pytorch/issues/60585
      
      This PR does not use `check_sum_reduction` because I wanted to test every reduction option.
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61461
      
      Reviewed By: suo
      
      Differential Revision: D29883744
      
      Pulled By: jbschlosser
      
      fbshipit-source-id: cdad0effb41f0484938caad0d4c9d6d83e2aec07
    • ENH Adds no_batch_dim support for maxpool and unpool for 2d and 3d (#61984) · 1ec6205b
      Committed by Thomas J. Fan
      Summary:
      Towards https://github.com/pytorch/pytorch/issues/60585
      
      (Interesting how the maxpool tests are currently in `test/test_nn.py`)
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61984
      
      Reviewed By: suo
      
      Differential Revision: D29883846
      
      Pulled By: jbschlosser
      
      fbshipit-source-id: 1e0637c96f8fa442b4784a9865310c164cbf61c8
    • Fix type promotion for cosine_similarity() (#62054) · f4ffaf0c
      Committed by Joel Schlosser
      Summary:
      Fixes https://github.com/pytorch/pytorch/issues/61454
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62054
      
      Reviewed By: suo
      
      Differential Revision: D29881755
      
      Pulled By: jbschlosser
      
      fbshipit-source-id: 10499766ac07b0ae3c0d2f4c426ea818d1e77db6
    • Improve MHA docs (#61977) · e408af08
      Committed by Joel Schlosser
      Summary:
      Fixes https://github.com/pytorch/pytorch/issues/60831
      Also clarifies the relationship between `embed_dim` and `num_heads` (see https://github.com/pytorch/pytorch/issues/60853 and https://github.com/pytorch/pytorch/issues/60445).
      Formatting was overhauled to remove some redundancy between the input docs and shape docs; suggestions / comments welcome!
      
      Link to rendered docs here: https://14912919-65600975-gh.circle-artifacts.com/0/docs/generated/torch.nn.MultiheadAttention.html
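As a minimal illustration of the `embed_dim`/`num_heads` relationship being clarified here: `embed_dim` is split across the heads, so it must be divisible by `num_heads` and each head attends over `embed_dim // num_heads` features (pure-Python sketch; `head_dim` is an illustrative helper, not the module's API):

```python
def head_dim(embed_dim, num_heads):
    # MultiheadAttention splits embed_dim across heads rather than
    # giving each head the full embedding, so embed_dim must be
    # divisible by num_heads; each head then works on this many features.
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    return embed_dim // num_heads

print(head_dim(512, 8))  # 64
```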
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61977
      
      Reviewed By: bhosmer
      
      Differential Revision: D29876884
      
      Pulled By: jbschlosser
      
      fbshipit-source-id: a3e82083219cc4f8245c021d309ad9d92bf39196
    • [Static Runtime] Add is_frozen to StaticModule ctor (#62020) · cf3cc01f
      Committed by Hao Lu
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62020
      
      Add is_frozen to StaticModule ctor so we can skip freezing in StaticModule.
      
      Reviewed By: ajyu, mikeiovine
      
      Differential Revision: D29807431
      
      fbshipit-source-id: 7742e9f5c5ae9f442a9e4007c870a14fd8b4af20
    • [clang-tidy] Fix unknown GNU flag error (#62128) · fa11103c
      Committed by Elton Leander Pinto
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62128
      
      Test Plan: Imported from OSS
      
      Reviewed By: driazati
      
      Differential Revision: D29888297
      
      Pulled By: 1ntEgr8
      
      fbshipit-source-id: 0657d5baa72c014a83c9def4a39338c52f4ef8d1
    • MAINT Migrates multilabel_margin_loss from THC to ATen (CUDA) (#60708) · 9730d91a
      Committed by Thomas J. Fan
      Summary:
      Fixes https://github.com/pytorch/pytorch/issues/24603
      Fixes https://github.com/pytorch/pytorch/issues/24602
      
      <s>The implementation should be exactly the same, so it is strange that the benchmarks show such a significant improvement in this PR.</s>
      
      The benchmarks are now the same.
      
      <details>
       <summary>Benchmark script</summary>
      
      ```python
      from itertools import product
      
      import torch
      import torch.nn as nn
      import torch.nn.functional as F
      import time
      
      torch.manual_seed(0)
      MS_PER_SECOND = 1000
      
      def _time():
          torch.cuda.synchronize()
          return time.perf_counter() * MS_PER_SECOND
      
      device = "cuda"
      C = 30
      n_runs = 100
      reductions = ["none", "sum", "mean"]
      Ns = [1_000, 10_000, 100_000]
      
      for reduction, N in product(reductions, Ns):
          total_fwd_time = 0
          total_back_time = 0
          grad_out = torch.randn(N, device=device)
          if reduction != "none":
              grad_out = grad_out[0]
      
          for _ in range(n_runs):
              input = torch.randn(N, C, device=device, requires_grad=True)
              target = torch.randint(0, C, size=input.size(), device=device)
      
              # forward
              start = _time()
              result = F.multilabel_margin_loss(input, target, reduction=reduction)
              total_fwd_time += _time() - start
      
          result = F.multilabel_margin_loss(input, target, reduction=reduction)
          for _ in range(n_runs):
              # backward
              start = _time()
              result.backward(grad_out, retain_graph=True)
              total_back_time += _time() - start
      
          fwd_avg = total_fwd_time / n_runs
          bwd_avg = total_back_time / n_runs
          print(
              f"input size({N}, {C}), reduction: {reduction}, fwd: {fwd_avg:.2f} (ms), back: {bwd_avg:.2f} (ms)"
          )
      ```
      
      </details>
      
      ## master
      
      ```
      input size(1000, 30), reduction: none, fwd: 0.14 (ms), back: 0.41 (ms)
      input size(10000, 30), reduction: none, fwd: 1.26 (ms), back: 3.58 (ms)
      input size(100000, 30), reduction: none, fwd: 13.15 (ms), back: 34.68 (ms)
      input size(1000, 30), reduction: sum, fwd: 0.14 (ms), back: 0.38 (ms)
      input size(10000, 30), reduction: sum, fwd: 1.16 (ms), back: 3.53 (ms)
      input size(100000, 30), reduction: sum, fwd: 13.04 (ms), back: 34.53 (ms)
      input size(1000, 30), reduction: mean, fwd: 0.14 (ms), back: 0.38 (ms)
      input size(10000, 30), reduction: mean, fwd: 1.17 (ms), back: 3.52 (ms)
      input size(100000, 30), reduction: mean, fwd: 13.12 (ms), back: 34.54 (ms)
      ```
      
      ## this PR
      
      ```
      input size(1000, 30), reduction: none, fwd: 0.14 (ms), back: 0.35 (ms)
      input size(10000, 30), reduction: none, fwd: 1.22 (ms), back: 2.98 (ms)
      input size(100000, 30), reduction: none, fwd: 12.90 (ms), back: 29.32 (ms)
      input size(1000, 30), reduction: sum, fwd: 0.14 (ms), back: 0.32 (ms)
      input size(10000, 30), reduction: sum, fwd: 1.16 (ms), back: 2.97 (ms)
      input size(100000, 30), reduction: sum, fwd: 13.00 (ms), back: 29.17 (ms)
      input size(1000, 30), reduction: mean, fwd: 0.14 (ms), back: 0.32 (ms)
      input size(10000, 30), reduction: mean, fwd: 1.17 (ms), back: 2.97 (ms)
      input size(100000, 30), reduction: mean, fwd: 13.09 (ms), back: 28.91 (ms)
      ```
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/60708
      
      Reviewed By: saketh-are
      
      Differential Revision: D29856579
      
      Pulled By: ngimel
      
      fbshipit-source-id: b6bbf27a71e5a04f61779f6fef4ed1c98baa2607
    • [profiler] Nvtx support (#61634) · a6c6fd92
      Committed by Ilia Cherniavskii
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61634
      
      The legacy profiler supported NVTX, which was used by emit_nvtx. This PR
      adds support for NVTX in the new profiler, to prepare for the eventual
      deprecation of the legacy profiler.
      
      Test Plan:
      Verified that the profiles produced with nvprof are the same:
      ```
      import torch
      import torchvision.models as models
      from torch.autograd.profiler import emit_nvtx, load_nvprof
      
      model = models.resnet18().cuda()
      inputs = torch.randn(5, 3, 224, 224).cuda()
      
      with emit_nvtx(record_shapes=True):
        model(inputs)
      ```
      Run with `/usr/local/cuda/bin/nvprof -o test_trace2.prof -f -- python test_emit_nvtx.py`, then load the result:
      ```
      evt = load_nvprof("/home/iliacher/local/pytorch/test_trace.prof")
      ```
      
      Imported from OSS
      
      Reviewed By: kimishpatel, gdankel
      
      Differential Revision: D29691316
      
      fbshipit-source-id: 1e186cc072368f3e3987a2da0bfd90ed328817c5
    • Smart Decay for Adam - DPER3 (#62058) · 812bc1dd
      Committed by Jamie King
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62058
      
      This is the second diff in this stack.  This diff includes the changes to DPER3; the first diff includes the changes to Caffe2.
      
      We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch. We fix this by keeping track of missed minibatches and making the decay catch up accordingly.
      
      The exponential moving averages (EMA) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. In fact, for the absent parameters, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2, respectively.
      
      To avoid the computational overhead of touching every parameter for every minibatch, we:
      * keep track of the last time a parameter is seen
      * instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen.
      
      We hope this will significantly improve the inconsistent learning parameter issue we have seen with Adam.
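The catch-up rule described above can be sketched in plain Python (illustrative only; the actual change lives in the Caffe2/DPER3 Adam operators, and the function name here is hypothetical):

```python
def lazy_adam_moment_update(m, v, grad, beta1, beta2, steps_missed):
    # Catch up the first/second-moment EMAs for a parameter that was
    # absent for `steps_missed` minibatches: instead of touching the
    # parameter every step (adding 0 and decaying once per step), decay
    # in one shot by beta ** k, then apply the current gradient.
    k = steps_missed + 1          # missed steps plus the current one
    m = m * beta1 ** k + (1 - beta1) * grad
    v = v * beta2 ** k + (1 - beta2) * grad * grad
    return m, v

# A parameter seen every step (steps_missed=0) behaves like vanilla Adam;
# one that skipped a minibatch gets an extra factor of beta applied:
m_new, v_new = lazy_adam_moment_update(1.0, 1.0, 0.0, 0.9, 0.999,
                                       steps_missed=1)
# m_new == 1.0 * 0.9**2 (two decays applied at once, grad contributes 0)
```

Tracking only "last seen" timestamps makes this O(parameters touched) per minibatch rather than O(all parameters).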
      
      Differential Revision: D29638897
      
      fbshipit-source-id: 18d8e227d72c2e23010ca81e0f6eeb78872c8d3c
    • Implement NumPy-like `frombuffer` tensor constructor. (#59077) · 5224490a
      Committed by Yukio Siraichi
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/59077
      
      Fixes #58549
      
      `frombuffer` constructs a tensor object from an already allocated buffer through
      CPython's buffer protocol. Besides the standard `dtype`, `count`, and `offset` parameters,
      this function also accepts:
      
      - `device`: where the buffer lives
      - `requires_grad`: should autograd record operations on the new tensor
      
      A new test file _test_buffer_protocol.py_ was created. Currently, only CPU tests were
      implemented. That's because neither PyTorch nor Numba implements CPython's buffer
      protocol. Therefore, there's no way to create a CUDA buffer with the existing
      dependencies (could use PyCUDA for that, though).
      
      At the moment, if `device` differs from the device the buffer actually lives on, two things
      may happen:
      
      - `RuntimeError`, if `device='cuda'`
      - Segmentation fault (not tested -- see above), if `device='cpu'`
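The CPython buffer protocol this constructor relies on can be seen with stdlib types alone; this illustrates the no-copy, shared-memory semantics (not torch's own API):

```python
import array

# An array.array exposes its memory via the buffer protocol, just like
# the objects a frombuffer-style constructor accepts. A consumer that
# wraps the buffer sees the same underlying storage: writes through one
# view are visible through the other, since no copy is made.
buf = array.array('d', [1.0, 2.0, 3.0])
view = memoryview(buf)   # stdlib stand-in for the consuming tensor

view[0] = 42.0           # write through the buffer-protocol view
print(buf[0])            # 42.0 -- the original storage changed
```

This shared-storage behavior is also why a `device` mismatch is dangerous: the consumer would interpret memory that lives somewhere else entirely.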
      
      Test Plan: Imported from OSS
      
      Reviewed By: jbschlosser
      
      Differential Revision: D29870914
      
      Pulled By: mruberry
      
      fbshipit-source-id: 9fa8611aeffedfe39c9af74558178157a11326bb
    • [Static Runtime] Fix broken test_static_runtime build (#62098) · ec4e6181
      Committed by Mike Iovine
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62098
      
      The build was broken by D29821533 (https://github.com/pytorch/pytorch/commit/1d2ea76afb4f9ac40c43555da2f3d94dd3549136). The `clamp` overloads used in `deep_wide.h`
      are no longer available in the `at::native` namespace.
      
      Use `at::cpu::clamp` and `at::clamp::clip_out` (which should be an alias for
      clamp) instead.
      
      Reviewed By: hlu1
      
      Differential Revision: D29880187
      
      fbshipit-source-id: 210b6d2be8a8142e7af1a0ba07e55a95b1a77d25
    • [skip ci] Refactor CIFlow init logic (#62102) · b820493c
      Committed by Jane Xu
      Summary:
      This PR refactors the CIWorkflow post_init step to best account for how CIFlow interacts with everything.
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62102
      
      Test Plan: This PR did NOT garner any workflow changes. I ran mypy and flake8 on the changed file locally with no issues.
      
      Reviewed By: jbschlosser
      
      Differential Revision: D29883275
      
      Pulled By: janeyx99
      
      fbshipit-source-id: 6c5c1fc1878159e0de1bf8d9bd0cb32aa47af49a
    • Remove redundant `torch.cuda.set_device(self.rank)` (#62097) · 71cfbc45
      Committed by Yi Wang
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62097
      
      as title
      ghstack-source-id: 134196740
      
      Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_profiling_autograd_profiler
      
      Reviewed By: rohan-varma
      
      Differential Revision: D29880040
      
      fbshipit-source-id: 6a06fb2d87e9a7dfa1d7c81bf0c3fe115c1a1abb
    • Remove duplicated movedim implementation (#61939) · 5ef667a8
      Committed by Peter Bell
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61939
      
      Test Plan: Imported from OSS
      
      Reviewed By: saketh-are
      
      Differential Revision: D29850798
      
      Pulled By: zou3519
      
      fbshipit-source-id: e803b235d8535a204515ff9f5d46b8c4d191b73c
    • remove `randn?` from `torch.testing` namespace (#61840) · 10ccc5a8
      Committed by Philip Meier
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61840
      
      Redo of #60859.
      
      Test Plan: Imported from OSS
      
      Reviewed By: jbschlosser
      
      Differential Revision: D29871017
      
      Pulled By: mruberry
      
      fbshipit-source-id: 47afed1dc6aa0bb1e826af616ef5d5aaabb8e5bb
    • OpInfo Ref: fmod, remainder (#61527) · cb47d1f9
      Committed by Kushashwa Ravi Shrimali
      Summary:
      See https://github.com/pytorch/pytorch/issues/54261 for OpInfo tracker.
      
      This PR:
      
      * [x] Adds references to both `fmod` and `remainder` for testing.
      * [x] Updates `remainder` documentation to add a note on divergence with `std::remainder`. (something similar to NumPy's note: https://numpy.org/doc/1.20/reference/generated/numpy.remainder.html), see: https://github.com/pytorch/pytorch/pull/61527#discussion_r670238788 for further discussion.
      
      cc: mruberry
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61527
      
      Reviewed By: albanD
      
      Differential Revision: D29841266
      
      Pulled By: mruberry
      
      fbshipit-source-id: be99851a94f53ea2fc07b64fd7c947775129658c
    • don't allow alias dispatch keys to go in the DispatchKeySet (#61771) · c9b71549
      Committed by Brian Hirsh
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61771
      
      Test Plan: Imported from OSS
      
      Reviewed By: asuhan
      
      Differential Revision: D29736432
      
      Pulled By: bdhirsh
      
      fbshipit-source-id: 54bb716db1e41565b00f4f01ea0096f834087577
    • Throw RuntimeError when numpy() is called on a tensor with conjugate or negative bit set (#61925) · 143ef016
      Committed by anjali411
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61925
      
      Resolves https://github.com/pytorch/pytorch/issues/59945 and https://github.com/pytorch/pytorch/issues/59946
      
      bc-breaking note: unlike before, complex_tensor.conj().numpy(), complex_float_tensor.conj().view(torch.float64), and complex_float_tensor.conj().imag.view(torch.int32) no longer return views but instead error out
      
      Test Plan: Imported from OSS
      
      Reviewed By: albanD
      
      Differential Revision: D29819288
      
      Pulled By: anjali411
      
      fbshipit-source-id: 4bebec721eb535f44ef4b728bdc75fa444e05d16
    • [special] alias for mvlgamma (#61633) · 943ca5f6
      Committed by kshitij12345
      Summary:
      Reference: https://github.com/pytorch/pytorch/issues/50345
      
      Added an `out` variant for consistency.
      
      TODO:
      * [x] Check docs https://docs-preview.pytorch.org/61633/special.html#torch.special.multigammaln
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61633
      
      Reviewed By: albanD
      
      Differential Revision: D29815514
      
      Pulled By: mruberry
      
      fbshipit-source-id: 003c7b6a5938ecc7a96727310e8a39da0b3d7aca
    • [torchelastic] Improve process termination logic (#61602) · 0c55f1bd
      Committed by Aliaksandr Ivanou
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61602
      
      The diff introduces signal handlers and a SignalException that is raised when the agent process receives SIGTERM or SIGINT.
      
      When any of these signals is received, the termination handler raises the `SignalException`. The exception is then processed by the main agent loop: `shutdown(signum)` is invoked, which propagates the received signal to the child processes. A default 30-second timeout is introduced: if the child processes cannot terminate gracefully within this timeout, the agent process kills them via SIGKILL.
      
      Test Plan: unittests, sandcastle
      
      Reviewed By: cbalioglu
      
      Differential Revision: D29671783
      
      fbshipit-source-id: 3dbca2125676dc18d417cc3e3bb0301fdd42737a
    • Remove default arguments before calling to __torch_dispatch__ (#61123) · e42360d5
      Committed by Edward Yang
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61123
      
      This applies the design pattern of removing explicit arguments when they
      coincide with the default arguments.  This simplifies the argument patterns
      that dispatch kernels receive and makes it easier for us to maintain BC
      (as the addition of a new default argument isn't immediately BC-breaking
      for dispatch implementors).
      
      There is an important extra API which I haven't implemented here yet,
      which is to take an incomplete sequence of arguments and fill out their
      defaults (in case the user did want normalization).  I plan on adding
      that in a future PR.
      Signed-off-by: Edward Z. Yang <ezyang@fb.com>
      
      Test Plan: Imported from OSS
      
      Reviewed By: saketh-are
      
      Differential Revision: D29853616
      
      Pulled By: ezyang
      
      fbshipit-source-id: 71c672cb3a7d4d01f838a1c7fcdb75a8ce7d058e
    • Support for reference convert_fx working on gpu · 32d0c3e8
      Committed by Charles David Hernandez
      Summary:
      This PR enables GPU-only quantization, best used with is_reference since
      there are not many GPU kernels for ops as of now.
      
      This PR mainly changes how qconfigs and their observer constructors operate once they
      are on a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the observer constructors on the original
      qconfig and configures them so that, when invoked, the created observers will
      be on whatever device the module occupies. (Once observers are created,
      module.to(device) is already set up so that it moves any observers.) To do this,
      a new method and a few small changes were added to the _PartialWrapper class that
      our observers already use to create constructors (without changing the
      existing functionality). These changes work in
      concert with changes to the prepare flow such that when the qconfigs are
      propagated to the modules (in quantize.py and qconfig_utils.py) they are configured using add_module_to_qconfig_obs_ctr.
      
      Ideally this would work on other models, but the is_reference support for
      a lot of modules isn't there yet; those tests should be added in a
      future PR
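A minimal sketch of the constructor-specialization idea described above (the class mirrors the _PartialWrapper pattern in spirit; the observer here is a plain dict and the device a string, purely for illustration):

```python
import functools

class _PartialWrapper:
    # An observer "constructor" that carries its kwargs and can be
    # further specialized (per module) before it is finally called.
    def __init__(self, p):
        self.p = p

    def __call__(self, *args, **kwargs):
        return self.p(*args, **kwargs)

    def with_args(self, **kwargs):
        return _PartialWrapper(functools.partial(self.p, **kwargs))

def add_module_to_qconfig_obs_ctr(obs_ctr, module_device):
    # Specialize the observer constructor so that observers it creates
    # land on whatever device the module occupies.
    return obs_ctr.with_args(device=module_device)

def make_observer(device="cpu"):
    return {"kind": "MinMaxObserver", "device": device}

obs_ctr = _PartialWrapper(make_observer)
gpu_ctr = add_module_to_qconfig_obs_ctr(obs_ctr, "cuda:0")
print(gpu_ctr()["device"])  # cuda:0
```

The key property is that specialization happens when qconfigs are propagated to modules, while observer creation itself is deferred until prepare time.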
      
      Test Plan:
      python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic
      
      python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert
      
      python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert
      
      python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence
      
      Reviewed By: vkuzo
      
      Differential Revision: D29684114
      
      fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
      32d0c3e8
    • P
      BatchNorm: fix mixed precision usage with affine=False (#61962) · 0df1679e
      Peter Bell authored
      Summary:
      Fixes https://github.com/pytorch/pytorch/issues/61924
      
      The fused backward kernel was using the weight dtype to detect mixed precision usage, but the weights can be None while `running_mean` and `running_var` are still in mixed precision. So, I updated the check to look at those variables as well.
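      The revised detection logic can be sketched as follows. This is an
      illustrative stand-alone version using dicts as stand-in tensors; the
      function name and signature are hypothetical, not the actual kernel code:

```python
# Hedged sketch: detect mixed precision even when weight is None by falling
# back to running_mean / running_var. Dicts stand in for tensors here.
def is_mixed_precision(input_dtype, weight=None, running_mean=None,
                       running_var=None):
    # use the dtype of the first available parameter/stat tensor
    for t in (weight, running_mean, running_var):
        if t is not None:
            return t["dtype"] != input_dtype
    return False

# affine=False: weight is None, but the running stats are float32 while the
# input is float16 -- a weight-only check would have missed this case.
print(is_mixed_precision("float16", running_mean={"dtype": "float32"}))  # True
print(is_mixed_precision("float16", weight={"dtype": "float16"}))        # False
```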
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/61962
      
      Reviewed By: albanD
      
      Differential Revision: D29825516
      
      Pulled By: ngimel
      
      fbshipit-source-id: d087fbf3bed1762770cac46c0dcec30c03a86fda
      0df1679e
    • J
      Ignore LNK4099 for debug binary libtorch builds (#62060) · e318058f
      Jane Xu authored
      Summary:
      Fixes https://github.com/pytorch/pytorch/issues/61979
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62060
      
      Test Plan:
      CI on this PR shouldn't break; see also https://github.com/pytorch/pytorch/pull/62061
      
      Reviewed By: driazati
      
      Differential Revision: D29877487
      
      Pulled By: janeyx99
      
      fbshipit-source-id: 497f84caab3f9ae609644fd397ad87a6dc8a2a77
      e318058f
    • V
      ns for fx: expose hook to define custom weight extraction functions (#62047) · 04c95a06
      Vasiliy Kuznetsov authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62047
      
      Adds a hook for user to define a weight extraction function for a
      custom type.
      
      Example usage:
      ```
      op_to_type_to_weight_extraction_fn = \
          get_op_to_type_to_weight_extraction_fn()
      op_to_type_to_weight_extraction_fn['call_function'][_wrapped_linear] = \
          torch.quantization.ns.weight_utils.get_linear_fun_weight
      
      results = extract_weights_impl(
          'a', m1, 'b', m2,
          op_to_type_to_weight_extraction_fn=op_to_type_to_weight_extraction_fn)
      ```
      
      Test Plan:
      ```
      python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
      ```
      
      Imported from OSS
      
      Reviewed By: jerryzh168
      
      Differential Revision: D29853625
      
      fbshipit-source-id: 183916ef54ba303bc818e0eba00b52e33c4633ad
      04c95a06
    • V
      ns for fx: fix typing issue in weight extraction (#62041) · 07c6a120
      Vasiliy Kuznetsov authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62041
      
      Before this PR, weights of conv and linear modules were extracted
      as lists, in order to match the signature of LSTM weights.
      
      After this PR, weight extraction preserves the type of the weights,
      so extracted weights of conv and linear have a different type
      from LSTM weights.  The comparison util functions are updated to
      handle the LSTM weight type of `List[tensor]`.
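      A comparison util that handles both plain weights and the LSTM-style
      `List[tensor]` shape might look roughly like this. This is an illustrative
      sketch with floats standing in for tensors; the names are hypothetical,
      not the actual Numeric Suite code:

```python
# Hedged sketch: dispatch on the weight container type instead of wrapping
# every weight in a list. Plain floats stand in for tensors.
def compare_weights(a, b, metric):
    if isinstance(a, list):               # LSTM weights: List[tensor]
        return [metric(x, y) for x, y in zip(a, b)]
    return metric(a, b)                   # conv/linear keep their plain type

match = lambda x, y: float(x == y)        # stand-in comparison metric
print(compare_weights(1.0, 1.0, match))                # 1.0
print(compare_weights([1.0, 2.0], [1.0, 3.0], match))  # [1.0, 0.0]
```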
      
      Test Plan:
      ```
      python test/test_quantization.py TestFXNumericSuiteCoreAPIs
      python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
      ```
      
      Imported from OSS
      
      Reviewed By: jerryzh168
      
      Differential Revision: D29853626
      
      fbshipit-source-id: 93da5b9b0b174679c61528d02b6b902cb064444e
      07c6a120
    • V
      ns for fx: change weight extraction to direct mapping (#62038) · eaba16d6
      Vasiliy Kuznetsov authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/62038
      
      Updates the logic to extract weights from nodes to use a
      direct mapping from type to weight extraction function.
      
      This is needed for a future PR which will allow users to
      specify custom weight extraction functions for user defined
      types.
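      The direct-mapping approach can be sketched like this. The sketch is
      illustrative only; the real mapping lives in the NS weight-extraction
      code and these names are hypothetical:

```python
# Hedged sketch: a direct type -> extraction-function mapping replaces an
# if/elif chain over node types, so a user-defined type can be supported
# by registering one more entry.
def get_conv_weight(mod):
    return mod["weight"]

def get_linear_weight(mod):
    return mod["weight"]

type_to_weight_extraction_fn = {
    "conv2d": get_conv_weight,
    "linear": get_linear_weight,
}

def extract_weight(node_type, mod):
    fn = type_to_weight_extraction_fn.get(node_type)
    if fn is None:
        raise KeyError(f"no weight extraction fn for {node_type!r}")
    return fn(mod)

# a user-defined type only needs a new dict entry:
type_to_weight_extraction_fn["my_custom_linear"] = get_linear_weight
print(extract_weight("my_custom_linear", {"weight": [0.1, 0.2]}))  # [0.1, 0.2]
```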
      
      Test Plan:
      ```
      python test/test_quantization.py TestFXNumericSuiteCoreAPIs
      python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
      ```
      
      Imported from OSS
      
      Reviewed By: jerryzh168
      
      Differential Revision: D29853627
      
      fbshipit-source-id: 3ef90ef4bd7b28f6316c0af215a2bd3ff8a2aeca
      eaba16d6
    • R
      Fix some sign comparisons (#61849) · 8a2c525d
      Richard Barnes authored
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61849
      
      Test Plan: Sandcastle
      
      Reviewed By: malfet
      
      Differential Revision: D29736180
      
      fbshipit-source-id: 1391b11e73725ee985b9aa768566ca77f44d04ae
      8a2c525d