diff --git a/totrans/aud22_00.yaml b/totrans/aud22_00.yaml index 850b4dd3b1d917f6c8e16f0005da184206c25aa8..a6f283bd9c7c44f246a0afa5177bf9cbb4f92ff3 100644 --- a/totrans/aud22_00.yaml +++ b/totrans/aud22_00.yaml @@ -1,7 +1,11 @@ - en: TorchAudio 2.2 Doc + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: TorchAudio 2.2 文档 - en: 来源:[https://pytorch.org/audio/stable/index.html](https://pytorch.org/audio/stable/index.html) + id: totrans-1 prefs: [] type: TYPE_NORMAL + zh: 'Source: [https://pytorch.org/audio/stable/index.html](https://pytorch.org/audio/stable/index.html)' diff --git a/totrans/aud22_01.yaml b/totrans/aud22_01.yaml index 832a4b0a2a97002869297205538267b9537aca8f..988a726e01069dc3c8b0a2ef07f1bc951d1f1aa7 100644 --- a/totrans/aud22_01.yaml +++ b/totrans/aud22_01.yaml @@ -1,4 +1,6 @@ - en: Torchaudio Documentation + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: Torchaudio 文档 diff --git a/totrans/aud22_02.yaml b/totrans/aud22_02.yaml index 18a328be6d08810a9a4867e20fe14a32e130e97a..1178efb414433489d8001d645f4728475a85c98d 100644 --- a/totrans/aud22_02.yaml +++ b/totrans/aud22_02.yaml @@ -1,354 +1,584 @@ - en: Torchaudio Documentation + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: Torchaudio文档 - en: 原文:[https://pytorch.org/audio/stable/index.html](https://pytorch.org/audio/stable/index.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/index.html](https://pytorch.org/audio/stable/index.html) - en: '![_images/logo.png](../Images/b53d726dbc41b0fa6ad51fac70918434.png)' + id: totrans-2 prefs: [] type: TYPE_IMG + zh: '![_images/logo.png](../Images/b53d726dbc41b0fa6ad51fac70918434.png)' - en: Torchaudio is a library for audio and signal processing with PyTorch. It provides I/O, signal and data processing functions, datasets, model implementations and application components. + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: Torchaudio是一个用于音频和信号处理的PyTorch库。它提供了I/O,信号和数据处理函数,数据集,模型实现和应用组件。 - en: Tutorials[](#tutorials "Permalink to this heading") + id: totrans-4 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 教程[](#tutorials "跳转到此标题的永久链接") - en: All + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 全部 - en: '* * *' + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: '* * *' - en: '[#### AM inference with CUDA CTC Beam Seach Decoder' + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: '[#### 使用CUDA CTC Beam Seach解码器进行AM推理' - en: 'Topics: Pipelines,ASR,CTC-Decoder,CUDA-CTC-Decoder' + id: totrans-8 prefs: [] type: TYPE_NORMAL + zh: 主题:Pipelines,ASR,CTC-Decoder,CUDA-CTC-Decoder - en: Learn how to perform ASR beam search decoding with GPU, using `torchaudio.models.decoder.cuda_ctc_decoder`. + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用GPU执行ASR Beam Search解码,使用`torchaudio.models.decoder.cuda_ctc_decoder`。 - en: '![](../Images/e7a25e95763882cf04670faa0448fc45.png)](tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html) [#### On device audio-visual automatic speech recognition' + id: totrans-10 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/e7a25e95763882cf04670faa0448fc45.png)](tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html) + [#### 设备音频-视觉自动语音识别' - en: 'Topics: I/O,Pipelines,RNNT' + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: 主题:I/O,Pipelines,RNNT - en: Learn how to stream audio and video from laptop webcam and perform audio-visual automatic speech recognition using Emformer-RNNT model. 
+ id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: 学习如何从笔记本电脑摄像头流式传输音频和视频,并使用Emformer-RNNT模型执行音频-视觉自动语音识别。 - en: '![](../Images/7029d284337ec7c2222d6b4344ac49d0.png)](tutorials/device_avsr.html) [#### Loading waveform Tensors from files and saving them' + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/7029d284337ec7c2222d6b4344ac49d0.png)](tutorials/device_avsr.html) + [#### 从文件加载波形张量并保存它们' - en: 'Topics: I/O' + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: 主题:I/O - en: Learn how to query/load audio files and save waveform tensors to files, using `torchaudio.info`, `torchaudio.load` and `torchaudio.save` functions. + id: totrans-15 prefs: [] type: TYPE_NORMAL + zh: 学习如何查询/加载音频文件并将波形张量保存到文件中,使用`torchaudio.info`,`torchaudio.load`和`torchaudio.save`函数。 - en: '![](../Images/7c7697acfe69544b47c02ec3585cb2b9.png)](tutorials/audio_io_tutorial.html) [#### CTC Forced Alignment API' + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/7c7697acfe69544b47c02ec3585cb2b9.png)](tutorials/audio_io_tutorial.html) + [#### CTC强制对齐API' - en: 'Topics: CTC,Forced-Alignment' + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: 主题:CTC,强制对齐 - en: Learn how to use TorchAudio's CTC forced alignment API (`torchaudio.functional.forced_align`). + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用TorchAudio的CTC强制对齐API(`torchaudio.functional.forced_align`)。 - en: '![](../Images/503e8ce33cf1bab2fe0bfc31ab06eb0c.png)](tutorials/ctc_forced_alignment_api_tutorial.html) [#### Forced alignment for multilingual data' + id: totrans-19 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/503e8ce33cf1bab2fe0bfc31ab06eb0c.png)](tutorials/ctc_forced_alignment_api_tutorial.html) + [#### 多语言数据的强制对齐' - en: 'Topics: Forced-Alignment' + id: totrans-20 prefs: [] type: TYPE_NORMAL + zh: 主题:强制对齐 - en: Learn how to use align multiligual data using TorchAudio's CTC forced alignment API (`torchaudio.functional.forced_align`) and a multiligual Wav2Vec2 model. + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用TorchAudio的CTC强制对齐API(`torchaudio.functional.forced_align`)和多语言Wav2Vec2模型对齐多语言数据。 - en: '![](../Images/b9c0f51256a6c25345ba2e4cc08a7f33.png)](tutorials/forced_alignment_for_multilingual_data_tutorial.html) [#### Streaming media decoding with StreamReader' + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/b9c0f51256a6c25345ba2e4cc08a7f33.png)](tutorials/forced_alignment_for_multilingual_data_tutorial.html) + [#### 使用StreamReader进行流媒体解码' - en: 'Topics: I/O,StreamReader' + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: 主题:I/O,StreamReader - en: Learn how to load audio/video to Tensors using `torchaudio.io.StreamReader` class. + id: totrans-24 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用`torchaudio.io.StreamReader`类将音频/视频加载到张量中。 - en: '![](../Images/0507eb3112fdbfd24e3e2ba13aa3e3fa.png)](tutorials/streamreader_basic_tutorial.html) [#### Device input, synthetic audio/video, and filtering with StreamReader' + id: totrans-25 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/0507eb3112fdbfd24e3e2ba13aa3e3fa.png)](tutorials/streamreader_basic_tutorial.html) + [#### 使用StreamReader进行设备输入、合成音频/视频和过滤' - en: 'Topics: I/O,StreamReader' + id: totrans-26 prefs: [] type: TYPE_NORMAL + zh: 主题:I/O,StreamReader - en: Learn how to load media from hardware devices, generate synthetic audio/video, and apply filters to them with `torchaudio.io.StreamReader`. 
+ id: totrans-27 prefs: [] type: TYPE_NORMAL + zh: 学习如何从硬件设备加载媒体,生成合成音频/视频,并使用`torchaudio.io.StreamReader`对其应用滤镜。 - en: '![](../Images/f6073184469be7b25be3322e22b86f48.png)](tutorials/streamreader_advanced_tutorial.html) [#### Streaming media encoding with StreamWriter' + id: totrans-28 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/f6073184469be7b25be3322e22b86f48.png)](tutorials/streamreader_advanced_tutorial.html) + [#### 使用StreamWriter进行流媒体编码' - en: 'Topics: I/O,StreamWriter' + id: totrans-29 prefs: [] type: TYPE_NORMAL + zh: 主题:I/O,StreamWriter - en: Learn how to save audio/video with `torchaudio.io.StreamWriter`. + id: totrans-30 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用`torchaudio.io.StreamWriter`保存音频/视频。 - en: '![](../Images/47bb257a04b8d31914c1c25526704a87.png)](tutorials/streamwriter_basic_tutorial.html) [#### Playing media with StreamWriter' + id: totrans-31 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/47bb257a04b8d31914c1c25526704a87.png)](tutorials/streamwriter_basic_tutorial.html) + [#### 使用StreamWriter播放媒体' - en: 'Topics: I/O,StreamWriter' + id: totrans-32 prefs: [] type: TYPE_NORMAL + zh: 主题:I/O,StreamWriter - en: Learn how to play audio/video with `torchaudio.io.StreamWriter`. + id: totrans-33 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用`torchaudio.io.StreamWriter`播放音频/视频。 - en: '![](../Images/c17a8cd1a15881d6b95242ba538fcb7f.png)](tutorials/streamwriter_advanced.html) [#### Hardware accelerated video decoding with NVDEC' + id: totrans-34 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/c17a8cd1a15881d6b95242ba538fcb7f.png)](tutorials/streamwriter_advanced.html) + [#### 使用NVDEC进行硬件加速视频解码' - en: 'Topics: I/O,StreamReader' + id: totrans-35 prefs: [] type: TYPE_NORMAL + zh: 主题:I/O,StreamReader - en: Learn how to use HW video decoder. + id: totrans-36 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用硬件视频解码器。 - en: '![](../Images/89ac3082e35e60e3410d81d1563ed18b.png)](tutorials/nvdec_tutorial.html) [#### Hardware accelerated video encoding with NVENC' + id: totrans-37 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/89ac3082e35e60e3410d81d1563ed18b.png)](tutorials/nvdec_tutorial.html) + [#### 使用NVENC进行硬件加速视频编码' - en: 'Topics: I/O,StreamWriter' + id: totrans-38 prefs: [] type: TYPE_NORMAL + zh: 主题:I/O,StreamWriter - en: Learn how to use HW video encoder. + id: totrans-39 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用硬件视频编码器。 - en: '![](../Images/89ac3082e35e60e3410d81d1563ed18b.png)](tutorials/nvenc_tutorial.html) [#### Apply effects and codecs to waveform' + id: totrans-40 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/89ac3082e35e60e3410d81d1563ed18b.png)](tutorials/nvenc_tutorial.html) + [#### 对波形应用效果和编解码器' - en: 'Topics: Preprocessing' + id: totrans-41 prefs: [] type: TYPE_NORMAL + zh: 主题:预处理 - en: Learn how to apply effects and codecs to waveform using `torchaudio.io.AudioEffector`. + id: totrans-42 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用`torchaudio.io.AudioEffector`对波形应用效果和编解码器。 - en: '![](../Images/8404ff1ec9d47bea48da5b113159e989.png)](tutorials/effector_tutorial.html) [#### Audio resampling with bandlimited sinc interpolation' + id: totrans-43 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/8404ff1ec9d47bea48da5b113159e989.png)](tutorials/effector_tutorial.html) + [#### 使用有限带宽sinc插值进行音频重采样' - en: 'Topics: Preprocessing' + id: totrans-44 prefs: [] type: TYPE_NORMAL + zh: 主题:预处理 - en: Learn how to resample audio tensor with `torchaudio.functional.resample` and `torchaudio.transforms.Resample`. 
+ id: totrans-45 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用`torchaudio.functional.resample`和`torchaudio.transforms.Resample`对音频张量进行重采样。 - en: '![](../Images/72a6c1814ddfb5c23adec67efd1e0b66.png)](tutorials/audio_resampling_tutorial.html) [#### Audio data augmentation' + id: totrans-46 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/72a6c1814ddfb5c23adec67efd1e0b66.png)](tutorials/audio_resampling_tutorial.html) + [#### 音频数据增强' - en: 'Topics: Preprocessing' + id: totrans-47 prefs: [] type: TYPE_NORMAL + zh: 主题:预处理 - en: Learn how to use `torchaudio.functional` and `torchaudio.transforms` modules to perform data augmentation. + id: totrans-48 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用`torchaudio.functional`和`torchaudio.transforms`模块执行数据增强。 - en: '![](../Images/c80cb203950ac50011446822e0d43a9a.png)](tutorials/audio_data_augmentation_tutorial.html) [#### Audio feature extraction' + id: totrans-49 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/c80cb203950ac50011446822e0d43a9a.png)](tutorials/audio_data_augmentation_tutorial.html) + [#### 音频特征提取' - en: 'Topics: Preprocessing' + id: totrans-50 prefs: [] type: TYPE_NORMAL + zh: 主题:预处理 - en: Learn how to use `torchaudio.functional` and `torchaudio.transforms` modules to extract features from waveform. + id: totrans-51 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用`torchaudio.functional`和`torchaudio.transforms`模块从波形中提取特征。 - en: '![](../Images/66b0577f2a89d08f0ffcdd5bf744fc56.png)](tutorials/audio_feature_extractions_tutorial.html) [#### Audio feature augmentation' + id: totrans-52 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/66b0577f2a89d08f0ffcdd5bf744fc56.png)](tutorials/audio_feature_extractions_tutorial.html) + [#### 音频特征增强' - en: 'Topics: Preprocessing' + id: totrans-53 prefs: [] type: TYPE_NORMAL + zh: 主题:预处理 - en: Learn how to use `torchaudio.functional` and `torchaudio.transforms` modules to perform feature augmentation. 
+ id: totrans-54 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用`torchaudio.functional`和`torchaudio.transforms`模块执行特征增强。 - en: '![](../Images/34118d94eaa384b854c9418ad5da1f95.png)](tutorials/audio_feature_augmentation_tutorial.html) [#### Generating waveforms with oscillator' + id: totrans-55 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/34118d94eaa384b854c9418ad5da1f95.png)](tutorials/audio_feature_augmentation_tutorial.html) + [#### 使用振荡器生成波形' - en: 'Topics: DSP' + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: 主题:DSP - en: '![](../Images/7533eae9ba1a1de4063506e20c82e055.png)](tutorials/oscillator_tutorial.html) [#### Additive Synthesis' + id: totrans-57 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/7533eae9ba1a1de4063506e20c82e055.png)](tutorials/oscillator_tutorial.html) + [#### 加法合成' - en: 'Topics: DSP' + id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: 主题:DSP - en: '![](../Images/f9720f8c48393179934600ac2f65389d.png)](tutorials/additive_synthesis_tutorial.html) [#### Designing digital filters' + id: totrans-59 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/f9720f8c48393179934600ac2f65389d.png)](tutorials/additive_synthesis_tutorial.html) + [#### 设计数字滤波器' - en: 'Topics: DSP' + id: totrans-60 prefs: [] type: TYPE_NORMAL + zh: 主题:DSP - en: '![](../Images/028f20c05a4c635e9b4dc624deff86d9.png)](tutorials/filter_design_tutorial.html) [#### Subtractive Synthesis' + id: totrans-61 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/028f20c05a4c635e9b4dc624deff86d9.png)](tutorials/filter_design_tutorial.html) + [#### 减法合成' - en: 'Topics: DSP' + id: totrans-62 prefs: [] type: TYPE_NORMAL + zh: 主题:DSP - en: '![](../Images/ceefcd60c63f2946e39ad0990b7155b2.png)](tutorials/subtractive_synthesis_tutorial.html) [#### Audio dataset' + id: totrans-63 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/ceefcd60c63f2946e39ad0990b7155b2.png)](tutorials/subtractive_synthesis_tutorial.html) + [#### 音频数据集' - en: 'Topics: Dataset' + id: totrans-64 prefs: [] type: TYPE_NORMAL + zh: 主题:数据集 - en: Learn how to use `torchaudio.datasets` module. + id: totrans-65 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用`torchaudio.datasets`模块。 - en: '![](../Images/45bf746a4ad837db90636d007045fba9.png)](tutorials/audio_datasets_tutorial.html) [#### AM inference with Wav2Vec2' + id: totrans-66 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/45bf746a4ad837db90636d007045fba9.png)](tutorials/audio_datasets_tutorial.html) + [#### 使用Wav2Vec2进行AM推断' - en: 'Topics: ASR,wav2vec2' + id: totrans-67 prefs: [] type: TYPE_NORMAL + zh: 主题:ASR,wav2vec2 - en: Learn how to perform acoustic model inference with Wav2Vec2 (`torchaudio.pipelines.Wav2Vec2ASRBundle`). + id: totrans-68 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用Wav2Vec2(`torchaudio.pipelines.Wav2Vec2ASRBundle`)执行声学模型推断。 - en: '![](../Images/759e5d687d4fd430de33b8a738b7d35e.png)](tutorials/speech_recognition_pipeline_tutorial.html) [#### LM inference with CTC Beam Seach Decoder' + id: totrans-69 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/759e5d687d4fd430de33b8a738b7d35e.png)](tutorials/speech_recognition_pipeline_tutorial.html) + [#### 使用CTC波束搜索解码器进行LM推断' - en: 'Topics: Pipelines,ASR,wav2vec2,CTC-Decoder' + id: totrans-70 prefs: [] type: TYPE_NORMAL + zh: 主题:流水线,ASR,wav2vec2,CTC-Decoder - en: Learn how to perform ASR beam search decoding with lexicon and language model, using `torchaudio.models.decoder.ctc_decoder`. 
+ id: totrans-71 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用词典和语言模型进行ASR波束搜索解码,使用`torchaudio.models.decoder.ctc_decoder`。 - en: '![](../Images/e7a25e95763882cf04670faa0448fc45.png)](tutorials/asr_inference_with_ctc_decoder_tutorial.html) [#### Online ASR with Emformer RNN-T' + id: totrans-72 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/e7a25e95763882cf04670faa0448fc45.png)](tutorials/asr_inference_with_ctc_decoder_tutorial.html) + [#### 使用Emformer RNN-T进行在线ASR' - en: 'Topics: Pipelines,ASR,RNNT,StreamReader' + id: totrans-73 prefs: [] type: TYPE_NORMAL + zh: 主题:流水线,ASR,RNNT,StreamReader - en: Learn how to perform online ASR with Emformer RNN-T (`torchaudio.pipelines.RNNTBundle`) and `torchaudio.io.StreamReader`. + id: totrans-74 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用Emformer RNN-T(`torchaudio.pipelines.RNNTBundle`)和`torchaudio.io.StreamReader`进行在线ASR。 - en: '![](../Images/8eb911527cd412c646efcf3a47694751.png)](tutorials/online_asr_tutorial.html) [#### Real-time microphone ASR with Emformer RNN-T' + id: totrans-75 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/8eb911527cd412c646efcf3a47694751.png)](tutorials/online_asr_tutorial.html) + [#### 使用Emformer RNN-T进行实时麦克风ASR' - en: 'Topics: Pipelines,ASR,RNNT,StreamReader' + id: totrans-76 prefs: [] type: TYPE_NORMAL + zh: 主题:流水线,ASR,RNNT,StreamReader - en: Learn how to transcribe speech fomr microphone with Emformer RNN-T (`torchaudio.pipelines.RNNTBundle`) and `torchaudio.io.StreamReader`. + id: totrans-77 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用Emformer RNN-T(`torchaudio.pipelines.RNNTBundle`)和`torchaudio.io.StreamReader`从麦克风转录语音。 - en: '![](../Images/9f1cbd6779e3f9b5466a62958ff1aa4d.png)](tutorials/device_asr.html) [#### Forced Alignment with Wav2Vec2' + id: totrans-78 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/9f1cbd6779e3f9b5466a62958ff1aa4d.png)](tutorials/device_asr.html) + [#### 使用Wav2Vec2进行强制对齐' - en: 'Topics: Pipelines,Forced-Alignment,wav2vec2' + id: totrans-79 prefs: [] type: TYPE_NORMAL + zh: 主题:流水线,强制对齐,wav2vec2 - en: Learn how to align text to speech with Wav2Vec 2 (`torchaudio.pipelines.Wav2Vec2ASRBundle`). + id: totrans-80 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用Wav2Vec 2(`torchaudio.pipelines.Wav2Vec2ASRBundle`)将文本与语音对齐。 - en: '![](../Images/8d2c58eb58c5541cb88d3d7020a48231.png)](tutorials/forced_alignment_tutorial.html) [#### Text-to-Speech with Tacotron2' + id: totrans-81 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/8d2c58eb58c5541cb88d3d7020a48231.png)](tutorials/forced_alignment_tutorial.html) + [#### 使用Tacotron2进行文本到语音转换' - en: 'Topics: Pipelines,TTS-(Text-to-Speech)' + id: totrans-82 prefs: [] type: TYPE_NORMAL + zh: 主题:流水线,TTS-(文本到语音) - en: Learn how to generate speech from text with Tacotron2 (`torchaudio.pipelines.Tacotron2TTSBundle`). + id: totrans-83 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用Tacotron2(`torchaudio.pipelines.Tacotron2TTSBundle`)从文本生成语音。 - en: '![](../Images/9eb051bb2da7cc1078a42ad11d2a0e37.png)](tutorials/tacotron2_pipeline_tutorial.html) [#### Speech Enhancement with MVDR Beamforming' + id: totrans-84 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/9eb051bb2da7cc1078a42ad11d2a0e37.png)](tutorials/tacotron2_pipeline_tutorial.html) + [#### MVDR波束形成的语音增强' - en: 'Topics: Pipelines,Speech-Enhancement' + id: totrans-85 prefs: [] type: TYPE_NORMAL + zh: 主题:流水线,语音增强 - en: Learn how to improve speech quality with MVDR Beamforming. 
+ id: totrans-86 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用MVDR波束形成改善语音质量。 - en: '![](../Images/e3534984cab738a2fc134dd82d26b689.png)](tutorials/mvdr_tutorial.html) [#### Music Source Separation with Hybrid Demucs' + id: totrans-87 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/e3534984cab738a2fc134dd82d26b689.png)](tutorials/mvdr_tutorial.html) + [#### 使用混合Demucs进行音乐源分离' - en: 'Topics: Pipelines,Source-Separation' + id: totrans-88 prefs: [] type: TYPE_NORMAL + zh: 主题:流水线,源分离 - en: Learn how to perform music source separation with pre-trained Hybrid Demucs (`torchaudio.pipelines.SourceSeparationBundle`). + id: totrans-89 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用预训练的混合Demucs(`torchaudio.pipelines.SourceSeparationBundle`)执行音乐源分离。 - en: '![](../Images/923517e38a4a4d8b89489fbd1603f0f4.png)](tutorials/hybrid_demucs_tutorial.html) [#### Torchaudio-Squim: Non-intrusive Speech Assessment in TorchAudio' + id: totrans-90 prefs: [] type: TYPE_NORMAL + zh: '![](../Images/923517e38a4a4d8b89489fbd1603f0f4.png)](tutorials/hybrid_demucs_tutorial.html) + [#### Torchaudio-Squim:TorchAudio中的非侵入式语音评估' - en: 'Topics: Pipelines,Speech Assessment,Speech Enhancement' + id: totrans-91 prefs: [] type: TYPE_NORMAL + zh: 主题:流水线,语音评估,语音增强 - en: Learn how to estimate subjective and objective metrics with pre-trained TorchAudio-SQUIM models (`torchaudio.pipelines.SQUIMObjective`). + id: totrans-92 prefs: [] type: TYPE_NORMAL + zh: 学习如何使用预训练的TorchAudio-SQUIM模型(`torchaudio.pipelines.SQUIMObjective`)估计主观和客观指标。 - en: '![](../Images/62ea891fdabe714735b28751d17e57b5.png)](tutorials/squim_tutorial.html)' + id: totrans-93 prefs: [] type: TYPE_IMG + zh: ![](../Images/62ea891fdabe714735b28751d17e57b5.png)](tutorials/squim_tutorial.html) - en: Citing torchaudio[](#citing-torchaudio "Permalink to this heading") + id: totrans-94 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 引用torchaudio[](#citing-torchaudio "此标题的永久链接") - en: 'If you find torchaudio useful, please cite the following paper:' + id: totrans-95 prefs: [] type: TYPE_NORMAL + zh: 如果您发现torchaudio有用,请引用以下论文: - en: 'Yang, Y.-Y., Hira, M., Ni, Z., Chourdia, A., Astafurov, A., Chen, C., Yeh, C.-F., Puhrsch, C., Pollack, D., Genzel, D., Greenberg, D., Yang, E. Z., Lian, J., Mahadeokar, J., Hwang, J., Chen, J., Goldsborough, P., Roy, P., Narenthiran, S., Watanabe, S., Chintala, S., Quenneville-Bélair, V, & Shi, Y. (2021). TorchAudio: Building Blocks for Audio and Speech Processing. arXiv preprint arXiv:2110.15018.' + id: totrans-96 prefs: - PREF_UL type: TYPE_NORMAL + zh: 杨,Y.-Y.,希拉,M.,倪,Z.,乔迪亚,A.,阿斯塔夫罗夫,A.,陈,C.,叶,C.-F.,普尔施,C.,波拉克,D.,根策尔,D.,格林伯格,D.,杨,E. 
+ Z.,连,J.,马哈迪奥卡尔,J.,黄,J.,陈,J.,戈兹伯勒,P.,罗伊,P.,纳伦西兰,S.,渡边,S.,钦塔拉,S.,肯纳维尔-贝莱尔,V.,& 施,Y.(2021)。TorchAudio:音频和语音处理的构建模块。arXiv预印本arXiv:2110.15018。 - en: 'In BibTeX format:' + id: totrans-97 prefs: [] type: TYPE_NORMAL + zh: 以BibTeX格式: - en: '[PRE0]' + id: totrans-98 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: '[PRE1]' + id: totrans-99 prefs: [] type: TYPE_PRE + zh: '[PRE1]' diff --git a/totrans/aud22_03.yaml b/totrans/aud22_03.yaml index a8356afcf21a08e3d97f62d35f8ab73ff8dd1aae..fbd3c0ecc04c89f11e79b986cdadedc1eb0005de 100644 --- a/totrans/aud22_03.yaml +++ b/totrans/aud22_03.yaml @@ -1,126 +1,197 @@ - en: Supported Features + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 支持的功能 - en: 原文:[https://pytorch.org/audio/stable/supported_features.html](https://pytorch.org/audio/stable/supported_features.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/supported_features.html](https://pytorch.org/audio/stable/supported_features.html) - en: 'Each TorchAudio API supports a subset of PyTorch features, such as devices and data types. Supported features are indicated in API references like the following:' + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 每个 TorchAudio API 支持一部分 PyTorch 功能,比如设备和数据类型。支持的功能在 API 参考中标明,如下所示: - en: '[![This feature supports the following devices: CPU, CUDA](../Images/436dcea77111f2b243d161ad46fb68d6.png)](supported_features.html#devices) [![This API supports the following properties: Autograd, TorchScript](../Images/7f8d40aa9fa8230970316fdd270003ed.png)](supported_features.html#properties)' + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: '[![此功能支持以下设备:CPU,CUDA](../Images/436dcea77111f2b243d161ad46fb68d6.png)](supported_features.html#devices) + [![此 API 支持以下属性:Autograd,TorchScript](../Images/7f8d40aa9fa8230970316fdd270003ed.png)](supported_features.html#properties)' - en: These icons mean that they are verified through automated testing. + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: 这些图标表示它们已通过自动化测试验证。 - en: Note + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Missing feature icons mean that they are not tested, and this can mean different things, depending on the API. + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: 缺失的功能图标表示它们未经测试,这可能意味着不同的事情,具体取决于 API。 - en: The API is compatible with the feature but not tested. + id: totrans-7 prefs: - PREF_OL type: TYPE_NORMAL + zh: API 与该功能兼容,但未经测试。 - en: The API is not compatible with the feature. + id: totrans-8 prefs: - PREF_OL type: TYPE_NORMAL + zh: API 与该功能不兼容。 - en: In case of 2, the API might explicitly raise an error, but that is not guaranteed. For example, APIs without an Autograd badge might throw an error during backpropagation, or silently return a wrong gradient. + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: 在第二种情况下,API 可能会明确引发错误,但这并不保证。例如,没有 Autograd 标志的 API 可能在反向传播过程中抛出错误,或者悄悄返回错误的梯度。 - en: If you use an API that hasn’t been labeled as supporting a feature, you might want to first verify that the feature works fine. 
+ id: totrans-10 prefs: [] type: TYPE_NORMAL + zh: 如果您使用的 API 没有被标记为支持某个功能,您可能需要先验证该功能是否正常工作。 - en: Devices[](#devices "Permalink to this heading") + id: totrans-11 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 设备[](#设备 "此标题的永久链接") - en: CPU[](#cpu "Permalink to this heading") + id: totrans-12 prefs: - PREF_H3 type: TYPE_NORMAL + zh: CPU[](#cpu "此标题的永久链接") - en: '[![This feature supports the following devices: CPU](../Images/d8168b4ee98570889e4c86c2a6aeca75.png)](supported_features.html#devices)' + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: '[![此功能支持以下设备:CPU](../Images/d8168b4ee98570889e4c86c2a6aeca75.png)](supported_features.html#devices)' - en: TorchAudio APIs that support CPU can perform their computation on CPU tensors. + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: 支持 CPU 的 TorchAudio API 可以在 CPU 张量上执行计算。 - en: CUDA[](#cuda "Permalink to this heading") + id: totrans-15 prefs: - PREF_H3 type: TYPE_NORMAL + zh: CUDA[](#cuda "此标题的永久链接") - en: '[![This feature supports the following devices: CUDA](../Images/715a101451863e082b0b61bdeaec1135.png)](supported_features.html#devices)' + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: '[![此功能支持以下设备:CUDA](../Images/715a101451863e082b0b61bdeaec1135.png)](supported_features.html#devices)' - en: TorchAudio APIs that support CUDA can perform their computation on CUDA devices. + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: 支持 CUDA 的 TorchAudio API 可以在 CUDA 设备上执行计算。 - en: In case of functions, move the tensor arguments to CUDA device before passing them to a function. + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: 在函数的情况下,在将张量参数传递给函数之前,将它们移动到 CUDA 设备上。 - en: 'For example:' + id: totrans-19 prefs: [] type: TYPE_NORMAL + zh: 例如: - en: '[PRE0]' + id: totrans-20 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: Classes with CUDA support are implemented with `torch.nn.Module()`. It is also necessary to move the instance to CUDA device, before passing CUDA tensors. + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 具有 CUDA 支持的类使用 `torch.nn.Module()` 实现。在传递 CUDA 张量之前,将实例移动到 CUDA 设备是必要的。 - en: 'For example:' + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: 例如: - en: '[PRE1]' + id: totrans-23 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: Properties[](#properties "Permalink to this heading") + id: totrans-24 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 属性[](#属性 "此标题的永久链接") - en: Autograd[](#autograd "Permalink to this heading") + id: totrans-25 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 自动求导[](#autograd "此标题的永久链接") - en: '[![This API supports the following properties: Autograd](../Images/6d4055c124921ae7bf28212985a77b02.png)](supported_features.html#properties)' + id: totrans-26 prefs: [] type: TYPE_NORMAL + zh: '[![此 API 支持以下属性:Autograd](../Images/6d4055c124921ae7bf28212985a77b02.png)](supported_features.html#properties)' - en: TorchAudio APIs with autograd support can correctly backpropagate gradients. + id: totrans-27 prefs: [] type: TYPE_NORMAL + zh: 支持自动求导的 TorchAudio API 可以正确地反向传播梯度。 - en: For the basics of autograd, please refer to this [tutorial](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html). + id: totrans-28 prefs: [] type: TYPE_NORMAL + zh: 有关自动求导的基础知识,请参考这个[教程](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html)。 - en: Note + id: totrans-29 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: APIs without this mark may or may not raise an error during backpropagation. The absence of an error raised during backpropagation does not necessarily mean the gradient is correct. 
+ id: totrans-30 prefs: [] type: TYPE_NORMAL + zh: 没有此标记的 API 在反向传播过程中可能会引发错误,也可能不会。在反向传播过程中没有引发错误并不一定意味着梯度是正确的。 - en: TorchScript[](#torchscript "Permalink to this heading") + id: totrans-31 prefs: - PREF_H3 type: TYPE_NORMAL + zh: TorchScript[](#torchscript "此标题的永久链接") - en: '[![This API supports the following properties: TorchScript](../Images/f8384797bafe9e7b6155ead9932a3063.png)](supported_features.html#properties)' + id: totrans-32 prefs: [] type: TYPE_NORMAL + zh: '[![此 API 支持以下属性:TorchScript](../Images/f8384797bafe9e7b6155ead9932a3063.png)](supported_features.html#properties)' - en: TorchAudio APIs with TorchScript support can be serialized and executed in non-Python environments. + id: totrans-33 prefs: [] type: TYPE_NORMAL + zh: 具有 TorchScript 支持的 TorchAudio API 可以在非 Python 环境中序列化和执行。 - en: For details on TorchScript, please refer to the [documentation](https://pytorch.org/docs/stable/jit.html). + id: totrans-34 prefs: [] type: TYPE_NORMAL + zh: 有关 TorchScript 的详细信息,请参考[文档](https://pytorch.org/docs/stable/jit.html)。 diff --git a/totrans/aud22_04.yaml b/totrans/aud22_04.yaml index 0cc62754737dbde083c007d7e455e64259f7a0c9..5460c559b88f80167e2489fc470d5d0631c7fce1 100644 --- a/totrans/aud22_04.yaml +++ b/totrans/aud22_04.yaml @@ -1,29 +1,41 @@ - en: Feature Classifications + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 功能分类 - en: 原文:[https://pytorch.org/audio/stable/feature_classifications.html](https://pytorch.org/audio/stable/feature_classifications.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/feature_classifications.html](https://pytorch.org/audio/stable/feature_classifications.html) - en: 'Features described in this documentation are classified by release status:' + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 本文档中描述的功能按发布状态分类: - en: '*Stable:* These features will be maintained long-term and there should generally be no major performance limitations or gaps in documentation. We also expect to maintain backwards compatibility (although breaking changes can happen and notice will be given one release ahead of time).' + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 稳定:这些功能将长期保持,并且通常不会有主要性能限制或文档中的差距。我们也希望保持向后兼容性(尽管可能会发生破坏性更改,并且会提前一个版本发布通知)。 - en: '*Beta:* Features are tagged as Beta because the API may change based on user feedback, because the performance needs to improve, or because coverage across operators is not yet complete. For Beta features, we are committing to seeing the feature through to the Stable classification. We are not, however, committing to backwards compatibility.' + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: Beta:功能被标记为Beta,因为API可能会根据用户反馈而更改,因为性能需要改进,或者因为跨运算符的覆盖范围尚未完全。对于Beta功能,我们承诺将该功能推进到稳定分类。然而,我们不承诺向后兼容。 - en: '*Prototype:* These features are typically not available as part of binary distributions like PyPI or Conda, except sometimes behind run-time flags, and are at an early stage for feedback and testing.' 
+ id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 原型:这些功能通常不作为PyPI或Conda等二进制发行版的一部分提供,除非有时在运行时标志后面,并且处于反馈和测试的早期阶段。 diff --git a/totrans/aud22_05.yaml b/totrans/aud22_05.yaml index f77363689b0cfe53faa2de834b6a81f8049fe8b9..8a0dd0e31e5eebdfeb00c5a96ef381d676edae14 100644 --- a/totrans/aud22_05.yaml +++ b/totrans/aud22_05.yaml @@ -1,45 +1,69 @@ - en: TorchAudio Logo + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: TorchAudio 标志 - en: 原文:[https://pytorch.org/audio/stable/logo.html](https://pytorch.org/audio/stable/logo.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/logo.html](https://pytorch.org/audio/stable/logo.html) - en: If you make your project using TorchAudio and you want to mention TorchAudio, you can use TorchAudio logo. There are couple of variations. You can download them from [here](https://download.pytorch.org/torchaudio/logo/v1.zip). + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 如果您使用 TorchAudio 制作项目并想提及 TorchAudio,您可以使用 TorchAudio 标志。有几种变体。您可以从[这里](https://download.pytorch.org/torchaudio/logo/v1.zip)下载它们。 - en: Please follow [the guideline](https://download.pytorch.org/torchaudio/logo/v1/guidelines.pdf) for the proper usage. + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 请遵循[指南](https://download.pytorch.org/torchaudio/logo/v1/guidelines.pdf)以正确使用。 - en: Warning + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: 警告 - en: Please do not alter the logo. The guideline lists examples of inproper usages as well, so please check them out before using the logos. + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 请不要修改标志。指南列出了不当使用的示例,请在使用标志之前查看它们。 - en: Icon[](#icon "Permalink to this heading") + id: totrans-6 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 图标[](#icon "跳转到此标题的永久链接") - en: '[![https://download.pytorch.org/torchaudio/logo/v1/icon.png](../Images/0f8a20b254c0a0a24667c35a283fdeee.png)](https://download.pytorch.org/torchaudio/logo/v1/icon.png)' + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: '[![https://download.pytorch.org/torchaudio/logo/v1/icon.png](../Images/0f8a20b254c0a0a24667c35a283fdeee.png)](https://download.pytorch.org/torchaudio/logo/v1/icon.png)' - en: Horizontal[](#horizontal "Permalink to this heading") + id: totrans-8 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 水平[](#horizontal "跳转到此标题的永久链接") - en: '[![https://download.pytorch.org/torchaudio/logo/v1/logo_horizontal_fullcolor.png](../Images/559e38269daa61bf14e53b879fe07651.png)](https://download.pytorch.org/torchaudio/logo/v1/logo_horizontal_fullcolor.png)[![https://download.pytorch.org/torchaudio/logo/v1/logo_horizontal_black.png](../Images/2f32e7bce3f2fcddac736d84499c999e.png)](https://download.pytorch.org/torchaudio/logo/v1/logo_horizontal_black.png)![](../Images/6c04bd3c064bb5a44ec0a134cbed6942.png)' + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: '[![https://download.pytorch.org/torchaudio/logo/v1/logo_horizontal_fullcolor.png](../Images/559e38269daa61bf14e53b879fe07651.png)](https://download.pytorch.org/torchaudio/logo/v1/logo_horizontal_fullcolor.png)[![https://download.pytorch.org/torchaudio/logo/v1/logo_horizontal_black.png](../Images/2f32e7bce3f2fcddac736d84499c999e.png)](https://download.pytorch.org/torchaudio/logo/v1/logo_horizontal_black.png)![](../Images/6c04bd3c064bb5a44ec0a134cbed6942.png)' - en: Vertical[](#vertical "Permalink to this heading") + id: totrans-10 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 垂直[](#vertical "跳转到此标题的永久链接") - en: 
'[![https://download.pytorch.org/torchaudio/logo/v1/logo_vertical_fullcolor.png](../Images/8b836fc7376d32939bdc379f807b28ee.png)](https://download.pytorch.org/torchaudio/logo/v1/logo_vertical_fullcolor.png)[![https://download.pytorch.org/torchaudio/logo/v1/logo_vertical_black.png](../Images/c7ceea352093f00b2e5ee364e4b30d18.png)](https://download.pytorch.org/torchaudio/logo/v1/logo_vertical_black.png)![](../Images/f6a349d770bcda78ee8a8f74f4962ff8.png)' + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: '[![https://download.pytorch.org/torchaudio/logo/v1/logo_vertical_fullcolor.png](../Images/8b836fc7376d32939bdc379f807b28ee.png)](https://download.pytorch.org/torchaudio/logo/v1/logo_vertical_fullcolor.png)[![https://download.pytorch.org/torchaudio/logo/v1/logo_vertical_black.png](../Images/c7ceea352093f00b2e5ee364e4b30d18.png)](https://download.pytorch.org/torchaudio/logo/v1/logo_vertical_black.png)![](../Images/f6a349d770bcda78ee8a8f74f4962ff8.png)' diff --git a/totrans/aud22_06.yaml b/totrans/aud22_06.yaml index b5c97d527651d8e3570805bdc4a6b8789f7562ee..5767ec365eff9f9dcf88fecaf98c47fa024b8c53 100644 --- a/totrans/aud22_06.yaml +++ b/totrans/aud22_06.yaml @@ -1,578 +1,906 @@ - en: References + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 参考文献 - en: 原文:[https://pytorch.org/audio/stable/references.html](https://pytorch.org/audio/stable/references.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: '[https://pytorch.org/audio/stable/references.html](https://pytorch.org/audio/stable/references.html)' - en: '[Yes]' + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: '[Yes]' - en: 'Yesno. URL: [http://www.openslr.org/1/](http://www.openslr.org/1/).' + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: Yesno。网址:[http://www.openslr.org/1/](http://www.openslr.org/1/)。 - en: '[AB79]' + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: '[AB79]' - en: Jont B Allen and David A Berkley. Image method for efficiently simulating small-room acoustics. *The Journal of the Acoustical Society of America*, 65(4):943–950, 1979. + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: Jont B Allen和David A Berkley。用于高效模拟小房间声学的图像方法。*美国声学学会杂志*,65(4):943-950,1979年。 - en: '[ABD+20]' + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: '[ABD+20]' - en: 'Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, and Gregor Weber. Common voice: a massively-multilingual speech corpus. 2020\. [arXiv:1912.06670](https://arxiv.org/abs/1912.06670).' + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: Rosana Ardila,Megan Branson,Kelly Davis,Michael Henretty,Michael Kohler,Josh + Meyer,Reuben Morais,Lindsay Saunders,Francis M. Tyers和Gregor Weber。Common voice:一个大规模多语言语音语料库。2020年。[arXiv:1912.06670](https://arxiv.org/abs/1912.06670)。 - en: '[BWT+21]' + id: totrans-8 prefs: [] type: TYPE_NORMAL + zh: '[BWT+21]' - en: 'Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, and others. Xls-r: self-supervised cross-lingual speech representation learning at scale. *arXiv preprint arXiv:2111.09296*, 2021.' 
+ id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: Arun Babu,王长翰,Andros Tjandra,Kushal Lakhotia,徐前通,Naman Goyal,Kritika Singh,Patrick + von Platen,Yatharth Saraf,Juan Pino等人。Xls-r:规模化的自监督跨语言语音表示学习。*arXiv预印本arXiv:2111.09296*,2021年。 - en: '[BZMA20]' + id: totrans-10 prefs: [] type: TYPE_NORMAL + zh: '[BZMA20]' - en: 'Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli. Wav2vec 2.0: a framework for self-supervised learning of speech representations. 2020\. [arXiv:2006.11477](https://arxiv.org/abs/2006.11477).' + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: Alexei Baevski,Henry Zhou,Abdelrahman Mohamed和Michael Auli。Wav2vec 2.0:一种用于自监督学习语音表示的框架。2020年。[arXiv:2006.11477](https://arxiv.org/abs/2006.11477)。 - en: '[BBL+08]' + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: '[BBL+08]' - en: 'Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower Provost, Samuel Kim, Jeannette Chang, Sungbok Lee, and Shrikanth Narayanan. Iemocap: interactive emotional dyadic motion capture database. *Language Resources and Evaluation*, 42:335–359, 12 2008\. [doi:10.1007/s10579-008-9076-6](https://doi.org/10.1007/s10579-008-9076-6).' + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: Carlos Busso,Murtaza Bulut,李志俊,Abe Kazemzadeh,Emily Mower Provost,Samuel Kim,Jeannette + Chang,李成博,Shrikanth Narayanan。Iemocap:交互式情感二元动作捕捉数据库。*语言资源与评估*,42:335-359,2008年12月。[doi:10.1007/s10579-008-9076-6](https://doi.org/10.1007/s10579-008-9076-6)。 - en: '[Cap69]' + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: '[Cap69]' - en: Jack Capon. High-resolution frequency-wavenumber spectrum analysis. *Proceedings of the IEEE*, 57(8):1408–1418, 1969. + id: totrans-15 prefs: [] type: TYPE_NORMAL + zh: Jack Capon。高分辨率频率-波数谱分析。*IEEE会议论文集*,57(8):1408-1418,1969年。 - en: '[CDiGangiB+21]' + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: '[CDiGangiB+21]' - en: 'Roldano Cattoni, Mattia Antonino Di Gangi, Luisa Bentivogli, Matteo Negri, and Marco Turchi. Must-c: a multilingual corpus for end-to-end speech translation. *Computer Speech & Language*, 66:101155, 2021\. URL: [https://www.sciencedirect.com/science/article/pii/S0885230820300887](https://www.sciencedirect.com/science/article/pii/S0885230820300887), [doi:https://doi.org/10.1016/j.csl.2020.101155](https://doi.org/https://doi.org/10.1016/j.csl.2020.101155).' + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: Roldano Cattoni,Mattia Antonino Di Gangi,Luisa Bentivogli,Matteo Negri和Marco + Turchi。Must-c:用于端到端语音翻译的多语言语料库。*计算机语音与语言*,66:101155,2021年。网址:[https://www.sciencedirect.com/science/article/pii/S0885230820300887](https://www.sciencedirect.com/science/article/pii/S0885230820300887),[doi:https://doi.org/10.1016/j.csl.2020.101155](https://doi.org/https://doi.org/10.1016/j.csl.2020.101155)。 - en: '[CCW+21]' + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: '[CCW+21]' - en: 'Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, and Zhiyong Yan. Gigaspeech: an evolving, multi-domain asr corpus with 10,000 hours of transcribed audio. In *Proc. Interspeech 2021*. 2021.' 
+ id: totrans-19 prefs: [] type: TYPE_NORMAL + zh: Guoguo Chen,柴树洲,王冠波,杜佳宇,张伟强,翁超,苏丹,Daniel Povey,Jan Trmal,张俊博,金明杰,Sanjeev Khudanpur,Shinji + Watanabe,赵帅江,邹伟,李相刚,姚旭晨,王永庆,王玉军,尤赵,严志勇。Gigaspeech:一个不断发展的、多领域的带有10000小时转录音频的自动语音识别语料库。在*Interspeech + 2021*会议上。2021年。 - en: '[CWC+22]' + id: totrans-20 prefs: [] type: TYPE_NORMAL + zh: '[CWC+22]' - en: 'Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, and others. Wavlm: large-scale self-supervised pre-training for full stack speech processing. *IEEE Journal of Selected Topics in Signal Processing*, 16(6):1505–1518, 2022.' + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 三元陈,程毅王,正阳陈,宇吴,树杰刘,卓陈,金宇李,神行祥,吉田直之,吉冈拓也,肖雄,等人。Wavlm:用于全栈语音处理的大规模自监督预训练。*IEEE信号处理领域选题杂志*,16(6):1505-1518,2022年。 - en: '[CPS16]' + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: '[CPS16]' - en: 'Ronan Collobert, Christian Puhrsch, and Gabriel Synnaeve. Wav2letter: an end-to-end convnet-based speech recognition system. 2016\. [arXiv:1609.03193](https://arxiv.org/abs/1609.03193).' + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: Ronan Collobert,Christian Puhrsch和Gabriel Synnaeve。Wav2letter:一种端到端的基于卷积神经网络的语音识别系统。2016年。[arXiv:1609.03193](https://arxiv.org/abs/1609.03193)。 - en: '[CBC+20]' + id: totrans-24 prefs: [] type: TYPE_NORMAL + zh: '[CBC+20]' - en: Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, and Michael Auli. Unsupervised cross-lingual representation learning for speech recognition. 2020\. [arXiv:2006.13979](https://arxiv.org/abs/2006.13979). + id: totrans-25 prefs: [] type: TYPE_NORMAL + zh: Alexis Conneau,Alexei Baevski,Ronan Collobert,Abdelrahman Mohamed和Michael Auli。用于语音识别的无监督跨语言表示学习。2020年。[arXiv:2006.13979](https://arxiv.org/abs/2006.13979)。 - en: '[CY21]' + id: totrans-26 prefs: [] type: TYPE_NORMAL + zh: '[CY21]' - en: Erica Cooper and Junichi Yamagishi. How do voices from past speech synthesis challenges compare today? *arXiv preprint arXiv:2105.02373*, 2021. + id: totrans-27 prefs: [] type: TYPE_NORMAL + zh: Erica Cooper和山岸纯一。过去语音合成挑战中的声音如何与今天相比?*arXiv预印本arXiv:2105.02373*,2021年。 - en: '[CPC+20]' + id: totrans-28 prefs: [] type: TYPE_NORMAL + zh: '[CPC+20]' - en: 'Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, and Emmanuel Vincent. Librimix: an open-source dataset for generalizable speech separation. 2020\. [arXiv:2005.11262](https://arxiv.org/abs/2005.11262).' + id: totrans-29 prefs: [] type: TYPE_NORMAL + zh: Joris Cosentino,Manuel Pariente,Samuele Cornell,Antoine Deleforge和Emmanuel Vincent。Librimix:一个用于通用语音分离的开源数据集。2020年。[arXiv:2005.11262](https://arxiv.org/abs/2005.11262)。 - en: '[CSB+18]' + id: totrans-30 prefs: [] type: TYPE_NORMAL + zh: '[CSB+18]' - en: 'Alice Coucke, Alaa Saade, Adrien Ball, Théodore Bluche, Alexandre Caulier, David Leroy, Clément Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril, and others. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. *arXiv preprint arXiv:1805.10190*, 2018.' + id: totrans-31 prefs: [] type: TYPE_NORMAL + zh: Alice Coucke,Alaa Saade,Adrien Ball,Théodore Bluche,Alexandre Caulier,David + Leroy,Clément Doumouro,Thibault Gisselbrecht,Francesco Caltagirone,Thibaut Lavril等人。Snips语音平台:一种用于私密设计语音界面的嵌入式口语理解系统。*arXiv预印本arXiv:1805.10190*,2018年。 - en: '[DL82]' + id: totrans-32 prefs: [] type: TYPE_NORMAL + zh: '[DL82]' - en: DC Dowson and BV666017 Landau. The fréchet distance between multivariate normal distributions. 
*Journal of multivariate analysis*, 12(3):450–455, 1982. + id: totrans-33 prefs: [] type: TYPE_NORMAL + zh: DC道森和BV666017兰道。多元正态分布之间的弗雷歇距离。*多元分析杂志*,12(3):450-455,1982年。 - en: '[Defossez21]' + id: totrans-34 prefs: [] type: TYPE_NORMAL + zh: '[Defossez21]' - en: Alexandre Défossez. Hybrid spectrogram and waveform source separation. In *Proceedings of the ISMIR 2021 Workshop on Music Source Separation*. 2021. + id: totrans-35 prefs: [] type: TYPE_NORMAL + zh: 亚历山大·德福塞。混合谱图和波形源分离。在*ISMIR 2021音乐源分离研讨会论文集*中。2021年。 - en: '[GKRR14]' + id: totrans-36 prefs: [] type: TYPE_NORMAL + zh: '[GKRR14]' - en: 'Mark John Francis Gales, Kate Knill, Anton Ragni, and Shakti Prasad Rath. Speech recognition and keyword spotting for low-resource languages: babel project research at cued. In *SLTU*. 2014.' + id: totrans-37 prefs: [] type: TYPE_NORMAL + zh: 马克·约翰·弗朗西斯·盖尔斯、凯特·尼尔、安东·拉格尼和沙克提·普拉萨德·拉特。低资源语言的语音识别和关键词检测:剑桥大学babel项目研究。在*SLTU*中。2014年。 - en: '[Gra12]' + id: totrans-38 prefs: [] type: TYPE_NORMAL + zh: '[Gra12]' - en: Alex Graves. Sequence transduction with recurrent neural networks. 2012\. [arXiv:1211.3711](https://arxiv.org/abs/1211.3711). + id: totrans-39 prefs: [] type: TYPE_NORMAL + zh: 亚历克斯·格雷夫斯。使用递归神经网络进行序列转导。2012年。[arXiv:1211.3711](https://arxiv.org/abs/1211.3711)。 - en: '[GL83]' + id: totrans-40 prefs: [] type: TYPE_NORMAL + zh: '[GL83]' - en: D. Griffin and Jae Lim. Signal estimation from modified short-time fourier transform. In *ICASSP '83\. IEEE International Conference on Acoustics, Speech, and Signal Processing*, volume 8, 804–807\. 1983\. [doi:10.1109/ICASSP.1983.1172092](https://doi.org/10.1109/ICASSP.1983.1172092). + id: totrans-41 prefs: [] type: TYPE_NORMAL + zh: D.格里芬和林杰。从修改后的短时傅里叶变换中估计信号。在*ICASSP '83。IEEE国际声学、语音和信号处理会议*中,卷8,804-807。1983年。[doi:10.1109/ICASSP.1983.1172092](https://doi.org/10.1109/ICASSP.1983.1172092)。 - en: '[GQC+20]' + id: totrans-42 prefs: [] type: TYPE_NORMAL + zh: '[GQC+20]' - en: 'Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, and Ruoming Pang. Conformer: convolution-augmented transformer for speech recognition. 2020\. [arXiv:2005.08100](https://arxiv.org/abs/2005.08100).' + id: totrans-43 prefs: [] type: TYPE_NORMAL + zh: 安莫尔·古拉蒂、詹姆斯·秦、邱中成、尼基·帕马尔、张宇、余佳辉、韩伟、王世博、张正东、吴永辉和庞若明。Conformer:用于语音识别的卷积增强变压器。2020年。[arXiv:2005.08100](https://arxiv.org/abs/2005.08100)。 - en: '[HCC+14]' + id: totrans-44 prefs: [] type: TYPE_NORMAL + zh: '[HCC+14]' - en: 'Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, and Andrew Y. Ng. Deep speech: scaling up end-to-end speech recognition. 2014\. [arXiv:1412.5567](https://arxiv.org/abs/1412.5567).' + id: totrans-45 prefs: [] type: TYPE_NORMAL + zh: 奥尼·汉农、卡尔·凯斯、贾里德·卡斯珀、布莱恩·卡坦扎罗、格雷格·迪阿莫斯、埃里希·埃尔森、瑞安·普伦格、桑杰夫·萨蒂什、舒博·森古普塔、亚当·科茨和安德鲁·Y. + 吴。深度语音:扩展端到端语音识别。2014年。[arXiv:1412.5567](https://arxiv.org/abs/1412.5567)。 - en: '[HCE+17]' + id: totrans-46 prefs: [] type: TYPE_NORMAL + zh: '[HCE+17]' - en: 'Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, and Kevin Wilson. Cnn architectures for large-scale audio classification. In *International Conference on Acoustics, Speech and Signal Processing (ICASSP)*. 2017\. URL: [https://arxiv.org/abs/1609.09430](https://arxiv.org/abs/1609.09430).' + id: totrans-47 prefs: [] type: TYPE_NORMAL + zh: 肖恩·赫尔希、索里什·乔杜里、丹尼尔·P. 
W. 艾利斯、约特·F. 格梅克、阿伦·詹森、查宁·摩尔、马诺杰·普拉卡尔、德文·普拉特、里夫·A. 索罗斯、布莱恩·塞伯尔德、马尔科姆·斯兰尼、罗恩·韦斯和凯文·威尔逊。用于大规模音频分类的CNN架构。在*国际声学、语音和信号处理会议(ICASSP)*中。2017年。网址:[https://arxiv.org/abs/1609.09430](https://arxiv.org/abs/1609.09430)。 - en: '[HIA+17]' + id: totrans-48 prefs: [] type: TYPE_NORMAL + zh: '[HIA+17]' - en: Takuya Higuchi, Nobutaka Ito, Shoko Araki, Takuya Yoshioka, Marc Delcroix, and Tomohiro Nakatani. Online mvdr beamformer based on complex gaussian mixture model with spatial prior for noise robust asr. *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, 25(4):780–793, 2017. + id: totrans-49 prefs: [] type: TYPE_NORMAL + zh: 樋口拓也、伊藤伸孝、荒木祥子、吉冈拓也、马克·德尔克罗伊和中谷智博。基于复高斯混合模型的在线mvdr波束形成器,具有空间先验用于噪声鲁棒的asr。*IEEE/ACM音频、语音和语言处理交易*,25(4):780-793,2017年。 - en: '[HIYN16]' + id: totrans-50 prefs: [] type: TYPE_NORMAL + zh: '[HIYN16]' - en: Takuya Higuchi, Nobutaka Ito, Takuya Yoshioka, and Tomohiro Nakatani. Robust mvdr beamforming using time-frequency masks for online/offline asr in noise. In *2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 5210–5214\. IEEE, 2016. + id: totrans-51 prefs: [] type: TYPE_NORMAL + zh: 樋口拓也、伊藤伸孝、吉冈拓也和中谷智博。使用时频掩模进行在线/离线噪声下的鲁棒mvdr波束形成。在*2016年IEEE国际声学、语音和信号处理会议(ICASSP)*中,5210-5214。IEEE,2016年。 - en: '[HBT+21]' + id: totrans-52 prefs: [] type: TYPE_NORMAL + zh: '[HBT+21]' - en: 'Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. Hubert: self-supervised speech representation learning by masked prediction of hidden units. 2021\. [arXiv:2106.07447](https://arxiv.org/abs/2106.07447).' + id: totrans-53 prefs: [] type: TYPE_NORMAL + zh: 徐伟宁、本杰明·博尔特、蔡耀宏、库沙尔·拉克霍蒂亚、鲁斯兰·萨拉胡特迪诺夫和阿卜杜勒拉曼·穆罕默德。Hubert:通过隐藏单元的掩码预测进行自监督语音表示学习。2021年。[arXiv:2106.07447](https://arxiv.org/abs/2106.07447)。 - en: '[IJ17]' + id: totrans-54 prefs: [] type: TYPE_NORMAL + zh: '[IJ17]' - en: Keith Ito and Linda Johnson. The lj speech dataset. [https://keithito.com/LJ-Speech-Dataset/](https://keithito.com/LJ-Speech-Dataset/), 2017. + id: totrans-55 prefs: [] type: TYPE_NORMAL + zh: 基思伊托和琳达约翰逊。LJ语音数据集。[https://keithito.com/LJ-Speech-Dataset/](https://keithito.com/LJ-Speech-Dataset/),2017年。 - en: '[KPL+22]' + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: '[KPL+22]' - en: 'Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, and others. Flashlight: enabling innovation in tools for machine learning. *arXiv preprint arXiv:2201.12465*, 2022.' + id: totrans-57 prefs: [] type: TYPE_NORMAL + zh: 雅各布·卡恩、维尼尔·普拉塔普、塔蒂亚娜·利霍马年科、钱通徐、奥尼·汉农、杰夫·凯、帕登·托马塞洛、安·李、埃杜瓦·格雷夫、吉拉德·阿维多夫等。Flashlight:为机器学习工具创新提供支持。*arXiv预印本arXiv:2201.12465*,2022年。 - en: '[KES+18a]' + id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: '[KES+18a]' - en: Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, and Koray Kavukcuoglu. Efficient neural audio synthesis. 2018\. [arXiv:1802.08435](https://arxiv.org/abs/1802.08435). + id: totrans-59 prefs: [] type: TYPE_NORMAL + zh: 纳尔·卡尔布伦纳、埃里希·埃尔森、卡伦·西蒙扬、塞布·努里、诺曼·卡萨格兰德、爱德华·洛克哈特、弗洛里安·斯蒂姆伯格、亚伦·范登·奥尔德、桑德·迪勒曼和科雷·卡武克乔格卢。高效的神经音频合成。2018年。[arXiv:1802.08435](https://arxiv.org/abs/1802.08435)。 - en: '[KES+18b]' + id: totrans-60 prefs: [] type: TYPE_NORMAL + zh: '[KES+18b]' - en: 'Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aäron van den Oord, Sander Dieleman, and Koray Kavukcuoglu. 
Efficient neural audio synthesis. *CoRR*, 2018\. URL: [http://arxiv.org/abs/1802.08435](http://arxiv.org/abs/1802.08435), [arXiv:1802.08435](https://arxiv.org/abs/1802.08435).' + id: totrans-61 prefs: [] type: TYPE_NORMAL + zh: 纳尔·卡尔布伦纳、埃里希·埃尔森、卡伦·西蒙扬、塞布·努里、诺曼·卡萨格兰德、爱德华·洛克哈特、弗洛里安·斯蒂姆伯格、阿伦·范登·奥尔德、桑德·迪勒曼和科雷·卡武克乔格卢。高效的神经音频合成。*CoRR*,2018年。网址:[http://arxiv.org/abs/1802.08435](http://arxiv.org/abs/1802.08435),[arXiv:1802.08435](https://arxiv.org/abs/1802.08435)。 - en: '[KPPK15]' + id: totrans-62 prefs: [] type: TYPE_NORMAL + zh: '[KPPK15]' - en: Tom Ko, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. Audio augmentation for speech recognition. In *Proc. Interspeech 2015*, 3586–3589\. 2015\. [doi:10.21437/Interspeech.2015-711](https://doi.org/10.21437/Interspeech.2015-711). + id: totrans-63 prefs: [] type: TYPE_NORMAL + zh: Tom Ko,Vijayaditya Peddinti,Daniel Povey和Sanjeev Khudanpur。用于语音识别的音频增强。在*Interspeech + 2015会议论文集*中,3586-3589。2015年。[doi:10.21437/Interspeech.2015-711](https://doi.org/10.21437/Interspeech.2015-711)。 - en: '[KBV03]' + id: totrans-64 prefs: [] type: TYPE_NORMAL + zh: '[KBV03]' - en: John Kominek, Alan W Black, and Ver Ver. Cmu arctic databases for speech synthesis. Technical Report, 2003. + id: totrans-65 prefs: [] type: TYPE_NORMAL + zh: John Kominek,Alan W Black和Ver Ver。用于语音合成的CMU北极数据库。技术报告,2003年。 - en: '[KKB20]' + id: totrans-66 prefs: [] type: TYPE_NORMAL + zh: '[KKB20]' - en: 'Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. Hifi-gan: generative adversarial networks for efficient and high fidelity speech synthesis. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, *Advances in Neural Information Processing Systems*, volume 33, 17022–17033\. Curran Associates, Inc., 2020\. URL: [https://proceedings.neurips.cc/paper/2020/file/c5d736809766d46260d816d8dbc9eb44-Paper.pdf](https://proceedings.neurips.cc/paper/2020/file/c5d736809766d46260d816d8dbc9eb44-Paper.pdf).' + id: totrans-67 prefs: [] type: TYPE_NORMAL + zh: Jungil Kong,Jaehyeon Kim和Jaekyoung Bae。Hifi-gan:用于高效和高保真度语音合成的生成对抗网络。在H. Larochelle,M. + Ranzato,R. Hadsell,M.F. Balcan和H. Lin编辑的*神经信息处理系统进展*中,卷33,17022-17033。Curran Associates, + Inc.,2020年。网址:[https://proceedings.neurips.cc/paper/2020/file/c5d736809766d46260d816d8dbc9eb44-Paper.pdf](https://proceedings.neurips.cc/paper/2020/file/c5d736809766d46260d816d8dbc9eb44-Paper.pdf)。 - en: '[KTN+23]' + id: totrans-68 prefs: [] type: TYPE_NORMAL + zh: '[KTN+23]' - en: 'Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, and Buye Xu. Torchaudio-squim: reference-less speech quality and intelligibility measures in torchaudio. *arXiv preprint arXiv:2304.01448*, 2023.' + id: totrans-69 prefs: [] type: TYPE_NORMAL + zh: Anurag Kumar,Ke Tan,Zhaoheng Ni,Pranay Manocha,Xiaohui Zhang,Ethan Henderson和Buye + Xu。Torchaudio-squim:Torchaudio中无参考语音质量和可懂度测量。*arXiv预印本arXiv:2304.01448*,2023年。 - en: '[LRI+19]' + id: totrans-70 prefs: [] type: TYPE_NORMAL + zh: '[LRI+19]' - en: Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, and Yoshua Bengio. Speech model pre-training for end-to-end spoken language understanding. In Gernot Kubin and Zdravko Kacic, editors, *Proc. of Interspeech*, 814–818\. 2019. + id: totrans-71 prefs: [] type: TYPE_NORMAL + zh: Loren Lugosch,Mirco Ravanelli,Patrick Ignoto,Vikrant Singh Tomar和Yoshua Bengio。端到端口语言理解的语音模型预训练。在Gernot + Kubin和Zdravko Kacic编辑的*Interspeech会议论文集*中,814-818。2019年。 - en: '[LM19]' + id: totrans-72 prefs: [] type: TYPE_NORMAL + zh: '[LM19]' - en: 'Yi Luo and Nima Mesgarani. 
Conv-tasnet: surpassing ideal time–frequency magnitude masking for speech separation. *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, 27(8):1256–1266, Aug 2019\. URL: [http://dx.doi.org/10.1109/TASLP.2019.2915167](http://dx.doi.org/10.1109/TASLP.2019.2915167), [doi:10.1109/taslp.2019.2915167](https://doi.org/10.1109/taslp.2019.2915167).' + id: totrans-73 prefs: [] type: TYPE_NORMAL + zh: Yi Luo和Nima Mesgarani。Conv-tasnet:超越理想的时频幅度屏蔽进行语音分离。*IEEE/ACM音频、语音和语言处理交易*,27(8):1256-1266,2019年8月。网址:[http://dx.doi.org/10.1109/TASLP.2019.2915167](http://dx.doi.org/10.1109/TASLP.2019.2915167),[doi:10.1109/taslp.2019.2915167](https://doi.org/10.1109/taslp.2019.2915167)。 - en: '[MK22]' + id: totrans-74 prefs: [] type: TYPE_NORMAL + zh: '[MK22]' - en: Pranay Manocha and Anurag Kumar. Speech quality assessment through mos using non-matching references. *arXiv preprint arXiv:2206.12285*, 2022. + id: totrans-75 prefs: [] type: TYPE_NORMAL + zh: Pranay Manocha和Anurag Kumar。使用非匹配参考进行MOS的语音质量评估。*arXiv预印本arXiv:2206.12285*,2022年。 - en: '[MRFB+15]' + id: totrans-76 prefs: [] type: TYPE_NORMAL + zh: '[MRFB+15]' - en: 'Xavier Anguera Miro, Luis Javier Rodriguez-Fuentes, Andi Buzo, Florian Metze, Igor Szoke, and Mikel Peñagarikano. Quesst2014: evaluating query-by-example speech search in a zero-resource setting with real-life queries. *2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, pages 5833–5837, 2015.' + id: totrans-77 prefs: [] type: TYPE_NORMAL + zh: Xavier Anguera Miro,Luis Javier Rodriguez-Fuentes,Andi Buzo,Florian Metze,Igor + Szoke和Mikel Peñagarikano。Quesst2014:在零资源环境中使用真实查询评估基于示例语音搜索。*2015年IEEE国际声学、语音和信号处理会议(ICASSP)*,2015年,页码5833-5837。 - en: '[MPG29]' + id: totrans-78 prefs: [] type: TYPE_NORMAL + zh: '[MPG29]' - en: RV Mises and Hilda Pollaczek-Geiringer. Praktische verfahren der gleichungsauflösung. *ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik*, 9(1):58–77, 1929. + id: totrans-79 prefs: [] type: TYPE_NORMAL + zh: RV Mises和Hilda Pollaczek-Geiringer。等式求解的实用方法。*ZAMM-应用数学和力学杂志/应用数学和力学杂志*,9(1):58-77,1929年。 - en: '[Mys14]' + id: totrans-80 prefs: [] type: TYPE_NORMAL + zh: '[Mys14]' - en: Gautham J Mysore. Can we automatically transform speech recorded on common consumer devices in real-world environments into professional production quality speech?—a dataset, insights, and challenges. *IEEE Signal Processing Letters*, 22(8):1006–1010, 2014. + id: totrans-81 prefs: [] type: TYPE_NORMAL + zh: Gautham J Mysore。我们能否自动将在真实环境中使用普通消费设备录制的语音转换为专业制作质量的语音?—数据集、见解和挑战。*IEEE信号处理通信*,22(8):1006-1010,2014年。 - en: '[NCZ17]' + id: totrans-82 prefs: [] type: TYPE_NORMAL + zh: '[NCZ17]' - en: 'Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. Voxceleb: a large-scale speaker identification dataset. *arXiv preprint arXiv:1706.08612*, 2017.' + id: totrans-83 prefs: [] type: TYPE_NORMAL + zh: Arsha Nagrani,Joon Son Chung和Andrew Zisserman。Voxceleb:一个大规模的说话者识别数据集。*arXiv预印本arXiv:1706.08612*,2017年。 - en: '[PCPK15]' + id: totrans-84 prefs: [] type: TYPE_NORMAL + zh: '[PCPK15]' - en: 'Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: an asr corpus based on public domain audio books. In *2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, volume, 5206–5210\. 2015\. [doi:10.1109/ICASSP.2015.7178964](https://doi.org/10.1109/ICASSP.2015.7178964).' 
+ id: totrans-85 prefs: [] type: TYPE_NORMAL + zh: Vassil Panayotov,Guoguo Chen,Daniel Povey和Sanjeev Khudanpur。Librispeech:基于公共领域有声书的ASR语料库。在*2015年IEEE国际声学、语音和信号处理会议(ICASSP)*中,卷,5206-5210。2015年。[doi:10.1109/ICASSP.2015.7178964](https://doi.org/10.1109/ICASSP.2015.7178964)。 - en: '[PCZ+19]' + id: totrans-86 prefs: [] type: TYPE_NORMAL + zh: '[PCZ+19]' - en: 'Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, and Quoc V. Le. Specaugment: a simple data augmentation method for automatic speech recognition. *Interspeech 2019*, Sep 2019\. URL: [http://dx.doi.org/10.21437/Interspeech.2019-2680](http://dx.doi.org/10.21437/Interspeech.2019-2680), [doi:10.21437/interspeech.2019-2680](https://doi.org/10.21437/interspeech.2019-2680).' + id: totrans-87 prefs: [] type: TYPE_NORMAL + zh: Daniel S. Park,William Chan,Yu Zhang,Chung-Cheng Chiu,Barret Zoph,Ekin D. Cubuk和Quoc + V. Le。Specaugment:一种用于自动语音识别的简单数据增强方法。*Interspeech 2019*,2019年9月。网址:[http://dx.doi.org/10.21437/Interspeech.2019-2680](http://dx.doi.org/10.21437/Interspeech.2019-2680),[doi:10.21437/interspeech.2019-2680](https://doi.org/10.21437/interspeech.2019-2680)。 - en: '[PBS13]' + id: totrans-88 prefs: [] type: TYPE_NORMAL + zh: '[PBS13]' - en: Nathanaël Perraudin, Peter Balazs, and Peter L. Søndergaard. A fast griffin-lim algorithm. In *2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics*, volume, 1–4\. 2013\. [doi:10.1109/WASPAA.2013.6701851](https://doi.org/10.1109/WASPAA.2013.6701851). + id: totrans-89 prefs: [] type: TYPE_NORMAL + zh: Nathanaël Perraudin,Peter Balazs和Peter L. Søndergaard。一种快速的Griffin-Lim算法。在*2013年IEEE信号处理应用研讨会*中,卷,1-4。2013年。[doi:10.1109/WASPAA.2013.6701851](https://doi.org/10.1109/WASPAA.2013.6701851)。 - en: '[PTS+23]' + id: totrans-90 prefs: [] type: TYPE_NORMAL + zh: '[PTS+23]' - en: Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, and Michael Auli. Scaling speech technology to 1,000+ languages. 2023\. [arXiv:2305.13516](https://arxiv.org/abs/2305.13516). + id: totrans-91 prefs: [] type: TYPE_NORMAL + zh: Vineel Pratap,Andros Tjandra,Bowen Shi,Paden Tomasello,Arun Babu,Sayani Kundu,Ali + Elkahky,Zhaoheng Ni,Apoorv Vyas,Maryam Fazel-Zarandi,Alexei Baevski,Yossi Adi,张晓辉,徐伟宁,Alexis + Conneau和Michael Auli。将语音技术扩展到1000多种语言。2023年。arXiv:2305.13516。 - en: '[PXS+20]' + id: totrans-92 prefs: [] type: TYPE_NORMAL + zh: '[PXS+20]' - en: 'Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, and Ronan Collobert. Mls: a large-scale multilingual dataset for speech research. *Interspeech 2020*, Oct 2020\. URL: [http://dx.doi.org/10.21437/Interspeech.2020-2826](http://dx.doi.org/10.21437/Interspeech.2020-2826), [doi:10.21437/interspeech.2020-2826](https://doi.org/10.21437/interspeech.2020-2826).' + id: totrans-93 prefs: [] type: TYPE_NORMAL + zh: Vineel Pratap,Qiantong Xu,Anuroop Sriram,Gabriel Synnaeve和Ronan Collobert。MLS:用于语音研究的大规模多语言数据集。Interspeech + 2020,2020年10月。URL:http://dx.doi.org/10.21437/Interspeech.2020-2826,doi:10.21437/interspeech.2020-2826。 - en: '[RLStoter+19]' + id: totrans-94 prefs: [] type: TYPE_NORMAL + zh: '[RLStoter+19]' - en: 'Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, and Rachel Bittner. MUSDB18-HQ - an uncompressed version of musdb18\. December 2019\. 
URL: [https://doi.org/10.5281/zenodo.3338373](https://doi.org/10.5281/zenodo.3338373), [doi:10.5281/zenodo.3338373](https://doi.org/10.5281/zenodo.3338373).' + id: totrans-95 prefs: [] type: TYPE_NORMAL + zh: Zafar Rafii,Antoine Liutkus,Fabian-Robert Stöter,Stylianos Ioannis Mimilakis和Rachel + Bittner。MUSDB18-HQ - musdb18的未压缩版本。2019年12月。URL:https://doi.org/10.5281/zenodo.3338373,doi:10.5281/zenodo.3338373。 - en: '[RGC+20]' + id: totrans-96 prefs: [] type: TYPE_NORMAL + zh: '[RGC+20]' - en: 'Chandan KA Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, and others. The interspeech 2020 deep noise suppression challenge: datasets, subjective testing framework, and challenge results. *arXiv preprint arXiv:2005.13981*, 2020.' + id: totrans-97 prefs: [] type: TYPE_NORMAL + zh: Chandan KA Reddy,Vishak Gopal,Ross Cutler,Ebrahim Beyrami,Roger Cheng,Harishchandra + Dubey,Sergiy Matusevych,Robert Aichner,Ashkan Aazami,Sebastian Braun等人。Interspeech + 2020深度降噪挑战:数据集,主观测试框架和挑战结果。arXiv预印本arXiv:2005.13981,2020年。 - en: '[RDelegliseEsteve12]' + id: totrans-98 prefs: [] type: TYPE_NORMAL + zh: '[RDelegliseEsteve12]' - en: 'Anthony Rousseau, Paul Deléglise, and Yannick Estève. Ted-lium: an automatic speech recognition dedicated corpus. In *Conference on Language Resources and Evaluation (LREC)*, 125–129\. 2012.' + id: totrans-99 prefs: [] type: TYPE_NORMAL + zh: 安东尼·鲁索,保罗·德勒格利斯和亚尼克·埃斯特韦。Ted-lium:一种专用于自动语音识别的语料库。在语言资源和评估会议(LREC)中,125-129页。2012年。 - en: '[SY18]' + id: totrans-100 prefs: [] type: TYPE_NORMAL + zh: '[SY18]' - en: Seyyed Saeed Sarfjoo and Junichi Yamagishi. Device recorded vctk (small subset version). 2018. + id: totrans-101 prefs: [] type: TYPE_NORMAL + zh: Seyyed Saeed Sarfjoo和山岸淳一。设备录制的vctk(小型子集版本)。2018年。 - en: '[SBDokmanic18]' + id: totrans-102 prefs: [] type: TYPE_NORMAL + zh: '[SBDokmanic18]' - en: 'Robin Scheibler, Eric Bezzam, and Ivan Dokmanić. Pyroomacoustics: a python package for audio room simulation and array processing algorithms. In *2018 IEEE international conference on acoustics, speech and signal processing (ICASSP)*, 351–355\. IEEE, 2018.' + id: totrans-103 prefs: [] type: TYPE_NORMAL + zh: 罗宾·施伯勒,埃里克·贝扎姆和伊万·多克曼尼奇。Pyroomacoustics:用于音频房间模拟和阵列处理算法的Python软件包。在2018年IEEE国际声学、语音和信号处理会议(ICASSP)中,351-355页。IEEE,2018年。 - en: '[SPW+18]' + id: totrans-104 prefs: [] type: TYPE_NORMAL + zh: '[SPW+18]' - en: Jonathan Shen, Ruoming Pang, Ron J Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Rj Skerrv-Ryan, and others. Natural tts synthesis by conditioning wavenet on mel spectrogram predictions. In *2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 4779–4783\. IEEE, 2018. + id: totrans-105 prefs: [] type: TYPE_NORMAL + zh: 乔纳森·申,Ruoming Pang,Ron J Weiss,Mike Schuster,Navdeep Jaitly,Zongheng Yang,Zhifeng + Chen,张宇,王宇轩,Rj Skerrv-Ryan等人。通过在mel频谱图预测上对wavenet进行条件化的自然tts合成。在2018年IEEE国际声学、语音和信号处理会议(ICASSP)中,4779-4783页。IEEE,2018年。 - en: '[SWW+21]' + id: totrans-106 prefs: [] type: TYPE_NORMAL + zh: '[SWW+21]' - en: 'Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, and Mike Seltzer. Emformer: efficient memory transformer based acoustic model for low latency streaming speech recognition. In *ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 6783–6787\. 2021.' 
+ id: totrans-107 prefs: [] type: TYPE_NORMAL + zh: 杨洋石,王永强,吴春阳,叶青峰,陈俊,张弗兰克,勒杜克和迈克·塞尔策。Emformer:用于低延迟流式语音识别的高效内存变压器基础声学模型。在ICASSP + 2021 - 2021年IEEE国际声学、语音和信号处理会议(ICASSP)中,6783-6787页。2021年。 - en: '[SWW+22]' + id: totrans-108 prefs: [] type: TYPE_NORMAL + zh: '[SWW+22]' - en: Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, and Mike Seltzer. Streaming transformer transducer based speech recognition using non-causal convolution. In *ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, volume, 8277–8281\. 2022\. [doi:10.1109/ICASSP43922.2022.9747706](https://doi.org/10.1109/ICASSP43922.2022.9747706). + id: totrans-109 prefs: [] type: TYPE_NORMAL + zh: 杨洋石,春阳吴,迪林王,Alex Xiao,Jay Mahadeokar,张晓辉,刘春喜,李克,尚冠元,瓦伦·纳加拉贾,奥兹莱姆·卡林利和迈克·塞尔策。基于非因果卷积的流式变压器传导器语音识别。在ICASSP + 2022 - 2022年IEEE国际声学、语音和信号处理会议(ICASSP)中,卷,8277-8281页。2022年。doi:10.1109/ICASSP43922.2022.9747706。 - en: '[Smi20]' + id: totrans-110 prefs: [] type: TYPE_NORMAL + zh: '[Smi20]' - en: 'Julius O. Smith. Digital audio resampling home page "theory of ideal bandlimited interpolation" section. September 2020\. URL: [https://ccrma.stanford.edu/~jos/resample/Theory_Ideal_Bandlimited_Interpolation.html](https://ccrma.stanford.edu/~jos/resample/Theory_Ideal_Bandlimited_Interpolation.html).' + id: totrans-111 prefs: [] type: TYPE_NORMAL + zh: 朱利叶斯·O·史密斯。数字音频重采样主页“理想带限插值理论”部分。2020年9月。URL:https://ccrma.stanford.edu/~jos/resample/Theory_Ideal_Bandlimited_Interpolation.html。 - en: '[SCP15]' + id: totrans-112 prefs: [] type: TYPE_NORMAL + zh: '[SCP15]' - en: 'David Snyder, Guoguo Chen, and Daniel Povey. MUSAN: A Music, Speech, and Noise Corpus. 2015\. arXiv:1510.08484v1\. [arXiv:1510.08484](https://arxiv.org/abs/1510.08484).' + id: totrans-113 prefs: [] type: TYPE_NORMAL + zh: 大卫·斯奈德,陈国国和丹尼尔·波维。MUSAN:一个音乐、语音和噪声语料库。2015年。arXiv:1510.08484v1。arXiv:1510.08484。 - en: '[SBA09]' + id: totrans-114 prefs: [] type: TYPE_NORMAL + zh: '[SBA09]' - en: Mehrez Souden, Jacob Benesty, and Sofiene Affes. On optimal frequency-domain multichannel linear filtering for noise reduction. In *IEEE Transactions on audio, speech, and language processing*, volume 18, 260–276\. IEEE, 2009. + id: totrans-115 prefs: [] type: TYPE_NORMAL + zh: Mehrez Souden,Jacob Benesty和Sofiene Affes。关于噪声降低的最佳频域多通道线性滤波。在IEEE音频、语音和语言处理交易中,卷18,260-276页。IEEE,2009年。 - en: '[SWT+22]' + id: totrans-116 prefs: [] type: TYPE_NORMAL + zh: '[SWT+22]' - en: Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, and Yatharth Saraf. Conformer-based self-supervised learning for non-speech audio tasks. In *ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, volume, 8862–8866\. 2022\. [doi:10.1109/ICASSP43922.2022.9746490](https://doi.org/10.1109/ICASSP43922.2022.9746490). + id: totrans-117 prefs: [] type: TYPE_NORMAL + zh: Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika + Singh, and Yatharth Saraf. Conformer-based self-supervised learning for non-speech + audio tasks. In *ICASSP 2022 - 2022 IEEE International Conference on Acoustics, + Speech and Signal Processing (ICASSP)*, volume, 8862–8866\. 2022\. [doi:10.1109/ICASSP43922.2022.9746490](https://doi.org/10.1109/ICASSP43922.2022.9746490). - en: '[TEC01]' + id: totrans-118 prefs: [] type: TYPE_NORMAL + zh: '[TEC01]' - en: 'George Tzanetakis, Georg Essl, and Perry Cook. 
Automatic musical genre classification of audio signals. 2001\. URL: [http://ismir2001.ismir.net/pdf/tzanetakis.pdf](http://ismir2001.ismir.net/pdf/tzanetakis.pdf).' + id: totrans-119 prefs: [] type: TYPE_NORMAL + zh: 'George Tzanetakis, Georg Essl, and Perry Cook. Automatic musical genre classification + of audio signals. 2001\. URL: [http://ismir2001.ismir.net/pdf/tzanetakis.pdf](http://ismir2001.ismir.net/pdf/tzanetakis.pdf).' - en: '[VAlumae21]' + id: totrans-120 prefs: [] type: TYPE_NORMAL + zh: '[VAlumae21]' - en: 'Jörgen Valk and Tanel Alumäe. Voxlingua107: a dataset for spoken language recognition. In *2021 IEEE Spoken Language Technology Workshop (SLT)*, 652–658\. IEEE, 2021.' + id: totrans-121 prefs: [] type: TYPE_NORMAL + zh: 'Jörgen Valk and Tanel Alumäe. Voxlingua107: a dataset for spoken language recognition. + In *2021 IEEE Spoken Language Technology Workshop (SLT)*, 652–658\. IEEE, 2021.' - en: '[WRiviereL+21]' + id: totrans-122 prefs: [] type: TYPE_NORMAL + zh: '[WRiviereL+21]' - en: 'Changhan Wang, Morgane Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Miguel Pino, and Emmanuel Dupoux. Voxpopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation. *CoRR*, 2021\. URL: [https://arxiv.org/abs/2101.00390](https://arxiv.org/abs/2101.00390), [arXiv:2101.00390](https://arxiv.org/abs/2101.00390).' + id: totrans-123 prefs: [] type: TYPE_NORMAL + zh: 'Changhan Wang, Morgane Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel + Haziza, Mary Williamson, Juan Miguel Pino, and Emmanuel Dupoux. Voxpopuli: A large-scale + multilingual speech corpus for representation learning, semi-supervised learning + and interpretation. *CoRR*, 2021\. URL: [https://arxiv.org/abs/2101.00390](https://arxiv.org/abs/2101.00390), + [arXiv:2101.00390](https://arxiv.org/abs/2101.00390).' - en: '[Wei98]' + id: totrans-124 prefs: [] type: TYPE_NORMAL + zh: '[Wei98]' - en: 'R.L. Weide. The carnegie mellon pronuncing dictionary. 1998\. URL: [http://www.speech.cs.cmu.edu/cgi-bin/cmudict](http://www.speech.cs.cmu.edu/cgi-bin/cmudict).' + id: totrans-125 prefs: [] type: TYPE_NORMAL + zh: 'R.L. Weide. The carnegie mellon pronuncing dictionary. 1998\. URL: [http://www.speech.cs.cmu.edu/cgi-bin/cmudict](http://www.speech.cs.cmu.edu/cgi-bin/cmudict).' - en: '[YVM19]' + id: totrans-126 prefs: [] type: TYPE_NORMAL + zh: '[YVM19]' - en: 'Junichi Yamagishi, Christophe Veaux, and Kirsten MacDonald. CSTR VCTK Corpus: english multi-speaker corpus for CSTR voice cloning toolkit (version 0.92). 2019\. [doi:10.7488/ds/2645](https://doi.org/10.7488/ds/2645).' + id: totrans-127 prefs: [] type: TYPE_NORMAL + zh: 'Junichi Yamagishi, Christophe Veaux, and Kirsten MacDonald. CSTR VCTK Corpus: + english multi-speaker corpus for CSTR voice cloning toolkit (version 0.92). 2019\. + [doi:10.7488/ds/2645](https://doi.org/10.7488/ds/2645).' - en: '[ZDC+19]' + id: totrans-128 prefs: [] type: TYPE_NORMAL + zh: '[ZDC+19]' - en: 'Heiga Zen, Viet-Trung Dang, Robert A. J. Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Z. Chen, and Yonghui Wu. Libritts: a corpus derived from librispeech for text-to-speech. *ArXiv*, 2019.' + id: totrans-129 prefs: [] type: TYPE_NORMAL + zh: 'Heiga Zen, Viet-Trung Dang, Robert A. J. Clark, Yu Zhang, Ron J. Weiss, Ye Jia, + Z. Chen, and Yonghui Wu. Libritts: a corpus derived from librispeech for text-to-speech. + *ArXiv*, 2019.' 
- en: '[ZSN21]' + id: totrans-130 prefs: [] type: TYPE_NORMAL + zh: '[ZSN21]' - en: Albert Zeyer, Ralf Schlüter, and Hermann Ney. Why does ctc result in peaky behavior? 2021\. [arXiv:2105.14849](https://arxiv.org/abs/2105.14849). + id: totrans-131 prefs: [] type: TYPE_NORMAL + zh: Albert Zeyer, Ralf Schlüter, and Hermann Ney. Why does ctc result in peaky behavior? + 2021\. [arXiv:2105.14849](https://arxiv.org/abs/2105.14849). - en: '[BrianMcFeeColinRaffelDawenLiang+15]' + id: totrans-132 prefs: [] type: TYPE_NORMAL + zh: '[BrianMcFeeColinRaffelDawenLiang+15]' - en: 'Brian McFee, Colin Raffel, Dawen Liang, Daniel P.W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. Librosa: Audio and Music Signal Analysis in Python. In Kathryn Huff and James Bergstra, editors, *Proceedings of the 14th Python in Science Conference*, 18 – 24\. 2015\. [doi:10.25080/Majora-7b98e3ed-003](https://doi.org/10.25080/Majora-7b98e3ed-003).' + id: totrans-133 prefs: [] type: TYPE_NORMAL + zh: 'Brian McFee, Colin Raffel, Dawen Liang, Daniel P.W. Ellis, Matt McVicar, Eric + Battenberg, and Oriol Nieto. Librosa: Audio and Music Signal Analysis in Python. + In Kathryn Huff and James Bergstra, editors, *Proceedings of the 14th Python in + Science Conference*, 18 – 24\. 2015\. [doi:10.25080/Majora-7b98e3ed-003](https://doi.org/10.25080/Majora-7b98e3ed-003).' - en: '[KahnRiviereZheng+20]' + id: totrans-134 prefs: [] type: TYPE_NORMAL + zh: '[KahnRiviereZheng+20]' - en: 'J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no supervision. In *ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 7669–7673\. 2020\. [https://github.com/facebookresearch/libri-light](https://github.com/facebookresearch/libri-light).' + id: totrans-135 prefs: [] type: TYPE_NORMAL + zh: 'J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, + V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, + A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no + supervision. In *ICASSP 2020 - 2020 IEEE International Conference on Acoustics, + Speech and Signal Processing (ICASSP)*, 7669–7673\. 2020\. [https://github.com/facebookresearch/libri-light](https://github.com/facebookresearch/libri-light).' - en: '[Warden18]' + id: totrans-136 prefs: [] type: TYPE_NORMAL + zh: '[Warden18]' - en: 'P. Warden. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. *ArXiv e-prints*, April 2018\. URL: [https://arxiv.org/abs/1804.03209](https://arxiv.org/abs/1804.03209), [arXiv:1804.03209](https://arxiv.org/abs/1804.03209).' + id: totrans-137 prefs: [] type: TYPE_NORMAL + zh: 'P. Warden. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. + *ArXiv e-prints*, April 2018\. URL: [https://arxiv.org/abs/1804.03209](https://arxiv.org/abs/1804.03209), + [arXiv:1804.03209](https://arxiv.org/abs/1804.03209).' - en: '[Wikipediacontributors]' + id: totrans-138 prefs: [] type: TYPE_NORMAL + zh: '[Wikipediacontributors]' - en: 'Wikipedia contributors. Absorption (acoustics) — Wikipedia, the free encyclopedia. [Online]. URL: [https://en.wikipedia.org/wiki/Absorption_(acoustics)](https://en.wikipedia.org/wiki/Absorption_(acoustics)).' + id: totrans-139 prefs: [] type: TYPE_NORMAL + zh: 'Wikipedia contributors. 
Absorption (acoustics) — Wikipedia, the free encyclopedia. [Online]. URL: [https://en.wikipedia.org/wiki/Absorption_(acoustics)](https://en.wikipedia.org/wiki/Absorption_(acoustics)).'
diff --git a/totrans/aud22_07.yaml b/totrans/aud22_07.yaml
index 73fd4eef26cb845909e0c6e84606b5873068ae04..58bc4744560a93afc1dac7ccf35962efcb58073d 100644
--- a/totrans/aud22_07.yaml
+++ b/totrans/aud22_07.yaml
@@ -1,4 +1,6 @@
- en: Installation
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 安装
diff --git a/totrans/aud22_08.yaml b/totrans/aud22_08.yaml
index eb512c7cf0e2db1a0718b31a58c1c4f60f258e0d..7b40ba4543daec30caf54e7e8dfdf60537795a05 100644
--- a/totrans/aud22_08.yaml
+++ b/totrans/aud22_08.yaml
@@ -1,101 +1,148 @@
- en: Installing pre-built binaries
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 安装预构建的二进制文件
- en: 原文:[https://pytorch.org/audio/stable/installation.html](https://pytorch.org/audio/stable/installation.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/installation.html](https://pytorch.org/audio/stable/installation.html)
- en: '`torchaudio` has binary distributions for PyPI (`pip`) and Anaconda (`conda`).'
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: '`torchaudio`有PyPI(`pip`)和Anaconda(`conda`)的二进制发行版。'
- en: Please refer to [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/) for the details.
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 有关详细信息,请参考[https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)。
- en: Note
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: Each `torchaudio` package is compiled against a specific version of `torch`. Please refer to the following table and install the correct pair of `torch` and `torchaudio`.
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 每个`torchaudio`包都是针对特定版本的`torch`编译的。请参考以下表格并安装正确的`torch`和`torchaudio`配对。
- en: Note
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: Starting `0.10`, torchaudio has CPU-only and CUDA-enabled binary distributions, each of which requires a corresponding PyTorch distribution.
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 从`0.10`开始,torchaudio有仅CPU和启用CUDA的二进制发行版,每个都需要相应的PyTorch发行版。
- en: Note
+ id: totrans-8
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: 'This software was compiled against unmodified copies of FFmpeg, with the specific rpath removed so as to enable the use of system libraries. The LGPL source can be downloaded from the following locations: [n4.4.4](https://github.com/FFmpeg/FFmpeg/releases/tag/n4.4.4) ([license](https://github.com/FFmpeg/FFmpeg/blob/n4.4.4/COPYING.LGPLv2.1)), [n5.0.3](https://github.com/FFmpeg/FFmpeg/releases/tag/n5.0.3) ([license](https://github.com/FFmpeg/FFmpeg/blob/n5.0.3/COPYING.LGPLv2.1)) and [n6.0](https://github.com/FFmpeg/FFmpeg/releases/tag/n6.0) ([license](https://github.com/FFmpeg/FFmpeg/blob/n6.0/COPYING.LGPLv2.1)).'
+ id: totrans-9
  prefs: []
  type: TYPE_NORMAL
+ zh: 此软件是针对未经修改的FFmpeg副本编译的,特定的rpath已被移除,以便使用系统库。LGPL源代码可以从以下位置下载:[n4.4.4](https://github.com/FFmpeg/FFmpeg/releases/tag/n4.4.4)([许可证](https://github.com/FFmpeg/FFmpeg/blob/n4.4.4/COPYING.LGPLv2.1)),[n5.0.3](https://github.com/FFmpeg/FFmpeg/releases/tag/n5.0.3)([许可证](https://github.com/FFmpeg/FFmpeg/blob/n5.0.3/COPYING.LGPLv2.1))和[n6.0](https://github.com/FFmpeg/FFmpeg/releases/tag/n6.0)([许可证](https://github.com/FFmpeg/FFmpeg/blob/n6.0/COPYING.LGPLv2.1))。
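To make the pairing requirement above concrete, a quick runtime check confirms which `torch`/`torchaudio` versions are installed and whether the CUDA-enabled build is active. This is an illustrative Python sketch added here for clarity (not part of the original page); the printed values are hypothetical.

```python
import torch
import torchaudio

# The two packages must come from matching releases
# (see the compatibility matrix below), e.g. torch 2.1.0 pairs with torchaudio 2.1.0.
print("torch:", torch.__version__)          # e.g. "2.1.0+cu121" (hypothetical)
print("torchaudio:", torchaudio.__version__)

# For the CUDA-enabled distribution, the matching PyTorch build reports CUDA support.
print("CUDA available:", torch.cuda.is_available())
```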
- en: Dependencies[](#dependencies "Permalink to this heading")
+ id: totrans-10
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 依赖项[](#dependencies "Permalink to this heading")
- en: '[PyTorch](https://pytorch.org)'
+ id: totrans-11
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[PyTorch](https://pytorch.org)'
- en: Please refer to the compatibility matrix below for supported PyTorch versions.
+ id: totrans-12
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 请参考下面的兼容矩阵以获取支持的PyTorch版本。
- en: '### Optional Dependencies[](#optional-dependencies "Permalink to this heading")'
+ id: totrans-13
  prefs: []
  type: TYPE_NORMAL
+ zh: '### 可选依赖项[](#optional-dependencies "Permalink to this heading")'
- en: '[FFmpeg](https://ffmpeg.org)'
+ id: totrans-14
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[FFmpeg](https://ffmpeg.org)'
- en: Required to use the [`torchaudio.io`](io.html#module-torchaudio.io "torchaudio.io") module, and `backend="ffmpeg"` in [I/O functions](./torchaudio.html#i-o).
+ id: totrans-15
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 需要使用[`torchaudio.io`](io.html#module-torchaudio.io "torchaudio.io")模块和在[I/O函数](./torchaudio.html#i-o)中使用`backend="ffmpeg"`。
- en: Starting version 2.1, TorchAudio official binary distributions are compatible with FFmpeg versions 6, 5 and 4 (>=4.4, <7). At runtime, TorchAudio first looks for FFmpeg 6; if not found, it then continues to look for 5 and moves on to 4.
+ id: totrans-16
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 从版本2.1开始,TorchAudio官方二进制发行版与FFmpeg版本6、5和4兼容(>=4.4,<7)。在运行时,TorchAudio首先搜索FFmpeg 6,如果未找到,则继续搜索5,然后转到4。
- en: There are multiple ways to install FFmpeg libraries. Please refer to the official documentation for how to install FFmpeg. If you are using the Anaconda Python distribution, `conda install -c conda-forge 'ffmpeg<7'` will install compatible FFmpeg libraries.
+ id: totrans-17
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 有多种安装FFmpeg库的方法。请参考官方文档了解如何安装FFmpeg。如果您使用Anaconda Python发行版,`conda install -c conda-forge 'ffmpeg<7'`将安装兼容的FFmpeg库。
- en: If you need to specify the version of FFmpeg TorchAudio searches and links, you can specify it via the environment variable `TORIO_USE_FFMPEG_VERSION`. For example, by setting `TORIO_USE_FFMPEG_VERSION=5`, TorchAudio will only look for FFmpeg 5.
+ id: totrans-18
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 如果您需要指定TorchAudio搜索和链接的FFmpeg版本,可以通过环境变量`TORIO_USE_FFMPEG_VERSION`指定。例如,通过设置`TORIO_USE_FFMPEG_VERSION=5`,TorchAudio将只搜索FFmpeg 5。
- en: If for some reason this search mechanism is causing an issue, you can disable the FFmpeg integration entirely by setting the environment variable `TORIO_USE_FFMPEG=0`.
+ id: totrans-19
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 如果由于某种原因,此搜索机制导致问题,您可以通过设置环境变量`TORIO_USE_FFMPEG=0`完全禁用FFmpeg集成。
- en: There are multiple ways to install FFmpeg libraries. If you are using the Anaconda Python distribution, `conda install -c conda-forge 'ffmpeg<7'` will install compatible FFmpeg libraries.
+ id: totrans-20
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 有多种安装FFmpeg库的方法。如果您使用Anaconda Python发行版,`conda install -c conda-forge 'ffmpeg<7'`将安装兼容的FFmpeg库。
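The FFmpeg-related switches above are plain process environment variables, so a cautious pattern is to set them before `torchaudio` is first imported (assumed here to be early enough for the search described above). A minimal Python sketch, added for illustration and not part of the original page:

```python
import os

# Search for and link against FFmpeg 5 only (variable name taken from the text above).
os.environ["TORIO_USE_FFMPEG_VERSION"] = "5"
# Or disable the FFmpeg integration entirely:
# os.environ["TORIO_USE_FFMPEG"] = "0"

import torchaudio  # imported after the environment is configured

# "ffmpeg" should be listed only if a compatible FFmpeg installation was found.
print(torchaudio.list_audio_backends())
```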
- en: Note
+ id: totrans-21
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 注意
- en: When searching for FFmpeg installation, TorchAudio looks for library files which have names with version numbers. That is, `libavutil.so.<VERSION>` for Linux, `libavutil.<VERSION>.dylib` for macOS, and `avutil-<VERSION>.dll` for Windows.
@@ -104,144 +151,220 @@
  double check that the library files you installed follow this naming scheme (and then make sure that they are in one of the directories listed in the library search path).
+ id: totrans-22
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 在搜索FFmpeg安装时,TorchAudio会查找具有版本号的库文件。也就是说,对于Linux是`libavutil.so.<VERSION>`,对于macOS是`libavutil.<VERSION>.dylib`,对于Windows是`avutil-<VERSION>.dll`。许多公共预构建的二进制文件遵循这种命名方案,但有些发行版具有无版本号的文件名。如果您在检测FFmpeg时遇到困难,请仔细检查您安装的库文件是否遵循这种命名方案(然后确保它们位于列出的库搜索路径之一)。
- en: '[SoX](https://sox.sourceforge.net/)'
+ id: totrans-23
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[SoX](https://sox.sourceforge.net/)'
- en: Required to use `backend="sox"` in [I/O functions](./torchaudio.html#i-o).
+ id: totrans-24
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 在[I/O函数](./torchaudio.html#i-o)中需要使用`backend="sox"`。
- en: Starting version 2.1, TorchAudio requires separately installed libsox.
+ id: totrans-25
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 从版本2.1开始,TorchAudio需要单独安装libsox。
- en: If dynamic linking is causing an issue, you can set the environment variable `TORCHAUDIO_USE_SOX=0`, and TorchAudio won’t use SoX.
+ id: totrans-26
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 如果动态链接导致问题,您可以设置环境变量`TORCHAUDIO_USE_SOX=0`,TorchAudio将不使用SoX。
- en: Note
+ id: totrans-27
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 注意
- en: TorchAudio looks for a library file with an unversioned name, that is `libsox.so` for Linux, and `libsox.dylib` for macOS. Some package managers install the library file with a different name. For example, aptitude on Ubuntu installs `libsox.so.3`. To have TorchAudio link against it, you can create a symbolic link to it with the name `libsox.so` (and put the symlink in a library search path).
+ id: totrans-28
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: TorchAudio在Linux上寻找具有无版本名称的库文件,即`libsox.so`,在macOS上为`libsox.dylib`。一些软件包管理器使用不同的名称安装库文件。例如,Ubuntu上的aptitude安装了`libsox.so.3`。为了让TorchAudio链接到它,您可以创建一个名为`libsox.so`的符号链接(并将符号链接放在库搜索路径中)。
- en: Note
+ id: totrans-29
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 注意
- en: TorchAudio is tested on libsox 14.4.2\. (And it is unlikely that other versions would work.)
+ id: totrans-30
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: TorchAudio在libsox 14.4.2上进行了测试。 (其他版本可能不起作用。)
- en: '[SoundFile](https://pysoundfile.readthedocs.io/)'
+ id: totrans-31
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[SoundFile](https://pysoundfile.readthedocs.io/)'
- en: Required to use `backend="soundfile"` in [I/O functions](./torchaudio.html#i-o).
+ id: totrans-32
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 在[I/O函数](./torchaudio.html#i-o)中使用`backend="soundfile"`所需。
- en: '[sentencepiece](https://pypi.org/project/sentencepiece/)'
+ id: totrans-33
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[sentencepiece](https://pypi.org/project/sentencepiece/)'
- en: Required for performing automatic speech recognition with [Emformer RNN-T](pipelines.html#rnnt). You can install it by running `pip install sentencepiece`.
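Each of the optional I/O backends above (`ffmpeg`, `sox`, `soundfile`) becomes selectable once its dependency is installed. A hedged sketch of what that selection looks like at runtime; `example.wav` is a placeholder path, and the backend names come from the text above:

```python
import torchaudio

# Request a specific backend per call in the I/O functions; if the corresponding
# optional dependency is missing, the call is expected to fail.
metadata = torchaudio.info("example.wav", backend="soundfile")
waveform, sample_rate = torchaudio.load("example.wav", backend="soundfile")
print(metadata.num_frames, waveform.shape, sample_rate)
```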
+ id: totrans-34 prefs: - PREF_IND type: TYPE_NORMAL + zh: 使用[Emformer RNN-T](pipelines.html#rnnt)执行自动语音识别所需。您可以通过运行`pip install sentencepiece`来安装它。 - en: '[deep-phonemizer](https://pypi.org/project/deep-phonemizer/)' + id: totrans-35 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[deep-phonemizer](https://pypi.org/project/deep-phonemizer/)' - en: Required for performing text-to-speech with [Tacotron2 Text-To-Speech](pipelines.html#tacotron2). + id: totrans-36 prefs: - PREF_IND type: TYPE_NORMAL + zh: 使用[Tacotron2 Text-To-Speech](pipelines.html#tacotron2)执行文本转语音所需。 - en: '[kaldi_io](https://pypi.org/project/kaldi-io/)' + id: totrans-37 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[kaldi_io](https://pypi.org/project/kaldi-io/)' - en: Required to use [`torchaudio.kaldi_io`](kaldi_io.html#module-torchaudio.kaldi_io "torchaudio.kaldi_io") module. + id: totrans-38 prefs: - PREF_IND type: TYPE_NORMAL + zh: 使用[`torchaudio.kaldi_io`](kaldi_io.html#module-torchaudio.kaldi_io "torchaudio.kaldi_io")模块所需。 - en: Compatibility Matrix[](#compatibility-matrix "Permalink to this heading") + id: totrans-39 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 兼容性矩阵[](#compatibility-matrix "跳转到此标题") - en: The official binary distributions of TorchAudio contain extension modules which are written in C++ and linked against specific versions of PyTorch. + id: totrans-40 prefs: [] type: TYPE_NORMAL + zh: TorchAudio的官方二进制发行版包含用C++编写的扩展模块,并链接到特定版本的PyTorch。 - en: TorchAudio and PyTorch from different releases cannot be used together. Please refer to the following table for the matching versions. + id: totrans-41 prefs: [] type: TYPE_NORMAL + zh: TorchAudio和不同版本的PyTorch不能一起使用。请参考以下表格以获取匹配的版本。 - en: '| `PyTorch` | `TorchAudio` | `Python` |' + id: totrans-42 prefs: [] type: TYPE_TB + zh: '| `PyTorch` | `TorchAudio` | `Python` |' - en: '| --- | --- | --- |' + id: totrans-43 prefs: [] type: TYPE_TB + zh: '| --- | --- | --- |' - en: '| `2.1.0` | `2.1.0` | `>=3.8`, `<=3.11` |' + id: totrans-44 prefs: [] type: TYPE_TB + zh: '| `2.1.0` | `2.1.0` | `>=3.8`, `<=3.11` |' - en: '| `2.0.1` | `2.0.2` | `>=3.8`, `<=3.11` |' + id: totrans-45 prefs: [] type: TYPE_TB + zh: '| `2.0.1` | `2.0.2` | `>=3.8`, `<=3.11` |' - en: '| `2.0.0` | `2.0.1` | `>=3.8`, `<=3.11` |' + id: totrans-46 prefs: [] type: TYPE_TB + zh: '| `2.0.0` | `2.0.1` | `>=3.8`, `<=3.11` |' - en: '| `1.13.1` | `0.13.1` | `>=3.7`, `<=3.10` |' + id: totrans-47 prefs: [] type: TYPE_TB + zh: '| `1.13.1` | `0.13.1` | `>=3.7`, `<=3.10` |' - en: '| `1.13.0` | `0.13.0` | `>=3.7`, `<=3.10` |' + id: totrans-48 prefs: [] type: TYPE_TB + zh: '| `1.13.0` | `0.13.0` | `>=3.7`, `<=3.10` |' - en: '| `1.12.1` | `0.12.1` | `>=3.7`, `<=3.10` |' + id: totrans-49 prefs: [] type: TYPE_TB + zh: '| `1.12.1` | `0.12.1` | `>=3.7`, `<=3.10` |' - en: '| `1.12.0` | `0.12.0` | `>=3.7`, `<=3.10` |' + id: totrans-50 prefs: [] type: TYPE_TB + zh: '| `1.12.0` | `0.12.0` | `>=3.7`, `<=3.10` |' - en: '| `1.11.0` | `0.11.0` | `>=3.7`, `<=3.9` |' + id: totrans-51 prefs: [] type: TYPE_TB + zh: '| `1.11.0` | `0.11.0` | `>=3.7`, `<=3.9` |' - en: '| `1.10.0` | `0.10.0` | `>=3.6`, `<=3.9` |' + id: totrans-52 prefs: [] type: TYPE_TB + zh: '| `1.10.0` | `0.10.0` | `>=3.6`, `<=3.9` |' - en: '| `1.9.1` | `0.9.1` | `>=3.6`, `<=3.9` |' + id: totrans-53 prefs: [] type: TYPE_TB + zh: '| `1.9.1` | `0.9.1` | `>=3.6`, `<=3.9` |' - en: '| `1.8.1` | `0.8.1` | `>=3.6`, `<=3.9` |' + id: totrans-54 prefs: [] type: TYPE_TB + zh: '| `1.8.1` | `0.8.1` | `>=3.6`, `<=3.9` |' - en: '| `1.7.1` | `0.7.2` | `>=3.6`, `<=3.9` |' + id: totrans-55 prefs: [] 
  type: TYPE_TB
+ zh: '| `1.7.1` | `0.7.2` | `>=3.6`, `<=3.9` |'
- en: '| `1.7.0` | `0.7.0` | `>=3.6`, `<=3.8` |'
+ id: totrans-56
  prefs: []
  type: TYPE_TB
+ zh: '| `1.7.0` | `0.7.0` | `>=3.6`, `<=3.8` |'
- en: '| `1.6.0` | `0.6.0` | `>=3.6`, `<=3.8` |'
+ id: totrans-57
  prefs: []
  type: TYPE_TB
+ zh: '| `1.6.0` | `0.6.0` | `>=3.6`, `<=3.8` |'
- en: '| `1.5.0` | `0.5.0` | `>=3.5`, `<=3.8` |'
+ id: totrans-58
  prefs: []
  type: TYPE_TB
+ zh: '| `1.5.0` | `0.5.0` | `>=3.5`, `<=3.8` |'
- en: '| `1.4.0` | `0.4.0` | `==2.7`, `>=3.5`, `<=3.8` |'
+ id: totrans-59
  prefs: []
  type: TYPE_TB
+ zh: '| `1.4.0` | `0.4.0` | `==2.7`, `>=3.5`, `<=3.8` |'
diff --git a/totrans/aud22_09.yaml b/totrans/aud22_09.yaml
index 751ed201cdd17484fdd83095139f86cbf425a0b0..d8b09630953f8e9ba72bdf8f313199ec494ca525 100644
--- a/totrans/aud22_09.yaml
+++ b/totrans/aud22_09.yaml
@@ -1,103 +1,153 @@
- en: Building from source
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 从源代码构建
- en: 原文:[https://pytorch.org/audio/stable/build.html](https://pytorch.org/audio/stable/build.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/build.html](https://pytorch.org/audio/stable/build.html)
- en: TorchAudio integrates PyTorch for numerical computation and third party libraries for multimedia I/O. It requires the following tools to build from source.
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: TorchAudio集成了PyTorch进行数值计算和第三方库进行多媒体I/O。构建源代码需要以下工具。
- en: '[PyTorch](https://pytorch.org)'
+ id: totrans-3
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[PyTorch](https://pytorch.org)'
- en: '[CMake](https://cmake.org/)'
+ id: totrans-4
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[CMake](https://cmake.org/)'
- en: '[Ninja](https://ninja-build.org/)'
+ id: totrans-5
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[Ninja](https://ninja-build.org/)'
- en: C++ compiler with C++ 17 support
+ id: totrans-6
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 具有C++ 17支持的C++编译器
- en: '[GCC](https://gcc.gnu.org/) (Linux)'
+ id: totrans-7
  prefs:
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[GCC](https://gcc.gnu.org/)(Linux)'
- en: '[Clang](https://clang.llvm.org/) (macOS)'
+ id: totrans-8
  prefs:
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[Clang](https://clang.llvm.org/)(macOS)'
- en: '[MSVC](https://visualstudio.microsoft.com) 2019 or newer (Windows)'
+ id: totrans-9
  prefs:
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[MSVC](https://visualstudio.microsoft.com) 2019或更新版本(Windows)'
- en: '[CUDA toolkit](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn) (if building the CUDA extension)'
+ id: totrans-10
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[CUDA工具包](https://developer.nvidia.com/cuda-toolkit)和[cuDNN](https://developer.nvidia.com/cudnn)(如果构建CUDA扩展)'
- en: Most of the tools are available in [Conda](https://conda.io/), so we recommend using conda.
+ id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: 大多数工具都可以在[Conda](https://conda.io/)中找到,因此我们建议使用conda。 - en: '[Building on Linux and macOS](build.linux.html)' + id: totrans-12 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[在Linux和macOS上构建](build.linux.html)' - en: '[Building on Windows](build.windows.html)' + id: totrans-13 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[在Windows上构建](build.windows.html)' - en: '[Building on Jetson](build.jetson.html)' + id: totrans-14 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[在Jetson上构建](build.jetson.html)' - en: Customizing the build[](#customizing-the-build "Permalink to this heading") + id: totrans-15 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 自定义构建[](#customizing-the-build "Permalink to this heading") - en: TorchAudio’s integration with third party libraries can be enabled/disabled via environment variables. + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: 通过环境变量可以启用/禁用TorchAudio与第三方库的集成。 - en: They can be enabled by passing `1` and disabled by `0`. + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: 可以通过传递`1`来启用,通过`0`来禁用。 - en: '`BUILD_SOX`: Enable/disable I/O features based on libsox.' + id: totrans-18 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`BUILD_SOX`: 基于libsox的I/O功能的启用/禁用。' - en: '`BUILD_KALDI`: Enable/disable feature extraction based on Kaldi.' + id: totrans-19 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`BUILD_KALDI`: 基于Kaldi的特征提取的启用/禁用。' - en: '`BUILD_RNNT`: Enable/disable custom RNN-T loss function.' + id: totrans-20 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`BUILD_RNNT`: 启用/禁用自定义RNN-T损失函数。' - en: '`USE_FFMPEG`: Enable/disable I/O features based on FFmpeg libraries.' + id: totrans-21 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`USE_FFMPEG`: 基于FFmpeg库的I/O功能的启用/禁用。' - en: '`USE_ROCM`: Enable/disable AMD ROCm support.' + id: totrans-22 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`USE_ROCM`: 启用/禁用AMD ROCm支持。' - en: '`USE_CUDA`: Enable/disable CUDA support.' + id: totrans-23 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`USE_CUDA`: 启用/禁用CUDA支持。' - en: For the latest configurations and their default values, please check the source code. [https://github.com/pytorch/audio/blob/main/tools/setup_helpers/extension.py](https://github.com/pytorch/audio/blob/main/tools/setup_helpers/extension.py) + id: totrans-24 prefs: [] type: TYPE_NORMAL + zh: 有关最新配置及其默认值,请查看源代码。[https://github.com/pytorch/audio/blob/main/tools/setup_helpers/extension.py](https://github.com/pytorch/audio/blob/main/tools/setup_helpers/extension.py) diff --git a/totrans/aud22_10.yaml b/totrans/aud22_10.yaml index e943dad7c8ece15481da3ff58f84fd48be3c0779..b12142e9170d29395200eed3ff380e7f532b6c0b 100644 --- a/totrans/aud22_10.yaml +++ b/totrans/aud22_10.yaml @@ -1,84 +1,129 @@ - en: Building on Linux and macOS + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 在Linux和macOS上构建 - en: 原文:[https://pytorch.org/audio/stable/build.linux.html](https://pytorch.org/audio/stable/build.linux.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/build.linux.html](https://pytorch.org/audio/stable/build.linux.html) - en: 1\. Install Conda and activate conda environment[](#install-conda-and-activate-conda-environment "Permalink to this heading") + id: totrans-2 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 1\. 
安装Conda并激活conda环境[](#install-conda-and-activate-conda-environment "此标题的永久链接")
- en: Please follow the instructions at [https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html)
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 请按照[https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html)上的说明操作
- en: 2\. Install PyTorch[](#install-pytorch "Permalink to this heading")
+ id: totrans-4
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 2\. 安装PyTorch[](#install-pytorch "此标题的永久链接")
- en: Please select the version of PyTorch you want to install from [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 请从[https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)选择要安装的PyTorch版本
- en: Here, we install the nightly build.
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: 在这里,我们安装夜间构建。
- en: '[PRE0]'
+ id: totrans-7
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: 3\. Install build tools[](#install-build-tools "Permalink to this heading")
+ id: totrans-8
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 3\. 安装构建工具[](#install-build-tools "此标题的永久链接")
- en: '[PRE1]'
+ id: totrans-9
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: 4\. Clone the torchaudio repository[](#clone-the-torchaudio-repository "Permalink to this heading")
+ id: totrans-10
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 4\. 克隆torchaudio存储库[](#clone-the-torchaudio-repository "此标题的永久链接")
- en: '[PRE2]'
+ id: totrans-11
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: 5\. Build[](#build "Permalink to this heading")
+ id: totrans-12
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 5\. 构建[](#build "此标题的永久链接")
- en: '[PRE3]'
+ id: totrans-13
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: Note
+ id: totrans-14
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: Due to the complexity of the build process, TorchAudio only supports in-place builds. To use `pip`, please use the `--no-use-pep517` option.
+ id: totrans-15
  prefs: []
  type: TYPE_NORMAL
+ zh: 由于构建过程的复杂性,TorchAudio仅支持原地构建。要使用`pip`,请使用`--no-use-pep517`选项。
- en: '`pip install -v -e . --no-use-pep517`'
+ id: totrans-16
  prefs: []
  type: TYPE_NORMAL
+ zh: '`pip install -v -e . --no-use-pep517`'
- en: '[Optional] Build TorchAudio with a custom built FFmpeg[](#optional-build-torchaudio-with-a-custom-built-ffmpeg "Permalink to this heading")'
+ id: totrans-17
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: '[可选] 使用自定义构建的FFmpeg构建TorchAudio[](#optional-build-torchaudio-with-a-custom-built-ffmpeg "此标题的永久链接")'
- en: By default, torchaudio tries to build the FFmpeg extension with support for multiple FFmpeg versions. This process uses pre-built FFmpeg libraries compiled for specific CPU architectures like `x86_64` and `aarch64` (`arm64`).
+ id: totrans-18
  prefs: []
  type: TYPE_NORMAL
+ zh: 默认情况下,torchaudio尝试构建支持多个FFmpeg版本的FFmpeg扩展。此过程使用为特定CPU架构(如`x86_64`和`aarch64`(`arm64`))编译的预构建FFmpeg库。
- en: If your CPU is not one of those, then the build process can fail. To work around this, one can disable the FFmpeg integration (by setting the environment variable `USE_FFMPEG=0`) or switch to the single-version FFmpeg extension.
+ id: totrans-19
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果您的CPU不是其中之一,则构建过程可能会失败。为了解决问题,可以禁用FFmpeg集成(通过设置环境变量`USE_FFMPEG=0`)或切换到单版本FFmpeg扩展。
- en: To build the single-version FFmpeg extension, FFmpeg binaries must be provided by the user and be available in the build environment. To do so, install FFmpeg and set the `FFMPEG_ROOT` environment variable to specify the location of FFmpeg.
+ id: totrans-20
  prefs: []
  type: TYPE_NORMAL
+ zh: 要构建单版本FFmpeg扩展,用户必须提供FFmpeg二进制文件,并在构建环境中可用。为此,请安装FFmpeg并设置`FFMPEG_ROOT`环境变量以指定FFmpeg的位置。
- en: '[PRE4]'
+ id: totrans-21
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
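Since the switches above (`USE_FFMPEG`, `FFMPEG_ROOT`) are ordinary environment variables, a scripted build can set them before invoking the in-place install command quoted above. A sketch of that idea, assuming it is run from the root of the cloned repository; the `/opt/ffmpeg` location is hypothetical:

```python
import os
import subprocess
import sys

env = os.environ.copy()
# Either skip the FFmpeg integration entirely ...
env["USE_FFMPEG"] = "0"
# ... or build the single-version extension against a custom FFmpeg install:
# env["FFMPEG_ROOT"] = "/opt/ffmpeg"  # hypothetical location

# In-place build, as required by TorchAudio.
subprocess.run(
    [sys.executable, "-m", "pip", "install", "-v", "-e", ".", "--no-use-pep517"],
    env=env,
    check=True,
)
```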
diff --git a/totrans/aud22_11.yaml b/totrans/aud22_11.yaml
index 390c72b985b21047327d1ce57c44c22105ef2779..68bc74d792edd8b6b1f89a81b60a73d0750680ce 100644
--- a/totrans/aud22_11.yaml
+++ b/totrans/aud22_11.yaml
@@ -1,322 +1,491 @@
- en: Building on Windows
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 在Windows上构建
- en: 原文:[https://pytorch.org/audio/stable/build.windows.html](https://pytorch.org/audio/stable/build.windows.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/build.windows.html](https://pytorch.org/audio/stable/build.windows.html)
- en: To build TorchAudio on Windows, we need to enable the C++ compiler and install build tools and runtime dependencies.
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 要在Windows上构建TorchAudio,我们需要启用C++编译器并安装构建工具和运行时依赖。
- en: We use Microsoft Visual C++ for compiling C++ and Conda for managing the other build tools and runtime dependencies.
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们使用Microsoft Visual C++来编译C++代码,使用Conda来管理其他构建工具和运行时依赖。
- en: 1\. Install build tools[](#install-build-tools "Permalink to this heading")
+ id: totrans-4
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 1\. 安装构建工具[](#install-build-tools "此标题的永久链接")
- en: MSVC[](#msvc "Permalink to this heading")
+ id: totrans-5
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: MSVC[](#msvc "此标题的永久链接")
- en: Please follow the instructions at [https://visualstudio.microsoft.com/downloads/](https://visualstudio.microsoft.com/downloads/), and make sure to install the C++ development tools.
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: 请按照[https://visualstudio.microsoft.com/downloads/](https://visualstudio.microsoft.com/downloads/)上的说明操作,并确保安装了C++开发工具。
- en: Note
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: The official binary distributions are compiled with MSVC 2019. The following section uses paths from MSVC 2019 Community Edition.
+ id: totrans-8
  prefs: []
  type: TYPE_NORMAL
+ zh: 官方的二进制发行版是使用MSVC 2019编译的。以下部分使用的路径来自于MSVC 2019社区版。
- en: Conda[](#conda "Permalink to this heading")
+ id: totrans-9
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: Conda[](#conda "此标题的永久链接")
- en: Please follow the instructions at [https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html).
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 请按照[https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html)上的说明操作。
- en: 2\. Start the dev environment[](#start-the-dev-environment "Permalink to this heading")
+ id: totrans-11
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 2\. 启动开发环境[](#start-the-dev-environment "此标题的永久链接")
- en: In the following, we need to use the C++ compiler (`cl`) and the Conda package manager (`conda`). We also use Bash for the sake of a similar experience to Linux/macOS.
+ id: totrans-12
  prefs: []
  type: TYPE_NORMAL
+ zh: 在接下来的步骤中,我们需要使用C++编译器(`cl`)和Conda包管理器(`conda`)。我们还使用Bash以便与Linux/macOS有类似的体验。
- en: To do so, the following three steps are required.
+ id: totrans-13
  prefs: []
  type: TYPE_NORMAL
+ zh: 为此,需要执行以下三个步骤。
- en: Open command prompt
+ id: totrans-14
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 打开命令提示符
- en: Enable developer environment
+ id: totrans-15
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 启用开发环境
- en: '[Optional] Launch bash'
+ id: totrans-16
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: '[可选] 启动Bash'
- en: The following combination is known to work.
+ id: totrans-17
  prefs: []
  type: TYPE_NORMAL
+ zh: 以下组合已知可行。
- en: Launch Anaconda3 Command Prompt.
+ id: totrans-18
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 启动Anaconda3命令提示符。
- en: '[![https://download.pytorch.org/torchaudio/doc-assets/windows-conda.png](../Images/e359bffec700153e5b0c8c00a8b001f7.png)](https://download.pytorch.org/torchaudio/doc-assets/windows-conda.png)'
+ id: totrans-19
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: '[![https://download.pytorch.org/torchaudio/doc-assets/windows-conda.png](../Images/e359bffec700153e5b0c8c00a8b001f7.png)](https://download.pytorch.org/torchaudio/doc-assets/windows-conda.png)'
- en: Please make sure that `conda` command is recognized.
+ id: totrans-20
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 请确保`conda`命令被识别。
- en: '[![https://download.pytorch.org/torchaudio/doc-assets/windows-conda2.png](../Images/13a95ff6452fc2a52bb6a6b9bf666630.png)](https://download.pytorch.org/torchaudio/doc-assets/windows-conda2.png)'
+ id: totrans-21
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: '[![https://download.pytorch.org/torchaudio/doc-assets/windows-conda2.png](../Images/13a95ff6452fc2a52bb6a6b9bf666630.png)](https://download.pytorch.org/torchaudio/doc-assets/windows-conda2.png)'
- en: Activate dev tools by running the following command.
+ id: totrans-22
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 通过运行以下命令激活开发工具。
- en: We need to use the MSVC x64 toolset for compilation. To enable the toolset, one can use the `vcvarsall.bat` or `vcvars64.bat` files, which are found under Visual Studio’s installation folder, under `VC\Auxiliary\Build\`. More information is available at [https://docs.microsoft.com/en-us/cpp/build/how-to-enable-a-64-bit-visual-cpp-toolset-on-the-command-line?view=msvc-160#use-vcvarsallbat-to-set-a-64-bit-hosted-build-architecture](https://docs.microsoft.com/en-us/cpp/build/how-to-enable-a-64-bit-visual-cpp-toolset-on-the-command-line?view=msvc-160#use-vcvarsallbat-to-set-a-64-bit-hosted-build-architecture)
+ id: totrans-23
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 我们需要使用MSVC x64工具集进行编译。要启用该工具集,可以使用`vcvarsall.bat`或`vcvars64.bat`文件,这些文件位于Visual Studio的安装文件夹下的`VC\Auxiliary\Build\`目录中。更多信息请参考[https://docs.microsoft.com/en-us/cpp/build/how-to-enable-a-64-bit-visual-cpp-toolset-on-the-command-line?view=msvc-160#use-vcvarsallbat-to-set-a-64-bit-hosted-build-architecture](https://docs.microsoft.com/en-us/cpp/build/how-to-enable-a-64-bit-visual-cpp-toolset-on-the-command-line?view=msvc-160#use-vcvarsallbat-to-set-a-64-bit-hosted-build-architecture)
- en: '[PRE0]'
+ id: totrans-24
  prefs:
  - PREF_IND
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: Please make sure that `cl` command is recognized.
+ id: totrans-25
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 请确保`cl`命令被识别。
- en: '[![https://download.pytorch.org/torchaudio/doc-assets/windows-msvc.png](../Images/323d3a6ff776378e8f39d87a6893379c.png)](https://download.pytorch.org/torchaudio/doc-assets/windows-msvc.png)'
+ id: totrans-26
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: '[![https://download.pytorch.org/torchaudio/doc-assets/windows-msvc.png](../Images/323d3a6ff776378e8f39d87a6893379c.png)](https://download.pytorch.org/torchaudio/doc-assets/windows-msvc.png)'
- en: '[Optional] Launch bash with the following command.'
+ id: totrans-27
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: '[可选] 使用以下命令启动Bash。'
- en: If you want a similar UX to Linux/macOS, you can launch Bash. However, please note that in the Bash environment, the file paths are different from the native Windows style, and the `torchaudio.datasets` module does not work.
+ id: totrans-28
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 如果您想要与Linux/macOS类似的用户体验,可以启动Bash。但请注意,在Bash环境中,文件路径与本机Windows风格不同,并且`torchaudio.datasets`模块不起作用。
- en: '[PRE1]'
+ id: totrans-29
  prefs:
  - PREF_IND
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: '[![https://download.pytorch.org/torchaudio/doc-assets/windows-bash.png](../Images/c02c1db4f464de7562d28e7eb2f1f87a.png)](https://download.pytorch.org/torchaudio/doc-assets/windows-bash.png)'
+ id: totrans-30
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: '[![https://download.pytorch.org/torchaudio/doc-assets/windows-bash.png](../Images/c02c1db4f464de7562d28e7eb2f1f87a.png)](https://download.pytorch.org/torchaudio/doc-assets/windows-bash.png)'
- en: 3\. Install PyTorch[](#install-pytorch "Permalink to this heading")
+ id: totrans-31
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 3\. 安装PyTorch[](#install-pytorch "此标题的永久链接")
- en: Please refer to [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/) for the up-to-date way to install PyTorch.
+ id: totrans-32
  prefs: []
  type: TYPE_NORMAL
+ zh: 请参考[https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)以获取安装PyTorch的最新方法。
- en: The following command installs the nightly build version of PyTorch.
+ id: totrans-33
  prefs: []
  type: TYPE_NORMAL
+ zh: 以下命令安装PyTorch的夜间构建版本。
- en: '[PRE2]'
+ id: totrans-34
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: When installing the CUDA-enabled version, it also installs the CUDA toolkit.
+ id: totrans-35
  prefs: []
  type: TYPE_NORMAL
+ zh: 在安装启用CUDA版本时,也会安装CUDA工具包。
- en: 4\. [Optional] cuDNN[](#optional-cudnn "Permalink to this heading")
+ id: totrans-36
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 4\. [可选] cuDNN[](#optional-cudnn "此标题的永久链接")
- en: If you intend to build CUDA-related features, please install cuDNN.
+ id: totrans-37
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果您打算构建与CUDA相关的功能,请安装cuDNN。
- en: Download cuDNN from [https://developer.nvidia.com/cudnn](https://developer.nvidia.com/cudnn), and extract the files into the same directories as the CUDA toolkit.
+ id: totrans-38
  prefs: []
  type: TYPE_NORMAL
+ zh: 从[https://developer.nvidia.com/cudnn](https://developer.nvidia.com/cudnn)下载CuDNN,并将文件提取到与CUDA工具包相同的目录中。
- en: When using conda, the directories are `${CONDA_PREFIX}/bin`, `${CONDA_PREFIX}/include`, `${CONDA_PREFIX}/Lib/x64`.
+ id: totrans-39
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用conda时,目录为`${CONDA_PREFIX}/bin`,`${CONDA_PREFIX}/include`,`${CONDA_PREFIX}/Lib/x64`。
- en: 5\. Install external dependencies[](#install-external-dependencies "Permalink to this heading")
+ id: totrans-40
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 5.
安装外部依赖项 - en: '[PRE3]' + id: totrans-41 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: 6\. Build TorchAudio[](#build-torchaudio "Permalink to this heading") + id: totrans-42 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 6. 构建TorchAudio - en: Now that we have everything ready, we can build TorchAudio. + id: totrans-43 prefs: [] type: TYPE_NORMAL + zh: 现在我们已经准备好了,可以构建TorchAudio了。 - en: '[PRE4]' + id: totrans-44 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: '[PRE5]' + id: totrans-45 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: '[PRE6]' + id: totrans-46 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: Note + id: totrans-47 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Due to the complexity of build process, TorchAudio only supports in-place build. To use `pip`, please use `--no-use-pep517` option. + id: totrans-48 prefs: [] type: TYPE_NORMAL + zh: 由于构建过程的复杂性,TorchAudio仅支持原地构建。要使用`pip`,请使用`--no-use-pep517`选项。 - en: '`pip install -v -e . --no-use-pep517`' + id: totrans-49 prefs: [] type: TYPE_NORMAL + zh: '`pip install -v -e . --no-use-pep517`' - en: '[Optional] Build TorchAudio with a custom FFmpeg[](#optional-build-torchaudio-with-a-custom-ffmpeg "Permalink to this heading")' + id: totrans-50 prefs: - PREF_H2 type: TYPE_NORMAL + zh: '[可选] 使用自定义FFmpeg构建TorchAudio' - en: By default, torchaudio tries to build FFmpeg extension with support for multiple FFmpeg versions. This process uses pre-built FFmpeg libraries compiled for specific CPU architectures like `x86_64`. + id: totrans-51 prefs: [] type: TYPE_NORMAL + zh: 默认情况下,torchaudio尝试构建支持多个FFmpeg版本的FFmpeg扩展。此过程使用为特定CPU架构编译的预构建FFmpeg库,如`x86_64`。 - en: If your CPU is different, then the build process can fail. To workaround, one can disable FFmpeg integration (by setting the environment variable `USE_FFMPEG=0`) or switch to the single version FFmpeg extension. + id: totrans-52 prefs: [] type: TYPE_NORMAL + zh: 如果您的CPU不同,那么构建过程可能会失败。为了解决问题,可以禁用FFmpeg集成(通过设置环境变量`USE_FFMPEG=0`)或切换到单版本FFmpeg扩展。 - en: To build single version FFmpeg extension, FFmpeg binaries must be provided by user and available in the build environment. To do so, install FFmpeg and set `FFMPEG_ROOT` environment variable to specify the location of FFmpeg. + id: totrans-53 prefs: [] type: TYPE_NORMAL + zh: 要构建单版本FFmpeg扩展,用户必须提供FFmpeg二进制文件,并且在构建环境中可用。为此,请安装FFmpeg并设置`FFMPEG_ROOT`环境变量以指定FFmpeg的位置。 - en: '[PRE7]' + id: totrans-54 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: '[Optional] Building FFmpeg from source[](#optional-building-ffmpeg-from-source "Permalink to this heading")' + id: totrans-55 prefs: - PREF_H2 type: TYPE_NORMAL + zh: '[可选] 从源代码构建FFmpeg' - en: The following section illustrates a way to build FFmpeg libraries from source. + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: 以下部分说明了从源代码构建FFmpeg库的方法。 - en: Conda-forge’s FFmpeg package comes with support for major codecs and GPU decoders, so regular users and developers do not need to build FFmpeg from source. + id: totrans-57 prefs: [] type: TYPE_NORMAL + zh: Conda-forge的FFmpeg软件包具有对主要编解码器和GPU解码器的支持,因此常规用户和开发人员不需要从源代码构建FFmpeg。 - en: If you are not using Conda, then you can either find a pre-built binary distribution or build FFmpeg by yourself. + id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: 如果您不使用Conda,则可以找到预构建的二进制发行版,或者自己构建FFmpeg。 - en: Also, in case torchaudio developer needs to update and customize the CI for FFmpeg build, this section might be helpful. + id: totrans-59 prefs: [] type: TYPE_NORMAL + zh: 此外,如果torchaudio开发人员需要更新和定制FFmpeg构建的CI,本节可能会有所帮助。 - en: 1\. 
Install MSYS2[](#install-msys2 "Permalink to this heading")
+ id: totrans-60
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 1. 安装MSYS2
- en: To build FFmpeg in a way that is usable from the TorchAudio development environment, we need to build binaries native to `MINGW64`. To do so, we need the tools required by FFmpeg’s build process, such as `pkg-config` and `make`, that work in the `MINGW64` environment. For this purpose, we use MSYS2.
+ id: totrans-61
  prefs: []
  type: TYPE_NORMAL
+ zh: 为了以一种在TorchAudio开发环境中可用的方式构建FFmpeg,我们需要构建适用于`MINGW64`的本机二进制文件。为此,我们需要FFmpeg构建过程所需的工具,如在`MINGW64`环境中工作的`pkg-config`和`make`。为此目的,我们使用MSYS2。
- en: FFmpeg’s official documentation touches on this at [https://trac.ffmpeg.org/wiki/CompilationGuide/MinGW](https://trac.ffmpeg.org/wiki/CompilationGuide/MinGW)
+ id: totrans-62
  prefs: []
  type: TYPE_NORMAL
+ zh: FFmpeg的官方文档涉及到这一点[https://trac.ffmpeg.org/wiki/CompilationGuide/MinGW](https://trac.ffmpeg.org/wiki/CompilationGuide/MinGW)
- en: Please follow the instructions at [https://www.msys2.org/](https://www.msys2.org/) to install MSYS2.
+ id: totrans-63
  prefs: []
  type: TYPE_NORMAL
+ zh: 请按照[https://www.msys2.org/](https://www.msys2.org/)上的说明安装MSYS2。
- en: Note
+ id: totrans-64
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: In CI environments, [Chocolatey](https://chocolatey.org/) can often be used to install MSYS2.
+ id: totrans-65
  prefs: []
  type: TYPE_NORMAL
+ zh: 在CI环境中,通常可以使用[Chocolatey](https://chocolatey.org/)来安装MSYS2。
- en: 2\. Launch MSYS2[](#launch-msys2 "Permalink to this heading")
+ id: totrans-66
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 2. 启动MSYS2
- en: Use the shortcut to launch MSYS2 (MINGW64).
+ id: totrans-67
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用快捷方式启动MSYS2(MINGW64)。
- en: '[![https://download.pytorch.org/torchaudio/doc-assets/windows-msys2.png](../Images/59237156547c1a97b95f4271157a9c1e.png)](https://download.pytorch.org/torchaudio/doc-assets/windows-msys2.png)'
+ id: totrans-68
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![https://download.pytorch.org/torchaudio/doc-assets/windows-msys2.png](../Images/59237156547c1a97b95f4271157a9c1e.png)](https://download.pytorch.org/torchaudio/doc-assets/windows-msys2.png)'
- en: Note
+ id: totrans-69
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: The Bash environment in MSYS2 does not play well with Conda env, so do not add the Conda initialization script to `~/.bashrc` of the MSYS2 environment (i.e. `C:\msys2\home\USER\.bashrc`). Instead, add it to `C:\Users\USER\.bashrc`.
+ id: totrans-70
  prefs: []
  type: TYPE_NORMAL
+ zh: MSYS2中的Bash环境与Conda环境不兼容,因此不要在MSYS2环境的`~/.bashrc`中添加Conda初始化脚本(即`C:\msys2\home\USER\.bashrc`)。而是将其添加到`C:\Users\USER\.bashrc`中。
- en: 3\. Install build tools[](#id1 "Permalink to this heading")
+ id: totrans-71
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 3. 安装构建工具
- en: '[PRE8]'
+ id: totrans-72
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE8]'
- en: 'After the installation, you should have packages similar to the following:'
+ id: totrans-73
  prefs: []
  type: TYPE_NORMAL
+ zh: 安装完成后,您应该有类似以下的软件包;
- en: '[PRE9]'
+ id: totrans-74
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE9]'
- en: 4\. Build FFmpeg[](#build-ffmpeg "Permalink to this heading")
+ id: totrans-75
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 4. 构建FFmpeg
- en: Check out the FFmpeg source code.
+ id: totrans-76 prefs: [] type: TYPE_NORMAL + zh: 查看FFmpeg源代码。 - en: '[PRE10]' + id: totrans-77 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: Build + id: totrans-78 prefs: [] type: TYPE_NORMAL + zh: 构建 - en: '[PRE11]' + id: totrans-79 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: If the build succeeds, `ffmpeg.exe` should be found in the same directory. Make sure that you can run it. + id: totrans-80 prefs: [] type: TYPE_NORMAL + zh: 如果构建成功,`ffmpeg.exe`应该在同一目录中找到。确保您可以运行它。 - en: 5\. Verify the build[](#verify-the-build "Permalink to this heading") + id: totrans-81 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 5. 验证构建 - en: Check that the resulting FFmpeg binary is accessible from Conda env + id: totrans-82 prefs: [] type: TYPE_NORMAL + zh: 检查生成的FFmpeg二进制文件是否可以从Conda环境访问。 - en: Now launch a new command prompt and enable the TorchAudio development environment. Make sure that you can run the `ffmpeg.exe` command generated in the previous step. + id: totrans-83 prefs: [] type: TYPE_NORMAL + zh: 现在启动一个新的命令提示符并启用TorchAudio开发环境。确保您可以运行在上一步生成的`ffmpeg.exe`命令。 diff --git a/totrans/aud22_12.yaml b/totrans/aud22_12.yaml index 5305d272ae9c34aedf18edae1fd11fff6a5c0385..74637cf45a2afddeccad4c594c9beb476e29fbbd 100644 --- a/totrans/aud22_12.yaml +++ b/totrans/aud22_12.yaml @@ -1,136 +1,212 @@ - en: Building on Jetson + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 在Jetson上构建 - en: 原文:[https://pytorch.org/audio/stable/build.jetson.html](https://pytorch.org/audio/stable/build.jetson.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/build.jetson.html](https://pytorch.org/audio/stable/build.jetson.html) - en: 1\. Install JetPack[](#install-jetpack "Permalink to this heading") + id: totrans-2 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 1\. 安装JetPack[](#install-jetpack "Permalink to this heading") - en: JetPack includes the collection of CUDA-related libraries that is required to run PyTorch with CUDA. + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: JetPack包括了运行带有CUDA的PyTorch所需的CUDA相关库的集合。 - en: Please refer to [https://developer.nvidia.com/embedded/learn/get-started-jetson-agx-orin-devkit](https://developer.nvidia.com/embedded/learn/get-started-jetson-agx-orin-devkit) for the up-to-date instruction. + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: 请参考[https://developer.nvidia.com/embedded/learn/get-started-jetson-agx-orin-devkit](https://developer.nvidia.com/embedded/learn/get-started-jetson-agx-orin-devkit)获取最新的指导。 - en: '[PRE0]' + id: totrans-5 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: Checking the versions[](#checking-the-versions "Permalink to this heading") + id: totrans-6 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 检查版本[](#checking-the-versions "Permalink to this heading") - en: To check the version installed you can use the following commands; + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: 要检查已安装的版本,可以使用以下命令; - en: '[PRE1]' + id: totrans-8 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: '[![https://download.pytorch.org/torchaudio/doc-assets/jetson-package-versions.png](../Images/510d69555d6f8cadc50c29ad61243630.png)](https://download.pytorch.org/torchaudio/doc-assets/jetson-package-versions.png)' + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: '[![https://download.pytorch.org/torchaudio/doc-assets/jetson-package-versions.png](../Images/510d69555d6f8cadc50c29ad61243630.png)](https://download.pytorch.org/torchaudio/doc-assets/jetson-package-versions.png)' - en: 2\. 
[Optional] Install jtop[](#optional-install-jtop "Permalink to this heading") + id: totrans-10 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 2\. [可选] 安装jtop[](#optional-install-jtop "Permalink to this heading") - en: Since Tegra GPUs are not supported by the `nvidia-smi` command, it is recommended to install `jtop`. + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: 由于`nvidia-smi`命令不支持Tegra GPU,建议安装`jtop`。 - en: Only the super-user can install `jtop`. So make sure to add `-U`, so that running `jtop` won’t require super-user privileges. + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: 只有超级用户才能安装`jtop`。因此,请确保添加`-U`,这样运行`jtop`不需要超级用户权限。 - en: 3\. Install `pip` in user env[](#install-pip-in-user-env "Permalink to this heading") + id: totrans-13 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 3\. 在用户环境中安装`pip`[](#install-pip-in-user-env "Permalink to this heading") - en: By default, the `pip` / `pip3` commands use the ones from the system directory `/usr/bin/`, and its `site-packages` directory is protected and cannot be modified without `sudo`. + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: 默认情况下,`pip` / `pip3`命令使用系统目录`/usr/bin/`中的命令,并且其`site-packages`目录受保护,无法在没有`sudo`的情况下修改。 - en: One way to work around this is to install `pip` in the user directory. + id: totrans-15 prefs: [] type: TYPE_NORMAL + zh: 解决此问题的一种方法是在用户目录中安装`pip`。 - en: '[https://forums.developer.nvidia.com/t/python-3-module-install-folder/181321](https://forums.developer.nvidia.com/t/python-3-module-install-folder/181321)' + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: '[https://forums.developer.nvidia.com/t/python-3-module-install-folder/181321](https://forums.developer.nvidia.com/t/python-3-module-install-folder/181321)' - en: '[PRE2]' + id: totrans-17 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: After this, verify that the `pip` command is pointing to the one in the user directory. + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: 之后,请验证`pip`命令是否指向用户目录中的命令。 - en: '[PRE3]' + id: totrans-19 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: 4\. Install PyTorch[](#install-pytorch "Permalink to this heading") + id: totrans-20 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 4\. 安装PyTorch[](#install-pytorch "Permalink to this heading") - en: As of PyTorch 1.13 and torchaudio 0.13, there are no official pre-built binaries for Linux ARM64\. NVIDIA provides custom pre-built binaries for PyTorch, which work with specific JetPack versions. + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 截至PyTorch 1.13和torchaudio 0.13,Linux ARM64没有官方预构建的二进制文件。NVIDIA提供了适用于特定JetPack的自定义预构建的PyTorch二进制文件。 - en: Please refer to [https://docs.nvidia.com/deeplearning/frameworks/install-pytorch-jetson-platform/index.html](https://docs.nvidia.com/deeplearning/frameworks/install-pytorch-jetson-platform/index.html) for up-to-date instructions on how to install PyTorch. + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: 请参考[https://docs.nvidia.com/deeplearning/frameworks/install-pytorch-jetson-platform/index.html](https://docs.nvidia.com/deeplearning/frameworks/install-pytorch-jetson-platform/index.html)获取有关如何安装PyTorch的最新指导。 - en: '[PRE4]' + id: totrans-23 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: Verify the installation by checking the version and CUDA device accessibility.
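A minimal sketch of such a check, assuming PyTorch was installed into the currently active environment (the exact version string and wheel name differ per JetPack release):

```python
# Sanity check for the PyTorch install on Jetson (a sketch, not the
# official verification script): print the version and confirm that a
# CUDA device is visible and usable.
import torch

print(torch.__version__)          # e.g. a JetPack-specific build string
print(torch.cuda.is_available())  # expected: True on a correctly set-up Jetson

x = torch.rand(3, 3, device="cuda")  # allocate a small tensor on the GPU
print(x.device, x.sum().item())      # confirms kernels actually run on CUDA
```

If `torch.cuda.is_available()` returns `False`, revisit the JetPack/PyTorch pairing before building TorchAudio.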
+ id: totrans-24 prefs: [] type: TYPE_NORMAL + zh: 通过检查版本和CUDA设备的可访问性来验证安装。 - en: '[PRE5]' + id: totrans-25 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: '[![https://download.pytorch.org/torchaudio/doc-assets/jetson-torch.png](../Images/a3a5fbe3614beb0175742530a928b956.png)](https://download.pytorch.org/torchaudio/doc-assets/jetson-torch.png)' + id: totrans-26 prefs: [] type: TYPE_NORMAL + zh: '[![https://download.pytorch.org/torchaudio/doc-assets/jetson-torch.png](../Images/a3a5fbe3614beb0175742530a928b956.png)](https://download.pytorch.org/torchaudio/doc-assets/jetson-torch.png)' - en: 5\. Build TorchAudio[](#build-torchaudio "Permalink to this heading") + id: totrans-27 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 5\. 构建TorchAudio[](#build-torchaudio "Permalink to this heading") - en: 1\. Install build tools[](#install-build-tools "Permalink to this heading") + id: totrans-28 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 1\. 安装构建工具[](#install-build-tools "Permalink to this heading") - en: '[PRE6]' + id: totrans-29 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: 2\. Install dependencies[](#install-dependencies "Permalink to this heading") + id: totrans-30 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 2\. 安装依赖项[](#install-dependencies "Permalink to this heading") - en: '[PRE7]' + id: totrans-31 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: 3\. Build TorchAudio[](#id1 "Permalink to this heading") + id: totrans-32 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 3\. 构建TorchAudio[](#id1 "Permalink to this heading") - en: '[PRE8]' + id: totrans-33 prefs: [] type: TYPE_PRE + zh: '[PRE8]' - en: 4\. Check the installation[](#check-the-installation "Permalink to this heading") + id: totrans-34 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 4\. 检查安装[](#check-the-installation "Permalink to this heading") - en: '[PRE9]' + id: totrans-35 prefs: [] type: TYPE_PRE + zh: '[PRE9]' - en: '[PRE10]' + id: totrans-36 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: '[![https://download.pytorch.org/torchaudio/doc-assets/jetson-verify-build.png](../Images/54e00283b6bc33749b45ed29bb75ce91.png)](https://download.pytorch.org/torchaudio/doc-assets/jetson-verify-build.png)' + id: totrans-37 prefs: [] type: TYPE_NORMAL + zh: '[![https://download.pytorch.org/torchaudio/doc-assets/jetson-verify-build.png](../Images/54e00283b6bc33749b45ed29bb75ce91.png)](https://download.pytorch.org/torchaudio/doc-assets/jetson-verify-build.png)' diff --git a/totrans/aud22_13.yaml b/totrans/aud22_13.yaml index 47b0eeaf048e80310fbe5245dd400b6e9f4943d0..aa876e5c25c2d29696da6e2c80c5f63fd242e8ab 100644 --- a/totrans/aud22_13.yaml +++ b/totrans/aud22_13.yaml @@ -1,279 +1,426 @@ - en: Enabling GPU video decoder/encoder + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 启用GPU视频解码器/编码器 - en: 原文:[https://pytorch.org/audio/stable/build.ffmpeg.html](https://pytorch.org/audio/stable/build.ffmpeg.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/build.ffmpeg.html](https://pytorch.org/audio/stable/build.ffmpeg.html) - en: TorchAudio can make use of hardware-based video decoding and encoding supported by underlying FFmpeg libraries that are linked at runtime. + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: TorchAudio可以利用底层FFmpeg库支持的硬件解码和编码功能。 - en: Using NVIDIA’s GPU decoder and encoder, it is also possible to pass around CUDA Tensor directly, that is decode video into CUDA tensor or encode video from CUDA tensor, without moving data from/to CPU. 
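As a rough sketch of what this looks like from Python, the snippet below decodes video chunks straight into CUDA tensors; the decoder name `h264_cuvid`, the `hw_accel` value and the input file are assumptions that depend on your codec, GPU and FFmpeg build (the NVDEC tutorial referenced below covers the details):

```python
# Sketch: decode H.264 video directly into CUDA tensors with NVDEC,
# assuming FFmpeg was built with --enable-nvdec and TorchAudio can load it.
from torchaudio.io import StreamReader

reader = StreamReader("input.mp4")  # hypothetical local file
reader.add_video_stream(
    frames_per_chunk=16,
    decoder="h264_cuvid",   # NVDEC-backed decoder exposed by FFmpeg
    hw_accel="cuda:0",      # keep decoded frames on the GPU
)

for (chunk,) in reader.stream():
    # chunk is a tensor of shape [frames, channels, height, width]
    # already resident on cuda:0, so no CPU round-trip is needed.
    print(chunk.shape, chunk.device)
    break
```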
+ id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 使用NVIDIA的GPU解码器和编码器,还可以直接传递CUDA Tensor,即将视频解码为CUDA张量或从CUDA张量编码视频,而无需在CPU之间移动数据。 - en: This improves the video throughput significantly. However, please note that not all video formats are supported by hardware acceleration. + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: 这将显著提高视频吞吐量。但请注意,并非所有视频格式都支持硬件加速。 - en: This page goes through how to build FFmpeg with hardware acceleration. For details on the performance of the GPU decoder and encoder, please see [NVDEC tutorial](tutorials/nvdec_tutorial.html#nvdec-tutorial) and [NVENC tutorial](tutorials/nvenc_tutorial.html#nvenc-tutorial). + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 本页介绍了如何使用硬件加速构建FFmpeg。有关GPU解码器和编码器性能的详细信息,请参阅[NVDEC教程](tutorials/nvdec_tutorial.html#nvdec-tutorial)和[NVENC教程](tutorials/nvenc_tutorial.html#nvenc-tutorial)。 - en: Overview[](#overview "Permalink to this heading") + id: totrans-6 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 概述[](#overview "跳转到此标题") - en: Using them in TorchAudio requires additional FFmpeg configuration. + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: 在TorchAudio中使用它们需要额外的FFmpeg配置。 - en: In the following, we look into how to enable GPU video decoding with [NVIDIA’s Video codec SDK](https://developer.nvidia.com/nvidia-video-codec-sdk). To use NVENC/NVDEC with TorchAudio, the following items are required. + id: totrans-8 prefs: [] type: TYPE_NORMAL + zh: 接下来,我们将研究如何使用[NVIDIA的视频编解码SDK](https://developer.nvidia.com/nvidia-video-codec-sdk)启用GPU视频解码。要在TorchAudio中使用NVENC/NVDEC,需要以下项目。 - en: NVIDIA GPU with hardware video decoder/encoder. + id: totrans-9 prefs: - PREF_OL type: TYPE_NORMAL + zh: 具有硬件视频解码器/编码器的NVIDIA GPU。 - en: FFmpeg libraries compiled with NVDEC/NVENC support. † + id: totrans-10 prefs: - PREF_OL type: TYPE_NORMAL + zh: 使用已编译具有NVDEC/NVENC支持的FFmpeg库。† - en: PyTorch / TorchAudio with CUDA support. + id: totrans-11 prefs: - PREF_OL type: TYPE_NORMAL + zh: 带有CUDA支持的PyTorch / TorchAudio。 - en: TorchAudio’s official binary distributions are compiled to work with FFmpeg libraries, and they contain the logic to use hardware decoding/encoding. + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: TorchAudio的官方二进制发行版已经编译为与FFmpeg库配合使用,并包含使用硬件解码/编码的逻辑。 - en: In the following, we build FFmpeg 4 libraries with NVDEC/NVENC support. You can also use FFmpeg 5 or 6. + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: 接下来,我们使用NVDEC/NVENC支持构建FFmpeg 4库。您也可以使用FFmpeg 5或6。 - en: The following procedure was tested on Ubuntu. + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: 以下过程在Ubuntu上进行了测试。 - en: † For details on NVDEC/NVENC and FFmpeg, please refer to the following articles.
+ id: totrans-15 prefs: [] type: TYPE_NORMAL + zh: †有关NVDEC/NVENC和FFmpeg的详细信息,请参考以下文章。 - en: '[https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/nvdec-video-decoder-api-prog-guide/](https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/nvdec-video-decoder-api-prog-guide/)' + id: totrans-16 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/nvdec-video-decoder-api-prog-guide/](https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/nvdec-video-decoder-api-prog-guide/)' - en: '[https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/ffmpeg-with-nvidia-gpu/index.html#compiling-ffmpeg](https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/ffmpeg-with-nvidia-gpu/index.html#compiling-ffmpeg)' + id: totrans-17 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/ffmpeg-with-nvidia-gpu/index.html#compiling-ffmpeg](https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/ffmpeg-with-nvidia-gpu/index.html#compiling-ffmpeg)' - en: '[https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/](https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/)' + id: totrans-18 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/](https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/)' - en: Check the GPU and CUDA version[](#check-the-gpu-and-cuda-version "Permalink to this heading") + id: totrans-19 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 检查GPU和CUDA版本[](#check-the-gpu-and-cuda-version "跳转到此标题") - en: First, check the available GPU. Here, we have Tesla T4 with CUDA Toolkit 11.2 installed. + id: totrans-20 prefs: [] type: TYPE_NORMAL + zh: 首先,检查可用的GPU。这里,我们有安装了CUDA Toolkit 11.2的Tesla T4。 - en: '[PRE0]' + id: totrans-21 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: Checking the compute capability[](#checking-the-compute-capability "Permalink to this heading") + id: totrans-22 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 检查计算能力[](#checking-the-compute-capability "跳转到此标题") - en: Later, we need the version of compute capability supported by this GPU. The following page lists the GPUs and corresponding compute capabilities. The compute capability of T4 is `7.5`. + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: 稍后,我们需要此GPU支持的计算能力版本。以下页面列出了GPU及其对应的计算能力。T4的计算能力为`7.5`。 - en: '[https://developer.nvidia.com/cuda-gpus](https://developer.nvidia.com/cuda-gpus)' + id: totrans-24 prefs: [] type: TYPE_NORMAL + zh: '[https://developer.nvidia.com/cuda-gpus](https://developer.nvidia.com/cuda-gpus)' - en: Install NVIDIA Video Codec Headers[](#install-nvidia-video-codec-headers "Permalink to this heading") + id: totrans-25 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 安装NVIDIA视频编解码头文件[](#install-nvidia-video-codec-headers "跳转到此标题") - en: To build FFmpeg with NVDEC/NVENC, we first need to install the headers that FFmpeg uses to interact with Video Codec SDK. + id: totrans-26 prefs: [] type: TYPE_NORMAL + zh: 要构建具有NVDEC/NVENC的FFmpeg,我们首先需要安装FFmpeg用于与视频编解码SDK交互的头文件。 - en: Since we have CUDA 11 working in the system, we use one of `n11` tag. + id: totrans-27 prefs: [] type: TYPE_NORMAL + zh: 由于系统中已经安装了CUDA 11,我们使用了`n11`标签之一。 - en: '[PRE1]' + id: totrans-28 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: The location of installation can be changed with `make PREFIX= install`. 
+ id: totrans-29 prefs: [] type: TYPE_NORMAL + zh: 安装位置可以使用`make PREFIX= install`进行更改。 - en: '[PRE2]' + id: totrans-30 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: Install FFmpeg dependencies[](#install-ffmpeg-dependencies "Permalink to this heading") + id: totrans-31 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 安装FFmpeg依赖项[](#install-ffmpeg-dependencies "跳转到此标题") - en: Next, we install tools and libraries required during the FFmpeg build. The minimum requirement is [Yasm](https://yasm.tortall.net/). Here we additionally install H264 video codec and HTTPS protocol, which we use later for verifying the installation. + id: totrans-32 prefs: [] type: TYPE_NORMAL + zh: 接下来,我们安装在FFmpeg构建过程中所需的工具和库。最低要求是[Yasm](https://yasm.tortall.net/)。在这里,我们还安装了H264视频编解码器和HTTPS协议,稍后我们将用于验证安装。 - en: '[PRE3]' + id: totrans-33 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: '[PRE4]' + id: totrans-34 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: Build FFmpeg with NVDEC/NVENC support[](#build-ffmpeg-with-nvdec-nvenc-support "Permalink to this heading") + id: totrans-35 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 构建具有NVDEC/NVENC支持的FFmpeg[](#build-ffmpeg-with-nvdec-nvenc-support "跳转到此标题") - en: Next we download the source code of FFmpeg 4\. We use 4.4.2 here. + id: totrans-36 prefs: [] type: TYPE_NORMAL + zh: 接下来,我们下载FFmpeg 4的源代码。这里我们使用的是4.4.2版本。 - en: '[PRE5]' + id: totrans-37 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: 'Next we configure FFmpeg build. Note the following:' + id: totrans-38 prefs: [] type: TYPE_NORMAL + zh: 接下来我们配置FFmpeg构建。请注意以下内容: - en: We provide flags like `-I/usr/local/cuda/include`, `-L/usr/local/cuda/lib64` to let the build process know where the CUDA libraries are found. + id: totrans-39 prefs: - PREF_OL type: TYPE_NORMAL + zh: 我们提供像`-I/usr/local/cuda/include`、`-L/usr/local/cuda/lib64`这样的标志,让构建过程知道CUDA库的位置。 - en: We provide flags like `--enable-nvdec` and `--enable-nvenc` to enable NVDEC/NVENC. + id: totrans-40 prefs: - PREF_OL type: TYPE_NORMAL + zh: 我们提供像`--enable-nvdec`和`--enable-nvenc`这样的标志来启用NVDEC/NVENC。 - en: We also provide NVCC flags with compute capability `75`, which corresponds to `7.5` of T4\. † + id: totrans-41 prefs: - PREF_OL type: TYPE_NORMAL + zh: 我们还提供了带有计算能力`75`的NVCC标志,对应于T4的`7.5`。 - en: We install the library in `/usr/lib/`. + id: totrans-42 prefs: - PREF_OL type: TYPE_NORMAL + zh: 我们将库安装在`/usr/lib/`中。 - en: Note + id: totrans-43 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: † The configuration script verifies NVCC by compiling a sample code. By default it uses old compute capability such as `30`, which is no longer supported by CUDA 11\. So it is required to set a correct compute capability. + id: totrans-44 prefs: [] type: TYPE_NORMAL + zh: †配置脚本通过编译示例代码来验证NVCC。默认情况下,它使用旧的计算能力,例如`30`,这在CUDA 11中不再受支持。因此,需要设置正确的计算能力。 - en: '[PRE6]' + id: totrans-45 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: '[PRE7]' + id: totrans-46 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: Now we build and install + id: totrans-47 prefs: [] type: TYPE_NORMAL + zh: 现在我们构建并安装 - en: '[PRE8]' + id: totrans-48 prefs: [] type: TYPE_PRE + zh: '[PRE8]' - en: '[PRE9]' + id: totrans-49 prefs: [] type: TYPE_PRE + zh: '[PRE9]' - en: Checking the intallation[](#checking-the-intallation "Permalink to this heading") + id: totrans-50 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 检查安装 - en: To verify that the FFmpeg we built have CUDA support, we can check the list of available decoders and encoders. 
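A hedged sketch of that check, driven from Python via `subprocess` (it assumes the freshly built `ffmpeg` binary is first on `PATH`; the `[PRE]` blocks in this section show the equivalent shell commands):

```python
# Sketch: list NVDEC/NVENC entries exposed by the ffmpeg binary we just built.
import subprocess

decoders = subprocess.run(
    ["ffmpeg", "-hide_banner", "-decoders"], capture_output=True, text=True
).stdout
encoders = subprocess.run(
    ["ffmpeg", "-hide_banner", "-encoders"], capture_output=True, text=True
).stdout

# Expect entries such as h264_cuvid / h264_nvenc if the build succeeded.
print([line for line in decoders.splitlines() if "cuvid" in line])
print([line for line in encoders.splitlines() if "nvenc" in line])
```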
+ id: totrans-51 prefs: [] type: TYPE_NORMAL + zh: 要验证我们构建的FFmpeg是否支持CUDA,我们可以检查可用解码器和编码器的列表。 - en: '[PRE10]' + id: totrans-52 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: '[PRE11]' + id: totrans-53 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: '[PRE12]' + id: totrans-54 prefs: [] type: TYPE_PRE + zh: '[PRE12]' - en: '[PRE13]' + id: totrans-55 prefs: [] type: TYPE_PRE + zh: '[PRE13]' - en: The following command fetches video from remote server, decode with NVDEC (cuvid) and re-encode with NVENC. If this command does not work, then there is an issue with FFmpeg installation, and TorchAudio would not be able to use them either. + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: 以下命令从远程服务器获取视频,使用NVDEC(cuvid)解码,然后使用NVENC重新编码。如果此命令不起作用,则说明FFmpeg安装存在问题,TorchAudio也无法使用它们。 - en: '[PRE14]' + id: totrans-57 prefs: [] type: TYPE_PRE + zh: '[PRE14]' - en: 'Note that there is `Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))`, which means that video is decoded with `h264_cuvid` decoder and `h264_nvenc` encoder.' + id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: '请注意,存在`Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))`,这意味着视频使用`h264_cuvid`解码器和`h264_nvenc`编码器进行解码。' - en: '[PRE15]' + id: totrans-59 prefs: [] type: TYPE_PRE + zh: '[PRE15]' - en: Using the GPU decoder/encoder from TorchAudio[](#using-the-gpu-decoder-encoder-from-torchaudio "Permalink to this heading") + id: totrans-60 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 使用TorchAudio的GPU解码器/编码器 - en: Checking the installation[](#checking-the-installation "Permalink to this heading") + id: totrans-61 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 检查安装 - en: Once the FFmpeg is properly working with hardware acceleration, we need to check if TorchAudio can pick it up correctly. + id: totrans-62 prefs: [] type: TYPE_NORMAL + zh: 一旦FFmpeg正确使用硬件加速,我们需要检查TorchAudio是否能够正确识别它。 - en: There are utility functions to query the capability of FFmpeg in [`torchaudio.utils.ffmpeg_utils`](generated/torchaudio.utils.ffmpeg_utils.html#module-torchaudio.utils.ffmpeg_utils "torchaudio.utils.ffmpeg_utils"). + id: totrans-63 prefs: [] type: TYPE_NORMAL + zh: 在[`torchaudio.utils.ffmpeg_utils`](generated/torchaudio.utils.ffmpeg_utils.html#module-torchaudio.utils.ffmpeg_utils + "torchaudio.utils.ffmpeg_utils")中有用于查询FFmpeg功能的实用函数。 - en: You can first use [`get_video_decoders()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_video_decoders "torchaudio.utils.ffmpeg_utils.get_video_decoders") and [`get_video_encoders()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_video_encoders "torchaudio.utils.ffmpeg_utils.get_video_encoders") to check if GPU decoders and encoders (such as `h264_cuvid` and `h264_nvenc`) are listed. + id: totrans-64 prefs: [] type: TYPE_NORMAL + zh: 您可以首先使用[`get_video_decoders()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_video_decoders + "torchaudio.utils.ffmpeg_utils.get_video_decoders")和[`get_video_encoders()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_video_encoders + "torchaudio.utils.ffmpeg_utils.get_video_encoders")来检查GPU解码器和编码器(如`h264_cuvid`和`h264_nvenc`)是否已列出。 - en: It is often the case where there are multiple FFmpeg installations in the system, and TorchAudio is loading one different than expected. In such cases, use of `ffmpeg` to check the installation does not help. 
You can use functions like [`get_build_config()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_build_config "torchaudio.utils.ffmpeg_utils.get_build_config") and [`get_versions()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_versions "torchaudio.utils.ffmpeg_utils.get_versions") to get information about FFmpeg libraries TorchAudio loaded. + id: totrans-65 prefs: [] type: TYPE_NORMAL + zh: 通常情况下,系统中存在多个FFmpeg安装,TorchAudio加载的可能与预期不同。在这种情况下,使用`ffmpeg`检查安装是无济于事的。您可以使用[`get_build_config()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_build_config + "torchaudio.utils.ffmpeg_utils.get_build_config")和[`get_versions()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_versions + "torchaudio.utils.ffmpeg_utils.get_versions")等函数来获取有关TorchAudio加载的FFmpeg库的信息。 - en: '[PRE16]' + id: totrans-66 prefs: [] type: TYPE_PRE + zh: '[PRE16]' - en: '[PRE17]' + id: totrans-67 prefs: [] type: TYPE_PRE + zh: '[PRE17]' - en: Using the hardware decoder and encoder[](#using-the-hardware-decoder-and-encoder "Permalink to this heading") + id: totrans-68 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 使用硬件解码器和编码器 - en: Once the installation and the runtime linking work fine, then you can test the GPU decoding with the following. + id: totrans-69 prefs: [] type: TYPE_NORMAL + zh: 一旦安装和运行时链接正常,您可以使用以下内容测试GPU解码。 - en: For the detail on the performance of GPU decoder and encoder please see [NVDEC tutoial](tutorials/nvdec_tutorial.html#nvdec-tutorial) and [NVENC tutorial](tutorials/nvenc_tutorial.html#nvenc-tutorial). + id: totrans-70 prefs: [] type: TYPE_NORMAL + zh: 有关GPU解码器和编码器性能的详细信息,请参阅[NVDEC教程](tutorials/nvdec_tutorial.html#nvdec-tutorial)和[NVENC教程](tutorials/nvenc_tutorial.html#nvenc-tutorial)。 diff --git a/totrans/aud22_14.yaml b/totrans/aud22_14.yaml index 05b815ade65635b5efe08e53ea3ac83e844c1d0b..aa0dab5367428ac725edacc032154caf89d47c64 100644 --- a/totrans/aud22_14.yaml +++ b/totrans/aud22_14.yaml @@ -1,4 +1,6 @@ - en: API Tutorials + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: API教程 diff --git a/totrans/aud22_15.yaml b/totrans/aud22_15.yaml index 89a9e656ab6cf805d26c7a014565b3fcaa65e931..b4e7bcebe11e38004ad45e05f219955819626674 100644 --- a/totrans/aud22_15.yaml +++ b/totrans/aud22_15.yaml @@ -1,403 +1,628 @@ - en: Audio I/O + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 音频I/O - en: 原文:[https://pytorch.org/audio/stable/tutorials/audio_io_tutorial.html](https://pytorch.org/audio/stable/tutorials/audio_io_tutorial.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/tutorials/audio_io_tutorial.html](https://pytorch.org/audio/stable/tutorials/audio_io_tutorial.html) - en: Note + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Click [here](#sphx-glr-download-tutorials-audio-io-tutorial-py) to download the full example code + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 点击[这里](#sphx-glr-download-tutorials-audio-io-tutorial-py)下载完整示例代码 - en: '**Author**: [Moto Hira](mailto:moto%40meta.com)' + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: '**作者**:[Moto Hira](mailto:moto%40meta.com)' - en: This tutorial shows how to use TorchAudio’s basic I/O API to inspect audio data, load them into PyTorch Tensors and save PyTorch Tensors. 
+ id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 本教程展示了如何使用TorchAudio的基本I/O API来检查音频数据,将其加载到PyTorch张量中并保存PyTorch张量。 - en: Warning + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: 警告 - en: There are multiple changes planned/made to audio I/O in recent releases. For the detail of these changes please refer to [Introduction of Dispatcher](../torchaudio.html#dispatcher-migration). + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: 最近的版本中计划/已经对音频I/O进行了多个更改。有关这些更改的详细信息,请参阅[Dispatcher介绍](../torchaudio.html#dispatcher-migration)。 - en: '[PRE0]' + id: totrans-8 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: '[PRE1]' + id: totrans-9 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: Preparation[](#preparation "Permalink to this heading") + id: totrans-10 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 准备[](#preparation "跳转到此标题的永久链接") - en: First, we import the modules and download the audio assets we use in this tutorial. + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: 首先,我们导入模块并下载本教程中使用的音频资产。 - en: Note + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: 'When running this tutorial in Google Colab, install the required packages with the following:' + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: 在Google Colab中运行此教程时,请使用以下命令安装所需的软件包: - en: '[PRE2]' + id: totrans-14 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: '[PRE3]' + id: totrans-15 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: '[PRE4]' + id: totrans-16 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: Querying audio metadata[](#querying-audio-metadata "Permalink to this heading") + id: totrans-17 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 查询音频元数据[](#querying-audio-metadata "跳转到此标题的永久链接") - en: Function [`torchaudio.info()`](../generated/torchaudio.info.html#torchaudio.info "torchaudio.info") fetches audio metadata. You can provide a path-like object or file-like object. 
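A small sketch of such a query, using a hypothetical `sample.wav` in place of the assets the tutorial downloads:

```python
# Sketch: inspect an audio file's metadata without decoding the samples.
import torchaudio

metadata = torchaudio.info("sample.wav")  # placeholder path
print(metadata.sample_rate)      # e.g. 16000
print(metadata.num_channels)     # e.g. 1
print(metadata.num_frames)       # frames per channel
print(metadata.bits_per_sample)  # 0 for compressed/variable-bit-rate formats
print(metadata.encoding)         # e.g. "PCM_S"
```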
+ id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: 函数[`torchaudio.info()`](../generated/torchaudio.info.html#torchaudio.info "torchaudio.info")获取音频元数据。您可以提供路径类似对象或类似文件对象。 - en: '[PRE5]' + id: totrans-19 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: '[PRE6]' + id: totrans-20 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: Where + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 其中 - en: '`sample_rate` is the sampling rate of the audio' + id: totrans-22 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`sample_rate`是音频的采样率' - en: '`num_channels` is the number of channels' + id: totrans-23 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`num_channels`是通道数' - en: '`num_frames` is the number of frames per channel' + id: totrans-24 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`num_frames`是每个通道的帧数' - en: '`bits_per_sample` is bit depth' + id: totrans-25 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`bits_per_sample` 是比特深度' - en: '`encoding` is the sample coding format' + id: totrans-26 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`encoding`是样本编码格式' - en: '`encoding` can take on one of the following values:' + id: totrans-27 prefs: [] type: TYPE_NORMAL + zh: '`encoding`可以取以下值之一:' - en: '`"PCM_S"`: Signed integer linear PCM' + id: totrans-28 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"PCM_S"`:有符号整数线性PCM' - en: '`"PCM_U"`: Unsigned integer linear PCM' + id: totrans-29 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"PCM_U"`:无符号整数线性PCM' - en: '`"PCM_F"`: Floating point linear PCM' + id: totrans-30 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"PCM_F"`:浮点线性PCM' - en: '`"FLAC"`: Flac, [Free Lossless Audio Codec](https://xiph.org/flac/)' + id: totrans-31 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"FLAC"`:Flac,[无损音频编解码器](https://xiph.org/flac/)' - en: '`"ULAW"`: Mu-law, [[wikipedia](https://en.wikipedia.org/wiki/%CE%9C-law_algorithm)]' + id: totrans-32 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"ULAW"`:Mu-law,[[维基百科](https://en.wikipedia.org/wiki/%CE%9C-law_algorithm)]' - en: '`"ALAW"`: A-law [[wikipedia](https://en.wikipedia.org/wiki/A-law_algorithm)]' + id: totrans-33 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"ALAW"`:A-law[[维基百科](https://en.wikipedia.org/wiki/A-law_algorithm)]' - en: '`"MP3"` : MP3, MPEG-1 Audio Layer III' + id: totrans-34 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"MP3"`:MP3,MPEG-1音频层III' - en: '`"VORBIS"`: OGG Vorbis [[xiph.org](https://xiph.org/vorbis/)]' + id: totrans-35 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"VORBIS"`:OGG Vorbis[[xiph.org](https://xiph.org/vorbis/)]' - en: '`"AMR_NB"`: Adaptive Multi-Rate [[wikipedia](https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_audio_codec)]' + id: totrans-36 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"AMR_NB"`:自适应多速率[[维基百科](https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_audio_codec)]' - en: '`"AMR_WB"`: Adaptive Multi-Rate Wideband [[wikipedia](https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_Wideband)]' + id: totrans-37 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"AMR_WB"`:自适应多速率宽带[[维基百科](https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_Wideband)]' - en: '`"OPUS"`: Opus [[opus-codec.org](https://opus-codec.org/)]' + id: totrans-38 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"OPUS"`:Opus[[opus-codec.org](https://opus-codec.org/)]' - en: '`"GSM"`: GSM-FR [[wikipedia](https://en.wikipedia.org/wiki/Full_Rate)]' + id: totrans-39 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"GSM"`:GSM-FR[[维基百科](https://en.wikipedia.org/wiki/Full_Rate)]' - en: '`"HTK"`: Single channel 16-bit PCM' + id: totrans-40 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"HTK"`:单声道16位PCM' - en: 
'`"UNKNOWN"` None of the above' + id: totrans-41 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`"UNKNOWN"` 以上都不是' - en: '**Note**' + id: totrans-42 prefs: [] type: TYPE_NORMAL + zh: '**注意**' - en: '`bits_per_sample` can be `0` for formats with compression and/or variable bit rate (such as MP3).' + id: totrans-43 prefs: - PREF_UL type: TYPE_NORMAL + zh: 对于具有压缩和/或可变比特率(如MP3)的格式,`bits_per_sample`可以是`0`。 - en: '`num_frames` can be `0` for GSM-FR format.' + id: totrans-44 prefs: - PREF_UL type: TYPE_NORMAL + zh: 对于GSM-FR格式,`num_frames`可以是`0`。 - en: '[PRE7]' + id: totrans-45 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: '[PRE8]' + id: totrans-46 prefs: [] type: TYPE_PRE + zh: '[PRE8]' - en: Querying file-like object[](#querying-file-like-object "Permalink to this heading") + id: totrans-47 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 查询类似文件的对象[](#querying-file-like-object "跳转到此标题的永久链接") - en: '[`torchaudio.info()`](../generated/torchaudio.info.html#torchaudio.info "torchaudio.info") works on file-like objects.' + id: totrans-48 prefs: [] type: TYPE_NORMAL + zh: '[`torchaudio.info()`](../generated/torchaudio.info.html#torchaudio.info "torchaudio.info")适用于类似文件的对象。' - en: '[PRE9]' + id: totrans-49 prefs: [] type: TYPE_PRE + zh: '[PRE9]' - en: '[PRE10]' + id: totrans-50 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: Note + id: totrans-51 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: When passing a file-like object, `info` does not read all of the underlying data; rather, it reads only a portion of the data from the beginning. Therefore, for a given audio format, it may not be able to retrieve the correct metadata, including the format itself. In such cases, you can pass the `format` argument to specify the format of the audio. + id: totrans-52 prefs: [] type: TYPE_NORMAL + zh: 当传递类似文件的对象时,`info`不会读取所有底层数据;相反,它只从开头读取部分数据。因此,对于给定的音频格式,可能无法检索正确的元数据,包括格式本身。在这种情况下,您可以传递`format`参数来指定音频的格式。 - en: Loading audio data[](#loading-audio-data "Permalink to this heading") + id: totrans-53 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 加载音频数据[](#loading-audio-data "跳转到此标题的永久链接") - en: To load audio data, you can use [`torchaudio.load()`](../generated/torchaudio.load.html#torchaudio.load "torchaudio.load"). + id: totrans-54 prefs: [] type: TYPE_NORMAL + zh: 要加载音频数据,您可以使用[`torchaudio.load()`](../generated/torchaudio.load.html#torchaudio.load "torchaudio.load")。 - en: This function accepts a path-like object or file-like object as input. + id: totrans-55 prefs: [] type: TYPE_NORMAL + zh: 此函数接受路径类似对象或类似文件对象作为输入。 - en: The returned value is a tuple of waveform (`Tensor`) and sample rate (`int`). + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: 返回的值是波形(`Tensor`)和采样率(`int`)的元组。 - en: By default, the resulting tensor object has `dtype=torch.float32` and its value range is `[-1.0, 1.0]`. + id: totrans-57 prefs: [] type: TYPE_NORMAL + zh: 默认情况下,生成的张量对象的`dtype=torch.float32`,其值范围是`[-1.0, 1.0]`。 - en: For the list of supported formats, please refer to [the torchaudio documentation](https://pytorch.org/audio).
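A minimal sketch of loading, again using a hypothetical `sample.wav` rather than the tutorial's downloaded assets:

```python
# Sketch: load a file into a float32 waveform tensor in [-1.0, 1.0].
import torchaudio

waveform, sample_rate = torchaudio.load("sample.wav")  # placeholder path
print(waveform.shape)   # [num_channels, num_frames]
print(waveform.dtype)   # torch.float32 by default
print(sample_rate)      # e.g. 16000
```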
+ id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: 有关支持的格式列表,请参阅[torchaudio文档](https://pytorch.org/audio)。 - en: '[PRE11]' + id: totrans-59 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: '[PRE12]' + id: totrans-60 prefs: [] type: TYPE_PRE + zh: '[PRE12]' - en: '[PRE13]' + id: totrans-61 prefs: [] type: TYPE_PRE + zh: '[PRE13]' - en: '![waveform](../Images/9466cf198ed5765d8c8e3bd73ec41b5b.png)' + id: totrans-62 prefs: [] type: TYPE_IMG + zh: '![波形](../Images/9466cf198ed5765d8c8e3bd73ec41b5b.png)' - en: '[PRE14]' + id: totrans-63 prefs: [] type: TYPE_PRE + zh: '[PRE14]' - en: '[PRE15]' + id: totrans-64 prefs: [] type: TYPE_PRE + zh: '[PRE15]' - en: '![Spectrogram](../Images/0c90411a5757b0c0402e369e8c52cc02.png)' + id: totrans-65 prefs: [] type: TYPE_IMG + zh: '![频谱图](../Images/0c90411a5757b0c0402e369e8c52cc02.png)' - en: '[PRE16]' + id: totrans-66 prefs: [] type: TYPE_PRE + zh: '[PRE16]' - en: + id: totrans-67 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-68 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Loading from file-like object[](#loading-from-file-like-object "Permalink to this heading") + id: totrans-69 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 从类似文件的对象加载[](#loading-from-file-like-object "跳转到此标题") - en: The I/O functions support file-like objects. This allows for fetching and decoding audio data from locations within and beyond the local file system. The following examples illustrate this. + id: totrans-70 prefs: [] type: TYPE_NORMAL + zh: I/O函数支持类似文件的对象。这允许从本地文件系统内部和外部的位置获取和解码音频数据。以下示例说明了这一点。 - en: '[PRE17]' + id: totrans-71 prefs: [] type: TYPE_PRE + zh: '[PRE17]' - en: '![HTTP datasource](../Images/aa8a354438ec7213bf2e13d228e574da.png)' + id: totrans-72 prefs: [] type: TYPE_IMG + zh: '![HTTP数据源](../Images/aa8a354438ec7213bf2e13d228e574da.png)' - en: '[PRE18]' + id: totrans-73 prefs: [] type: TYPE_PRE + zh: '[PRE18]' - en: '![TAR file](../Images/9e8e1ea10bbe1e3e7fb26fa34d71fe35.png)' + id: totrans-74 prefs: [] type: TYPE_IMG + zh: '![TAR文件](../Images/9e8e1ea10bbe1e3e7fb26fa34d71fe35.png)' - en: '[PRE19]' + id: totrans-75 prefs: [] type: TYPE_PRE + zh: '[PRE19]' - en: '[PRE20]' + id: totrans-76 prefs: [] type: TYPE_PRE + zh: '[PRE20]' - en: '![From S3](../Images/2934ed5eec98ab84e92b9bb4f4e2fd1f.png)' + id: totrans-77 prefs: [] type: TYPE_IMG + zh: '![来自S3](../Images/2934ed5eec98ab84e92b9bb4f4e2fd1f.png)' - en: Tips on slicing[](#tips-on-slicing "Permalink to this heading") + id: totrans-78 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 切片技巧[](#tips-on-slicing "跳转到此标题") - en: Providing `num_frames` and `frame_offset` arguments restricts decoding to the corresponding segment of the input. + id: totrans-79 prefs: [] type: TYPE_NORMAL + zh: 提供`num_frames`和`frame_offset`参数会限制解码到输入的相应段。 - en: The same result can be achieved using vanilla Tensor slicing, (i.e. `waveform[:, frame_offset:frame_offset+num_frames]`). However, providing `num_frames` and `frame_offset` arguments is more efficient. + id: totrans-80 prefs: [] type: TYPE_NORMAL + zh: 使用普通张量切片也可以实现相同的结果(即`waveform[:, frame_offset:frame_offset+num_frames]`)。但是,提供`num_frames`和`frame_offset`参数更有效。 - en: This is because the function will end data acquisition and decoding once it finishes decoding the requested frames. This is advantageous when the audio data are transferred via network as the data transfer will stop as soon as the necessary amount of data is fetched. 
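A sketch contrasting the two approaches, assuming a hypothetical 16 kHz `sample.wav`; both yield the same samples, but only the first limits decoding to the requested segment:

```python
# Sketch: load only a segment, vs. slicing after a full decode.
import torch
import torchaudio

frame_offset, num_frames = 16000, 16000  # e.g. skip 1 s, keep 1 s at 16 kHz

# Efficient: decoding stops once the requested frames are produced.
segment, sr = torchaudio.load(
    "sample.wav", frame_offset=frame_offset, num_frames=num_frames
)

# Equivalent but less efficient: decode everything, then slice the tensor.
full, sr = torchaudio.load("sample.wav")
sliced = full[:, frame_offset:frame_offset + num_frames]

print(torch.equal(segment, sliced))  # expected: True for WAV input
```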
+ id: totrans-81 prefs: [] type: TYPE_NORMAL + zh: 这是因为一旦完成对请求帧的解码,该函数将结束数据采集和解码。当音频数据通过网络传输时,这是有利的,因为数据传输将在获取到必要数量的数据后立即停止。 - en: The following example illustrates this. + id: totrans-82 prefs: [] type: TYPE_NORMAL + zh: 以下示例说明了这一点。 - en: '[PRE21]' + id: totrans-83 prefs: [] type: TYPE_PRE + zh: '[PRE21]' - en: '[PRE22]' + id: totrans-84 prefs: [] type: TYPE_PRE + zh: '[PRE22]' - en: Saving audio to file[](#saving-audio-to-file "Permalink to this heading") + id: totrans-85 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 保存音频到文件[](#saving-audio-to-file "跳转到此标题") - en: To save audio data in formats interpretable by common applications, you can use [`torchaudio.save()`](../generated/torchaudio.save.html#torchaudio.save "torchaudio.save"). + id: totrans-86 prefs: [] type: TYPE_NORMAL + zh: 要将音频数据保存为常见应用程序可解释的格式,您可以使用[`torchaudio.save()`](../generated/torchaudio.save.html#torchaudio.save + "torchaudio.save")。 - en: This function accepts a path-like object or file-like object. + id: totrans-87 prefs: [] type: TYPE_NORMAL + zh: 此函数接受类似路径的对象或类似文件的对象。 - en: When passing a file-like object, you also need to provide argument `format` so that the function knows which format it should use. In the case of a path-like object, the function will infer the format from the extension. If you are saving to a file without an extension, you need to provide argument `format`. + id: totrans-88 prefs: [] type: TYPE_NORMAL + zh: 当传递类似文件的对象时,您还需要提供参数`format`,以便函数知道应该使用哪种格式。对于类似路径的对象,函数将从扩展名推断格式。如果要保存到没有扩展名的文件中,您需要提供参数`format`。 - en: When saving WAV-formatted data, the default encoding for `float32` Tensor is 32-bit floating-point PCM. You can provide arguments `encoding` and `bits_per_sample` to change this behavior. For example, to save data in 16-bit signed integer PCM, you can do the following. + id: totrans-89 prefs: [] type: TYPE_NORMAL + zh: 保存为WAV格式数据时,默认的`float32`张量编码为32位浮点PCM。您可以提供参数`encoding`和`bits_per_sample`来更改此行为。例如,要以16位有符号整数PCM保存数据,可以执行以下操作。 - en: Note + id: totrans-90 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Saving data in encodings with a lower bit depth reduces the resulting file size but also precision. + id: totrans-91 prefs: [] type: TYPE_NORMAL + zh: 以较低比特深度保存数据会减小生成文件的大小,但也会降低精度。 - en: '[PRE23]' + id: totrans-92 prefs: [] type: TYPE_PRE + zh: '[PRE23]' - en: '[PRE24]' + id: totrans-93 prefs: [] type: TYPE_PRE + zh: '[PRE24]' - en: Save without any encoding option. The function will pick up the encoding which the provided data fit + id: totrans-94 prefs: [] type: TYPE_NORMAL + zh: 不使用任何编码选项保存。函数将选择提供的数据适合的编码 - en: '[PRE25]' + id: totrans-95 prefs: [] type: TYPE_PRE + zh: '[PRE25]' - en: '[PRE26]' + id: totrans-96 prefs: [] type: TYPE_PRE + zh: '[PRE26]' - en: Save as 16-bit signed integer Linear PCM The resulting file occupies half the storage but loses precision + id: totrans-97 prefs: [] type: TYPE_NORMAL + zh: 保存为16位有符号整数线性PCM,生成的文件占用一半的存储空间,但失去了精度 - en: '[PRE27]' + id: totrans-98 prefs: [] type: TYPE_PRE + zh: '[PRE27]' - en: '[PRE28]' + id: totrans-99 prefs: [] type: TYPE_PRE + zh: '[PRE28]' - en: '[`torchaudio.save()`](../generated/torchaudio.save.html#torchaudio.save "torchaudio.save") can also handle other formats. 
To name a few:' + id: totrans-100 prefs: [] type: TYPE_NORMAL + zh: '[`torchaudio.save()`](../generated/torchaudio.save.html#torchaudio.save "torchaudio.save")也可以处理其他格式。举几个例子:' - en: '[PRE29]' + id: totrans-101 prefs: [] type: TYPE_PRE + zh: '[PRE29]' - en: '[PRE30]' + id: totrans-102 prefs: [] type: TYPE_PRE + zh: '[PRE30]' - en: '[PRE31]' + id: totrans-103 prefs: [] type: TYPE_PRE + zh: '[PRE31]' - en: Saving to file-like object[](#saving-to-file-like-object "Permalink to this heading") + id: totrans-104 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 保存到类似文件的对象[](#saving-to-file-like-object "跳转到此标题") - en: Similar to the other I/O functions, you can save audio to file-like objects. When saving to a file-like object, argument `format` is required. + id: totrans-105 prefs: [] type: TYPE_NORMAL + zh: 与其他I/O函数类似,您可以将音频保存到类似文件的对象中。保存到类似文件的对象时,需要提供参数`format`。 - en: '[PRE32]' + id: totrans-106 prefs: [] type: TYPE_PRE + zh: '[PRE32]' - en: '[PRE33]' + id: totrans-107 prefs: [] type: TYPE_PRE + zh: '[PRE33]' - en: '**Total running time of the script:** ( 0 minutes 1.941 seconds)' + id: totrans-108 prefs: [] type: TYPE_NORMAL + zh: '**脚本的总运行时间:**(0分钟1.941秒)' - en: '[`Download Python source code: audio_io_tutorial.py`](../_downloads/a50b7a9d7eda039b9579621100be1417/audio_io_tutorial.py)' + id: totrans-109 prefs: [] type: TYPE_NORMAL + zh: '[`下载Python源代码:audio_io_tutorial.py`](../_downloads/a50b7a9d7eda039b9579621100be1417/audio_io_tutorial.py)' - en: '[`Download Jupyter notebook: audio_io_tutorial.ipynb`](../_downloads/4d63e50ab0e70c0e96fd6641e0823ce8/audio_io_tutorial.ipynb)' + id: totrans-110 prefs: [] type: TYPE_NORMAL + zh: '[`下载Jupyter笔记本:audio_io_tutorial.ipynb`](../_downloads/4d63e50ab0e70c0e96fd6641e0823ce8/audio_io_tutorial.ipynb)' - en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)' + id: totrans-111 prefs: [] type: TYPE_NORMAL + zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)'
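To round off the saving examples, a short sketch assuming a hypothetical `sample.wav` as input; it covers both the 16-bit PCM encoding discussed above and saving to a file-like object, where `format` must be given explicitly:

```python
# Sketch: save a float32 waveform as 16-bit signed-integer PCM WAV,
# and save to an in-memory file-like object, where `format` is required.
import io

import torchaudio

waveform, sample_rate = torchaudio.load("sample.wav")  # placeholder input

# Lower bit depth: half the size of 32-bit float PCM, at reduced precision.
torchaudio.save(
    "output_16bit.wav", waveform, sample_rate,
    encoding="PCM_S", bits_per_sample=16,
)

# File-like object: the format cannot be inferred from an extension.
buffer = io.BytesIO()
torchaudio.save(buffer, waveform, sample_rate, format="wav")
print(buffer.getbuffer().nbytes, "bytes written")
```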