Error when training with multiple CPUs
Created by: yeyupiaoling
My machine's CPU has 6 cores, so to train on multiple CPU cores I ran:
export CPU_NUM=6
I am training inside Docker, and the training run failed with the following error:
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
----------- Configuration Arguments -----------
augment_conf_path: ./conf/augmentation.config
batch_size: 32
dev_manifest: ./dataset/manifest.dev
init_from_pretrained_model: None
is_local: 1
learning_rate: 0.0005
max_duration: 27.0
mean_std_path: ./dataset/mean_std.npz
min_duration: 0.0
num_conv_layers: 2
num_epoch: 50
num_iter_print: 100
num_rnn_layers: 3
num_samples: 120000
output_model_dir: ./models/checkpoints/
rnn_layer_size: 2048
save_epoch: 1
share_rnn_weights: 0
shuffle_method: batch_shuffle_clipped
specgram_type: linear
test_off: 0
train_manifest: ./dataset/manifest.train
use_gpu: 0
use_gru: 1
use_sortagrad: 1
vocab_path: ./dataset/zh_vocab.txt
------------------------------------------------
I1228 04:08:01.524451 123 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 6. And the Program will be copied 6 copies
/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py:774: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "train.py", line 118, in <module>
main()
File "train.py", line 114, in main
train()
File "train.py", line 109, in train
test_off=args.test_off)
File "/DeepSpeech/model_utils/model.py", line 332, in train
return_numpy=False)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 775, in run
six.reraise(*sys.exc_info())
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 770, in run
use_program_cache=use_program_cache)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 819, in _run_impl
program._compile(scope, self.place)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/compiler.py", line 392, in _compile
places=self._places)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/compiler.py", line 355, in _compile_data_parallel
self._exec_strategy, self._build_strategy, self._graph)
paddle.fluid.core_avx.EnforceNotMet: 0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::memory::detail::AlignedMalloc(unsigned long)
3 paddle::memory::detail::CPUAllocator::Alloc(unsigned long*, unsigned long)
4 paddle::memory::detail::BuddyAllocator::RefillPool(unsigned long)
5 paddle::memory::detail::BuddyAllocator::Alloc(unsigned long)
6 void* paddle::memory::legacy::Alloc<paddle::platform::CPUPlace>(paddle::platform::CPUPlace const&, unsigned long)
7 paddle::memory::allocation::NaiveBestFitAllocator::AllocateImpl(unsigned long)
8 paddle::memory::allocation::AllocatorFacade::Alloc(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, unsigned long)
9 paddle::memory::allocation::AllocatorFacade::AllocShared(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, unsigned long)
10 paddle::memory::AllocShared(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, unsigned long)
11 paddle::framework::Tensor::mutable_data(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, paddle::framework::proto::VarType_Type, unsigned long)
12 paddle::framework::ParallelExecutor::BCastParamsToDevices(std::vector<std::string, std::allocator<std::string> > const&, int) const
13 paddle::framework::ParallelExecutor::ParallelExecutor(std::vector<boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::allocator<boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > const&, std::vector<std::string, std::allocator<std::string> > const&, std::string const&, paddle::framework::Scope*, std::vector<paddle::framework::Scope*, std::allocator<paddle::framework::Scope*> > const&, paddle::framework::details::ExecutionStrategy const&, paddle::framework::details::BuildStrategy const&, paddle::framework::ir::Graph*)
PaddleCheckError: Expected posix_memalign(&p, alignment, size) == 0, but received posix_memalign(&p, alignment, size):12 != 0:0.
Alloc 522324864 error! at [/paddle/paddle/fluid/memory/detail/system_allocator.cc:59]
Failed in training!
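To make the last two error lines easier to read, here is a minimal sketch that decodes them. It assumes a Linux system, where errno 12 is ENOMEM, and the helper name `decode_alloc_failure` is my own, not part of PaddlePaddle. It only shows that the failed request was a single allocation of roughly 498 MiB that the system could not satisfy. It does not prove how much memory the 6 program copies need in total.

```python
import errno
import os
import re

# The two final error lines, copied verbatim from the log above.
LOG_TAIL = (
    "PaddleCheckError: Expected posix_memalign(&p, alignment, size) == 0, "
    "but received posix_memalign(&p, alignment, size):12 != 0:0.\n"
    "Alloc 522324864 error! at "
    "[/paddle/paddle/fluid/memory/detail/system_allocator.cc:59]"
)


def decode_alloc_failure(text):
    """Extract the posix_memalign return code and the requested size.

    Hypothetical helper for illustration only; the regular expressions
    are written against the exact wording of the log above.
    """
    code_match = re.search(
        r"posix_memalign\(&p, alignment, size\):(\d+) != 0", text)
    size_match = re.search(r"Alloc (\d+) error!", text)
    code = int(code_match.group(1))
    size_bytes = int(size_match.group(1))
    return {
        "errno": code,
        # Symbolic name for the code on this platform, e.g. ENOMEM on Linux.
        "errno_name": errno.errorcode.get(code, "unknown"),
        # Human-readable message; the wording is platform dependent.
        "message": os.strerror(code),
        "size_bytes": size_bytes,
        "size_mib": size_bytes / float(2 ** 20),
    }


if __name__ == "__main__":
    info = decode_alloc_failure(LOG_TAIL)
    print("return code: %d (%s: %s)"
          % (info["errno"], info["errno_name"], info["message"]))
    print("requested:   %d bytes (about %.1f MiB)"
          % (info["size_bytes"], info["size_mib"]))
```

Since the log says the Program is copied 6 times when CPU_NUM=6, one thing I could try is a smaller CPU_NUM or a smaller batch_size, and checking whether the Docker container has a memory limit. These are guesses, not confirmed fixes.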