Commit d1d483f3 authored by gineshidalgo99

Multi-scale much faster & less memory

Parent commit: 7ae8d776
...@@ -115,20 +115,23 @@ OpenPose Library - Release Notes
## Current version (future OpenPose 1.2.0alpha)
1. Main improvements:
    1. Speed increase when processing images with different aspect ratios. E.g. ~20% increase over 3.7k COCO validation images on 1 scale.
    2. Huge speed increase and memory reduction for multi-scale processing. E.g. over 3.7k COCO validation images on 4 scales: ~40% speed increase (~770 to ~450 sec) and ~25% memory reduction (from ~8.9 to ~6.7 GB per GPU).
    3. Slight increase in accuracy thanks to the fixed minor bugs.
    4. Added IP camera support.
    5. Output images can keep the input size; OpenPose can now adapt its size to each image, so a fixed size is no longer required.
        1. FrameDisplayer accepts variable-size images by rescaling every time a frame with a bigger width or height is displayed (gui module).
        2. OpOutputToCvMat & GuiInfoAdder no longer need to know the output size at construction time; it is deduced from each image.
        3. CvMatToOutput and Renderers allow keeping the input resolution as the output resolution for images (core module).
    6. New standalone face keypoint detector based on the OpenCV face detector: much faster if body keypoint detection is not required, but much less accurate.
    7. Face and hand keypoint detectors can now return each keypoint heatmap.
    8. The flag `USE_CUDNN` is no longer required; `USE_CAFFE` and `USE_CUDA` (replacing the old `CPU_ONLY`) are no longer required to use the library, only to build it. In addition, Boost, Caffe, and its dependencies have been removed from the OpenPose header files. Only the OpenCV include and lib folders are required when building a project using OpenPose.
    9. OpenPose successfully compiles if the flags `USE_CAFFE` and/or `USE_CUDA` are not enabled, although it will give an error saying they are required.
    10. The COCO JSON file outputs 0 as the score for non-detected keypoints.
    11. Added an OpenPose example for user asynchronous output and cleaned up all `tutorial_wrapper/` examples.
    12. Added `-1` option for `net_resolution` in order to auto-select the best possible aspect ratio given the user input.
    13. Net resolution can be dynamically changed (e.g. for images with different sizes).
    14. Added an example showing how to add functionality/modules to OpenPose.
2. Functions or parameters renamed:
    1. OpenPose able to change its size and initial size dynamically:
        1. Flag `resolution` renamed as `output_resolution`.
......
# Script for internal use. It may change at any time, and we will not answer questions about it.
clear && clear
# USAGE EXAMPLE
# See ./examples/tests/pose_accuracy_coco_test.sh
# Parameters
IMAGE_FOLDER=/media/posefs3b/Users/gines/openpose_train/dataset/COCO/images/test2017_dev/
JSON_FOLDER=../evaluation/coco_val_jsons/
OP_BIN=./build/examples/openpose/openpose.bin
# 1 scale
$OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1_test.json --no_display --render_pose 0
# # 3 scales
# $OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1_3.json --no_display --render_pose 0 --scale_number 3 --scale_gap 0.25
# 4 scales
$OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1_4_test.json --no_display --render_pose 0 --scale_number 4 --scale_gap 0.25 --net_resolution "1312x736"
...@@ -3,7 +3,7 @@
clear && clear
# USAGE EXAMPLE
# clear && clear && make all -j`nproc` && bash ./examples/tests/pose_accuracy_coco_test.sh
# # Go back to main folder
# cd ../../
...@@ -23,14 +23,14 @@ OP_BIN=./build/examples/openpose/openpose.bin
# 1 scale
$OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1.json --no_display --render_pose 0 --frame_last 3558
# 1 scale - Debugging
# $OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1.json --no_display --frame_last 3558 --write_images ~/Desktop/CppValidation/
# # 3 scales
# $OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1_3.json --no_display --render_pose 0 --scale_number 3 --scale_gap 0.25 --frame_last 3558
# # 4 scales
# $OP_BIN --num_gpu 1 --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1_4.json --no_display --render_pose 0 --scale_number 4 --scale_gap 0.25 --net_resolution "1312x736" --frame_last 3558
# Debugging - Rendered frames saved
# $OP_BIN --image_dir $IMAGE_FOLDER --write_images ${JSON_FOLDER}frameOutput --no_display
# Script for internal use. It may change at any time, and we will not answer questions about it.
clear && clear
# USAGE EXAMPLE
# See ./examples/tests/pose_accuracy_coco_test.sh
# Parameters
IMAGE_FOLDER=/home/gines/devel/images/val2014/
JSON_FOLDER=../evaluation/coco_val_jsons/
OP_BIN=./build/examples/openpose/openpose.bin
# 1 scale
$OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1.json --no_display --render_pose 0 --frame_last 3558
# 3 scales
$OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1_3.json --no_display --render_pose 0 --scale_number 3 --scale_gap 0.25 --frame_last 3558
# 4 scales
$OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1_4.json --no_display --render_pose 0 --scale_number 4 --scale_gap 0.25 --net_resolution "1312x736" --frame_last 3558
...@@ -9,8 +9,9 @@ namespace op ...@@ -9,8 +9,9 @@ namespace op
class OP_API CvMatToOpInput class OP_API CvMatToOpInput
{ {
public: public:
Array<float> createArray(const cv::Mat& cvInputData, const std::vector<double>& scaleInputToNetInputs, std::vector<Array<float>> createArray(const cv::Mat& cvInputData,
const std::vector<Point<int>>& netInputSizes) const; const std::vector<double>& scaleInputToNetInputs,
const std::vector<Point<int>>& netInputSizes) const;
}; };
} }
......
...@@ -35,13 +35,14 @@ namespace op ...@@ -35,13 +35,14 @@ namespace op
* with the net. * with the net.
* In case of >1 scales, then each scale is right- and bottom-padded to fill the greatest resolution. The * In case of >1 scales, then each scale is right- and bottom-padded to fill the greatest resolution. The
* scales are sorted from bigger to smaller. * scales are sorted from bigger to smaller.
* Size: #scales x 3 x input_net_height x input_net_width * Vector size: #scales
* Each array size: 3 x input_net_height x input_net_width
*/ */
Array<float> inputNetData; std::vector<Array<float>> inputNetData;
/** /**
* Rendered image in Array<float> format. * Rendered image in Array<float> format.
* It consists of a blending of the inputNetData and the pose/body part(s) heatmap/PAF(s). * It consists of a blending of the cvInputData and the pose/body part(s) heatmap/PAF(s).
* If rendering is disabled (e.g. `no_render_pose` flag in the demo), then outputData will be empty. * If rendering is disabled (e.g. `no_render_pose` flag in the demo), then outputData will be empty.
* Size: 3 x output_net_height x output_net_width * Size: 3 x output_net_height x output_net_width
*/ */
......
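The `inputNetData` member described above is now a vector with one `Array<float>` per scale, each of size 1 x 3 x net_height x net_width. A minimal illustrative sketch (not part of the commit), assuming the aggregate `<openpose/headers.hpp>` include and the `op::Array::getSize()` accessor, showing how the per-scale layout could be inspected:

```cpp
// Illustrative only: inspecting the new per-scale inputNetData layout of op::Datum.
#include <iostream>
#include <openpose/headers.hpp>  // assumed aggregate OpenPose header

void printInputNetDataShapes(const op::Datum& datum)
{
    for (auto i = 0u; i < datum.inputNetData.size(); i++)    // one Array per scale
    {
        std::cout << "scale " << i << ":";
        for (const auto dimension : datum.inputNetData[i].getSize())
            std::cout << " " << dimension;                   // e.g. 1 3 368 656
        std::cout << std::endl;
    }
}
```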
...@@ -6,11 +6,15 @@ ...@@ -6,11 +6,15 @@
namespace op namespace op
{ {
template <typename T> template <typename T>
OP_API void resizeAndMergeCpu(T* targetPtr, const T* const sourcePtr, const std::array<int, 4>& targetSize, const std::array<int, 4>& sourceSize, OP_API void resizeAndMergeCpu(T* targetPtr, const std::vector<const T*>& sourcePtrs,
const std::array<int, 4>& targetSize,
const std::vector<std::array<int, 4>>& sourceSizes,
const std::vector<T>& scaleInputToNetInputs = {1.f}); const std::vector<T>& scaleInputToNetInputs = {1.f});
template <typename T> template <typename T>
OP_API void resizeAndMergeGpu(T* targetPtr, const T* const sourcePtr, const std::array<int, 4>& targetSize, const std::array<int, 4>& sourceSize, OP_API void resizeAndMergeGpu(T* targetPtr, const std::vector<const T*>& sourcePtrs,
const std::array<int, 4>& targetSize,
const std::vector<std::array<int, 4>>& sourceSizes,
const std::vector<T>& scaleInputToNetInputs = {1.f}); const std::vector<T>& scaleInputToNetInputs = {1.f});
} }
......
...@@ -24,7 +24,7 @@ namespace op ...@@ -24,7 +24,7 @@ namespace op
virtual void LayerSetUp(const std::vector<caffe::Blob<T>*>& bottom, const std::vector<caffe::Blob<T>*>& top); virtual void LayerSetUp(const std::vector<caffe::Blob<T>*>& bottom, const std::vector<caffe::Blob<T>*>& top);
virtual void Reshape(const std::vector<caffe::Blob<T>*>& bottom, const std::vector<caffe::Blob<T>*>& top, virtual void Reshape(const std::vector<caffe::Blob<T>*>& bottom, const std::vector<caffe::Blob<T>*>& top,
const float netFactor, const float scaleFactor, const bool mergeFirstDimension = true); const T netFactor, const T scaleFactor, const bool mergeFirstDimension = true);
virtual inline const char* type() const { return "ResizeAndMerge"; } virtual inline const char* type() const { return "ResizeAndMerge"; }
...@@ -42,7 +42,7 @@ namespace op ...@@ -42,7 +42,7 @@ namespace op
private: private:
std::vector<T> mScaleRatios; std::vector<T> mScaleRatios;
std::array<int, 4> mBottomSize; std::vector<std::array<int, 4>> mBottomSizes;
std::array<int, 4> mTopSize; std::array<int, 4> mTopSize;
DELETE_COPY(ResizeAndMergeCaffe); DELETE_COPY(ResizeAndMergeCaffe);
......
...@@ -20,7 +20,7 @@ namespace op ...@@ -20,7 +20,7 @@ namespace op
void initializationOnThread(); void initializationOnThread();
virtual void forwardPass(const Array<float>& inputNetData, const Point<int>& inputDataSize, virtual void forwardPass(const std::vector<Array<float>>& inputNetData, const Point<int>& inputDataSize,
const std::vector<double>& scaleRatios = {1.f}) = 0; const std::vector<double>& scaleRatios = {1.f}) = 0;
virtual const float* getHeatMapCpuConstPtr() const = 0; virtual const float* getHeatMapCpuConstPtr() const = 0;
......
...@@ -19,7 +19,7 @@ namespace op ...@@ -19,7 +19,7 @@ namespace op
void netInitializationOnThread(); void netInitializationOnThread();
void forwardPass(const Array<float>& inputNetData, const Point<int>& inputDataSize, void forwardPass(const std::vector<Array<float>>& inputNetData, const Point<int>& inputDataSize,
const std::vector<double>& scaleInputToNetInputs = {1.f}); const std::vector<double>& scaleInputToNetInputs = {1.f});
const float* getHeatMapCpuConstPtr() const; const float* getHeatMapCpuConstPtr() const;
......
...@@ -289,6 +289,7 @@ namespace op ...@@ -289,6 +289,7 @@ namespace op
}; };
const std::array<float, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_INTER_MIN_ABOVE_THRESHOLD{ const std::array<float, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_INTER_MIN_ABOVE_THRESHOLD{
0.95f, 0.95f, 0.95f, 0.95f, 0.95f, 0.95f 0.95f, 0.95f, 0.95f, 0.95f, 0.95f, 0.95f
// 0.85f, 0.85f, 0.85f, 0.85f, 0.85f, 0.85f // Matlab version
}; };
const std::array<float, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_INTER_THRESHOLD{ const std::array<float, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_INTER_THRESHOLD{
0.05f, 0.01f, 0.01f, 0.05f, 0.05f, 0.05f 0.05f, 0.01f, 0.01f, 0.05f, 0.05f, 0.05f
...@@ -298,6 +299,7 @@ namespace op ...@@ -298,6 +299,7 @@ namespace op
}; };
const std::array<float, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_MIN_SUBSET_SCORE{ const std::array<float, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_MIN_SUBSET_SCORE{
0.4f, 0.4f, 0.4f, 0.4f, 0.4f, 0.4f 0.4f, 0.4f, 0.4f, 0.4f, 0.4f, 0.4f
// 0.2f, 0.4f, 0.4f, 0.4f, 0.4f, 0.4f // Matlab version
}; };
// Rendering parameters // Rendering parameters
......
...@@ -4,9 +4,9 @@ ...@@ -4,9 +4,9 @@
namespace op namespace op
{ {
Array<float> CvMatToOpInput::createArray(const cv::Mat& cvInputData, std::vector<Array<float>> CvMatToOpInput::createArray(const cv::Mat& cvInputData,
const std::vector<double>& scaleInputToNetInputs, const std::vector<double>& scaleInputToNetInputs,
const std::vector<Point<int>>& netInputSizes) const const std::vector<Point<int>>& netInputSizes) const
{ {
try try
{ {
...@@ -19,22 +19,22 @@ namespace op ...@@ -19,22 +19,22 @@ namespace op
error("scaleInputToNetInputs.size() != netInputSizes.size().", __LINE__, __FUNCTION__, __FILE__); error("scaleInputToNetInputs.size() != netInputSizes.size().", __LINE__, __FUNCTION__, __FILE__);
// inputNetData - Reescale keeping aspect ratio and transform to float the input deep net image // inputNetData - Reescale keeping aspect ratio and transform to float the input deep net image
const auto numberScales = (int)scaleInputToNetInputs.size(); const auto numberScales = (int)scaleInputToNetInputs.size();
Array<float> inputNetData{{numberScales, 3, netInputSizes.at(0).y, netInputSizes.at(0).x}}; std::vector<Array<float>> inputNetData(numberScales);
std::vector<double> scaleRatios(numberScales, 1.f); for (auto i = 0u ; i < inputNetData.size() ; i++)
const auto inputNetDataOffset = inputNetData.getVolume(1, 3);
for (auto i = 0; i < numberScales; i++)
{ {
inputNetData[i].reset({1, 3, netInputSizes.at(i).y, netInputSizes.at(i).x});
std::vector<double> scaleRatios(numberScales, 1.f);
const cv::Mat frameWithNetSize = resizeFixedAspectRatio(cvInputData, scaleInputToNetInputs[i], const cv::Mat frameWithNetSize = resizeFixedAspectRatio(cvInputData, scaleInputToNetInputs[i],
netInputSizes[i]); netInputSizes[i]);
// Fill inputNetData // Fill inputNetData[i]
uCharCvMatToFloatPtr(inputNetData.getPtr() + i * inputNetDataOffset, frameWithNetSize, true); uCharCvMatToFloatPtr(inputNetData[i].getPtr(), frameWithNetSize, true);
} }
return inputNetData; return inputNetData;
} }
catch (const std::exception& e) catch (const std::exception& e)
{ {
error(e.what(), __LINE__, __FUNCTION__, __FILE__); error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return Array<float>{}; return {};
} }
} }
} }
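As a usage illustration of the reworked `CvMatToOpInput::createArray` above (not code from the commit): the caller now receives one `Array<float>` per scale instead of a single 4-D array. The scale factors and net input sizes below are made-up examples; in OpenPose they come from `ScaleAndSizeExtractor` (see a later hunk in this commit), and default construction of the class is assumed from the header excerpt.

```cpp
// Illustrative sketch: calling the new CvMatToOpInput::createArray with two scales.
// Scale factors and net input sizes are hypothetical example values.
#include <opencv2/opencv.hpp>
#include <openpose/headers.hpp>  // assumed aggregate OpenPose header

int main()
{
    const cv::Mat cvInputData = cv::imread("example.jpg");
    const std::vector<double> scaleInputToNetInputs{1.0, 0.75};
    const std::vector<op::Point<int>> netInputSizes{op::Point<int>{656, 368},
                                                    op::Point<int>{480, 272}};
    const op::CvMatToOpInput cvMatToOpInput{};  // header above only declares createArray,
                                                // so default construction is assumed
    const auto inputNetData = cvMatToOpInput.createArray(cvInputData, scaleInputToNetInputs,
                                                         netInputSizes);
    // inputNetData.size() == 2: one 1 x 3 x H x W float Array per scale.
    return 0;
}
```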
...@@ -157,7 +157,9 @@ namespace op ...@@ -157,7 +157,9 @@ namespace op
datum.name = name; datum.name = name;
// Input image and rendered version // Input image and rendered version
datum.cvInputData = cvInputData.clone(); datum.cvInputData = cvInputData.clone();
datum.inputNetData = inputNetData.clone(); datum.inputNetData.resize(inputNetData.size());
for (auto i = 0u ; i < datum.inputNetData.size() ; i++)
datum.inputNetData[i] = inputNetData[i].clone();
datum.outputData = outputData.clone(); datum.outputData = outputData.clone();
datum.cvOutputData = cvOutputData.clone(); datum.cvOutputData = cvOutputData.clone();
// Resulting Array<float> data // Resulting Array<float> data
......
...@@ -33,8 +33,7 @@ namespace op ...@@ -33,8 +33,7 @@ namespace op
mGpuId{gpuId}, mGpuId{gpuId},
mCaffeProto{caffeProto}, mCaffeProto{caffeProto},
mCaffeTrainedModel{caffeTrainedModel}, mCaffeTrainedModel{caffeTrainedModel},
mLastBlobName{lastBlobName}, mLastBlobName{lastBlobName}
mNetInputSize4D{0,0,0,0}
{ {
const std::string message{".\nPossible causes:\n\t1. Not downloading the OpenPose trained models." const std::string message{".\nPossible causes:\n\t1. Not downloading the OpenPose trained models."
"\n\t2. Not running OpenPose from the same directory where the `model`" "\n\t2. Not running OpenPose from the same directory where the `model`"
...@@ -160,7 +159,10 @@ namespace op ...@@ -160,7 +159,10 @@ namespace op
#endif #endif
// Perform deep network forward pass // Perform deep network forward pass
upImpl->upCaffeNet->ForwardFrom(0); upImpl->upCaffeNet->ForwardFrom(0);
cudaCheck(__LINE__, __FUNCTION__, __FILE__); // Cuda checks
#ifdef USE_CUDA
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
#endif
#else #else
UNUSED(inputData); UNUSED(inputData);
#endif #endif
......
...@@ -4,16 +4,18 @@ ...@@ -4,16 +4,18 @@
namespace op namespace op
{ {
template <typename T> template <typename T>
void resizeAndMergeCpu(T* targetPtr, const T* const sourcePtr, const std::array<int, 4>& targetSize, void resizeAndMergeCpu(T* targetPtr, const std::vector<const T*>& sourcePtrs,
const std::array<int, 4>& sourceSize, const std::vector<T>& scaleInputToNetInputs) const std::array<int, 4>& targetSize,
const std::vector<std::array<int, 4>>& sourceSizes,
const std::vector<T>& scaleInputToNetInputs)
{ {
try try
{ {
UNUSED(targetPtr); UNUSED(targetPtr);
UNUSED(sourcePtr); UNUSED(sourcePtrs);
UNUSED(scaleInputToNetInputs); UNUSED(scaleInputToNetInputs);
UNUSED(targetSize); UNUSED(targetSize);
UNUSED(sourceSize); UNUSED(sourceSizes);
error("CPU version not completely implemented.", __LINE__, __FUNCTION__, __FILE__); error("CPU version not completely implemented.", __LINE__, __FUNCTION__, __FILE__);
// TODO: THIS CODE IS WORKING, BUT IT DOES NOT CONSIDER THE SCALES (I.E. SCALE NUMBER, START AND GAP) // TODO: THIS CODE IS WORKING, BUT IT DOES NOT CONSIDER THE SCALES (I.E. SCALE NUMBER, START AND GAP)
...@@ -34,10 +36,10 @@ namespace op ...@@ -34,10 +36,10 @@ namespace op
// const auto sourceOffsetChannel = sourceHeight * sourceWidth; // const auto sourceOffsetChannel = sourceHeight * sourceWidth;
// const auto sourceOffsetNum = sourceOffsetChannel * channel; // const auto sourceOffsetNum = sourceOffsetChannel * channel;
// const auto sourceOffset = n*sourceOffsetNum + c*sourceOffsetChannel; // const auto sourceOffset = n*sourceOffsetNum + c*sourceOffsetChannel;
// const T* const sourcePtr = bottom->cpu_data(); // const T* const sourcePtrs = bottom->cpu_data();
// for (int y = 0; y < sourceHeight; y++) // for (int y = 0; y < sourceHeight; y++)
// for (int x = 0; x < sourceWidth; x++) // for (int x = 0; x < sourceWidth; x++)
// source.at<T>(x,y) = sourcePtr[sourceOffset + y*sourceWidth + x]; // source.at<T>(x,y) = sourcePtrs[sourceOffset + y*sourceWidth + x];
// // spatial resize // // spatial resize
// cv::Mat target; // cv::Mat target;
...@@ -60,8 +62,12 @@ namespace op ...@@ -60,8 +62,12 @@ namespace op
} }
} }
template void resizeAndMergeCpu(float* targetPtr, const float* const sourcePtr, const std::array<int, 4>& targetSize, template void resizeAndMergeCpu(float* targetPtr, const std::vector<const float*>& sourcePtrs,
const std::array<int, 4>& sourceSize, const std::vector<float>& scaleInputToNetInputs); const std::array<int, 4>& targetSize,
template void resizeAndMergeCpu(double* targetPtr, const double* const sourcePtr, const std::array<int, 4>& targetSize, const std::vector<std::array<int, 4>>& sourceSizes,
const std::array<int, 4>& sourceSize, const std::vector<double>& scaleInputToNetInputs); const std::vector<float>& scaleInputToNetInputs);
template void resizeAndMergeCpu(double* targetPtr, const std::vector<const double*>& sourcePtrs,
const std::array<int, 4>& targetSize,
const std::vector<std::array<int, 4>>& sourceSizes,
const std::vector<double>& scaleInputToNetInputs);
} }
...@@ -15,110 +15,112 @@ namespace op ...@@ -15,110 +15,112 @@ namespace op
if (x < targetWidth && y < targetHeight) if (x < targetWidth && y < targetHeight)
{ {
const auto scaleWidth = targetWidth / T(sourceWidth); const T xSource = (x + 0.5f) * sourceWidth / T(targetWidth) - 0.5f;
const auto scaleHeight = targetHeight / T(sourceHeight); const T ySource = (y + 0.5f) * sourceHeight / T(targetHeight) - 0.5f;
const T xSource = (x + 0.5f) / scaleWidth - 0.5f;
const T ySource = (y + 0.5f) / scaleHeight - 0.5f;
targetPtr[y*targetWidth+x] = bicubicInterpolate(sourcePtr, xSource, ySource, sourceWidth, sourceHeight, targetPtr[y*targetWidth+x] = bicubicInterpolate(sourcePtr, xSource, ySource, sourceWidth, sourceHeight,
sourceWidth); sourceWidth);
} }
} }
template <typename T> template <typename T>
__global__ void resizeKernelAndMerge(T* targetPtr, const T* const sourcePtr, const int sourceNumOffset, __global__ void resizeKernelAndMerge(T* targetPtr, const T* const sourcePtr, const T scaleWidth,
const int num, const T* scaleInputToNetInputs, const int sourceWidth, const T scaleHeight, const int sourceWidth, const int sourceHeight,
const int sourceHeight, const int targetWidth, const int targetHeight) const int targetWidth, const int targetHeight, const int averageCounter)
{ {
const auto x = (blockIdx.x * blockDim.x) + threadIdx.x; const auto x = (blockIdx.x * blockDim.x) + threadIdx.x;
const auto y = (blockIdx.y * blockDim.y) + threadIdx.y; const auto y = (blockIdx.y * blockDim.y) + threadIdx.y;
if (x < targetWidth && y < targetHeight) if (x < targetWidth && y < targetHeight)
{ {
const T xSource = (x + 0.5f) / scaleWidth - 0.5f;
const T ySource = (y + 0.5f) / scaleHeight - 0.5f;
const auto interpolated = bicubicInterpolate(sourcePtr, xSource, ySource, sourceWidth, sourceHeight,
sourceWidth);
auto& targetPixel = targetPtr[y*targetWidth+x]; auto& targetPixel = targetPtr[y*targetWidth+x];
targetPixel = 0.f; // For average targetPixel = ((averageCounter * targetPixel) + interpolated) / T(averageCounter + 1);
// targetPixel = -1000.f; // For fastMax // targetPixel = fastMax(targetPixel, interpolated);
for (auto n = 0; n < num; n++)
{
const auto currentWidth = sourceWidth * scaleInputToNetInputs[n] / scaleInputToNetInputs[0];
const auto currentHeight = sourceHeight * scaleInputToNetInputs[n] / scaleInputToNetInputs[0];
const auto scaleWidth = targetWidth / currentWidth;
const auto scaleHeight = targetHeight / currentHeight;
const T xSource = (x + 0.5f) / scaleWidth - 0.5f;
const T ySource = (y + 0.5f) / scaleHeight - 0.5f;
const T* const sourcePtrN = sourcePtr + n * sourceNumOffset;
const auto interpolated = bicubicInterpolate(sourcePtrN, xSource, ySource, intRound(currentWidth),
intRound(currentHeight), sourceWidth);
targetPixel += interpolated;
// targetPixel = fastMax(targetPixel, interpolated);
}
targetPixel /= num;
} }
} }
template <typename T> template <typename T>
void resizeAndMergeGpu(T* targetPtr, const T* const sourcePtr, const std::array<int, 4>& targetSize, void resizeAndMergeGpu(T* targetPtr, const std::vector<const T*>& sourcePtrs, const std::array<int, 4>& targetSize,
const std::array<int, 4>& sourceSize, const std::vector<T>& scaleInputToNetInputs) const std::vector<std::array<int, 4>>& sourceSizes,
const std::vector<T>& scaleInputToNetInputs)
{ {
try try
{ {
const auto num = sourceSize[0]; // Security checks
const auto channels = sourceSize[1]; if (sourceSizes.empty())
const auto sourceHeight = sourceSize[2]; error("sourceSizes cannot be empty.", __LINE__, __FUNCTION__, __FILE__);
const auto sourceWidth = sourceSize[3]; if (sourcePtrs.size() != sourceSizes.size() || sourceSizes.size() != scaleInputToNetInputs.size())
error("Size(sourcePtrs) must match size(sourceSizes) and size(scaleInputToNetInputs). Currently: "
+ std::to_string(sourcePtrs.size()) + " vs. " + std::to_string(sourceSizes.size()) + " vs. "
+ std::to_string(scaleInputToNetInputs.size()) + ".", __LINE__, __FUNCTION__, __FILE__);
// Parameters
const auto channels = targetSize[1];
const auto targetHeight = targetSize[2]; const auto targetHeight = targetSize[2];
const auto targetWidth = targetSize[3]; const auto targetWidth = targetSize[3];
const dim3 threadsPerBlock{THREADS_PER_BLOCK_1D, THREADS_PER_BLOCK_1D}; const dim3 threadsPerBlock{THREADS_PER_BLOCK_1D, THREADS_PER_BLOCK_1D};
const dim3 numBlocks{getNumberCudaBlocks(targetWidth, threadsPerBlock.x), const dim3 numBlocks{getNumberCudaBlocks(targetWidth, threadsPerBlock.x),
getNumberCudaBlocks(targetHeight, threadsPerBlock.y)}; getNumberCudaBlocks(targetHeight, threadsPerBlock.y)};
const auto sourceChannelOffset = sourceHeight * sourceWidth; const auto& sourceSize = sourceSizes[0];
const auto targetChannelOffset = targetWidth * targetHeight; const auto sourceHeight = sourceSize[2];
const auto sourceWidth = sourceSize[3];
// No multi-scale merging // No multi-scale merging or no merging required
if (targetSize[0] > 1) if (sourceSizes.size() == 1)
{ {
for (auto n = 0; n < num; n++) const auto num = sourceSize[0];
if (targetSize[0] > 1 || num == 1)
{ {
const auto offsetBase = n*channels; const auto sourceChannelOffset = sourceHeight * sourceWidth;
for (auto c = 0 ; c < channels ; c++) const auto targetChannelOffset = targetWidth * targetHeight;
for (auto n = 0; n < num; n++)
{ {
const auto offset = offsetBase + c; const auto offsetBase = n*channels;
resizeKernel<<<numBlocks, threadsPerBlock>>>(targetPtr + offset * targetChannelOffset, for (auto c = 0 ; c < channels ; c++)
sourcePtr + offset * sourceChannelOffset, {
sourceWidth, sourceHeight, targetWidth, const auto offset = offsetBase + c;
targetHeight); resizeKernel<<<numBlocks, threadsPerBlock>>>(targetPtr + offset * targetChannelOffset,
sourcePtrs.at(0) + offset * sourceChannelOffset,
sourceWidth, sourceHeight, targetWidth,
targetHeight);
}
} }
} }
// Old inefficient multi-scale merging
else
error("It should never reaches this point. Notify us.", __LINE__, __FUNCTION__, __FILE__);
} }
// Multi-scale merging // Multi-scaling merging
else else
{ {
// If scale_number > 1 --> scaleInputToNetInputs must be set const auto targetChannelOffset = targetWidth * targetHeight;
if (scaleInputToNetInputs.size() != num) cudaMemset(targetPtr, 0.f, channels*targetChannelOffset * sizeof(T));
error("The scale ratios size must be equal than the number of scales.", auto averageCounter = -1;
__LINE__, __FUNCTION__, __FILE__); const auto scaleToMainScaleWidth = targetWidth / T(sourceWidth);
const auto maxScales = 10; const auto scaleToMainScaleHeight = targetHeight / T(sourceHeight);
if (scaleInputToNetInputs.size() > maxScales)
error("The maximum number of scales is " + std::to_string(maxScales) + ".", for (auto i = 0u ; i < sourceSizes.size(); i++)
__LINE__, __FUNCTION__, __FILE__); {
// Copy scaleInputToNetInputs const auto& currentSize = sourceSizes.at(i);
T* scaleInputToNetInputsPtr; const auto currentHeight = currentSize[2];
cudaMalloc((void**)&scaleInputToNetInputsPtr, maxScales * sizeof(T)); const auto currentWidth = currentSize[3];
cudaMemcpy(scaleInputToNetInputsPtr, scaleInputToNetInputs.data(), const auto sourceChannelOffset = currentHeight * currentWidth;
scaleInputToNetInputs.size() * sizeof(T), cudaMemcpyHostToDevice); const auto scaleInputToNet = scaleInputToNetInputs[i] / scaleInputToNetInputs[0];
// Perform resize + merging const auto scaleWidth = scaleToMainScaleWidth / scaleInputToNet;
const auto sourceNumOffset = channels * sourceChannelOffset; const auto scaleHeight = scaleToMainScaleHeight / scaleInputToNet;
for (auto c = 0 ; c < channels ; c++) averageCounter++;
resizeKernelAndMerge<<<numBlocks, threadsPerBlock>>>(targetPtr + c * targetChannelOffset, for (auto c = 0 ; c < channels ; c++)
sourcePtr + c * sourceChannelOffset, {
sourceNumOffset, num, resizeKernelAndMerge<<<numBlocks, threadsPerBlock>>>(
scaleInputToNetInputsPtr, sourceWidth, targetPtr + c * targetChannelOffset, sourcePtrs[i] + c * sourceChannelOffset,
sourceHeight, targetWidth, targetHeight); scaleWidth, scaleHeight, currentWidth, currentHeight, targetWidth,
// Free memory targetHeight, averageCounter
cudaFree(scaleInputToNetInputsPtr); );
}
}
} }
cudaCheck(__LINE__, __FUNCTION__, __FILE__); cudaCheck(__LINE__, __FUNCTION__, __FILE__);
...@@ -129,10 +131,12 @@ namespace op ...@@ -129,10 +131,12 @@ namespace op
} }
} }
template void resizeAndMergeGpu(float* targetPtr, const float* const sourcePtr, template void resizeAndMergeGpu(float* targetPtr, const std::vector<const float*>& sourcePtrs,
const std::array<int, 4>& targetSize, const std::array<int, 4>& sourceSize, const std::array<int, 4>& targetSize,
const std::vector<std::array<int, 4>>& sourceSizes,
const std::vector<float>& scaleInputToNetInputs); const std::vector<float>& scaleInputToNetInputs);
template void resizeAndMergeGpu(double* targetPtr, const double* const sourcePtr, template void resizeAndMergeGpu(double* targetPtr, const std::vector<const double*>& sourcePtrs,
const std::array<int, 4>& targetSize, const std::array<int, 4>& sourceSize, const std::array<int, 4>& targetSize,
const std::vector<std::array<int, 4>>& sourceSizes,
const std::vector<double>& scaleInputToNetInputs); const std::vector<double>& scaleInputToNetInputs);
} }
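Two numerical ideas carry the rewritten kernels above: the half-pixel-centred coordinate mapping used for bicubic resizing, and the incremental running average that merges scales in place without a temporary buffer. A small plain-C++ illustration of both formulas (not part of the commit, host-side only, made-up sizes):

```cpp
// Plain C++ illustration of the two formulas used by the rewritten CUDA kernels above.
#include <cstdio>

int main()
{
    // 1. Half-pixel-centred source coordinate for a target pixel x:
    //    xSource = (x + 0.5) * sourceWidth / targetWidth - 0.5
    const int sourceWidth = 46, targetWidth = 368;   // example net-output vs. merged size
    const int xs[] = {0, targetWidth / 2, targetWidth - 1};
    for (const int x : xs)
    {
        const float xSource = (x + 0.5f) * sourceWidth / float(targetWidth) - 0.5f;
        std::printf("target x=%d -> source x=%.3f\n", x, xSource);
    }

    // 2. Incremental (running) average used to merge scales in place:
    //    average_{k+1} = (k * average_k + newValue) / (k + 1)
    const float samples[] = {0.2f, 0.6f, 0.4f};      // e.g. one heat-map value per scale
    float average = 0.f;
    int averageCounter = -1;                          // same convention as the kernel above
    for (const float value : samples)
    {
        averageCounter++;
        average = (averageCounter * average + value) / float(averageCounter + 1);
    }
    std::printf("merged value = %.4f\n", average);    // (0.2 + 0.6 + 0.4) / 3 = 0.4
    return 0;
}
```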
...@@ -32,9 +32,9 @@ namespace op ...@@ -32,9 +32,9 @@ namespace op
{ {
#ifdef USE_CAFFE #ifdef USE_CAFFE
if (top.size() != 1) if (top.size() != 1)
error("top.size() != 1", __LINE__, __FUNCTION__, __FILE__); error("top.size() != 1.", __LINE__, __FUNCTION__, __FILE__);
if (bottom.size() != 1) if (bottom.size() != 1)
error("bottom.size() != 2", __LINE__, __FUNCTION__, __FILE__); error("bottom.size() != 1.", __LINE__, __FUNCTION__, __FILE__);
#else #else
UNUSED(bottom); UNUSED(bottom);
UNUSED(top); UNUSED(top);
...@@ -49,16 +49,21 @@ namespace op ...@@ -49,16 +49,21 @@ namespace op
template <typename T> template <typename T>
void ResizeAndMergeCaffe<T>::Reshape(const std::vector<caffe::Blob<T>*>& bottom, void ResizeAndMergeCaffe<T>::Reshape(const std::vector<caffe::Blob<T>*>& bottom,
const std::vector<caffe::Blob<T>*>& top, const std::vector<caffe::Blob<T>*>& top,
const float netFactor, const T netFactor,
const float scaleFactor, const T scaleFactor,
const bool mergeFirstDimension) const bool mergeFirstDimension)
{ {
try try
{ {
#ifdef USE_CAFFE #ifdef USE_CAFFE
// Security checks
if (top.size() != 1)
error("top.size() != 1", __LINE__, __FUNCTION__, __FILE__);
if (bottom.empty())
error("bottom cannot be empty.", __LINE__, __FUNCTION__, __FILE__);
// Data // Data
const auto* bottomBlob = bottom.at(0);
auto* topBlob = top.at(0); auto* topBlob = top.at(0);
const auto* bottomBlob = bottom.at(0);
// Set top shape // Set top shape
auto topShape = bottomBlob->shape(); auto topShape = bottomBlob->shape();
topShape[0] = (mergeFirstDimension ? 1 : bottomBlob->shape(0)); topShape[0] = (mergeFirstDimension ? 1 : bottomBlob->shape(0));
...@@ -66,18 +71,21 @@ namespace op ...@@ -66,18 +71,21 @@ namespace op
// E.g. 100x100 image --> 200x200 --> 0-99 to 0-199 --> scale = 199/99 (not 2!) // E.g. 100x100 image --> 200x200 --> 0-99 to 0-199 --> scale = 199/99 (not 2!)
// E.g. 101x101 image --> 201x201 --> scale = 2 // E.g. 101x101 image --> 201x201 --> scale = 2
// Test: pixel 0 --> 0, pixel 99 (ex 1) --> 199, pixel 100 (ex 2) --> 200 // Test: pixel 0 --> 0, pixel 99 (ex 1) --> 199, pixel 100 (ex 2) --> 200
topShape[2] = intRound((topShape[2]*netFactor - 1.f) * scaleFactor + 1); topShape[2] = intRound((topShape[2]*netFactor - 1.f) * scaleFactor) + 1;
topShape[3] = intRound((topShape[3]*netFactor - 1.f) * scaleFactor + 1); topShape[3] = intRound((topShape[3]*netFactor - 1.f) * scaleFactor) + 1;
topBlob->Reshape(topShape); topBlob->Reshape(topShape);
// Array sizes // Array sizes
mTopSize = std::array<int, 4>{topBlob->shape(0), topBlob->shape(1), topBlob->shape(2), mTopSize = std::array<int, 4>{topBlob->shape(0), topBlob->shape(1), topBlob->shape(2),
topBlob->shape(3)}; topBlob->shape(3)};
mBottomSize = std::array<int, 4>{bottomBlob->shape(0), bottomBlob->shape(1), mBottomSizes.resize(bottom.size());
bottomBlob->shape(2), bottomBlob->shape(3)}; for (auto i = 0u ; i < mBottomSizes.size() ; i++)
mBottomSizes[i] = std::array<int, 4>{bottom[i]->shape(0), bottom[i]->shape(1),
bottom[i]->shape(2), bottom[i]->shape(3)};
#else #else
UNUSED(bottom); UNUSED(bottom);
UNUSED(top); UNUSED(top);
UNUSED(factor); UNUSED(netFactor);
UNUSED(scaleFactor);
UNUSED(mergeFirstDimension); UNUSED(mergeFirstDimension);
#endif #endif
            }
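The comment in the Reshape hunk above stresses that upscaling an N-pixel dimension so that index N-1 lands on index M-1 uses the factor (M-1)/(N-1), not M/N, and the commit moves the "+1" outside the rounding: `topShape = intRound((dim * netFactor - 1) * scaleFactor) + 1`. A self-contained check of that arithmetic (illustrative, not commit code; netFactor fixed to 1 for simplicity):

```cpp
// Worked example of the output-size formula used in ResizeAndMergeCaffe::Reshape above.
#include <cmath>
#include <cstdio>

int main()
{
    const auto intRound = [](const double v) { return static_cast<int>(std::lround(v)); };
    // 100-pixel dimension upscaled so index 99 lands on index 199: factor is 199/99, not 2.
    const double scaleFactor1 = 199.0 / 99.0;
    std::printf("%d\n", intRound((100 * 1.0 - 1.0) * scaleFactor1) + 1);   // prints 200
    // 101-pixel dimension: index 100 lands on index 200, so the factor is exactly 2.
    const double scaleFactor2 = 200.0 / 100.0;
    std::printf("%d\n", intRound((101 * 1.0 - 1.0) * scaleFactor2) + 1);   // prints 201
    return 0;
}
```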
...@@ -107,7 +115,10 @@ namespace op ...@@ -107,7 +115,10 @@ namespace op
try try
{ {
#ifdef USE_CAFFE #ifdef USE_CAFFE
resizeAndMergeCpu(top.at(0)->mutable_cpu_data(), bottom.at(0)->cpu_data(), mTopSize, mBottomSize, std::vector<const T*> sourcePtrs(bottom.size());
for (auto i = 0u ; i < sourcePtrs.size() ; i++)
sourcePtrs[i] = bottom[i]->cpu_data();
resizeAndMergeCpu(top.at(0)->mutable_cpu_data(), sourcePtrs, mTopSize, mBottomSizes,
mScaleRatios); mScaleRatios);
#else #else
UNUSED(bottom); UNUSED(bottom);
...@@ -127,7 +138,10 @@ namespace op ...@@ -127,7 +138,10 @@ namespace op
try try
{ {
#if defined USE_CAFFE && defined USE_CUDA #if defined USE_CAFFE && defined USE_CUDA
resizeAndMergeGpu(top.at(0)->mutable_gpu_data(), bottom.at(0)->gpu_data(), mTopSize, mBottomSize, std::vector<const T*> sourcePtrs(bottom.size());
for (auto i = 0u ; i < sourcePtrs.size() ; i++)
sourcePtrs[i] = bottom[i]->gpu_data();
resizeAndMergeGpu(top.at(0)->mutable_gpu_data(), sourcePtrs, mTopSize, mBottomSizes,
mScaleRatios); mScaleRatios);
#else #else
UNUSED(bottom); UNUSED(bottom);
......
...@@ -54,9 +54,9 @@ namespace op ...@@ -54,9 +54,9 @@ namespace op
poseNetInputSize.x * inputResolution.y / (float) inputResolution.x / 16.f poseNetInputSize.x * inputResolution.y / (float) inputResolution.x / 16.f
); );
} }
// scaleInputToNetInputs & sizes - Reescale keeping aspect ratio // scaleInputToNetInputs & netInputSizes - Reescale keeping aspect ratio
std::vector<double> scaleInputToNetInputs(mScaleNumber, 1.f); std::vector<double> scaleInputToNetInputs(mScaleNumber, 1.f);
std::vector<Point<int>> sizes(mScaleNumber); std::vector<Point<int>> netInputSizes(mScaleNumber);
for (auto i = 0; i < mScaleNumber; i++) for (auto i = 0; i < mScaleNumber; i++)
{ {
const auto currentScale = 1. - i*mScaleGap; const auto currentScale = 1. - i*mScaleGap;
...@@ -70,7 +70,7 @@ namespace op ...@@ -70,7 +70,7 @@ namespace op
poseNetInputSize.y); poseNetInputSize.y);
const Point<int> targetSize{targetWidth, targetHeight}; const Point<int> targetSize{targetWidth, targetHeight};
scaleInputToNetInputs[i] = resizeGetScaleFactor(inputResolution, targetSize); scaleInputToNetInputs[i] = resizeGetScaleFactor(inputResolution, targetSize);
sizes[i] = poseNetInputSize; netInputSizes[i] = targetSize;
} }
// scaleInputToOutput - Scale between input and desired output size // scaleInputToOutput - Scale between input and desired output size
Point<int> outputResolution; Point<int> outputResolution;
...@@ -88,7 +88,7 @@ namespace op ...@@ -88,7 +88,7 @@ namespace op
scaleInputToOutput = 1.; scaleInputToOutput = 1.;
} }
// Return result // Return result
return std::make_tuple(scaleInputToNetInputs, sizes, scaleInputToOutput, outputResolution); return std::make_tuple(scaleInputToNetInputs, netInputSizes, scaleInputToOutput, outputResolution);
} }
catch (const std::exception& e) catch (const std::exception& e)
{ {
......
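The ScaleAndSizeExtractor hunk above builds one net input size and one scale factor per scale, shrinking the base net resolution by `scale_gap` steps and snapping to the network stride (the `/ 16.f` factor). A rough illustration of that computation with example numbers; the exact rounding OpenPose applies may differ, so treat this as a sketch only:

```cpp
// Illustrative per-scale net input sizes, in the spirit of the hunk above (example values).
#include <algorithm>
#include <cmath>
#include <cstdio>

int main()
{
    const int baseNetWidth = 656, baseNetHeight = 368;   // example --net_resolution
    const int scaleNumber = 3;                            // example --scale_number
    const double scaleGap = 0.25;                         // example --scale_gap
    for (int i = 0; i < scaleNumber; i++)
    {
        const double currentScale = 1.0 - i * scaleGap;
        // Snap each dimension to a multiple of 16 (the CNN stride), keeping at least one multiple.
        const int width  = std::max(1, (int)std::floor(baseNetWidth  * currentScale / 16.0)) * 16;
        const int height = std::max(1, (int)std::floor(baseNetHeight * currentScale / 16.0)) * 16;
        std::printf("scale %d: %dx%d\n", i, width, height);
    }
    return 0;
}
```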
...@@ -18,23 +18,30 @@ namespace op ...@@ -18,23 +18,30 @@ namespace op
struct PoseExtractorCaffe::ImplPoseExtractorCaffe struct PoseExtractorCaffe::ImplPoseExtractorCaffe
{ {
#ifdef USE_CAFFE #ifdef USE_CAFFE
std::shared_ptr<NetCaffe> spNetCaffe; // Used when increasing spCaffeNets
const PoseModel mPoseModel;
const int mGpuId;
const std::string mModelFolder;
const bool mEnableGoogleLogging;
// General parameters
std::vector<std::shared_ptr<NetCaffe>> spCaffeNets;
std::shared_ptr<ResizeAndMergeCaffe<float>> spResizeAndMergeCaffe; std::shared_ptr<ResizeAndMergeCaffe<float>> spResizeAndMergeCaffe;
std::shared_ptr<NmsCaffe<float>> spNmsCaffe; std::shared_ptr<NmsCaffe<float>> spNmsCaffe;
std::shared_ptr<BodyPartConnectorCaffe<float>> spBodyPartConnectorCaffe; std::shared_ptr<BodyPartConnectorCaffe<float>> spBodyPartConnectorCaffe;
std::vector<int> mNetInputSize4D; std::vector<std::vector<int>> mNetInput4DSizes;
std::vector<double> mScaleInputToNetInputs; std::vector<double> mScaleInputToNetInputs;
// Init with thread // Init with thread
boost::shared_ptr<caffe::Blob<float>> spCaffeNetOutputBlob; std::vector<boost::shared_ptr<caffe::Blob<float>>> spCaffeNetOutputBlobs;
std::shared_ptr<caffe::Blob<float>> spHeatMapsBlob; std::shared_ptr<caffe::Blob<float>> spHeatMapsBlob;
std::shared_ptr<caffe::Blob<float>> spPeaksBlob; std::shared_ptr<caffe::Blob<float>> spPeaksBlob;
std::shared_ptr<caffe::Blob<float>> spPoseBlob; std::shared_ptr<caffe::Blob<float>> spPoseBlob;
ImplPoseExtractorCaffe(const PoseModel poseModel, const int gpuId, ImplPoseExtractorCaffe(const PoseModel poseModel, const int gpuId,
const std::string& modelFolder, const bool enableGoogleLogging) : const std::string& modelFolder, const bool enableGoogleLogging) :
spNetCaffe{std::make_shared<NetCaffe>(modelFolder + POSE_PROTOTXT[(int)poseModel], mPoseModel{poseModel},
modelFolder + POSE_TRAINED_MODEL[(int)poseModel], gpuId, mGpuId{gpuId},
enableGoogleLogging)}, mModelFolder{modelFolder},
mEnableGoogleLogging{enableGoogleLogging},
spResizeAndMergeCaffe{std::make_shared<ResizeAndMergeCaffe<float>>()}, spResizeAndMergeCaffe{std::make_shared<ResizeAndMergeCaffe<float>>()},
spNmsCaffe{std::make_shared<NmsCaffe<float>>()}, spNmsCaffe{std::make_shared<NmsCaffe<float>>()},
spBodyPartConnectorCaffe{std::make_shared<BodyPartConnectorCaffe<float>>()} spBodyPartConnectorCaffe{std::make_shared<BodyPartConnectorCaffe<float>>()}
...@@ -44,10 +51,28 @@ namespace op ...@@ -44,10 +51,28 @@ namespace op
}; };
#ifdef USE_CAFFE #ifdef USE_CAFFE
std::vector<caffe::Blob<float>*> caffeNetSharedToPtr(
std::vector<boost::shared_ptr<caffe::Blob<float>>>& caffeNetOutputBlob)
{
try
{
                // Prepare spCaffeNetOutputBlobs
std::vector<caffe::Blob<float>*> caffeNetOutputBlobs(caffeNetOutputBlob.size());
for (auto i = 0u ; i < caffeNetOutputBlobs.size() ; i++)
caffeNetOutputBlobs[i] = caffeNetOutputBlob[i].get();
return caffeNetOutputBlobs;
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return {};
}
}
inline void reshapePoseExtractorCaffe(std::shared_ptr<ResizeAndMergeCaffe<float>>& resizeAndMergeCaffe, inline void reshapePoseExtractorCaffe(std::shared_ptr<ResizeAndMergeCaffe<float>>& resizeAndMergeCaffe,
std::shared_ptr<NmsCaffe<float>>& nmsCaffe, std::shared_ptr<NmsCaffe<float>>& nmsCaffe,
std::shared_ptr<BodyPartConnectorCaffe<float>>& bodyPartConnectorCaffe, std::shared_ptr<BodyPartConnectorCaffe<float>>& bodyPartConnectorCaffe,
boost::shared_ptr<caffe::Blob<float>>& caffeNetOutputBlob, std::vector<boost::shared_ptr<caffe::Blob<float>>>& caffeNetOutputBlob,
std::shared_ptr<caffe::Blob<float>>& heatMapsBlob, std::shared_ptr<caffe::Blob<float>>& heatMapsBlob,
std::shared_ptr<caffe::Blob<float>>& peaksBlob, std::shared_ptr<caffe::Blob<float>>& peaksBlob,
std::shared_ptr<caffe::Blob<float>>& poseBlob, std::shared_ptr<caffe::Blob<float>>& poseBlob,
...@@ -57,14 +82,47 @@ namespace op ...@@ -57,14 +82,47 @@ namespace op
try try
{ {
// HeatMaps extractor blob and layer // HeatMaps extractor blob and layer
resizeAndMergeCaffe->Reshape({caffeNetOutputBlob.get()}, {heatMapsBlob.get()}, const auto caffeNetOutputBlobs = caffeNetSharedToPtr(caffeNetOutputBlob);
resizeAndMergeCaffe->Reshape(caffeNetOutputBlobs, {heatMapsBlob.get()},
POSE_CCN_DECREASE_FACTOR[(int)poseModel], 1.f/scaleInputToNetInput); POSE_CCN_DECREASE_FACTOR[(int)poseModel], 1.f/scaleInputToNetInput);
// Pose extractor blob and layer // Pose extractor blob and layer
nmsCaffe->Reshape({heatMapsBlob.get()}, {peaksBlob.get()}, POSE_MAX_PEAKS[(int)poseModel]); nmsCaffe->Reshape({heatMapsBlob.get()}, {peaksBlob.get()}, POSE_MAX_PEAKS[(int)poseModel]);
// Pose extractor blob and layer // Pose extractor blob and layer
bodyPartConnectorCaffe->Reshape({heatMapsBlob.get(), peaksBlob.get()}, {poseBlob.get()}); bodyPartConnectorCaffe->Reshape({heatMapsBlob.get(), peaksBlob.get()}, {poseBlob.get()});
// Cuda check // Cuda check
cudaCheck(__LINE__, __FUNCTION__, __FILE__); #ifdef USE_CUDA
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
#endif
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
void addCaffeNetOnThread(std::vector<std::shared_ptr<NetCaffe>>& netCaffe,
std::vector<boost::shared_ptr<caffe::Blob<float>>>& caffeNetOutputBlob,
const PoseModel poseModel, const int gpuId,
const std::string& modelFolder, const bool enableGoogleLogging)
{
try
{
// Add Caffe Net
netCaffe.emplace_back(
std::make_shared<NetCaffe>(modelFolder + POSE_PROTOTXT[(int)poseModel],
modelFolder + POSE_TRAINED_MODEL[(int)poseModel],
gpuId, enableGoogleLogging)
);
// Initializing them on the thread
netCaffe.back()->initializationOnThread();
caffeNetOutputBlob.emplace_back(netCaffe.back()->getOutputBlob());
// Security checks
if (netCaffe.size() != caffeNetOutputBlob.size())
error("Weird error, this should not happen. Notify us.", __LINE__, __FUNCTION__, __FILE__);
// Cuda check
#ifdef USE_CUDA
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
#endif
} }
catch (const std::exception& e) catch (const std::exception& e)
{ {
...@@ -114,14 +172,18 @@ namespace op ...@@ -114,14 +172,18 @@ namespace op
// Logging // Logging
log("Starting initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__); log("Starting initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
// Initialize Caffe net // Initialize Caffe net
upImpl->spNetCaffe->initializationOnThread(); addCaffeNetOnThread(upImpl->spCaffeNets, upImpl->spCaffeNetOutputBlobs, upImpl->mPoseModel,
cudaCheck(__LINE__, __FUNCTION__, __FILE__); upImpl->mGpuId, upImpl->mModelFolder, upImpl->mEnableGoogleLogging);
#ifdef USE_CUDA
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
#endif
// Initialize blobs // Initialize blobs
upImpl->spCaffeNetOutputBlob = upImpl->spNetCaffe->getOutputBlob();
upImpl->spHeatMapsBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)}; upImpl->spHeatMapsBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
upImpl->spPeaksBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)}; upImpl->spPeaksBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
upImpl->spPoseBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)}; upImpl->spPoseBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
cudaCheck(__LINE__, __FUNCTION__, __FILE__); #ifdef USE_CUDA
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
#endif
// Logging // Logging
log("Finished initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__); log("Finished initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
#endif #endif
...@@ -132,7 +194,8 @@ namespace op ...@@ -132,7 +194,8 @@ namespace op
} }
} }
void PoseExtractorCaffe::forwardPass(const Array<float>& inputNetData, const Point<int>& inputDataSize, void PoseExtractorCaffe::forwardPass(const std::vector<Array<float>>& inputNetData,
const Point<int>& inputDataSize,
const std::vector<double>& scaleInputToNetInputs) const std::vector<double>& scaleInputToNetInputs)
{ {
try try
...@@ -141,30 +204,50 @@ namespace op ...@@ -141,30 +204,50 @@ namespace op
// Security checks // Security checks
if (inputNetData.empty()) if (inputNetData.empty())
error("Empty inputNetData.", __LINE__, __FUNCTION__, __FILE__); error("Empty inputNetData.", __LINE__, __FUNCTION__, __FILE__);
for (const auto& inputNetDataI : inputNetData)
if (inputNetDataI.empty())
error("Empty inputNetData.", __LINE__, __FUNCTION__, __FILE__);
if (inputNetData.size() != scaleInputToNetInputs.size())
error("Size(inputNetData) must be same than size(scaleInputToNetInputs).",
__LINE__, __FUNCTION__, __FILE__);
// 1. Caffe deep network // Resize std::vectors if required
upImpl->spNetCaffe->forwardPass(inputNetData); // ~80ms const auto numberScales = inputNetData.size();
upImpl->mNetInput4DSizes.resize(numberScales);
while (upImpl->spCaffeNets.size() < numberScales)
addCaffeNetOnThread(upImpl->spCaffeNets, upImpl->spCaffeNetOutputBlobs, upImpl->mPoseModel,
upImpl->mGpuId, upImpl->mModelFolder, false);
// Reshape blobs if required // Process each image
// Note: In order to resize to input size to have same results as Matlab, uncomment the commented lines for (auto i = 0u ; i < inputNetData.size(); i++)
if (!vectorsAreEqual(upImpl->mNetInputSize4D, inputNetData.getSize()))
// || !vectorsAreEqual(upImpl->mScaleInputToNetInputs, scaleInputToNetInputs))
{ {
upImpl->mNetInputSize4D = inputNetData.getSize(); // 1. Caffe deep network
mNetOutputSize = Point<int>{upImpl->mNetInputSize4D[3], upImpl->mNetInputSize4D[2]}; upImpl->spCaffeNets.at(i)->forwardPass(inputNetData[i]); // ~80ms
// upImpl->mScaleInputToNetInputs = scaleInputToNetInputs;
reshapePoseExtractorCaffe(upImpl->spResizeAndMergeCaffe, upImpl->spNmsCaffe, // Reshape blobs if required
upImpl->spBodyPartConnectorCaffe, upImpl->spCaffeNetOutputBlob, // Note: In order to resize to input size to have same results as Matlab, uncomment the commented
upImpl->spHeatMapsBlob, upImpl->spPeaksBlob, upImpl->spPoseBlob, // lines
1.f, mPoseModel); if (!vectorsAreEqual(upImpl->mNetInput4DSizes.at(i), inputNetData[i].getSize()))
// scaleInputToNetInputs[0], mPoseModel); // || !vectorsAreEqual(upImpl->mScaleInputToNetInputs, scaleInputToNetInputs))
{
upImpl->mNetInput4DSizes.at(i) = inputNetData[i].getSize();
mNetOutputSize = Point<int>{upImpl->mNetInput4DSizes[0][3],
upImpl->mNetInput4DSizes[0][2]};
// upImpl->mScaleInputToNetInputs = scaleInputToNetInputs;
reshapePoseExtractorCaffe(upImpl->spResizeAndMergeCaffe, upImpl->spNmsCaffe,
upImpl->spBodyPartConnectorCaffe, upImpl->spCaffeNetOutputBlobs,
upImpl->spHeatMapsBlob, upImpl->spPeaksBlob, upImpl->spPoseBlob,
1.f, mPoseModel);
// scaleInputToNetInputs[i], mPoseModel);
}
} }
// 2. Resize heat maps + merge different scales // 2. Resize heat maps + merge different scales
const auto caffeNetOutputBlobs = caffeNetSharedToPtr(upImpl->spCaffeNetOutputBlobs);
const std::vector<float> floatScaleRatios(scaleInputToNetInputs.begin(), scaleInputToNetInputs.end()); const std::vector<float> floatScaleRatios(scaleInputToNetInputs.begin(), scaleInputToNetInputs.end());
upImpl->spResizeAndMergeCaffe->setScaleRatios(floatScaleRatios); upImpl->spResizeAndMergeCaffe->setScaleRatios(floatScaleRatios);
#ifdef USE_CUDA #ifdef USE_CUDA
upImpl->spResizeAndMergeCaffe->Forward_gpu({upImpl->spCaffeNetOutputBlob.get()}, // ~5ms upImpl->spResizeAndMergeCaffe->Forward_gpu(caffeNetOutputBlobs, // ~5ms
{upImpl->spHeatMapsBlob.get()}); {upImpl->spHeatMapsBlob.get()});
cudaCheck(__LINE__, __FUNCTION__, __FILE__); cudaCheck(__LINE__, __FUNCTION__, __FILE__);
#else #else
......
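Tying the pieces of this commit together from the caller's side: the per-scale arrays produced by `CvMatToOpInput` are handed to the reworked `forwardPass`, which lazily adds one Caffe net per scale (`addCaffeNetOnThread` above) and merges the resulting heat maps on the GPU. A minimal hedged sketch, assuming an already-initialized extractor and the aggregate OpenPose header:

```cpp
// Illustrative caller-side sketch (not from the commit): feeding per-scale inputs to the
// reworked PoseExtractorCaffe::forwardPass.
#include <iostream>
#include <openpose/headers.hpp>  // assumed aggregate OpenPose header

void runMultiScale(op::PoseExtractorCaffe& poseExtractor,              // already initialized on its thread
                   const std::vector<op::Array<float>>& inputNetData,  // one Array per scale (CvMatToOpInput)
                   const op::Point<int>& inputDataSize,
                   const std::vector<double>& scaleInputToNetInputs)
{
    poseExtractor.forwardPass(inputNetData, inputDataSize, scaleInputToNetInputs);
    const auto poseKeypoints = poseExtractor.getPoseKeypoints();        // #people x #body parts x 3
    std::cout << "people detected: " << poseKeypoints.getSize(0) << std::endl;
}
```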