Commit bc55f86c authored by G gineshidalgo99

Net resolution can be dynamically changed

Parent 00438f47
......@@ -108,8 +108,6 @@ We enumerate some of the most important flags, check the `Flags Detailed Descrip
- `--part_to_show`: Prediction channel to visualize.
- `--no_display`: Do not open a display window. Useful for servers and/or to slightly speed up OpenPose.
- `--num_gpu 2 --num_gpu_start 1`: Parallelize over this number of GPUs, starting from the desired device id. By default it uses all the available GPUs.
- `--net_resolution 656x368`: For HD input (default value).
- `--net_resolution 496x368`: For VGA input.
- `--model_pose MPI`: Model to use; affects the number of keypoints, speed, and accuracy.
- `--logging_level 3`: Logging messages threshold, range [0,255]: 0 will output all messages & 255 will output none. Current messages are in the range [1-4], 1 for low-priority messages and 4 for important ones.
......@@ -145,7 +143,7 @@ Each flag is divided into flag name, default value, and description.
4. OpenPose Body Pose
- DEFINE_bool(body_disable, false, "Disable body keypoint detection. Option only possible for faster (but less accurate) face keypoint detection.");
- DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), `MPI_4_layers` (15 keypoints, even faster but less accurate).");
- DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is decreased, the speed increases. For maximum speed-accuracy balance, it should keep the closest aspect ratio possible to the images or videos to be processed. Using `-1` in any of the dimensions, OP will choose the optimal resolution depending on the other value introduced by the user. E.g. the default `-1x368` is equivalent to `656x368` in 16:9 videos, e.g. full HD (1920x1080) and HD (1280x720) resolutions.");
- DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is decreased, the speed increases. For maximum speed-accuracy balance, it should keep the closest aspect ratio possible to the images or videos to be processed. Using `-1` in any of the dimensions, OP will choose the optimal resolution depending on the other value introduced by the user. E.g. the default `-1x368` is equivalent to `656x368` in 16:9 videos, e.g. full HD (1920x1080) and HD (1280x720) resolutions.");
- DEFINE_int32(scale_number, 1, "Number of scales to average.");
- DEFINE_double(scale_gap, 0.3, "Scale gap between scales. No effect unless scale_number > 1. Initial scale is always 1. If you want to change the initial scale, you actually want to multiply the `net_resolution` by your desired initial scale.");
- DEFINE_bool(heatmaps_add_parts, false, "If true, it will add the body part heatmaps to the final op::Datum::poseHeatMaps array, and analogously face & hand heatmaps to op::Datum::faceHeatMaps & op::Datum::handHeatMaps (program speed will decrease). Not required for our library, enable it only if you intend to process this information later. If more than one `add_heatmaps_X` flag is enabled, it will place then in sequential memory order: body parts + bkg + PAFs. It will follow the order on POSE_BODY_PART_MAPPING in `include/openpose/pose/poseParameters.hpp`.");
......
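As a sketch of the `-1` auto-selection described for `net_resolution` above: the free dimension is chosen to preserve the input aspect ratio and rounded to the nearest multiple of 16 (behavior taken from the wrapper code removed later in this commit; the helper name is hypothetical):

```python
def auto_net_resolution(net_w, net_h, input_w, input_h):
    """Resolve a -1 dimension in net_resolution, keeping the input
    aspect ratio and rounding to the nearest multiple of 16."""
    if net_w == -1 and net_h == -1:
        raise ValueError("Net input size cannot be -1x-1.")
    if net_w == -1:
        net_w = 16 * round(net_h * input_w / input_h / 16)
    elif net_h == -1:
        net_h = 16 * round(net_w * input_h / input_w / 16)
    return net_w, net_h

# -1x368 on a full HD (1920x1080) input resolves to 656x368
print(auto_net_resolution(-1, 368, 1920, 1080))  # → (656, 368)
```

The same call with an HD (1280x720) input also yields 656x368, matching the "16:9 videos" note in the flag description.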
......@@ -113,7 +113,7 @@ OpenPose Library - Release Notes
## Current version (future OpenPose 1.2.0)
## Current version (future OpenPose 2.0.0alpha)
1. Main improvements:
1. Added IP camera support.
2. Output images can have the input size; OpenPose is now able to change its size for each image, and a fixed size is no longer required.
......@@ -127,15 +127,18 @@ OpenPose Library - Release Notes
7. COCO JSON file outputs 0 as score for non-detected keypoints.
8. Added example for OpenPose for user asynchronous output and cleaned all `tutorial_wrapper/` examples.
9. Added `-1` option for `net_resolution` in order to auto-select the best possible aspect ratio given the user input.
10. Added example to add functionality to OpenPose.
10. Net resolution can be dynamically changed (e.g. for images with different size).
11. Added example to add functionality to OpenPose.
2. Functions or parameters renamed:
1. OpenPose able to change its size and initial size:
1. OpenPose is able to change its size and initial size dynamically:
1. Flag `resolution` renamed as `output_resolution`.
2. FrameDisplayer, GuiInfoAdder and Gui constructors arguments modified (gui module).
3. OpOutputToCvMat constructor removed (core module).
4. New Renders classes to split GpuRenderers from CpuRenderers.
5. Etc.
2. `CPU_ONLY` changed to `USE_CUDA` to keep the naming format.
2. OpenPose is able to change its net resolution dynamically:
1. Changed several functions in the `core/`, `pose/`, `face/`, and `hand/` modules.
3. `CPU_ONLY` changed to `USE_CUDA` to keep the naming format.
3. Main bugs fixed:
1. Ubuntu installer script now works even if Python pip was not installed previously.
2. Flags to set the first and last frame, as well as to jump frames backward and forward, now work on the image directory reader.
......@@ -72,7 +72,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -21,7 +21,8 @@ JSON_FOLDER=../evaluation/coco_val_jsons/
OP_BIN=./build/examples/openpose/openpose.bin
# 1 scale
$OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1.json --no_display --render_pose 0 --frame_last 3558 --output_resolution "1280x720"
$OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1.json --no_display --render_pose 0 --frame_last 3558
# --output_resolution "1280x720"
# # 3 scales
# $OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1_3.json --no_display --render_pose 0 --scale_number 3 --scale_gap 0.25 --frame_last 3558 --output_resolution "1280x720"
......
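The commented 3-scale command above relies on `scale_number` and `scale_gap`; per the flag description, the initial scale is always 1 and, presumably, each subsequent scale is reduced by `scale_gap`. A hedged sketch of how the scale list would be built under that assumption:

```python
def build_scales(scale_number, scale_gap):
    # First scale is always 1; each subsequent one is scale_gap smaller
    # (assumed from the flag docs, not copied from OpenPose source).
    return [1.0 - i * scale_gap for i in range(scale_number)]

print(build_scales(3, 0.25))  # → [1.0, 0.75, 0.5]
```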
......@@ -232,11 +232,6 @@ namespace op
// Re-scale pose if desired
if (wrapperStructPose.keypointScale != ScaleMode::InputResolution)
error("Only wrapperStructPose.keypointScale == ScaleMode::InputResolution.", __LINE__, __FUNCTION__, __FILE__);
if (finalOutputSize != producerSize)
{
auto keypointScaler = std::make_shared<KeypointScaler>(ScaleMode::InputResolution);
mPostProcessingWs.emplace_back(std::make_shared<WKeypointScaler<TDatumsPtr>>(keypointScaler));
}
mOutputWs.clear();
// Write people pose data on disk (json format)
......
......@@ -82,7 +82,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -35,7 +35,7 @@ DEFINE_string(image_path, "examples/media/COCO_val2014_00000000019
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(model_folder, "models/", "Folder path (absolute or relative) where the models (pose, face, ...) are located.");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......@@ -75,8 +75,6 @@ int openPoseTutorialPose1()
const auto outputSize = op::flagsToPoint(FLAGS_output_resolution, "-1x-1");
// netInputSize
const auto netInputSize = op::flagsToPoint(FLAGS_net_resolution, "-1x368");
// netOutputSize
const auto netOutputSize = netInputSize;
// poseModel
const auto poseModel = op::flagsToPoseModel(FLAGS_model_pose);
// Check no contradictory flags enabled
......@@ -93,9 +91,8 @@ int openPoseTutorialPose1()
op::ScaleAndSizeExtractor scaleAndSizeExtractor(netInputSize, outputSize, FLAGS_scale_number, FLAGS_scale_gap);
op::CvMatToOpInput cvMatToOpInput;
op::CvMatToOpOutput cvMatToOpOutput;
op::PoseExtractorCaffe poseExtractorCaffe{netInputSize, netOutputSize, outputSize, FLAGS_scale_number, poseModel,
FLAGS_model_folder, FLAGS_num_gpu_start, {}, op::ScaleMode::ZeroToOne,
enableGoogleLogging};
op::PoseExtractorCaffe poseExtractorCaffe{poseModel, FLAGS_model_folder,
FLAGS_num_gpu_start, {}, op::ScaleMode::ZeroToOne, enableGoogleLogging};
op::PoseCpuRenderer poseRenderer{poseModel, (float)FLAGS_render_threshold, !FLAGS_disable_blending,
(float)FLAGS_alpha_pose};
op::OpOutputToCvMat opOutputToCvMat;
......@@ -125,7 +122,7 @@ int openPoseTutorialPose1()
poseExtractorCaffe.forwardPass(netInputArray, imageSize, scaleInputToNetInputs);
const auto poseKeypoints = poseExtractorCaffe.getPoseKeypoints();
// Step 5 - Render poseKeypoints
poseRenderer.renderPose(outputArray, poseKeypoints);
poseRenderer.renderPose(outputArray, poseKeypoints, scaleInputToOutput);
// Step 6 - OpenPose output format to cv::Mat
auto outputImage = opOutputToCvMat.formatToCvMat(outputArray);
......
......@@ -35,7 +35,7 @@ DEFINE_string(image_path, "examples/media/COCO_val2014_00000000019
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(model_folder, "models/", "Folder path (absolute or relative) where the models (pose, face, ...) are located.");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......@@ -80,8 +80,6 @@ int openPoseTutorialPose2()
const auto outputSize = op::flagsToPoint(FLAGS_output_resolution, "-1x-1");
// netInputSize
const auto netInputSize = op::flagsToPoint(FLAGS_net_resolution, "-1x368");
// netOutputSize
const auto netOutputSize = netInputSize;
// poseModel
const auto poseModel = op::flagsToPoseModel(FLAGS_model_pose);
// Check no contradictory flags enabled
......@@ -99,8 +97,8 @@ int openPoseTutorialPose2()
op::CvMatToOpInput cvMatToOpInput;
op::CvMatToOpOutput cvMatToOpOutput;
auto poseExtractorPtr = std::make_shared<op::PoseExtractorCaffe>(
netInputSize, netOutputSize, outputSize, FLAGS_scale_number, poseModel, FLAGS_model_folder,
FLAGS_num_gpu_start, std::vector<op::HeatMapType>{}, op::ScaleMode::ZeroToOne, enableGoogleLogging
poseModel, FLAGS_model_folder, FLAGS_num_gpu_start, std::vector<op::HeatMapType>{}, op::ScaleMode::ZeroToOne,
enableGoogleLogging
);
op::PoseGpuRenderer poseGpuRenderer{poseModel, poseExtractorPtr, (float)FLAGS_render_threshold,
!FLAGS_disable_blending, (float)FLAGS_alpha_pose, (float)FLAGS_alpha_heatmap};
......
......@@ -72,7 +72,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -55,7 +55,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -55,7 +55,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -55,7 +55,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -11,9 +11,11 @@ namespace op
public:
explicit KeypointScaler(const ScaleMode scaleMode);
void scale(Array<float>& arrayToScale, const double scaleInputToOutput, const double scaleNetToOutput, const Point<int>& producerSize) const;
void scale(Array<float>& arrayToScale, const double scaleInputToOutput, const double scaleNetToOutput,
const Point<int>& producerSize) const;
void scale(std::vector<Array<float>>& arraysToScale, const double scaleInputToOutput, const double scaleNetToOutput, const Point<int>& producerSize) const;
void scale(std::vector<Array<float>>& arraysToScale, const double scaleInputToOutput,
const double scaleNetToOutput, const Point<int>& producerSize) const;
private:
const ScaleMode mScaleMode;
......
......@@ -10,13 +10,7 @@ namespace op
public:
virtual void initializationOnThread() = 0;
// Alternative a) getInputDataCpuPtr or getInputDataGpuPtr + forwardPass()
virtual float* getInputDataCpuPtr() const = 0;
virtual float* getInputDataGpuPtr() const = 0;
// Alternative b)
virtual void forwardPass(const float* const inputData = nullptr) const = 0;
virtual void forwardPass(const Array<float>& inputData) const = 0;
};
}
......
......@@ -9,22 +9,14 @@ namespace op
class OP_API NetCaffe : public Net
{
public:
NetCaffe(const std::array<int, 4>& netInputSize4D, const std::string& caffeProto,
const std::string& caffeTrainedModel, const int gpuId = 0,
const bool enableGoogleLogging = true,
const std::string& lastBlobName = "net_output");
NetCaffe(const std::string& caffeProto, const std::string& caffeTrainedModel, const int gpuId = 0,
const bool enableGoogleLogging = true, const std::string& lastBlobName = "net_output");
virtual ~NetCaffe();
void initializationOnThread();
// Alternative a) getInputDataCpuPtr or getInputDataGpuPtr + forwardPass
float* getInputDataCpuPtr() const;
float* getInputDataGpuPtr() const;
// Alternative b)
void forwardPass(const float* const inputNetData = nullptr) const;
void forwardPass(const Array<float>& inputNetData) const;
boost::shared_ptr<caffe::Blob<float>> getOutputBlob() const;
......
......@@ -18,6 +18,7 @@ namespace op
const float alphaHeatMap = POSE_DEFAULT_ALPHA_HEAT_MAP);
std::pair<int, std::string> renderPose(Array<float>& outputData, const Array<float>& poseKeypoints,
const float scaleInputToOutput,
const float scaleNetToOutput = -1.f);
private:
......
......@@ -12,7 +12,7 @@ namespace op
class OP_API PoseExtractor
{
public:
PoseExtractor(const Point<int>& netOutputSize, const Point<int>& outputSize, const PoseModel poseModel,
PoseExtractor(const PoseModel poseModel,
const std::vector<HeatMapType>& heatMapTypes = {},
const ScaleMode heatMapScale = ScaleMode::ZeroToOne);
......@@ -45,8 +45,7 @@ namespace op
protected:
const PoseModel mPoseModel;
const Point<int> mNetOutputSize;
const Point<int> mOutputSize;
Point<int> mNetOutputSize;
Array<float> mPoseKeypoints;
float mScaleNetToOutput;
......
......@@ -10,9 +10,7 @@ namespace op
class OP_API PoseExtractorCaffe : public PoseExtractor
{
public:
PoseExtractorCaffe(const Point<int>& netInputSize, const Point<int>& netOutputSize,
const Point<int>& outputSize, const int scaleNumber, const PoseModel poseModel,
const std::string& modelFolder, const int gpuId,
PoseExtractorCaffe(const PoseModel poseModel, const std::string& modelFolder, const int gpuId,
const std::vector<HeatMapType>& heatMapTypes = {},
const ScaleMode heatMapScale = ScaleMode::ZeroToOne,
const bool enableGoogleLogging = true);
......@@ -22,7 +20,7 @@ namespace op
void netInitializationOnThread();
void forwardPass(const Array<float>& inputNetData, const Point<int>& inputDataSize,
const std::vector<double>& scaleRatios = {1.f});
const std::vector<double>& scaleInputToNetInputs = {1.f});
const float* getHeatMapCpuConstPtr() const;
......
......@@ -24,6 +24,7 @@ namespace op
void initializationOnThread();
std::pair<int, std::string> renderPose(Array<float>& outputData, const Array<float>& poseKeypoints,
const float scaleInputToOutput,
const float scaleNetToOutput = -1.f);
private:
......
......@@ -13,7 +13,9 @@ namespace op
virtual void initializationOnThread(){};
virtual std::pair<int, std::string> renderPose(Array<float>& outputData, const Array<float>& poseKeypoints, const float scaleNetToOutput = -1.f) = 0;
virtual std::pair<int, std::string> renderPose(Array<float>& outputData, const Array<float>& poseKeypoints,
const float scaleInputToOutput,
const float scaleNetToOutput = -1.f) = 0;
protected:
const PoseModel mPoseModel;
......
......@@ -57,7 +57,9 @@ namespace op
const auto profilerKey = Profiler::timerInit(__LINE__, __FUNCTION__, __FILE__);
// Render people pose
for (auto& tDatum : *tDatums)
tDatum.elementRendered = spPoseRenderer->renderPose(tDatum.outputData, tDatum.poseKeypoints, (float)tDatum.scaleNetToOutput);
tDatum.elementRendered = spPoseRenderer->renderPose(tDatum.outputData, tDatum.poseKeypoints,
(float)tDatum.scaleInputToOutput,
(float)tDatum.scaleNetToOutput);
// Profiling speed
Profiler::timerEnd(profilerKey);
Profiler::printAveragedTimeMsOnIterationX(profilerKey, __LINE__, __FUNCTION__, __FILE__);
......
......@@ -576,28 +576,6 @@ namespace op
if (finalOutputSize.x == -1 || finalOutputSize.y == -1)
finalOutputSize = producerSize;
}
// Set poseNetInputSize if -1 used
Point<int> poseNetInputSize = wrapperStructPose.netInputSize;
if (poseNetInputSize.x == -1 && poseNetInputSize.y == -1)
error("Net input size cannot be -1x-1.", __LINE__, __FUNCTION__, __FILE__);
else if (poseNetInputSize.x == -1 || poseNetInputSize.y == -1)
{
if (producerSize.x <= 0 || producerSize.y <= 0)
error("Net resolution cannot be -1 for image_dir, only for video, webcam, and IP camera.",
__LINE__, __FUNCTION__, __FILE__);
else if (poseNetInputSize.x == -1)
poseNetInputSize.x = 16 * intRound(
poseNetInputSize.y * producerSize.x / (float) producerSize.y / 16.f
);
else // if (poseNetInputSize.y == -1)
poseNetInputSize.y = 16 * intRound(
poseNetInputSize.x * producerSize.y / (float) producerSize.x / 16.f
);
}
// Security checks
if ((poseNetInputSize.x > 0 && poseNetInputSize.x % 16 != 0)
|| (poseNetInputSize.y > 0 && poseNetInputSize.y % 16 != 0))
error("Net input resolution must be multiples of 16.", __LINE__, __FUNCTION__, __FILE__);
// Producer
if (wrapperStructInput.producerSharedPtr != nullptr)
......@@ -613,7 +591,8 @@ namespace op
// Get input scales and sizes
const auto scaleAndSizeExtractor = std::make_shared<ScaleAndSizeExtractor>(
poseNetInputSize, finalOutputSize, wrapperStructPose.scalesNumber, wrapperStructPose.scaleGap
wrapperStructPose.netInputSize, finalOutputSize, wrapperStructPose.scalesNumber,
wrapperStructPose.scaleGap
);
spWScaleAndSizeExtractor = std::make_shared<WScaleAndSizeExtractor<TDatumsPtr>>(scaleAndSizeExtractor);
......@@ -627,7 +606,6 @@ namespace op
}
// Pose estimators & renderers
const Point<int>& poseNetOutputSize = poseNetInputSize;
std::vector<std::shared_ptr<PoseExtractor>> poseExtractors;
std::vector<std::shared_ptr<PoseGpuRenderer>> poseGpuRenderers;
std::shared_ptr<PoseCpuRenderer> poseCpuRenderer;
......@@ -639,7 +617,6 @@ namespace op
// Pose estimators
for (auto gpuId = 0; gpuId < gpuNumber; gpuId++)
poseExtractors.emplace_back(std::make_shared<PoseExtractorCaffe>(
poseNetInputSize, poseNetOutputSize, finalOutputSize, wrapperStructPose.scalesNumber,
wrapperStructPose.poseModel, modelFolder, gpuId + gpuNumberStart,
wrapperStructPose.heatMapTypes, wrapperStructPose.heatMapScale,
wrapperStructPose.enableGoogleLogging
......@@ -846,14 +823,14 @@ namespace op
mPostProcessingWs.emplace_back(std::make_shared<WOpOutputToCvMat<TDatumsPtr>>(opOutputToCvMat));
}
// Re-scale pose if desired
// If desired scale is not the current output
if (wrapperStructPose.keypointScale != ScaleMode::OutputResolution
// and desired scale is not input when size(output) = size(input)
&& !(wrapperStructPose.keypointScale == ScaleMode::InputResolution &&
// If desired scale is not the current input
if (wrapperStructPose.keypointScale != ScaleMode::InputResolution
// and desired scale is not output when size(input) = size(output)
&& !(wrapperStructPose.keypointScale == ScaleMode::OutputResolution &&
(finalOutputSize == producerSize || finalOutputSize.x <= 0 || finalOutputSize.y <= 0))
// and desired scale is not net output when size(output) = size(net output)
// and desired scale is not net output when size(input) = size(net output)
&& !(wrapperStructPose.keypointScale == ScaleMode::NetOutputResolution
&& finalOutputSize == poseNetOutputSize))
&& producerSize == wrapperStructPose.netInputSize))
{
// Then we must rescale the keypoints
auto keypointScaler = std::make_shared<KeypointScaler>(wrapperStructPose.keypointScale);
......
......@@ -27,31 +27,33 @@ namespace op
{
try
{
if (mScaleMode != ScaleMode::OutputResolution)
if (mScaleMode != ScaleMode::InputResolution)
{
// InputResolution
if (mScaleMode == ScaleMode::InputResolution)
// OutputResolution
if (mScaleMode == ScaleMode::OutputResolution)
{
for (auto& arrayToScale : arrayToScalesToScale)
scaleKeypoints(arrayToScale, float(1./scaleInputToOutput));
scaleKeypoints(arrayToScale, float(scaleInputToOutput));
}
// NetOutputResolution
else if (mScaleMode == ScaleMode::NetOutputResolution)
{
for (auto& arrayToScale : arrayToScalesToScale)
scaleKeypoints(arrayToScale, float(1./scaleNetToOutput));
}
// [0,1]
else if (mScaleMode == ScaleMode::ZeroToOne)
{
const auto scale = float(1./scaleInputToOutput);
const auto scaleX = scale / ((float)producerSize.x - 1.f);
const auto scaleY = scale / ((float)producerSize.y - 1.f);
const auto scaleX = 1.f / ((float)producerSize.x - 1.f);
const auto scaleY = 1.f / ((float)producerSize.y - 1.f);
for (auto& arrayToScale : arrayToScalesToScale)
scaleKeypoints(arrayToScale, scaleX, scaleY);
}
// [-1,1]
else if (mScaleMode == ScaleMode::PlusMinusOne)
{
const auto scale = float(2./scaleInputToOutput);
const auto scaleX = (scale / ((float)producerSize.x - 1.f));
const auto scaleY = (scale / ((float)producerSize.y - 1.f));
const auto scaleX = (2.f / ((float)producerSize.x - 1.f));
const auto scaleY = (2.f / ((float)producerSize.y - 1.f));
const auto offset = -1.f;
for (auto& arrayToScale : arrayToScalesToScale)
scaleKeypoints(arrayToScale, scaleX, scaleY, offset, offset);
......
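The rewritten KeypointScaler above assumes keypoints arrive already expressed in the input resolution (the commit's new convention), which is why the `1./scaleInputToOutput` factors disappear. A minimal sketch of the per-mode scale factors; `keypoint_scale_factors` is a hypothetical name, not an OpenPose function:

```python
def keypoint_scale_factors(mode, scale_input_to_output, scale_net_to_output,
                           producer_w, producer_h):
    """Return (scale_x, scale_y, offset) applied to keypoints that are
    already in the input resolution, mirroring the updated scale()."""
    if mode == "OutputResolution":
        # Input -> output resolution
        return scale_input_to_output, scale_input_to_output, 0.0
    if mode == "NetOutputResolution":
        # Undo the net-to-output scaling
        s = 1.0 / scale_net_to_output
        return s, s, 0.0
    if mode == "ZeroToOne":
        # Normalize to [0, 1] over the producer size
        return 1.0 / (producer_w - 1.0), 1.0 / (producer_h - 1.0), 0.0
    if mode == "PlusMinusOne":
        # Normalize to [-1, 1] over the producer size
        return 2.0 / (producer_w - 1.0), 2.0 / (producer_h - 1.0), -1.0
    raise ValueError("Unknown or no-op scale mode: " + mode)

print(keypoint_scale_factors("OutputResolution", 0.5, 8.0, 1920, 1080))
```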
......@@ -19,26 +19,21 @@ namespace op
#ifdef USE_CAFFE
// Init with constructor
const int mGpuId;
const std::array<int, 4> mNetInputSize4D;
const unsigned long mNetInputMemory;
const std::string mCaffeProto;
const std::string mCaffeTrainedModel;
const std::string mLastBlobName;
std::vector<int> mNetInputSize4D;
// Init with thread
std::unique_ptr<caffe::Net<float>> upCaffeNet;
boost::shared_ptr<caffe::Blob<float>> spOutputBlob;
ImplNetCaffe(const std::array<int, 4>& netInputSize4D, const std::string& caffeProto,
const std::string& caffeTrainedModel, const int gpuId,
ImplNetCaffe(const std::string& caffeProto, const std::string& caffeTrainedModel, const int gpuId,
const bool enableGoogleLogging, const std::string& lastBlobName) :
mGpuId{gpuId},
// mNetInputSize4D{netInputSize4D}, // This line crashes on some devices with old G++
mNetInputSize4D{netInputSize4D[0], netInputSize4D[1], netInputSize4D[2], netInputSize4D[3]},
mNetInputMemory{sizeof(float) * std::accumulate(mNetInputSize4D.begin(), mNetInputSize4D.end(), 1,
std::multiplies<int>())},
mCaffeProto{caffeProto},
mCaffeTrainedModel{caffeTrainedModel},
mLastBlobName{lastBlobName}
mLastBlobName{lastBlobName},
mNetInputSize4D{0,0,0,0}
{
const std::string message{".\nPossible causes:\n\t1. Not downloading the OpenPose trained models."
"\n\t2. Not running OpenPose from the same directory where the `model`"
......@@ -62,11 +57,40 @@ namespace op
#endif
};
NetCaffe::NetCaffe(const std::array<int, 4>& netInputSize4D, const std::string& caffeProto,
const std::string& caffeTrainedModel, const int gpuId,
#ifdef USE_CAFFE
inline void reshapeNetCaffe(caffe::Net<float>* caffeNet, const std::vector<int>& dimensions)
{
try
{
caffeNet->blobs()[0]->Reshape(dimensions);
caffeNet->Reshape();
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
inline bool requiredReshapeNetCaffe(const std::vector<int>& dimensionsA, const std::vector<int>& dimensionsB)
{
try
{
return (dimensionsA[0] != dimensionsB[0] || dimensionsA[1] != dimensionsB[1]
|| dimensionsA[2] != dimensionsB[2] || dimensionsA[3] != dimensionsB[3]);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return false;
}
}
#endif
NetCaffe::NetCaffe(const std::string& caffeProto, const std::string& caffeTrainedModel, const int gpuId,
const bool enableGoogleLogging, const std::string& lastBlobName)
#ifdef USE_CAFFE
: upImpl{new ImplNetCaffe{netInputSize4D, caffeProto, caffeTrainedModel, gpuId, enableGoogleLogging,
: upImpl{new ImplNetCaffe{caffeProto, caffeTrainedModel, gpuId, enableGoogleLogging,
lastBlobName}}
#endif
{
......@@ -98,13 +122,14 @@ namespace op
{
#ifdef USE_CAFFE
// Initialize net
caffe::Caffe::set_mode(caffe::Caffe::GPU);
caffe::Caffe::SetDevice(upImpl->mGpuId);
#ifdef USE_CUDA
caffe::Caffe::set_mode(caffe::Caffe::GPU);
caffe::Caffe::SetDevice(upImpl->mGpuId);
#else
caffe::Caffe::set_mode(caffe::Caffe::CPU);
#endif
upImpl->upCaffeNet.reset(new caffe::Net<float>{upImpl->mCaffeProto, caffe::TEST});
upImpl->upCaffeNet->CopyTrainedLayersFrom(upImpl->mCaffeTrainedModel);
upImpl->upCaffeNet->blobs()[0]->Reshape({upImpl->mNetInputSize4D[0], upImpl->mNetInputSize4D[1],
upImpl->mNetInputSize4D[2], upImpl->mNetInputSize4D[3]});
upImpl->upCaffeNet->Reshape();
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Set spOutputBlob
upImpl->spOutputBlob = upImpl->upCaffeNet->blob_by_name(upImpl->mLastBlobName);
......@@ -120,58 +145,32 @@ namespace op
}
}
float* NetCaffe::getInputDataCpuPtr() const
{
try
{
#ifdef USE_CAFFE
return upImpl->upCaffeNet->blobs().at(0)->mutable_cpu_data();
#else
return nullptr;
#endif
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return nullptr;
}
}
float* NetCaffe::getInputDataGpuPtr() const
{
try
{
#ifdef USE_CAFFE
return upImpl->upCaffeNet->blobs().at(0)->mutable_gpu_data();
#else
return nullptr;
#endif
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return nullptr;
}
}
void NetCaffe::forwardPass(const float* const inputData) const
void NetCaffe::forwardPass(const Array<float>& inputData) const
{
try
{
#ifdef USE_CAFFE
// Copy frame data to GPU memory
if (inputData != nullptr)
// Security checks
if (inputData.empty())
error("The Array inputData cannot be empty.", __LINE__, __FUNCTION__, __FILE__);
if (inputData.getNumberDimensions() != 4 || inputData.getSize(1) != 3)
error("The Array inputData must have 4 dimensions: [batch size, 3 (RGB), height, width].",
__LINE__, __FUNCTION__, __FILE__);
// Reshape Caffe net if required
if (requiredReshapeNetCaffe(upImpl->mNetInputSize4D, inputData.getSize()))
{
#ifdef USE_CUDA
auto* gpuImagePtr = upImpl->upCaffeNet->blobs().at(0)->mutable_gpu_data();
cudaMemcpy(gpuImagePtr, inputData, upImpl->mNetInputMemory, cudaMemcpyHostToDevice);
#else
auto* cpuImagePtr = upImpl->upCaffeNet->blobs().at(0)->mutable_cpu_data();
std::copy(inputData,
inputData + upImpl->mNetInputMemory/sizeof(float),
cpuImagePtr);
#endif
upImpl->mNetInputSize4D = inputData.getSize();
reshapeNetCaffe(upImpl->upCaffeNet.get(), inputData.getSize());
}
// Copy frame data to GPU memory
#ifdef USE_CUDA
auto* gpuImagePtr = upImpl->upCaffeNet->blobs().at(0)->mutable_gpu_data();
cudaMemcpy(gpuImagePtr, inputData.getConstPtr(), inputData.getVolume() * sizeof(float),
cudaMemcpyHostToDevice);
#else
auto* cpuImagePtr = upImpl->upCaffeNet->blobs().at(0)->mutable_cpu_data();
std::copy(inputData.getConstPtr(), inputData.getConstPtr() + inputData.getVolume(), cpuImagePtr);
#endif
// Perform deep network forward pass
upImpl->upCaffeNet->ForwardFrom(0);
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
......
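The new `forwardPass` above validates the 4-D input and reshapes the net on demand. A Python sketch of that caching policy under the same assumptions (hypothetical `DynamicNet` class, not the Caffe API):

```python
class DynamicNet:
    """Caches the last NCHW input shape and counts 'reshapes', mirroring
    the reshape-on-demand policy of the rewritten forwardPass."""
    def __init__(self):
        self.cached_shape = (0, 0, 0, 0)
        self.reshape_count = 0

    def forward(self, shape):
        if len(shape) != 4 or shape[1] != 3:
            raise ValueError("Input must be [batch size, 3 (RGB), height, width].")
        if shape != self.cached_shape:
            self.cached_shape = shape
            self.reshape_count += 1  # stands in for the expensive net reshape
        # ... copy frame data and run the actual forward pass here ...

net = DynamicNet()
net.forward((1, 3, 368, 656))
net.forward((1, 3, 368, 656))  # same size: no reshape
net.forward((1, 3, 368, 496))  # new width: reshape
print(net.reshape_count)  # → 2
```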
......@@ -7,8 +7,8 @@ namespace op
const auto THREADS_PER_BLOCK_1D = 16u;
template <typename T>
__global__ void resizeKernel(T* targetPtr, const T* const sourcePtr, const int sourceWidth, const int sourceHeight, const int targetWidth,
const int targetHeight)
__global__ void resizeKernel(T* targetPtr, const T* const sourcePtr, const int sourceWidth, const int sourceHeight,
const int targetWidth, const int targetHeight)
{
const auto x = (blockIdx.x * blockDim.x) + threadIdx.x;
const auto y = (blockIdx.y * blockDim.y) + threadIdx.y;
......@@ -20,13 +20,15 @@ namespace op
const T xSource = (x + 0.5f) / scaleWidth - 0.5f;
const T ySource = (y + 0.5f) / scaleHeight - 0.5f;
targetPtr[y*targetWidth+x] = bicubicInterpolate(sourcePtr, xSource, ySource, sourceWidth, sourceHeight,
sourceWidth);
}
}
template <typename T>
__global__ void resizeKernelAndMerge(T* targetPtr, const T* const sourcePtr, const int sourceNumOffset,
const int num, const T* scaleInputToNetInputs, const int sourceWidth,
const int sourceHeight, const int targetWidth, const int targetHeight)
{
const auto x = (blockIdx.x * blockDim.x) + threadIdx.x;
const auto y = (blockIdx.y * blockDim.y) + threadIdx.y;
......@@ -70,7 +72,8 @@ namespace op
const auto targetWidth = targetSize[3];
const dim3 threadsPerBlock{THREADS_PER_BLOCK_1D, THREADS_PER_BLOCK_1D};
const dim3 numBlocks{getNumberCudaBlocks(targetWidth, threadsPerBlock.x),
getNumberCudaBlocks(targetHeight, threadsPerBlock.y)};
const auto sourceChannelOffset = sourceHeight * sourceWidth;
const auto targetChannelOffset = targetWidth * targetHeight;
......@@ -85,7 +88,8 @@ namespace op
const auto offset = offsetBase + c;
resizeKernel<<<numBlocks, threadsPerBlock>>>(targetPtr + offset * targetChannelOffset,
sourcePtr + offset * sourceChannelOffset,
sourceWidth, sourceHeight, targetWidth,
targetHeight);
}
}
}
......@@ -94,20 +98,25 @@ namespace op
{
// If scale_number > 1 --> scaleInputToNetInputs must be set
if (scaleInputToNetInputs.size() != num)
error("The scale ratios size must be equal to the number of scales.",
__LINE__, __FUNCTION__, __FILE__);
const auto maxScales = 10;
if (scaleInputToNetInputs.size() > maxScales)
error("The maximum number of scales is " + std::to_string(maxScales) + ".",
__LINE__, __FUNCTION__, __FILE__);
// Copy scaleInputToNetInputs
T* scaleInputToNetInputsPtr;
cudaMalloc((void**)&scaleInputToNetInputsPtr, maxScales * sizeof(T));
cudaMemcpy(scaleInputToNetInputsPtr, scaleInputToNetInputs.data(),
scaleInputToNetInputs.size() * sizeof(T), cudaMemcpyHostToDevice);
// Perform resize + merging
const auto sourceNumOffset = channels * sourceChannelOffset;
for (auto c = 0 ; c < channels ; c++)
resizeKernelAndMerge<<<numBlocks, threadsPerBlock>>>(targetPtr + c * targetChannelOffset,
sourcePtr + c * sourceChannelOffset,
sourceNumOffset, num,
scaleInputToNetInputsPtr, sourceWidth,
sourceHeight, targetWidth, targetHeight);
// Free memory
cudaFree(scaleInputToNetInputsPtr);
}
......@@ -120,8 +129,10 @@ namespace op
}
}
template void resizeAndMergeGpu(float* targetPtr, const float* const sourcePtr,
const std::array<int, 4>& targetSize, const std::array<int, 4>& sourceSize,
const std::vector<float>& scaleInputToNetInputs);
template void resizeAndMergeGpu(double* targetPtr, const double* const sourcePtr,
const std::array<int, 4>& targetSize, const std::array<int, 4>& sourceSize,
const std::vector<double>& scaleInputToNetInputs);
}
......@@ -9,7 +9,7 @@ namespace op
{
template <typename T>
ResizeAndMergeCaffe<T>::ResizeAndMergeCaffe() :
mScaleRatios{T(1)}
{
try
{
......@@ -158,7 +158,8 @@ namespace op
}
template <typename T>
void ResizeAndMergeCaffe<T>::Backward_gpu(const std::vector<caffe::Blob<T>*>& top,
const std::vector<bool>& propagate_down,
const std::vector<caffe::Blob<T>*>& bottom)
{
try
......
......@@ -37,8 +37,25 @@ namespace op
// Security checks
if (inputResolution.area() <= 0)
error("Wrong input element (empty cvInputData).", __LINE__, __FUNCTION__, __FILE__);
// Set poseNetInputSize
auto poseNetInputSize = mNetInputResolution;
if (poseNetInputSize.x <= 0 || poseNetInputSize.y <= 0)
{
// Security checks
if (poseNetInputSize.x <= 0 && poseNetInputSize.y <= 0)
error("Only one of the two dimensions of the net input resolution can be <= 0.",
__LINE__, __FUNCTION__, __FILE__);
if (poseNetInputSize.x <= 0)
poseNetInputSize.x = 16 * intRound(
poseNetInputSize.y * inputResolution.x / (float) inputResolution.y / 16.f
);
else // if (poseNetInputSize.y <= 0)
poseNetInputSize.y = 16 * intRound(
poseNetInputSize.x * inputResolution.y / (float) inputResolution.x / 16.f
);
}
// scaleInputToNetInputs & sizes - Reescale keeping aspect ratio
std::vector<double> scaleInputToNetInputs(mScaleNumber, 1.f);
std::vector<Point<int>> sizes(mScaleNumber);
for (auto i = 0; i < mScaleNumber; i++)
{
......@@ -47,13 +64,13 @@ namespace op
error("All scales must be in the range [0, 1], i.e. 0 <= 1-scale_number*scale_gap <= 1",
__LINE__, __FUNCTION__, __FILE__);
const auto targetWidth = fastTruncate(intRound(poseNetInputSize.x * currentScale) / 16 * 16, 1,
poseNetInputSize.x);
const auto targetHeight = fastTruncate(intRound(poseNetInputSize.y * currentScale) / 16 * 16, 1,
poseNetInputSize.y);
const Point<int> targetSize{targetWidth, targetHeight};
scaleInputToNetInputs[i] = resizeGetScaleFactor(inputResolution, targetSize);
sizes[i] = poseNetInputSize;
}
// scaleInputToOutput - Scale between input and desired output size
Point<int> outputResolution;
......@@ -71,7 +88,7 @@ namespace op
scaleInputToOutput = 1.;
}
// Return result
return std::make_tuple(scaleInputToNetInputs, sizes, scaleInputToOutput, outputResolution);
}
catch (const std::exception& e)
{
......
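The new `poseNetInputSize` block above implements the `-1x368` flag behaviour: a dimension given as `-1` is derived from the other dimension so the input aspect ratio is preserved, rounded to a multiple of 16. A minimal standalone sketch of that rounding (illustrative names, not the OpenPose API):

```cpp
#include <cassert>
#include <cmath>

struct Size { int x; int y; };

// Net dimensions must be multiples of 16, so round to the nearest one.
int roundToMultipleOf16(const float value)
{
    return 16 * static_cast<int>(std::round(value / 16.f));
}

// Resolve a net input dimension given as -1 (or 0) while keeping the
// aspect ratio of the input image, mirroring the poseNetInputSize logic above.
Size resolveNetInputSize(Size netInput, const Size input)
{
    if (netInput.x <= 0 && netInput.y <= 0)
        return netInput; // invalid: mirrors the error path above
    if (netInput.x <= 0)
        netInput.x = roundToMultipleOf16(netInput.y * input.x / (float)input.y);
    else if (netInput.y <= 0)
        netInput.y = roundToMultipleOf16(netInput.x * input.y / (float)input.x);
    return netInput;
}
```

For a 16:9 source this reproduces the documented default: `-1x368` resolves to `656x368` for both HD (1280x720) and full HD (1920x1080) inputs.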
......@@ -6,7 +6,7 @@ namespace op
FaceExtractor::FaceExtractor(const Point<int>& netInputSize, const Point<int>& netOutputSize,
const std::vector<HeatMapType>& heatMapTypes, const ScaleMode heatMapScale) :
mNetOutputSize{netOutputSize},
mFaceImageCrop{{1, 3, mNetOutputSize.y, mNetOutputSize.x}},
mHeatMapScaleMode{heatMapScale},
mHeatMapTypes{heatMapTypes}
{
......
......@@ -16,6 +16,7 @@ namespace op
struct FaceExtractorCaffe::ImplFaceExtractorCaffe
{
#if defined USE_CAFFE && defined USE_CUDA
bool netInitialized;
std::shared_ptr<NetCaffe> spNetCaffe;
std::shared_ptr<ResizeAndMergeCaffe<float>> spResizeAndMergeCaffe;
std::shared_ptr<MaximumCaffe<float>> spMaximumCaffe;
......@@ -24,11 +25,9 @@ namespace op
std::shared_ptr<caffe::Blob<float>> spHeatMapsBlob;
std::shared_ptr<caffe::Blob<float>> spPeaksBlob;
ImplFaceExtractorCaffe(const std::string& modelFolder, const int gpuId, const bool enableGoogleLogging) :
netInitialized{false},
spNetCaffe{std::make_shared<NetCaffe>(modelFolder + FACE_PROTOTXT, modelFolder + FACE_TRAINED_MODEL,
gpuId, enableGoogleLogging)},
spResizeAndMergeCaffe{std::make_shared<ResizeAndMergeCaffe<float>>()},
spMaximumCaffe{std::make_shared<MaximumCaffe<float>>()}
......@@ -69,6 +68,29 @@ namespace op
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
inline void reshapeFaceExtractorCaffe(std::shared_ptr<ResizeAndMergeCaffe<float>>& resizeAndMergeCaffe,
std::shared_ptr<MaximumCaffe<float>>& maximumCaffe,
boost::shared_ptr<caffe::Blob<float>>& caffeNetOutputBlob,
std::shared_ptr<caffe::Blob<float>>& heatMapsBlob,
std::shared_ptr<caffe::Blob<float>>& peaksBlob)
{
try
{
// HeatMaps extractor blob and layer
const bool mergeFirstDimension = true;
resizeAndMergeCaffe->Reshape({caffeNetOutputBlob.get()}, {heatMapsBlob.get()},
FACE_CCN_DECREASE_FACTOR, mergeFirstDimension);
// Pose extractor blob and layer
maximumCaffe->Reshape({heatMapsBlob.get()}, {peaksBlob.get()});
// Cuda check
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
#endif
FaceExtractorCaffe::FaceExtractorCaffe(const Point<int>& netInputSize, const Point<int>& netOutputSize,
......@@ -77,7 +99,7 @@ namespace op
const ScaleMode heatMapScale, const bool enableGoogleLogging) :
FaceExtractor{netInputSize, netOutputSize, heatMapTypes, heatMapScale}
#if defined USE_CAFFE && defined USE_CUDA
, upImpl{new ImplFaceExtractorCaffe{mNetOutputSize, modelFolder, gpuId, enableGoogleLogging}}
, upImpl{new ImplFaceExtractorCaffe{modelFolder, gpuId, enableGoogleLogging}}
#endif
{
try
......@@ -110,20 +132,13 @@ namespace op
#if defined USE_CAFFE && defined USE_CUDA
// Logging
log("Starting initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
// Initialize Caffe net
upImpl->spNetCaffe->initializationOnThread();
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Initialize blobs
upImpl->spCaffeNetOutputBlob = upImpl->spNetCaffe->getOutputBlob();
upImpl->spHeatMapsBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
upImpl->spPeaksBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Logging
log("Finished initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
......@@ -207,11 +222,16 @@ namespace op
// cv::imshow("faceImage" + std::to_string(person), faceImage);
// 1. Caffe deep network
upImpl->spNetCaffe->forwardPass(mFaceImageCrop);
// Reshape blobs
if (!upImpl->netInitialized)
{
upImpl->netInitialized = true;
reshapeFaceExtractorCaffe(upImpl->spResizeAndMergeCaffe, upImpl->spMaximumCaffe,
upImpl->spCaffeNetOutputBlob, upImpl->spHeatMapsBlob,
upImpl->spPeaksBlob);
}
// 2. Resize heat maps + merge different scales
#ifdef USE_CUDA
......
......@@ -75,9 +75,9 @@ namespace op
for (auto bodyPart = 0u ; bodyPart < indexesInCocoOrder.size() ; bodyPart++)
{
const auto finalIndex = 3*(person*numberBodyParts + indexesInCocoOrder.at(bodyPart));
mJsonOfstream.plainText(poseKeypoints[finalIndex] + 0.5f);
mJsonOfstream.comma();
mJsonOfstream.plainText(poseKeypoints[finalIndex+1] + 0.5f);
mJsonOfstream.comma();
mJsonOfstream.plainText((poseKeypoints[finalIndex+2] > 0.f ? 1 : 0));
if (bodyPart < indexesInCocoOrder.size() - 1u)
......
......@@ -8,7 +8,7 @@ namespace op
const std::vector<HeatMapType>& heatMapTypes, const ScaleMode heatMapScale) :
mMultiScaleNumberAndRange{std::make_pair(numberScales, rangeScales)},
mNetOutputSize{netOutputSize},
mHandImageCrop{{1, 3, mNetOutputSize.y, mNetOutputSize.x}},
mHeatMapScaleMode{heatMapScale},
mHeatMapTypes{heatMapTypes}
{
......
......@@ -17,6 +17,7 @@ namespace op
struct HandExtractorCaffe::ImplHandExtractorCaffe
{
#if defined USE_CAFFE && defined USE_CUDA
bool netInitialized;
std::shared_ptr<NetCaffe> spNetCaffe;
std::shared_ptr<ResizeAndMergeCaffe<float>> spResizeAndMergeCaffe;
std::shared_ptr<MaximumCaffe<float>> spMaximumCaffe;
......@@ -25,11 +26,10 @@ namespace op
std::shared_ptr<caffe::Blob<float>> spHeatMapsBlob;
std::shared_ptr<caffe::Blob<float>> spPeaksBlob;
ImplHandExtractorCaffe(const std::string& modelFolder, const int gpuId,
const bool enableGoogleLogging) :
netInitialized{false},
spNetCaffe{std::make_shared<NetCaffe>(modelFolder + HAND_PROTOTXT, modelFolder + HAND_TRAINED_MODEL,
gpuId, enableGoogleLogging)},
spResizeAndMergeCaffe{std::make_shared<ResizeAndMergeCaffe<float>>()},
spMaximumCaffe{std::make_shared<MaximumCaffe<float>>()}
......@@ -154,6 +154,29 @@ namespace op
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
// Note: despite the "Face" in its name, this helper reshapes the hand extractor blobs
// (the logic is shared with the face extractor version).
inline void reshapeFaceExtractorCaffe(std::shared_ptr<ResizeAndMergeCaffe<float>>& resizeAndMergeCaffe,
std::shared_ptr<MaximumCaffe<float>>& maximumCaffe,
boost::shared_ptr<caffe::Blob<float>>& caffeNetOutputBlob,
std::shared_ptr<caffe::Blob<float>>& heatMapsBlob,
std::shared_ptr<caffe::Blob<float>>& peaksBlob)
{
try
{
// HeatMaps extractor blob and layer
const bool mergeFirstDimension = true;
resizeAndMergeCaffe->Reshape({caffeNetOutputBlob.get()}, {heatMapsBlob.get()},
HAND_CCN_DECREASE_FACTOR, mergeFirstDimension);
// Pose extractor blob and layer
maximumCaffe->Reshape({heatMapsBlob.get()}, {peaksBlob.get()});
// Cuda check
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
#endif
HandExtractorCaffe::HandExtractorCaffe(const Point<int>& netInputSize, const Point<int>& netOutputSize,
......@@ -164,7 +187,7 @@ namespace op
const bool enableGoogleLogging) :
HandExtractor{netInputSize, netOutputSize, numberScales, rangeScales, heatMapTypes, heatMapScale}
#if defined USE_CAFFE && defined USE_CUDA
, upImpl{new ImplHandExtractorCaffe{mNetOutputSize, modelFolder, gpuId, enableGoogleLogging}}
, upImpl{new ImplHandExtractorCaffe{modelFolder, gpuId, enableGoogleLogging}}
#endif
{
try
......@@ -199,20 +222,13 @@ namespace op
#if defined USE_CAFFE && defined USE_CUDA
// Logging
log("Starting initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
// Initialize Caffe net
upImpl->spNetCaffe->initializationOnThread();
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Initialize blobs
upImpl->spCaffeNetOutputBlob = upImpl->spNetCaffe->getOutputBlob();
upImpl->spHeatMapsBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
upImpl->spPeaksBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Logging
log("Finished initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
......@@ -369,12 +385,17 @@ namespace op
try
{
#if defined USE_CAFFE && defined USE_CUDA
// 1. Caffe deep network
upImpl->spNetCaffe->forwardPass(mHandImageCrop);
// Reshape blobs
if (!upImpl->netInitialized)
{
upImpl->netInitialized = true;
reshapeFaceExtractorCaffe(upImpl->spResizeAndMergeCaffe, upImpl->spMaximumCaffe,
upImpl->spCaffeNetOutputBlob, upImpl->spHeatMapsBlob,
upImpl->spPeaksBlob);
}
// 2. Resize heat maps + merge different scales
#ifdef USE_CUDA
......
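Both extractors above replace constructor-time blob shaping with a `netInitialized` flag checked on the first forward pass, which is what lets the net resolution change at runtime. A minimal sketch of that lazy-reshape pattern (illustrative struct, not the OpenPose classes):

```cpp
#include <cassert>

// Sketch: dependent blobs are reshaped once, on the first forward pass,
// instead of in the constructor, so the input size can be decided at runtime.
struct LazyNet
{
    bool netInitialized = false;
    int reshapeCalls = 0;

    // Stands in for reshapeFaceExtractorCaffe / reshapePoseExtractorCaffe.
    void reshapeBlobs() { ++reshapeCalls; }

    void forwardPass()
    {
        if (!netInitialized) // pay the reshape cost only once
        {
            netInitialized = true;
            reshapeBlobs();
        }
        // ... the actual network forward would run here ...
    }
};
```

Subsequent calls skip the reshape entirely, so the per-frame cost is a single boolean check.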
......@@ -6,9 +6,10 @@
namespace op
{
template <typename T>
void connectBodyPartsCpu(Array<T>& poseKeypoints, const T* const heatMapPtr, const T* const peaksPtr,
const PoseModel poseModel, const Point<int>& heatMapSize, const int maxPeaks,
const int interMinAboveThreshold, const T interThreshold, const int minSubsetCnt,
const T minSubsetScore, const T scaleFactor)
{
try
{
......@@ -18,7 +19,8 @@ namespace op
const auto numberBodyParts = POSE_NUMBER_BODY_PARTS[(int)poseModel];
const auto numberBodyPartPairs = bodyPartPairs.size() / 2;
// Vector<int> = Each body part + body parts counter; double = subsetScore
std::vector<std::pair<std::vector<int>, double>> subset;
const auto subsetCounterIndex = numberBodyParts;
const auto subsetSize = numberBodyParts+1;
......@@ -59,9 +61,12 @@ namespace op
if (!num)
{
std::vector<int> rowVector(subsetSize, 0);
// Store the index
rowVector[ bodyPartB ] = bodyPartB*peaksOffset + i*3 + 2;
// Last number in each row is the parts number of that person
rowVector[subsetCounterIndex] = 1;
// Second last number in each row is the total score
const auto subsetScore = candidateB[i*3+2];
subset.emplace_back(std::make_pair(rowVector, subsetScore));
}
}
......@@ -71,14 +76,18 @@ namespace op
for (auto i = 1; i <= nB; i++)
{
std::vector<int> rowVector(subsetSize, 0);
// Store the index
rowVector[ bodyPartB ] = bodyPartB*peaksOffset + i*3 + 2;
// Last number in each row is the parts number of that person
rowVector[subsetCounterIndex] = 1;
// Second last number in each row is the total score
const auto subsetScore = candidateB[i*3+2];
subset.emplace_back(std::make_pair(rowVector, subsetScore));
}
}
else
error("Unknown model, cast to int = " + std::to_string((int)poseModel),
__LINE__, __FUNCTION__, __FILE__);
}
else // if (nA != 0 && nB == 0)
{
......@@ -101,9 +110,12 @@ namespace op
if (!num)
{
std::vector<int> rowVector(subsetSize, 0);
// Store the index
rowVector[ bodyPartA ] = bodyPartA*peaksOffset + i*3 + 2;
// Last number in each row is the parts number of that person
rowVector[subsetCounterIndex] = 1;
// Second last number in each row is the total score
const auto subsetScore = candidateA[i*3+2];
subset.emplace_back(std::make_pair(rowVector, subsetScore));
}
}
......@@ -113,14 +125,18 @@ namespace op
for (auto i = 1; i <= nA; i++)
{
std::vector<int> rowVector(subsetSize, 0);
// Store the index
rowVector[ bodyPartA ] = bodyPartA*peaksOffset + i*3 + 2;
// Last number in each row is the parts number of that person
rowVector[subsetCounterIndex] = 1;
// Second last number in each row is the total score
const auto subsetScore = candidateA[i*3+2];
subset.emplace_back(std::make_pair(rowVector, subsetScore));
}
}
else
error("Unknown model, cast to int = " + std::to_string((int)poseModel),
__LINE__, __FUNCTION__, __FILE__);
}
}
else // if (nA != 0 && nB != 0)
......@@ -216,9 +232,10 @@ namespace op
}
}
// Add ears connections (in case person is looking to opposite direction to camera)
else if (((poseModel == PoseModel::COCO_18
|| poseModel == PoseModel::BODY_18) && (pairIndex==17 || pairIndex==18))
|| (poseModel == PoseModel::BODY_19 && (pairIndex==18 || pairIndex==19))
|| (poseModel == PoseModel::BODY_23 && (pairIndex==22 || pairIndex==23)))
{
for (const auto& connectionKI : connectionK)
{
......@@ -291,7 +308,8 @@ namespace op
break;
}
else if (subsetCounter < 1)
error("Bad subsetCounter. Bug in this function if this happens.",
__LINE__, __FUNCTION__, __FILE__);
}
// Fill and return poseKeypoints
......@@ -327,10 +345,16 @@ namespace op
}
}
template void connectBodyPartsCpu(Array<float>& poseKeypoints, const float* const heatMapPtr,
const float* const peaksPtr, const PoseModel poseModel,
const Point<int>& heatMapSize, const int maxPeaks,
const int interMinAboveThreshold, const float interThreshold,
const int minSubsetCnt, const float minSubsetScore,
const float scaleFactor);
template void connectBodyPartsCpu(Array<double>& poseKeypoints, const double* const heatMapPtr,
const double* const peaksPtr, const PoseModel poseModel,
const Point<int>& heatMapSize, const int maxPeaks,
const int interMinAboveThreshold, const double interThreshold,
const int minSubsetCnt, const double minSubsetScore,
const double scaleFactor);
}
#include <openpose/pose/renderPose.hpp>
#include <openpose/utilities/keypoint.hpp>
#include <openpose/pose/poseCpuRenderer.hpp>
namespace op
......@@ -13,6 +14,7 @@ namespace op
std::pair<int, std::string> PoseCpuRenderer::renderPose(Array<float>& outputData,
const Array<float>& poseKeypoints,
const float scaleInputToOutput,
const float scaleNetToOutput)
{
try
......@@ -25,7 +27,14 @@ namespace op
std::string elementRenderedName;
// Draw poseKeypoints
if (elementRendered == 0)
{
// Rescale keypoints to output size
auto poseKeypointsRescaled = poseKeypoints.clone();
scaleKeypoints(poseKeypointsRescaled, scaleInputToOutput);
// Render keypoints
renderPoseKeypointsCpu(outputData, poseKeypointsRescaled, mPoseModel, mRenderThreshold,
mBlendOriginalFrame);
}
// Draw heat maps / PAFs
else
{
......
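The CPU renderer above now clones the keypoints and rescales them with `scaleInputToOutput` before drawing. A small sketch of the underlying operation (illustrative helper, not the OpenPose `scaleKeypoints` signature): keypoints are stored flat as `[x, y, score, x, y, score, ...]`, and only the coordinates are scaled.

```cpp
#include <cassert>
#include <vector>

// Sketch: scale the x/y coordinates of a flat keypoint array by a factor,
// leaving the confidence scores untouched (a copy is returned, as the
// renderer clones the keypoints before rescaling).
std::vector<float> scaleKeypointsCopy(std::vector<float> keypoints, const float scale)
{
    for (auto i = 0u; i + 2 < keypoints.size(); i += 3)
    {
        keypoints[i] *= scale;     // x
        keypoints[i + 1] *= scale; // y
        // keypoints[i + 2] is the confidence score and is left untouched
    }
    return keypoints;
}
```

Cloning before scaling matters: the unscaled keypoints remain available for later consumers (e.g. JSON output) that expect input-resolution coordinates.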
......@@ -43,12 +43,10 @@ namespace op
}
}
PoseExtractor::PoseExtractor(const PoseModel poseModel, const std::vector<HeatMapType>& heatMapTypes,
const ScaleMode heatMapScale) :
mPoseModel{poseModel},
mNetOutputSize{0, 0},
mHeatMapTypes{heatMapTypes},
mHeatMapScaleMode{heatMapScale}
{
......
......@@ -17,7 +17,7 @@ namespace op
struct PoseExtractorCaffe::ImplPoseExtractorCaffe
{
#ifdef USE_CAFFE
std::vector<int> mNetInputSize4D;
std::shared_ptr<NetCaffe> spNetCaffe;
std::shared_ptr<ResizeAndMergeCaffe<float>> spResizeAndMergeCaffe;
std::shared_ptr<NmsCaffe<float>> spNmsCaffe;
......@@ -28,13 +28,10 @@ namespace op
std::shared_ptr<caffe::Blob<float>> spPeaksBlob;
std::shared_ptr<caffe::Blob<float>> spPoseBlob;
ImplPoseExtractorCaffe(const PoseModel poseModel, const int gpuId,
const std::string& modelFolder, const bool enableGoogleLogging) :
mNetInputSize4D{0,0,0,0},
spNetCaffe{std::make_shared<NetCaffe>(modelFolder + POSE_PROTOTXT[(int)poseModel],
modelFolder + POSE_TRAINED_MODEL[(int)poseModel], gpuId,
enableGoogleLogging)},
spResizeAndMergeCaffe{std::make_shared<ResizeAndMergeCaffe<float>>()},
......@@ -45,32 +42,66 @@ namespace op
#endif
};
#ifdef USE_CAFFE
inline void reshapePoseExtractorCaffe(std::shared_ptr<ResizeAndMergeCaffe<float>>& resizeAndMergeCaffe,
std::shared_ptr<NmsCaffe<float>>& nmsCaffe,
std::shared_ptr<BodyPartConnectorCaffe<float>>& bodyPartConnectorCaffe,
boost::shared_ptr<caffe::Blob<float>>& caffeNetOutputBlob,
std::shared_ptr<caffe::Blob<float>>& heatMapsBlob,
std::shared_ptr<caffe::Blob<float>>& peaksBlob,
std::shared_ptr<caffe::Blob<float>>& poseBlob,
const float scaleInputToNetInput,
const PoseModel poseModel)
{
try
{
// HeatMaps extractor blob and layer
UNUSED(scaleInputToNetInput);
resizeAndMergeCaffe->Reshape({caffeNetOutputBlob.get()}, {heatMapsBlob.get()},
POSE_CCN_DECREASE_FACTOR[(int)poseModel]);
// Pose extractor blob and layer
nmsCaffe->Reshape({heatMapsBlob.get()}, {peaksBlob.get()}, POSE_MAX_PEAKS[(int)poseModel]);
// Body part connector blob and layer
bodyPartConnectorCaffe->Reshape({heatMapsBlob.get(), peaksBlob.get()}, {poseBlob.get()});
// Cuda check
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
inline bool requiredReshapePoseExtractorCaffe(const std::vector<int>& dimensionsA,
const std::vector<int>& dimensionsB)
{
try
{
return (dimensionsA[0] != dimensionsB[0] || dimensionsA[1] != dimensionsB[1]
|| dimensionsA[2] != dimensionsB[2] || dimensionsA[3] != dimensionsB[3]);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return false;
}
}
#endif
PoseExtractorCaffe::PoseExtractorCaffe(const PoseModel poseModel, const std::string& modelFolder,
const int gpuId, const std::vector<HeatMapType>& heatMapTypes,
const ScaleMode heatMapScale, const bool enableGoogleLogging) :
PoseExtractor{poseModel, heatMapTypes, heatMapScale}
#ifdef USE_CAFFE
, upImpl{new ImplPoseExtractorCaffe{poseModel, gpuId, modelFolder, enableGoogleLogging}}
#endif
{
try
{
#ifdef USE_CAFFE
// Layers parameters
upImpl->spBodyPartConnectorCaffe->setPoseModel(mPoseModel);
#else
UNUSED(poseModel);
UNUSED(modelFolder);
UNUSED(gpuId);
......@@ -97,24 +128,14 @@ namespace op
#ifdef USE_CAFFE
// Logging
log("Starting initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
// Initialize Caffe net
upImpl->spNetCaffe->initializationOnThread();
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Initialize blobs
upImpl->spCaffeNetOutputBlob = upImpl->spNetCaffe->getOutputBlob();
upImpl->spHeatMapsBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
upImpl->spPeaksBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
upImpl->spPoseBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Logging
log("Finished initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
......@@ -127,7 +148,7 @@ namespace op
}
void PoseExtractorCaffe::forwardPass(const Array<float>& inputNetData, const Point<int>& inputDataSize,
const std::vector<double>& scaleInputToNetInputs)
{
try
{
......@@ -137,10 +158,21 @@ namespace op
error("Empty inputNetData.", __LINE__, __FUNCTION__, __FILE__);
// 1. Caffe deep network
upImpl->spNetCaffe->forwardPass(inputNetData); // ~80ms
// Reshape blobs if required
if (requiredReshapePoseExtractorCaffe(upImpl->mNetInputSize4D, inputNetData.getSize()))
{
upImpl->mNetInputSize4D = inputNetData.getSize();
mNetOutputSize = Point<int>{upImpl->mNetInputSize4D[3], upImpl->mNetInputSize4D[2]};
reshapePoseExtractorCaffe(upImpl->spResizeAndMergeCaffe, upImpl->spNmsCaffe,
upImpl->spBodyPartConnectorCaffe, upImpl->spCaffeNetOutputBlob,
upImpl->spHeatMapsBlob, upImpl->spPeaksBlob, upImpl->spPoseBlob,
scaleInputToNetInputs[0], mPoseModel);
}
// 2. Resize heat maps + merge different scales
const std::vector<float> floatScaleRatios(scaleRatios.begin(), scaleRatios.end());
const std::vector<float> floatScaleRatios(scaleInputToNetInputs.begin(), scaleInputToNetInputs.end());
upImpl->spResizeAndMergeCaffe->setScaleRatios(floatScaleRatios);
#ifdef USE_CUDA
upImpl->spResizeAndMergeCaffe->Forward_gpu({upImpl->spCaffeNetOutputBlob.get()}, // ~5ms
......@@ -159,17 +191,15 @@ namespace op
error("NmsCaffe CPU version not implemented yet.", __LINE__, __FUNCTION__, __FILE__);
#endif
// Get scale net to output
// Get scale net to output (i.e. image input)
const auto scaleProducerToNetInput = resizeGetScaleFactor(inputDataSize, mNetOutputSize);
const Point<int> netSize{intRound(scaleProducerToNetInput*inputDataSize.x),
intRound(scaleProducerToNetInput*inputDataSize.y)};
if (mOutputSize.x > 0 && mOutputSize.y > 0)
mScaleNetToOutput = {(float)resizeGetScaleFactor(netSize, mOutputSize)};
else
mScaleNetToOutput = {(float)resizeGetScaleFactor(netSize, inputDataSize)};
mScaleNetToOutput = {(float)resizeGetScaleFactor(netSize, inputDataSize)};
// 4. Connecting body parts
upImpl->spBodyPartConnectorCaffe->setScaleNetToOutput(mScaleNetToOutput);
// upImpl->spBodyPartConnectorCaffe->setScaleNetToOutput(1);
upImpl->spBodyPartConnectorCaffe->setInterMinAboveThreshold(
(int)get(PoseProperty::ConnectInterMinAboveThreshold)
);
......@@ -178,14 +208,19 @@ namespace op
upImpl->spBodyPartConnectorCaffe->setMinSubsetScore((float)get(PoseProperty::ConnectMinSubsetScore));
// GPU version not implemented yet
upImpl->spBodyPartConnectorCaffe->Forward_cpu({upImpl->spHeatMapsBlob.get(), upImpl->spPeaksBlob.get()},
mPoseKeypoints);
// upImpl->spBodyPartConnectorCaffe->Forward_gpu({upImpl->spHeatMapsBlob.get(), upImpl->spPeaksBlob.get()},
// {upImpl->spPoseBlob.get()}, mPoseKeypoints);
// #ifdef USE_CUDA
// upImpl->spBodyPartConnectorCaffe->Forward_gpu({upImpl->spHeatMapsBlob.get(),
// upImpl->spPeaksBlob.get()},
// {upImpl->spPoseBlob.get()}, mPoseKeypoints);
// #else
upImpl->spBodyPartConnectorCaffe->Forward_cpu({upImpl->spHeatMapsBlob.get(),
upImpl->spPeaksBlob.get()},
mPoseKeypoints);
// #endif
#else
UNUSED(inputNetData);
UNUSED(inputDataSize);
UNUSED(scaleRatios);
UNUSED(scaleInputToNetInputs);
#endif
}
catch (const std::exception& e)
......
......@@ -5,6 +5,7 @@
#include <openpose/pose/poseParameters.hpp>
#include <openpose/pose/renderPose.hpp>
#include <openpose/utilities/cuda.hpp>
#include <openpose/utilities/keypoint.hpp>
#include <openpose/pose/poseGpuRenderer.hpp>
namespace op
......@@ -61,6 +62,7 @@ namespace op
std::pair<int, std::string> PoseGpuRenderer::renderPose(Array<float>& outputData,
const Array<float>& poseKeypoints,
const float scaleInputToOutput,
const float scaleNetToOutput)
{
try
......@@ -83,9 +85,14 @@ namespace op
// Draw poseKeypoints
if (elementRendered == 0)
{
// Rescale keypoints to output size
auto poseKeypointsRescaled = poseKeypoints.clone();
scaleKeypoints(poseKeypointsRescaled, scaleInputToOutput);
// Render keypoints
if (!poseKeypoints.empty())
cudaMemcpy(pGpuPose,
poseKeypoints.getConstPtr(), numberPeople * numberBodyParts * 3 * sizeof(float),
poseKeypointsRescaled.getConstPtr(),
numberPeople * numberBodyParts * 3 * sizeof(float),
cudaMemcpyHostToDevice);
renderPoseKeypointsGpu(*spGpuMemory, mPoseModel, numberPeople, frameSize, pGpuPose,
mRenderThreshold, mShowGooglyEyes, mBlendOriginalFrame,
......@@ -104,7 +111,7 @@ namespace op
elementRenderedName = mPartIndexToName.at(elementRendered-1);
renderPoseHeatMapGpu(*spGpuMemory, mPoseModel, frameSize,
spPoseExtractor->getHeatMapGpuConstPtr(),
heatMapSize, scaleNetToOutput, elementRendered,
heatMapSize, scaleNetToOutput * scaleInputToOutput, elementRendered,
(mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
// Draw PAFs (Part Affinity Fields)
......@@ -113,7 +120,7 @@ namespace op
elementRenderedName = "Heatmaps";
renderPoseHeatMapsGpu(*spGpuMemory, mPoseModel, frameSize,
spPoseExtractor->getHeatMapGpuConstPtr(),
heatMapSize, scaleNetToOutput,
heatMapSize, scaleNetToOutput * scaleInputToOutput,
(mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
// Draw PAFs (Part Affinity Fields)
......@@ -122,7 +129,7 @@ namespace op
elementRenderedName = "PAFs (Part Affinity Fields)";
renderPosePAFsGpu(*spGpuMemory, mPoseModel, frameSize,
spPoseExtractor->getHeatMapGpuConstPtr(),
heatMapSize, scaleNetToOutput,
heatMapSize, scaleNetToOutput * scaleInputToOutput,
(mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
// Draw affinity between 2 body parts
......@@ -134,7 +141,7 @@ namespace op
elementRenderedName = elementRenderedName.substr(0, elementRenderedName.find("("));
renderPosePAFGpu(*spGpuMemory, mPoseModel, frameSize,
spPoseExtractor->getHeatMapGpuConstPtr(),
heatMapSize, scaleNetToOutput, affinityPartMapped,
heatMapSize, scaleNetToOutput * scaleInputToOutput, affinityPartMapped,
(mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
}
......
......@@ -5,7 +5,8 @@
namespace op
{
const std::string errorMessage = "The Array<float> is not a RGB image. This function is only for array of dimension: [sizeA x sizeB x 3].";
const std::string errorMessage = "The Array<float> is not a RGB image. This function is only for array of"
" dimension: [sizeA x sizeB x 3].";
float getDistance(const Array<float>& keypoints, const int person, const int elementA, const int elementB)
{
......@@ -29,7 +30,8 @@ namespace op
{
// Security checks
if (keypointsA.getNumberDimensions() != keypointsB.getNumberDimensions())
error("keypointsA.getNumberDimensions() != keypointsB.getNumberDimensions().", __LINE__, __FUNCTION__, __FILE__);
error("keypointsA.getNumberDimensions() != keypointsB.getNumberDimensions().",
__LINE__, __FUNCTION__, __FILE__);
for (auto dimension = 1u ; dimension < keypointsA.getNumberDimensions() ; dimension++)
if (keypointsA.getSize(dimension) != keypointsB.getSize(dimension))
error("keypointsA.getSize() != keypointsB.getSize().", __LINE__, __FUNCTION__, __FILE__);
......@@ -96,7 +98,8 @@ namespace op
}
}
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY, const float offsetX, const float offsetY)
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY, const float offsetX,
const float offsetY)
{
try
{
......@@ -127,8 +130,9 @@ namespace op
}
}
void renderKeypointsCpu(Array<float>& frameArray, const Array<float>& keypoints, const std::vector<unsigned int>& pairs,
const std::vector<float> colors, const float thicknessCircleRatio, const float thicknessLineRatioWRTCircle,
void renderKeypointsCpu(Array<float>& frameArray, const Array<float>& keypoints,
const std::vector<unsigned int>& pairs, const std::vector<float> colors,
const float thicknessCircleRatio, const float thicknessLineRatioWRTCircle,
const float threshold)
{
try
......@@ -160,12 +164,15 @@ namespace op
// Keypoints
for (auto person = 0 ; person < keypoints.getSize(0) ; person++)
{
const auto personRectangle = getKeypointsRectangle(keypoints, person, numberKeypoints, thresholdRectangle);
const auto personRectangle = getKeypointsRectangle(keypoints, person, numberKeypoints,
thresholdRectangle);
if (personRectangle.area() > 0)
{
const auto ratioAreas = fastMin(1.f, fastMax(personRectangle.width/(float)width, personRectangle.height/(float)height));
const auto ratioAreas = fastMin(1.f, fastMax(personRectangle.width/(float)width,
personRectangle.height/(float)height));
// Size-dependent variables
const auto thicknessRatio = fastMax(intRound(std::sqrt(area)*thicknessCircleRatio * ratioAreas), 2);
const auto thicknessRatio = fastMax(intRound(std::sqrt(area)
* thicknessCircleRatio * ratioAreas), 2);
// Negative thickness in cv::circle means that a filled circle is to be drawn.
const auto thicknessCircle = (ratioAreas > 0.05 ? thicknessRatio : -1);
const auto thicknessLine = intRound(thicknessRatio * thicknessLineRatioWRTCircle);
......@@ -200,7 +207,8 @@ namespace op
const cv::Scalar color{colors[colorIndex % numberColors],
colors[(colorIndex+1) % numberColors],
colors[(colorIndex+2) % numberColors]};
const cv::Point center{intRound(keypoints[faceIndex]), intRound(keypoints[faceIndex+1])};
const cv::Point center{intRound(keypoints[faceIndex]),
intRound(keypoints[faceIndex+1])};
cv::circle(frameR, center, radius, color[0], thicknessCircle, lineType, shift);
cv::circle(frameG, center, radius, color[1], thicknessCircle, lineType, shift);
cv::circle(frameB, center, radius, color[2], thicknessCircle, lineType, shift);
......@@ -216,7 +224,8 @@ namespace op
}
}
Rectangle<float> getKeypointsRectangle(const Array<float>& keypoints, const int person, const int numberKeypoints, const float threshold)
Rectangle<float> getKeypointsRectangle(const Array<float>& keypoints, const int person, const int numberKeypoints,
const float threshold)
{
try
{
......
......@@ -147,8 +147,8 @@ namespace op
{
try
{
const auto ratioWidth = targetSize.x / (double)initialSize.x;
const auto ratioHeight = targetSize.y / (double)initialSize.y;
const auto ratioWidth = (targetSize.x - 1) / (double)(initialSize.x - 1);
const auto ratioHeight = (targetSize.y - 1) / (double)(initialSize.y - 1);
return fastMin(ratioWidth, ratioHeight);
}
catch (const std::exception& e)
......