Commit bc55f86c authored by G gineshidalgo99

Net resolution can be dynamically changed

Parent 00438f47
......@@ -108,8 +108,6 @@ We enumerate some of the most important flags, check the `Flags Detailed Descrip
- `--part_to_show`: Prediction channel to visualize.
- `--no_display`: Do not open a display window. Useful for servers and/or to slightly speed up OpenPose.
- `--num_gpu 2 --num_gpu_start 1`: Parallelize over this number of GPUs, starting from the desired device id. By default it uses all the available GPUs.
- `--net_resolution 656x368`: For HD input (default value).
- `--net_resolution 496x368`: For VGA input.
- `--model_pose MPI`: Model to use; affects the number of keypoints, speed, and accuracy.
- `--logging_level 3`: Logging messages threshold, range [0,255]: 0 will output all messages & 255 will output none. Current messages are in the range [1-4], 1 for low-priority messages and 4 for important ones.
......@@ -145,7 +143,7 @@ Each flag is divided into flag name, default value, and description.
4. OpenPose Body Pose
- DEFINE_bool(body_disable, false, "Disable body keypoint detection. Option only possible for faster (but less accurate) face keypoint detection.");
- DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), `MPI_4_layers` (15 keypoints, even faster but less accurate).");
- DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is decreased, the speed increases. For maximum speed-accuracy balance, it should keep the closest aspect ratio possible to the images or videos to be processed. Using `-1` in any of the dimensions, OP will choose the optimal resolution depending on the other value introduced by the user. E.g. the default `-1x368` is equivalent to `656x368` in 16:9 videos, e.g. full HD (1920x1080) and HD (1280x720) resolutions.");
- DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is decreased, the speed increases. For maximum speed-accuracy balance, it should keep the closest aspect ratio possible to the images or videos to be processed. Using `-1` in any of the dimensions, OP will choose the optimal resolution depending on the other value introduced by the user. E.g. the default `-1x368` is equivalent to `656x368` in 16:9 videos, e.g. full HD (1920x1080) and HD (1280x720) resolutions.");
- DEFINE_int32(scale_number, 1, "Number of scales to average.");
- DEFINE_double(scale_gap, 0.3, "Scale gap between scales. No effect unless scale_number > 1. Initial scale is always 1. If you want to change the initial scale, you actually want to multiply the `net_resolution` by your desired initial scale.");
- DEFINE_bool(heatmaps_add_parts, false, "If true, it will add the body part heatmaps to the final op::Datum::poseHeatMaps array, and analogously face & hand heatmaps to op::Datum::faceHeatMaps & op::Datum::handHeatMaps (program speed will decrease). Not required for our library, enable it only if you intend to process this information later. If more than one `add_heatmaps_X` flag is enabled, it will place then in sequential memory order: body parts + bkg + PAFs. It will follow the order on POSE_BODY_PART_MAPPING in `include/openpose/pose/poseParameters.hpp`.");
......
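As a sketch of the `-1` auto-selection described for `net_resolution` above: the free dimension is chosen to preserve the input aspect ratio and rounded to the nearest multiple of 16 (behavior taken from the wrapper code removed later in this commit; the helper name is hypothetical):

```python
def auto_net_resolution(net_w, net_h, input_w, input_h):
    """Resolve a -1 dimension in net_resolution, keeping the input
    aspect ratio and rounding to the nearest multiple of 16."""
    if net_w == -1 and net_h == -1:
        raise ValueError("Net input size cannot be -1x-1.")
    if net_w == -1:
        net_w = 16 * round(net_h * input_w / input_h / 16)
    elif net_h == -1:
        net_h = 16 * round(net_w * input_h / input_w / 16)
    return net_w, net_h

# -1x368 on a full HD (1920x1080) input resolves to 656x368
print(auto_net_resolution(-1, 368, 1920, 1080))  # → (656, 368)
```

The same call with an HD (1280x720) input also yields 656x368, matching the "16:9 videos" note in the flag description.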
......@@ -113,7 +113,7 @@ OpenPose Library - Release Notes
## Current version (future OpenPose 1.2.0)
## Current version (future OpenPose 2.0.0alpha)
1. Main improvements:
1. Added IP camera support.
2. Output images can have the input size; OpenPose is now able to change its size for each image, and a fixed size is no longer required.
......@@ -127,15 +127,18 @@ OpenPose Library - Release Notes
7. COCO JSON file outputs 0 as score for non-detected keypoints.
8. Added example for OpenPose for user asynchronous output and cleaned all `tutorial_wrapper/` examples.
9. Added `-1` option for `net_resolution` in order to auto-select the best possible aspect ratio given the user input.
10. Added example to add functionality to OpenPose.
10. Net resolution can be dynamically changed (e.g. for images with different size).
11. Added example to add functionality to OpenPose.
2. Functions or parameters renamed:
1. OpenPose able to change its size and initial size:
1. OpenPose is able to change its size and initial size dynamically:
1. Flag `resolution` renamed as `output_resolution`.
2. FrameDisplayer, GuiInfoAdder and Gui constructors arguments modified (gui module).
3. OpOutputToCvMat constructor removed (core module).
4. New Renders classes to split GpuRenderers from CpuRenderers.
5. Etc.
2. `CPU_ONLY` changed to `USE_CUDA` to keep the naming format.
2. OpenPose is able to change its net resolution dynamically:
1. Changed several functions in the `core/`, `pose/`, `face/`, and `hand/` modules.
3. `CPU_ONLY` changed to `USE_CUDA` to keep the naming format.
3. Main bugs fixed:
1. Ubuntu installer script now works even if Python pip was not installed previously.
2. Flags to set the first and last frame, as well as to jump frames backward and forward, now work on the image directory reader.
......@@ -72,7 +72,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -21,7 +21,8 @@ JSON_FOLDER=../evaluation/coco_val_jsons/
OP_BIN=./build/examples/openpose/openpose.bin
# 1 scale
$OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1.json --no_display --render_pose 0 --frame_last 3558 --output_resolution "1280x720"
$OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1.json --no_display --render_pose 0 --frame_last 3558
# --output_resolution "1280x720"
# # 3 scales
# $OP_BIN --image_dir $IMAGE_FOLDER --write_coco_json ${JSON_FOLDER}1_3.json --no_display --render_pose 0 --scale_number 3 --scale_gap 0.25 --frame_last 3558 --output_resolution "1280x720"
......
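The commented 3-scale command above relies on `scale_number` and `scale_gap`; per the flag description, the initial scale is always 1 and, presumably, each subsequent scale is reduced by `scale_gap`. A hedged sketch of how the scale list would be built under that assumption:

```python
def build_scales(scale_number, scale_gap):
    # First scale is always 1; each subsequent one is scale_gap smaller
    # (assumed from the flag docs, not copied from OpenPose source).
    return [1.0 - i * scale_gap for i in range(scale_number)]

print(build_scales(3, 0.25))  # → [1.0, 0.75, 0.5]
```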
......@@ -232,11 +232,6 @@ namespace op
// Re-scale pose if desired
if (wrapperStructPose.keypointScale != ScaleMode::InputResolution)
error("Only wrapperStructPose.keypointScale == ScaleMode::InputResolution.", __LINE__, __FUNCTION__, __FILE__);
if (finalOutputSize != producerSize)
{
auto keypointScaler = std::make_shared<KeypointScaler>(ScaleMode::InputResolution);
mPostProcessingWs.emplace_back(std::make_shared<WKeypointScaler<TDatumsPtr>>(keypointScaler));
}
mOutputWs.clear();
// Write people pose data on disk (json format)
......
......@@ -82,7 +82,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -35,7 +35,7 @@ DEFINE_string(image_path, "examples/media/COCO_val2014_00000000019
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(model_folder, "models/", "Folder path (absolute or relative) where the models (pose, face, ...) are located.");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......@@ -75,8 +75,6 @@ int openPoseTutorialPose1()
const auto outputSize = op::flagsToPoint(FLAGS_output_resolution, "-1x-1");
// netInputSize
const auto netInputSize = op::flagsToPoint(FLAGS_net_resolution, "-1x368");
// netOutputSize
const auto netOutputSize = netInputSize;
// poseModel
const auto poseModel = op::flagsToPoseModel(FLAGS_model_pose);
// Check no contradictory flags enabled
......@@ -93,9 +91,8 @@ int openPoseTutorialPose1()
op::ScaleAndSizeExtractor scaleAndSizeExtractor(netInputSize, outputSize, FLAGS_scale_number, FLAGS_scale_gap);
op::CvMatToOpInput cvMatToOpInput;
op::CvMatToOpOutput cvMatToOpOutput;
op::PoseExtractorCaffe poseExtractorCaffe{netInputSize, netOutputSize, outputSize, FLAGS_scale_number, poseModel,
FLAGS_model_folder, FLAGS_num_gpu_start, {}, op::ScaleMode::ZeroToOne,
enableGoogleLogging};
op::PoseExtractorCaffe poseExtractorCaffe{poseModel, FLAGS_model_folder,
FLAGS_num_gpu_start, {}, op::ScaleMode::ZeroToOne, enableGoogleLogging};
op::PoseCpuRenderer poseRenderer{poseModel, (float)FLAGS_render_threshold, !FLAGS_disable_blending,
(float)FLAGS_alpha_pose};
op::OpOutputToCvMat opOutputToCvMat;
......@@ -125,7 +122,7 @@ int openPoseTutorialPose1()
poseExtractorCaffe.forwardPass(netInputArray, imageSize, scaleInputToNetInputs);
const auto poseKeypoints = poseExtractorCaffe.getPoseKeypoints();
// Step 5 - Render poseKeypoints
poseRenderer.renderPose(outputArray, poseKeypoints);
poseRenderer.renderPose(outputArray, poseKeypoints, scaleInputToOutput);
// Step 6 - OpenPose output format to cv::Mat
auto outputImage = opOutputToCvMat.formatToCvMat(outputArray);
......
......@@ -35,7 +35,7 @@ DEFINE_string(image_path, "examples/media/COCO_val2014_00000000019
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(model_folder, "models/", "Folder path (absolute or relative) where the models (pose, face, ...) are located.");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......@@ -80,8 +80,6 @@ int openPoseTutorialPose2()
const auto outputSize = op::flagsToPoint(FLAGS_output_resolution, "-1x-1");
// netInputSize
const auto netInputSize = op::flagsToPoint(FLAGS_net_resolution, "-1x368");
// netOutputSize
const auto netOutputSize = netInputSize;
// poseModel
const auto poseModel = op::flagsToPoseModel(FLAGS_model_pose);
// Check no contradictory flags enabled
......@@ -99,8 +97,8 @@ int openPoseTutorialPose2()
op::CvMatToOpInput cvMatToOpInput;
op::CvMatToOpOutput cvMatToOpOutput;
auto poseExtractorPtr = std::make_shared<op::PoseExtractorCaffe>(
netInputSize, netOutputSize, outputSize, FLAGS_scale_number, poseModel, FLAGS_model_folder,
FLAGS_num_gpu_start, std::vector<op::HeatMapType>{}, op::ScaleMode::ZeroToOne, enableGoogleLogging
poseModel, FLAGS_model_folder, FLAGS_num_gpu_start, std::vector<op::HeatMapType>{}, op::ScaleMode::ZeroToOne,
enableGoogleLogging
);
op::PoseGpuRenderer poseGpuRenderer{poseModel, poseExtractorPtr, (float)FLAGS_render_threshold,
!FLAGS_disable_blending, (float)FLAGS_alpha_pose, (float)FLAGS_alpha_heatmap};
......
......@@ -72,7 +72,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -55,7 +55,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -55,7 +55,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -55,7 +55,7 @@ DEFINE_bool(body_disable, false, "Disable body keypoint d
" keypoint detection.");
DEFINE_string(model_pose, "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
"`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
DEFINE_string(net_resolution, "-1x368", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is"
" decreased, the speed increases. For maximum speed-accuracy balance, it should keep the"
" closest aspect ratio possible to the images or videos to be processed. Using `-1` in"
" any of the dimensions, OP will choose the optimal aspect ratio depending on the user's"
......
......@@ -11,9 +11,11 @@ namespace op
public:
explicit KeypointScaler(const ScaleMode scaleMode);
void scale(Array<float>& arrayToScale, const double scaleInputToOutput, const double scaleNetToOutput, const Point<int>& producerSize) const;
void scale(Array<float>& arrayToScale, const double scaleInputToOutput, const double scaleNetToOutput,
const Point<int>& producerSize) const;
void scale(std::vector<Array<float>>& arraysToScale, const double scaleInputToOutput, const double scaleNetToOutput, const Point<int>& producerSize) const;
void scale(std::vector<Array<float>>& arraysToScale, const double scaleInputToOutput,
const double scaleNetToOutput, const Point<int>& producerSize) const;
private:
const ScaleMode mScaleMode;
......
......@@ -10,13 +10,7 @@ namespace op
public:
virtual void initializationOnThread() = 0;
// Alternative a) getInputDataCpuPtr or getInputDataGpuPtr + forwardPass()
virtual float* getInputDataCpuPtr() const = 0;
virtual float* getInputDataGpuPtr() const = 0;
// Alternative b)
virtual void forwardPass(const float* const inputData = nullptr) const = 0;
virtual void forwardPass(const Array<float>& inputData) const = 0;
};
}
......
......@@ -9,22 +9,14 @@ namespace op
class OP_API NetCaffe : public Net
{
public:
NetCaffe(const std::array<int, 4>& netInputSize4D, const std::string& caffeProto,
const std::string& caffeTrainedModel, const int gpuId = 0,
const bool enableGoogleLogging = true,
const std::string& lastBlobName = "net_output");
NetCaffe(const std::string& caffeProto, const std::string& caffeTrainedModel, const int gpuId = 0,
const bool enableGoogleLogging = true, const std::string& lastBlobName = "net_output");
virtual ~NetCaffe();
void initializationOnThread();
// Alternative a) getInputDataCpuPtr or getInputDataGpuPtr + forwardPass
float* getInputDataCpuPtr() const;
float* getInputDataGpuPtr() const;
// Alternative b)
void forwardPass(const float* const inputNetData = nullptr) const;
void forwardPass(const Array<float>& inputNetData) const;
boost::shared_ptr<caffe::Blob<float>> getOutputBlob() const;
......
......@@ -18,6 +18,7 @@ namespace op
const float alphaHeatMap = POSE_DEFAULT_ALPHA_HEAT_MAP);
std::pair<int, std::string> renderPose(Array<float>& outputData, const Array<float>& poseKeypoints,
const float scaleInputToOutput,
const float scaleNetToOutput = -1.f);
private:
......
......@@ -12,7 +12,7 @@ namespace op
class OP_API PoseExtractor
{
public:
PoseExtractor(const Point<int>& netOutputSize, const Point<int>& outputSize, const PoseModel poseModel,
PoseExtractor(const PoseModel poseModel,
const std::vector<HeatMapType>& heatMapTypes = {},
const ScaleMode heatMapScale = ScaleMode::ZeroToOne);
......@@ -45,8 +45,7 @@ namespace op
protected:
const PoseModel mPoseModel;
const Point<int> mNetOutputSize;
const Point<int> mOutputSize;
Point<int> mNetOutputSize;
Array<float> mPoseKeypoints;
float mScaleNetToOutput;
......
......@@ -10,9 +10,7 @@ namespace op
class OP_API PoseExtractorCaffe : public PoseExtractor
{
public:
PoseExtractorCaffe(const Point<int>& netInputSize, const Point<int>& netOutputSize,
const Point<int>& outputSize, const int scaleNumber, const PoseModel poseModel,
const std::string& modelFolder, const int gpuId,
PoseExtractorCaffe(const PoseModel poseModel, const std::string& modelFolder, const int gpuId,
const std::vector<HeatMapType>& heatMapTypes = {},
const ScaleMode heatMapScale = ScaleMode::ZeroToOne,
const bool enableGoogleLogging = true);
......@@ -22,7 +20,7 @@ namespace op
void netInitializationOnThread();
void forwardPass(const Array<float>& inputNetData, const Point<int>& inputDataSize,
const std::vector<double>& scaleRatios = {1.f});
const std::vector<double>& scaleInputToNetInputs = {1.f});
const float* getHeatMapCpuConstPtr() const;
......
......@@ -24,6 +24,7 @@ namespace op
void initializationOnThread();
std::pair<int, std::string> renderPose(Array<float>& outputData, const Array<float>& poseKeypoints,
const float scaleInputToOutput,
const float scaleNetToOutput = -1.f);
private:
......
......@@ -13,7 +13,9 @@ namespace op
virtual void initializationOnThread(){};
virtual std::pair<int, std::string> renderPose(Array<float>& outputData, const Array<float>& poseKeypoints, const float scaleNetToOutput = -1.f) = 0;
virtual std::pair<int, std::string> renderPose(Array<float>& outputData, const Array<float>& poseKeypoints,
const float scaleInputToOutput,
const float scaleNetToOutput = -1.f) = 0;
protected:
const PoseModel mPoseModel;
......
......@@ -57,7 +57,9 @@ namespace op
const auto profilerKey = Profiler::timerInit(__LINE__, __FUNCTION__, __FILE__);
// Render people pose
for (auto& tDatum : *tDatums)
tDatum.elementRendered = spPoseRenderer->renderPose(tDatum.outputData, tDatum.poseKeypoints, (float)tDatum.scaleNetToOutput);
tDatum.elementRendered = spPoseRenderer->renderPose(tDatum.outputData, tDatum.poseKeypoints,
(float)tDatum.scaleInputToOutput,
(float)tDatum.scaleNetToOutput);
// Profiling speed
Profiler::timerEnd(profilerKey);
Profiler::printAveragedTimeMsOnIterationX(profilerKey, __LINE__, __FUNCTION__, __FILE__);
......
......@@ -576,28 +576,6 @@ namespace op
if (finalOutputSize.x == -1 || finalOutputSize.y == -1)
finalOutputSize = producerSize;
}
// Set poseNetInputSize if -1 used
Point<int> poseNetInputSize = wrapperStructPose.netInputSize;
if (poseNetInputSize.x == -1 && poseNetInputSize.y == -1)
error("Net input size cannot be -1x-1.", __LINE__, __FUNCTION__, __FILE__);
else if (poseNetInputSize.x == -1 || poseNetInputSize.y == -1)
{
if (producerSize.x <= 0 || producerSize.y <= 0)
error("Net resolution cannot be -1 for image_dir, only for video, webcam, and IP camera.",
__LINE__, __FUNCTION__, __FILE__);
else if (poseNetInputSize.x == -1)
poseNetInputSize.x = 16 * intRound(
poseNetInputSize.y * producerSize.x / (float) producerSize.y / 16.f
);
else // if (poseNetInputSize.y == -1)
poseNetInputSize.y = 16 * intRound(
poseNetInputSize.x * producerSize.y / (float) producerSize.x / 16.f
);
}
// Security checks
if ((poseNetInputSize.x > 0 && poseNetInputSize.x % 16 != 0)
|| (poseNetInputSize.y > 0 && poseNetInputSize.y % 16 != 0))
error("Net input resolution must be multiples of 16.", __LINE__, __FUNCTION__, __FILE__);
// Producer
if (wrapperStructInput.producerSharedPtr != nullptr)
......@@ -613,7 +591,8 @@ namespace op
// Get input scales and sizes
const auto scaleAndSizeExtractor = std::make_shared<ScaleAndSizeExtractor>(
poseNetInputSize, finalOutputSize, wrapperStructPose.scalesNumber, wrapperStructPose.scaleGap
wrapperStructPose.netInputSize, finalOutputSize, wrapperStructPose.scalesNumber,
wrapperStructPose.scaleGap
);
spWScaleAndSizeExtractor = std::make_shared<WScaleAndSizeExtractor<TDatumsPtr>>(scaleAndSizeExtractor);
......@@ -627,7 +606,6 @@ namespace op
}
// Pose estimators & renderers
const Point<int>& poseNetOutputSize = poseNetInputSize;
std::vector<std::shared_ptr<PoseExtractor>> poseExtractors;
std::vector<std::shared_ptr<PoseGpuRenderer>> poseGpuRenderers;
std::shared_ptr<PoseCpuRenderer> poseCpuRenderer;
......@@ -639,7 +617,6 @@ namespace op
// Pose estimators
for (auto gpuId = 0; gpuId < gpuNumber; gpuId++)
poseExtractors.emplace_back(std::make_shared<PoseExtractorCaffe>(
poseNetInputSize, poseNetOutputSize, finalOutputSize, wrapperStructPose.scalesNumber,
wrapperStructPose.poseModel, modelFolder, gpuId + gpuNumberStart,
wrapperStructPose.heatMapTypes, wrapperStructPose.heatMapScale,
wrapperStructPose.enableGoogleLogging
......@@ -846,14 +823,14 @@ namespace op
mPostProcessingWs.emplace_back(std::make_shared<WOpOutputToCvMat<TDatumsPtr>>(opOutputToCvMat));
}
// Re-scale pose if desired
// If desired scale is not the current output
if (wrapperStructPose.keypointScale != ScaleMode::OutputResolution
// and desired scale is not input when size(output) = size(input)
&& !(wrapperStructPose.keypointScale == ScaleMode::InputResolution &&
// If desired scale is not the current input
if (wrapperStructPose.keypointScale != ScaleMode::InputResolution
// and desired scale is not output when size(input) = size(output)
&& !(wrapperStructPose.keypointScale == ScaleMode::OutputResolution &&
(finalOutputSize == producerSize || finalOutputSize.x <= 0 || finalOutputSize.y <= 0))
// and desired scale is not net output when size(output) = size(net output)
// and desired scale is not net output when size(input) = size(net output)
&& !(wrapperStructPose.keypointScale == ScaleMode::NetOutputResolution
&& finalOutputSize == poseNetOutputSize))
&& producerSize == wrapperStructPose.netInputSize))
{
// Then we must rescale the keypoints
auto keypointScaler = std::make_shared<KeypointScaler>(wrapperStructPose.keypointScale);
......
......@@ -27,31 +27,33 @@ namespace op
{
try
{
if (mScaleMode != ScaleMode::OutputResolution)
if (mScaleMode != ScaleMode::InputResolution)
{
// InputResolution
if (mScaleMode == ScaleMode::InputResolution)
// OutputResolution
if (mScaleMode == ScaleMode::OutputResolution)
{
for (auto& arrayToScale : arrayToScalesToScale)
scaleKeypoints(arrayToScale, float(1./scaleInputToOutput));
scaleKeypoints(arrayToScale, float(scaleInputToOutput));
}
// NetOutputResolution
else if (mScaleMode == ScaleMode::NetOutputResolution)
{
for (auto& arrayToScale : arrayToScalesToScale)
scaleKeypoints(arrayToScale, float(1./scaleNetToOutput));
}
// [0,1]
else if (mScaleMode == ScaleMode::ZeroToOne)
{
const auto scale = float(1./scaleInputToOutput);
const auto scaleX = scale / ((float)producerSize.x - 1.f);
const auto scaleY = scale / ((float)producerSize.y - 1.f);
const auto scaleX = 1.f / ((float)producerSize.x - 1.f);
const auto scaleY = 1.f / ((float)producerSize.y - 1.f);
for (auto& arrayToScale : arrayToScalesToScale)
scaleKeypoints(arrayToScale, scaleX, scaleY);
}
// [-1,1]
else if (mScaleMode == ScaleMode::PlusMinusOne)
{
const auto scale = float(2./scaleInputToOutput);
const auto scaleX = (scale / ((float)producerSize.x - 1.f));
const auto scaleY = (scale / ((float)producerSize.y - 1.f));
const auto scaleX = (2.f / ((float)producerSize.x - 1.f));
const auto scaleY = (2.f / ((float)producerSize.y - 1.f));
const auto offset = -1.f;
for (auto& arrayToScale : arrayToScalesToScale)
scaleKeypoints(arrayToScale, scaleX, scaleY, offset, offset);
......
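The rewritten KeypointScaler above assumes keypoints arrive already expressed in the input resolution (the commit's new convention), which is why the `1./scaleInputToOutput` factors disappear. A minimal sketch of the per-mode scale factors; `keypoint_scale_factors` is a hypothetical name, not an OpenPose function:

```python
def keypoint_scale_factors(mode, scale_input_to_output, scale_net_to_output,
                           producer_w, producer_h):
    """Return (scale_x, scale_y, offset) applied to keypoints that are
    already in the input resolution, mirroring the updated scale()."""
    if mode == "OutputResolution":
        # Input -> output resolution
        return scale_input_to_output, scale_input_to_output, 0.0
    if mode == "NetOutputResolution":
        # Undo the net-to-output scaling
        s = 1.0 / scale_net_to_output
        return s, s, 0.0
    if mode == "ZeroToOne":
        # Normalize to [0, 1] over the producer size
        return 1.0 / (producer_w - 1.0), 1.0 / (producer_h - 1.0), 0.0
    if mode == "PlusMinusOne":
        # Normalize to [-1, 1] over the producer size
        return 2.0 / (producer_w - 1.0), 2.0 / (producer_h - 1.0), -1.0
    raise ValueError("Unknown or no-op scale mode: " + mode)

print(keypoint_scale_factors("OutputResolution", 0.5, 8.0, 1920, 1080))
```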
......@@ -19,26 +19,21 @@ namespace op
#ifdef USE_CAFFE
// Init with constructor
const int mGpuId;
const std::array<int, 4> mNetInputSize4D;
const unsigned long mNetInputMemory;
const std::string mCaffeProto;
const std::string mCaffeTrainedModel;
const std::string mLastBlobName;
std::vector<int> mNetInputSize4D;
// Init with thread
std::unique_ptr<caffe::Net<float>> upCaffeNet;
boost::shared_ptr<caffe::Blob<float>> spOutputBlob;
ImplNetCaffe(const std::array<int, 4>& netInputSize4D, const std::string& caffeProto,
const std::string& caffeTrainedModel, const int gpuId,
ImplNetCaffe(const std::string& caffeProto, const std::string& caffeTrainedModel, const int gpuId,
const bool enableGoogleLogging, const std::string& lastBlobName) :
mGpuId{gpuId},
// mNetInputSize4D{netInputSize4D}, // This line crashes on some devices with old G++
mNetInputSize4D{netInputSize4D[0], netInputSize4D[1], netInputSize4D[2], netInputSize4D[3]},
mNetInputMemory{sizeof(float) * std::accumulate(mNetInputSize4D.begin(), mNetInputSize4D.end(), 1,
std::multiplies<int>())},
mCaffeProto{caffeProto},
mCaffeTrainedModel{caffeTrainedModel},
mLastBlobName{lastBlobName}
mLastBlobName{lastBlobName},
mNetInputSize4D{0,0,0,0}
{
const std::string message{".\nPossible causes:\n\t1. Not downloading the OpenPose trained models."
"\n\t2. Not running OpenPose from the same directory where the `model`"
......@@ -62,11 +57,40 @@ namespace op
#endif
};
NetCaffe::NetCaffe(const std::array<int, 4>& netInputSize4D, const std::string& caffeProto,
const std::string& caffeTrainedModel, const int gpuId,
#ifdef USE_CAFFE
inline void reshapeNetCaffe(caffe::Net<float>* caffeNet, const std::vector<int>& dimensions)
{
try
{
caffeNet->blobs()[0]->Reshape(dimensions);
caffeNet->Reshape();
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
inline bool requiredReshapeNetCaffe(const std::vector<int>& dimensionsA, const std::vector<int>& dimensionsB)
{
try
{
return (dimensionsA[0] != dimensionsB[0] || dimensionsA[1] != dimensionsB[1]
|| dimensionsA[2] != dimensionsB[2] || dimensionsA[3] != dimensionsB[3]);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return false;
}
}
#endif
NetCaffe::NetCaffe(const std::string& caffeProto, const std::string& caffeTrainedModel, const int gpuId,
const bool enableGoogleLogging, const std::string& lastBlobName)
#ifdef USE_CAFFE
: upImpl{new ImplNetCaffe{netInputSize4D, caffeProto, caffeTrainedModel, gpuId, enableGoogleLogging,
: upImpl{new ImplNetCaffe{caffeProto, caffeTrainedModel, gpuId, enableGoogleLogging,
lastBlobName}}
#endif
{
......@@ -98,13 +122,14 @@ namespace op
{
#ifdef USE_CAFFE
// Initialize net
caffe::Caffe::set_mode(caffe::Caffe::GPU);
caffe::Caffe::SetDevice(upImpl->mGpuId);
#ifdef USE_CUDA
caffe::Caffe::set_mode(caffe::Caffe::GPU);
caffe::Caffe::SetDevice(upImpl->mGpuId);
#else
caffe::Caffe::set_mode(caffe::Caffe::CPU);
#endif
upImpl->upCaffeNet.reset(new caffe::Net<float>{upImpl->mCaffeProto, caffe::TEST});
upImpl->upCaffeNet->CopyTrainedLayersFrom(upImpl->mCaffeTrainedModel);
upImpl->upCaffeNet->blobs()[0]->Reshape({upImpl->mNetInputSize4D[0], upImpl->mNetInputSize4D[1],
upImpl->mNetInputSize4D[2], upImpl->mNetInputSize4D[3]});
upImpl->upCaffeNet->Reshape();
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Set spOutputBlob
upImpl->spOutputBlob = upImpl->upCaffeNet->blob_by_name(upImpl->mLastBlobName);
......@@ -120,58 +145,32 @@ namespace op
}
}
float* NetCaffe::getInputDataCpuPtr() const
{
try
{
#ifdef USE_CAFFE
return upImpl->upCaffeNet->blobs().at(0)->mutable_cpu_data();
#else
return nullptr;
#endif
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return nullptr;
}
}
float* NetCaffe::getInputDataGpuPtr() const
{
try
{
#ifdef USE_CAFFE
return upImpl->upCaffeNet->blobs().at(0)->mutable_gpu_data();
#else
return nullptr;
#endif
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return nullptr;
}
}
void NetCaffe::forwardPass(const float* const inputData) const
void NetCaffe::forwardPass(const Array<float>& inputData) const
{
try
{
#ifdef USE_CAFFE
// Copy frame data to GPU memory
if (inputData != nullptr)
// Security checks
if (inputData.empty())
error("The Array inputData cannot be empty.", __LINE__, __FUNCTION__, __FILE__);
if (inputData.getNumberDimensions() != 4 || inputData.getSize(1) != 3)
error("The Array inputData must have 4 dimensions: [batch size, 3 (RGB), height, width].",
__LINE__, __FUNCTION__, __FILE__);
// Reshape Caffe net if required
if (requiredReshapeNetCaffe(upImpl->mNetInputSize4D, inputData.getSize()))
{
#ifdef USE_CUDA
auto* gpuImagePtr = upImpl->upCaffeNet->blobs().at(0)->mutable_gpu_data();
cudaMemcpy(gpuImagePtr, inputData, upImpl->mNetInputMemory, cudaMemcpyHostToDevice);
#else
auto* cpuImagePtr = upImpl->upCaffeNet->blobs().at(0)->mutable_cpu_data();
std::copy(inputData,
inputData + upImpl->mNetInputMemory/sizeof(float),
cpuImagePtr);
#endif
upImpl->mNetInputSize4D = inputData.getSize();
reshapeNetCaffe(upImpl->upCaffeNet.get(), inputData.getSize());
}
// Copy frame data to GPU memory
#ifdef USE_CUDA
auto* gpuImagePtr = upImpl->upCaffeNet->blobs().at(0)->mutable_gpu_data();
cudaMemcpy(gpuImagePtr, inputData.getConstPtr(), inputData.getVolume() * sizeof(float),
cudaMemcpyHostToDevice);
#else
auto* cpuImagePtr = upImpl->upCaffeNet->blobs().at(0)->mutable_cpu_data();
std::copy(inputData.getConstPtr(), inputData.getConstPtr() + inputData.getVolume(), cpuImagePtr);
#endif
// Perform deep network forward pass
upImpl->upCaffeNet->ForwardFrom(0);
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
......
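The new `forwardPass` above validates the 4-D input and reshapes the net on demand. A Python sketch of that caching policy under the same assumptions (hypothetical `DynamicNet` class, not the Caffe API):

```python
class DynamicNet:
    """Caches the last NCHW input shape and counts 'reshapes', mirroring
    the reshape-on-demand policy of the rewritten forwardPass."""
    def __init__(self):
        self.cached_shape = (0, 0, 0, 0)
        self.reshape_count = 0

    def forward(self, shape):
        if len(shape) != 4 or shape[1] != 3:
            raise ValueError("Input must be [batch size, 3 (RGB), height, width].")
        if shape != self.cached_shape:
            self.cached_shape = shape
            self.reshape_count += 1  # stands in for the expensive net reshape
        # ... copy frame data and run the actual forward pass here ...

net = DynamicNet()
net.forward((1, 3, 368, 656))
net.forward((1, 3, 368, 656))  # same size: no reshape
net.forward((1, 3, 368, 496))  # new width: reshape
print(net.reshape_count)  # → 2
```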
......@@ -7,8 +7,8 @@ namespace op
const auto THREADS_PER_BLOCK_1D = 16u;
template <typename T>
__global__ void resizeKernel(T* targetPtr, const T* const sourcePtr, const int sourceWidth, const int sourceHeight, const int targetWidth,
const int targetHeight)
__global__ void resizeKernel(T* targetPtr, const T* const sourcePtr, const int sourceWidth, const int sourceHeight,
const int targetWidth, const int targetHeight)
{
const auto x = (blockIdx.x * blockDim.x) + threadIdx.x;
const auto y = (blockIdx.y * blockDim.y) + threadIdx.y;
......@@ -20,13 +20,15 @@ namespace op
const T xSource = (x + 0.5f) / scaleWidth - 0.5f;
const T ySource = (y + 0.5f) / scaleHeight - 0.5f;
targetPtr[y*targetWidth+x] = bicubicInterpolate(sourcePtr, xSource, ySource, sourceWidth, sourceHeight,
sourceWidth);
}
}
template <typename T>
__global__ void resizeKernelAndMerge(T* targetPtr, const T* const sourcePtr, const int sourceNumOffset,
const int num, const T* scaleInputToNetInputs, const int sourceWidth,
const int sourceHeight, const int targetWidth, const int targetHeight)
{
const auto x = (blockIdx.x * blockDim.x) + threadIdx.x;
const auto y = (blockIdx.y * blockDim.y) + threadIdx.y;
......@@ -70,7 +72,8 @@ namespace op
const auto targetWidth = targetSize[3];
const dim3 threadsPerBlock{THREADS_PER_BLOCK_1D, THREADS_PER_BLOCK_1D};
const dim3 numBlocks{getNumberCudaBlocks(targetWidth, threadsPerBlock.x),
getNumberCudaBlocks(targetHeight, threadsPerBlock.y)};
const auto sourceChannelOffset = sourceHeight * sourceWidth;
const auto targetChannelOffset = targetWidth * targetHeight;
......@@ -85,7 +88,8 @@ namespace op
const auto offset = offsetBase + c;
resizeKernel<<<numBlocks, threadsPerBlock>>>(targetPtr + offset * targetChannelOffset,
sourcePtr + offset * sourceChannelOffset,
sourceWidth, sourceHeight, targetWidth,
targetHeight);
}
}
}
......@@ -94,20 +98,25 @@ namespace op
{
// If scale_number > 1 --> scaleInputToNetInputs must be set
if (scaleInputToNetInputs.size() != num)
error("The scale ratios size must be equal to the number of scales.",
__LINE__, __FUNCTION__, __FILE__);
const auto maxScales = 10;
if (scaleInputToNetInputs.size() > maxScales)
error("The maximum number of scales is " + std::to_string(maxScales) + ".",
__LINE__, __FUNCTION__, __FILE__);
// Copy scaleInputToNetInputs
T* scaleInputToNetInputsPtr;
cudaMalloc((void**)&scaleInputToNetInputsPtr, maxScales * sizeof(T));
cudaMemcpy(scaleInputToNetInputsPtr, scaleInputToNetInputs.data(),
scaleInputToNetInputs.size() * sizeof(T), cudaMemcpyHostToDevice);
// Perform resize + merging
const auto sourceNumOffset = channels * sourceChannelOffset;
for (auto c = 0 ; c < channels ; c++)
resizeKernelAndMerge<<<numBlocks, threadsPerBlock>>>(targetPtr + c * targetChannelOffset,
sourcePtr + c * sourceChannelOffset,
sourceNumOffset, num,
scaleInputToNetInputsPtr, sourceWidth,
sourceHeight, targetWidth, targetHeight);
// Free memory
cudaFree(scaleInputToNetInputsPtr);
}
......@@ -120,8 +129,10 @@ namespace op
}
}
template void resizeAndMergeGpu(float* targetPtr, const float* const sourcePtr,
const std::array<int, 4>& targetSize, const std::array<int, 4>& sourceSize,
const std::vector<float>& scaleInputToNetInputs);
template void resizeAndMergeGpu(double* targetPtr, const double* const sourcePtr,
const std::array<int, 4>& targetSize, const std::array<int, 4>& sourceSize,
const std::vector<double>& scaleInputToNetInputs);
}
......@@ -9,7 +9,7 @@ namespace op
{
template <typename T>
ResizeAndMergeCaffe<T>::ResizeAndMergeCaffe() :
mScaleRatios{T(1)}
{
try
{
......@@ -158,7 +158,8 @@ namespace op
}
template <typename T>
void ResizeAndMergeCaffe<T>::Backward_gpu(const std::vector<caffe::Blob<T>*>& top,
const std::vector<bool>& propagate_down,
const std::vector<caffe::Blob<T>*>& bottom)
{
try
......
......@@ -37,8 +37,25 @@ namespace op
// Security checks
if (inputResolution.area() <= 0)
error("Wrong input element (empty cvInputData).", __LINE__, __FUNCTION__, __FILE__);
// Set poseNetInputSize
auto poseNetInputSize = mNetInputResolution;
if (poseNetInputSize.x <= 0 || poseNetInputSize.y <= 0)
{
// Security checks
if (poseNetInputSize.x <= 0 && poseNetInputSize.y <= 0)
error("Only one of the two dimensions of the net input resolution can be <= 0.",
__LINE__, __FUNCTION__, __FILE__);
if (poseNetInputSize.x <= 0)
poseNetInputSize.x = 16 * intRound(
poseNetInputSize.y * inputResolution.x / (float) inputResolution.y / 16.f
);
else // if (poseNetInputSize.y <= 0)
poseNetInputSize.y = 16 * intRound(
poseNetInputSize.x * inputResolution.y / (float) inputResolution.x / 16.f
);
}
// scaleInputToNetInputs & sizes - Reescale keeping aspect ratio
std::vector<double> scaleInputToNetInputs(mScaleNumber, 1.f);
std::vector<Point<int>> sizes(mScaleNumber);
for (auto i = 0; i < mScaleNumber; i++)
{
......@@ -47,13 +64,13 @@ namespace op
error("All scales must be in the range [0, 1], i.e. 0 <= 1-scale_number*scale_gap <= 1",
__LINE__, __FUNCTION__, __FILE__);
const auto targetWidth = fastTruncate(intRound(poseNetInputSize.x * currentScale) / 16 * 16, 1,
poseNetInputSize.x);
const auto targetHeight = fastTruncate(intRound(poseNetInputSize.y * currentScale) / 16 * 16, 1,
poseNetInputSize.y);
const Point<int> targetSize{targetWidth, targetHeight};
scaleInputToNetInputs[i] = resizeGetScaleFactor(inputResolution, targetSize);
sizes[i] = poseNetInputSize;
}
// scaleInputToOutput - Scale between input and desired output size
Point<int> outputResolution;
......@@ -71,7 +88,7 @@ namespace op
scaleInputToOutput = 1.;
}
// Return result
return std::make_tuple(scaleInputToNetInputs, sizes, scaleInputToOutput, outputResolution);
}
catch (const std::exception& e)
{
......
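The new `poseNetInputSize` block above implements the `-1x368` flag behaviour: a dimension given as `-1` is derived from the other dimension so the input aspect ratio is preserved, rounded to a multiple of 16. A minimal standalone sketch of that rounding (illustrative names, not the OpenPose API):

```cpp
#include <cassert>
#include <cmath>

struct Size { int x; int y; };

// Net dimensions must be multiples of 16, so round to the nearest one.
int roundToMultipleOf16(const float value)
{
    return 16 * static_cast<int>(std::round(value / 16.f));
}

// Resolve a net input dimension given as -1 (or 0) while keeping the
// aspect ratio of the input image, mirroring the poseNetInputSize logic above.
Size resolveNetInputSize(Size netInput, const Size input)
{
    if (netInput.x <= 0 && netInput.y <= 0)
        return netInput; // invalid: mirrors the error path above
    if (netInput.x <= 0)
        netInput.x = roundToMultipleOf16(netInput.y * input.x / (float)input.y);
    else if (netInput.y <= 0)
        netInput.y = roundToMultipleOf16(netInput.x * input.y / (float)input.x);
    return netInput;
}
```

For a 16:9 source this reproduces the documented default: `-1x368` resolves to `656x368` for both HD (1280x720) and full HD (1920x1080) inputs.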
......@@ -6,7 +6,7 @@ namespace op
FaceExtractor::FaceExtractor(const Point<int>& netInputSize, const Point<int>& netOutputSize,
const std::vector<HeatMapType>& heatMapTypes, const ScaleMode heatMapScale) :
mNetOutputSize{netOutputSize},
mFaceImageCrop{{1, 3, mNetOutputSize.y, mNetOutputSize.x}},
mHeatMapScaleMode{heatMapScale},
mHeatMapTypes{heatMapTypes}
{
......
......@@ -16,6 +16,7 @@ namespace op
struct FaceExtractorCaffe::ImplFaceExtractorCaffe
{
#if defined USE_CAFFE && defined USE_CUDA
bool netInitialized;
std::shared_ptr<NetCaffe> spNetCaffe;
std::shared_ptr<ResizeAndMergeCaffe<float>> spResizeAndMergeCaffe;
std::shared_ptr<MaximumCaffe<float>> spMaximumCaffe;
......@@ -24,11 +25,9 @@ namespace op
std::shared_ptr<caffe::Blob<float>> spHeatMapsBlob;
std::shared_ptr<caffe::Blob<float>> spPeaksBlob;
ImplFaceExtractorCaffe(const std::string& modelFolder, const int gpuId, const bool enableGoogleLogging) :
netInitialized{false},
spNetCaffe{std::make_shared<NetCaffe>(modelFolder + FACE_PROTOTXT, modelFolder + FACE_TRAINED_MODEL,
gpuId, enableGoogleLogging)},
spResizeAndMergeCaffe{std::make_shared<ResizeAndMergeCaffe<float>>()},
spMaximumCaffe{std::make_shared<MaximumCaffe<float>>()}
......@@ -69,6 +68,29 @@ namespace op
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
inline void reshapeFaceExtractorCaffe(std::shared_ptr<ResizeAndMergeCaffe<float>>& resizeAndMergeCaffe,
std::shared_ptr<MaximumCaffe<float>>& maximumCaffe,
boost::shared_ptr<caffe::Blob<float>>& caffeNetOutputBlob,
std::shared_ptr<caffe::Blob<float>>& heatMapsBlob,
std::shared_ptr<caffe::Blob<float>>& peaksBlob)
{
try
{
// HeatMaps extractor blob and layer
const bool mergeFirstDimension = true;
resizeAndMergeCaffe->Reshape({caffeNetOutputBlob.get()}, {heatMapsBlob.get()},
FACE_CCN_DECREASE_FACTOR, mergeFirstDimension);
// Pose extractor blob and layer
maximumCaffe->Reshape({heatMapsBlob.get()}, {peaksBlob.get()});
// Cuda check
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
#endif
FaceExtractorCaffe::FaceExtractorCaffe(const Point<int>& netInputSize, const Point<int>& netOutputSize,
......@@ -77,7 +99,7 @@ namespace op
const ScaleMode heatMapScale, const bool enableGoogleLogging) :
FaceExtractor{netInputSize, netOutputSize, heatMapTypes, heatMapScale}
#if defined USE_CAFFE && defined USE_CUDA
, upImpl{new ImplFaceExtractorCaffe{mNetOutputSize, modelFolder, gpuId, enableGoogleLogging}}
, upImpl{new ImplFaceExtractorCaffe{modelFolder, gpuId, enableGoogleLogging}}
#endif
{
try
......@@ -110,20 +132,13 @@ namespace op
#if defined USE_CAFFE && defined USE_CUDA
// Logging
log("Starting initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
// Initialize Caffe net
upImpl->spNetCaffe->initializationOnThread();
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Initialize blobs
upImpl->spCaffeNetOutputBlob = upImpl->spNetCaffe->getOutputBlob();
upImpl->spHeatMapsBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
upImpl->spPeaksBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Logging
log("Finished initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
......@@ -207,11 +222,16 @@ namespace op
// cv::imshow("faceImage" + std::to_string(person), faceImage);
// 1. Caffe deep network
upImpl->spNetCaffe->forwardPass(mFaceImageCrop);
// Reshape blobs
if (!upImpl->netInitialized)
{
upImpl->netInitialized = true;
reshapeFaceExtractorCaffe(upImpl->spResizeAndMergeCaffe, upImpl->spMaximumCaffe,
upImpl->spCaffeNetOutputBlob, upImpl->spHeatMapsBlob,
upImpl->spPeaksBlob);
}
// 2. Resize heat maps + merge different scales
#ifdef USE_CUDA
......
......@@ -75,9 +75,9 @@ namespace op
for (auto bodyPart = 0u ; bodyPart < indexesInCocoOrder.size() ; bodyPart++)
{
const auto finalIndex = 3*(person*numberBodyParts + indexesInCocoOrder.at(bodyPart));
mJsonOfstream.plainText(poseKeypoints[finalIndex] + 0.5f);
mJsonOfstream.comma();
mJsonOfstream.plainText(poseKeypoints[finalIndex+1] + 0.5f);
mJsonOfstream.comma();
mJsonOfstream.plainText((poseKeypoints[finalIndex+2] > 0.f ? 1 : 0));
if (bodyPart < indexesInCocoOrder.size() - 1u)
......
......@@ -8,7 +8,7 @@ namespace op
const std::vector<HeatMapType>& heatMapTypes, const ScaleMode heatMapScale) :
mMultiScaleNumberAndRange{std::make_pair(numberScales, rangeScales)},
mNetOutputSize{netOutputSize},
mHandImageCrop{{1, 3, mNetOutputSize.y, mNetOutputSize.x}},
mHeatMapScaleMode{heatMapScale},
mHeatMapTypes{heatMapTypes}
{
......
......@@ -17,6 +17,7 @@ namespace op
struct HandExtractorCaffe::ImplHandExtractorCaffe
{
#if defined USE_CAFFE && defined USE_CUDA
bool netInitialized;
std::shared_ptr<NetCaffe> spNetCaffe;
std::shared_ptr<ResizeAndMergeCaffe<float>> spResizeAndMergeCaffe;
std::shared_ptr<MaximumCaffe<float>> spMaximumCaffe;
......@@ -25,11 +26,10 @@ namespace op
std::shared_ptr<caffe::Blob<float>> spHeatMapsBlob;
std::shared_ptr<caffe::Blob<float>> spPeaksBlob;
ImplHandExtractorCaffe(const std::string& modelFolder, const int gpuId,
const bool enableGoogleLogging) :
netInitialized{false},
spNetCaffe{std::make_shared<NetCaffe>(modelFolder + HAND_PROTOTXT, modelFolder + HAND_TRAINED_MODEL,
gpuId, enableGoogleLogging)},
spResizeAndMergeCaffe{std::make_shared<ResizeAndMergeCaffe<float>>()},
spMaximumCaffe{std::make_shared<MaximumCaffe<float>>()}
......@@ -154,6 +154,29 @@ namespace op
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
// Note: despite the "Face" in its name, this helper reshapes the hand extractor blobs
// (the logic is shared with the face extractor version).
inline void reshapeFaceExtractorCaffe(std::shared_ptr<ResizeAndMergeCaffe<float>>& resizeAndMergeCaffe,
std::shared_ptr<MaximumCaffe<float>>& maximumCaffe,
boost::shared_ptr<caffe::Blob<float>>& caffeNetOutputBlob,
std::shared_ptr<caffe::Blob<float>>& heatMapsBlob,
std::shared_ptr<caffe::Blob<float>>& peaksBlob)
{
try
{
// HeatMaps extractor blob and layer
const bool mergeFirstDimension = true;
resizeAndMergeCaffe->Reshape({caffeNetOutputBlob.get()}, {heatMapsBlob.get()},
HAND_CCN_DECREASE_FACTOR, mergeFirstDimension);
// Pose extractor blob and layer
maximumCaffe->Reshape({heatMapsBlob.get()}, {peaksBlob.get()});
// Cuda check
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
#endif
HandExtractorCaffe::HandExtractorCaffe(const Point<int>& netInputSize, const Point<int>& netOutputSize,
......@@ -164,7 +187,7 @@ namespace op
const bool enableGoogleLogging) :
HandExtractor{netInputSize, netOutputSize, numberScales, rangeScales, heatMapTypes, heatMapScale}
#if defined USE_CAFFE && defined USE_CUDA
, upImpl{new ImplHandExtractorCaffe{mNetOutputSize, modelFolder, gpuId, enableGoogleLogging}}
, upImpl{new ImplHandExtractorCaffe{modelFolder, gpuId, enableGoogleLogging}}
#endif
{
try
......@@ -199,20 +222,13 @@ namespace op
#if defined USE_CAFFE && defined USE_CUDA
// Logging
log("Starting initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
// Initialize Caffe net
upImpl->spNetCaffe->initializationOnThread();
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Initialize blobs
upImpl->spCaffeNetOutputBlob = upImpl->spNetCaffe->getOutputBlob();
upImpl->spHeatMapsBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
upImpl->spPeaksBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Logging
log("Finished initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
......@@ -369,12 +385,17 @@ namespace op
try
{
#if defined USE_CAFFE && defined USE_CUDA
// 1. Caffe deep network
upImpl->spNetCaffe->forwardPass(mHandImageCrop);
// Reshape blobs
if (!upImpl->netInitialized)
{
upImpl->netInitialized = true;
reshapeFaceExtractorCaffe(upImpl->spResizeAndMergeCaffe, upImpl->spMaximumCaffe,
upImpl->spCaffeNetOutputBlob, upImpl->spHeatMapsBlob,
upImpl->spPeaksBlob);
}
// 2. Resize heat maps + merge different scales
#ifdef USE_CUDA
......
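Both extractors above replace constructor-time blob shaping with a `netInitialized` flag checked on the first forward pass, which is what lets the net resolution change at runtime. A minimal sketch of that lazy-reshape pattern (illustrative struct, not the OpenPose classes):

```cpp
#include <cassert>

// Sketch: dependent blobs are reshaped once, on the first forward pass,
// instead of in the constructor, so the input size can be decided at runtime.
struct LazyNet
{
    bool netInitialized = false;
    int reshapeCalls = 0;

    // Stands in for reshapeFaceExtractorCaffe / reshapePoseExtractorCaffe.
    void reshapeBlobs() { ++reshapeCalls; }

    void forwardPass()
    {
        if (!netInitialized) // pay the reshape cost only once
        {
            netInitialized = true;
            reshapeBlobs();
        }
        // ... the actual network forward would run here ...
    }
};
```

Subsequent calls skip the reshape entirely, so the per-frame cost is a single boolean check.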
......@@ -6,9 +6,10 @@
namespace op
{
template <typename T>
void connectBodyPartsCpu(Array<T>& poseKeypoints, const T* const heatMapPtr, const T* const peaksPtr,
const PoseModel poseModel, const Point<int>& heatMapSize, const int maxPeaks,
const int interMinAboveThreshold, const T interThreshold, const int minSubsetCnt,
const T minSubsetScore, const T scaleFactor)
{
try
{
......@@ -18,7 +19,8 @@ namespace op
const auto numberBodyParts = POSE_NUMBER_BODY_PARTS[(int)poseModel];
const auto numberBodyPartPairs = bodyPartPairs.size() / 2;
// Vector<int> = Each body part + body parts counter; double = subsetScore
std::vector<std::pair<std::vector<int>, double>> subset;
const auto subsetCounterIndex = numberBodyParts;
const auto subsetSize = numberBodyParts+1;
......@@ -59,9 +61,12 @@ namespace op
if (!num)
{
std::vector<int> rowVector(subsetSize, 0);
// Store the index
rowVector[ bodyPartB ] = bodyPartB*peaksOffset + i*3 + 2;
// Last number in each row is the parts number of that person
rowVector[subsetCounterIndex] = 1;
// Second last number in each row is the total score
const auto subsetScore = candidateB[i*3+2];
subset.emplace_back(std::make_pair(rowVector, subsetScore));
}
}
......@@ -71,14 +76,18 @@ namespace op
for (auto i = 1; i <= nB; i++)
{
std::vector<int> rowVector(subsetSize, 0);
// Store the index
rowVector[ bodyPartB ] = bodyPartB*peaksOffset + i*3 + 2;
// Last number in each row is the parts number of that person
rowVector[subsetCounterIndex] = 1;
// Second last number in each row is the total score
const auto subsetScore = candidateB[i*3+2];
subset.emplace_back(std::make_pair(rowVector, subsetScore));
}
}
else
error("Unknown model, cast to int = " + std::to_string((int)poseModel),
__LINE__, __FUNCTION__, __FILE__);
}
else // if (nA != 0 && nB == 0)
{
......@@ -101,9 +110,12 @@ namespace op
if (!num)
{
std::vector<int> rowVector(subsetSize, 0);
// Store the index
rowVector[ bodyPartA ] = bodyPartA*peaksOffset + i*3 + 2;
// Last number in each row is the parts number of that person
rowVector[subsetCounterIndex] = 1;
// Second last number in each row is the total score
const auto subsetScore = candidateA[i*3+2];
subset.emplace_back(std::make_pair(rowVector, subsetScore));
}
}
......@@ -113,14 +125,18 @@ namespace op
for (auto i = 1; i <= nA; i++)
{
std::vector<int> rowVector(subsetSize, 0);
// Store the index
rowVector[ bodyPartA ] = bodyPartA*peaksOffset + i*3 + 2;
// Last number in each row is the parts number of that person
rowVector[subsetCounterIndex] = 1;
// Second last number in each row is the total score
const auto subsetScore = candidateA[i*3+2];
subset.emplace_back(std::make_pair(rowVector, subsetScore));
}
}
else
error("Unknown model, cast to int = " + std::to_string((int)poseModel),
__LINE__, __FUNCTION__, __FILE__);
}
}
else // if (nA != 0 && nB != 0)
......@@ -216,9 +232,10 @@ namespace op
}
}
// Add ears connections (in case person is looking to opposite direction to camera)
else if (((poseModel == PoseModel::COCO_18
|| poseModel == PoseModel::BODY_18) && (pairIndex==17 || pairIndex==18))
|| (poseModel == PoseModel::BODY_19 && (pairIndex==18 || pairIndex==19))
|| (poseModel == PoseModel::BODY_23 && (pairIndex==22 || pairIndex==23)))
{
for (const auto& connectionKI : connectionK)
{
......@@ -291,7 +308,8 @@ namespace op
break;
}
else if (subsetCounter < 1)
error("Bad subsetCounter. Bug in this function if this happens.",
__LINE__, __FUNCTION__, __FILE__);
}
// Fill and return poseKeypoints
......@@ -327,10 +345,16 @@ namespace op
}
}
template void connectBodyPartsCpu(Array<float>& poseKeypoints, const float* const heatMapPtr,
const float* const peaksPtr, const PoseModel poseModel,
const Point<int>& heatMapSize, const int maxPeaks,
const int interMinAboveThreshold, const float interThreshold,
const int minSubsetCnt, const float minSubsetScore,
const float scaleFactor);
template void connectBodyPartsCpu(Array<double>& poseKeypoints, const double* const heatMapPtr,
const double* const peaksPtr, const PoseModel poseModel,
const Point<int>& heatMapSize, const int maxPeaks,
const int interMinAboveThreshold, const double interThreshold,
const int minSubsetCnt, const double minSubsetScore,
const double scaleFactor);
}
#include <openpose/pose/renderPose.hpp>
#include <openpose/utilities/keypoint.hpp>
#include <openpose/pose/poseCpuRenderer.hpp>
namespace op
......@@ -13,6 +14,7 @@ namespace op
std::pair<int, std::string> PoseCpuRenderer::renderPose(Array<float>& outputData,
const Array<float>& poseKeypoints,
const float scaleInputToOutput,
const float scaleNetToOutput)
{
try
......@@ -25,7 +27,14 @@ namespace op
std::string elementRenderedName;
// Draw poseKeypoints
if (elementRendered == 0)
{
// Rescale keypoints to output size
auto poseKeypointsRescaled = poseKeypoints.clone();
scaleKeypoints(poseKeypointsRescaled, scaleInputToOutput);
// Render keypoints
renderPoseKeypointsCpu(outputData, poseKeypointsRescaled, mPoseModel, mRenderThreshold,
mBlendOriginalFrame);
}
// Draw heat maps / PAFs
else
{
......
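The CPU renderer above now clones the keypoints and rescales them with `scaleInputToOutput` before drawing. A small sketch of the underlying operation (illustrative helper, not the OpenPose `scaleKeypoints` signature): keypoints are stored flat as `[x, y, score, x, y, score, ...]`, and only the coordinates are scaled.

```cpp
#include <cassert>
#include <vector>

// Sketch: scale the x/y coordinates of a flat keypoint array by a factor,
// leaving the confidence scores untouched (a copy is returned, as the
// renderer clones the keypoints before rescaling).
std::vector<float> scaleKeypointsCopy(std::vector<float> keypoints, const float scale)
{
    for (auto i = 0u; i + 2 < keypoints.size(); i += 3)
    {
        keypoints[i] *= scale;     // x
        keypoints[i + 1] *= scale; // y
        // keypoints[i + 2] is the confidence score and is left untouched
    }
    return keypoints;
}
```

Cloning before scaling matters: the unscaled keypoints remain available for later consumers (e.g. JSON output) that expect input-resolution coordinates.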
......@@ -43,12 +43,10 @@ namespace op
}
}
PoseExtractor::PoseExtractor(const PoseModel poseModel, const std::vector<HeatMapType>& heatMapTypes,
const ScaleMode heatMapScale) :
mPoseModel{poseModel},
mNetOutputSize{0, 0},
mHeatMapTypes{heatMapTypes},
mHeatMapScaleMode{heatMapScale}
{
......
......@@ -17,7 +17,7 @@ namespace op
struct PoseExtractorCaffe::ImplPoseExtractorCaffe
{
#ifdef USE_CAFFE
std::vector<int> mNetInputSize4D;
std::shared_ptr<NetCaffe> spNetCaffe;
std::shared_ptr<ResizeAndMergeCaffe<float>> spResizeAndMergeCaffe;
std::shared_ptr<NmsCaffe<float>> spNmsCaffe;
......@@ -28,13 +28,10 @@ namespace op
std::shared_ptr<caffe::Blob<float>> spPeaksBlob;
std::shared_ptr<caffe::Blob<float>> spPoseBlob;
ImplPoseExtractorCaffe(const PoseModel poseModel, const int gpuId,
const std::string& modelFolder, const bool enableGoogleLogging) :
mNetInputSize4D{0,0,0,0},
spNetCaffe{std::make_shared<NetCaffe>(modelFolder + POSE_PROTOTXT[(int)poseModel],
modelFolder + POSE_TRAINED_MODEL[(int)poseModel], gpuId,
enableGoogleLogging)},
spResizeAndMergeCaffe{std::make_shared<ResizeAndMergeCaffe<float>>()},
......@@ -45,32 +42,66 @@ namespace op
#endif
};
#ifdef USE_CAFFE
inline void reshapePoseExtractorCaffe(std::shared_ptr<ResizeAndMergeCaffe<float>>& resizeAndMergeCaffe,
std::shared_ptr<NmsCaffe<float>>& nmsCaffe,
std::shared_ptr<BodyPartConnectorCaffe<float>>& bodyPartConnectorCaffe,
boost::shared_ptr<caffe::Blob<float>>& caffeNetOutputBlob,
std::shared_ptr<caffe::Blob<float>>& heatMapsBlob,
std::shared_ptr<caffe::Blob<float>>& peaksBlob,
std::shared_ptr<caffe::Blob<float>>& poseBlob,
const float scaleInputToNetInput,
const PoseModel poseModel)
{
try
{
// HeatMaps extractor blob and layer
UNUSED(scaleInputToNetInput);
resizeAndMergeCaffe->Reshape({caffeNetOutputBlob.get()}, {heatMapsBlob.get()},
POSE_CCN_DECREASE_FACTOR[(int)poseModel]);
// Pose extractor blob and layer
nmsCaffe->Reshape({heatMapsBlob.get()}, {peaksBlob.get()}, POSE_MAX_PEAKS[(int)poseModel]);
// Body part connector blob and layer
bodyPartConnectorCaffe->Reshape({heatMapsBlob.get(), peaksBlob.get()}, {poseBlob.get()});
// Cuda check
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
inline bool requiredReshapePoseExtractorCaffe(const std::vector<int>& dimensionsA,
const std::vector<int>& dimensionsB)
{
try
{
return (dimensionsA[0] != dimensionsB[0] || dimensionsA[1] != dimensionsB[1]
|| dimensionsA[2] != dimensionsB[2] || dimensionsA[3] != dimensionsB[3]);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return false;
}
}
#endif
PoseExtractorCaffe::PoseExtractorCaffe(const PoseModel poseModel, const std::string& modelFolder,
const int gpuId, const std::vector<HeatMapType>& heatMapTypes,
const ScaleMode heatMapScale, const bool enableGoogleLogging) :
PoseExtractor{poseModel, heatMapTypes, heatMapScale}
#ifdef USE_CAFFE
, upImpl{new ImplPoseExtractorCaffe{poseModel, gpuId, modelFolder, enableGoogleLogging}}
#endif
{
try
{
#ifdef USE_CAFFE
// Layers parameters
upImpl->spBodyPartConnectorCaffe->setPoseModel(mPoseModel);
#else
UNUSED(poseModel);
UNUSED(modelFolder);
UNUSED(gpuId);
......@@ -97,24 +128,14 @@ namespace op
#ifdef USE_CAFFE
// Logging
log("Starting initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
// Initialize Caffe net
upImpl->spNetCaffe->initializationOnThread();
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Initialize blobs
upImpl->spCaffeNetOutputBlob = upImpl->spNetCaffe->getOutputBlob();
upImpl->spHeatMapsBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
upImpl->spPeaksBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
upImpl->spPoseBlob = {std::make_shared<caffe::Blob<float>>(1,1,1,1)};
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// Logging
log("Finished initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
......@@ -127,7 +148,7 @@ namespace op
}
void PoseExtractorCaffe::forwardPass(const Array<float>& inputNetData, const Point<int>& inputDataSize,
const std::vector<double>& scaleInputToNetInputs)
{
try
{
......@@ -137,10 +158,21 @@ namespace op
error("Empty inputNetData.", __LINE__, __FUNCTION__, __FILE__);
// 1. Caffe deep network
upImpl->spNetCaffe->forwardPass(inputNetData); // ~80ms
// Reshape blobs if required
if (requiredReshapePoseExtractorCaffe(upImpl->mNetInputSize4D, inputNetData.getSize()))
{
upImpl->mNetInputSize4D = inputNetData.getSize();
mNetOutputSize = Point<int>{upImpl->mNetInputSize4D[3], upImpl->mNetInputSize4D[2]};
reshapePoseExtractorCaffe(upImpl->spResizeAndMergeCaffe, upImpl->spNmsCaffe,
upImpl->spBodyPartConnectorCaffe, upImpl->spCaffeNetOutputBlob,
upImpl->spHeatMapsBlob, upImpl->spPeaksBlob, upImpl->spPoseBlob,
scaleInputToNetInputs[0], mPoseModel);
}
// 2. Resize heat maps + merge different scales
const std::vector<float> floatScaleRatios(scaleRatios.begin(), scaleRatios.end());
const std::vector<float> floatScaleRatios(scaleInputToNetInputs.begin(), scaleInputToNetInputs.end());
upImpl->spResizeAndMergeCaffe->setScaleRatios(floatScaleRatios);
#ifdef USE_CUDA
upImpl->spResizeAndMergeCaffe->Forward_gpu({upImpl->spCaffeNetOutputBlob.get()}, // ~5ms
......@@ -159,17 +191,15 @@ namespace op
error("NmsCaffe CPU version not implemented yet.", __LINE__, __FUNCTION__, __FILE__);
#endif
// Get scale net to output
// Get scale net to output (i.e. image input)
const auto scaleProducerToNetInput = resizeGetScaleFactor(inputDataSize, mNetOutputSize);
const Point<int> netSize{intRound(scaleProducerToNetInput*inputDataSize.x),
intRound(scaleProducerToNetInput*inputDataSize.y)};
if (mOutputSize.x > 0 && mOutputSize.y > 0)
mScaleNetToOutput = {(float)resizeGetScaleFactor(netSize, mOutputSize)};
else
mScaleNetToOutput = {(float)resizeGetScaleFactor(netSize, inputDataSize)};
mScaleNetToOutput = {(float)resizeGetScaleFactor(netSize, inputDataSize)};
// 4. Connecting body parts
upImpl->spBodyPartConnectorCaffe->setScaleNetToOutput(mScaleNetToOutput);
// upImpl->spBodyPartConnectorCaffe->setScaleNetToOutput(1);
upImpl->spBodyPartConnectorCaffe->setInterMinAboveThreshold(
(int)get(PoseProperty::ConnectInterMinAboveThreshold)
);
......@@ -178,14 +208,19 @@ namespace op
upImpl->spBodyPartConnectorCaffe->setMinSubsetScore((float)get(PoseProperty::ConnectMinSubsetScore));
// GPU version not implemented yet
upImpl->spBodyPartConnectorCaffe->Forward_cpu({upImpl->spHeatMapsBlob.get(), upImpl->spPeaksBlob.get()},
mPoseKeypoints);
// upImpl->spBodyPartConnectorCaffe->Forward_gpu({upImpl->spHeatMapsBlob.get(), upImpl->spPeaksBlob.get()},
// {upImpl->spPoseBlob.get()}, mPoseKeypoints);
// #ifdef USE_CUDA
// upImpl->spBodyPartConnectorCaffe->Forward_gpu({upImpl->spHeatMapsBlob.get(),
// upImpl->spPeaksBlob.get()},
// {upImpl->spPoseBlob.get()}, mPoseKeypoints);
// #else
upImpl->spBodyPartConnectorCaffe->Forward_cpu({upImpl->spHeatMapsBlob.get(),
upImpl->spPeaksBlob.get()},
mPoseKeypoints);
// #endif
#else
UNUSED(inputNetData);
UNUSED(inputDataSize);
UNUSED(scaleRatios);
UNUSED(scaleInputToNetInputs);
#endif
}
catch (const std::exception& e)
......
......@@ -5,6 +5,7 @@
#include <openpose/pose/poseParameters.hpp>
#include <openpose/pose/renderPose.hpp>
#include <openpose/utilities/cuda.hpp>
#include <openpose/utilities/keypoint.hpp>
#include <openpose/pose/poseGpuRenderer.hpp>
namespace op
......@@ -61,6 +62,7 @@ namespace op
std::pair<int, std::string> PoseGpuRenderer::renderPose(Array<float>& outputData,
const Array<float>& poseKeypoints,
const float scaleInputToOutput,
const float scaleNetToOutput)
{
try
......@@ -83,9 +85,14 @@ namespace op
// Draw poseKeypoints
if (elementRendered == 0)
{
// Rescale keypoints to output size
auto poseKeypointsRescaled = poseKeypoints.clone();
scaleKeypoints(poseKeypointsRescaled, scaleInputToOutput);
// Render keypoints
if (!poseKeypoints.empty())
cudaMemcpy(pGpuPose,
poseKeypoints.getConstPtr(), numberPeople * numberBodyParts * 3 * sizeof(float),
poseKeypointsRescaled.getConstPtr(),
numberPeople * numberBodyParts * 3 * sizeof(float),
cudaMemcpyHostToDevice);
renderPoseKeypointsGpu(*spGpuMemory, mPoseModel, numberPeople, frameSize, pGpuPose,
mRenderThreshold, mShowGooglyEyes, mBlendOriginalFrame,
......@@ -104,7 +111,7 @@ namespace op
elementRenderedName = mPartIndexToName.at(elementRendered-1);
renderPoseHeatMapGpu(*spGpuMemory, mPoseModel, frameSize,
spPoseExtractor->getHeatMapGpuConstPtr(),
heatMapSize, scaleNetToOutput, elementRendered,
heatMapSize, scaleNetToOutput * scaleInputToOutput, elementRendered,
(mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
// Draw PAFs (Part Affinity Fields)
......@@ -113,7 +120,7 @@ namespace op
elementRenderedName = "Heatmaps";
renderPoseHeatMapsGpu(*spGpuMemory, mPoseModel, frameSize,
spPoseExtractor->getHeatMapGpuConstPtr(),
heatMapSize, scaleNetToOutput,
heatMapSize, scaleNetToOutput * scaleInputToOutput,
(mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
// Draw PAFs (Part Affinity Fields)
......@@ -122,7 +129,7 @@ namespace op
elementRenderedName = "PAFs (Part Affinity Fields)";
renderPosePAFsGpu(*spGpuMemory, mPoseModel, frameSize,
spPoseExtractor->getHeatMapGpuConstPtr(),
heatMapSize, scaleNetToOutput,
heatMapSize, scaleNetToOutput * scaleInputToOutput,
(mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
// Draw affinity between 2 body parts
......@@ -134,7 +141,7 @@ namespace op
elementRenderedName = elementRenderedName.substr(0, elementRenderedName.find("("));
renderPosePAFGpu(*spGpuMemory, mPoseModel, frameSize,
spPoseExtractor->getHeatMapGpuConstPtr(),
heatMapSize, scaleNetToOutput, affinityPartMapped,
heatMapSize, scaleNetToOutput * scaleInputToOutput, affinityPartMapped,
(mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
}
......
......@@ -5,7 +5,8 @@
namespace op
{
const std::string errorMessage = "The Array<float> is not a RGB image. This function is only for array of dimension: [sizeA x sizeB x 3].";
const std::string errorMessage = "The Array<float> is not a RGB image. This function is only for array of"
" dimension: [sizeA x sizeB x 3].";
float getDistance(const Array<float>& keypoints, const int person, const int elementA, const int elementB)
{
......@@ -29,7 +30,8 @@ namespace op
{
// Security checks
if (keypointsA.getNumberDimensions() != keypointsB.getNumberDimensions())
error("keypointsA.getNumberDimensions() != keypointsB.getNumberDimensions().", __LINE__, __FUNCTION__, __FILE__);
error("keypointsA.getNumberDimensions() != keypointsB.getNumberDimensions().",
__LINE__, __FUNCTION__, __FILE__);
for (auto dimension = 1u ; dimension < keypointsA.getNumberDimensions() ; dimension++)
if (keypointsA.getSize(dimension) != keypointsB.getSize(dimension))
error("keypointsA.getSize() != keypointsB.getSize().", __LINE__, __FUNCTION__, __FILE__);
......@@ -96,7 +98,8 @@ namespace op
}
}
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY, const float offsetX, const float offsetY)
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY, const float offsetX,
const float offsetY)
{
try
{
......@@ -127,8 +130,9 @@ namespace op
}
}
void renderKeypointsCpu(Array<float>& frameArray, const Array<float>& keypoints, const std::vector<unsigned int>& pairs,
const std::vector<float> colors, const float thicknessCircleRatio, const float thicknessLineRatioWRTCircle,
void renderKeypointsCpu(Array<float>& frameArray, const Array<float>& keypoints,
const std::vector<unsigned int>& pairs, const std::vector<float> colors,
const float thicknessCircleRatio, const float thicknessLineRatioWRTCircle,
const float threshold)
{
try
......@@ -160,12 +164,15 @@ namespace op
// Keypoints
for (auto person = 0 ; person < keypoints.getSize(0) ; person++)
{
const auto personRectangle = getKeypointsRectangle(keypoints, person, numberKeypoints, thresholdRectangle);
const auto personRectangle = getKeypointsRectangle(keypoints, person, numberKeypoints,
thresholdRectangle);
if (personRectangle.area() > 0)
{
const auto ratioAreas = fastMin(1.f, fastMax(personRectangle.width/(float)width, personRectangle.height/(float)height));
const auto ratioAreas = fastMin(1.f, fastMax(personRectangle.width/(float)width,
personRectangle.height/(float)height));
// Size-dependent variables
const auto thicknessRatio = fastMax(intRound(std::sqrt(area)*thicknessCircleRatio * ratioAreas), 2);
const auto thicknessRatio = fastMax(intRound(std::sqrt(area)
* thicknessCircleRatio * ratioAreas), 2);
// Negative thickness in cv::circle means that a filled circle is to be drawn.
const auto thicknessCircle = (ratioAreas > 0.05 ? thicknessRatio : -1);
const auto thicknessLine = intRound(thicknessRatio * thicknessLineRatioWRTCircle);
......@@ -200,7 +207,8 @@ namespace op
const cv::Scalar color{colors[colorIndex % numberColors],
colors[(colorIndex+1) % numberColors],
colors[(colorIndex+2) % numberColors]};
const cv::Point center{intRound(keypoints[faceIndex]), intRound(keypoints[faceIndex+1])};
const cv::Point center{intRound(keypoints[faceIndex]),
intRound(keypoints[faceIndex+1])};
cv::circle(frameR, center, radius, color[0], thicknessCircle, lineType, shift);
cv::circle(frameG, center, radius, color[1], thicknessCircle, lineType, shift);
cv::circle(frameB, center, radius, color[2], thicknessCircle, lineType, shift);
......@@ -216,7 +224,8 @@ namespace op
}
}
Rectangle<float> getKeypointsRectangle(const Array<float>& keypoints, const int person, const int numberKeypoints, const float threshold)
Rectangle<float> getKeypointsRectangle(const Array<float>& keypoints, const int person, const int numberKeypoints,
const float threshold)
{
try
{
......
......@@ -147,8 +147,8 @@ namespace op
{
try
{
const auto ratioWidth = targetSize.x / (double)initialSize.x;
const auto ratioHeight = targetSize.y / (double)initialSize.y;
const auto ratioWidth = (targetSize.x - 1) / (double)(initialSize.x - 1);
const auto ratioHeight = (targetSize.y - 1) / (double)(initialSize.y - 1);
return fastMin(ratioWidth, ratioHeight);
}
catch (const std::exception& e)
......