Unverified commit 5608758a, authored by W Wang Feng, committed by GitHub

feat(detection): Add Faster-RCNN in detection (#28)

Parent 03625d2f
......@@ -47,7 +47,3 @@ jobs:
exit $pylint_ret
fi
echo "All lint steps passed!"
- name: Import hubconf check
run: |
python -c "import hubconf"
......@@ -75,9 +75,10 @@ export PYTHONPATH=/path/to/models:$PYTHONPATH
Object detection is likewise a common computer-vision task. We provide a classic object-detection model, [retinanet](./official/vision/detection); its results on the **COCO validation set** are as follows:
| Model | mAP<br>@5-95 |
| :---: | :---: |
| retinanet-res50-1x-800size | 36.0 |
| Model | mAP<br>@5-95 |
| :---: | :---: |
| retinanet-res50-1x-800size | 36.0 |
| faster-rcnn-fpn-res50-1x-800size | 37.3 |
### Image Segmentation
......
......@@ -28,10 +28,15 @@ from official.nlp.bert.model import (
wwm_cased_L_24_H_1024_A_16,
)
from official.vision.detection.faster_rcnn_fpn_res50_coco_1x_800size import (
faster_rcnn_fpn_res50_coco_1x_800size,
)
from official.vision.detection.retinanet_res50_coco_1x_800size import (
retinanet_res50_coco_1x_800size,
)
from official.vision.detection.models import RetinaNet
from official.vision.detection.models import FasterRCNN, RetinaNet
from official.vision.detection.tools.test import DetEvaluator
from official.vision.segmentation.deeplabv3plus import (
......
# MegEngine RetinaNet
# MegEngine Detection Models
## Introduction
This directory contains the classic [RetinaNet](https://arxiv.org/pdf/1708.02002) architecture implemented in MegEngine, together with complete training and testing code on the COCO2017 dataset.
This directory contains classic architectures implemented in MegEngine, including [RetinaNet](https://arxiv.org/pdf/1708.02002) and [Faster R-CNN with FPN](https://arxiv.org/pdf/1612.03144.pdf), together with complete training and testing code on the COCO2017 dataset.
The networks' results, measured on the COCO2017 validation set, are as follows:
The networks' results, measured on the COCO2017 dataset, are as follows:
| Model | mAP<br>@5-95 | batch<br>/gpu | gpu | speed<br>(8gpu) | speed<br>(1gpu) |
| --- | --- | --- | --- | --- | --- |
| retinanet-res50-coco-1x-800size | 36.0 | 2 | 2080ti | 2.27(it/s) | 3.7(it/s) |
| Model | mAP<br>@5-95 | batch<br>/gpu | gpu | training speed<br>(8gpu) | training speed<br>(1gpu) |
| --- | --- | --- | --- | --- | --- |
| retinanet-res50-coco-1x-800size | 36.0 | 2 | 2080Ti | 2.27(it/s) | 3.7(it/s) |
| faster-rcnn-fpn-res50-coco-1x-800size | 37.3 | 2 | 2080Ti | 1.9(it/s) | 3.1(it/s) |
* MegEngine v0.4.0
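Both models in the table are also registered in `hubconf.py` (see the hunk above), so they can be constructed straight from Python. A minimal sketch, assuming the standard `megengine.hub` interface; the `"megengine/models"` repo name and the `pretrained` flag passed through to the `@hub.pretrained` wrapper are assumptions:
```python
# Minimal sketch: load a pretrained detector through megengine.hub.
# Assumptions: "megengine/models" is this repo's hub name, and the
# @hub.pretrained wrapper accepts a pretrained=True flag.
from megengine import hub

model = hub.load(
    "megengine/models",
    "faster_rcnn_fpn_res50_coco_1x_800size",
    pretrained=True,
)
model.eval()
```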
## How to Use
After a model has been trained, you can test it on a single image with the following command:
Taking RetinaNet as an example, once the model has been trained you can test it on a single image with the following command:
```bash
python3 tools/inference.py -f retinanet_res50_coco_1x_800size.py \
......@@ -60,17 +61,33 @@ python3 tools/train.py -f retinanet_res50_coco_1x_800size.py \
`tools/train.py` provides flexible command-line options, including:
- `-f`, the network description file to train.
- `-f`, the network description file to train; it can describe RetinaNet, Faster R-CNN, etc. (see the sketch after this list).
- `-n`, the number of devices (GPUs) used for training; by default all available GPUs are used.
- `-w`, the path to the pretrained backbone weights.
- `--batch_size`, the `batch size` used during training; the default is 2, meaning each GPU trains on 2 images.
- `--dataset-dir`, the parent directory of the COCO2017 dataset; the default is `/data/datasets`.
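The file passed to `-f` follows the pattern visible later in this commit (`faster_rcnn_fpn_res50_coco_1x_800size.py`): an entry-point function plus `Net` and `Cfg` aliases that the tools pick up. A minimal sketch; the file name and function name are illustrative:
```python
# my_faster_rcnn_800size.py -- illustrative network description file,
# mirroring the Net/Cfg convention of the files added in this commit.
from official.vision.detection import models

def my_faster_rcnn_800size(batch_size=1, **kwargs):
    # build a Faster R-CNN with the default FPN-ResNet50 config
    return models.FasterRCNN(models.FasterRCNNConfig(), batch_size=batch_size, **kwargs)

Net = models.FasterRCNN
Cfg = models.FasterRCNNConfig
```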
By default the model is saved in the `log-of-retinanet_res50_1x_800size` directory.
By default the model is saved in the `log-of-<model name>` directory.
5. Compile any libraries that may be needed
The GPU NMS code lives in the GPU NMS folder under `tools`, and it must be compiled from inside the `tools` folder.
First, locate the directory containing the headers shipped with the MegEngine build, which can be found with the command:
```bash
python3 -c "import megengine as mge; print(mge.__file__)"
```
Copy the part of the output that precedes `__init__.py` (it ends with MegEngine) and assign it to the shell variable `MGE`; then run the following command to compile:
```bash
cd tools
nvcc -I $MGE/_internal/include -shared -o lib_nms.so -Xcompiler "-fno-strict-aliasing -fPIC" gpu_nms/nms.cu
```
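If you would rather not copy the path by hand, the include directory can be printed directly; a small sketch (the `_internal/include` layout is taken from the nvcc command above):
```python
# Print the MegEngine header directory to pass to nvcc's -I flag.
import os

import megengine as mge

print(os.path.join(os.path.dirname(mge.__file__), "_internal", "include"))
```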
## How to Test
During training, you can evaluate the model on the `COCO2017` validation set with the following command:
Once a trained model has been saved, you can evaluate it on the `COCO2017` validation set with `test.py` under `tools`:
```bash
python3 tools/test.py -f retinanet_res50_coco_1x_800size.py \
......@@ -89,5 +106,6 @@ python3 tools/test.py -f retinanet_res50_coco_1x_800size.py \
## References
- [Focal Loss for Dense Object Detection](https://arxiv.org/pdf/1708.02002) Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár. Proceedings of the IEEE international conference on computer vision. 2017: 2980-2988.
- [Microsoft COCO: Common Objects in Context](https://arxiv.org/pdf/1405.0312.pdf) Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick. European Conference on Computer Vision. Springer, Cham, 2014: 740-755.
- [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/pdf/1506.01497.pdf) S. Ren, K. He, R. Girshick, and J. Sun. In: Neural Information Processing Systems(NIPS)(2015).
- [Feature Pyramid Networks for Object Detection](https://arxiv.org/pdf/1612.03144.pdf) T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan and S. Belongie. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 936-944, doi: 10.1109/CVPR.2017.106.
- [Microsoft COCO: Common Objects in Context](https://arxiv.org/pdf/1405.0312.pdf) Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick. European Conference on Computer Vision. Springer, Cham, 2014: 740-755.
# -*- coding: utf-8 -*-
# MegEngine is Licensed under the Apache License, Version 2.0 (the "License")
#
# Copyright (c) 2014-2020 Megvii Inc. All rights reserved.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
from megengine import hub
from official.vision.detection import models
@hub.pretrained(
"https://data.megengine.org.cn/models/weights/"
"faster_rcnn_fpn_ec2e80b9_res50_1x_800size_37dot3.pkl"
)
def faster_rcnn_fpn_res50_coco_1x_800size(batch_size=1, **kwargs):
r"""
Faster-RCNN FPN trained on the COCO dataset.
`"Faster-RCNN" <https://arxiv.org/abs/1506.01497>`_
`"FPN" <https://arxiv.org/abs/1612.03144>`_
`"COCO" <https://arxiv.org/abs/1405.0312>`_
"""
return models.FasterRCNN(models.FasterRCNNConfig(), batch_size=batch_size, **kwargs)
Net = models.FasterRCNN
Cfg = models.FasterRCNNConfig
# -*- coding: utf-8 -*-
# MegEngine is Licensed under the Apache License, Version 2.0 (the "License")
#
# Copyright (c) 2014-2020 Megvii Inc. All rights reserved.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
from megengine import hub
from official.vision.detection import models
class CustomFasterRCNNFPNConfig(models.FasterRCNNConfig):
def __init__(self):
super().__init__()
self.resnet_norm = "SyncBN"
self.fpn_norm = "SyncBN"
@hub.pretrained(
"https://data.megengine.org.cn/models/weights/"
"faster_rcnn_fpn_cf5c020b_res50_1x_800size_syncbn_37dot6.pkl"
)
def faster_rcnn_fpn_res50_coco_1x_800size_syncbn(batch_size=1, **kwargs):
r"""
Faster-RCNN FPN trained on the COCO dataset.
`"Faster-RCNN" <https://arxiv.org/abs/1506.01497>`_
`"FPN" <https://arxiv.org/abs/1612.03144>`_
`"COCO" <https://arxiv.org/abs/1405.0312>`_
`"SyncBN" <https://arxiv.org/abs/1711.07240>`_
"""
return models.FasterRCNN(CustomFasterRCNNFPNConfig(), batch_size=batch_size, **kwargs)
Net = models.FasterRCNN
Cfg = CustomFasterRCNNFPNConfig
......@@ -22,7 +22,7 @@
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#
# This file has been modified by Megvii ("Megvii Modifications").
# All Megvii Modifications are Copyright (C) 2014-2019 Megvii Inc. All rights reserved.
# All Megvii Modifications are Copyright (C) 2014-2020 Megvii Inc. All rights reserved.
# ---------------------------------------------------------------------
from collections import namedtuple
......
......@@ -22,7 +22,7 @@
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#
# This file has been modified by Megvii ("Megvii Modifications").
# All Megvii Modifications are Copyright (C) 2014-2019 Megvii Inc. All rights reserved.
# All Megvii Modifications are Copyright (C) 2014-2020 Megvii Inc. All rights reserved.
# ---------------------------------------------------------------------
import megengine.module as M
import numpy as np
......
......@@ -10,7 +10,10 @@ from .anchor import *
from .box_utils import *
from .fpn import *
from .loss import *
from .pooler import *
from .rcnn import *
from .retinanet import *
from .rpn import *
_EXCLUDE = {}
__all__ = [k for k in globals().keys() if k not in _EXCLUDE and not k.startswith("_")]
# -*- coding: utf-8 -*-
# Copyright 2018-2019 Open-MMLab.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ---------------------------------------------------------------------
# MegEngine is Licensed under the Apache License, Version 2.0 (the "License")
#
# Copyright (c) 2014-2020 Megvii Inc. All rights reserved.
......@@ -20,10 +6,6 @@
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#
# This file has been modified by Megvii ("Megvii Modifications").
# All Megvii Modifications are Copyright (C) 2014-2019 Megvii Inc. All rights reserved.
# ---------------------------------------------------------------------
from abc import ABCMeta, abstractmethod
import megengine.functional as F
......@@ -132,8 +114,7 @@ class DefaultAnchorGenerator(BaseAnchorGenerator):
[flatten_shift_x, flatten_shift_y, flatten_shift_x, flatten_shift_y, ],
axis=1,
)
if self.offset > 0:
centers = centers + self.offset * stride
centers = centers + self.offset * self.base_size
return centers
def get_anchors_by_feature(self, featmap, stride):
......
......@@ -112,12 +112,12 @@ class BoxCoder(BoxCoderBase, metaclass=ABCMeta):
pred_y2 = pred_ctr_y + 0.5 * pred_height
pred_box = self._concat_new_axis(pred_x1, pred_y1, pred_x2, pred_y2, 2)
pred_box = pred_box.reshape(pred_box.shape[0], -1)
pred_box = pred_box.reshape(pred_box.shapeof(0), -1)
return pred_box
def get_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
def get_iou(boxes1: Tensor, boxes2: Tensor, return_ignore=False) -> Tensor:
"""
Given two lists of boxes of size N and M,
compute the IoU (intersection over union)
......@@ -132,10 +132,10 @@ def get_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
"""
box = boxes1
gt = boxes2
target_shape = (boxes1.shape[0], boxes2.shapeof()[0], 4)
target_shape = (boxes1.shapeof(0), boxes2.shapeof(0), 4)
b_box = F.add_axis(boxes1, 1).broadcast(*target_shape)
b_gt = F.add_axis(boxes2, 0).broadcast(*target_shape)
b_gt = F.add_axis(boxes2[:, :4], 0).broadcast(*target_shape)
iw = F.minimum(b_box[:, :, 2], b_gt[:, :, 2]) - F.maximum(
b_box[:, :, 0], b_gt[:, :, 0]
......@@ -148,7 +148,7 @@ def get_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
area_box = (box[:, 2] - box[:, 0]) * (box[:, 3] - box[:, 1])
area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
area_target_shape = (box.shape[0], gt.shapeof()[0])
area_target_shape = (box.shapeof(0), gt.shapeof(0))
b_area_box = F.add_axis(area_box, 1).broadcast(*area_target_shape)
b_area_gt = F.add_axis(area_gt, 0).broadcast(*area_target_shape)
......@@ -156,20 +156,34 @@ def get_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
union = b_area_box + b_area_gt - inter
overlaps = F.maximum(inter / union, 0)
if return_ignore:
overlaps_ignore = F.maximum(inter / b_area_box, 0)
gt_ignore_mask = F.add_axis((gt[:, 4] == -1), 0).broadcast(*area_target_shape)
overlaps *= (1 - gt_ignore_mask)
overlaps_ignore *= gt_ignore_mask
return overlaps, overlaps_ignore
return overlaps
def get_clipped_box(boxes, hw):
""" Clip the boxes into the image region."""
# x1 >=0
box_x1 = F.maximum(F.minimum(boxes[:, 0::4], hw[1]), 0)
box_x1 = F.clamp(boxes[:, 0::4], lower=0, upper=hw[1])
# y1 >=0
box_y1 = F.maximum(F.minimum(boxes[:, 1::4], hw[0]), 0)
box_y1 = F.clamp(boxes[:, 1::4], lower=0, upper=hw[0])
# x2 < im_info[1]
box_x2 = F.maximum(F.minimum(boxes[:, 2::4], hw[1]), 0)
box_x2 = F.clamp(boxes[:, 2::4], lower=0, upper=hw[1])
# y2 < im_info[0]
box_y2 = F.maximum(F.minimum(boxes[:, 3::4], hw[0]), 0)
box_y2 = F.clamp(boxes[:, 3::4], lower=0, upper=hw[0])
clip_box = F.concat([box_x1, box_y1, box_x2, box_y2], axis=1)
return clip_box
def filter_boxes(boxes, size=0):
width = boxes[:, 2] - boxes[:, 0]
height = boxes[:, 3] - boxes[:, 1]
keep = (width > size) * (height > size)
return keep
......@@ -22,7 +22,7 @@
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#
# This file has been modified by Megvii ("Megvii Modifications").
# All Megvii Modifications are Copyright (C) 2014-2019 Megvii Inc. All rights reserved.
# All Megvii Modifications are Copyright (C) 2014-2020 Megvii Inc. All rights reserved.
# ---------------------------------------------------------------------
import math
from typing import List
......@@ -47,6 +47,8 @@ class FPN(M.Module):
out_channels: int = 256,
norm: str = "",
top_block: M.Module = None,
strides=[8, 16, 32],
channels=[512, 1024, 2048],
):
"""
Args:
......@@ -63,8 +65,8 @@ class FPN(M.Module):
"""
super(FPN, self).__init__()
in_strides = [8, 16, 32]
in_channels = [512, 1024, 2048]
in_strides = strides
in_channels = channels
use_bias = norm == ""
self.lateral_convs = list()
......@@ -148,33 +150,50 @@ class FPN(M.Module):
top_block_in_feature = results[
self._out_features.index(self.top_block.in_feature)
]
results.extend(self.top_block(top_block_in_feature, results[-1]))
results.extend(self.top_block(top_block_in_feature))
return dict(zip(self._out_features, results))
def output_shape(self):
return {
name: layers.ShapeSpec(channels=self._out_feature_channels[name],)
name: layers.ShapeSpec(
channels=self._out_feature_channels[name],
stride=self._out_feature_strides[name],
)
for name in self._out_features
}
class FPNP6(M.Module):
"""
used in FPN, generate a downsampled P6 feature from P5.
"""
def __init__(self, in_feature="p5"):
super().__init__()
self.num_levels = 1
self.in_feature = in_feature
def forward(self, x):
return [F.max_pool2d(x, kernel_size=1, stride=2, padding=0)]
class LastLevelP6P7(M.Module):
"""
This module is used in RetinaNet to generate extra layers, P6 and P7 from
C5 feature.
"""
def __init__(self, in_channels: int, out_channels: int):
def __init__(self, in_channels: int, out_channels: int, in_feature="res5"):
super().__init__()
self.num_levels = 2
self.in_feature = "res5"
if in_feature == "p5":
assert in_channels == out_channels
self.in_feature = in_feature
self.p6 = M.Conv2d(in_channels, out_channels, 3, 2, 1)
self.p7 = M.Conv2d(out_channels, out_channels, 3, 2, 1)
self.use_P5 = in_channels == out_channels
def forward(self, c5, p5=None):
x = p5 if self.use_P5 else c5
def forward(self, x):
p6 = self.p6(x)
p7 = self.p7(F.relu(p6))
return [p6, p7]
......@@ -6,11 +6,9 @@
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
import megengine as mge
import megengine.functional as F
import numpy as np
from megengine.core import tensor, Tensor
from megengine.core import Tensor
def get_focal_loss(
......@@ -112,7 +110,8 @@ def get_smooth_l1_loss(
if norm_type == "fg":
loss = (losses.sum(axis=1) * fg_mask).sum() / F.maximum(fg_mask.sum(), 1)
elif norm_type == "all":
raise NotImplementedError
all_mask = (label != ignore_label)
loss = (losses.sum(axis=1) * fg_mask).sum() / F.maximum(all_mask.sum(), 1)
else:
raise NotImplementedError
......@@ -151,5 +150,19 @@ def get_smooth_l1_base(
abs_x = F.abs(x)
in_loss = 0.5 * x ** 2 * sigma2
out_loss = abs_x - 0.5 / sigma2
loss = F.where(abs_x < cond_point, in_loss, out_loss)
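# select the quadratic (in) or linear (out) branch by arithmetic masking instead of F.where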
in_mask = abs_x < cond_point
out_mask = 1 - in_mask
loss = in_loss * in_mask + out_loss * out_mask
return loss
def softmax_loss(score, label, ignore_label=-1):
max_score = F.zero_grad(score.max(axis=1, keepdims=True))
score -= max_score
log_prob = score - F.log(F.exp(score).sum(axis=1, keepdims=True))
mask = (label != ignore_label)
vlabel = label * mask
loss = -(F.indexing_one_hot(log_prob, vlabel.astype("int32"), 1) * mask).sum()
loss = loss / F.maximum(mask.sum(), 1)
return loss
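For reference, the masked cross-entropy computed by `softmax_loss` above can be mirrored in plain NumPy; a minimal sketch intended only to make the masking and normalization explicit, not a drop-in replacement:
```python
# NumPy sketch of softmax_loss: rows whose label equals ignore_label
# contribute zero loss, and the sum is normalized by the count of
# valid rows (at least 1).
import numpy as np

def softmax_loss_np(score, label, ignore_label=-1):
    score = score - score.max(axis=1, keepdims=True)  # numerical stability
    log_prob = score - np.log(np.exp(score).sum(axis=1, keepdims=True))
    mask = label != ignore_label
    vlabel = np.where(mask, label, 0)  # clamp ignored labels to a valid index
    picked = log_prob[np.arange(len(label)), vlabel] * mask
    return -picked.sum() / max(mask.sum(), 1)
```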
# -*- coding:utf-8 -*-
# MegEngine is Licensed under the Apache License, Version 2.0 (the "License")
#
# Copyright (c) 2014-2020 Megvii Inc. All rights reserved.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
import math
import numpy as np
import megengine as mge
import megengine.functional as F
def roi_pool(
rpn_fms, rois, stride, pool_shape, roi_type='roi_align',
):
assert len(stride) == len(rpn_fms)
canonical_level = 4
canonical_box_size = 224
min_level = math.log2(stride[0])
max_level = math.log2(stride[-1])
num_fms = len(rpn_fms)
box_area = (rois[:, 3] - rois[:, 1]) * (rois[:, 4] - rois[:, 2])
level_assignments = F.floor(
canonical_level + F.log(box_area.sqrt() / canonical_box_size) / np.log(2)
)
level_assignments = F.minimum(level_assignments, max_level)
level_assignments = F.maximum(level_assignments, min_level)
level_assignments = level_assignments - min_level
# avoid empty assignment
level_assignments = F.concat(
[level_assignments, mge.tensor(np.arange(num_fms, dtype=np.int32))],
)
rois = F.concat([rois, mge.zeros((num_fms, rois.shapeof(-1)))])
pool_list, inds_list = [], []
for i in range(num_fms):
mask = (level_assignments == i)
_, inds = F.cond_take(mask == 1, mask)
level_rois = rois.ai[inds]
if roi_type == 'roi_pool':
pool_fm = F.roi_pooling(
rpn_fms[i], level_rois, pool_shape,
mode='max', scale=1.0/stride[i]
)
elif roi_type == 'roi_align':
pool_fm = F.roi_align(
rpn_fms[i], level_rois, pool_shape, mode='average',
spatial_scale=1.0/stride[i], sample_points=2, aligned=True
)
pool_list.append(pool_fm)
inds_list.append(inds)
fm_order = F.concat(inds_list, axis=0)
fm_order = F.argsort(fm_order.reshape(1, -1))[1].reshape(-1)
pool_feature = F.concat(pool_list, axis=0)
pool_feature = pool_feature.ai[fm_order][:-num_fms]
return pool_feature
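The level assignment above is the FPN heuristic: an RoI of area A is routed to level floor(4 + log2(sqrt(A)/224)) and then clamped to the levels actually present. A quick pure-Python sanity check of that formula; the min/max levels here are illustrative:
```python
# Sanity check of the FPN level-assignment rule used in roi_pool.
import math

def assign_level(box_area, canonical_level=4, canonical_box_size=224,
                 min_level=2, max_level=5):  # illustrative level range
    level = canonical_level + math.log2(math.sqrt(box_area) / canonical_box_size)
    return min(max(math.floor(level), min_level), max_level)

print(assign_level(224 * 224))  # 4: canonical-size boxes stay on the canonical level
print(assign_level(448 * 448))  # 5: doubling each side moves one level up
print(assign_level(112 * 112))  # 3: halving each side moves one level down
```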
# -*- coding:utf-8 -*-
# MegEngine is Licensed under the Apache License, Version 2.0 (the "License")
#
# Copyright (c) 2014-2020 Megvii Inc. All rights reserved.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
import megengine as mge
import megengine.functional as F
import megengine.module as M
from official.vision.detection import layers
class RCNN(M.Module):
def __init__(self, cfg):
super().__init__()
self.cfg = cfg
self.box_coder = layers.BoxCoder(
reg_mean=cfg.bbox_normalize_means,
reg_std=cfg.bbox_normalize_stds
)
# roi head
self.in_features = cfg.rcnn_in_features
self.stride = cfg.rcnn_stride
self.pooling_method = cfg.pooling_method
self.pooling_size = cfg.pooling_size
self.fc1 = M.Linear(256 * self.pooling_size[0] * self.pooling_size[1], 1024)
self.fc2 = M.Linear(1024, 1024)
for l in [self.fc1, self.fc2]:
M.init.normal_(l.weight, std=0.01)
M.init.fill_(l.bias, 0)
# box predictor
self.pred_cls = M.Linear(1024, cfg.num_classes + 1)
self.pred_delta = M.Linear(1024, (cfg.num_classes + 1) * 4)
M.init.normal_(self.pred_cls.weight, std=0.01)
M.init.normal_(self.pred_delta.weight, std=0.001)
for l in [self.pred_cls, self.pred_delta]:
M.init.fill_(l.bias, 0)
def forward(self, fpn_fms, rcnn_rois, im_info=None, gt_boxes=None):
rcnn_rois, labels, bbox_targets = self.get_ground_truth(rcnn_rois, im_info, gt_boxes)
fpn_fms = [fpn_fms[x] for x in self.in_features]
pool_features = layers.roi_pool(
fpn_fms, rcnn_rois, self.stride,
self.pooling_size, self.pooling_method,
)
flatten_feature = F.flatten(pool_features, start_axis=1)
roi_feature = F.relu(self.fc1(flatten_feature))
roi_feature = F.relu(self.fc2(roi_feature))
pred_cls = self.pred_cls(roi_feature)
pred_delta = self.pred_delta(roi_feature)
if self.training:
# loss for classification
loss_rcnn_cls = layers.softmax_loss(pred_cls, labels)
# loss for regression
pred_delta = pred_delta.reshape(-1, self.cfg.num_classes + 1, 4)
vlabels = labels.reshape(-1, 1).broadcast((labels.shapeof(0), 4))
pred_delta = F.indexing_one_hot(pred_delta, vlabels, axis=1)
loss_rcnn_loc = layers.get_smooth_l1_loss(
pred_delta, bbox_targets, labels,
self.cfg.rcnn_smooth_l1_beta,
norm_type="all",
)
loss_dict = {
'loss_rcnn_cls': loss_rcnn_cls,
'loss_rcnn_loc': loss_rcnn_loc
}
return loss_dict
else:
# slice 1 for removing background
pred_scores = F.softmax(pred_cls, axis=1)[:, 1:]
pred_delta = pred_delta[:, 4:].reshape(-1, 4)