Unverified commit da3bf2b4, authored by Jianfeng Wang, committed by GitHub

feat(detection): several enhancement and reformat (#43)

* feat(detection): several enhancement and reformat

* feat(detection): use multi-scale training

* chore(detection): update weights and results
Parent 0023ce55
__pycache__/
*log*/
*.so
......@@ -73,12 +73,12 @@ export PYTHONPATH=/path/to/models:$PYTHONPATH
### Object Detection
Object detection is another common computer-vision task. We provide a classic detection model, [retinanet](./official/vision/detection); its results on the **COCO validation set** are as follows:
Object detection is another common computer-vision task. We provide two classic detection models, [RetinaNet](./official/vision/detection/model/retinanet) and [Faster R-CNN](./official/vision/detection/model/faster_rcnn); their results on the **COCO validation set** are as follows:
| Model | mAP<br>@5-95 |
| :---: | :---: |
| retinanet-res50-1x-800size | 36.0 |
| faster-rcnn-fpn-res50-1x-800size | 37.3 |
| Model | mAP<br>@5-95 |
| :---: | :---: |
| retinanet-res50-1x-800size | 36.4 |
| faster-rcnn-res50-1x-800size | 38.8 |
### Image Segmentation
......
from official.nlp.bert.model import (
cased_L_12_H_768_A_12,
cased_L_24_H_1024_A_16,
chinese_L_12_H_768_A_12,
multi_cased_L_12_H_768_A_12,
uncased_L_12_H_768_A_12,
uncased_L_24_H_1024_A_16,
wwm_cased_L_24_H_1024_A_16,
wwm_uncased_L_24_H_1024_A_16,
)
from official.quantization.models import quantized_resnet18
from official.vision.classification.resnet.model import (
BasicBlock,
Bottleneck,
......@@ -16,54 +27,22 @@ from official.vision.classification.shufflenet.model import (
shufflenet_v2_x1_5,
shufflenet_v2_x2_0,
)
from official.nlp.bert.model import (
uncased_L_12_H_768_A_12,
cased_L_12_H_768_A_12,
uncased_L_24_H_1024_A_16,
cased_L_24_H_1024_A_16,
chinese_L_12_H_768_A_12,
multi_cased_L_12_H_768_A_12,
wwm_uncased_L_24_H_1024_A_16,
wwm_cased_L_24_H_1024_A_16,
)
from official.vision.detection.faster_rcnn_fpn_res50_coco_1x_800size import (
faster_rcnn_fpn_res50_coco_1x_800size,
)
from official.vision.detection.faster_rcnn_fpn_res50_coco_1x_800size_syncbn import (
faster_rcnn_fpn_res50_coco_1x_800size_syncbn,
)
from official.vision.detection.retinanet_res50_coco_1x_800size import (
from official.vision.detection.configs import (
faster_rcnn_res50_coco_1x_800size,
faster_rcnn_res50_coco_1x_800size_syncbn,
retinanet_res50_coco_1x_800size,
)
from official.vision.detection.retinanet_res50_coco_1x_800size_syncbn import (
retinanet_res50_coco_1x_800size_syncbn,
)
# TODO: need pretrained weights
# from official.vision.detection.retinanet_res50_objects365_1x_800size import (
# retinanet_res50_objects365_1x_800size,
# )
# from official.vision.detection.retinanet_res50_voc_1x_800size import (
# retinanet_res50_voc_1x_800size,
# )
from official.vision.detection.models import FasterRCNN, RetinaNet
from official.vision.detection.tools.test import DetEvaluator
from official.vision.segmentation.deeplabv3plus import (
deeplabv3plus_res101,
DeepLabV3Plus,
)
from official.vision.detection.tools.utils import DetEvaluator
from official.vision.keypoints.inference import KeypointEvaluator
from official.vision.keypoints.models import (
mspn_4stage,
simplebaseline_res50,
simplebaseline_res101,
simplebaseline_res152,
mspn_4stage
)
from official.vision.keypoints.inference import KeypointEvaluator
from official.quantization.models import quantized_resnet18
from official.vision.segmentation.deeplabv3plus import (
DeepLabV3Plus,
deeplabv3plus_res101,
)
......@@ -2,14 +2,16 @@
## Introduction
This directory contains classic network architectures implemented with MegEngine, including [RetinaNet](https://arxiv.org/pdf/1708.02002) and [Faster R-CNN with FPN](https://arxiv.org/pdf/1612.03144.pdf), together with complete training and testing code on the COCO2017 dataset.
This directory contains classic network architectures implemented with MegEngine, including [RetinaNet](https://arxiv.org/pdf/1708.02002) and [Faster R-CNN](https://arxiv.org/pdf/1612.03144.pdf), together with complete training and testing code on the COCO2017 dataset.
The performance of these networks on the COCO2017 dataset is as follows:
| Model | mAP<br>@5-95 | batch<br>/gpu | gpu | training speed<br>(8gpu) | training speed<br>(1gpu) |
| --- | --- | --- | --- | --- | --- |
| retinanet-res50-coco-1x-800size | 36.0 | 2 | 2080Ti | 2.27(it/s) | 3.7(it/s) |
| faster-rcnn-fpn-res50-coco-1x-800size | 37.3 | 2 | 2080Ti | 1.9(it/s) | 3.1(it/s) |
| Model | mAP<br>@5-95 | batch<br>/gpu | gpu | training speed<br>(8gpu) |
| --- | :---: | :---: | :---: | :---: |
| retinanet-res50-coco-1x-800size | 36.4 | 2 | 2080Ti | 3.1(it/s) |
| retinanet-res50-coco-1x-800size-syncbn | 37.1 | 2 | 2080Ti | 1.7(it/s) |
| faster-rcnn-res50-coco-1x-800size | 38.8 | 2 | 2080Ti | 3.3(it/s) |
| faster-rcnn-res50-coco-1x-800size-syncbn | 39.3 | 2 | 2080Ti | 1.8(it/s) |
* MegEngine v0.4.0
......@@ -18,16 +20,16 @@
Taking RetinaNet as an example, once the model is trained you can test a single image with the following command:
```bash
python3 tools/inference.py -f retinanet_res50_coco_1x_800size.py \
python3 tools/inference.py -f configs/retinanet_res50_coco_1x_800size.py \
-w /path/to/retinanet_weights.pkl
-i ../../assets/cat.jpg \
-m /path/to/retinanet_weights.pkl
```
The command-line options of `tools/inference.py` are:
- `-f`, the network description file to test.
- `-m`, the trained weights matching the network file; pretrained detector weights can be downloaded from the table at the top.
- `-i`, the sample image to test.
- `-w`, the trained weights matching the network file; pretrained detector weights can be downloaded from the table at the top.
The result of testing with the default image and model is shown below:
......@@ -53,10 +55,7 @@ python3 tools/inference.py -f retinanet_res50_coco_1x_800size.py \
4. Start training:
```bash
python3 tools/train.py -f retinanet_res50_coco_1x_800size.py \
-n 8 \
--batch_size 2 \
-w /path/to/pretrain.pkl
python3 tools/train.py -f configs/retinanet_res50_coco_1x_800size.py -n 8
```
`tools/train.py` provides flexible command-line options, including:
......@@ -64,8 +63,8 @@ python3 tools/train.py -f retinanet_res50_coco_1x_800size.py \
- `-f`, the network description file to train, e.g. RetinaNet or Faster R-CNN.
- `-n`, the number of devices (GPUs) used for training; defaults to all available GPUs.
- `-w`, path to the pretrained backbone weights.
- `--batch_size`, the `batch size` used in training; default 2, i.e. 2 images per GPU.
- `--dataset-dir`, the parent directory of the COCO2017 dataset; default `/data/datasets`.
- `-b`, the `batch size` used in training; default 2, i.e. 2 images per GPU.
- `-d`, the parent directory of the COCO2017 dataset; default `/data/datasets`.
By default, the model is saved under the `log-of-<model name>` directory.
......@@ -90,18 +89,16 @@ nvcc -I $MGE/_internal/include -shared -o lib_nms.so -Xcompiler "-fno-strict-ali
After training, you can evaluate the saved model on the `COCO2017` validation set using `tools/test.py`:
```bash
python3 tools/test.py -f retinanet_res50_coco_1x_800size.py \
-n 8 \
--model /path/to/retinanet_weights.pt \
--dataset_dir /data/datasets
python3 tools/test.py -f configs/retinanet_res50_coco_1x_800size.py -n 8 \
-w /path/to/retinanet_weights.pt
```
The command-line options of `tools/test.py` are:
- `-f`, the network description file to test.
- `-n`, the number of devices (GPUs) used for testing; default 1.
- `--model`, the model to test; pretrained detector weights can be downloaded from the table at the top, or you can use your own trained weights.
- `--dataset_dir`, the parent directory of the COCO2017 dataset; default `/data/datasets`.
- `-w`, the model to test; pretrained detector weights can be downloaded from the table at the top, or you can use your own trained weights.
- `-d`, the parent directory of the COCO2017 dataset; default `/data/datasets`.
## References
......
from .faster_rcnn_res50_coco_1x_800size import faster_rcnn_res50_coco_1x_800size
from .faster_rcnn_res50_coco_1x_800size_syncbn import faster_rcnn_res50_coco_1x_800size_syncbn
from .retinanet_res50_coco_1x_800size import retinanet_res50_coco_1x_800size
from .retinanet_res50_coco_1x_800size_syncbn import retinanet_res50_coco_1x_800size_syncbn
_EXCLUDE = {}
__all__ = [k for k in globals().keys() if k not in _EXCLUDE and not k.startswith("_")]
......@@ -13,9 +13,9 @@ from official.vision.detection import models
@hub.pretrained(
"https://data.megengine.org.cn/models/weights/"
"faster_rcnn_fpn_ec2e80b9_res50_1x_800size_37dot3.pkl"
"faster_rcnn_res50_coco_1x_800size_38dot8_5e195d80.pkl"
)
def faster_rcnn_fpn_res50_coco_1x_800size(batch_size=1, **kwargs):
def faster_rcnn_res50_coco_1x_800size(batch_size=1, **kwargs):
r"""
Faster-RCNN FPN trained on the COCO dataset.
`"Faster-RCNN" <https://arxiv.org/abs/1506.01497>`_
......
......@@ -11,7 +11,7 @@ from megengine import hub
from official.vision.detection import models
class CustomFasterRCNNFPNConfig(models.FasterRCNNConfig):
class CustomFasterRCNNConfig(models.FasterRCNNConfig):
def __init__(self):
super().__init__()
......@@ -22,9 +22,9 @@ class CustomFasterRCNNFPNConfig(models.FasterRCNNConfig):
@hub.pretrained(
"https://data.megengine.org.cn/models/weights/"
"faster_rcnn_fpn_cf5c020b_res50_1x_800size_syncbn_37dot6.pkl"
"faster_rcnn_res50_coco_1x_800size_syncbn_39dot3_09b99bce.pkl"
)
def faster_rcnn_fpn_res50_coco_1x_800size_syncbn(batch_size=1, **kwargs):
def faster_rcnn_res50_coco_1x_800size_syncbn(batch_size=1, **kwargs):
r"""
Faster-RCNN FPN trained on the COCO dataset.
`"Faster-RCNN" <https://arxiv.org/abs/1506.01497>`_
......@@ -32,8 +32,8 @@ def faster_rcnn_fpn_res50_coco_1x_800size_syncbn(batch_size=1, **kwargs):
`"COCO" <https://arxiv.org/abs/1405.0312>`_
`"SyncBN" <https://arxiv.org/abs/1711.07240>`_
"""
return models.FasterRCNN(CustomFasterRCNNFPNConfig(), batch_size=batch_size, **kwargs)
return models.FasterRCNN(CustomFasterRCNNConfig(), batch_size=batch_size, **kwargs)
Net = models.FasterRCNN
Cfg = CustomFasterRCNNFPNConfig
Cfg = CustomFasterRCNNConfig
......@@ -13,7 +13,7 @@ from official.vision.detection import models
@hub.pretrained(
"https://data.megengine.org.cn/models/weights/"
"retinanet_d3f58dce_res50_1x_800size_36dot0.pkl"
"retinanet_res50_coco_1x_800size_36dot4_b782a619.pkl"
)
def retinanet_res50_coco_1x_800size(batch_size=1, **kwargs):
r"""
......
......@@ -6,6 +6,7 @@
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
from megengine import hub
from official.vision.detection import models
......@@ -19,6 +20,10 @@ class CustomRetinaNetConfig(models.RetinaNetConfig):
self.backbone_freeze_at = 0
@hub.pretrained(
"https://data.megengine.org.cn/models/weights/"
"retinanet_res50_coco_1x_800size_syncbn_37dot1_35cedcdf.pkl"
)
def retinanet_res50_coco_1x_800size_syncbn(batch_size=1, **kwargs):
r"""
RetinaNet with SyncBN trained on the COCO dataset.
......
......@@ -63,14 +63,13 @@ def get_norm(norm, out_channels=None):
Returns:
M.Module or None: the normalization layer
"""
if isinstance(norm, str):
if len(norm) == 0:
return None
norm = {
"BN": M.BatchNorm2d,
"SyncBN": M.SyncBatchNorm,
"FrozenBN": FrozenBatchNorm2d
}[norm]
if norm is None:
return None
norm = {
"BN": M.BatchNorm2d,
"SyncBN": M.SyncBatchNorm,
"FrozenBN": FrozenBatchNorm2d,
}[norm]
if out_channels is not None:
return norm(out_channels)
else:
......
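The refactored `get_norm` dispatches on the norm name via a dict and treats `None` (rather than the old empty string) as "no normalization". A minimal pure-Python sketch of the same dispatch pattern, with stand-in classes in place of the real MegEngine modules (`M.BatchNorm2d`, `M.SyncBatchNorm`, `FrozenBatchNorm2d`):

```python
# Stand-in normalization "layers"; the real code uses MegEngine modules.
class BN:
    def __init__(self, out_channels):
        self.out_channels = out_channels

class SyncBN(BN):
    pass

class FrozenBN(BN):
    pass

def get_norm(norm, out_channels=None):
    """Map a name to a norm class; instantiate it when out_channels is given.
    None means "no normalization layer"."""
    if norm is None:
        return None
    norm = {"BN": BN, "SyncBN": SyncBN, "FrozenBN": FrozenBN}[norm]
    return norm(out_channels) if out_channels is not None else norm

layer = get_norm("SyncBN", out_channels=256)
```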
......@@ -42,14 +42,14 @@ class DefaultAnchorGenerator(BaseAnchorGenerator):
def __init__(
self,
base_size=8,
anchor_scales: np.ndarray = np.array([2, 3, 4]),
anchor_ratios: np.ndarray = np.array([0.5, 1, 2]),
anchor_scales: list = [2, 3, 4],
anchor_ratios: list = [0.5, 1, 2],
offset: float = 0,
):
super().__init__()
self.base_size = base_size
self.anchor_scales = anchor_scales
self.anchor_ratios = anchor_ratios
self.anchor_scales = np.array(anchor_scales)
self.anchor_ratios = np.array(anchor_ratios)
self.offset = offset
def _whctrs(self, anchor):
......@@ -111,7 +111,7 @@ class DefaultAnchorGenerator(BaseAnchorGenerator):
flatten_shift_y = F.add_axis(broad_shift_y.reshape(-1), 1)
centers = F.concat(
[flatten_shift_x, flatten_shift_y, flatten_shift_x, flatten_shift_y, ],
[flatten_shift_x, flatten_shift_y, flatten_shift_x, flatten_shift_y,],
axis=1,
)
centers = centers + self.offset * self.base_size
......
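`DefaultAnchorGenerator` now takes plain lists for `anchor_scales`/`anchor_ratios` and converts them to arrays internally. A sketch of the standard RPN-style base-anchor construction these parameters drive (the exact arithmetic in `_whctrs` is truncated in this hunk; the common recipe below keeps each anchor's area at `(base_size * scale)**2` while the ratio sets height/width):

```python
import numpy as np

def base_anchors(base_size=8, scales=(2, 3, 4), ratios=(0.5, 1, 2)):
    """Generate len(ratios) * len(scales) anchors centered at the origin,
    as (x1, y1, x2, y2); area is (base_size * scale)**2, h/w equals ratio."""
    anchors = []
    for r in ratios:
        for s in scales:
            w = base_size * s * np.sqrt(1.0 / r)
            h = base_size * s * np.sqrt(r)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

A = base_anchors()
```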
......@@ -8,6 +8,8 @@
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
from abc import ABCMeta, abstractmethod
import numpy as np
import megengine.functional as F
from megengine.core import Tensor
......@@ -29,15 +31,19 @@ class BoxCoderBase(metaclass=ABCMeta):
class BoxCoder(BoxCoderBase, metaclass=ABCMeta):
def __init__(self, reg_mean=None, reg_std=None):
def __init__(
self,
reg_mean=[0.0, 0.0, 0.0, 0.0],
reg_std=[1.0, 1.0, 1.0, 1.0],
):
"""
Args:
reg_mean(list): [x0_mean, x1_mean, y0_mean, y1_mean]. Default: [0, 0, 0, 0]
reg_std(list): [x0_std, x1_std, y0_std, y1_std]. Default: [1, 1, 1, 1]
"""
self.reg_mean = reg_mean[None, :] if reg_mean is not None else None
self.reg_std = reg_std[None, :] if reg_std is not None else None
self.reg_mean = np.array(reg_mean)[None, :]
self.reg_std = np.array(reg_std)[None, :]
super().__init__()
@staticmethod
......@@ -82,17 +88,13 @@ class BoxCoder(BoxCoderBase, metaclass=ABCMeta):
target_dh = F.log(gt_height / bbox_height)
target = self._concat_new_axis(target_dx, target_dy, target_dw, target_dh)
if self.reg_mean is not None:
target -= self.reg_mean
if self.reg_std is not None:
target /= self.reg_std
target -= self.reg_mean
target /= self.reg_std
return target
def decode(self, anchors: Tensor, deltas: Tensor) -> Tensor:
if self.reg_std is not None:
deltas *= self.reg_std
if self.reg_mean is not None:
deltas += self.reg_mean
deltas *= self.reg_std
deltas += self.reg_mean
(
anchor_width,
......@@ -158,7 +160,7 @@ def get_iou(boxes1: Tensor, boxes2: Tensor, return_ignore=False) -> Tensor:
if return_ignore:
overlaps_ignore = F.maximum(inter / b_area_box, 0)
gt_ignore_mask = F.add_axis((gt[:, 4] == -1), 0).broadcast(*area_target_shape)
overlaps *= (1 - gt_ignore_mask)
overlaps *= 1 - gt_ignore_mask
overlaps_ignore *= gt_ignore_mask
return overlaps, overlaps_ignore
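`get_iou` computes pairwise overlaps and, with `return_ignore`, masks ground-truth boxes labeled `-1`. A numpy sketch of the plain pairwise-IoU part only (the ignore handling above is omitted):

```python
import numpy as np

def pairwise_iou(boxes1, boxes2):
    """IoU between every box in boxes1 (N, 4) and boxes2 (M, 4),
    boxes given as (x1, y1, x2, y2)."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = np.maximum(boxes1[:, None, :2], boxes2[None, :, :2])  # intersection top-left
    rb = np.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])  # intersection bottom-right
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)

iou = pairwise_iou(
    np.array([[0.0, 0.0, 10.0, 10.0]]),
    np.array([[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 15.0, 15.0]]),
)
```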
......
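The `BoxCoder` change above makes `reg_mean`/`reg_std` default to identity values, so `encode`/`decode` always normalize without the old `None` checks. A numpy sketch of the round trip, using the usual R-CNN delta parameterization (assumed here; the class's own formulas are truncated in the hunk):

```python
import numpy as np

reg_mean = np.array([0.0, 0.0, 0.0, 0.0])[None, :]
reg_std = np.array([1.0, 1.0, 1.0, 1.0])[None, :]

def encode(anchors, gt):
    """Box -> delta (dx, dy, dw, dh), then normalize by (mean, std)."""
    aw, ah = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    ax, ay = anchors[:, 0] + aw / 2, anchors[:, 1] + ah / 2
    gw, gh = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    gx, gy = gt[:, 0] + gw / 2, gt[:, 1] + gh / 2
    target = np.stack([(gx - ax) / aw, (gy - ay) / ah,
                       np.log(gw / aw), np.log(gh / ah)], axis=1)
    return (target - reg_mean) / reg_std

def decode(anchors, deltas):
    """Inverse of encode: denormalize, then delta -> box."""
    deltas = deltas * reg_std + reg_mean
    aw, ah = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    ax, ay = anchors[:, 0] + aw / 2, anchors[:, 1] + ah / 2
    cx, cy = deltas[:, 0] * aw + ax, deltas[:, 1] * ah + ay
    w, h = np.exp(deltas[:, 2]) * aw, np.exp(deltas[:, 3]) * ah
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

anchors = np.array([[0.0, 0.0, 10.0, 10.0]])
gt = np.array([[1.0, 2.0, 9.0, 12.0]])
roundtrip = decode(anchors, encode(anchors, gt))
```

With identity mean/std the normalization is a no-op, which is exactly why the defaults make the `None` branches removable.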
......@@ -45,7 +45,7 @@ class FPN(M.Module):
bottom_up: M.Module,
in_features: List[str],
out_channels: int = 256,
norm: str = "",
norm: str = None,
top_block: M.Module = None,
strides=[8, 16, 32],
channels=[512, 1024, 2048],
......
......@@ -53,7 +53,9 @@ def get_focal_loss(
neg_part = score ** gamma * F.log(F.clamp(1 - score, 1e-8))
pos_loss = -(label == class_range) * pos_part * alpha
neg_loss = -(label != class_range) * (label != ignore_label) * neg_part * (1 - alpha)
neg_loss = (
-(label != class_range) * (label != ignore_label) * neg_part * (1 - alpha)
)
loss = pos_loss + neg_loss
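The focal-loss terms above can be sketched in numpy as follows. Note the `pos_part` definition is not visible in this hunk; the usual `(1 - score)**gamma * log(score)` form is assumed, with `score` the predicted probability:

```python
import numpy as np

def focal_loss(score, label, class_range, gamma=2.0, alpha=0.25, ignore_label=-1):
    """Elementwise focal loss mirroring the pos_part / neg_part split above.
    pos_part is assumed to be (1 - score)**gamma * log(score)."""
    eps = 1e-8
    pos_part = (1 - score) ** gamma * np.log(np.clip(score, eps, None))
    neg_part = score ** gamma * np.log(np.clip(1 - score, eps, None))
    pos_loss = -alpha * (label == class_range) * pos_part
    neg_loss = -(1 - alpha) * (label != class_range) * (label != ignore_label) * neg_part
    return pos_loss + neg_loss

# a well-classified positive incurs far less loss than a misclassified one
loss_easy = focal_loss(np.array([0.99]), np.array([1]), class_range=1)
loss_hard = focal_loss(np.array([0.10]), np.array([1]), class_range=1)
```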
if norm_type == "fg":
......@@ -69,10 +71,9 @@ def get_smooth_l1_loss(
pred_bbox: Tensor,
gt_bbox: Tensor,
label: Tensor,
sigma: int = 3,
beta: int = 1,
background: int = 0,
ignore_label: int = -1,
fix_smooth_l1: bool = False,
norm_type: str = "fg",
) -> Tensor:
r"""Smooth l1 loss used in RetinaNet.
......@@ -84,14 +85,12 @@ def get_smooth_l1_loss(
the ground-truth bbox with the shape of :math:`(B, A, 4)`
label (Tensor):
the assigned label of boxes with shape of :math:`(B, A)`
sigma (int):
beta (int):
the parameter of smooth l1 loss. Default: 1
background (int):
the value of background class. Default: 0
ignore_label (int):
the value of ignore class. Default: -1
fix_smooth_l1 (bool):
is to use huber loss, default is False to use original smooth-l1
norm_type (str): current support 'fg', 'all', 'none':
'fg': loss will be normalized by number of fore-ground samples
'all': loss will be normalized by number of all samples
......@@ -105,11 +104,11 @@ def get_smooth_l1_loss(
fg_mask = (label != background) * (label != ignore_label)
losses = get_smooth_l1_base(pred_bbox, gt_bbox, sigma, is_fix=fix_smooth_l1)
losses = get_smooth_l1_base(pred_bbox, gt_bbox, beta)
if norm_type == "fg":
loss = (losses.sum(axis=1) * fg_mask).sum() / F.maximum(fg_mask.sum(), 1)
elif norm_type == "all":
all_mask = (label != ignore_label)
all_mask = label != ignore_label
loss = (losses.sum(axis=1) * fg_mask).sum() / F.maximum(all_mask.sum(), 1)
else:
raise NotImplementedError
......@@ -118,7 +117,7 @@ def get_smooth_l1_loss(
def get_smooth_l1_base(
pred_bbox: Tensor, gt_bbox: Tensor, sigma: float, is_fix: bool = False,
pred_bbox: Tensor, gt_bbox: Tensor, beta: float,
):
r"""
......@@ -127,34 +126,24 @@ def get_smooth_l1_base(
the predicted bbox with the shape of :math:`(N, 4)`
gt_bbox (Tensor):
the ground-truth bbox with the shape of :math:`(N, 4)`
sigma (int):
beta (int):
the parameter of smooth l1 loss.
is_fix (bool):
is to use huber loss, default is False to use original smooth-l1
Returns:
the calculated smooth l1 loss.
"""
if is_fix:
sigma = 1 / sigma
cond_point = sigma
x = pred_bbox - gt_bbox
abs_x = F.abs(x)
in_loss = 0.5 * x ** 2
out_loss = sigma * abs_x - 0.5 * sigma ** 2
x = pred_bbox - gt_bbox
abs_x = F.abs(x)
if beta < 1e-5:
loss = abs_x
else:
sigma2 = sigma ** 2
cond_point = 1 / sigma2
x = pred_bbox - gt_bbox
abs_x = F.abs(x)
in_loss = 0.5 * x ** 2 * sigma2
out_loss = abs_x - 0.5 / sigma2
# FIXME: F.where cannot handle 0-shape tensor yet
# loss = F.where(abs_x < cond_point, in_loss, out_loss)
in_mask = abs_x < cond_point
out_mask = 1 - in_mask
loss = in_loss * in_mask + out_loss * out_mask
in_loss = 0.5 * x ** 2 / beta
out_loss = abs_x - 0.5 * beta
# FIXME: F.where cannot handle 0-shape tensor yet
# loss = F.where(abs_x < beta, in_loss, out_loss)
in_mask = abs_x < beta
loss = in_loss * in_mask + out_loss * (1 - in_mask)
return loss
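The refactor replaces the old `sigma`/`is_fix` parameterization with a single `beta` (the Huber form), and `beta < 1e-5` degenerates to plain L1, as in the code above. A numpy sketch:

```python
import numpy as np

def smooth_l1(pred, gt, beta=1.0):
    """Elementwise smooth-L1 (Huber) loss with a single beta parameter:
    0.5 * x**2 / beta where |x| < beta, |x| - 0.5 * beta elsewhere;
    beta < 1e-5 reduces to plain L1."""
    x = np.abs(pred - gt)
    if beta < 1e-5:
        return x
    return np.where(x < beta, 0.5 * x ** 2 / beta, x - 0.5 * beta)

vals = smooth_l1(np.array([0.0, 0.5, 2.0]), np.zeros(3), beta=1.0)
```

(`np.where` replaces the `in_mask`/`out_mask` workaround only because numpy, unlike the MegEngine version noted in the FIXME, handles the selection directly.)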
......@@ -162,7 +151,7 @@ def softmax_loss(score, label, ignore_label=-1):
max_score = F.zero_grad(score.max(axis=1, keepdims=True))
score -= max_score
log_prob = score - F.log(F.exp(score).sum(axis=1, keepdims=True))
mask = (label != ignore_label)
mask = label != ignore_label
vlabel = label * mask
loss = -(F.indexing_one_hot(log_prob, vlabel.astype("int32"), 1) * mask).sum()
loss = loss / F.maximum(mask.sum(), 1)
......
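The `softmax_loss` above subtracts the per-row maximum before exponentiating for numerical stability and averages only over non-ignored labels. A numpy sketch of the same computation:

```python
import numpy as np

def softmax_loss(score, label, ignore_label=-1):
    """Cross entropy with max-subtraction for numerical stability;
    entries whose label equals ignore_label contribute nothing."""
    score = score - score.max(axis=1, keepdims=True)
    log_prob = score - np.log(np.exp(score).sum(axis=1, keepdims=True))
    mask = label != ignore_label
    vlabel = label * mask  # ignored labels mapped to class 0, then masked out
    loss = -(log_prob[np.arange(len(label)), vlabel] * mask).sum()
    return loss / max(mask.sum(), 1)

# second sample is ignored; loss comes only from the confident first row
loss = softmax_loss(np.array([[10.0, 0.0], [0.0, 10.0]]), np.array([0, -1]))
```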
......@@ -15,7 +15,7 @@ import megengine.functional as F
def roi_pool(
rpn_fms, rois, stride, pool_shape, roi_type='roi_align',
rpn_fms, rois, stride, pool_shape, roi_type="roi_align",
):
assert len(stride) == len(rpn_fms)
canonical_level = 4
......@@ -40,18 +40,22 @@ def roi_pool(
pool_list, inds_list = [], []
for i in range(num_fms):
mask = (level_assignments == i)
mask = level_assignments == i
_, inds = F.cond_take(mask == 1, mask)
level_rois = rois.ai[inds]
if roi_type == 'roi_pool':
if roi_type == "roi_pool":
pool_fm = F.roi_pooling(
rpn_fms[i], level_rois, pool_shape,
mode='max', scale=1.0/stride[i]
rpn_fms[i], level_rois, pool_shape, mode="max", scale=1.0 / stride[i]
)
elif roi_type == 'roi_align':
elif roi_type == "roi_align":
pool_fm = F.roi_align(
rpn_fms[i], level_rois, pool_shape, mode='average',
spatial_scale=1.0/stride[i], sample_points=2, aligned=True
rpn_fms[i],
level_rois,
pool_shape,
mode="average",
spatial_scale=1.0 / stride[i],
sample_points=2,
aligned=True,
)
pool_list.append(pool_fm)
inds_list.append(inds)
......
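`roi_pool` first assigns each RoI to an FPN level by its scale before pooling (the assignment itself is truncated in this hunk). A sketch of the standard FPN-paper heuristic this is assumed to follow, with `canonical_level = 4` as in the code above and a canonical box size of 224:

```python
import numpy as np

def assign_levels(rois, num_fms=4, canonical_level=4, canonical_size=224, min_level=2):
    """Assign each RoI (x1, y1, x2, y2) to a feature-map index in
    [0, num_fms - 1] via level = floor(k0 + log2(sqrt(area) / 224)),
    clipped to the available pyramid levels (FPN-paper heuristic)."""
    scale = np.sqrt((rois[:, 2] - rois[:, 0]) * (rois[:, 3] - rois[:, 1]))
    level = np.floor(canonical_level + np.log2(scale / canonical_size + 1e-8))
    return (np.clip(level, min_level, min_level + num_fms - 1) - min_level).astype(int)

# a small, a canonical-size, and a large box land on different levels
levels = assign_levels(
    np.array([[0.0, 0, 32, 32], [0.0, 0, 224, 224], [0.0, 0, 900, 900]])
)
```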
......@@ -14,14 +14,10 @@ from official.vision.detection import layers
class RCNN(M.Module):
def __init__(self, cfg):
super().__init__()
self.cfg = cfg
self.box_coder = layers.BoxCoder(
reg_mean=cfg.rcnn_reg_mean,
reg_std=cfg.rcnn_reg_std
)
self.box_coder = layers.BoxCoder(cfg.rcnn_reg_mean, cfg.rcnn_reg_std)
# roi head
self.in_features = cfg.rcnn_in_features
......@@ -44,12 +40,13 @@ class RCNN(M.Module):
M.init.fill_(l.bias, 0)
def forward(self, fpn_fms, rcnn_rois, im_info=None, gt_boxes=None):
rcnn_rois, labels, bbox_targets = self.get_ground_truth(rcnn_rois, im_info, gt_boxes)
rcnn_rois, labels, bbox_targets = self.get_ground_truth(
rcnn_rois, im_info, gt_boxes
)
fpn_fms = [fpn_fms[x] for x in self.in_features]
pool_features = layers.roi_pool(
fpn_fms, rcnn_rois, self.stride,
self.pooling_size, self.pooling_method,
fpn_fms, rcnn_rois, self.stride, self.pooling_size, self.pooling_method,
)
flatten_feature = F.flatten(pool_features, start_axis=1)
roi_feature = F.relu(self.fc1(flatten_feature))
......@@ -67,14 +64,13 @@ class RCNN(M.Module):
pred_delta = F.indexing_one_hot(pred_delta, vlabels, axis=1)
loss_rcnn_loc = layers.get_smooth_l1_loss(
pred_delta, bbox_targets, labels,
pred_delta,
bbox_targets,
labels,
self.cfg.rcnn_smooth_l1_beta,
norm_type="all",
)
loss_dict = {
'loss_rcnn_cls': loss_rcnn_cls,
'loss_rcnn_loc': loss_rcnn_loc
}
loss_dict = {"loss_rcnn_cls": loss_rcnn_cls, "loss_rcnn_loc": loss_rcnn_loc}
return loss_dict
else:
# slice 1 for removing background
......@@ -82,7 +78,9 @@ class RCNN(M.Module):
pred_delta = pred_delta[:, 4:].reshape(-1, 4)
target_shape = (rcnn_rois.shapeof(0), self.cfg.num_classes, 4)
# rois (N, 4) -> (N, 1, 4) -> (N, 80, 4) -> (N * 80, 4)
base_rois = F.add_axis(rcnn_rois[:, 1:5], 1).broadcast(target_shape).reshape(-1, 4)
base_rois = (
F.add_axis(rcnn_rois[:, 1:5], 1).broadcast(target_shape).reshape(-1, 4)
)
pred_bbox = self.box_coder.decode(base_rois, pred_delta)
return pred_bbox, pred_scores
......@@ -101,7 +99,7 @@ class RCNN(M.Module):
batch_inds = mge.ones((gt_boxes_per_img.shapeof(0), 1)) * bid
# if config.proposal_append_gt:
gt_rois = F.concat([batch_inds, gt_boxes_per_img[:, :4]], axis=1)
batch_roi_mask = (rpn_rois[:, 0] == bid)
batch_roi_mask = rpn_rois[:, 0] == bid
_, batch_roi_inds = F.cond_take(batch_roi_mask == 1, batch_roi_mask)
# all_rois : [batch_id, x1, y1, x2, y2]