iMet2019 Chainer Starter (SEResNet152 + FocalLoss)

From: https://www.kaggle.com/ttahara/imet2019-chainer-starter-seresnet152-focalloss

Author: Tawara

Score: 0.537

Chainer Starter Kernel for iMet Collection 2019 - FGVC6

What is this kernel?

This is a baseline example using Chainer and ChainerCV.
I share this kernel mainly as practice in writing kernels, and to share some (possibly useful) information, e.g. training settings.

Summary of model, training, and inference

base model: SEResNet152

  • pre-trained weights on ImageNet
  • take the output of the Global Average Pooling layer (pool5) and feed it to a Dense layer
  • input shape: (ch, height, width) = (3, 128, 128)
  • preprocessing for images
    • subtract the per-channel mean of all train images, then divide by 255 (applied after data augmentation)

training

  • fine-tune the whole model; no layers are frozen
  • data augmentation
  • max epoch: 20
  • batch size: 128
  • optimizer: NesterovAG (SGD with Nesterov momentum)
    • momentum = 0.9, weight decay = 1e-04
  • learning schedule: cosine annealing
    • max_lr = 0.01, min_lr = 0.0001
    • I ran only one cycle.
    • Note: the learning rate is decayed per epoch, not per iteration (for simplicity of implementation)
  • loss: Focal Loss
    • alpha = 0.5, gamma = 2
    • Note:
      • The loss for each sample is the sum of the per-class focal losses.
      • The loss for a mini-batch is the average of its samples' losses.
        • At first I averaged over classes instead of summing, but it didn't work well.
  • validation
    • use a single validation set rather than k-fold cross-validation
    • random split, without stratifying by target (attribute) frequency
      • train : valid = 4 : 1
    • check the F-beta score at threshold = 0.2 after each epoch

inference

  • no test-time augmentation (TTA)
  • use the best threshold found on the validation set
    • The same threshold is used for all classes.

setup

import

In [1]:
from time import time
beginning_time = time()
In [2]:
import os
import gc
import json
import sys
import random
from glob import glob

from PIL import Image
from collections import OrderedDict
from joblib import Parallel, delayed
from tqdm._tqdm_notebook import tqdm_notebook

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from math import cos, pi
from sklearn.metrics import precision_recall_curve

import seaborn as sns
from matplotlib import pyplot as plt

tqdm_notebook.pandas()
%matplotlib inline
print(os.listdir("../input"))
['imet-2019-fgvc6', 'chainercv-seresnet']

import Chainer and ChainerCV

In [3]:
import chainer
from chainer import cuda, functions, links, datasets
from chainer import iterators, optimizers, training, reporter
from chainer import initializers, serializers

import chainercv
from chainercv import transforms
from chainercv.links.model.ssd import random_distort
from chainercv.links.model.ssd import resize_with_random_interpolation

print("chainercsv:", chainercv.__version__)
print("chainer:", chainer.__version__)
chainercsv: 0.12.0
chainer: 5.4.0

set data path

In [4]:
DATA_DIR = "../input/imet-2019-fgvc6"
PRETRAINED_MODEL_DIR = "../input/chainercv-seresnet"
In [5]:
print("../input")
for path in glob("../input/*"):
    print("\t|- {}/".format(path.split("/")[-1]))
    for fname in os.listdir(path):
        print("\t\t|-{}".format(fname))
../input
	|- imet-2019-fgvc6/
		|-test
		|-train
		|-train.csv
		|-labels.csv
		|-sample_submission.csv
	|- chainercv-seresnet/
		|-se_resnet50_imagenet_converted_2018_06_25.npz
		|-se_resnet152_imagenet_converted_2018_06_25.npz
		|-se_resnet101_imagenet_converted_2018_06_25.npz

set other information.

In [6]:
img_chmean_train = np.array([164.82181258, 155.93463791, 144.58968491], dtype="f")
# img_chmean_test = np.array([163.39652723, 154.7340003 , 143.86426686], dtype="f")
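
For reference, here is a minimal sketch of how such per-channel means could be computed. This is my reconstruction for illustration, not code from the original kernel; it assumes the train PNGs are read as RGB.

def compute_channel_mean(img_dir):
    """Hedged sketch: mean of each RGB channel over all images in img_dir."""
    sums = np.zeros(3)
    count = 0
    for path in glob("{}/*.png".format(img_dir)):
        arr = np.asarray(Image.open(path).convert("RGB"), dtype="f")  # (H, W, 3)
        sums += arr.reshape(-1, 3).sum(axis=0)
        count += arr.shape[0] * arr.shape[1]
    return sums / count

# img_chmean_train ≈ compute_channel_mean("{}/train".format(DATA_DIR))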

settings

In [7]:
seed = 1086
settings = OrderedDict(
    # # model setting.
    base_model="SEResNet152",
    n_class=1103,
    image_size=[128, 128],
    
    # # training setting.
    # valid_fold=0,
    max_epoch=20,
    batch_size=128,
    da_select=[
        "random_distort",
        "random_lr_flip",
        "random_rotate",
        "random_expand", "resize_with_random",
        "random_crop"
    ],
    learning_schedule="cosine",
    epoch_per_cycle=20,
    optimizer="NesterovAG",
    learning_rate=0.01,
    learning_rate_min=0.0001,
    momentum=0.9,
    weight_decay_rate=1e-04,
    loss_func="FocalLoss",
    alpha=0.5,
    gamma=2,
)
settings["pretrained_model_path"] = "{}/{}".format(
    PRETRAINED_MODEL_DIR,
    {
        "SEResNet50": "se_resnet50_imagenet_converted_2018_06_25.npz",
        "SEResNet101": "se_resnet101_imagenet_converted_2018_06_25.npz",
        "SEResNet152": "se_resnet152_imagenet_converted_2018_06_25.npz",
    }[settings["base_model"]])

classes and functions definition

model

  • CNN model
  • wrapper for training
In [8]:
base_class = getattr(chainercv.links, settings["base_model"])

class FeatureExtractor(base_class):
    """image feture extractor based on pretrained model."""
    
    def __init__(self, pretrained_model_path, extract_layers=["pool5"]):
        """Initialze."""
        super(FeatureExtractor, self).__init__(pretrained_model=pretrained_model_path)
        self._pick = extract_layers
        self.remove_unused()
    
    def __call__(self, x):
        """Simply Forward."""
        h = x
        for name in self.layer_names:
            h = self[name](h)
        return h
    
class Ext2Linear(chainer.Chain):
    """Chain to feed output of Extractor to Fully Connect."""
    
    def __init__(self, n_class, extractor):
        """Initialize."""
        super(Ext2Linear, self).__init__()
        with self.init_scope():
            self.extractor = extractor
            self.fc = links.Linear(
                None, n_class, initialW=initializers.Normal(scale=0.01))

    def __call__(self, x):
        """Forward."""
        return self.fc(self.extractor(x))
In [9]:
class MultiLabelClassifier(links.Classifier):
    """Wrapper for multi label classification model."""
    
    def __init__(self, predictor, lossfun):
        """Initialize"""
        super(MultiLabelClassifier, self).__init__(predictor, lossfun)
        self.compute_accuracy = False
        self.f_beta = None
        self.metfun = self._fbeta_score
        
    def __call__(self, x, t):
        """Foward. calc loss and evaluation metric."""
        loss = super().__call__(x, t)
        self.f_beta = None
        self.f_beta = self.metfun(self.y, t)
        reporter.report({'f-beta': self.f_beta}, self)
        
        return loss
    
    def _fbeta_score(self, y_pred, t, beta=2, th=0.2, epsilon=1e-09):
        """
        calculate f-beta score.
        
        calculate f-bata score along **class-axis(axis=1)** and average them along sample-axis.
        """
        y_prob = functions.sigmoid(y_pred).data
        t_pred = (y_prob >= th).astype("i")
        true_pos = (t_pred * t).sum(axis=1)  # tp
        pred_pos = t_pred.sum(axis=1)  # tp + fp
        poss_pos = t.sum(axis=1)  # tp + fn
        precision = true_pos / (pred_pos + epsilon)
        recall = true_pos / (poss_pos + epsilon)
        f_beta_each_id = (1 + beta ** 2) * precision * recall / ((beta ** 2) * precision + recall + epsilon)
        return functions.mean(f_beta_each_id)
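
Below is a toy check of the monitored metric (my addition, not in the original kernel). Since FocalLoss is only defined in the next cell, sigmoid_cross_entropy stands in as a dummy loss function; only _fbeta_score is exercised here.

# One sample, three classes: predictions ~[1, 1, 0] against labels [1, 0, 1]
# give precision = recall = 0.5, hence F2 = 5 * 0.25 / (4 * 0.5 + 0.5) = 0.5.
dummy_clf = MultiLabelClassifier(
    links.Linear(3, 3), lossfun=functions.sigmoid_cross_entropy)
toy_logits = chainer.Variable(np.array([[5.0, 5.0, -5.0]], dtype="f"))
toy_labels = np.array([[1, 0, 1]], dtype="i")
print(dummy_clf._fbeta_score(toy_logits, toy_labels))  # ~0.5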

loss

  • define Focal Loss
In [10]:
class FocalLoss:
    """
    Function for Focal loss.
    
    calculates focal loss **for each class**, **sum up** them along class-axis, and **average** them along sample-axis.
    Take data point x and its logit y = model(x),
    using prob p = (p_0, ..., p_C)^T = sigmoid(y) and label t,
    focal loss for each class i caluculated by:
    
        loss_{i}(p, t) = - \alpha' + (1 - p'_i) ** \gamma * ln(p'_i),
    
    where
        \alpha' = { \alpha (t_i = 1)
                  { 1 - \alpha (t_i = 0)
         p'_i   = { p_i (t_i = 1)
                = ( 1 - p_i (t_i = 0)
    """

    def __init__(self, alpha=0.25, gamma=2):
        """Initialize."""
        super(FocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma
        
    def __call__(self, y_pred, t, epsilon=1e-31):
        """
        Forward.
        
        p_dash = t * p + (1 - t) * (1 - p) = (1 - t) + (2 * t - 1) * p
        """
        p_dash = functions.clip(
            (1 - t) + (2 * t - 1) * functions.sigmoid(y_pred), epsilon, 1 - epsilon)
        alpha_dash = (1 - t) + (2 * t - 1) * self.alpha
        # # [y_pred: (bs, n_class), t: (bs: n_class)] => loss_by_sample_x_class: (bs, n_class)
        loss_by_sample_x_class = - alpha_dash * (1 - p_dash) ** self.gamma * functions.log(p_dash)
        # # loss_by_sample_x_class: (bs, n_class) => loss_by_sample: (bs, )
        loss_by_sample = functions.sum(loss_by_sample_x_class, axis=1)
        # # loss_by_sample: (bs,) => loss: (1, )
        return functions.mean(loss_by_sample)
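
For intuition, here is the same computation restated in plain NumPy (a sketch for illustration only; it mirrors the Chainer implementation above on random data):

def focal_loss_np(logits, t, alpha=0.5, gamma=2, eps=1e-31):
    """NumPy restatement: per-class focal loss, summed over classes, averaged over samples."""
    p = 1.0 / (1.0 + np.exp(-logits))
    p_dash = np.clip((1 - t) + (2 * t - 1) * p, eps, 1 - eps)
    alpha_dash = (1 - t) + (2 * t - 1) * alpha
    loss_by_sample_x_class = -alpha_dash * (1 - p_dash) ** gamma * np.log(p_dash)  # (bs, n_class)
    return loss_by_sample_x_class.sum(axis=1).mean()  # sum over classes, mean over batch

print(focal_loss_np(np.random.randn(4, 1103).astype("f"), np.random.randint(0, 2, (4, 1103))))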

datasets

  • image preprocessing functions
  • data augmentor class
In [11]:
def resize_pair(pair, size=settings["image_size"]):
    img, label = pair
    img = transforms.resize(img, size=size)
    return (img, label)

def scale_and_subtract_mean(pair, mean_value=img_chmean_train):
    img, label = pair
    img = (img - mean_value[:, None, None]) / 255.
    return (img, label)
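
A quick shape-and-range check of these two transforms on a dummy pair (my addition; input values are random, so the numbers below are approximate):

dummy_pair = (np.random.uniform(0, 255, size=(3, 300, 200)).astype("f"),
              np.zeros(1103, dtype="i"))
img, _ = scale_and_subtract_mean(resize_pair(dummy_pair))
print(img.shape, img.min(), img.max())  # (3, 128, 128), roughly within [-0.65, 0.44]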
In [12]:
class DataAugmentor():
    """DataAugmentor for Image Classification."""

    def __init__(
        self, image_size, image_mean,
        using_methods=["random_ud_flip","random_lr_flip", "random_90_rotate", "random_crop"]
    ):
        """Initialize."""
        self.img_size = image_size
        self.img_mean = image_mean
        self.img_mean_single = int(image_mean.mean())
        self.using_methods = using_methods
        
        self.func_dict = {
            "random_ud_flip": self._random_ud_flip,
            "random_lr_flip": self._random_lr_flip,
            "random_90_rotate": self._random_90_rotate,
            "random_rotate": self._random_rotate,
            "random_expand": self._random_expand,
            "resize_with_random": self._resize_with_random,
            "random_crop": self._random_crop,
            "random_distort": random_distort,
        }
        # # set da func by given order.
        self.da_funcs = [self.func_dict[um] for um in using_methods]
        
    def __call__(self, pair):
        """Forward"""
        img_arr, label = pair

        for func in self.da_funcs:
            img_arr = func(img_arr)
            
        return img_arr, label

    def _random_lr_flip(self, img_arr):
        """left-right flipping."""
        if np.random.randint(2):
            img_arr = img_arr[:, :, ::-1]
        return img_arr

    def _random_ud_flip(self, img_arr):
        """up-down flipping."""
        if np.random.randint(2):
            img_arr = img_arr[:, ::-1, :]
        return img_arr
    
    def _random_90_rotate(self, img_arr):
        """90 angle rotation."""
        if np.random.randint(2):
            img_arr = img_arr.transpose(0, 2, 1)[:, ::-1, :]
        return img_arr
    
    def _random_rotate(self, img_arr, max_angle=10):
        """random degree rotation."""
        angle = np.random.randint(-max_angle, max_angle + 1)
        if angle == 0:
            return img_arr
        return transforms.rotate(img_arr, angle, fill=self.img_mean_single)

    def _random_expand(self, img_arr):
        """random expansion"""
        if np.random.randint(2):
            return img_arr
        return transforms.random_expand(img_arr, fill=self.img_mean)

    def _resize_with_random(self, img_arr):
        """resize with random interpolation"""
        if img_arr.shape[-2:] == self.img_size:
            return img_arr
        return resize_with_random_interpolation(img_arr, self.img_size)
    
    def _random_crop(self, img_arr):
        """Random cropping (applied with probability 0.5)."""
        crop_size = self.img_size
        resize_size = tuple(map(lambda x: int(x * 256 / 224), self.img_size))

        if np.random.randint(2):
            top = np.random.randint(0, resize_size[0] - crop_size[0])
            bottom = top + crop_size[0]
            left = np.random.randint(0, resize_size[1] - crop_size[1])
            right = left + crop_size[1]
            img_arr = transforms.resize(img_arr, size=resize_size)[:, top:bottom, left:right]

        return img_arr
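
A minimal usage sketch of the augmentor on one dummy (image, label) pair (my addition, not in the original kernel):

aug = DataAugmentor(
    image_size=(128, 128), image_mean=img_chmean_train,
    using_methods=["random_lr_flip", "random_crop"])
dummy_img = np.random.uniform(0, 255, size=(3, 128, 128)).astype("f")
aug_img, aug_label = aug((dummy_img, np.zeros(1103, dtype="i")))
print(aug_img.shape)  # (3, 128, 128): flipped and/or re-cropped, shape preserved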

training

  • trainer extension for cosine annealing
In [13]:
class CosineShift(chainer.training.extension.Extension):
    """
    Cosine Annealing.
    
    reference link: https://github.com/takedarts/resnetfamily/blob/master/src/mylib/training/extensions/cosine_shift.py
    """
    def __init__(self, attr, value, period, period_mult=1, optimizer=None):
        self._attr = attr
        self._value = value
        self._period = period
        self._period_mult = period_mult
        self._optimizer = optimizer

        if not hasattr(self._value, '__getitem__'):
            self._value = (self._value, 0)

    def initialize(self, trainer):
        self._update_value(trainer)

    def __call__(self, trainer):
        self._update_value(trainer)

    def _update_value(self, trainer):
        optimizer = self._optimizer or trainer.updater.get_optimizer('main')
        epoch = trainer.updater.epoch

        period_range = self._period
        period_start = 0
        period_end = period_range

        while period_end <= epoch:
            period_start = period_end
            period_range *= self._period_mult
            period_end += period_range

        n_max, n_min = self._value
        t_cur = epoch - period_start
        t_i = period_range
        value = n_min + 0.5 * (n_max - n_min) * (1 + cos((t_cur / t_i) * pi))

        setattr(optimizer, self._attr, value)
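
As a standalone sketch of the same schedule (not part of the kernel), the per-epoch learning rates for one 20-epoch cycle can be reproduced directly:

from math import cos, pi

n_max, n_min, period = 0.01, 0.0001, 20
lrs = [n_min + 0.5 * (n_max - n_min) * (1 + cos((e / period) * pi)) for e in range(period)]
print(["{:.6f}".format(lr) for lr in lrs[:3]])  # ['0.010000', '0.009939', '0.009758']
# These match the lr column of the training log below.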

inference

  • run prediction
  • calculate f-beta score
  • find threshold
  • make pred_ids
In [14]:
def predict(model, val_iter, gpu_device=-1):
    val_pred_list = []
    val_label_list = []
    iter_num = 0
    epoch_test_start = time()

    while True:
        val_batch = val_iter.next()
        iter_num += 1
        print("\rtmp_valid_iteration: {:0>5}".format(iter_num), end="")
        feature_val, label_val = chainer.dataset.concat_examples(val_batch, gpu_device)

        # Forward the batch (the function serves both valid and test sets).
        # Note: combine the two contexts with a comma; the original
        # "with A and B:" would enter only one of the context managers.
        with chainer.no_backprop_mode(), chainer.using_config("train", False):
            prediction_val = model(feature_val)
            val_pred_list.append(prediction_val)
            val_label_list.append(label_val)
            prediction_val.unchain_backward()

        if val_iter.is_new_epoch:
            print(" => valid end: {:.2f} sec".format(time() - epoch_test_start))
            # Manually rewind the iterator state so the iterator can be reused.
            val_iter.epoch = 0
            val_iter.current_position = 0
            val_iter.is_new_epoch = False
            val_iter._pushed_position = None
            break

    val_pred_all = cuda.to_cpu(functions.concat(val_pred_list, axis=0).data)
    val_label_all = cuda.to_cpu(functions.concat(val_label_list, axis=0).data)
    return val_pred_all, val_label_all
In [15]:
def average_fbeta_score(y_prob, t, th=0.2, beta=2, epsilon=1e-09):
    t_pred = (y_prob >= th).astype(int)
    # # t_pred, t: (sample_num, n_class) => true_pos, pred_pos, poss_pos: (sample_num,)
    true_pos = (t_pred * t).sum(axis=1)
    pred_pos = t_pred.sum(axis=1)
    poss_pos = t.sum(axis=1)

    p_arr = true_pos / (pred_pos + epsilon)
    r_arr = true_pos / (poss_pos + epsilon)
    # # p_arr, r_arr: (sample_num,) => f_beta: (sample_num,)
    f_beta = (1 + beta ** 2) * p_arr * r_arr / ((beta ** 2) * p_arr + r_arr + epsilon)
    return f_beta.mean()

def search_best_threshold(y_prob, t, eval_func, search_range=[0.05, 0.95], interval=0.01):
    tmp_th = search_range[0]
    best_th = 0
    best_eval = -(10**9 + 7)
    while tmp_th < search_range[1]:
        eval_score = eval_func(y_prob, t, th=tmp_th)
        print(tmp_th, eval_score)
        if eval_score > best_eval:
            best_th = tmp_th
            best_eval = eval_score
        tmp_th += interval
    return best_th, best_eval

def make_pred_ids(test_pred, th):
    class_array = np.arange(test_pred.shape[1])
    test_cond = test_pred >= th
    pred_ids = [" ".join(map(str, class_array[cond])) for cond in test_cond]
    return pred_ids
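
Two quick sanity checks of these helpers (my addition, not in the original kernel):

# average_fbeta_score: one sample with tp=1, fp=1, fn=1 gives
# precision = recall = 0.5, so F2 = 5 * 0.25 / (4 * 0.5 + 0.5) = 0.5.
print(average_fbeta_score(np.array([[0.9, 0.8, 0.1]]), np.array([[1, 0, 1]])))  # ~0.5

# make_pred_ids: indices of above-threshold classes, space-joined per sample.
print(make_pred_ids(np.array([[0.1, 0.4, 0.05], [0.3, 0.2, 0.35]]), th=0.29))  # ['1', '0 2']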
In [16]:
print("[end of setup]: {:.3f}".format(time() - beginning_time))
[end of setup]: 2.784

prepare data

read data

In [17]:
train_df = pd.read_csv("{}/train.csv".format(DATA_DIR))
test_df = pd.read_csv("{}/sample_submission.csv".format(DATA_DIR))
labels_df = pd.read_csv("{}/labels.csv".format(DATA_DIR))

make datasets

label

In [18]:
%%time
train_attr_ohot = np.zeros((len(train_df), len(labels_df)), dtype="i")
for idx, attr_arr in enumerate(
    train_df.attribute_ids.str.split(" ").apply(lambda l: list(map(int, l))).values):
    train_attr_ohot[idx, attr_arr] = 1
CPU times: user 756 ms, sys: 408 ms, total: 1.16 s
Wall time: 1.16 s
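
An equivalent vectorized construction with scikit-learn, shown only as a reference alternative (my addition; the kernel itself uses the explicit loop above):

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(classes=np.arange(len(labels_df)))
train_attr_ohot_alt = mlb.fit_transform(
    train_df.attribute_ids.str.split(" ").apply(lambda l: list(map(int, l)))).astype("i")
# train_attr_ohot_alt should be identical to train_attr_ohot.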
In [19]:
print(train_attr_ohot.shape)
train_attr_ohot[:5,:20]
(109237, 1103)
Out[19]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]],
      dtype=int32)

labeled datasets

In [20]:
all_train_dataset = datasets.LabeledImageDataset(
    pairs=list(zip((train_df.id + ".png").tolist(), train_attr_ohot)),
    root="{}/train".format(DATA_DIR))

# # For the test set, set a dummy label.
test_dataset = datasets.LabeledImageDataset(
    pairs=list(zip((test_df.id + ".png").tolist(), [-1] * len(test_df))),
    root="{}/test".format(DATA_DIR))

split all train into train and valid

Here, I split the train set randomly.

In [21]:
train_dataset, valid_dataset = datasets.split_dataset_random(
    all_train_dataset, first_size=int(len(all_train_dataset) * 0.8), seed=seed)
print("train set:", len(train_dataset))
print("valid set:", len(valid_dataset))
train set: 87389
valid set: 21848

set transforms and data augmentations

In [22]:
# # train set, including data augmentation
train_dataset = datasets.TransformDataset(train_dataset, resize_pair)
train_dataset = datasets.TransformDataset(
    train_dataset, DataAugmentor(
        image_size=settings["image_size"], image_mean=img_chmean_train,
        using_methods=settings["da_select"])
)
train_dataset = datasets.TransformDataset(train_dataset, scale_and_subtract_mean)
In [23]:
# # valid set.
valid_dataset = datasets.TransformDataset(valid_dataset, resize_pair)
valid_dataset = datasets.TransformDataset(valid_dataset, scale_and_subtract_mean)
# # test set.
test_dataset = datasets.TransformDataset(test_dataset, resize_pair)
test_dataset = datasets.TransformDataset(test_dataset, scale_and_subtract_mean)
In [24]:
print("[end of preparing data]: {:.3f}".format(time() - beginning_time))
[end of preparing data]: 4.224

make trainer

model

In [25]:
model = Ext2Linear(
    settings["n_class"], FeatureExtractor(settings["pretrained_model_path"]))
train_model = MultiLabelClassifier(
    model, lossfun=FocalLoss(settings["alpha"], settings["gamma"]))

optimizer

In [26]:
opt_class = getattr(optimizers, settings["optimizer"])
if settings["optimizer"] != "Adam":
    optimizer = opt_class(lr=settings["learning_rate"], momentum=settings["momentum"])
    optimizer.setup(train_model)
    optimizer.add_hook(chainer.optimizer.WeightDecay(settings["weight_decay_rate"]))
else:
    # Fixed a latent bug: the original assigned to "optmizer" (typo), so
    # "optimizer.setup" in this branch would have raised a NameError.
    optimizer = opt_class(
        alpha=settings["learning_rate"], weight_decay_rate=settings["weight_decay_rate"])
    optimizer.setup(train_model)

iterator

In [27]:
train_iter = iterators.MultiprocessIterator(train_dataset, settings["batch_size"])
valid_iter = iterators.MultiprocessIterator(
    valid_dataset, settings["batch_size"], repeat=False, shuffle=False)

updater, trainer

In [28]:
updater = training.StandardUpdater(train_iter, optimizer, device=0)
trainer = training.trainer.Trainer(updater, stop_trigger=(settings["max_epoch"], "epoch"), out="training_result")

trainer extensions

In [29]:
logging_attributes = ["epoch", "main/loss", "val/main/loss", "val/main/f-beta", "elapsed_time", "lr"]

# # cosine annealing.
trainer.extend(
    CosineShift('lr', [settings["learning_rate"], settings["learning_rate_min"]], settings["epoch_per_cycle"], 1))
# # evaluator.
trainer.extend(
    training.extensions.Evaluator(valid_iter, optimizer.target, device=0), name='val', trigger=(1, 'epoch'))
# # log.
trainer.extend(training.extensions.observe_lr(), trigger=(1, 'epoch'))
trainer.extend(training.extensions.LogReport(logging_attributes), trigger=(1, 'epoch'))
# # standard output.
trainer.extend(training.extensions.PrintReport(logging_attributes), trigger=(1, 'epoch'))
trainer.extend(training.extensions.ProgressBar(update_interval=200))
# # plots.
trainer.extend(
    training.extensions.PlotReport(["main/loss", "val/main/loss"], "epoch", file_name="loss.png"), trigger=(1, "epoch"))
trainer.extend(
    training.extensions.PlotReport(["val/main/f-beta"], "epoch", file_name="fbeta_at02.png"), trigger=(1, "epoch"))
# # snapshot.
trainer.extend(
    training.extensions.snapshot(filename='snapshot_epoch_{.updater.epoch}.npz'), trigger=(10, 'epoch'))
In [30]:
print("[end of preparing trainer]: {:.3f}".format(time() - beginning_time))
[end of preparing trainer]: 8.577
In [31]:
gc.collect()
Out[31]:
0

training

In [32]:
%%time
trainer.run()
epoch       main/loss   val/main/loss  val/main/f-beta  elapsed_time  lr        
1           1.84346     1.84488        0.327402         905.784       0.01        
2           1.89171     1.6726         0.360124         1775.38       0.00993906  
3           1.69786     1.60324        0.385831         2639.39       0.00975773  
4           1.50393     1.55912        0.397146         3500.68       0.00946048  
5           1.48012     1.5168         0.417717         4357.88       0.00905463  
6           1.40321     1.51079        0.416442         5212.11       0.00855018  
7           1.40264     1.47801        0.426562         6071.37       0.00795954  
8           1.461       1.47439        0.439571         6927.31       0.00729725  
9           1.43095     1.46865        0.437263         7786.93       0.00657963  
10          1.358       1.45705        0.448294         8644.33       0.00582435  
11          1.23153     1.45948        0.459342         9532.5        0.00505     
12          1.2045      1.45797        0.456399         10388.7       0.00427565  
13          1.18797     1.45938        0.462914         11242         0.00352037  
14          1.1368      1.46993        0.467138         12096.3       0.00280275  
15          1.05475     1.47297        0.471066         12949.6       0.00214046  
16          1.22934     1.4775         0.474661         13802.3       0.00154982  
17          0.976717    1.47861        0.477268         14673.6       0.00104537  
18          1.05242     1.48004        0.478888         15557.8       0.000639518  
19          1.03445     1.48134        0.47974          16447.6       0.00034227  
20          1.00928     1.48085        0.479131         17341.1       0.000160943  
CPU times: user 2h 44min 50s, sys: 6min 41s, total: 2h 51min 32s
Wall time: 4h 49min 31s
In [33]:
# # save last model
trained_model = trainer.updater.get_optimizer('main').target.predictor
serializers.save_npz('{}/epoch{:0>3}.model'.format("training_result", settings["max_epoch"]), trained_model)
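
To reload the saved weights later, rebuild the same chain and load into it (a sketch; the path follows the format string above):

model_reloaded = Ext2Linear(
    settings["n_class"], FeatureExtractor(settings["pretrained_model_path"]))
serializers.load_npz("training_result/epoch020.model", model_reloaded)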

inference

find the best threshold for the valid set

In [34]:
valid_iter = iterators.MultiprocessIterator(
    valid_dataset, settings["batch_size"], repeat=False, shuffle=False)
val_pred, val_label = predict(trained_model, valid_iter, gpu_device=0)
tmp_valid_iteration: 00171 => valid end: 115.13 sec
In [35]:
val_prob = functions.sigmoid(val_pred).data
best_th, best_fbeta = search_best_threshold(
    val_prob, val_label, eval_func=average_fbeta_score, search_range=[0.1, 0.9], interval=0.01)
print(best_th, best_fbeta)
0.1 0.29193848049275284
0.11 0.3167811898813328
0.12 0.3399869588996781
0.13 0.3619075727526271
0.14 0.3826219931021119
0.15000000000000002 0.402132982171192
0.16000000000000003 0.42017425206490605
0.17000000000000004 0.4370941835456019
0.18000000000000005 0.452668045496202
0.19000000000000006 0.4665315803522108
0.20000000000000007 0.4791239130020726
0.21000000000000008 0.49026957957106126
0.22000000000000008 0.5002811409500709
0.2300000000000001 0.5087305222953769
0.2400000000000001 0.5154724641049446
0.2500000000000001 0.521459258660511
0.2600000000000001 0.5259480062360666
0.27000000000000013 0.5287624242927098
0.28000000000000014 0.5307773313533709
0.29000000000000015 0.5322558048086482
0.30000000000000016 0.5314589600306211
0.31000000000000016 0.5304807890142396
0.3200000000000002 0.5288889231886027
0.3300000000000002 0.5260834725162676
0.3400000000000002 0.5224687480399633
0.3500000000000002 0.5184849101265114
0.3600000000000002 0.5137086394273588
0.3700000000000002 0.5083511553010002
0.3800000000000002 0.5018591202189813
0.39000000000000024 0.4945928355229062
0.40000000000000024 0.4878332512419721
0.41000000000000025 0.47985048254024576
0.42000000000000026 0.47134901166144405
0.43000000000000027 0.46320757549800584
0.4400000000000003 0.45448069206044195
0.4500000000000003 0.44480844917050855
0.4600000000000003 0.43500484885549395
0.4700000000000003 0.4260540229525451
0.4800000000000003 0.41655405540616586
0.4900000000000003 0.40602464399376226
0.5000000000000003 0.3965632547906794
0.5100000000000003 0.3865561265252688
0.5200000000000004 0.37657847104742476
0.5300000000000004 0.36672211368992874
0.5400000000000004 0.35697491590561686
0.5500000000000004 0.34739005023232944
0.5600000000000004 0.33814373575791723
0.5700000000000004 0.32887158729464483
0.5800000000000004 0.3190210495072529
0.5900000000000004 0.3101913642339998
0.6000000000000004 0.30080444966652986
0.6100000000000004 0.2914339644215463
0.6200000000000004 0.28166504355919775
0.6300000000000004 0.27199581992532323
0.6400000000000005 0.2628554557362487
0.6500000000000005 0.2543172384788072
0.6600000000000005 0.24562620318158546
0.6700000000000005 0.2369189742071019
0.6800000000000005 0.2285088133574542
0.6900000000000005 0.2208711814340161
0.7000000000000005 0.21255799000278558
0.7100000000000005 0.20379126022327704
0.7200000000000005 0.1952883786690127
0.7300000000000005 0.18655605051219937
0.7400000000000005 0.17832627509007978
0.7500000000000006 0.16999338515644888
0.7600000000000006 0.16153108465635937
0.7700000000000006 0.15381385577443948
0.7800000000000006 0.14598588735930068
0.7900000000000006 0.13828348283112576
0.8000000000000006 0.1306519099414092
0.8100000000000006 0.12313345210597215
0.8200000000000006 0.11547028673185829
0.8300000000000006 0.10775501270110702
0.8400000000000006 0.10053051112251055
0.8500000000000006 0.09284186548612881
0.8600000000000007 0.08596108374903329
0.8700000000000007 0.078594385453242
0.8800000000000007 0.07106764020071697
0.8900000000000007 0.0640652450346372
0.29000000000000015 0.5322558048086482

submit

In [36]:
test_iter = iterators.MultiprocessIterator(
    test_dataset, settings["batch_size"], repeat=False, shuffle=False)
test_pred, _ = predict(trained_model, test_iter, gpu_device=0)
tmp_valid_iteration: 00059 => valid end: 40.22 sec
In [37]:
test_prob = functions.sigmoid(test_pred).data
test_pred_ids = make_pred_ids(test_prob, th=best_th)
In [38]:
test_df.attribute_ids = test_pred_ids
print(test_df.shape)
test_df.head()
(7443, 2)
Out[38]:
id attribute_ids
0 10023b2cc4ed5f68 195 223 343 344 369 766 1059
1 100fbe75ed8fd887 93 188 1039
2 101b627524a04f19 79 147 180 728 784 961 996
3 10234480c41284c6 147 483 501 553 725 738 776 813 830 1046
4 1023b0e2636dcea8 147 283 322 477 501 584 671 737 776 813 954 10...
In [39]:
test_df.to_csv("submission.csv", index=False)