Multi-label F-beta score

From: https://www.kaggle.com/arailly/multi-label-f-beta-score

Author: arailly

Calculate the F-beta score for a multi-label classification task with scikit-learn and scipy.sparse.


In [1]:
import numpy as np
import pandas as pd
import os

from scipy.sparse import lil_matrix
from sklearn.metrics import fbeta_score

single-label

Example from the scikit-learn fbeta_score documentation.

In [2]:
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
In [3]:
fbeta_score(y_true, y_pred, average='macro', beta=0.5)
Out[3]:
0.23809523809523805
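
To see what the macro-averaged score above is made of, here is a minimal sketch that recomputes it by hand from per-class true positives, false positives, and false negatives. The helper name `macro_fbeta` is hypothetical (it is not part of scikit-learn), and a class with an empty denominator is assumed to score 0.

```python
import numpy as np

def macro_fbeta(y_true, y_pred, beta, n_classes):
    """Hypothetical helper: unweighted mean of per-class F-beta scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    b2 = beta ** 2
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))  # predicted c, truly c
        fp = np.sum((y_true != c) & (y_pred == c))  # predicted c, truly something else
        fn = np.sum((y_true == c) & (y_pred != c))  # truly c, predicted something else
        denom = (1 + b2) * tp + b2 * fn + fp
        # Assumption: a class with nothing to count contributes 0 to the mean
        scores.append((1 + b2) * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

score = macro_fbeta([0, 1, 2, 0, 1, 2], [0, 2, 1, 0, 0, 1], beta=0.5, n_classes=3)
print(score)  # approximately 0.2381
```

Only class 0 has any true positives here (F-beta of 5/7), so the macro average is 5/7 divided by 3.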

multi-label

In [4]:
y_true = [[0, 1], [1], [1, 2], [0], [1], [0, 2]]
y_pred = [[0], [0, 2], [1, 2], [2], [0, 1], [1, 2]]
In [5]:
# Passing lists of label lists directly raises an error:
# fbeta_score(y_true, y_pred, average='macro', beta=0.5)
# -> ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

Convert the label lists into a sparse binary indicator matrix

In [6]:
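def label_to_sm(labels, n_classes):
    """Build an (n_samples, n_classes) binary indicator matrix from lists of class indices."""
    sm = lil_matrix((len(labels), n_classes))
    for i, label in enumerate(labels):
        # label is the list of class indices for sample i; set those columns to 1
        sm[i, label] = 1
    return sm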
In [7]:
y_true_sm = label_to_sm(labels=y_true, n_classes=3)
y_true_sm.toarray()
Out[7]:
array([[1., 1., 0.],
       [0., 1., 0.],
       [0., 1., 1.],
       [1., 0., 0.],
       [0., 1., 0.],
       [1., 0., 1.]])
In [8]:
y_pred_sm = label_to_sm(labels=y_pred, n_classes=3)
y_pred_sm.toarray()
Out[8]:
array([[1., 0., 0.],
       [1., 0., 1.],
       [0., 1., 1.],
       [0., 0., 1.],
       [1., 1., 0.],
       [0., 1., 1.]])
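
As an alternative to the hand-written conversion above, scikit-learn's `MultiLabelBinarizer` can build the same indicator matrices. This is a minimal sketch, not the notebook author's method. The variable names `y_true_mlb` and `y_pred_mlb` are made up for illustration, and the class list `[0, 1, 2]` is passed explicitly on the assumption that all classes are known in advance.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y_true = [[0, 1], [1], [1, 2], [0], [1], [0, 2]]
y_pred = [[0], [0, 2], [1, 2], [2], [0, 1], [1, 2]]

# Fixing classes keeps the column order identical for both matrices;
# sparse_output=True returns a scipy sparse matrix instead of a dense array.
mlb = MultiLabelBinarizer(classes=[0, 1, 2], sparse_output=True)
y_true_mlb = mlb.fit_transform(y_true)
y_pred_mlb = mlb.transform(y_pred)

print(y_true_mlb.toarray())
print(y_pred_mlb.toarray())
```

The resulting matrices hold integers rather than floats, but they mark the same cells as the output of `label_to_sm`.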
In [9]:
fbeta_score(y_true_sm, y_pred_sm, average='macro', beta=0.5)
Out[9]:
0.5046296296296297
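
As a sanity check on the macro score above, the sketch below asks `fbeta_score` for per-class scores with `average=None` and averages them by hand. Dense NumPy arrays are used here instead of the sparse matrices, an assumption made only to keep the example self-contained.

```python
import numpy as np
from sklearn.metrics import fbeta_score

# Same indicator matrices as above, written out densely
y_true_ind = np.array([[1, 1, 0], [0, 1, 0], [0, 1, 1],
                       [1, 0, 0], [0, 1, 0], [1, 0, 1]])
y_pred_ind = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 1],
                       [0, 0, 1], [1, 1, 0], [0, 1, 1]])

# average=None returns one F-beta score per class (per column)
per_class = fbeta_score(y_true_ind, y_pred_ind, average=None, beta=0.5)
print(per_class)         # roughly [0.333, 0.625, 0.556]
print(per_class.mean())  # the macro average, roughly 0.5046
```

Macro averaging weights every class equally, regardless of how often each label occurs.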

yay!