Ensembling iMet

From: https://www.kaggle.com/axel81/ensembling-imet

Author: Ram Ramrakhya

iMet Collection 2019

This kernel implements K-fold ensembling using already trained models. Its run time is about 1.5 hours, which is short enough to work even on the huge test set.

In this kernel I'll use 6 folds:

  • One from Konrad's kernel, from here
  • Four SeResNext50 models with LB scores between 0.597 and 0.608
  • One ResNet50
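
As a rough illustration of what combining several trained models can look like, the sketch below averages per-class probabilities from two models and then thresholds the result. The function name `average_fold_predictions`, the toy probabilities, and the 0.5 threshold are assumptions made for this example; they are not taken from the kernel's models or files.

```python
import pandas as pd


def average_fold_predictions(fold_preds):
    """Average per-class probabilities over several folds.

    Each DataFrame is indexed by image id and has one column per class.
    """
    stacked = pd.concat(fold_preds)
    # Rows that share an image id are averaged column by column.
    return stacked.groupby(level=0).mean()


# Toy data: two images, three classes, two hypothetical folds.
fold_a = pd.DataFrame([[0.9, 0.1, 0.3], [0.2, 0.8, 0.1]],
                      index=['img1', 'img2'], columns=['0', '1', '2'])
fold_b = pd.DataFrame([[0.7, 0.3, 0.1], [0.4, 0.6, 0.3]],
                      index=['img1', 'img2'], columns=['0', '1', '2'])

avg = average_fold_predictions([fold_a, fold_b])

# Turn the averaged probabilities into space-separated label strings.
labels = (avg > 0.5).apply(
    lambda row: ' '.join(c for c, on in row.items() if on), axis=1)
print(labels)
```

Averaging probabilities before thresholding keeps the information about how confident each model was, which is lost if the models are first reduced to hard label lists.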
In [1]:
import pandas as pd
import gzip
import base64
import os
from pathlib import Path
from typing import Dict


# this is base64 encoded source code
file_data: Dict = {'imet/transforms.py': 'aW1wb3J0IHJhbmRvbQppbXBvcnQgbWF0aAoKZnJvbSBQSUwgaW1wb3J0IEltYWdlCmZyb20gdG9yY2h2aXNpb24udHJhbnNmb3JtcyBpbXBvcnQgKAogICAgVG9UZW5zb3IsIE5vcm1hbGl6ZSwgQ29tcG9zZSwgUmVzaXplLCBDZW50ZXJDcm9wLCBSYW5kb21Dcm9wLAogICAgUmFuZG9tSG9yaXpvbnRhbEZsaXApCgoKY2xhc3MgUmFuZG9tU2l6ZWRDcm9wOgogICAgIiIiUmFuZG9tIGNyb3AgdGhlIGdpdmVuIFBJTC5JbWFnZSB0byBhIHJhbmRvbSBzaXplCiAgICBvZiB0aGUgb3JpZ2luYWwgc2l6ZSBhbmQgYW5kIGEgcmFuZG9tIGFzcGVjdCByYXRpbwogICAgb2YgdGhlIG9yaWdpbmFsIGFzcGVjdCByYXRpby4KICAgIHNpemU6IHNpemUgb2YgdGhlIHNtYWxsZXIgZWRnZQogICAgaW50ZXJwb2xhdGlvbjogRGVmYXVsdDogUElMLkltYWdlLkJJTElORUFSCiAgICAiIiIKCiAgICBkZWYgX19pbml0X18oc2VsZiwgc2l6ZSwgaW50ZXJwb2xhdGlvbj1JbWFnZS5CSUxJTkVBUiwKICAgICAgICAgICAgICAgICBtaW5fYXNwZWN0PTQvNSwgbWF4X2FzcGVjdD01LzQsCiAgICAgICAgICAgICAgICAgbWluX2FyZWE9MC4yNSwgbWF4X2FyZWE9MSk6CiAgICAgICAgc2VsZi5zaXplID0gc2l6ZQogICAgICAgIHNlbGYuaW50ZXJwb2xhdGlvbiA9IGludGVycG9sYXRpb24KICAgICAgICBzZWxmLm1pbl9hc3BlY3QgPSBtaW5fYXNwZWN0CiAgICAgICAgc2VsZi5tYXhfYXNwZWN0ID0gbWF4X2FzcGVjdAogICAgICAgIHNlbGYubWluX2FyZWEgPSBtaW5fYXJlYQogICAgICAgIHNlbGYubWF4X2FyZWEgPSBtYXhfYXJlYQoKICAgIGRlZiBfX2NhbGxfXyhzZWxmLCBpbWcpOgogICAgICAgIGZvciBhdHRlbXB0IGluIHJhbmdlKDEwKToKICAgICAgICAgICAgYXJlYSA9IGltZy5zaXplWzBdICogaW1nLnNpemVbMV0KICAgICAgICAgICAgdGFyZ2V0X2FyZWEgPSByYW5kb20udW5pZm9ybShzZWxmLm1pbl9hcmVhLCBzZWxmLm1heF9hcmVhKSAqIGFyZWEKICAgICAgICAgICAgYXNwZWN0X3JhdGlvID0gcmFuZG9tLnVuaWZvcm0oc2VsZi5taW5fYXNwZWN0LCBzZWxmLm1heF9hc3BlY3QpCgogICAgICAgICAgICB3ID0gaW50KHJvdW5kKG1hdGguc3FydCh0YXJnZXRfYXJlYSAqIGFzcGVjdF9yYXRpbykpKQogICAgICAgICAgICBoID0gaW50KHJvdW5kKG1hdGguc3FydCh0YXJnZXRfYXJlYSAvIGFzcGVjdF9yYXRpbykpKQoKICAgICAgICAgICAgaWYgcmFuZG9tLnJhbmRvbSgpIDwgMC41OgogICAgICAgICAgICAgICAgdywgaCA9IGgsIHcKCiAgICAgICAgICAgIGlmIHcgPD0gaW1nLnNpemVbMF0gYW5kIGggPD0gaW1nLnNpemVbMV06CiAgICAgICAgICAgICAgICB4MSA9IHJhbmRvbS5yYW5kaW50KDAsIGltZy5zaXplWzBdIC0gdykKICAgICAgICAgICAgICAgIHkxID0gcmFuZG9tLnJhbmRpbnQoMCwgaW1nLnNpemVbMV0gLSBoKQoKICAgICAgICAgICAgICAgIGltZyA9IGltZy5jcm9wKCh4MSwgeT
EsIHgxICsgdywgeTEgKyBoKSkKICAgICAgICAgICAgICAgIGFzc2VydChpbWcuc2l6ZSA9PSAodywgaCkpCgogICAgICAgICAgICAgICAgcmV0dXJuIGltZy5yZXNpemUoKHNlbGYuc2l6ZSwgc2VsZi5zaXplKSwgc2VsZi5pbnRlcnBvbGF0aW9uKQoKICAgICAgICAjIEZhbGxiYWNrCiAgICAgICAgc2NhbGUgPSBSZXNpemUoc2VsZi5zaXplLCBpbnRlcnBvbGF0aW9uPXNlbGYuaW50ZXJwb2xhdGlvbikKICAgICAgICBjcm9wID0gQ2VudGVyQ3JvcChzZWxmLnNpemUpCiAgICAgICAgcmV0dXJuIGNyb3Aoc2NhbGUoaW1nKSkKCgp0cmFpbl90cmFuc2Zvcm0gPSBDb21wb3NlKFsKICAgIFJhbmRvbUNyb3AoMjg4KSwKICAgIFJhbmRvbUhvcml6b250YWxGbGlwKCksCl0pCgoKdGVzdF90cmFuc2Zvcm0gPSBDb21wb3NlKFsKICAgIFJhbmRvbUNyb3AoMjg4KSwKICAgIFJhbmRvbUhvcml6b250YWxGbGlwKCksCl0pCgoKdGVuc29yX3RyYW5zZm9ybSA9IENvbXBvc2UoWwogICAgVG9UZW5zb3IoKSwKICAgIE5vcm1hbGl6ZShtZWFuPVswLjQ4NSwgMC40NTYsIDAuNDA2XSwgc3RkPVswLjIyOSwgMC4yMjQsIDAuMjI1XSksCl0pCg==', 
                    'imet/make_submission.py': 'aW1wb3J0IGFyZ3BhcnNlCgppbXBvcnQgcGFuZGFzIGFzIHBkCgpmcm9tIC51dGlscyBpbXBvcnQgbWVhbl9kZgpmcm9tIC5kYXRhc2V0IGltcG9ydCBEQVRBX1JPT1QKZnJvbSAubWFpbiBpbXBvcnQgYmluYXJpemVfcHJlZGljdGlvbgoKCmRlZiBtYWluKCk6CiAgICBwYXJzZXIgPSBhcmdwYXJzZS5Bcmd1bWVudFBhcnNlcigpCiAgICBhcmcgPSBwYXJzZXIuYWRkX2FyZ3VtZW50CiAgICBhcmcoJ3ByZWRpY3Rpb25zJywgbmFyZ3M9JysnKQogICAgYXJnKCdvdXRwdXQnKQogICAgYXJnKCctLXRocmVzaG9sZCcsIHR5cGU9ZmxvYXQsIGRlZmF1bHQ9MC4yKQogICAgYXJncyA9IHBhcnNlci5wYXJzZV9hcmdzKCkKICAgIHNhbXBsZV9zdWJtaXNzaW9uID0gcGQucmVhZF9jc3YoCiAgICAgICAgREFUQV9ST09UIC8gJ3NhbXBsZV9zdWJtaXNzaW9uLmNzdicsIGluZGV4X2NvbD0naWQnKQogICAgZGZzID0gW10KICAgIGZvciBwcmVkaWN0aW9uIGluIGFyZ3MucHJlZGljdGlvbnM6CiAgICAgICAgZGYgPSBwZC5yZWFkX2hkZihwcmVkaWN0aW9uLCBpbmRleF9jb2w9J2lkJykKICAgICAgICBkZiA9IGRmLnJlaW5kZXgoc2FtcGxlX3N1Ym1pc3Npb24uaW5kZXgpCiAgICAgICAgZGZzLmFwcGVuZChkZikKICAgIGRmID0gcGQuY29uY2F0KGRmcykKICAgIGRmID0gbWVhbl9kZihkZikKICAgIGRmWzpdID0gYmluYXJpemVfcHJlZGljdGlvbihkZi52YWx1ZXMsIHRocmVzaG9sZD1hcmdzLnRocmVzaG9sZCkKICAgIGRmID0gZGYuYXBwbHkoZ2V0X2NsYXNzZXMsIGF4aXM9MSkKICAgIGRmLm5hbWUgPSAnYXR0cmlidXRlX2lkcycKICAgIGRmLnRvX2NzdihhcmdzLm91dHB1dCwgaGVhZGVyPVRydWUpCgoKZGVmIGdldF9jbGFzc2VzKGl0ZW0pOgogICAgcmV0dXJuICcgJy5qb2luKGNscyBmb3IgY2xzLCBpc19wcmVzZW50IGluIGl0ZW0uaXRlbXMoKSBpZiBpc19wcmVzZW50KQoKCmlmIF9fbmFtZV9fID09ICdfX21haW5fXyc6CiAgICBtYWluKCkK', 
                    'imet/models.py': 'ZnJvbSBmdW5jdG9vbHMgaW1wb3J0IHBhcnRpYWwKCmltcG9ydCB0b3JjaApmcm9tIHRvcmNoIGltcG9ydCBubgpmcm9tIHRvcmNoLm5uIGltcG9ydCBmdW5jdGlvbmFsIGFzIEYKaW1wb3J0IHRvcmNodmlzaW9uLm1vZGVscyBhcyBNCgpmcm9tIC51dGlscyBpbXBvcnQgT05fS0FHR0xFCgoKY2xhc3MgQXZnUG9vbChubi5Nb2R1bGUpOgogICAgZGVmIGZvcndhcmQoc2VsZiwgeCk6CiAgICAgICAgcmV0dXJuIEYuYXZnX3Bvb2wyZCh4LCB4LnNoYXBlWzI6XSkKCgpkZWYgY3JlYXRlX25ldChuZXRfY2xzLCBwcmV0cmFpbmVkOiBib29sKToKICAgIGlmIE9OX0tBR0dMRSBhbmQgcHJldHJhaW5lZDoKICAgICAgICBuZXQgPSBuZXRfY2xzKCkKICAgICAgICBtb2RlbF9uYW1lID0gbmV0X2Nscy5fX25hbWVfXwogICAgICAgIHdlaWdodHNfcGF0aCA9IGYnLi4vaW5wdXQve21vZGVsX25hbWV9L3ttb2RlbF9uYW1lfS5wdGgnCiAgICAgICAgbmV0LmxvYWRfc3RhdGVfZGljdCh0b3JjaC5sb2FkKHdlaWdodHNfcGF0aCkpCiAgICBlbHNlOgogICAgICAgIG5ldCA9IG5ldF9jbHMocHJldHJhaW5lZD1wcmV0cmFpbmVkKQogICAgcmV0dXJuIG5ldAoKCmNsYXNzIFJlc05ldChubi5Nb2R1bGUpOgogICAgZGVmIF9faW5pdF9fKHNlbGYsIG51bV9jbGFzc2VzLAogICAgICAgICAgICAgICAgIHByZXRyYWluZWQ9RmFsc2UsIG5ldF9jbHM9TS5yZXNuZXQxMDEsIGRyb3BvdXQ9RmFsc2UpOgogICAgICAgIHN1cGVyKCkuX19pbml0X18oKQogICAgICAgIHNlbGYubmV0ID0gY3JlYXRlX25ldChuZXRfY2xzLCBwcmV0cmFpbmVkPXByZXRyYWluZWQpCiAgICAgICAgc2VsZi5uZXQuYXZncG9vbCA9IEF2Z1Bvb2woKQogICAgICAgIGlmIGRyb3BvdXQ6CiAgICAgICAgICAgIHNlbGYubmV0LmZjID0gbm4uU2VxdWVudGlhbCgKICAgICAgICAgICAgICAgIG5uLkRyb3BvdXQoKSwKICAgICAgICAgICAgICAgIG5uLkxpbmVhcihzZWxmLm5ldC5mYy5pbl9mZWF0dXJlcywgbnVtX2NsYXNzZXMpLAogICAgICAgICAgICApCiAgICAgICAgZWxzZToKICAgICAgICAgICAgc2VsZi5uZXQuZmMgPSBubi5MaW5lYXIoc2VsZi5uZXQuZmMuaW5fZmVhdHVyZXMsIG51bV9jbGFzc2VzKQoKICAgIGRlZiBmcmVzaF9wYXJhbXMoc2VsZik6CiAgICAgICAgcmV0dXJuIHNlbGYubmV0LmZjLnBhcmFtZXRlcnMoKQoKICAgIGRlZiBmb3J3YXJkKHNlbGYsIHgpOgogICAgICAgIHJldHVybiBzZWxmLm5ldCh4KQoKCmNsYXNzIERlbnNlTmV0KG5uLk1vZHVsZSk6CiAgICBkZWYgX19pbml0X18oc2VsZiwgbnVtX2NsYXNzZXMsCiAgICAgICAgICAgICAgICAgcHJldHJhaW5lZD1GYWxzZSwgbmV0X2Nscz1NLmRlbnNlbmV0MTIxKToKICAgICAgICBzdXBlcigpLl9faW5pdF9fKCkKICAgICAgICBzZWxmLm5ldCA9IGNyZWF0ZV9uZXQobmV0X2NscywgcHJldHJhaW5lZD1wcmV0cmFpbmVkKQogICAgICAgIHNlbGYuYXZnX3Bvb2wgPSBBd
mdQb29sKCkKICAgICAgICBzZWxmLm5ldC5jbGFzc2lmaWVyID0gbm4uTGluZWFyKAogICAgICAgICAgICBzZWxmLm5ldC5jbGFzc2lmaWVyLmluX2ZlYXR1cmVzLCBudW1fY2xhc3NlcykKCiAgICBkZWYgZnJlc2hfcGFyYW1zKHNlbGYpOgogICAgICAgIHJldHVybiBzZWxmLm5ldC5jbGFzc2lmaWVyLnBhcmFtZXRlcnMoKQoKICAgIGRlZiBmb3J3YXJkKHNlbGYsIHgpOgogICAgICAgIG91dCA9IHNlbGYubmV0LmZlYXR1cmVzKHgpCiAgICAgICAgb3V0ID0gRi5yZWx1KG91dCwgaW5wbGFjZT1UcnVlKQogICAgICAgIG91dCA9IHNlbGYuYXZnX3Bvb2wob3V0KS52aWV3KG91dC5zaXplKDApLCAtMSkKICAgICAgICBvdXQgPSBzZWxmLm5ldC5jbGFzc2lmaWVyKG91dCkKICAgICAgICByZXR1cm4gb3V0CgoKcmVzbmV0MTggPSBwYXJ0aWFsKFJlc05ldCwgbmV0X2Nscz1NLnJlc25ldDE4KQpyZXNuZXQzNCA9IHBhcnRpYWwoUmVzTmV0LCBuZXRfY2xzPU0ucmVzbmV0MzQpCnJlc25ldDUwID0gcGFydGlhbChSZXNOZXQsIG5ldF9jbHM9TS5yZXNuZXQ1MCkKcmVzbmV0MTAxID0gcGFydGlhbChSZXNOZXQsIG5ldF9jbHM9TS5yZXNuZXQxMDEpCnJlc25ldDE1MiA9IHBhcnRpYWwoUmVzTmV0LCBuZXRfY2xzPU0ucmVzbmV0MTUyKQoKZGVuc2VuZXQxMjEgPSBwYXJ0aWFsKERlbnNlTmV0LCBuZXRfY2xzPU0uZGVuc2VuZXQxMjEpCmRlbnNlbmV0MTY5ID0gcGFydGlhbChEZW5zZU5ldCwgbmV0X2Nscz1NLmRlbnNlbmV0MTY5KQpkZW5zZW5ldDIwMSA9IHBhcnRpYWwoRGVuc2VOZXQsIG5ldF9jbHM9TS5kZW5zZW5ldDIwMSkKZGVuc2VuZXQxNjEgPSBwYXJ0aWFsKERlbnNlTmV0LCBuZXRfY2xzPU0uZGVuc2VuZXQxNjEpCg==', 
                    'imet/__init__.py': 'aW1wb3J0IGN2MgoKCmN2Mi5zZXROdW1UaHJlYWRzKDApICAjIGZpeCBwb3RlbnRpYWwgcHl0b3JjaCB3b3JrZXIgaXNzdWVzCg==', 
                    'imet/make_folds.py': 'aW1wb3J0IGFyZ3BhcnNlCmZyb20gY29sbGVjdGlvbnMgaW1wb3J0IGRlZmF1bHRkaWN0LCBDb3VudGVyCmltcG9ydCByYW5kb20KCmltcG9ydCBwYW5kYXMgYXMgcGQKaW1wb3J0IHRxZG0KCmZyb20gLmRhdGFzZXQgaW1wb3J0IERBVEFfUk9PVAoKCmRlZiBtYWtlX2ZvbGRzKG5fZm9sZHM6IGludCkgLT4gcGQuRGF0YUZyYW1lOgogICAgZGYgPSBwZC5yZWFkX2NzdihEQVRBX1JPT1QgLyAndHJhaW4uY3N2JykKICAgIGNsc19jb3VudHMgPSBDb3VudGVyKGNscyBmb3IgY2xhc3NlcyBpbiBkZlsnYXR0cmlidXRlX2lkcyddLnN0ci5zcGxpdCgpCiAgICAgICAgICAgICAgICAgICAgICAgICBmb3IgY2xzIGluIGNsYXNzZXMpCiAgICBmb2xkX2Nsc19jb3VudHMgPSBkZWZhdWx0ZGljdChpbnQpCiAgICBmb2xkcyA9IFstMV0gKiBsZW4oZGYpCiAgICBmb3IgaXRlbSBpbiB0cWRtLnRxZG0oZGYuc2FtcGxlKGZyYWM9MSwgcmFuZG9tX3N0YXRlPTQyKS5pdGVydHVwbGVzKCksCiAgICAgICAgICAgICAgICAgICAgICAgICAgdG90YWw9bGVuKGRmKSk6CiAgICAgICAgY2xzID0gbWluKGl0ZW0uYXR0cmlidXRlX2lkcy5zcGxpdCgpLCBrZXk9bGFtYmRhIGNsczogY2xzX2NvdW50c1tjbHNdKQogICAgICAgIGZvbGRfY291bnRzID0gWyhmLCBmb2xkX2Nsc19jb3VudHNbZiwgY2xzXSkgZm9yIGYgaW4gcmFuZ2Uobl9mb2xkcyldCiAgICAgICAgbWluX2NvdW50ID0gbWluKFtjb3VudCBmb3IgXywgY291bnQgaW4gZm9sZF9jb3VudHNdKQogICAgICAgIHJhbmRvbS5zZWVkKGl0ZW0uSW5kZXgpCiAgICAgICAgZm9sZCA9IHJhbmRvbS5jaG9pY2UoW2YgZm9yIGYsIGNvdW50IGluIGZvbGRfY291bnRzCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGlmIGNvdW50ID09IG1pbl9jb3VudF0pCiAgICAgICAgZm9sZHNbaXRlbS5JbmRleF0gPSBmb2xkCiAgICAgICAgZm9yIGNscyBpbiBpdGVtLmF0dHJpYnV0ZV9pZHMuc3BsaXQoKToKICAgICAgICAgICAgZm9sZF9jbHNfY291bnRzW2ZvbGQsIGNsc10gKz0gMQogICAgZGZbJ2ZvbGQnXSA9IGZvbGRzCiAgICByZXR1cm4gZGYKCgpkZWYgbWFpbigpOgogICAgcGFyc2VyID0gYXJncGFyc2UuQXJndW1lbnRQYXJzZXIoKQogICAgcGFyc2VyLmFkZF9hcmd1bWVudCgnLS1uLWZvbGRzJywgdHlwZT1pbnQsIGRlZmF1bHQ9NSkKICAgIGFyZ3MgPSBwYXJzZXIucGFyc2VfYXJncygpCiAgICBkZiA9IG1ha2VfZm9sZHMobl9mb2xkcz1hcmdzLm5fZm9sZHMpCiAgICBkZi50b19jc3YoJ2ZvbGRzLmNzdicsIGluZGV4PU5vbmUpCgoKaWYgX19uYW1lX18gPT0gJ19fbWFpbl9fJzoKICAgIG1haW4oKQo=', 
                    'imet/dataset.py': 'ZnJvbSBwYXRobGliIGltcG9ydCBQYXRoCmZyb20gdHlwaW5nIGltcG9ydCBDYWxsYWJsZSwgTGlzdAoKaW1wb3J0IGN2MgppbXBvcnQgcGFuZGFzIGFzIHBkCmZyb20gUElMIGltcG9ydCBJbWFnZQppbXBvcnQgdG9yY2gKZnJvbSB0b3JjaC51dGlscy5kYXRhIGltcG9ydCBEYXRhc2V0Cgpmcm9tIC50cmFuc2Zvcm1zIGltcG9ydCB0ZW5zb3JfdHJhbnNmb3JtCmZyb20gLnV0aWxzIGltcG9ydCBPTl9LQUdHTEUKCgpOX0NMQVNTRVMgPSAxMTAzCkRBVEFfUk9PVCA9IFBhdGgoJy4uL2lucHV0L2ltZXQtMjAxOS1mZ3ZjNicgaWYgT05fS0FHR0xFIGVsc2UgJy4vZGF0YScpCgoKY2xhc3MgVHJhaW5EYXRhc2V0KERhdGFzZXQpOgogICAgZGVmIF9faW5pdF9fKHNlbGYsIHJvb3Q6IFBhdGgsIGRmOiBwZC5EYXRhRnJhbWUsCiAgICAgICAgICAgICAgICAgaW1hZ2VfdHJhbnNmb3JtOiBDYWxsYWJsZSwgZGVidWc6IGJvb2wgPSBUcnVlKToKICAgICAgICBzdXBlcigpLl9faW5pdF9fKCkKICAgICAgICBzZWxmLl9yb290ID0gcm9vdAogICAgICAgIHNlbGYuX2RmID0gZGYKICAgICAgICBzZWxmLl9pbWFnZV90cmFuc2Zvcm0gPSBpbWFnZV90cmFuc2Zvcm0KICAgICAgICBzZWxmLl9kZWJ1ZyA9IGRlYnVnCgogICAgZGVmIF9fbGVuX18oc2VsZik6CiAgICAgICAgcmV0dXJuIGxlbihzZWxmLl9kZikKCiAgICBkZWYgX19nZXRpdGVtX18oc2VsZiwgaWR4OiBpbnQpOgogICAgICAgIGl0ZW0gPSBzZWxmLl9kZi5pbG9jW2lkeF0KICAgICAgICBpbWFnZSA9IGxvYWRfdHJhbnNmb3JtX2ltYWdlKAogICAgICAgICAgICBpdGVtLCBzZWxmLl9yb290LCBzZWxmLl9pbWFnZV90cmFuc2Zvcm0sIGRlYnVnPXNlbGYuX2RlYnVnKQogICAgICAgIHRhcmdldCA9IHRvcmNoLnplcm9zKE5fQ0xBU1NFUykKICAgICAgICBmb3IgY2xzIGluIGl0ZW0uYXR0cmlidXRlX2lkcy5zcGxpdCgpOgogICAgICAgICAgICB0YXJnZXRbaW50KGNscyldID0gMQogICAgICAgIHJldHVybiBpbWFnZSwgdGFyZ2V0CgoKY2xhc3MgVFRBRGF0YXNldDoKICAgIGRlZiBfX2luaXRfXyhzZWxmLCByb290OiBQYXRoLCBkZjogcGQuRGF0YUZyYW1lLAogICAgICAgICAgICAgICAgIGltYWdlX3RyYW5zZm9ybTogQ2FsbGFibGUsIHR0YTogaW50KToKICAgICAgICBzZWxmLl9yb290ID0gcm9vdAogICAgICAgIHNlbGYuX2RmID0gZGYKICAgICAgICBzZWxmLl9pbWFnZV90cmFuc2Zvcm0gPSBpbWFnZV90cmFuc2Zvcm0KICAgICAgICBzZWxmLl90dGEgPSB0dGEKCiAgICBkZWYgX19sZW5fXyhzZWxmKToKICAgICAgICByZXR1cm4gbGVuKHNlbGYuX2RmKSAqIHNlbGYuX3R0YQoKICAgIGRlZiBfX2dldGl0ZW1fXyhzZWxmLCBpZHgpOgogICAgICAgIGl0ZW0gPSBzZWxmLl9kZi5pbG9jW2lkeCAlIGxlbihzZWxmLl9kZildCiAgICAgICAgaW1hZ2UgPSBsb2FkX3RyYW5zZm9ybV9pbWFnZShpdGVtLCBzZWxmLl9yb290LCBzZWxmLl9pbWFn
ZV90cmFuc2Zvcm0pCiAgICAgICAgcmV0dXJuIGltYWdlLCBpdGVtLmlkCgoKZGVmIGxvYWRfdHJhbnNmb3JtX2ltYWdlKAogICAgICAgIGl0ZW0sIHJvb3Q6IFBhdGgsIGltYWdlX3RyYW5zZm9ybTogQ2FsbGFibGUsIGRlYnVnOiBib29sID0gRmFsc2UpOgogICAgaW1hZ2UgPSBsb2FkX2ltYWdlKGl0ZW0sIHJvb3QpCiAgICBpbWFnZSA9IGltYWdlX3RyYW5zZm9ybShpbWFnZSkKICAgIGlmIGRlYnVnOgogICAgICAgIGltYWdlLnNhdmUoJ19kZWJ1Zy5wbmcnKQogICAgcmV0dXJuIHRlbnNvcl90cmFuc2Zvcm0oaW1hZ2UpCgoKZGVmIGxvYWRfaW1hZ2UoaXRlbSwgcm9vdDogUGF0aCkgLT4gSW1hZ2UuSW1hZ2U6CiAgICBpbWFnZSA9IGN2Mi5pbXJlYWQoc3RyKHJvb3QgLyBmJ3tpdGVtLmlkfS5wbmcnKSkKICAgIGltYWdlID0gY3YyLmN2dENvbG9yKGltYWdlLCBjdjIuQ09MT1JfQkdSMlJHQikKICAgIHJldHVybiBJbWFnZS5mcm9tYXJyYXkoaW1hZ2UpCgoKZGVmIGdldF9pZHMocm9vdDogUGF0aCkgLT4gTGlzdFtzdHJdOgogICAgcmV0dXJuIHNvcnRlZCh7cC5uYW1lLnNwbGl0KCdfJylbMF0gZm9yIHAgaW4gcm9vdC5nbG9iKCcqLnBuZycpfSkK', 
                    'imet/utils.py': 'ZnJvbSBkYXRldGltZSBpbXBvcnQgZGF0ZXRpbWUKaW1wb3J0IGpzb24KaW1wb3J0IGdsb2IKaW1wb3J0IG9zCmZyb20gcGF0aGxpYiBpbXBvcnQgUGF0aApmcm9tIG11bHRpcHJvY2Vzc2luZy5wb29sIGltcG9ydCBUaHJlYWRQb29sCmZyb20gdHlwaW5nIGltcG9ydCBEaWN0CgppbXBvcnQgbnVtcHkgYXMgbnAKaW1wb3J0IHBhbmRhcyBhcyBwZApmcm9tIHNjaXB5LnN0YXRzLm1zdGF0cyBpbXBvcnQgZ21lYW4KaW1wb3J0IHRvcmNoCmZyb20gdG9yY2ggaW1wb3J0IG5uCmZyb20gdG9yY2gudXRpbHMuZGF0YSBpbXBvcnQgRGF0YUxvYWRlcgoKCk9OX0tBR0dMRTogYm9vbCA9ICdLQUdHTEVfV09SS0lOR19ESVInIGluIG9zLmVudmlyb24KCgpkZWYgZ21lYW5fZGYoZGY6IHBkLkRhdGFGcmFtZSkgLT4gcGQuRGF0YUZyYW1lOgogICAgcmV0dXJuIGRmLmdyb3VwYnkobGV2ZWw9MCkuYWdnKGxhbWJkYSB4OiBnbWVhbihsaXN0KHgpKSkKCgpkZWYgbWVhbl9kZihkZjogcGQuRGF0YUZyYW1lKSAtPiBwZC5EYXRhRnJhbWU6CiAgICByZXR1cm4gZGYuZ3JvdXBieShsZXZlbD0wKS5tZWFuKCkKCgpkZWYgbG9hZF9tb2RlbChtb2RlbDogbm4uTW9kdWxlLCBwYXRoOiBQYXRoKSAtPiBEaWN0OgogICAgc3RhdGUgPSB0b3JjaC5sb2FkKHN0cihwYXRoKSkKICAgIG1vZGVsLmxvYWRfc3RhdGVfZGljdChzdGF0ZVsnbW9kZWwnXSkKICAgIHByaW50KCdMb2FkZWQgbW9kZWwgZnJvbSBlcG9jaCB7ZXBvY2h9LCBzdGVwIHtzdGVwOix9Jy5mb3JtYXQoKipzdGF0ZSkpCiAgICByZXR1cm4gc3RhdGUKCgpjbGFzcyBUaHJlYWRpbmdEYXRhTG9hZGVyKERhdGFMb2FkZXIpOgogICAgZGVmIF9faXRlcl9fKHNlbGYpOgogICAgICAgIHNhbXBsZV9pdGVyID0gaXRlcihzZWxmLmJhdGNoX3NhbXBsZXIpCiAgICAgICAgaWYgc2VsZi5udW1fd29ya2VycyA9PSAwOgogICAgICAgICAgICBmb3IgaW5kaWNlcyBpbiBzYW1wbGVfaXRlcjoKICAgICAgICAgICAgICAgIHlpZWxkIHNlbGYuY29sbGF0ZV9mbihbc2VsZi5fZ2V0X2l0ZW0oaSkgZm9yIGkgaW4gaW5kaWNlc10pCiAgICAgICAgZWxzZToKICAgICAgICAgICAgcHJlZmV0Y2ggPSAxCiAgICAgICAgICAgIHdpdGggVGhyZWFkUG9vbChwcm9jZXNzZXM9c2VsZi5udW1fd29ya2VycykgYXMgcG9vbDoKICAgICAgICAgICAgICAgIGZ1dHVyZXMgPSBbXQogICAgICAgICAgICAgICAgZm9yIGluZGljZXMgaW4gc2FtcGxlX2l0ZXI6CiAgICAgICAgICAgICAgICAgICAgZnV0dXJlcy5hcHBlbmQoW3Bvb2wuYXBwbHlfYXN5bmMoc2VsZi5fZ2V0X2l0ZW0sIGFyZ3M9KGksKSkKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgZm9yIGkgaW4gaW5kaWNlc10pCiAgICAgICAgICAgICAgICAgICAgaWYgbGVuKGZ1dHVyZXMpID4gcHJlZmV0Y2g6CiAgICAgICAgICAgICAgICAgICAgICAgIHlpZWxkIHNlbGYuY29sbGF0ZV9mbihbZi5nZXQoKSBmb3IgZiBpbi
BmdXR1cmVzLnBvcCgwKV0pCiAgICAgICAgICAgICAgICAgICAgIyBpdGVtcyA9IHBvb2wubWFwKGxhbWJkYSBpOiBzZWxmLmRhdGFzZXRbaV0sIGluZGljZXMpCiAgICAgICAgICAgICAgICAgICAgIyB5aWVsZCBzZWxmLmNvbGxhdGVfZm4oaXRlbXMpCiAgICAgICAgICAgICAgICBmb3IgYmF0Y2hfZnV0dXJlcyBpbiBmdXR1cmVzOgogICAgICAgICAgICAgICAgICAgIHlpZWxkIHNlbGYuY29sbGF0ZV9mbihbZi5nZXQoKSBmb3IgZiBpbiBiYXRjaF9mdXR1cmVzXSkKCiAgICBkZWYgX2dldF9pdGVtKHNlbGYsIGkpOgogICAgICAgIHJldHVybiBzZWxmLmRhdGFzZXRbaV0KCgpkZWYgd3JpdGVfZXZlbnQobG9nLCBzdGVwOiBpbnQsICoqZGF0YSk6CiAgICBkYXRhWydzdGVwJ10gPSBzdGVwCiAgICBkYXRhWydkdCddID0gZGF0ZXRpbWUubm93KCkuaXNvZm9ybWF0KCkKICAgIGxvZy53cml0ZShqc29uLmR1bXBzKGRhdGEsIHNvcnRfa2V5cz1UcnVlKSkKICAgIGxvZy53cml0ZSgnXG4nKQogICAgbG9nLmZsdXNoKCkKCgpkZWYgcGxvdCgqYXJncywgeW1pbj1Ob25lLCB5bWF4PU5vbmUsIHhtaW49Tm9uZSwgeG1heD1Ob25lLCBwYXJhbXM9RmFsc2UsCiAgICAgICAgIG1heF9wb2ludHM9MjAwLCBsZWdlbmQ9VHJ1ZSwgdGl0bGU9Tm9uZSwKICAgICAgICAgcHJpbnRfa2V5cz1GYWxzZSwgcHJpbnRfcGF0aHM9RmFsc2UsIHBsdD1Ob25lLCBuZXdmaWd1cmU9VHJ1ZSwKICAgICAgICAgeF9zY2FsZT0xKToKICAgICIiIgogICAgVXNlIGluIHRoZSBub3RlYm9vayBsaWtlIHRoaXM6OgoKICAgICAgICAlbWF0cGxvdGxpYiBpbmxpbmUKICAgICAgICBmcm9tIGltZXQudXRpbHMgaW1wb3J0IHBsb3QKICAgICAgICBwbG90KCcuL3J1bnMvb2MyJywgJy4vcnVucy9vYzEnLCAnbG9zcycsICd2YWxpZF9sb3NzJykKCiAgICAiIiIKICAgIGltcG9ydCBqc29uX2xpbmVzICAjIG5vIGF2YWlsYWJsZSBvbiBLYWdnbGUKCiAgICBpZiBwbHQgaXMgTm9uZToKICAgICAgICBmcm9tIG1hdHBsb3RsaWIgaW1wb3J0IHB5cGxvdCBhcyBwbHQKICAgIHBhdGhzLCBrZXlzID0gW10sIFtdCiAgICBmb3IgeCBpbiBhcmdzOgogICAgICAgIGlmIHguc3RhcnRzd2l0aCgnLicpIG9yICcvJyBpbiB4OgogICAgICAgICAgICBpZiAnKicgaW4geDoKICAgICAgICAgICAgICAgIHBhdGhzLmV4dGVuZChnbG9iLmdsb2IoeCkpCiAgICAgICAgICAgIGVsc2U6CiAgICAgICAgICAgICAgICBwYXRocy5hcHBlbmQoeCkKICAgICAgICBlbHNlOgogICAgICAgICAgICBrZXlzLmFwcGVuZCh4KQogICAgaWYgcHJpbnRfcGF0aHM6CiAgICAgICAgcHJpbnQoJ0ZvdW5kIHBhdGhzOiB7fScuZm9ybWF0KCcgJy5qb2luKHNvcnRlZChwYXRocykpKSkKICAgIGlmIG5ld2ZpZ3VyZToKICAgICAgICBwbHQuZmlndXJlKGZpZ3NpemU9KDEyLCA4KSkKICAgIGtleXMgPSBrZXlzIG9yIFsnbG9zcycsICd2YWxpZF9sb3NzJ10KCiAgICB5bGltX2t3ID0ge30KICAgIGlmIHltaW4gaXMgbm90IE5vbm
U6CiAgICAgICAgeWxpbV9rd1snYm90dG9tJ10gPSB5bWluCiAgICBpZiB5bWF4IGlzIG5vdCBOb25lOgogICAgICAgIHlsaW1fa3dbJ3RvcCddID0geW1heAogICAgaWYgeWxpbV9rdzoKICAgICAgICBwbHQueWxpbSgqKnlsaW1fa3cpCgogICAgeGxpbV9rdyA9IHt9CiAgICBpZiB4bWluIGlzIG5vdCBOb25lOgogICAgICAgIHhsaW1fa3dbJ2xlZnQnXSA9IHhtaW4KICAgIGlmIHhtYXggaXMgbm90IE5vbmU6CiAgICAgICAgeGxpbV9rd1sncmlnaHQnXSA9IHhtYXgKICAgIGlmIHhsaW1fa3c6CiAgICAgICAgcGx0LnhsaW0oKip4bGltX2t3KQogICAgYWxsX2tleXMgPSBzZXQoKQogICAgZm9yIHBhdGggaW4gc29ydGVkKHBhdGhzKToKICAgICAgICBwYXRoID0gUGF0aChwYXRoKQogICAgICAgIHdpdGgganNvbl9saW5lcy5vcGVuKHBhdGggLyAndHJhaW4ubG9nJywgYnJva2VuPVRydWUpIGFzIGY6CiAgICAgICAgICAgIGV2ZW50cyA9IGxpc3QoZikKICAgICAgICBhbGxfa2V5cy51cGRhdGUoayBmb3IgZSBpbiBldmVudHMgZm9yIGsgaW4gZSkKICAgICAgICBmb3Iga2V5IGluIHNvcnRlZChrZXlzKToKICAgICAgICAgICAgeHMsIHlzLCB5c19lcnIgPSBbXSwgW10sIFtdCiAgICAgICAgICAgIGZvciBlIGluIGV2ZW50czoKICAgICAgICAgICAgICAgIGlmIGtleSBpbiBlOgogICAgICAgICAgICAgICAgICAgIHhzLmFwcGVuZChlWydzdGVwJ10gKiB4X3NjYWxlKQogICAgICAgICAgICAgICAgICAgIHlzLmFwcGVuZChlW2tleV0pCiAgICAgICAgICAgICAgICAgICAgc3RkX2tleSA9IGtleSArICdfc3RkJwogICAgICAgICAgICAgICAgICAgIGlmIHN0ZF9rZXkgaW4gZToKICAgICAgICAgICAgICAgICAgICAgICAgeXNfZXJyLmFwcGVuZChlW3N0ZF9rZXldKQogICAgICAgICAgICBpZiB4czoKICAgICAgICAgICAgICAgIGlmIG5wLmlzbmFuKHlzKS5hbnkoKToKICAgICAgICAgICAgICAgICAgICBwcmludCgnV2FybmluZzogTmFOIHt9IGZvciB7fScuZm9ybWF0KGtleSwgcGF0aCkpCiAgICAgICAgICAgICAgICBpZiBsZW4oeHMpID4gMiAqIG1heF9wb2ludHM6CiAgICAgICAgICAgICAgICAgICAgaW5kaWNlcyA9IChucC5hcmFuZ2UoMCwgbGVuKHhzKSAtIDEsIGxlbih4cykgLyBtYXhfcG9pbnRzKQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgLmFzdHlwZShucC5pbnQzMikpCiAgICAgICAgICAgICAgICAgICAgeHMgPSBucC5hcnJheSh4cylbaW5kaWNlc1sxOl1dCiAgICAgICAgICAgICAgICAgICAgeXMgPSBfc21vb3RoKHlzLCBpbmRpY2VzKQogICAgICAgICAgICAgICAgICAgIGlmIHlzX2VycjoKICAgICAgICAgICAgICAgICAgICAgICAgeXNfZXJyID0gX3Ntb290aCh5c19lcnIsIGluZGljZXMpCiAgICAgICAgICAgICAgICBsYWJlbCA9ICd7fToge30nLmZvcm1hdChwYXRoLCBrZXkpCiAgICAgICAgICAgICAgICBpZiBsYWJlbC5zdGFydHN3aXRoKCdfJyk6CiAgICAgICAgICAgICAgICAgICAgbGFiZWwgPSAnIC
cgKyBsYWJlbAogICAgICAgICAgICAgICAgaWYgeXNfZXJyOgogICAgICAgICAgICAgICAgICAgIHlzX2VyciA9IDEuOTYgKiBucC5hcnJheSh5c19lcnIpCiAgICAgICAgICAgICAgICAgICAgcGx0LmVycm9yYmFyKHhzLCB5cywgeWVycj15c19lcnIsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGZtdD0nLW8nLCBjYXBzaXplPTUsIGNhcHRoaWNrPTIsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGxhYmVsPWxhYmVsKQogICAgICAgICAgICAgICAgZWxzZToKICAgICAgICAgICAgICAgICAgICBwbHQucGxvdCh4cywgeXMsIGxhYmVsPWxhYmVsKQogICAgICAgICAgICAgICAgcGx0LmxlZ2VuZCgpCiAgICBpZiBuZXdmaWd1cmU6CiAgICAgICAgcGx0LmdyaWQoKQogICAgaWYgbGVnZW5kOgogICAgICAgIHBsdC5sZWdlbmQoKQogICAgaWYgdGl0bGU6CiAgICAgICAgcGx0LnRpdGxlKHRpdGxlKQogICAgaWYgcHJpbnRfa2V5czoKICAgICAgICBwcmludCgnRm91bmQga2V5czoge30nCiAgICAgICAgICAgICAgLmZvcm1hdCgnLCAnLmpvaW4oc29ydGVkKGFsbF9rZXlzIC0geydzdGVwJywgJ2R0J30pKSkpCgoKZGVmIF9zbW9vdGgoeXMsIGluZGljZXMpOgogICAgcmV0dXJuIFtucC5tZWFuKHlzW2lkeDogaW5kaWNlc1tpICsgMV1dKQogICAgICAgICAgICBmb3IgaSwgaWR4IGluIGVudW1lcmF0ZShpbmRpY2VzWzotMV0pXQo=', 
                    'imet/main.py': 'import argparse
from itertools import islice
import json
from pathlib import Path
import shutil
import warnings
from typing import Dict

import numpy as np
import pandas as pd
from sklearn.metrics import fbeta_score
from sklearn.exceptions import UndefinedMetricWarning
import torch
from torch import nn, cuda
from torch.optim import Adam
import tqdm

from . import models
from .dataset import TrainDataset, TTADataset, get_ids, N_CLASSES, DATA_ROOT
from .transforms import train_transform, test_transform
from .utils import (
    write_event, load_model, mean_df, ThreadingDataLoader as DataLoader,
    ON_KAGGLE)


def main():
    parser = argparse.ArgumentParser()
    arg = parser.add_argument
    arg('mode', choices=['train', 'validate', 'predict_valid', 'predict_test'])
    arg('run_root')
    arg('--model', default='resnet101')
    arg('--pretrained', type=int, default=1)
    arg('--batch-size', type=int, default=64)
    arg('--step', type=int, default=1)
    arg('--workers', type=int, default=2 if ON_KAGGLE else 4)
    arg('--lr', type=float, default=1e-4)
    arg('--patience', type=int, default=4)
    arg('--clean', action='store_true')
    arg('--n-epochs', type=int, default=100)
    arg('--epoch-size', type=int)
    arg('--tta', type=int, default=4)
    arg('--use-sample', action='store_true', help='use a sample of the dataset')
    arg('--debug', action='store_true')
    arg('--limit', type=int)
    arg('--fold', type=int, default=0)
    args = parser.parse_args()

    run_root = Path(args.run_root)
    folds = pd.read_csv('folds.csv')
    train_root = DATA_ROOT / ('train_sample' if args.use_sample else 'train')
    if args.use_sample:
        folds = folds[folds['id'].isin(set(get_ids(train_root)))]
    train_fold = folds[folds['fold'] != args.fold]
    valid_fold = folds[folds['fold'] == args.fold]
    if args.limit:
        train_fold = train_fold[:args.limit]
        valid_fold = valid_fold[:args.limit]

    def make_loader(df: pd.DataFrame, image_transform) -> DataLoader:
        return DataLoader(
            TrainDataset(train_root, df, image_transform, debug=args.debug),
            shuffle=True,
            batch_size=args.batch_size,
            num_workers=args.workers,
        )
    criterion = nn.BCEWithLogitsLoss(reduction='none')
    model = getattr(models, args.model)(
        num_classes=N_CLASSES, pretrained=args.pretrained)
    use_cuda = cuda.is_available()
    fresh_params = list(model.fresh_params())
    all_params = list(model.parameters())
    if use_cuda:
        model = model.cuda()

    if args.mode == 'train':
        if run_root.exists() and args.clean:
            shutil.rmtree(run_root)
        run_root.mkdir(exist_ok=True, parents=True)
        (run_root / 'params.json').write_text(
            json.dumps(vars(args), indent=4, sort_keys=True))

        train_loader = make_loader(train_fold, train_transform)
        valid_loader = make_loader(valid_fold, test_transform)
        print(f'{len(train_loader.dataset):,} items in train, '
              f'{len(valid_loader.dataset):,} in valid')

        train_kwargs = dict(
            args=args,
            model=model,
            criterion=criterion,
            train_loader=train_loader,
            valid_loader=valid_loader,
            patience=args.patience,
            init_optimizer=lambda params, lr: Adam(params, lr),
            use_cuda=use_cuda,
        )

        if args.pretrained:
            if train(params=fresh_params, n_epochs=1, **train_kwargs):
                train(params=all_params, **train_kwargs)
        else:
            train(params=all_params, **train_kwargs)

    elif args.mode == 'validate':
        valid_loader = make_loader(valid_fold, test_transform)
        load_model(model, run_root / 'model.pt')
        validation(model, criterion, tqdm.tqdm(valid_loader, desc='Validation'),
                   use_cuda=use_cuda)

    elif args.mode.startswith('predict'):
        load_model(model, run_root / 'best-model.pt')
        predict_kwargs = dict(
            batch_size=args.batch_size,
            tta=args.tta,
            use_cuda=use_cuda,
            workers=args.workers,
        )
        if args.mode == 'predict_valid':
            predict(model, df=valid_fold, root=train_root, out_path=run_root / 'val.h5', **predict_kwargs)
        elif args.mode == 'predict_test':
            test_root = DATA_ROOT / (
                'test_sample' if args.use_sample else 'test')
            ss = pd.read_csv(DATA_ROOT / 'sample_submission.csv')
            if args.use_sample:
                ss = ss[ss['id'].isin(set(get_ids(test_root)))]
            if args.limit:
                ss = ss[:args.limit]
            predict(model, df=ss, root=test_root,
                    out_path=run_root / 'test.h5',
                    **predict_kwargs)


def predict(model, root: Path, df: pd.DataFrame, out_path: Path,
            batch_size: int, tta: int, workers: int, use_cuda: bool):
    loader = DataLoader(
        dataset=TTADataset(root, df, test_transform, tta=tta),
        shuffle=False,
        batch_size=batch_size,
        num_workers=workers,
    )
    model.eval()
    all_outputs, all_ids = [], []
    with torch.no_grad():
        for inputs, ids in tqdm.tqdm(loader, desc='Predict'):
            if use_cuda:
                inputs = inputs.cuda()
            outputs = torch.sigmoid(model(inputs))
            all_outputs.append(outputs.data.cpu().numpy())
            all_ids.extend(ids)
    df = pd.DataFrame(
        data=np.concatenate(all_outputs),
        index=all_ids,
        columns=map(str, range(N_CLASSES)))
    df = mean_df(df)
    df.to_hdf(out_path, 'prob', index_label='id')
    print(f'Saved predictions to {out_path}')


def train(args, model: nn.Module, criterion, *, params,
          train_loader, valid_loader, init_optimizer, use_cuda,
          n_epochs=None, patience=2, max_lr_changes=2) -> bool:
    lr = args.lr
    n_epochs = n_epochs or args.n_epochs
    params = list(params)
    optimizer = init_optimizer(params, lr)

    run_root = Path(args.run_root)
    model_path = run_root / 'model.pt'
    best_model_path = run_root / 'best-model.pt'
    if model_path.exists():
        state = load_model(model, model_path)
        epoch = state['epoch']
        step = state['step']
        best_valid_loss = state['best_valid_loss']
    else:
        epoch = 1
        step = 0
        best_valid_loss = float('inf')
    lr_changes = 0

    save = lambda ep: torch.save({
        'model': model.state_dict(),
        'epoch': ep,
        'step': step,
        'best_valid_loss': best_valid_loss
    }, str(model_path))

    report_each = 10000
    log = run_root.joinpath('train.log').open('at', encoding='utf8')
    valid_losses = []
    lr_reset_epoch = epoch
    for epoch in range(epoch, n_epochs + 1):
        model.train()
        tq = tqdm.tqdm(total=(args.epoch_size or
                              len(train_loader) * args.batch_size))
        tq.set_description(f'Epoch {epoch}, lr {lr}')
        losses = []
        tl = train_loader
        if args.epoch_size:
            tl = islice(tl, args.epoch_size // args.batch_size)
        try:
            mean_loss = 0
            for i, (inputs, targets) in enumerate(tl):
                if use_cuda:
                    inputs, targets = inputs.cuda(), targets.cuda()
                outputs = model(inputs)
                loss = _reduce_loss(criterion(outputs, targets))
                batch_size = inputs.size(0)
                (batch_size * loss).backward()
                if (i + 1) % args.step == 0:
                    optimizer.step()
                    optimizer.zero_grad()
                    step += 1
                tq.update(batch_size)
                losses.append(loss.item())
                mean_loss = np.mean(losses[-report_each:])
                tq.set_postfix(loss=f'{mean_loss:.3f}')
                if i and i % report_each == 0:
                    write_event(log, step, loss=mean_loss)
            write_event(log, step, loss=mean_loss)
            tq.close()
            save(epoch + 1)
            valid_metrics = validation(model, criterion, valid_loader, use_cuda)
            write_event(log, step, **valid_metrics)
            valid_loss = valid_metrics['valid_loss']
            valid_losses.append(valid_loss)
            if valid_loss < best_valid_loss:
                best_valid_loss = valid_loss
                shutil.copy(str(model_path), str(best_model_path))
            elif (patience and epoch - lr_reset_epoch > patience and
                  min(valid_losses[-patience:]) > best_valid_loss):
                # "patience" epochs without improvement
                lr_changes +=1
                if lr_changes > max_lr_changes:
                    break
                lr /= 5
                print(f'lr updated to {lr}')
                lr_reset_epoch = epoch
                optimizer = init_optimizer(params, lr)
        except KeyboardInterrupt:
            tq.close()
            print('Ctrl+C, saving snapshot')
            save(epoch)
            print('done.')
            return False
    return True


def validation(
        model: nn.Module, criterion, valid_loader, use_cuda,
        ) -> Dict[str, float]:
    model.eval()
    all_losses, all_predictions, all_targets = [], [], []
    with torch.no_grad():
        for inputs, targets in valid_loader:
            all_targets.append(targets.numpy().copy())
            if use_cuda:
                inputs, targets = inputs.cuda(), targets.cuda()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            all_losses.append(_reduce_loss(loss).item())
            predictions = torch.sigmoid(outputs)
            all_predictions.append(predictions.cpu().numpy())
    all_predictions = np.concatenate(all_predictions)
    all_targets = np.concatenate(all_targets)

    def get_score(y_pred):
        with warnings.catch_warnings():
            warnings.simplefilter('ignore', category=UndefinedMetricWarning)
            return fbeta_score(
                all_targets, y_pred, beta=2, average='samples')

    metrics = {}
    argsorted = all_predictions.argsort(axis=1)
    for threshold in [0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.20]:
        metrics[f'valid_f2_th_{threshold:.2f}'] = get_score(
            binarize_prediction(all_predictions, threshold, argsorted))
    metrics['valid_loss'] = np.mean(all_losses)
    print(' | '.join(f'{k} {v:.3f}' for k, v in sorted(
        metrics.items(), key=lambda kv: -kv[1])))

    return metrics


def binarize_prediction(probabilities, threshold: float, argsorted=None,
                        min_labels=1, max_labels=10):
    """ Return matrix of 0/1 predictions, same shape as probabilities.
    """
    assert probabilities.shape[1] == N_CLASSES
    if argsorted is None:
        argsorted = probabilities.argsort(axis=1)
    max_mask = _make_mask(argsorted, max_labels)
    min_mask = _make_mask(argsorted, min_labels)
    prob_mask = probabilities > threshold
    return (max_mask & prob_mask) | min_mask


def _make_mask(argsorted, top_n: int):
    mask = np.zeros_like(argsorted, dtype=np.uint8)
    col_indices = argsorted[:, -top_n:].reshape(-1)
    row_indices = [i // top_n for i in range(len(col_indices))]
    mask[row_indices, col_indices] = 1
    return mask


def _reduce_loss(loss):
    return loss.sum() / loss.shape[0]


if __name__ == '__main__':
    main()
',
                    'setup.py': 'ZnJvbSBzZXR1cHRvb2xzIGltcG9ydCBzZXR1cAoKc2V0dXAoCiAgICBuYW1lPSdpbWV0JywKICAgIHBhY2thZ2VzPVsnaW1ldCddLAop'
                    }

for path, encoded in file_data.items():
    print(path)
    path = Path(path)
    path.parent.mkdir(exist_ok=True)
    path.write_bytes(base64.b64decode(encoded))


def run(command):
    os.system('export PYTHONPATH=${PYTHONPATH}:/kaggle/working && ' + command)


print(os.listdir('/kaggle/working/'))
run('python setup.py develop --install-dir /kaggle/working')

!cp '../input/public-0615/best-model.pt' '/kaggle/working/'

run('python setup.py develop --install-dir /kaggle/working')

run('python -m imet.make_folds --n-folds 40')
# run('python -m imet.main train model_1 --n-epochs 16')

run('python -m imet.main predict_test .')
run('python -m imet.make_submission ./test.h5 /kaggle/working/sub.csv --threshold 0.10')

print(os.listdir('.'))
imet/transforms.py
imet/make_submission.py
imet/models.py
imet/__init__.py
imet/make_folds.py
imet/dataset.py
imet/utils.py
imet/main.py
setup.py
['imet', '__notebook_source__.ipynb', '.ipynb_checkpoints', 'setup.py']
['imet.egg-link', 'imet', '__notebook_source__.ipynb', 'best-model.pt', '.ipynb_checkpoints', '__pycache__', 'setup.py', 'folds.csv', 'test.h5', 'site.py', 'sub.csv', 'easy-install.pth', 'imet.egg-info']
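
The `binarize_prediction` helper in the embedded `imet/main.py` keeps every label above the threshold, capped at the `max_labels` most probable ones, and always keeps at least the `min_labels` most probable ones. The following is a minimal NumPy sketch of that rule, not the kernel's own code: the names `top_n_mask` and `binarize` are hypothetical, and `np.put_along_axis` stands in for the original list-based row indexing.

```python
import numpy as np


def top_n_mask(probabilities: np.ndarray, top_n: int) -> np.ndarray:
    """Boolean mask marking the top_n most probable labels in each row."""
    argsorted = probabilities.argsort(axis=1)
    mask = np.zeros(probabilities.shape, dtype=bool)
    # The last top_n columns of argsorted hold the highest-probability labels.
    np.put_along_axis(mask, argsorted[:, -top_n:], True, axis=1)
    return mask


def binarize(probabilities: np.ndarray, threshold: float,
             min_labels: int = 1, max_labels: int = 10) -> np.ndarray:
    """Same rule as binarize_prediction: threshold, cap, and floor."""
    max_mask = top_n_mask(probabilities, max_labels)
    min_mask = top_n_mask(probabilities, min_labels)
    return (max_mask & (probabilities > threshold)) | min_mask


probs = np.array([[0.05, 0.02, 0.01],   # nothing clears the threshold
                  [0.90, 0.80, 0.05]])  # two labels clear it
result = binarize(probs, threshold=0.10, min_labels=1, max_labels=2)
```

In the first row no label exceeds the threshold, so only the single most probable label survives through `min_mask`.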
In [2]:
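attr_id_thr = 0.10  # same threshold that was passed to imet.make_submission above
try:
    test_preds_kon = pd.read_csv('sub.csv')
    attr_ids_kon = test_preds_kon['attribute_ids'].apply(lambda x: x.split()).values
except Exception as e:
    # Report the failure instead of silently continuing without attr_ids_kon,
    # which the ensembling cells below depend on.
    print(f'Could not load sub.csv: {e}')
    raise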
In [3]:
len(attr_ids_kon)
Out[3]:
7443
In [4]:
!rm -rf /kaggle/working/*

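This step makes SE-ResNeXt work in fastai. The install log below ends with a torchvision dependency error because the kernel has no internet access, but the pretrainedmodels egg is copied to site-packages before that point.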

In [5]:
import os
os.system('cp -r ../input/pretrained-models-cadene/pretrained_models_pytroch/ /kaggle/working/')
os.chdir('/kaggle/working/pretrained_models_pytroch/pretrained-models.pytorch-master')
!python setup.py install
running install
running bdist_egg
running egg_info
creating pretrainedmodels.egg-info
writing pretrainedmodels.egg-info/PKG-INFO
writing dependency_links to pretrainedmodels.egg-info/dependency_links.txt
writing requirements to pretrainedmodels.egg-info/requires.txt
writing top-level names to pretrainedmodels.egg-info/top_level.txt
writing manifest file 'pretrainedmodels.egg-info/SOURCES.txt'
reading manifest file 'pretrainedmodels.egg-info/SOURCES.txt'
writing manifest file 'pretrainedmodels.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/pretrainedmodels
copying pretrainedmodels/version.py -> build/lib/pretrainedmodels
copying pretrainedmodels/utils.py -> build/lib/pretrainedmodels
copying pretrainedmodels/__init__.py -> build/lib/pretrainedmodels
creating build/lib/pretrainedmodels/datasets
copying pretrainedmodels/datasets/utils.py -> build/lib/pretrainedmodels/datasets
copying pretrainedmodels/datasets/__init__.py -> build/lib/pretrainedmodels/datasets
copying pretrainedmodels/datasets/voc.py -> build/lib/pretrainedmodels/datasets
creating build/lib/pretrainedmodels/models
copying pretrainedmodels/models/polynet.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/nasnet.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/bninception.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/wideresnet.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/utils.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/__init__.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/torchvision_models.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/inceptionv4.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/vggm.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/xception.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/cafferesnet.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/resnext.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/fbresnet.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/inceptionresnetv2.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/dpn.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/senet.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/nasnet_mobile.py -> build/lib/pretrainedmodels/models
copying pretrainedmodels/models/pnasnet.py -> build/lib/pretrainedmodels/models
creating build/lib/pretrainedmodels/models/resnext_features
copying pretrainedmodels/models/resnext_features/__init__.py -> build/lib/pretrainedmodels/models/resnext_features
copying pretrainedmodels/models/resnext_features/resnext101_64x4d_features.py -> build/lib/pretrainedmodels/models/resnext_features
copying pretrainedmodels/models/resnext_features/resnext101_32x4d_features.py -> build/lib/pretrainedmodels/models/resnext_features
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/pretrainedmodels
creating build/bdist.linux-x86_64/egg/pretrainedmodels/datasets
copying build/lib/pretrainedmodels/datasets/utils.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/datasets
copying build/lib/pretrainedmodels/datasets/__init__.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/datasets
copying build/lib/pretrainedmodels/datasets/voc.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/datasets
copying build/lib/pretrainedmodels/version.py -> build/bdist.linux-x86_64/egg/pretrainedmodels
copying build/lib/pretrainedmodels/utils.py -> build/bdist.linux-x86_64/egg/pretrainedmodels
copying build/lib/pretrainedmodels/__init__.py -> build/bdist.linux-x86_64/egg/pretrainedmodels
creating build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/polynet.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/nasnet.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/bninception.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/wideresnet.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/utils.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/__init__.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/torchvision_models.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/inceptionv4.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/vggm.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/xception.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/cafferesnet.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
creating build/bdist.linux-x86_64/egg/pretrainedmodels/models/resnext_features
copying build/lib/pretrainedmodels/models/resnext_features/__init__.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models/resnext_features
copying build/lib/pretrainedmodels/models/resnext_features/resnext101_64x4d_features.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models/resnext_features
copying build/lib/pretrainedmodels/models/resnext_features/resnext101_32x4d_features.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models/resnext_features
copying build/lib/pretrainedmodels/models/resnext.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/fbresnet.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/inceptionresnetv2.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/dpn.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/senet.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/nasnet_mobile.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
copying build/lib/pretrainedmodels/models/pnasnet.py -> build/bdist.linux-x86_64/egg/pretrainedmodels/models
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/datasets/utils.py to utils.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/datasets/__init__.py to __init__.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/datasets/voc.py to voc.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/version.py to version.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/utils.py to utils.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/__init__.py to __init__.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/polynet.py to polynet.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/nasnet.py to nasnet.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/bninception.py to bninception.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/wideresnet.py to wideresnet.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/utils.py to utils.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/__init__.py to __init__.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/torchvision_models.py to torchvision_models.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/inceptionv4.py to inceptionv4.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/vggm.py to vggm.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/xception.py to xception.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/cafferesnet.py to cafferesnet.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/resnext_features/__init__.py to __init__.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/resnext_features/resnext101_64x4d_features.py to resnext101_64x4d_features.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/resnext_features/resnext101_32x4d_features.py to resnext101_32x4d_features.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/resnext.py to resnext.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/fbresnet.py to fbresnet.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/inceptionresnetv2.py to inceptionresnetv2.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/dpn.py to dpn.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/senet.py to senet.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/nasnet_mobile.py to nasnet_mobile.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/pretrainedmodels/models/pnasnet.py to pnasnet.cpython-36.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying pretrainedmodels.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pretrainedmodels.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pretrainedmodels.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pretrainedmodels.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pretrainedmodels.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating dist
creating 'dist/pretrainedmodels-0.7.4-py3.6.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing pretrainedmodels-0.7.4-py3.6.egg
Copying pretrainedmodels-0.7.4-py3.6.egg to /opt/conda/lib/python3.6/site-packages
Adding pretrainedmodels 0.7.4 to easy-install.pth file

Installed /opt/conda/lib/python3.6/site-packages/pretrainedmodels-0.7.4-py3.6.egg
Processing dependencies for pretrainedmodels==0.7.4
Searching for torchvision
Reading https://pypi.org/simple/torchvision/
Download error on https://pypi.org/simple/torchvision/: [Errno -3] Temporary failure in name resolution -- Some packages may not be found!
Couldn't find index page for 'torchvision' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/
Download error on https://pypi.org/simple/: [Errno -3] Temporary failure in name resolution -- Some packages may not be found!
No local packages or working download links found for torchvision
error: Could not find suitable distribution for Requirement.parse('torchvision')
In [6]:
import os
import pickle
import fastai

from fastai.vision import *
from fastai.vision.models.cadene_models import *

from shutil import copyfile
fastai.__version__
Out[6]:
'1.0.51'
In [7]:
os.chdir('/kaggle/working')
os.listdir('.')
Out[7]:
['.ipynb_checkpoints', 'pretrained_models_pytroch']
In [8]:
# Source: https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/78109
class FocalLoss(nn.Module):
    def __init__(self, gamma=2):
        super().__init__()
        self.gamma = gamma

    def forward(self, logit, target):
        target = target.float()
        max_val = (-logit).clamp(min=0)
        loss = logit - logit * target + max_val + \
               ((-max_val).exp() + (-logit - max_val).exp()).log()

        invprobs = F.logsigmoid(-logit * (target * 2.0 - 1.0))
        loss = (invprobs * self.gamma).exp() * loss
        if len(loss.size())==2:
            loss = loss.sum(dim=1)
        return loss.mean()
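
The `FocalLoss` above is a numerically stable form of binary cross-entropy scaled by `(1 - p_t) ** gamma`, where `p_t` is the probability assigned to the true class. As a rough check, this NumPy sketch compares that stable form per element with the textbook expression; the function names are hypothetical, and the reduction (sum over labels, mean over the batch) is left out.

```python
import numpy as np


def focal_loss_stable(logit, target, gamma=2.0):
    """Element-wise focal loss, mirroring the arithmetic of FocalLoss.forward."""
    max_val = np.clip(-logit, 0, None)
    # Stable binary cross-entropy with logits.
    bce = (logit - logit * target + max_val
           + np.log(np.exp(-max_val) + np.exp(-logit - max_val)))
    # logsigmoid(z) = -log(1 + exp(-z)); here it evaluates to log(1 - p_t).
    z = -logit * (target * 2.0 - 1.0)
    invprobs = -np.logaddexp(0.0, -z)
    return np.exp(invprobs * gamma) * bce


def focal_loss_textbook(logit, target, gamma=2.0):
    """-(1 - p_t)^gamma * log(p_t), written directly from the definition."""
    p = 1.0 / (1.0 + np.exp(-logit))
    p_t = np.where(target == 1, p, 1.0 - p)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)


logits = np.array([-3.0, -0.5, 0.0, 2.0, 5.0])
targets = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
stable = focal_loss_stable(logits, targets)
textbook = focal_loss_textbook(logits, targets)
```

The two agree to floating-point precision on these inputs. Well-classified elements (for example logit 5.0 with target 1) contribute almost nothing, which is the point of the focal weighting.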
In [9]:
path = Path('../input/imet-2019-fgvc6/') # iMet data path
In [10]:
BATCH  = 32
SIZE   = 256
In [11]:
from torch.utils import model_zoo
Path('models').mkdir(exist_ok=True)
In [12]:
test_df = pd.read_csv(path/'sample_submission.csv')
test_df.head()
Out[12]:
id attribute_ids
0 10023b2cc4ed5f68 0 1 2
1 100fbe75ed8fd887 0 1 2
2 101b627524a04f19 0 1 2
3 10234480c41284c6 0 1 2
4 1023b0e2636dcea8 0 1 2
In [13]:
labels_df = pd.read_csv(path/'labels.csv')
labels_df.head()
Out[13]:
attribute_id attribute_name
0 0 culture::abruzzi
1 1 culture::achaemenid
2 2 culture::aegean
3 3 culture::afghan
4 4 culture::after british
In [14]:
train_df = pd.read_csv(path/'train.csv')
train_df.head()
Out[14]:
id attribute_ids
0 1000483014d91860 147 616 813
1 1000fe2e667721fe 51 616 734 813
2 1001614cb89646ee 776
3 10041eb49b297c08 51 671 698 813 1092
4 100501c227f8beea 13 404 492 903 1093
In [15]:
t_preds = []

n_fold = 5

val_preds = np.zeros((2, 10923, 1103))
In [16]:
tfms = get_transforms(do_flip=True, flip_vert=False, max_rotate=0.10, max_zoom=1.5, max_warp=0.2, max_lighting=0.2,
                     xtra_tfms=[(symmetric_warp(magnitude=(-0,0), p=0)), rand_crop(p=0.75),])

train, test = [ImageList.from_df(df, path=path, cols='id', folder=folder, suffix='.png') 
               for df, folder in zip([train_df, test_df], ['train', 'test'])]


data = (train.split_by_rand_pct(0.1, seed=42)
        .label_from_df(cols='attribute_ids', label_delim=' ')
        .add_test(test)
        .transform(tfms, size=SIZE, resize_method=ResizeMethod.PAD, padding_mode='border',)
        .databunch(path=Path('.'), bs=BATCH).normalize(imagenet_stats))

Here starts the Ensemble magic

In [17]:
# model dictionary
model = {
    'seresnextkfolds_15': 'se_resnext50_32x4d-a260b3a4',
    'seresnextkfolds_11': 'se_resnext50_32x4d-a260b3a4',
    'resnet5022epochs': 'resnet50',
    'seresnextkfolds_16': 'se_resnext50_32x4d-a260b3a4',
    'seresnextkfolds_49': 'se_resnext50_32x4d-a260b3a4',
}
In [ ]:
for model_path in model.keys():
    m_path = 'stage-1.pth'
    
    if len(model_path.split('_')) > 1:
        m_path = 'seresnext-' + model_path.split('_')[1] + '.pth'
    print(model_path, m_path)

    copyfile('../input/' + model_path.split('_')[0] + '/' + m_path, 'models/'+ model[model_path] +'.pth')

    def load_url(*args, **kwargs):
        model_dir = Path('models')
        filename  = model[model_path] + '.pth'
        if not (model_dir/filename).is_file(): raise FileNotFoundError
        return torch.load(model_dir/filename)
    model_zoo.load_url = load_url
    
    arch = se_resnext50_32x4d
    if model[model_path] == 'resnet50':
        arch = models.resnet50

    learn = cnn_learner(data, base_arch=arch, loss_func=FocalLoss(), metrics=fbeta, pretrained=False)
    learn.load(model[model_path])
    
    preds = learn.TTA(ds_type=DatasetType.Test)
    np.save(model_path + '.npy', preds[0].numpy())
    t_preds.append(preds[0].numpy())
87.50% [7/8 10:55<01:33]
43.35% [101/233 00:42<00:54]
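
The loop above derives each checkpoint's source file from its dictionary key: the part before the underscore names the input dataset, and the part after it, if present, is the fold suffix. The sketch below isolates that string logic in a hypothetical helper, `checkpoint_paths`, which is not part of the kernel.

```python
from typing import Tuple


def checkpoint_paths(model_key: str, arch_name: str) -> Tuple[str, str]:
    """Return (source, destination) paths the way the loop builds them."""
    parts = model_key.split('_')
    dataset = parts[0]
    # Keys with a suffix point at 'seresnext-<suffix>.pth'; others at 'stage-1.pth'.
    filename = f'seresnext-{parts[1]}.pth' if len(parts) > 1 else 'stage-1.pth'
    source = f'../input/{dataset}/{filename}'
    destination = f'models/{arch_name}.pth'
    return source, destination


se_src, se_dst = checkpoint_paths('seresnextkfolds_15', 'se_resnext50_32x4d-a260b3a4')
rn_src, rn_dst = checkpoint_paths('resnet5022epochs', 'resnet50')
```

Because all four SE-ResNeXt entries share the same architecture name, each one is copied to the same destination file and overwrites the previous checkpoint before it is loaded.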
In [ ]:
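attr_ids = []
thrs = [0.27, 0.27, 0.28, 0.27, 0.29]  # one threshold per model, in model-dictionary order

# i2c is not defined anywhere in this kernel. Assumption: it maps each output
# column index to its attribute id, following the order of data.classes.
i2c = np.array([[i, c] for i, c in enumerate(data.classes)]).astype(int)

def preds_ids(preds, thr):
    # The predictions are treated as logits, so apply a sigmoid before thresholding
    probs = torch.tensor(preds).sigmoid()
    return [i2c[np.where(t == 1)[0], 1].astype(str) for t in (probs > thr).long()]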
In [ ]:
attr_ids = [preds_ids(t_preds[i], thrs[i]) for i in range(len(t_preds))]
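
To make the thresholding step concrete, here is a small NumPy sketch of what `preds_ids` does for one model: squash the logits with a sigmoid, keep the columns above the threshold, and translate column positions into attribute ids, which is the role of the `i2c` lookup. The function `logits_to_label_ids` and the toy `class_ids` array are assumptions for illustration only.

```python
import numpy as np


def logits_to_label_ids(logits: np.ndarray, thr: float, class_ids: np.ndarray):
    """For each sample, return the attribute ids whose sigmoid score exceeds thr."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    # class_ids[j] is the attribute id of output column j, so a boolean
    # row mask selects the ids of the predicted labels directly.
    return [class_ids[row > thr].astype(str) for row in probs]


toy_logits = np.array([[2.0, -2.0, 0.0],
                       [-3.0, -3.0, 3.0]])
toy_class_ids = np.array([10, 20, 30])  # column 0 -> id 10, column 1 -> id 20, ...
labels = logits_to_label_ids(toy_logits, thr=0.27, class_ids=toy_class_ids)
```

The column-to-id mapping matters because the column order of the model output need not match the numeric order of the attribute ids.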

Ensembling approach

I take the predictions from all models and, for every sample, count how often each attribute id is predicted.

For example, suppose that for sample 1 three models predict attr_id_1 and only one model predicts attr_id_2. Then attr_id_1 is much more likely to be the right one.

I compute this frequency for every attribute id and keep the ids whose frequency is greater than or equal to a threshold thr as the final predictions.
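
The voting rule described here can be sketched for a single sample with the standard library alone. The helper `vote` and the toy predictions are hypothetical; the kernel's own cell does the same counting with `np.unique` over six models and a threshold of 4/6.

```python
from collections import Counter
from typing import List, Sequence


def vote(per_model_labels: Sequence[Sequence[str]], min_votes: int) -> List[str]:
    """Keep the attribute ids predicted by at least min_votes models."""
    # set() guards against one model listing the same id twice.
    counts = Counter(label for labels in per_model_labels for label in set(labels))
    kept = [label for label, n in counts.items() if n >= min_votes]
    return sorted(kept, key=int)


# Six models' predictions for one sample (toy data).
sample_preds = [['1', '2'], ['1'], ['1', '3'], ['2', '1'], ['5'], ['2', '5']]
final_ids = vote(sample_preds, min_votes=4)
submission_value = ' '.join(final_ids)
```

Here id `1` receives four votes and is kept, while id `2` receives three and is dropped.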

In [ ]:
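attr_ids.append(attr_ids_kon)
thrs.append(attr_id_thr)

n_models = len(attr_ids)  # 5 fastai models + Konrad's kernel = 6
thr = 4 / n_models        # keep an id if at least 4 of the 6 models predict it

res_attr_ids = []
for i in range(len(attr_ids[0])):
    # Pool the ids predicted for sample i by every model
    id_s = np.concatenate([p_attr_id[i] for p_attr_id in attr_ids],
                         axis=None)
    id_s, counts = np.unique(list(id_s),
                             return_counts=True)
    counts = counts / n_models  # fraction of models that voted for each id
    pids = ' '.join(id_s[np.where(counts >= thr)])
    res_attr_ids.append(pids)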

print(len(res_attr_ids), len(attr_ids[0]))
In [ ]:
test_df.attribute_ids = res_attr_ids
test_df.to_csv('submission.csv', index=False)
In [ ]:
test_df.head()

Please upvote if you find the kernel interesting.

Note: I haven't revealed my private datasets, so I don't think this kernel should affect the standings. I have only revealed the approach behind my solution.

In [ ]:
!rm -rf /kaggle/working/pretrained_models_pytroch

If this notebook does not run after forking, copy the code and datasets into a new kernel. This appears to be a Kaggle-side issue.