HSV_Analysis

From: https://www.kaggle.com/d5195295/hsv-analysis

Author: DHTT

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import cv2
import os
In [2]:
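def hsv_analysis(img_path, show_pic=False):
    """Return the mean H, S, V of one image (OpenCV scale: H 0-179, S and V 0-255)."""
    image = cv2.imread(img_path)  # BGR order; None if the file cannot be read
    if image is None:
        raise FileNotFoundError(img_path)
    if show_pic:
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        plt.imshow(image_rgb)
        plt.show()

    hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    # Average each channel over all pixels (axis 0 = rows, axis 1 = columns).
    h, s, v = np.average(hsv_image, axis=(0, 1))
    return h, s, v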
# print(os.listdir('../input'))
train = pd.read_csv('../input/imet-2019-fgvc6/train.csv')
# read_len=len(train.id)
read_len=5  # preview only the first 5 pics
hsv_list=[]
for i in range(read_len):    
    img_path='../input/imet-2019-fgvc6/train/'+train.id[i]+".png"    
    hsv_list.append(hsv_analysis(img_path,show_pic=True))

hsv_list_sum=np.average(np.array(hsv_list),axis=0)
print(hsv_list_sum)
[ 13.92156456  45.17374113 180.26797291]
In [3]:
# the first 1000 pics
# read_len=train.id.__len__()
read_len=1000
hsv_list=[]
for i in range(read_len):    
    img_path='../input/imet-2019-fgvc6/train/'+train.id[i]+".png"    
    hsv_list.append(hsv_analysis(img_path,show_pic=False))

hsv_list_sum=np.average(np.array(hsv_list),axis=0)
print(hsv_list_sum)
np.save('hsv_list.h5', np.array(hsv_list))  # np.save appends ".npy", so the file is hsv_list.h5.npy
[ 28.43184678  39.25677386 165.16378383]
In [4]:
df = pd.DataFrame(hsv_list, columns=["Hue", "Saturation", "Brightness(Values)"])
sns.jointplot(x="Hue", y="Brightness(Values)", data=df)
Out[4]:
<seaborn.axisgrid.JointGrid at 0x7f5c488a0f98>

The distribution shows that the mean brightness (V) is fairly high, largely because the background is white.
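
To check how much the white background pulls the average up, one option is to leave near-white pixels out of the mean. The sketch below is not part of the original notebook: the function name `mean_hsv_without_background` and the thresholds `s_max` and `v_min` are illustrative assumptions, and the demo array is a made-up 2x2 image on OpenCV's HSV scale rather than real data.

```python
import numpy as np

def mean_hsv_without_background(hsv_image, s_max=30, v_min=200):
    """Average H, S, V over pixels that are not near-white.

    A pixel counts as background when its saturation is low and its
    value is high. s_max and v_min are illustrative, untuned thresholds.
    """
    hsv = np.asarray(hsv_image, dtype=np.float64)
    s, v = hsv[..., 1], hsv[..., 2]
    foreground = ~((s <= s_max) & (v >= v_min))
    if not foreground.any():
        return None  # the whole image looks like background
    return hsv[foreground].mean(axis=0)

# Made-up 2x2 HSV image: three white pixels and one darker, saturated pixel.
demo = np.array([[[0, 0, 255], [0, 0, 255]],
                 [[0, 0, 255], [15, 150, 120]]], dtype=np.uint8)
print(demo.reshape(-1, 3).mean(axis=0))   # plain average, dominated by white
print(mean_hsv_without_background(demo))  # average over the foreground pixel only
```

In the notebook, the array returned by `cv2.cvtColor(image, cv2.COLOR_BGR2HSV)` could be passed in place of `demo`.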

Most of the pictures have a mean hue in the 0-40 range. OpenCV stores 8-bit hue as 0-179, so this is roughly 0-80 degrees, red through yellow, indicating that the hue is warm, probably because the subjects are antiques.
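
One caveat when reading mean hue: hue is an angle, so reds sit at both ends of the 0-179 scale and a plain arithmetic average can land far from either of them. The sketch below shows a circular mean as an alternative. It is an illustration only, not what `hsv_analysis` does; the helper name `circular_mean_hue` is hypothetical and the two input values are made up.

```python
import math

def circular_mean_hue(hues, hue_max=180.0):
    """Circular mean of hue values on a [0, hue_max) scale (OpenCV 8-bit: 180)."""
    # Map each hue to an angle, average the unit vectors, map back.
    angles = [2.0 * math.pi * h / hue_max for h in hues]
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_angle = math.atan2(mean_sin, mean_cos)
    return (mean_angle * hue_max / (2.0 * math.pi)) % hue_max

reds = [2.0, 178.0]             # two reds on either side of the wrap-around
print(sum(reds) / len(reds))    # arithmetic mean: 90.0, which is cyan
print(circular_mean_hue(reds))  # circular mean: at the wrap-around point, i.e. red
```

For hues that stay well inside 0-40, as most of these pictures do, the two means agree closely, so the conclusion about warm tones is not affected.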

In [5]:
assert os.path.exists("../input/hsv-list-h5/hsv_list.h5.npy")
hsv_list=np.load("../input/hsv-list-h5/hsv_list.h5.npy")
df = pd.DataFrame(hsv_list, columns=["Hue", "Saturation", "Brightness(Values)"])
sns.jointplot(x="Hue", y="Brightness(Values)", data=df)
Out[5]:
<seaborn.axisgrid.JointGrid at 0x7f5c4722f8d0>

The plot above shows the distribution over all the pictures; it is basically the same as the distribution of the first 1000.
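
A rough numeric check of "basically the same" is to compare per-channel means of the two sets. This is a minimal sketch under assumptions: `mean_gap` is a hypothetical helper, and the rows in the demo are placeholder numbers, not values from the dataset. In the notebook, the two inputs would be the 1000-image `hsv_list` from In [3] and the array loaded in In [5].

```python
import numpy as np

def mean_gap(sample_rows, full_rows):
    """Absolute per-channel difference between the means of two (n, 3) HSV tables."""
    sample = np.asarray(sample_rows, dtype=np.float64)
    full = np.asarray(full_rows, dtype=np.float64)
    return np.abs(sample.mean(axis=0) - full.mean(axis=0))

# Placeholder rows of (H, S, V) means, for illustration only.
sample_demo = [[10, 40, 170], [30, 40, 160]]
full_demo = [[20, 40, 165], [20, 42, 165]]
print(mean_gap(sample_demo, full_demo))  # one gap per channel: H, S, V
```

Small gaps relative to each channel's range (0-179 for hue, 0-255 for saturation and value) would support the visual impression from the two joint plots.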