diff --git a/.gitignore b/.gitignore
index ee2b73e4c7b03f41cd226670391b8705a1e88935..d5c4948bc1ef7bfb673b1a56d962132072ba5cce 100644
--- a/.gitignore
+++ b/.gitignore
@@ -137,5 +137,4 @@ dmypy.json
 /python_test.py
 *.pth
 *.edf
-*log*
 *.png
\ No newline at end of file
diff --git a/README.md b/README.md
index a0a408354aee765b77f94ddc84736be06bf04be2..f5bdf340d29caa7c11e0e3895df056e0eaf3a446 100644
--- a/README.md
+++ b/README.md
@@ -1,88 +1,52 @@
 # candock
-[This was originally a log repo for my graduation project](), whose purpose was to try a variety of deep neural network architectures (such as LSTM, ResNet and DFCNN) for automatic sleep stage classification of single-channel EEG.<br>
The graduation project is now finished, and I will keep maintaining this repo. The focus will shift to putting the code to practical use, weighing computational cost against accuracy. Some pretrained models will also be provided for convenience.<br>
We also believe this code can be used to classify other physiological signals (such as ECG and EMG). We hope it helps your research or project.<br>
-![image](https://github.com/HypoX64/candock/blob/master/image/compare.png)
+[This was originally a log repo for my graduation project](), whose purpose was to try a variety of deep neural network architectures (such as LSTM, ResNet and DFCNN) for automatic sleep stage classification of single-channel EEG.<br>
The focus has now shifted to building a general framework for analyzing and classifying one-dimensional time-series signals.<br>
It will cover multiple network architectures and provide data preprocessing, loading, training, evaluation and testing.<br>
+Some sample outputs from training: [heatmap](./image/heatmap_eg.png) [running_err](./image/running_err_eg.png) [log.txt](./docs/log_eg.txt)
## Note
-To fit the new project, the code has been heavily changed, so sleep datasets such as sleep-edfx are no longer guaranteed to run. If you still need to run them, load the data yourself following the input format standard; I will fix this when I have time.
-Of course, you can also simply use the [old version](https://github.com/HypoX64/candock/tree/f24cc44933f494d2235b3bf965a04cde5e6a1ae9)
-```python
-'''
-#input data format
-change your own data to train
-but the data needs to meet the following conditions:
-1.type numpydata signals:np.float16 labels:np.int16
-2.shape signals:[num,ch,length] labels:[num]
-'''
+To fit the new project, the code has been heavily changed, so sleep datasets such as sleep-edfx are no longer guaranteed to run. If you still need to run them, load the data yourself following the input format described below; I will fix this when I have time.
+Of course, to load the sleep datasets you can also simply use the [old version](https://github.com/HypoX64/candock/tree/f24cc44933f494d2235b3bf965a04cde5e6a1ae9)
+
+## Getting Started
+### Prerequisites
+- Linux, Windows, macOS
+- CPU or NVIDIA GPU + CUDA CuDNN
+- Python 3
+- PyTorch 1.0+
+### Dependencies
+This code depends on torchvision, numpy, scipy and matplotlib, available via pip install.<br>
+For example:
+
+```bash
+pip3 install matplotlib
```
-
-## How to run
-If you need to run this code (to train your own model, or to use a pretrained model to predict on your own data), see the page below<br>
-[How to run codes](https://github.com/HypoX64/candock/blob/master/how_to_run.md)
-
-## Datasets
-Training used two public sleep datasets: [[CinC Challenge 2018]](https://physionet.org/physiobank/database/challenge/2018/#files) [[sleep-edfx]](https://www.physionet.org/physiobank/database/sleep-edfx/)<br>
-For the CinC Challenge 2018 dataset we use only the C4-M1 channel; for the sleep-edfx and sleep-edf datasets we use the Fpz-Cz channel<br>
-Note:<br>
-1.To obtain pretrained models for other EEG channels, download these two datasets and train with train.py. Of course, you can also train a model on your own data.<br>
-2.For the sleep-edfx dataset we keep only the sleep interval from 30 minutes before falling asleep to 30 minutes after waking (marked 'select sleep time' in the results), to balance the proportions of the sleep stages and speed up training.<br>
-
-## Some notes
-* Data preprocessing<br>
-
- 1.Downsampling: EEG signals from the CinC Challenge 2018 dataset are downsampled to 100Hz<br>
-
- 2.Normalization: we recommend 5th-95th percentile normalization of each subject's EEG, i.e. the 5th percentile maps to 0 and the 95th percentile maps to 1. Note: all pretrained models were trained on data normalized this way<br>
-
- 3.The data is split into 30s epochs as inputs, 3000 data points each. There are 5 sleep stage labels: N3, N2, N1, REM, W; each epoch gets one label. Label mapping: N3(S4+S3)->0 N2->1 N1->2 REM->3 W->4<br>
-
- 4.Data augmentation: during training, every epoch is randomly cropped, randomly flipped, randomly rescaled in amplitude, etc.<br>
-
- 5.For different network architectures the raw EEG is preprocessed into different shapes:<br>
- LSTM: the 30s EEG is FIR band-pass filtered into θ, σ, α, δ and β bands, which are concatenated as the input data<br>
- CNN_1d models (networks marked 1d): nothing special, really just the usual image-domain models with their convolutions swapped for Conv1d<br>
- DFCNN models (the iFLYTEK idea: convert the signal to a spectrogram first, then apply image classification models): the 30s EEG goes through a short-time Fourier transform to produce a spectrogram, which is classified with an image classification network. We do not recommend this approach, since generating spectrograms costs considerable computation.<br>
-
-* EEG spectrograms<br>
- Spectrograms of the 5 sleep stages, shown in order: Wake, Stage 1, Stage 2, Stage 3, REM<br>
- ![image](https://github.com/HypoX64/candock/blob/master/image/spectrum_Wake.png) - ![image](https://github.com/HypoX64/candock/blob/master/image/spectrum_Stage1.png) - ![image](https://github.com/HypoX64/candock/blob/master/image/spectrum_Stage2.png) - ![image](https://github.com/HypoX64/candock/blob/master/image/spectrum_Stage3.png) - ![image](https://github.com/HypoX64/candock/blob/master/image/spectrum_REM.png)
-
-* multi_scale_resnet_1d network architecture<br>
- This network is based on [geekfeiw / Multi-Scale-1D-ResNet](https://github.com/geekfeiw/Multi-Scale-1D-ResNet); we call it micro_multi_scale_resnet_1d<br>
- The modified [network architecture](https://github.com/HypoX64/candock/blob/master/image/multi_scale_resnet_1d_network.png)<br>
-
-* On cross-validation<br>
For better practical applicability we use subject-wise cross-validation, i.e. the training and validation data come from different subjects. Note that every subject in the sleep-edfx dataset has two recordings; we treat both as the same subject, a point many papers overlook.<br>
-
-* On evaluation metrics<br>
- For each sleep stage label: Accuracy = (TP+TN)/(TP+FN+TN+FP) Recall = sensitivity = (TP)/(TP+FN)<br>
- Overall: Top1 err. and Kappa; Acc and Recall are also averaged<br>
- Note: the label distribution in this task is extremely imbalanced, so to be more convincing our averages are unweighted.
-
-## Some experimental results
-This section will be updated continuously... ...<br>
-[[Confusion matrix]](https://github.com/HypoX64/candock/blob/master/confusion_mat)
-
-#### Subject Cross-Validation Results
-Note: the label distribution in this task is extremely imbalanced, so we hand-tuned the class weights of the classification loss. This slightly improves Average Recall but also raises the overall error. With the default weights, Top1 err. drops by at least 5%, but the recall of the tiny N1 stage plunges by 20%, which is definitely not what we want in practice. All results below were obtained with the tuned weights.<br>
-* [sleep-edfx](https://www.physionet.org/physiobank/database/sleep-edfx/) ->sample size = 197, select sleep time
-
-| Network | Parameters | Top1.err. | Avg. Acc. | Avg. Re. | Need to extract feature |
-| --------------------------- | ---------- | --------- | --------- | -------- | ----------------------- |
-| lstm | 1.25M | 26.32% | 89.47% | 68.57% | Yes |
-| micro_multi_scale_resnet_1d | 2.11M | 25.33% | 89.87% | 72.61% | No |
-| resnet18_1d | 3.85M | 24.21% | 90.31% | 72.87% | No |
-| multi_scale_resnet_1d | 8.42M | 24.01% | 90.40% | 72.37% | No |
-* [CinC Challenge 2018](https://physionet.org/physiobank/database/challenge/2018/#files) ->sample size = 994
-
-| Network | Parameters | Top1.err. | Avg. Acc. | Avg. Re. | Need to extract feature |
-| --------------------------- | ---------- | --------- | --------- | -------- | ----------------------- |
-| lstm | 1.25M | 26.85% | 89.26% | 71.39% | Yes |
-| micro_multi_scale_resnet_1d | 2.11M | 27.01% | 89.20% | 73.12% | No |
-| resnet18_1d | 3.85M | 25.84% | 89.66% | 73.32% | No |
-| multi_scale_resnet_1d | 8.42M | 25.27% | 89.89% | 73.63% | No |
+### Clone this repo:
+```bash
+git clone https://github.com/HypoX64/candock
+cd candock
```
-
+### Download dataset and pretrained-model
+[[Google Drive]](https://drive.google.com/open?id=1NTtLmT02jqlc81lhtzQ7GlPK8epuHfU5) [[百度云,y4ks]](https://pan.baidu.com/s/1WKWZL91SekrSlhOoEC1bQA)
+
+* This dataset consists of signals.npy (shape: 18207, 1, 2000) and labels.npy (shape: 18207), which can be loaded with "np.load()".
+* samples: 18207, channels: 1, length of each sample: 2000, classes: 50
+* Top1 err: 2.09%
+### Train
+```bash
+python3 train.py --label 50 --input_nc 1 --dataset_dir ./datasets/simple_test --save_dir ./checkpoints/simple_test --model_name micro_multi_scale_resnet_1d --gpu_id 0 --batchsize 64 --k_fold 5
+```
+* For more [options](./options.py).
+#### Use your own data to train
+* step1: Generate signals.npy and labels.npy in the following format (a save sketch follows step 2).
+```python
+#1.type:numpy arrays signals:np.float64 labels:np.int64
+#2.shape signals:[num,ch,length] labels:[num]
+#num: number of samples, ch: number of channels, length: length of each sample
+#for example:
+signals = np.zeros((10,1,10),dtype=np.float64)
+labels = np.array([0,0,0,0,0,1,1,1,1,1]) #0->class0 1->class1
+```
+* step2: input ```--dataset_dir your_dataset_dir``` when running the code.<br>
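+
+A minimal sketch of generating and saving a toy dataset in this format (an illustration, not a script shipped with the repo; `datasets/my_dataset` is a placeholder path):
+```python
+import os
+import numpy as np
+
+out_dir = 'datasets/my_dataset'  # pass this path as --dataset_dir
+os.makedirs(out_dir, exist_ok=True)
+signals = np.random.randn(10, 1, 2000).astype(np.float64)          # [num, ch, length]
+labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=np.int64)  # [num]
+np.save(os.path.join(out_dir, 'signals.npy'), signals)
+np.save(os.path.join(out_dir, 'labels.npy'), labels)
+```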
+### Test +```bash +python3 simple_test.py --label 50 --input_nc 1 --model_name micro_multi_scale_resnet_1d --gpu_id 0 ``` \ No newline at end of file diff --git a/checkpoints/pretrained/pretrained_model_URL b/checkpoints/pretrained/pretrained_model_URL index 0a25140dcf0c5eb545231c2553b2f08a41f7de64..49970f5015a0aff88769419d4a6fa189d16577b8 100644 --- a/checkpoints/pretrained/pretrained_model_URL +++ b/checkpoints/pretrained/pretrained_model_URL @@ -1 +1,4 @@ -https://drive.google.com/open?id=1pup2_tZFGQQwB-hoXRjpMxiD4Vmpn0Lf \ No newline at end of file +google drive: https://drive.google.com/open?id=1pup2_tZFGQQwB-hoXRjpMxiD4Vmpn0Lf +百度云: https://pan.baidu.com/s/1WKWZL91SekrSlhOoEC1bQA key:y4ks + + diff --git a/datasets/simple_test/test_data_URL b/datasets/simple_test/test_data_URL index 0a25140dcf0c5eb545231c2553b2f08a41f7de64..435cfbce2bfd0ec44984c3982d61c1b903b6724a 100644 --- a/datasets/simple_test/test_data_URL +++ b/datasets/simple_test/test_data_URL @@ -1 +1,2 @@ -https://drive.google.com/open?id=1pup2_tZFGQQwB-hoXRjpMxiD4Vmpn0Lf \ No newline at end of file +google drive: https://drive.google.com/open?id=1pup2_tZFGQQwB-hoXRjpMxiD4Vmpn0Lf +百度云: https://pan.baidu.com/s/1WKWZL91SekrSlhOoEC1bQA key:y4ks diff --git a/how_to_run.md b/docs/how_to_run(sleep_stage).md similarity index 100% rename from how_to_run.md rename to docs/how_to_run(sleep_stage).md diff --git a/docs/log_eg.txt b/docs/log_eg.txt new file mode 100644 index 0000000000000000000000000000000000000000..2a740a6db25ba0f666ca498191fb97108d3f56f6 --- /dev/null +++ b/docs/log_eg.txt @@ -0,0 +1,270 @@ +Sun Mar 22 00:30:40 2020 +----------------- Options --------------- + BID: not-supported [default: 5_95_th] + batchsize: 16 [default: 64] + continue_train: False + dataset_dir: /home/hypo/MyProject/Ear_AU/datasets/emotion/candock_6class_60s_pad_selectlabel [default: ./datasets/sleep-edfx/] + dataset_name: preload + epochs: 150 [default: 20] + gpu_id: 1 [default: 0] + input_nc: 5 [default: 3] + k_fold: 5 [default: 0] + label: 6 [default: 5] + label_name: ['Amus', 'Neut', 'Sadn', 'Tend', 'Disg', 'Fear'] [default: auto] + lr: 0.001 + model_name: multi_scale_resnet_1d [default: lstm] + network_save_freq: 1000 [default: 5] + no_cuda: False + no_cudnn: False + no_shuffle: False + pretrained: False + sample_num: not-supported [default: 20] + save_dir: ./checkpoints/EMDB_5ch_6class_last60s_pad_weightauto_selectlabel_multiscale [default: ./checkpoints/] + select_sleep_time: not-supported [default: False] + separated: False + signal_name: not-supported [default: EEG Fpz-Cz] + weight_mod: auto [default: normal] +----------------- End ------------------- +network: +Multi_Scale_ResNet( + (pre_conv): Sequential( + (0): Conv1d(5, 64, kernel_size=(15,), stride=(2,), padding=(7,), bias=False) + (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): MaxPool1d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) + ) + (Route1): Route( + (block1): ResidualBlock( + (conv): Sequential( + (0): Conv1d(64, 64, kernel_size=(3,), stride=(1,), padding=(1,), bias=False) + (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(64, 64, kernel_size=(3,), stride=(1,), padding=(1,), bias=False) + (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + (shortcut): Sequential( + (0): Conv1d(64, 64, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, 
affine=True, track_running_stats=True) + ) + ) + (block2): ResidualBlock( + (conv): Sequential( + (0): Conv1d(64, 128, kernel_size=(3,), stride=(2,), padding=(1,), bias=False) + (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(128, 128, kernel_size=(3,), stride=(1,), padding=(1,), bias=False) + (4): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + (shortcut): Sequential( + (0): Conv1d(64, 128, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + ) + (block3): ResidualBlock( + (conv): Sequential( + (0): Conv1d(128, 256, kernel_size=(3,), stride=(2,), padding=(1,), bias=False) + (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,), bias=False) + (4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + (shortcut): Sequential( + (0): Conv1d(128, 256, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + ) + (block4): ResidualBlock( + (conv): Sequential( + (0): Conv1d(256, 512, kernel_size=(3,), stride=(2,), padding=(1,), bias=False) + (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,), bias=False) + (4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + (shortcut): Sequential( + (0): Conv1d(256, 512, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + ) + (avgpool): AdaptiveAvgPool1d(output_size=1) + ) + (Route2): Route( + (block1): ResidualBlock( + (conv): Sequential( + (0): Conv1d(64, 64, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) + (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(64, 64, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) + (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + (shortcut): Sequential( + (0): Conv1d(64, 64, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + ) + (block2): ResidualBlock( + (conv): Sequential( + (0): Conv1d(64, 128, kernel_size=(5,), stride=(2,), padding=(2,), bias=False) + (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(128, 128, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) + (4): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + (shortcut): Sequential( + (0): Conv1d(64, 128, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + ) + (block3): ResidualBlock( + (conv): Sequential( + (0): Conv1d(128, 256, kernel_size=(5,), stride=(2,), padding=(2,), bias=False) + (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) + (4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + 
(shortcut): Sequential( + (0): Conv1d(128, 256, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + ) + (block4): ResidualBlock( + (conv): Sequential( + (0): Conv1d(256, 512, kernel_size=(5,), stride=(2,), padding=(2,), bias=False) + (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) + (4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + (shortcut): Sequential( + (0): Conv1d(256, 512, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + ) + (avgpool): AdaptiveAvgPool1d(output_size=1) + ) + (Route3): Route( + (block1): ResidualBlock( + (conv): Sequential( + (0): Conv1d(64, 64, kernel_size=(7,), stride=(1,), padding=(3,), bias=False) + (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(64, 64, kernel_size=(7,), stride=(1,), padding=(3,), bias=False) + (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + (shortcut): Sequential( + (0): Conv1d(64, 64, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + ) + (block2): ResidualBlock( + (conv): Sequential( + (0): Conv1d(64, 128, kernel_size=(7,), stride=(2,), padding=(3,), bias=False) + (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(128, 128, kernel_size=(7,), stride=(1,), padding=(3,), bias=False) + (4): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + (shortcut): Sequential( + (0): Conv1d(64, 128, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + ) + (block3): ResidualBlock( + (conv): Sequential( + (0): Conv1d(128, 256, kernel_size=(7,), stride=(2,), padding=(3,), bias=False) + (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(256, 256, kernel_size=(7,), stride=(1,), padding=(3,), bias=False) + (4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + (shortcut): Sequential( + (0): Conv1d(128, 256, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + ) + (block4): ResidualBlock( + (conv): Sequential( + (0): Conv1d(256, 512, kernel_size=(7,), stride=(2,), padding=(3,), bias=False) + (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + (2): ReLU(inplace=True) + (3): Conv1d(512, 512, kernel_size=(7,), stride=(1,), padding=(3,), bias=False) + (4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + (shortcut): Sequential( + (0): Conv1d(256, 512, kernel_size=(1,), stride=(2,), bias=False) + (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) + ) + ) + (avgpool): AdaptiveAvgPool1d(output_size=1) + ) + (fc): Linear(in_features=1536, out_features=6, bias=True) +) +net parameters: 8.42M +label statistics: [715 643 518 254 517 436] +Loss_weight:[0.81885882 0.87833147 0.99945861 1.39877569 1.00054139 1.09602268] 
+------------------------------ k-fold:1 ------------------------------
+>>> per epoch cost time:99.97s
+fold -> macro-prec,reca,F1,err,kappa: (0.3178, 0.3181, 0.2995, 0.6234, 0.2222)
+confusion_mat:
+[[64 42 4 3 16 6]
+ [18 82 8 5 20 5]
+ [24 41 21 2 16 6]
+ [12 18 4 0 7 2]
+ [ 8 24 7 0 50 10]
+ [ 8 28 6 2 27 12]]
+
+------------------------------ k-fold:2 ------------------------------
+>>> per epoch cost time:96.49s
+fold -> macro-prec,reca,F1,err,kappa: (0.3149, 0.3155, 0.3127, 0.6464, 0.2092)
+confusion_mat:
+[[71 21 18 12 14 11]
+ [13 56 22 6 7 11]
+ [17 28 30 7 9 10]
+ [15 11 4 4 8 7]
+ [ 9 23 15 0 33 27]
+ [ 9 16 15 5 23 21]]
+
+------------------------------ k-fold:3 ------------------------------
+>>> per epoch cost time:95.27s
+fold -> macro-prec,reca,F1,err,kappa: (0.3436, 0.3481, 0.3369, 0.6036, 0.2566)
+confusion_mat:
+[[86 17 18 8 13 2]
+ [16 53 24 4 15 7]
+ [24 20 38 3 9 5]
+ [15 16 14 3 7 2]
+ [10 18 12 4 49 12]
+ [11 23 16 2 20 12]]
+
+------------------------------ k-fold:4 ------------------------------
+>>> per epoch cost time:50.3s
+fold -> macro-prec,reca,F1,err,kappa: (0.349, 0.354, 0.3469, 0.6102, 0.2523)
+confusion_mat:
+[[73 21 13 12 19 4]
+ [27 44 18 9 12 14]
+ [26 14 33 5 15 11]
+ [15 17 7 4 5 6]
+ [12 12 3 4 61 11]
+ [ 9 5 7 5 33 22]]
+
+------------------------------ k-fold:5 ------------------------------
+>>> per epoch cost time:49.65s
+fold -> macro-prec,reca,F1,err,kappa: (0.3306, 0.3352, 0.3303, 0.6217, 0.2363)
+confusion_mat:
+[[68 18 18 9 20 7]
+ [17 61 29 5 13 12]
+ [18 23 30 5 14 7]
+ [ 9 14 13 2 5 3]
+ [11 14 5 3 45 19]
+ [ 9 17 11 3 27 24]]
+
+------------------------------ final result ------------------------------
+final -> macro-prec,reca,F1,err,kappa: (0.3299, 0.3345, 0.3284, 0.6211, 0.2357)
+confusion_mat:
+[[362 119 71 44 82 30]
+ [ 91 296 101 29 67 49]
+ [109 126 152 22 63 39]
+ [ 66 76 42 13 32 20]
+ [ 50 91 42 11 238 79]
+ [ 46 89 55 17 130 91]]
+
diff --git a/docs/sleep_stage.md b/docs/sleep_stage.md
new file mode 100644
index 0000000000000000000000000000000000000000..7d31463e1590fc272a13603e05d0938bc97916c7
--- /dev/null
+++ b/docs/sleep_stage.md
@@ -0,0 +1,71 @@
+# candock
+[This was originally a log repo for my graduation project](), whose purpose was to try a variety of deep neural network architectures (such as LSTM, ResNet and DFCNN) for automatic sleep stage classification of single-channel EEG.<br>
The graduation project is now finished, and I will keep maintaining this repo. The focus will shift to putting the code to practical use, weighing computational cost against accuracy. Some pretrained models will also be provided for convenience.<br>
We also believe this code can be used to classify other physiological signals (such as ECG and EMG). We hope it helps your research or project.<br>
+![image](../image/compare.png)
+
+## How to run
+If you need to run this code (to train your own model, or to use a pretrained model to predict on your own data), see the page below<br>
+[How to run codes](./how_to_run(sleep_stage).md)
+
+## Datasets
+Training used two public sleep datasets: [[CinC Challenge 2018]](https://physionet.org/physiobank/database/challenge/2018/#files) [[sleep-edfx]](https://www.physionet.org/physiobank/database/sleep-edfx/)<br>
+For the CinC Challenge 2018 dataset we use only the C4-M1 channel; for the sleep-edfx and sleep-edf datasets we use the Fpz-Cz channel<br>
+Note:<br>
+1.To obtain pretrained models for other EEG channels, download these two datasets and train with train.py. Of course, you can also train a model on your own data.<br>
+2.For the sleep-edfx dataset we keep only the sleep interval from 30 minutes before falling asleep to 30 minutes after waking (marked 'select sleep time' in the results), to balance the proportions of the sleep stages and speed up training.<br>
+
+## Some notes
+* Data preprocessing<br>
+
+ 1.Downsampling: EEG signals from the CinC Challenge 2018 dataset are downsampled to 100Hz<br>
+
+ 2.Normalization: we recommend 5th-95th percentile normalization of each subject's EEG, i.e. the 5th percentile maps to 0 and the 95th percentile maps to 1 (see the sketch after this list). Note: all pretrained models were trained on data normalized this way<br>
+
+ 3.The data is split into 30s epochs as inputs, 3000 data points each. There are 5 sleep stage labels: N3, N2, N1, REM, W; each epoch gets one label. Label mapping: N3(S4+S3)->0 N2->1 N1->2 REM->3 W->4<br>
+
+ 4.Data augmentation: during training, every epoch is randomly cropped, randomly flipped, randomly rescaled in amplitude, etc.<br>
+
+ 5.For different network architectures the raw EEG is preprocessed into different shapes:<br>
+ LSTM: the 30s EEG is FIR band-pass filtered into θ, σ, α, δ and β bands, which are concatenated as the input data<br>
+ CNN_1d models (networks marked 1d): nothing special, really just the usual image-domain models with their convolutions swapped for Conv1d<br>
+ DFCNN models (the iFLYTEK idea: convert the signal to a spectrogram first, then apply image classification models): the 30s EEG goes through a short-time Fourier transform to produce a spectrogram, which is classified with an image classification network. We do not recommend this approach, since generating spectrograms costs considerable computation.<br>
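+
+ A minimal numpy/scipy sketch of items 2 and 4 above, plus the per-network inputs from item 5. This is an illustration only, not the repo's transformer module, and the helper names are hypothetical:
+```python
+import numpy as np
+from scipy import signal as sig
+
+LABEL_MAP = {'N3': 0, 'N2': 1, 'N1': 2, 'REM': 3, 'W': 4}  # N3 merges S3+S4
+
+def percentile_normalize(eeg, low=5, high=95):
+    # 5th-95th percentile normalization, applied per subject (item 2)
+    p_lo, p_hi = np.percentile(eeg, [low, high])
+    return (eeg - p_lo) / (p_hi - p_lo)
+
+def augment(epoch, rng=np.random):
+    # random flip, amplitude change and crop-with-padding (item 4)
+    if rng.rand() < 0.5:
+        epoch = epoch[::-1].copy()
+    epoch = epoch * rng.uniform(0.9, 1.1)
+    shift = rng.randint(0, 100)
+    return np.pad(epoch[shift:], (0, shift))  # keep the length at 3000
+
+epoch = percentile_normalize(np.random.randn(3000))  # one 30s epoch at 100Hz
+# item 5: FIR band-pass for the LSTM input (theta band shown) ...
+theta = sig.lfilter(sig.firwin(101, (4, 8), pass_zero=False, fs=100), 1.0, epoch)
+# ... or an STFT spectrogram for the DFCNN input
+f, t, spectrogram = sig.stft(epoch, fs=100, nperseg=128)
+```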
+
+* EEG spectrograms<br>
+ Spectrograms of the 5 sleep stages, shown in order: Wake, Stage 1, Stage 2, Stage 3, REM<br>
+ ![image](../image/spectrum_Wake.png)![image](../image/spectrum_Stage1.png)![image](../image/spectrum_Stage2.png)![image](../image/spectrum_Stage3.png)![image](../image/spectrum_REM.png)
+
+* multi_scale_resnet_1d network architecture<br>
+ This network is based on [geekfeiw / Multi-Scale-1D-ResNet](https://github.com/geekfeiw/Multi-Scale-1D-ResNet); we call it micro_multi_scale_resnet_1d<br>
+ The modified [network architecture](https://github.com/HypoX64/candock/blob/master/image/multi_scale_resnet_1d_network.png)<br>
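+
+ A toy PyTorch sketch of the multi-scale idea (parallel routes with different kernel sizes whose features are concatenated); this is an illustration, not the repo's exact implementation:
+```python
+import torch
+import torch.nn as nn
+
+class ToyMultiScale(nn.Module):
+    def __init__(self, in_ch=1, n_class=5):
+        super().__init__()
+        # three parallel routes with kernel sizes 3/5/7
+        self.routes = nn.ModuleList([
+            nn.Sequential(nn.Conv1d(in_ch, 64, k, padding=k // 2),
+                          nn.BatchNorm1d(64), nn.ReLU(),
+                          nn.AdaptiveAvgPool1d(1))
+            for k in (3, 5, 7)])
+        self.fc = nn.Linear(64 * 3, n_class)
+
+    def forward(self, x):  # x: [batch, ch, length]
+        feats = [route(x).flatten(1) for route in self.routes]
+        return self.fc(torch.cat(feats, dim=1))
+```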
+
+* On cross-validation<br>
For better practical applicability we use subject-wise cross-validation, i.e. the training and validation data come from different subjects. Note that every subject in the sleep-edfx dataset has two recordings; we treat both as the same subject, a point many papers overlook.<br>
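+
+ A minimal sketch of subject-wise splitting (an illustration, not the repo's own k-fold code) that keeps both recordings of a sleep-edfx subject in the same fold:
+```python
+import numpy as np
+from sklearn.model_selection import GroupKFold
+
+signals = np.zeros((6, 1, 3000))         # [num, ch, length]
+labels = np.array([0, 1, 2, 3, 4, 0])
+subjects = np.array([0, 0, 1, 1, 2, 2])  # two recordings per subject
+for train_idx, val_idx in GroupKFold(n_splits=3).split(signals, labels, groups=subjects):
+    # no subject appears in both the training and the validation set
+    assert set(subjects[train_idx]).isdisjoint(subjects[val_idx])
+```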
+
+* On evaluation metrics<br>
+ For each sleep stage label: Accuracy = (TP+TN)/(TP+FN+TN+FP) Recall = sensitivity = (TP)/(TP+FN)<br>
+ Overall: Top1 err. and Kappa; Acc and Recall are also averaged (see the sketch below)<br>
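+
+ A minimal sketch of these metrics computed from a confusion matrix (rows = true labels, columns = predictions; an illustration, the helper name is hypothetical):
+```python
+import numpy as np
+
+def macro_metrics(mat):
+    total = mat.sum()
+    tp = np.diag(mat).astype(float)
+    fn = mat.sum(axis=1) - tp
+    fp = mat.sum(axis=0) - tp
+    tn = total - tp - fn - fp
+    acc = (tp + tn) / total   # per-label Accuracy
+    recall = tp / (tp + fn)   # per-label Recall (sensitivity)
+    return acc.mean(), recall.mean()  # unweighted averages
+```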
+ Note: the label distribution in this task is extremely imbalanced, so to be more convincing our averages are unweighted.
+
+## Some experimental results
+This section will be updated continuously... ...<br>
+[[Confusion matrix]](../confusion_mat)
+
+#### Subject Cross-Validation Results
+Note: the label distribution in this task is extremely imbalanced, so we hand-tuned the class weights of the classification loss. This slightly improves Average Recall but also raises the overall error. With the default weights, Top1 err. drops by at least 5%, but the recall of the tiny N1 stage plunges by 20%, which is definitely not what we want in practice. All results below were obtained with the tuned weights.<br>
+* [sleep-edfx](https://www.physionet.org/physiobank/database/sleep-edfx/) ->sample size = 197, select sleep time + +| Network | Parameters | Top1.err. | Avg. Acc. | Avg. Re. | Need to extract feature | +| --------------------------- | ---------- | --------- | --------- | -------- | ----------------------- | +| lstm | 1.25M | 26.32% | 89.47% | 68.57% | Yes | +| micro_multi_scale_resnet_1d | 2.11M | 25.33% | 89.87% | 72.61% | No | +| resnet18_1d | 3.85M | 24.21% | 90.31% | 72.87% | No | +| multi_scale_resnet_1d | 8.42M | 24.01% | 90.40% | 72.37% | No | +* [CinC Challenge 2018](https://physionet.org/physiobank/database/challenge/2018/#files) ->sample size = 994 + +| Network | Parameters | Top1.err. | Avg. Acc. | Avg. Re. | Need to extract feature | +| --------------------------- | ---------- | --------- | --------- | -------- | ----------------------- | +| lstm | 1.25M | 26.85% | 89.26% | 71.39% | Yes | +| micro_multi_scale_resnet_1d | 2.11M | 27.01% | 89.20% | 73.12% | No | +| resnet18_1d | 3.85M | 25.84% | 89.66% | 73.32% | No | +| multi_scale_resnet_1d | 8.42M | 25.27% | 89.89% | 73.63% | No | +``` + +``` \ No newline at end of file diff --git a/heatmap.py b/heatmap.py index da67fd689a35144ec028823d9073fcaa09deba8c..34d455382d118e5b88c1e03077ce2f082abe5d7f 100644 --- a/heatmap.py +++ b/heatmap.py @@ -131,16 +131,22 @@ def annotate_heatmap(im, data=None, valfmt="{x:.2f}", def draw(mat,opt,name = 'train'): + if 'merge' in name: + label_name = opt.mergelabel_name + else: + label_name = opt.label_name mat = mat.astype(float) for i in range(mat.shape[0]): mat[i,:]=mat[i,:]/np.sum(mat[i])*100 - - fig, ax = plt.subplots() + if len(mat)>8: + fig, ax = plt.subplots(figsize=(len(mat)+2.5, len(mat))) + else: + fig, ax = plt.subplots() ax.set_ylabel('True',fontsize=12) ax.set_xlabel('Pred',fontsize=12) - im, cbar = create_heatmap(mat, opt.label_name, opt.label_name, ax=ax, + im, cbar = create_heatmap(mat, label_name, label_name, ax=ax, cmap="Blues", cbarlabel="percentage") texts = annotate_heatmap(im,valfmt="{x:.1f}%") diff --git a/options.py b/options.py index f606e428c08f3c83c58da8161f63647f3499f88c..9dfd771bea130c80c7ef433300ddb087acb7c151 100644 --- a/options.py +++ b/options.py @@ -3,9 +3,6 @@ import os import time import util -# python3 train.py --dataset_dir '/media/hypo/Hypo/physionet_org_train' --dataset_name cc2018 --signal_name 'C4-M1' --sample_num 20 --model_name lstm --batchsize 64 --epochs 20 --lr 0.0005 --no_cudnn -# python3 train.py --dataset_dir './datasets/sleep-edfx/' --dataset_name sleep-edfx --signal_name 'EEG Fpz-Cz' --sample_num 50 --model_name lstm --batchsize 64 --network_save_freq 5 --epochs 25 --lr 0.0005 --BID 5_95_th --select_sleep_time --no_cudnn --select_sleep_time - class Options(): def __init__(self): self.parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter) @@ -19,7 +16,7 @@ class Options(): self.parser.add_argument('--label', type=int, default=5,help='number of labels') self.parser.add_argument('--input_nc', type=int, default=3, help='# of input channels') self.parser.add_argument('--label_name', type=str, default='auto',help='name of labels,example:"a,b,c,d,e,f"') - self.parser.add_argument('--model_name', type=str, default='lstm',help='Choose model lstm | multi_scale_resnet_1d | resnet18 | micro_multi_scale_resnet_1d...') + self.parser.add_argument('--model_name', type=str, default='micro_multi_scale_resnet_1d',help='Choose model lstm | multi_scale_resnet_1d | resnet18 | micro_multi_scale_resnet_1d...') 
self.parser.add_argument('--pretrained', action='store_true', help='if input, use pretrained models') self.parser.add_argument('--continue_train', action='store_true', help='if input, continue train') self.parser.add_argument('--lr', type=float, default=0.001,help='learning rate') @@ -30,7 +27,8 @@ class Options(): self.parser.add_argument('--k_fold', type=int, default=0,help='fold_num of k-fold.if 0 or 1,no k-fold') self.parser.add_argument('--mergelabel', type=str, default='None', help='merge some labels to one label and give the result, example:"[[0,1,4],[2,3,5]]" , label(0,1,4) regard as 0,label(2,3,5) regard as 1') - + self.parser.add_argument('--mergelabel_name', type=str, default='None',help='name of labels,example:"a,b,c,d,e,f"') + self.parser.add_argument('--dataset_dir', type=str, default='./datasets/sleep-edfx/', help='your dataset path') self.parser.add_argument('--save_dir', type=str, default='./checkpoints/',help='save checkpoints') @@ -76,12 +74,14 @@ class Options(): names.append(str(i)) self.opt.label_name = names else: - names = self.opt.label_name - names = names.replace(" ", "") - names = names.split(",") - self.opt.label_name = names + self.opt.label_name = self.opt.label_name.replace(" ", "").split(",") + self.opt.mergelabel = eval(self.opt.mergelabel) + if self.opt.mergelabel_name != 'None': + self.opt.mergelabel_name = self.opt.mergelabel_name.replace(" ", "").split(",") + + """Print and save options diff --git a/simple_test.py b/simple_test.py index ae3234d21a70a0d0bf44462f5d2a4d8644517d18..9091ec8fcb875bccda8e2d82abc6ec2acaff246c 100644 --- a/simple_test.py +++ b/simple_test.py @@ -10,73 +10,29 @@ from options import Options from creatnet import CreatNet ''' +--------------------------------preload data-------------------------------- @hypox64 -19/05/18 -download pretrained model and test data here: -https://drive.google.com/open?id=1NTtLmT02jqlc81lhtzQ7GlPK8epuHfU5 +2020/04/03 ''' opt = Options().getparse() -#choose and creat model -net=CreatNet(opt.model_name) +net = CreatNet(opt) -if not opt.no_cuda: - net.cuda() -if not opt.no_cudnn: - import torch.backends.cudnn as cudnn - cudnn.benchmark = True +#load data +signals = np.load('./datasets/simple_test/signals.npy') +labels = np.load('./datasets/simple_test/labels.npy') #load prtrained_model -net.load_state_dict(torch.load('./checkpoints/pretrained/'+opt.dataset_name+'/'+opt.model_name+'.pth')) +net.load_state_dict(torch.load('./checkpoints/pretrained/micro_multi_scale_resnet_1d_50class.pth')) net.eval() +if not opt.no_cuda: + net.cuda() -def runmodel(eeg): - eeg = eeg.reshape(1,-1) - eeg = transformer.ToInputShape(eeg,opt.model_name,test_flag =True) - eeg = transformer.ToTensor(eeg,no_cuda =opt.no_cuda) - out = net(eeg) - pred = torch.max(out, 1)[1] - pred_stage=pred.data.cpu().numpy() - return pred_stage[0] - -''' -you can change your input data here. -but the data needs meet the following conditions: -1.fs = 100Hz -2.collect by uv -3.type numpydata signals:np.float16 stages:np.int16 -4.shape signals:[?,3000] stages:[?] 
-''' -eegdata = np.load('./datasets/simple_test/sleep_edfx_Fpz_Cz_test.npy') -true_stages = np.load('./datasets/simple_test/sleep_edfx_stages_test.npy') -print('shape of eegdata:',eegdata.shape) -print('shape of true_stage:',true_stages.shape) - -#Normalize -eegdata = transformer.Balance_individualized_differences(eegdata, '5_95_th') - -#run pretrained model -pred_stages=[] -for i in range(len(eegdata)): - pred_stages.append(runmodel(eegdata[i])) -pred_stages = np.array(pred_stages) - -print('err:',sum((true_stages[i]!=pred_stages[i])for i in range(len(pred_stages)))/len(true_stages)*100,'%') - -#plot result -plt.subplot(211) -plt.plot(true_stages+1) -plt.xlim((0,len(true_stages))) -plt.ylim((0,6)) -plt.yticks([1, 2, 3, 4, 5],['N3', 'N2', 'N1', 'REM', 'W']) -plt.xticks([],[]) -plt.title('Manually scored hypnogram') - -plt.subplot(212) -plt.plot(pred_stages+1) -plt.xlim((0,len(true_stages))) -plt.ylim((0,6)) -plt.yticks([1, 2, 3, 4, 5],['N3', 'N2', 'N1', 'REM', 'W']) -plt.xlabel('Epoch number') -plt.title('Auto scored hypnogram') -plt.show() - +for signal,true_label in zip(signals, labels): + signal = signal.reshape(1,1,-1) #batchsize,ch,length + true_label = true_label.reshape(1,-1) #batchsize,label + signal,true_label = transformer.ToTensor(signal,true_label,no_cuda =opt.no_cuda) + out = net(signal) + pred_label = torch.max(out, 1)[1] + pred_label=pred_label.data.cpu().numpy() + true_label=true_label.data.cpu().numpy() + print(("true:{0:d} predict:{1:d}").format(true_label[0][0],pred_label[0])) diff --git a/train.py b/train.py index 85f9d3afac2d8e5653b84b4b2a42c449426086f5..9bc5dc0d3bc55d2a0e7498bc24a107b736751b69 100644 --- a/train.py +++ b/train.py @@ -45,9 +45,11 @@ util.writelog('network:\n'+str(net),opt,True) util.show_paramsnumber(net,opt) weight = np.ones(opt.label) if opt.weight_mod == 'auto': - weight = np.log(1/label_cnt_per) - weight = weight/np.median(weight) - weight = np.clip(weight, 0.8, 2) + weight = 1/label_cnt_per + weight = weight/np.min(weight) + # weight = np.log(1/label_cnt_per) + # weight = weight/np.median(weight) + # weight = np.clip(weight, 0.8, 2) util.writelog('label statistics: '+str(label_cnt),opt,True) util.writelog('Loss_weight:'+str(weight),opt,True) weight = torch.from_numpy(weight).float() @@ -149,6 +151,7 @@ for fold in range(opt.k_fold): final_confusion_mat = confusion_mats[pos] if opt.k_fold==1: statistics.statistics(final_confusion_mat, opt, 'final', 'final_test') + np.save(os.path.join(opt.save_dir,'confusion_mat.npy'), final_confusion_mat) else: fold_final_confusion_mat += final_confusion_mat util.writelog('fold -> macro-prec,reca,F1,err,kappa: '+str(statistics.report(final_confusion_mat)),opt,True) @@ -157,7 +160,8 @@ for fold in range(opt.k_fold): if opt.k_fold != 1: statistics.statistics(fold_final_confusion_mat, opt, 'final', 'k-fold-final_test') - + np.save(os.path.join(opt.save_dir,'confusion_mat.npy'), fold_final_confusion_mat) + if opt.mergelabel: mat = statistics.mergemat(fold_final_confusion_mat, opt.mergelabel) - statistics.statistics(mat, opt, 'merge', 'mergelabel_test') + statistics.statistics(mat, opt, 'merge', 'mergelabel_final')