提交 6043de95 编写于 作者: H hypox64

Update simple_test.py

上级 97149ce5
......@@ -137,5 +137,4 @@ dmypy.json
/python_test.py
*.pth
*.edf
*log*
*.png
\ No newline at end of file
# candock
[这原本是一个用于记录毕业设计的日志仓库](<https://github.com/HypoX64/candock/tree/Graduation_Project>),其目的是尝试多种不同的深度神经网络结构(如LSTM,ResNet,DFCNN等)对单通道EEG进行自动化睡眠阶段分期.<br>目前,毕业设计已经完成,我将继续跟进这个项目。项目重点将转变为如何将代码进行实际应用,我们将考虑运算量与准确率之间的平衡。另外,将提供一些预训练的模型便于使用。<br>同时我们相信这些代码也可以用于其他生理信号(如ECG,EMG等)的分类.希望这将有助于您的研究或项目.<br>
![image](https://github.com/HypoX64/candock/blob/master/image/compare.png)
[这原本是一个用于记录毕业设计的日志仓库](<https://github.com/HypoX64/candock/tree/Graduation_Project>),其目的是尝试多种不同的深度神经网络结构(如LSTM,ResNet,DFCNN等)对单通道EEG进行自动化睡眠阶段分期.<br>目前,项目重点将转变为如何建立一个通用的一维时序信号分析,分类框架.<br>它将包含多种网络结构,并提供数据预处理,读取,训练,评估,测试等功能.<br>
一些训练时的输出样例: [heatmap](./image/heatmap_eg.png) [running_err](./image/running_err_eg.png) [log.txt](./docs/log_eg.txt)
## 注意
为了适应新的项目,代码已被大幅更改,不能确保仍然能正常运行如sleep-edfx等睡眠数据集,如果仍然需要运行,请按照输入格式标准自行加载数据,如果有时间我会修复这个问题。
当然,也可以直接使用[老的版本](https://github.com/HypoX64/candock/tree/f24cc44933f494d2235b3bf965a04cde5e6a1ae9)
```python
'''
#数据输入格式
change your own data to train
but the data needs meet the following conditions:
1.type numpydata signals:np.float16 labels:np.int16
2.shape signals:[num,ch,length] labels:[num]
'''
为了适应新的项目,代码已被大幅更改,不能确保仍然能正常运行如sleep-edfx等睡眠数据集,如果仍然需要运行,请参照下文按照输入格式标准自行加载数据,如果有时间我会修复这个问题。
当然,如果需要加载睡眠数据集也可以直接使用[老的版本](https://github.com/HypoX64/candock/tree/f24cc44933f494d2235b3bf965a04cde5e6a1ae9)
## Getting Started
### Prerequisites
- Linux, Windows,mac
- CPU or NVIDIA GPU + CUDA CuDNN
- Python 3
- Pytroch 1.0+
### Dependencies
This code depends on torchvision, numpy, scipy , matplotlib,available via pip install.<br>
For example:<br>
```bash
pip3 install matplotlib
```
## 如何运行
如果你需要运行这些代码(训练自己的模型或者使用预训练模型对自己的数据进行预测)请进入以下页面<br>
[How to run codes](https://github.com/HypoX64/candock/blob/master/how_to_run.md)<br>
## 数据集
使用了两个公开的睡眠数据集进行训练,分别是: [[CinC Challenge 2018]](https://physionet.org/physiobank/database/challenge/2018/#files) [[sleep-edfx]](https://www.physionet.org/physiobank/database/sleep-edfx/) <br>
对于CinC Challenge 2018数据集,我们仅使用其C4-M1通道, 对于sleep-edfx与sleep-edf数据集,使用Fpz-Cz通道<br>
注意:<br>
1.如果需要获得其他EEG通道的预训练模型,这需要下载这两个数据集并使用train.py完成训练。当然,你也可以使用自己的数据训练模型。<br>
2.对于sleep-edfx数据集,我们仅仅截取了入睡前30分钟到醒来后30分钟之间的睡眠区间作为读入数据(实验结果中用select sleep time 进行标注),目的是平衡各睡眠时期的比例并加快训练速度.<br>
## 一些说明
* 数据预处理<br>
1.降采样:CinC Challenge 2018数据集的EEG信号将被降采样到100HZ<br>
2.归一化处理:我们推荐每个受试者的EEG信号均采用5th-95th分位数归一化,即第5%大的数据为0,第95%大的数据为1。注意:所有预训练模型均按照这个方法进行归一化后训练得到<br>
3.将读取的数据分割为30s/Epoch作为一个输入,每个输入包含3000个数据点。睡眠阶段标签为5个分别是N3,N2,N1,REM,W.每个Epoch的数据将对应一个标签。标签映射:N3(S4+S3)->0 N2->1 N1->2 REM->3 W->4<br>
4.数据集扩充:训练时,对每一个Epoch的数据均要进行随机切,随机翻转,随机改变信号幅度等操作<br>
5.对于不同的网络结构,对原始eeg信号采取了预处理,使其拥有不同的shape:<br>
LSTM:将30s的eeg信号进行FIR带通滤波,获得θ,σ,α,δ,β波,并将它们进行连接后作为输入数据<br>
CNN_1d类(标有1d的网络):没有什么特别的操作,其实就是把图像领域的各种模型换成Conv1d之后拿过来用而已<br>
DFCNN类(就是科大讯飞的那种想法,先转化为频谱图,然后直接用图像分类的各种模型):将30s的eeg信号进行短时傅里叶变换,并生成频谱图作为输入,并使用图像分类网络进行分类。我们不推荐使用这种方法,因为转化为频谱图需要耗费较大的运算资源。<br>
* EEG频谱图<br>
这里展示5个睡眠阶段对应的频谱图,它们依次是Wake, Stage 1, Stage 2, Stage 3, REM<br>
![image](https://github.com/HypoX64/candock/blob/master/image/spectrum_Wake.png)
![image](https://github.com/HypoX64/candock/blob/master/image/spectrum_Stage1.png)
![image](https://github.com/HypoX64/candock/blob/master/image/spectrum_Stage2.png)
![image](https://github.com/HypoX64/candock/blob/master/image/spectrum_Stage3.png)
![image](https://github.com/HypoX64/candock/blob/master/image/spectrum_REM.png)<br>
* multi_scale_resnet_1d 网络结构<br>
该网络参考[geekfeiw / Multi-Scale-1D-ResNet](https://github.com/geekfeiw/Multi-Scale-1D-ResNet) 这个网络将被我们命名为micro_multi_scale_resnet_1d<br>
修改后的[网络结构](https://github.com/HypoX64/candock/blob/master/image/multi_scale_resnet_1d_network.png)<br>
* 关于交叉验证<br>为了更好的进行实际应用,我们将使用受试者交叉验证。即训练集和验证集的数据来自于不同的受试者。值得注意的是sleep-edfx数据集中每个受试者均有两个样本,我们视两个样本为同一个受试者,很多paper忽略了这一点,手动滑稽。<br>
* 关于评估指标<br>
对于各睡眠阶段标签: Accuracy = (TP+TN)/(TP+FN+TN+FP) Recall = sensitivity = (TP)/(TP+FN)<br>
对于总体: Top1 err. Kappa 另外对Acc与Re做平均<br>
特别说明:这项分类任务中样本标签分布及不平衡,为了更具说服力,我们的平均并不加权。
## 部分实验结果
该部分将持续更新... ...<br>
[[Confusion matrix]](https://github.com/HypoX64/candock/blob/master/confusion_mat)<br>
#### Subject Cross-Validation Results
特别说明:这项分类任务中样本标签分布及不平衡,我们对分类损失函数中的类别权重进行了魔改,这将使得Average Recall得到小幅提升,但同时整体error也将提升.若使用默认权重,Top1 err.至少下降5%,但这会导致数据占比极小的N1时期的recall猛跌20%,这绝对不是我们在实际应用中所希望看到的。下面给出的结果均是使用魔改后的权重得到的。<br>
* [sleep-edfx](https://www.physionet.org/physiobank/database/sleep-edfx/) ->sample size = 197, select sleep time
| Network | Parameters | Top1.err. | Avg. Acc. | Avg. Re. | Need to extract feature |
| --------------------------- | ---------- | --------- | --------- | -------- | ----------------------- |
| lstm | 1.25M | 26.32% | 89.47% | 68.57% | Yes |
| micro_multi_scale_resnet_1d | 2.11M | 25.33% | 89.87% | 72.61% | No |
| resnet18_1d | 3.85M | 24.21% | 90.31% | 72.87% | No |
| multi_scale_resnet_1d | 8.42M | 24.01% | 90.40% | 72.37% | No |
* [CinC Challenge 2018](https://physionet.org/physiobank/database/challenge/2018/#files) ->sample size = 994
| Network | Parameters | Top1.err. | Avg. Acc. | Avg. Re. | Need to extract feature |
| --------------------------- | ---------- | --------- | --------- | -------- | ----------------------- |
| lstm | 1.25M | 26.85% | 89.26% | 71.39% | Yes |
| micro_multi_scale_resnet_1d | 2.11M | 27.01% | 89.20% | 73.12% | No |
| resnet18_1d | 3.85M | 25.84% | 89.66% | 73.32% | No |
| multi_scale_resnet_1d | 8.42M | 25.27% | 89.89% | 73.63% | No |
### Clone this repo:
```bash
git clone https://github.com/HypoX64/candock
cd candock
```
### Download dataset and pretrained-model
[[Google Drive]](https://drive.google.com/open?id=1NTtLmT02jqlc81lhtzQ7GlPK8epuHfU5) [[百度云,y4ks]](https://pan.baidu.com/s/1WKWZL91SekrSlhOoEC1bQA)
* This datasets consists of signals.npy(shape:18207, 1, 2000) and labels.npy(shape:18207) which can be loaded by "np.load()".
* samples:18207, channel:1, length of each sample:2000, class:50
* Top1 err: 2.09%
### Train
```bash
python3 train.py --label 50 --input_nc 1 --dataset_dir ./datasets/simple_test --save_dir ./checkpoints/simple_test --model_name micro_multi_scale_resnet_1d --gpu_id 0 --batchsize 64 --k_fold 5
```
* For more [options](./options.py).
#### Use your own data to train
* step1: Generate signals.npy and labels.npy in the following format.
```python
#1.type:numpydata signals:np.float64 labels:np.int64
#2.shape signals:[num,ch,length] labels:[num]
#num:samples_num, ch :channel_num, num:length of each sample
#for example:
signals = np.zeros((10,1,10),dtype='np.float64')
labels = np.array([0,0,0,0,0,1,1,1,1,1]) #0->class0 1->class1
```
* step2: input ```--dataset_dir your_dataset_dir``` when running code.
### Test
```bash
python3 simple_test.py --label 50 --input_nc 1 --model_name micro_multi_scale_resnet_1d --gpu_id 0
```
\ No newline at end of file
https://drive.google.com/open?id=1pup2_tZFGQQwB-hoXRjpMxiD4Vmpn0Lf
\ No newline at end of file
google drive: https://drive.google.com/open?id=1pup2_tZFGQQwB-hoXRjpMxiD4Vmpn0Lf
百度云: https://pan.baidu.com/s/1WKWZL91SekrSlhOoEC1bQA key:y4ks
https://drive.google.com/open?id=1pup2_tZFGQQwB-hoXRjpMxiD4Vmpn0Lf
\ No newline at end of file
google drive: https://drive.google.com/open?id=1pup2_tZFGQQwB-hoXRjpMxiD4Vmpn0Lf
百度云: https://pan.baidu.com/s/1WKWZL91SekrSlhOoEC1bQA key:y4ks
Sun Mar 22 00:30:40 2020
----------------- Options ---------------
BID: not-supported [default: 5_95_th]
batchsize: 16 [default: 64]
continue_train: False
dataset_dir: /home/hypo/MyProject/Ear_AU/datasets/emotion/candock_6class_60s_pad_selectlabel [default: ./datasets/sleep-edfx/]
dataset_name: preload
epochs: 150 [default: 20]
gpu_id: 1 [default: 0]
input_nc: 5 [default: 3]
k_fold: 5 [default: 0]
label: 6 [default: 5]
label_name: ['Amus', 'Neut', 'Sadn', 'Tend', 'Disg', 'Fear'] [default: auto]
lr: 0.001
model_name: multi_scale_resnet_1d [default: lstm]
network_save_freq: 1000 [default: 5]
no_cuda: False
no_cudnn: False
no_shuffle: False
pretrained: False
sample_num: not-supported [default: 20]
save_dir: ./checkpoints/EMDB_5ch_6class_last60s_pad_weightauto_selectlabel_multiscale [default: ./checkpoints/]
select_sleep_time: not-supported [default: False]
separated: False
signal_name: not-supported [default: EEG Fpz-Cz]
weight_mod: auto [default: normal]
----------------- End -------------------
network:
Multi_Scale_ResNet(
(pre_conv): Sequential(
(0): Conv1d(5, 64, kernel_size=(15,), stride=(2,), padding=(7,), bias=False)
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): MaxPool1d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
)
(Route1): Route(
(block1): ResidualBlock(
(conv): Sequential(
(0): Conv1d(64, 64, kernel_size=(3,), stride=(1,), padding=(1,), bias=False)
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(64, 64, kernel_size=(3,), stride=(1,), padding=(1,), bias=False)
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(64, 64, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(block2): ResidualBlock(
(conv): Sequential(
(0): Conv1d(64, 128, kernel_size=(3,), stride=(2,), padding=(1,), bias=False)
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(128, 128, kernel_size=(3,), stride=(1,), padding=(1,), bias=False)
(4): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(64, 128, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(block3): ResidualBlock(
(conv): Sequential(
(0): Conv1d(128, 256, kernel_size=(3,), stride=(2,), padding=(1,), bias=False)
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,), bias=False)
(4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(128, 256, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(block4): ResidualBlock(
(conv): Sequential(
(0): Conv1d(256, 512, kernel_size=(3,), stride=(2,), padding=(1,), bias=False)
(1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,), bias=False)
(4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(256, 512, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(avgpool): AdaptiveAvgPool1d(output_size=1)
)
(Route2): Route(
(block1): ResidualBlock(
(conv): Sequential(
(0): Conv1d(64, 64, kernel_size=(5,), stride=(1,), padding=(2,), bias=False)
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(64, 64, kernel_size=(5,), stride=(1,), padding=(2,), bias=False)
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(64, 64, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(block2): ResidualBlock(
(conv): Sequential(
(0): Conv1d(64, 128, kernel_size=(5,), stride=(2,), padding=(2,), bias=False)
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(128, 128, kernel_size=(5,), stride=(1,), padding=(2,), bias=False)
(4): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(64, 128, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(block3): ResidualBlock(
(conv): Sequential(
(0): Conv1d(128, 256, kernel_size=(5,), stride=(2,), padding=(2,), bias=False)
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False)
(4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(128, 256, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(block4): ResidualBlock(
(conv): Sequential(
(0): Conv1d(256, 512, kernel_size=(5,), stride=(2,), padding=(2,), bias=False)
(1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,), bias=False)
(4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(256, 512, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(avgpool): AdaptiveAvgPool1d(output_size=1)
)
(Route3): Route(
(block1): ResidualBlock(
(conv): Sequential(
(0): Conv1d(64, 64, kernel_size=(7,), stride=(1,), padding=(3,), bias=False)
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(64, 64, kernel_size=(7,), stride=(1,), padding=(3,), bias=False)
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(64, 64, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(block2): ResidualBlock(
(conv): Sequential(
(0): Conv1d(64, 128, kernel_size=(7,), stride=(2,), padding=(3,), bias=False)
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(128, 128, kernel_size=(7,), stride=(1,), padding=(3,), bias=False)
(4): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(64, 128, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(block3): ResidualBlock(
(conv): Sequential(
(0): Conv1d(128, 256, kernel_size=(7,), stride=(2,), padding=(3,), bias=False)
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(256, 256, kernel_size=(7,), stride=(1,), padding=(3,), bias=False)
(4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(128, 256, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(block4): ResidualBlock(
(conv): Sequential(
(0): Conv1d(256, 512, kernel_size=(7,), stride=(2,), padding=(3,), bias=False)
(1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv1d(512, 512, kernel_size=(7,), stride=(1,), padding=(3,), bias=False)
(4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(shortcut): Sequential(
(0): Conv1d(256, 512, kernel_size=(1,), stride=(2,), bias=False)
(1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(avgpool): AdaptiveAvgPool1d(output_size=1)
)
(fc): Linear(in_features=1536, out_features=6, bias=True)
)
net parameters: 8.42M
label statistics: [715 643 518 254 517 436]
Loss_weight:[0.81885882 0.87833147 0.99945861 1.39877569 1.00054139 1.09602268]
------------------------------ k-fold:1 ------------------------------
>>> per epoch cost time:99.97s
fold -> macro-prec,reca,F1,err,kappa: (0.3178, 0.3181, 0.2995, 0.6234, 0.2222)
confusion_mat:
[[64 42 4 3 16 6]
[18 82 8 5 20 5]
[24 41 21 2 16 6]
[12 18 4 0 7 2]
[ 8 24 7 0 50 10]
[ 8 28 6 2 27 12]]
------------------------------ k-fold:2 ------------------------------
>>> per epoch cost time:96.49s
fold -> macro-prec,reca,F1,err,kappa: (0.3149, 0.3155, 0.3127, 0.6464, 0.2092)
confusion_mat:
[[71 21 18 12 14 11]
[13 56 22 6 7 11]
[17 28 30 7 9 10]
[15 11 4 4 8 7]
[ 9 23 15 0 33 27]
[ 9 16 15 5 23 21]]
------------------------------ k-fold:3 ------------------------------
>>> per epoch cost time:95.27s
fold -> macro-prec,reca,F1,err,kappa: (0.3436, 0.3481, 0.3369, 0.6036, 0.2566)
confusion_mat:
[[86 17 18 8 13 2]
[16 53 24 4 15 7]
[24 20 38 3 9 5]
[15 16 14 3 7 2]
[10 18 12 4 49 12]
[11 23 16 2 20 12]]
------------------------------ k-fold:4 ------------------------------
>>> per epoch cost time:50.3s
fold -> macro-prec,reca,F1,err,kappa: (0.349, 0.354, 0.3469, 0.6102, 0.2523)
confusion_mat:
[[73 21 13 12 19 4]
[27 44 18 9 12 14]
[26 14 33 5 15 11]
[15 17 7 4 5 6]
[12 12 3 4 61 11]
[ 9 5 7 5 33 22]]
------------------------------ k-fold:5 ------------------------------
>>> per epoch cost time:49.65s
fold -> macro-prec,reca,F1,err,kappa: (0.3306, 0.3352, 0.3303, 0.6217, 0.2363)
confusion_mat:
[[68 18 18 9 20 7]
[17 61 29 5 13 12]
[18 23 30 5 14 7]
[ 9 14 13 2 5 3]
[11 14 5 3 45 19]
[ 9 17 11 3 27 24]]
------------------------------ final result ------------------------------
final -> macro-prec,reca,F1,err,kappa: (0.3299, 0.3345, 0.3284, 0.6211, 0.2357)
confusion_mat:
[[362 119 71 44 82 30]
[ 91 296 101 29 67 49]
[109 126 152 22 63 39]
[ 66 76 42 13 32 20]
[ 50 91 42 11 238 79]
[ 46 89 55 17 130 91]]
# candock
[这原本是一个用于记录毕业设计的日志仓库](<https://github.com/HypoX64/candock/tree/Graduation_Project>),其目的是尝试多种不同的深度神经网络结构(如LSTM,ResNet,DFCNN等)对单通道EEG进行自动化睡眠阶段分期.<br>目前,毕业设计已经完成,我将继续跟进这个项目。项目重点将转变为如何将代码进行实际应用,我们将考虑运算量与准确率之间的平衡。另外,将提供一些预训练的模型便于使用。<br>同时我们相信这些代码也可以用于其他生理信号(如ECG,EMG等)的分类.希望这将有助于您的研究或项目.<br>
![image](../image/compare.png)
## 如何运行
如果你需要运行这些代码(训练自己的模型或者使用预训练模型对自己的数据进行预测)请进入以下页面<br>
[How to run codes](./how_to_run(sleep_stage).md)<br>
## 数据集
使用了两个公开的睡眠数据集进行训练,分别是: [[CinC Challenge 2018]](https://physionet.org/physiobank/database/challenge/2018/#files) [[sleep-edfx]](https://www.physionet.org/physiobank/database/sleep-edfx/) <br>
对于CinC Challenge 2018数据集,我们仅使用其C4-M1通道, 对于sleep-edfx与sleep-edf数据集,使用Fpz-Cz通道<br>
注意:<br>
1.如果需要获得其他EEG通道的预训练模型,这需要下载这两个数据集并使用train.py完成训练。当然,你也可以使用自己的数据训练模型。<br>
2.对于sleep-edfx数据集,我们仅仅截取了入睡前30分钟到醒来后30分钟之间的睡眠区间作为读入数据(实验结果中用select sleep time 进行标注),目的是平衡各睡眠时期的比例并加快训练速度.<br>
## 一些说明
* 数据预处理<br>
1.降采样:CinC Challenge 2018数据集的EEG信号将被降采样到100HZ<br>
2.归一化处理:我们推荐每个受试者的EEG信号均采用5th-95th分位数归一化,即第5%大的数据为0,第95%大的数据为1。注意:所有预训练模型均按照这个方法进行归一化后训练得到<br>
3.将读取的数据分割为30s/Epoch作为一个输入,每个输入包含3000个数据点。睡眠阶段标签为5个分别是N3,N2,N1,REM,W.每个Epoch的数据将对应一个标签。标签映射:N3(S4+S3)->0 N2->1 N1->2 REM->3 W->4<br>
4.数据集扩充:训练时,对每一个Epoch的数据均要进行随机切,随机翻转,随机改变信号幅度等操作<br>
5.对于不同的网络结构,对原始eeg信号采取了预处理,使其拥有不同的shape:<br>
LSTM:将30s的eeg信号进行FIR带通滤波,获得θ,σ,α,δ,β波,并将它们进行连接后作为输入数据<br>
CNN_1d类(标有1d的网络):没有什么特别的操作,其实就是把图像领域的各种模型换成Conv1d之后拿过来用而已<br>
DFCNN类(就是科大讯飞的那种想法,先转化为频谱图,然后直接用图像分类的各种模型):将30s的eeg信号进行短时傅里叶变换,并生成频谱图作为输入,并使用图像分类网络进行分类。我们不推荐使用这种方法,因为转化为频谱图需要耗费较大的运算资源。<br>
* EEG频谱图<br>
这里展示5个睡眠阶段对应的频谱图,它们依次是Wake, Stage 1, Stage 2, Stage 3, REM<br>
![image](../image/spectrum_Wake.png)![image](../image/spectrum_Stage1.png)![image](../image/spectrum_Stage2.png)![image](../image/spectrum_Stage3.png)![image](../image/spectrum_REM.png)<br>
* multi_scale_resnet_1d 网络结构<br>
该网络参考[geekfeiw / Multi-Scale-1D-ResNet](https://github.com/geekfeiw/Multi-Scale-1D-ResNet) 这个网络将被我们命名为micro_multi_scale_resnet_1d<br>
修改后的[网络结构](https://github.com/HypoX64/candock/blob/master/image/multi_scale_resnet_1d_network.png)<br>
* 关于交叉验证<br>为了更好的进行实际应用,我们将使用受试者交叉验证。即训练集和验证集的数据来自于不同的受试者。值得注意的是sleep-edfx数据集中每个受试者均有两个样本,我们视两个样本为同一个受试者,很多paper忽略了这一点,手动滑稽。<br>
* 关于评估指标<br>
对于各睡眠阶段标签: Accuracy = (TP+TN)/(TP+FN+TN+FP) Recall = sensitivity = (TP)/(TP+FN)<br>
对于总体: Top1 err. Kappa 另外对Acc与Re做平均<br>
特别说明:这项分类任务中样本标签分布及不平衡,为了更具说服力,我们的平均并不加权。
## 部分实验结果
该部分将持续更新... ...<br>
[[Confusion matrix]](../confusion_mat)<br>
#### Subject Cross-Validation Results
特别说明:这项分类任务中样本标签分布及不平衡,我们对分类损失函数中的类别权重进行了魔改,这将使得Average Recall得到小幅提升,但同时整体error也将提升.若使用默认权重,Top1 err.至少下降5%,但这会导致数据占比极小的N1时期的recall猛跌20%,这绝对不是我们在实际应用中所希望看到的。下面给出的结果均是使用魔改后的权重得到的。<br>
* [sleep-edfx](https://www.physionet.org/physiobank/database/sleep-edfx/) ->sample size = 197, select sleep time
| Network | Parameters | Top1.err. | Avg. Acc. | Avg. Re. | Need to extract feature |
| --------------------------- | ---------- | --------- | --------- | -------- | ----------------------- |
| lstm | 1.25M | 26.32% | 89.47% | 68.57% | Yes |
| micro_multi_scale_resnet_1d | 2.11M | 25.33% | 89.87% | 72.61% | No |
| resnet18_1d | 3.85M | 24.21% | 90.31% | 72.87% | No |
| multi_scale_resnet_1d | 8.42M | 24.01% | 90.40% | 72.37% | No |
* [CinC Challenge 2018](https://physionet.org/physiobank/database/challenge/2018/#files) ->sample size = 994
| Network | Parameters | Top1.err. | Avg. Acc. | Avg. Re. | Need to extract feature |
| --------------------------- | ---------- | --------- | --------- | -------- | ----------------------- |
| lstm | 1.25M | 26.85% | 89.26% | 71.39% | Yes |
| micro_multi_scale_resnet_1d | 2.11M | 27.01% | 89.20% | 73.12% | No |
| resnet18_1d | 3.85M | 25.84% | 89.66% | 73.32% | No |
| multi_scale_resnet_1d | 8.42M | 25.27% | 89.89% | 73.63% | No |
```
```
\ No newline at end of file
......@@ -131,16 +131,22 @@ def annotate_heatmap(im, data=None, valfmt="{x:.2f}",
def draw(mat,opt,name = 'train'):
if 'merge' in name:
label_name = opt.mergelabel_name
else:
label_name = opt.label_name
mat = mat.astype(float)
for i in range(mat.shape[0]):
mat[i,:]=mat[i,:]/np.sum(mat[i])*100
fig, ax = plt.subplots()
if len(mat)>8:
fig, ax = plt.subplots(figsize=(len(mat)+2.5, len(mat)))
else:
fig, ax = plt.subplots()
ax.set_ylabel('True',fontsize=12)
ax.set_xlabel('Pred',fontsize=12)
im, cbar = create_heatmap(mat, opt.label_name, opt.label_name, ax=ax,
im, cbar = create_heatmap(mat, label_name, label_name, ax=ax,
cmap="Blues", cbarlabel="percentage")
texts = annotate_heatmap(im,valfmt="{x:.1f}%")
......
......@@ -3,9 +3,6 @@ import os
import time
import util
# python3 train.py --dataset_dir '/media/hypo/Hypo/physionet_org_train' --dataset_name cc2018 --signal_name 'C4-M1' --sample_num 20 --model_name lstm --batchsize 64 --epochs 20 --lr 0.0005 --no_cudnn
# python3 train.py --dataset_dir './datasets/sleep-edfx/' --dataset_name sleep-edfx --signal_name 'EEG Fpz-Cz' --sample_num 50 --model_name lstm --batchsize 64 --network_save_freq 5 --epochs 25 --lr 0.0005 --BID 5_95_th --select_sleep_time --no_cudnn --select_sleep_time
class Options():
def __init__(self):
self.parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
......@@ -19,7 +16,7 @@ class Options():
self.parser.add_argument('--label', type=int, default=5,help='number of labels')
self.parser.add_argument('--input_nc', type=int, default=3, help='# of input channels')
self.parser.add_argument('--label_name', type=str, default='auto',help='name of labels,example:"a,b,c,d,e,f"')
self.parser.add_argument('--model_name', type=str, default='lstm',help='Choose model lstm | multi_scale_resnet_1d | resnet18 | micro_multi_scale_resnet_1d...')
self.parser.add_argument('--model_name', type=str, default='micro_multi_scale_resnet_1d',help='Choose model lstm | multi_scale_resnet_1d | resnet18 | micro_multi_scale_resnet_1d...')
self.parser.add_argument('--pretrained', action='store_true', help='if input, use pretrained models')
self.parser.add_argument('--continue_train', action='store_true', help='if input, continue train')
self.parser.add_argument('--lr', type=float, default=0.001,help='learning rate')
......@@ -30,7 +27,8 @@ class Options():
self.parser.add_argument('--k_fold', type=int, default=0,help='fold_num of k-fold.if 0 or 1,no k-fold')
self.parser.add_argument('--mergelabel', type=str, default='None',
help='merge some labels to one label and give the result, example:"[[0,1,4],[2,3,5]]" , label(0,1,4) regard as 0,label(2,3,5) regard as 1')
self.parser.add_argument('--mergelabel_name', type=str, default='None',help='name of labels,example:"a,b,c,d,e,f"')
self.parser.add_argument('--dataset_dir', type=str, default='./datasets/sleep-edfx/',
help='your dataset path')
self.parser.add_argument('--save_dir', type=str, default='./checkpoints/',help='save checkpoints')
......@@ -76,12 +74,14 @@ class Options():
names.append(str(i))
self.opt.label_name = names
else:
names = self.opt.label_name
names = names.replace(" ", "")
names = names.split(",")
self.opt.label_name = names
self.opt.label_name = self.opt.label_name.replace(" ", "").split(",")
self.opt.mergelabel = eval(self.opt.mergelabel)
if self.opt.mergelabel_name != 'None':
self.opt.mergelabel_name = self.opt.mergelabel_name.replace(" ", "").split(",")
"""Print and save options
......
......@@ -10,73 +10,29 @@ from options import Options
from creatnet import CreatNet
'''
--------------------------------preload data--------------------------------
@hypox64
19/05/18
download pretrained model and test data here:
https://drive.google.com/open?id=1NTtLmT02jqlc81lhtzQ7GlPK8epuHfU5
2020/04/03
'''
opt = Options().getparse()
#choose and creat model
net=CreatNet(opt.model_name)
net = CreatNet(opt)
if not opt.no_cuda:
net.cuda()
if not opt.no_cudnn:
import torch.backends.cudnn as cudnn
cudnn.benchmark = True
#load data
signals = np.load('./datasets/simple_test/signals.npy')
labels = np.load('./datasets/simple_test/labels.npy')
#load prtrained_model
net.load_state_dict(torch.load('./checkpoints/pretrained/'+opt.dataset_name+'/'+opt.model_name+'.pth'))
net.load_state_dict(torch.load('./checkpoints/pretrained/micro_multi_scale_resnet_1d_50class.pth'))
net.eval()
if not opt.no_cuda:
net.cuda()
def runmodel(eeg):
eeg = eeg.reshape(1,-1)
eeg = transformer.ToInputShape(eeg,opt.model_name,test_flag =True)
eeg = transformer.ToTensor(eeg,no_cuda =opt.no_cuda)
out = net(eeg)
pred = torch.max(out, 1)[1]
pred_stage=pred.data.cpu().numpy()
return pred_stage[0]
'''
you can change your input data here.
but the data needs meet the following conditions:
1.fs = 100Hz
2.collect by uv
3.type numpydata signals:np.float16 stages:np.int16
4.shape signals:[?,3000] stages:[?]
'''
eegdata = np.load('./datasets/simple_test/sleep_edfx_Fpz_Cz_test.npy')
true_stages = np.load('./datasets/simple_test/sleep_edfx_stages_test.npy')
print('shape of eegdata:',eegdata.shape)
print('shape of true_stage:',true_stages.shape)
#Normalize
eegdata = transformer.Balance_individualized_differences(eegdata, '5_95_th')
#run pretrained model
pred_stages=[]
for i in range(len(eegdata)):
pred_stages.append(runmodel(eegdata[i]))
pred_stages = np.array(pred_stages)
print('err:',sum((true_stages[i]!=pred_stages[i])for i in range(len(pred_stages)))/len(true_stages)*100,'%')
#plot result
plt.subplot(211)
plt.plot(true_stages+1)
plt.xlim((0,len(true_stages)))
plt.ylim((0,6))
plt.yticks([1, 2, 3, 4, 5],['N3', 'N2', 'N1', 'REM', 'W'])
plt.xticks([],[])
plt.title('Manually scored hypnogram')
plt.subplot(212)
plt.plot(pred_stages+1)
plt.xlim((0,len(true_stages)))
plt.ylim((0,6))
plt.yticks([1, 2, 3, 4, 5],['N3', 'N2', 'N1', 'REM', 'W'])
plt.xlabel('Epoch number')
plt.title('Auto scored hypnogram')
plt.show()
for signal,true_label in zip(signals, labels):
signal = signal.reshape(1,1,-1) #batchsize,ch,length
true_label = true_label.reshape(1,-1) #batchsize,label
signal,true_label = transformer.ToTensor(signal,true_label,no_cuda =opt.no_cuda)
out = net(signal)
pred_label = torch.max(out, 1)[1]
pred_label=pred_label.data.cpu().numpy()
true_label=true_label.data.cpu().numpy()
print(("true:{0:d} predict:{1:d}").format(true_label[0][0],pred_label[0]))
......@@ -45,9 +45,11 @@ util.writelog('network:\n'+str(net),opt,True)
util.show_paramsnumber(net,opt)
weight = np.ones(opt.label)
if opt.weight_mod == 'auto':
weight = np.log(1/label_cnt_per)
weight = weight/np.median(weight)
weight = np.clip(weight, 0.8, 2)
weight = 1/label_cnt_per
weight = weight/np.min(weight)
# weight = np.log(1/label_cnt_per)
# weight = weight/np.median(weight)
# weight = np.clip(weight, 0.8, 2)
util.writelog('label statistics: '+str(label_cnt),opt,True)
util.writelog('Loss_weight:'+str(weight),opt,True)
weight = torch.from_numpy(weight).float()
......@@ -149,6 +151,7 @@ for fold in range(opt.k_fold):
final_confusion_mat = confusion_mats[pos]
if opt.k_fold==1:
statistics.statistics(final_confusion_mat, opt, 'final', 'final_test')
np.save(os.path.join(opt.save_dir,'confusion_mat.npy'), final_confusion_mat)
else:
fold_final_confusion_mat += final_confusion_mat
util.writelog('fold -> macro-prec,reca,F1,err,kappa: '+str(statistics.report(final_confusion_mat)),opt,True)
......@@ -157,7 +160,8 @@ for fold in range(opt.k_fold):
if opt.k_fold != 1:
statistics.statistics(fold_final_confusion_mat, opt, 'final', 'k-fold-final_test')
np.save(os.path.join(opt.save_dir,'confusion_mat.npy'), fold_final_confusion_mat)
if opt.mergelabel:
mat = statistics.mergemat(fold_final_confusion_mat, opt.mergelabel)
statistics.statistics(mat, opt, 'merge', 'mergelabel_test')
statistics.statistics(mat, opt, 'merge', 'mergelabel_final')
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册