Hi everyone, I'm a complete beginner and would like to ask a question.
For facial expression recognition I'm using the FER2013 dataset.
CNN architecture (each conv uses a 3×3 kernel with the given number of filters): conv1(3×3, 64) -> conv2(3×3, 64) -> maxpool1 -> conv3(3×3, 128) -> conv4(3×3, 128) -> maxpool2 (dropout=0.2) -> conv5(3×3, 256) -> conv6(3×3, 256) -> maxpool3 (dropout=0.25) -> conv7(3×3, 512) -> conv8(3×3, 512) -> maxpool4 (dropout=0.25) -> fc1 (dropout=0.25) -> fc2 (dropout=0.25) -> softmax
All activation functions are ReLU; batch_size=50, learning_rate=0.001, 30,000 training samples, 5,000 test samples.
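To make the architecture above concrete, here is a small pure-Python sketch that walks the feature-map shapes and conv parameter counts layer by layer. It assumes 48×48 grayscale input (the FER2013 image size), stride-1 "same" padding for every 3×3 conv, and 2×2 max pooling with stride 2; the post doesn't state padding or pooling settings, so treat those as assumptions. The fc1/fc2 widths aren't given, so only the conv stack is covered.

```python
# Hypothetical shape/parameter walk-through of the conv stack in the post.
# Assumptions (not stated in the post): 48x48x1 input, 'same' padding,
# stride 1 for every 3x3 conv, 2x2 max pooling with stride 2.

ARCH = [
    ("conv1", 64), ("conv2", 64), ("pool1", None),
    ("conv3", 128), ("conv4", 128), ("pool2", None),
    ("conv5", 256), ("conv6", 256), ("pool3", None),
    ("conv7", 512), ("conv8", 512), ("pool4", None),
]


def walk(height=48, width=48, channels=1, kernel=3):
    """Return (name, output shape, parameter count) for each layer."""
    rows = []
    for name, filters in ARCH:
        if filters is None:
            # 2x2 max pool, stride 2: halves height and width, no parameters
            height, width = height // 2, width // 2
            params = 0
        else:
            # 'same' padding keeps height/width; weights plus one bias per filter
            params = kernel * kernel * channels * filters + filters
            channels = filters
        rows.append((name, (height, width, channels), params))
    return rows


if __name__ == "__main__":
    for name, shape, params in walk():
        print(f"{name:6s} {shape} params={params}")
```

Under these assumptions the last pooling layer outputs 3×3×512, so fc1 would receive 4,608 inputs.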
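I noticed that from around epoch 50 onward, the loss and accuracy of every batch exactly repeat those of the previous epoch. How should I improve this? The model comes from a paper that reports about 60% accuracy, but I'm stuck at 25%.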
If anyone has worked on this, I'd really appreciate some pointers. Thanks!
Here is my training output:
Epoch: 72, Test Loss= 0.018, Test Accuracy= 0.256
Epoch: 73, Batch: 0, Loss= 1.779, Training Accuracy= 0.240
Epoch: 73, Batch: 50, Loss= 1.774, Training Accuracy= 0.320
Epoch: 73, Batch: 100, Loss= 1.803, Training Accuracy= 0.220
Epoch: 73, Batch: 150, Loss= 1.802, Training Accuracy= 0.260
Epoch: 73, Batch: 200, Loss= 1.882, Training Accuracy= 0.180
Epoch: 73, Batch: 250, Loss= 1.808, Training Accuracy= 0.220
Epoch: 73, Batch: 300, Loss= 1.932, Training Accuracy= 0.160
Epoch: 73, Batch: 350, Loss= 1.811, Training Accuracy= 0.300
Epoch: 73, Batch: 400, Loss= 1.801, Training Accuracy= 0.300
Epoch: 73, Batch: 450, Loss= 1.775, Training Accuracy= 0.280
Epoch: 73, Batch: 500, Loss= 1.754, Training Accuracy= 0.280
Epoch: 73, Batch: 550, Loss= 1.737, Training Accuracy= 0.280
Epoch: 73, Test Loss= 0.018, Test Accuracy= 0.256
Epoch: 74, Batch: 0, Loss= 1.779, Training Accuracy= 0.240
Epoch: 74, Batch: 50, Loss= 1.774, Training Accuracy= 0.320
Epoch: 74, Batch: 100, Loss= 1.803, Training Accuracy= 0.220
Epoch: 74, Batch: 150, Loss= 1.802, Training Accuracy= 0.260
Epoch: 74, Batch: 200, Loss= 1.882, Training Accuracy= 0.180
Epoch: 74, Batch: 250, Loss= 1.808, Training Accuracy= 0.220
Epoch: 74, Batch: 300, Loss= 1.932, Training Accuracy= 0.160
Epoch: 74, Batch: 350, Loss= 1.811, Training Accuracy= 0.300
Epoch: 74, Batch: 400, Loss= 1.801, Training Accuracy= 0.300
Epoch: 74, Batch: 450, Loss= 1.775, Training Accuracy= 0.280
Epoch: 74, Batch: 500, Loss= 1.754, Training Accuracy= 0.280
Epoch: 74, Batch: 550, Loss= 1.737, Training Accuracy= 0.280
Epoch: 74, Test Loss= 0.018, Test Accuracy= 0.256
2
winglight2016 2018-03-14 14:13:42 +08:00
Why is one convolution followed directly by another? Shouldn't every convolution be followed by a pooling layer and a dropout?
3
Hzzone 2018-03-14 14:20:34 +08:00
Did you shuffle the training data?
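A minimal, framework-agnostic sketch of reshuffling at every epoch, assuming the training set is addressed by integer indices (the function name epoch_batches is hypothetical). A fixed order means every epoch presents the same batches in the same sequence; drawing a fresh permutation each epoch avoids that. On its own, though, a fixed order probably shouldn't make the logged numbers repeat to three decimals; that would also suggest the weights are no longer changing.

```python
import random


def epoch_batches(num_samples, batch_size, rng):
    """Yield index batches covering every sample once, in a fresh random order."""
    order = list(range(num_samples))
    rng.shuffle(order)  # new permutation each time this generator is created
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in range(2):
        batches = list(epoch_batches(30000, 50, rng))
        # the first few indices differ from epoch to epoch
        print(epoch, len(batches), batches[0][:5])
```

With 30,000 samples and batch_size=50 this yields 600 batches per epoch, consistent with the Batch 0 to 550 entries in the log (printed every 50 batches).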
4
enenaaa 2018-03-14 14:27:16 +08:00
Try a larger batch size.
5
ttvlls 2018-03-14 14:33:56 +08:00 via Android
@winglight2016 Of course not.
6
ioiogoo 2018-03-14 14:35:29 +08:00
Could you share the paper?
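I feel this architecture uses too much dropout (just for discussion). Dropout is meant to prevent the overfitting that comes from having too many parameters. Convolutional layers share their weights and have relatively few parameters, so overfitting is less of a problem there. Could this much dropout throw away so much information that the model underfits, or trains more slowly?
After seeing this thread I found some discussions on whether dropout should be applied to convolutional layers:
https://www.quora.com/Why-would-I-need-to-apply-a-dropout-layer-before-a-convolutional-layer
https://stats.stackexchange.com/questions/240305/where-should-i-place-dropout-layers-in-a-neural-network
https://www.zhihu.com/question/52426832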
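To see what the "information loss" looks like concretely, here is a sketch of inverted dropout, the variant commonly used during training: each activation is zeroed with probability drop_rate and the survivors are scaled by 1/(1 - drop_rate) so the expected value is unchanged. One thing worth checking in the original code: TensorFlow 1.x's tf.nn.dropout takes a keep probability as its second argument, so passing 0.25 there keeps only 25% of activations rather than dropping 25%. The helper name inverted_dropout is hypothetical.

```python
import random


def inverted_dropout(values, drop_rate, rng):
    """Zero each value with probability drop_rate; scale survivors by 1/(1 - drop_rate)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep_prob = 1.0 - drop_rate
    # rng.random() < keep_prob is true with probability keep_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    out = inverted_dropout([1.0] * 20, 0.25, rng)
    print(out)                      # mix of 0.0 and 1.333...
    print(sum(out) / len(out))      # roughly 1.0 on average
```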
7
takato 2018-03-14 14:43:53 +08:00 via iPhone
Directions I'd personally try:
1. batchnorm
2. add residual connections
3. use less dropout in the intermediate layers
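A rough sketch of the arithmetic behind suggestions 1 and 2, for a single channel at training time: batch normalization subtracts the batch mean, divides by the batch standard deviation, then applies a learned scale gamma and shift beta; a residual connection adds a block's input to its output. Real layers normalize each channel over both the batch and spatial positions and keep running averages for inference, which this sketch omits; the helper names are hypothetical.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's activations over the batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased variance, as used at training time
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mean) * inv_std + beta for x in batch]


def residual_block(x, f):
    """Residual connection: output = x + f(x); f must preserve the shape."""
    return [a + b for a, b in zip(x, f(x))]


if __name__ == "__main__":
    print(batch_norm([1.0, 2.0, 3.0, 4.0]))
    print(residual_block([1.0, 2.0], lambda v: [0.5 * a for a in v]))
```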
8
glasslion 2018-03-14 15:40:12 +08:00
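Is your model VGG? The original VGG has no dropout in its convolutional layers. As others have said, convolutional layers usually don't need dropout; consider adding batchnorm instead.
You could also tune the learning rate and the optimizer.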
9
glasslion 2018-03-14 15:49:53 +08:00
First plot a confusion matrix to see where each class is going wrong.
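A minimal confusion matrix in plain Python, assuming integer labels 0 to 6 in FER2013's order (angry, disgust, fear, happy, sad, surprise, neutral). Rows are true classes, columns are predicted classes. If the network has collapsed to predicting a single class, every count lands in one column. The helper names are hypothetical.

```python
FER2013_LABELS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def confusion_matrix(y_true, y_pred, num_classes=7):
    """matrix[t][p] counts samples whose true label is t and predicted label is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class predicted correctly (None if the class never occurs)."""
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else None)
    return recalls


if __name__ == "__main__":
    m = confusion_matrix([0, 3, 3, 6], [3, 3, 3, 3])
    for label, row in zip(FER2013_LABELS, m):
        print(f"{label:9s} {row}")
    print(per_class_recall(m))
```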
10
Suddoo 2018-03-14 16:49:53 +08:00
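I usually set batch_size to a power of 2; if GPU memory is tight, that limits how large batch_size can be. I usually split the training and validation sets 4:1. Which framework are you using? Also, during preprocessing you could consider standardizing the data: subtract the mean and divide by the standard deviation.
I usually initialize the learning rate at 0.001 and then keep lowering it. As someone mentioned above, shuffle the data before training.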
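A sketch of standardization as described, using a single global mean and standard deviation computed on the training set only and then reused unchanged for the test set (per-pixel statistics are a common alternative). Images are assumed to be flat lists of pixel values; fit_standardizer and standardize are hypothetical helpers.

```python
import statistics


def fit_standardizer(train_images):
    """Global mean and standard deviation over every pixel of the TRAINING images."""
    pixels = [p for img in train_images for p in img]
    mean = statistics.mean(pixels)
    std = statistics.pstdev(pixels, mu=mean)
    return mean, std


def standardize(image, mean, std, eps=1e-8):
    """Subtract the mean and divide by the standard deviation (not the variance)."""
    return [(p - mean) / (std + eps) for p in image]


if __name__ == "__main__":
    mean, std = fit_standardizer([[0, 2], [4, 6]])
    print(mean, std)
    print(standardize([3, 5], mean, std))  # the same statistics would be applied to test images
```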
11
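larryli1995 OP @winglight2016 I followed a paper that does it this way, and it reached over 60% accuracy. I don't know how they managed it.
@Hzzone Isn't this data already in random order? Do you mean drawing batches randomly?
@enenaaa I'll try that later, thanks.
@ioiogoo Thanks a lot, I'll look into it and ask you if I get stuck.
@takato Thanks, I'll try it. I think switching to an Inception model could also work well.
@glasslion Thanks, I'll plot a confusion matrix and analyze it.
@Suddoo I'm using TF (TensorFlow). Isn't FER2013 already in random order? Would shuffling help? As for standardization, I've already standardized the data and removed the final softmax. I'm not sure whether that's OK; before, without standardization and with the final softmax, accuracy was even lower.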
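On removing the final softmax: if the loss is tf.nn.softmax_cross_entropy_with_logits (the post doesn't say), that op applies softmax internally and expects unscaled logits, so an explicit softmax layer before it applies softmax twice. The sketch below computes the same per-example quantity from raw logits via log-sum-exp, and shows that feeding it probabilities instead of logits leaves a loss floor. It also gives a reference point: uniform predictions over 7 classes cost ln 7 ≈ 1.946, and the training losses in the log (about 1.74 to 1.93) are not far below that. The helper names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy_from_logits(logits, label):
    """Cross-entropy of one example computed from raw logits via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    print(softmax_cross_entropy_from_logits([0.0] * 7, 3))   # ln 7, chance level for 7 classes
    confident = [10.0, 0.0, 0.0]
    print(softmax_cross_entropy_from_logits(confident, 0))  # close to 0
    # softmax applied twice: even a confident prediction keeps a large loss
    print(softmax_cross_entropy_from_logits(softmax(confident), 0))
```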
12
inflationaaron 2018-03-15 07:48:29 +08:00 via iPad
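Dropout acts as regularization. While your training accuracy is still this low, you can first turn off all dropout; once you've adjusted the rest of the architecture and the model can overfit, add dropout back and tune its parameters.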
13
YRodT 2018-03-15 08:54:28 +08:00 via Android
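Directions you could try:
1. Add batchnorm to all convolutional layers and remove all dropout.
2. If you're using VGG, try initializing the first few layers of your model with existing VGG model weights.
3. Use a step learning-rate schedule: halve the learning rate whenever the loss stalls or bounces back.
4. You didn't mention the optimizer; Adam and momentum work well with different learning rates.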
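A sketch of suggestion 3, written as a small scheduler that halves the learning rate after patience epochs without improvement (the class name and defaults are hypothetical). In TF 1.x the returned value could be fed to the optimizer through a placeholder; Keras ships a comparable ReduceLROnPlateau callback.

```python
class HalveOnPlateau:
    """Halve the learning rate when the monitored loss fails to improve for `patience` epochs."""

    def __init__(self, lr=0.001, patience=3, min_delta=1e-4, min_lr=1e-6):
        self.lr = lr
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as an improvement
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Call once per epoch with the validation loss; returns the learning rate to use next."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr / 2, self.min_lr)
                self.bad_epochs = 0
        return self.lr


if __name__ == "__main__":
    sched = HalveOnPlateau(lr=0.001, patience=2)
    for epoch_loss in [1.9, 1.8, 1.8, 1.8, 1.7]:
        print(epoch_loss, sched.step(epoch_loss))
```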
14
larryli1995 OP
15
larryli1995 OP @YRodT The optimizer is AdamOptimizer.
16
smit 2018-04-19 09:41:59 +08:00
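Hi, I think I'm running into the same problem as you, and it's almost uncanny... I'm using a different network, and my test loss is also stuck at 0.018, exactly the same as yours. Could we add each other on QQ to discuss? Mine is 2640062655. Looking forward to your reply.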