TensorFlow CNN: Optimizing a Convolutional Neural Network with Batch Norm, Dropout and Early Stopping - 白红宇



Published: 2019-06-15


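When I studied the theory of convolutional neural networks, I thought I understood it, but when I had to build one in code I found many points I was still fuzzy about. This time I again build a CNN on the MNIST dataset: first a basic model, then an improved version using Batch Norm, Dropout and early stopping. Along the way I describe the problems I ran into while debugging the code and how I solved them.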

1. Building a basic CNN

Step 1: Prepare the data

The book Hands-On Machine Learning with Scikit-Learn and TensorFlow uses the following code to download the MNIST dataset:

```python
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/")
```

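Downloading this way may raise a URLError saying that SSL certificate verification failed. Adding the following lines before the download code disables SSL certificate verification:

```python
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
```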

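Running it then prints a long list of WARNINGs, but don't worry: the dataset still downloads, and it conveniently comes already split into training, validation and test sets, with batching built in and the data reshaped into a suitable input format (the training set, for example, already has shape (55000, 784)). The problem is that the download is very slow; it failed for me many times, and succeeding was down to luck.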

I prefer to use tf.keras.datasets.mnist.load_data() to download the raw data, and then split it, generate the batches and reshape it into the right input format myself.

```python
import tensorflow as tf
import numpy as np
import time
from datetime import timedelta

# Record how long training takes
def get_time_dif(start_time):
    end_time = time.time()
    time_dif = end_time - start_time
    # timedelta formats the interval; a 10-second interval prints as 0:00:10
    return timedelta(seconds=int(round(time_dif)))

# Prepare the training, validation and test sets, and generate mini-batches
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
# Scale pixels to [0, 1] and reshape to (n_samples, 784); the training set becomes (60000, 784)
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
# Split off the first 5000 training samples as the validation set
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch
```
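Because shuffle_batch splits the shuffled indices with np.array_split into len(X) // batch_size chunks, no sample is ever dropped when the sample count is not a multiple of batch_size; instead some batches come out slightly larger. For MNIST's 55,000 training samples and batch_size = 100 this gives exactly 550 batches of 100. A quick sanity check on made-up data (the 1050-sample arrays are purely illustrative, and the generator is repeated so the block runs on its own):

```python
import numpy as np

def shuffle_batch(X, y, batch_size):
    # Same logic as the generator above: one random permutation per call,
    # split into len(X) // batch_size nearly equal chunks
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        yield X[batch_idx], y[batch_idx]

# Synthetic stand-in data (not MNIST): row i of X_demo is [2*i, 2*i + 1] and its label is i
X_demo = np.arange(1050 * 2).reshape(1050, 2)
y_demo = np.arange(1050)

batches = list(shuffle_batch(X_demo, y_demo, batch_size=100))
batch_sizes = [len(yb) for _, yb in batches]
all_labels = np.sort(np.concatenate([yb for _, yb in batches]))
# Features and labels must stay paired after shuffling
aligned = all(np.array_equal(xb[:, 0], 2 * yb) for xb, yb in batches)

print(len(batches), sorted(set(batch_sizes)))      # 10 [105]
print(np.array_equal(all_labels, y_demo), aligned)  # True True
```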

Step 2: Configure the parameters

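The network has two convolutional layers and one fully connected layer, with the structure: input layer → convolutional layer 1 → convolutional layer 2 → max-pooling layer → fully connected layer → output layer. Each convolutional layer consists of its convolution kernels followed by a ReLU activation.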

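The first convolutional layer has 16 kernels of size (3, 3) with stride 1 and zero padding. The second has 32 kernels of size (3, 3) with stride 2, also with zero padding. In general, the deeper a convolutional layer, the more feature maps it should output and the smaller each feature map should be; this means adding kernels and increasing the stride (and possibly the kernel size). That way, deeper layers can extract higher-level features.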

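All neurons within one feature map share the same parameters (weights and bias), while different feature maps have different parameters. Each feature map extracts one kind of image feature, so the more feature maps there are, the more features are extracted.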
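Parameter sharing is what keeps convolutional layers small: a layer's parameter count depends only on the kernel size, the number of input channels and the number of feature maps, not on the image size. A small sketch counting the trainable parameters of the basic model in this article (3×3 kernels, 16 and 32 feature maps, a 32-unit fully connected layer fed by 32×7×7 pooled values, 10 outputs); the helper names are made up for illustration:

```python
def conv2d_params(ksize, in_channels, n_fmaps):
    # One shared ksize x ksize x in_channels kernel plus one bias per feature map;
    # the 28x28 image size does not appear anywhere
    return (ksize * ksize * in_channels + 1) * n_fmaps

def dense_params(n_in, n_out):
    # One weight per input/output pair plus one bias per output
    return n_in * n_out + n_out

layers = {
    "conv1": conv2d_params(3, 1, 16),       # 160
    "conv2": conv2d_params(3, 16, 32),      # 4640
    "fc1": dense_params(32 * 7 * 7, 32),    # 50208
    "output": dense_params(32, 10),         # 330
}
for name, n in layers.items():
    print(name, n)
print("total", sum(layers.values()))        # total 55338
```

Almost all parameters sit in the fully connected layer, even though the two convolutional layers do the feature extraction.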

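Now for the arithmetic. Suppose that, along one spatial dimension, a convolutional layer's input has n neurons, the kernel size is m, the stride is s, and p zeros are padded at each end. Then the layer has (n - m + 2p)/s + 1 output neurons along that dimension. The formula applies to the height and the width separately, not to all 28*28 = 784 pixels at once. With the parameters below, the first convolutional layer has n = 28, m = 3 and s = 1; since padding="SAME" keeps the output size equal to the input size, (28 - 3 + 2p)/1 + 1 = 28 gives p = 1, i.e. one zero on each side.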

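For the second convolutional layer, however, the number of padding zeros p comes out as a non-integer (with s = 2 and a 14-wide output, (28 - 3 + 2p)/2 + 1 = 14 gives p = 0.5), and I don't know how the computation proceeds from there.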
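TensorFlow's padding="SAME" does not look for a symmetric integer p. As described in its convolution documentation, it first fixes the output size at ceil(n/s), then computes the total number of zeros needed to produce that output, and when the total is odd it puts the extra zero at the bottom/right. For the second layer, ceil(28/2) = 14 needs one zero in total: none at the top/left and one at the bottom/right. A sketch of that rule (the function name is illustrative):

```python
import math

def same_padding(n, k, s):
    """Output size and (before, after) zero padding along one dimension
    for padding="SAME", following TensorFlow's documented rule."""
    out = math.ceil(n / s)                      # SAME fixes the output size first
    total_pad = max((out - 1) * s + k - n, 0)   # zeros needed to produce that output
    pad_before = total_pad // 2                 # top / left
    pad_after = total_pad - pad_before          # bottom / right gets any odd extra zero
    return out, pad_before, pad_after

for name, stride in [("conv1", 1), ("conv2", 2)]:
    out, before, after = same_padding(28, 3, stride)
    # Same formula as in the text, with the total padding in place of 2p
    assert (28 - 3 + before + after) // stride + 1 == out
    print(name, out, before, after)   # conv1 28 1 1, then conv2 14 0 1
```

With one extra row and column, conv2 effectively sees a 29×29 input, which plausibly explains the [10000,16,29,29] tensor in the OOM error reported further down.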

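```python
# Input height, width and number of channels
height = 28
width = 28
channels = 1
n_inputs = height * width

# Number of feature maps (filters), kernel size and stride of each convolutional layer
conv1_fmaps = 16
conv1_ksize = 3
conv1_stride = 1
conv1_pad = "SAME"

conv2_fmaps = 32
conv2_ksize = 3
conv2_stride = 2
conv2_pad = "SAME"

# Number of feature maps (channels) of the max-pooling layer
pool3_fmaps = conv2_fmaps

# Number of neurons in the fully connected layer
n_fc1 = 32
n_outputs = 10
```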
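The configuration above fixes the shape of every layer. A short sketch tracing the per-sample shapes, assuming square images, SAME convolutions (output ceil(n/s)) and 2×2, stride-2 VALID max pooling (output floor((n - 2)/2) + 1); the function is illustrative, not part of the model code:

```python
import math

def trace_shapes(height=28, conv1_stride=1, conv2_stride=2,
                 conv1_fmaps=16, conv2_fmaps=32, n_fc1=32, n_outputs=10):
    """Per-sample output shape of each layer of the basic model."""
    shapes = [("input", (height, height, 1))]
    h = math.ceil(height / conv1_stride)          # SAME convolution
    shapes.append(("conv1", (h, h, conv1_fmaps)))
    h = math.ceil(h / conv2_stride)               # SAME convolution
    shapes.append(("conv2", (h, h, conv2_fmaps)))
    h = (h - 2) // 2 + 1                          # 2x2, stride-2 VALID max pooling
    shapes.append(("pool3", (h, h, conv2_fmaps)))
    shapes.append(("pool3_flat", (conv2_fmaps * h * h,)))
    shapes.append(("fc1", (n_fc1,)))
    shapes.append(("output", (n_outputs,)))
    return shapes

for name, shape in trace_shapes():
    print(name, shape)
```

The trace ends in 7×7×32 pooled maps, i.e. 1568 values per image, which is the pool3_fmaps * 7 * 7 used when flattening.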

Step 3: Build the CNN

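The code below builds exactly the structure described above. Two things to note: first, max pooling uses no zero padding (padding="VALID"), since the purpose of pooling is to shrink the feature maps and thereby reduce memory use and the number of parameters; second, all feature maps must be flattened into a single vector before being fed into the fully connected layer.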
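To make the pooling and flattening step concrete, here is a NumPy sketch (not TensorFlow) of a 2×2, stride-2 VALID max pool applied to a batch of 14×14×32 feature maps; the random input is only a stand-in for conv2's output. NumPy's reshape and tf.reshape both flatten in row-major order, so the flattened layout matches:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 and no padding ("VALID") on an NHWC array.
    Assumes H and W are even, as with the 14x14 maps here."""
    n, h, w, c = x.shape
    # Give every 2x2 window its own pair of axes (2 and 4), then take the max over them
    return x.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

rng = np.random.default_rng(0)
conv2_out = rng.random((3, 14, 14, 32), dtype=np.float32)  # stand-in for conv2's output
pool3 = max_pool_2x2(conv2_out)
pool3_flat = pool3.reshape(-1, 32 * 7 * 7)   # flatten each sample's 7x7x32 maps
print(pool3.shape, pool3_flat.shape)         # (3, 7, 7, 32) (3, 1568)
```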

```python
# TensorFlow 1.x graph-mode API (tf.placeholder, tf.layers, tf.Session)
with tf.name_scope("inputs"):
    X = tf.placeholder(tf.float32, shape=[None, n_inputs], name="X")
    X_reshaped = tf.reshape(X, shape=[-1, height, width, channels])
    y = tf.placeholder(tf.int32, shape=[None], name="y")

conv1 = tf.layers.conv2d(X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
                         strides=conv1_stride, padding=conv1_pad,
                         activation=tf.nn.relu, name="conv1")
conv2 = tf.layers.conv2d(conv1, filters=conv2_fmaps, kernel_size=conv2_ksize,
                         strides=conv2_stride, padding=conv2_pad,
                         activation=tf.nn.relu, name="conv2")

with tf.name_scope("pool3"):
    pool3 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
    # Flatten all feature maps into one vector: conv2 (stride 2) shrinks 28x28 to 14x14,
    # and max pooling shrinks that to 7x7, i.e. 1/16 of the original area
    pool3_flat = tf.reshape(pool3, shape=[-1, pool3_fmaps * 7 * 7])

with tf.name_scope("fc1"):
    fc1 = tf.layers.dense(pool3_flat, n_fc1, activation=tf.nn.relu, name="fc1")

with tf.name_scope("output"):
    logits = tf.layers.dense(fc1, n_outputs, name="output")
    Y_proba = tf.nn.softmax(logits, name="Y_proba")

with tf.name_scope("train"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)
    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.AdamOptimizer()
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
```

Step 4: Train and evaluate the model

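The biggest problem in the training and evaluation phase is that the convolutional layers can run out of memory, especially during validation and testing. Training with batch_size = 100 is fine, but the validation set has 5000 samples and the test set 10000, and processing them in one go consumes a great deal of memory. During testing I got the following error: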

ResourceExhaustedError: OOM when allocating tensor with shape[10000,16,29,29]...

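OOM stands for "Out of Memory": memory ran out during the test phase. My GPU is a GTX 960M with 2 GB of video memory, of which only about 1.65 GB is actually available for training, which is rather small.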

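There are several ways to deal with this. The first is to simplify the model, for example by using fewer feature maps in the convolutional layers, larger strides or fewer convolutional layers, but this usually degrades performance. The second is to switch from 32-bit to 16-bit floats. The third is to use mini-batches for validation and testing as well.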
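The shape in the error message makes the problem easy to quantify. A back-of-the-envelope sketch: a single float32 tensor of shape [10000, 16, 29, 29] already takes about half a GiB, and the forward pass plausibly holds several activations of similar size at once (conv1's output, the padded copy, conv2's output), so 1.65 GB runs out quickly. The same arithmetic shows how the second and third remedies help (the helper name is illustrative):

```python
def tensor_mib(shape, bytes_per_element=4):
    """Memory taken by one dense tensor, in MiB (float32 = 4 bytes, float16 = 2)."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element / 2**20

print(round(tensor_mib([10000, 16, 29, 29]), 1))     # whole test set, float32: 513.3
print(round(tensor_mib([10000, 16, 29, 29], 2), 1))  # whole test set, float16: 256.7
print(round(tensor_mib([1000, 16, 29, 29]), 1))      # 1000-sample batch, float32: 51.3
```

Halving the precision halves the footprint, while batches of 1000 cut it tenfold, which is why batching is the more effective fix here.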

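Simplifying the model does reduce its performance (I tried it), so I chose the third option: during validation and testing, the data are fed in batches of 1000 samples and the per-batch accuracies are averaged. The final validation accuracy was 98.74%.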
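Averaging per-batch accuracies is exact here because 5000 and 10000 are multiples of 1000. With other sizes, range(len(X) // 1000) silently drops the last partial batch, and a plain mean would over-weight a short batch. A sketch of a size-weighted variant; the function names and the toy "model" are hypothetical, and in the article's session it could be called as batched_accuracy(lambda Xb, yb: accuracy.eval(feed_dict={X: Xb, y: yb}), X_valid, y_valid):

```python
import numpy as np

def batched_accuracy(accuracy_fn, X, y, batch_size=1000):
    """Accuracy over a whole dataset, evaluated batch_size samples at a time.
    Each batch's accuracy is weighted by its size, so a shorter final batch is
    neither dropped nor over-weighted."""
    n_correct = 0.0
    for start in range(0, len(X), batch_size):
        X_b, y_b = X[start:start + batch_size], y[start:start + batch_size]
        n_correct += accuracy_fn(X_b, y_b) * len(X_b)
    return n_correct / len(X)

# Hypothetical stand-in for accuracy.eval(...): a "model" that always predicts class 0
def always_zero_accuracy(X_b, y_b):
    return np.mean(y_b == 0)

X_demo = np.zeros((2500, 784), dtype=np.float32)  # batches of 1000, 1000 and 500
y_demo = np.array([0] * 1500 + [1] * 1000)       # true accuracy: 1500 / 2500 = 0.6

weighted = batched_accuracy(always_zero_accuracy, X_demo, y_demo)
# The article's loop, which only visits len(X) // 1000 = 2 full batches
article_style = np.mean([always_zero_accuracy(X_demo[i*1000:(i+1)*1000], y_demo[i*1000:(i+1)*1000])
                         for i in range(len(X_demo) // 1000)])
print(weighted, article_style)  # 0.6 0.75
```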

```python
with tf.name_scope("init_and_save"):
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

n_epochs = 10
batch_size = 100

with tf.Session() as sess:
    init.run()
    start_time = time.time()

    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        # Training accuracy on the last mini-batch of the epoch
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})

        if epoch % 2 == 0 or epoch == n_epochs - 1:
            # Evaluate 1000 samples at a time, then average
            acc_val = []
            for i in range(len(X_valid) // 1000):
                acc_val.append(accuracy.eval(feed_dict={X: X_valid[i*1000:(i+1)*1000],
                                                        y: y_valid[i*1000:(i+1)*1000]}))
            acc_val = np.mean(acc_val)
            print('Epoch:{0:>4}, Train accuracy:{1:>7.2%},Validate accuracy:{2:7.2%}'.format(
                epoch, acc_train, acc_val))

    time_dif = get_time_dif(start_time)
    print("\nTime usage:", time_dif)

    # Test 1000 samples at a time, then average
    acc_test = []
    for i in range(len(X_test) // 1000):
        acc_test.append(accuracy.eval(feed_dict={X: X_test[i*1000:(i+1)*1000],
                                                 y: y_test[i*1000:(i+1)*1000]}))
    acc_test = np.mean(acc_test)
    print("\nTest_accuracy:{0:>7.2%}".format(acc_test))
```

```
Epoch:   0, Train accuracy: 98.00%,Validate accuracy: 97.12%
Epoch:   2, Train accuracy: 97.00%,Validate accuracy: 98.34%
Epoch:   4, Train accuracy:100.00%,Validate accuracy: 98.62%
Epoch:   6, Train accuracy:100.00%,Validate accuracy: 98.84%
Epoch:   8, Train accuracy: 99.00%,Validate accuracy: 98.68%
Epoch:   9, Train accuracy:100.00%,Validate accuracy: 98.86%

Time usage: 0:01:02

Test_accuracy: 98.68%
```

2. Optimizing the CNN with Batch Norm, Dropout and early stopping

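The book I am following improves the basic CNN with Dropout and early stopping, but does not use Batch Norm. I find the book's early-stopping code too complicated, so I recommend my own implementation below, which is clearer.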

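I haven't yet found code that applies Batch Norm inside a CNN, so this part is based on my own understanding. Which layers should use Batch Norm? I apply it in the convolutional layers and in the fully connected layers (including the output layer), but not in the pooling layer. My reasoning is that internal covariate shift should arise mainly from the nonlinear transformations between layers; the pooling layer's output is not passed through a nonlinear activation, so applying Batch Norm in the following fully connected layer is enough.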
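For a convolutional layer, Batch Norm usually keeps one mean and variance (and one γ/β pair) per feature map, computed over the batch and all spatial positions, which fits parameter sharing. tf.layers.batch_normalization behaves this way on NHWC inputs because its default axis=-1 normalizes over all other axes. At inference time, moving averages replace the batch statistics, which is why the code needs the training placeholder and the UPDATE_OPS dependency. A NumPy sketch under these assumptions (the class name and synthetic input are illustrative; epsilon=1e-3 matches the TF 1.x default, but finer details, such as how TensorFlow estimates the moving variance, may differ):

```python
import numpy as np

class BatchNormNHWC:
    """Minimal NumPy batch norm for conv feature maps in NHWC layout (illustration only)."""

    def __init__(self, n_channels, momentum=0.9, eps=1e-3):
        self.gamma = np.ones(n_channels)         # learned scale (kept fixed here)
        self.beta = np.zeros(n_channels)         # learned offset (kept fixed here)
        self.moving_mean = np.zeros(n_channels)  # used at inference time
        self.moving_var = np.ones(n_channels)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # One mean and variance per feature map, pooled over batch, height and width
            mean = x.mean(axis=(0, 1, 2))
            var = x.var(axis=(0, 1, 2))
            # Exponential moving averages: the role momentum=0.9 plays in the article's code
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

rng = np.random.default_rng(0)
conv1_out = 3.0 * rng.standard_normal((8, 28, 28, 16)) + 5.0  # fake conv1 output, 16 maps
bn = BatchNormNHWC(16)
out = bn(conv1_out, training=True)
print(out.mean(axis=(0, 1, 2))[:4], out.std(axis=(0, 1, 2))[:4])
```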

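Dropout is applied to the (flattened) output of the pooling layer and to the fully connected layer, with dropout rates of 0.25 and 0.5 respectively. Note the order: Batch Norm → SELU activation → Dropout.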
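The Batch Norm → SELU → Dropout ordering can be sketched in NumPy. tf.layers.dropout only drops units when training=True (hence the training placeholder in the code) and, during training, rescales the kept units by 1/(1 - rate) so that nothing needs to change at inference time. The function names and the random input below are illustrative; the SELU constants are the standard ones that tf.nn.selu uses:

```python
import numpy as np

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    # scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * np.expm1(np.minimum(x, 0.0)))

def dropout(x, rate, training, rng):
    # Inverted dropout: during training, zero each unit with probability `rate` and
    # scale the survivors by 1 / (1 - rate); at inference, pass the input through
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
bn_out = rng.standard_normal((100, 32))                  # stand-in for batch-normalized fc1
act = selu(bn_out)                                       # Batch Norm -> SELU
train_out = dropout(act, 0.5, training=True, rng=rng)    # -> Dropout while training
test_out = dropout(act, 0.5, training=False, rng=rng)    # inference: identity
print((train_out == 0).mean(), np.array_equal(test_out, act))
```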

I also set the stride of the second convolutional layer to 1, to obtain larger feature maps and more parameters.
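The effect of this change can be worked out directly. With stride 1, conv2 keeps 28×28 maps, pooling yields 14×14 (hence pool3_fmaps * 14 * 14 in the code further down), and the input to the fully connected layer grows fourfold. A small calculation sketch (the helper is illustrative and ignores Batch Norm's extra γ/β parameters):

```python
import math

def fc1_input_and_params(conv2_stride, height=28, conv2_fmaps=32, n_fc1=32):
    """Flattened size fed to fc1, and fc1's parameter count, for a given conv2 stride
    (conv1 with stride 1 and SAME padding, then 2x2, stride-2 VALID max pooling)."""
    h = math.ceil(height / conv2_stride)   # conv2 output size with SAME padding
    h = (h - 2) // 2 + 1                   # after max pooling
    n_flat = conv2_fmaps * h * h
    return n_flat, n_flat * n_fc1 + n_fc1

print(fc1_input_and_params(2))  # basic model: (1568, 50208)
print(fc1_input_and_params(1))  # optimized model: (6272, 200736)
```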

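The number of epochs is set to 20, with batch_size = 100. Since Batch Norm computes the mean and variance of each mini-batch, the batch size can be set somewhat larger. If the validation accuracy has not improved for 2000 steps, training is stopped.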
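The stopping rule can be checked on its own. A pure-Python sketch replaying the bookkeeping of the code further down (evaluate every 10 steps, stop once more than 2000 steps have passed since the last improvement) on a made-up validation curve. With 55,000 training samples and batch_size = 100 there are 550 steps per epoch, so 20 epochs mean at most 11,000 steps, and step 9921 falls in epoch 18 (counting from 0). If the rule is applied as written, stopping at step 9921 also implies that the last improvement, and hence the restored checkpoint, came at step 9921 - 2001 = 7920. The curve below is synthetic, chosen only to reproduce that case:

```python
def replay_early_stopping(val_acc_at, max_steps, require_improvement=2000, eval_every=10):
    """Replays the article's early-stopping bookkeeping on a made-up validation curve.
    val_acc_at(step) returns the validation accuracy at that step.
    Returns (step at which training stops, best accuracy, step of the best accuracy)."""
    total_batch, best_acc_val, last_improved = 0, 0.0, 0
    while total_batch < max_steps:
        # (a real loop would run one training step here)
        if total_batch % eval_every == 0:
            acc_val = val_acc_at(total_batch)
            if acc_val > best_acc_val:
                best_acc_val, last_improved = acc_val, total_batch  # checkpoint saved here
        total_batch += 1
        if total_batch - last_improved > require_improvement:
            break
    return total_batch, best_acc_val, last_improved

def synthetic_curve(step):
    # Keeps improving until step 7920, flat afterwards
    return min(step, 7920) / 10000

print(replay_early_stopping(synthetic_curve, max_steps=20 * 550))  # (9921, 0.792, 7920)
```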

As a result, training stopped early in epoch 18, at step 9921. The best validation accuracy was 99.22% and the test accuracy 98.94%.

```python
import tensorflow as tf  # TensorFlow 1.x graph-mode API
import numpy as np
import time
from datetime import timedelta
from functools import partial

# Record how long training takes
def get_time_dif(start_time):
    end_time = time.time()
    time_dif = end_time - start_time
    # timedelta formats the interval; a 10-second interval prints as 0:00:10
    return timedelta(seconds=int(round(time_dif)))

# Prepare the training, validation and test sets, and generate mini-batches
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch

height = 28
width = 28
channels = 1
n_inputs = height * width

# The first convolutional layer has 16 kernels of size (3, 3) and stride 1;
# zero padding keeps the output the same size as the input
conv1_fmaps = 16
conv1_ksize = 3
conv1_stride = 1
conv1_pad = "SAME"

conv2_fmaps = 32
conv2_ksize = 3
conv2_stride = 1
conv2_pad = "SAME"
# Drop 25% of the neurons after the pooling layer
conv2_dropout_rate = 0.25

pool3_fmaps = conv2_fmaps

n_fc1 = 32
# Drop 50% of the neurons in the fully connected layer
fc1_dropout_rate = 0.5

n_outputs = 10

with tf.name_scope("inputs"):
    X = tf.placeholder(tf.float32, shape=[None, n_inputs], name="X")
    X_reshaped = tf.reshape(X, shape=[-1, height, width, channels])
    y = tf.placeholder(tf.int32, shape=[None], name="y")
    training = tf.placeholder_with_default(False, shape=[], name='training')

# A reusable batch norm layer; the global mean and variance are estimated
# with moving averages, using momentum = 0.9
my_batch_norm_layer = partial(tf.layers.batch_normalization,
                              training=training, momentum=0.9)

with tf.name_scope("conv"):
    # Batch norm comes before the activation, so no activation function here
    conv1 = tf.layers.conv2d(X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
                             strides=conv1_stride, padding=conv1_pad,
                             activation=None, name="conv1")
    # Batch norm first, then activate
    batch_norm1 = tf.nn.selu(my_batch_norm_layer(conv1))
    conv2 = tf.layers.conv2d(batch_norm1, filters=conv2_fmaps, kernel_size=conv2_ksize,
                             strides=conv2_stride, padding=conv2_pad,
                             activation=None, name="conv2")
    batch_norm2 = tf.nn.selu(my_batch_norm_layer(conv2))

with tf.name_scope("pool3"):
    pool3 = tf.nn.max_pool(batch_norm2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
    # Flatten the feature maps into one vector (28x28 becomes 14x14 after pooling)
    pool3_flat = tf.reshape(pool3, shape=[-1, pool3_fmaps * 14 * 14])
    # Drop 25% of the neurons
    pool3_flat_drop = tf.layers.dropout(pool3_flat, conv2_dropout_rate, training=training)

with tf.name_scope("fc1"):
    fc1 = tf.layers.dense(pool3_flat_drop, n_fc1, activation=None, name="fc1")
    # Batch norm in the fully connected layer, then activate
    batch_norm4 = tf.nn.selu(my_batch_norm_layer(fc1))
    # Drop 50% of the neurons
    fc1_drop = tf.layers.dropout(batch_norm4, fc1_dropout_rate, training=training)

with tf.name_scope("output"):
    logits = tf.layers.dense(fc1_drop, n_outputs, name="output")
    logits_batch_norm = my_batch_norm_layer(logits)
    Y_proba = tf.nn.softmax(logits_batch_norm, name="Y_proba")

with tf.name_scope("loss_and_train"):
    learning_rate = 0.01
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits_batch_norm, labels=y)
    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    # Batch norm's moving averages are updated by extra ops
    extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    # Make each optimization step depend on the batch norm updates
    with tf.control_dependencies(extra_update_ops):
        training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    # Score with the batch-normalized logits, the same ones the loss and Y_proba use
    correct = tf.nn.in_top_k(logits_batch_norm, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.name_scope("init_and_save"):
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

n_epochs = 20
batch_size = 100

with tf.Session() as sess:
    init.run()
    start_time = time.time()

    # total_batch: total number of training steps (one mini-batch = one step)
    # best_acc_val: best validation accuracy so far
    # last_improved: step at which the validation accuracy last improved
    # require_improvement: stop if there is no improvement for this many steps
    total_batch = 0
    best_acc_val = 0.0
    last_improved = 0
    require_improvement = 2000

    flag = False
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={training: True, X: X_batch, y: y_batch})

            # Validate every 10 steps
            if total_batch % 10 == 0:
                acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})

                # Evaluate 1000 samples at a time, then average
                acc_val = []
                for i in range(len(X_valid) // 1000):
                    acc_val.append(accuracy.eval(feed_dict={X: X_valid[i*1000:(i+1)*1000],
                                                            y: y_valid[i*1000:(i+1)*1000]}))
                acc_val = np.mean(acc_val)

                # If the validation accuracy improved, record it and save the model
                if acc_val > best_acc_val:
                    best_acc_val = acc_val
                    last_improved = total_batch
                    save_path = saver.save(sess, "./my_model_CNN_stop.ckpt")
                    improved_str = 'improved!'
                else:
                    improved_str = ''

                # Print elapsed time and validation results; improvements are flagged "improved!"
                time_dif = get_time_dif(start_time)
                msg = 'Epoch:{0:>4}, Iter: {1:>6}, Acc_Train: {2:>7.2%}, Acc_Val: {3:>7.2%}, Time: {4} {5}'
                print(msg.format(epoch, total_batch, acc_batch, acc_val, time_dif, improved_str))

            # Count the total number of steps
            total_batch += 1

            # Stop if there has been no improvement for 2000 steps
            if total_batch - last_improved > require_improvement:
                print("Early stopping in  ", total_batch, " step! And the best validation accuracy is ", best_acc_val, '.')
                # Leave the loop over mini-batches
                flag = True
                break
        # Leave the loop over epochs
        if flag:
            break

with tf.Session() as sess:
    saver.restore(sess, "./my_model_CNN_stop.ckpt")
    # Test 1000 samples at a time, then average
    acc_test = []
    for i in range(len(X_test) // 1000):
        acc_test.append(accuracy.eval(feed_dict={X: X_test[i*1000:(i+1)*1000],
                                                 y: y_test[i*1000:(i+1)*1000]}))
    acc_test = np.mean(acc_test)
    print("\nTest_accuracy:{0:>7.2%}".format(acc_test))
```

```
Early stopping in   9921  step! And the best validation accuracy is  0.9922 .
INFO:tensorflow:Restoring parameters from ./my_model_CNN_stop.ckpt

Test_accuracy: 98.94%
```

References:

Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow, O'Reilly.

Reposted from: https://www.cnblogs.com/Luv-GEM/p/10783252.html
