Deep Learning Cifar-10 Classifier Using Residual Neural Network

In this article, a deep learning Residual Neural Network is designed in DLHUB to classify the 10 image classes of the well-known Cifar-10 dataset.

Tuan Nghia Nguyen · 6/28/2019 8:49:49 PM



1. Introduction 

Since its introduction, deep learning has become an indispensable tool for pattern recognition applications with big datasets. A deep learning neural network consists of many layers arranged in a certain architecture, and a deeper architecture can solve more complicated problems involving very large datasets. The question is how deep a neural network model can be made.

With traditional deep learning models such as the multilayer perceptron (MLP) and stacked convolutional layers, increasing the number of layers leads to slower training and saturating accuracy. The Residual Neural Network architecture was introduced to solve this issue: it allows the number of layers to increase while still improving accuracy and avoiding the vanishing gradient problem. More information on Residual Neural Networks can be found here.
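The core idea can be sketched in a few lines of NumPy: a residual block computes relu(F(x) + x), so even when the learned mapping F contributes nothing, the block still passes its input through unchanged. The snippet below is an illustrative sketch of this identity shortcut, not DLHUB's implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # F(x): a small two-layer transform standing in for the block's conv layers
    f = relu(x @ w1) @ w2
    # identity shortcut: x is added back before the final ReLU, giving the
    # gradient a direct path around F and mitigating vanishing gradients
    return relu(f + x)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = np.zeros((8, 8))  # F(x) == 0: the block degenerates to relu(x)

assert np.allclose(residual_block(x, w1, w2), relu(x))
```

Because the shortcut is an addition, a stack of such blocks never performs worse than the identity mapping, which is what lets residual networks grow much deeper than plain stacks of layers.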

This article aims to demonstrate how to design a deep learning Residual Neural Network in DLHUB software with ease.

2. Obtain the Cifar-10 dataset

The first step is to prepare a training dataset. The Cifar-10 dataset is provided by the University of Toronto and can be downloaded directly from its website. You can download the dataset and place it into the correct example folder. Alternatively, DLHUB ships with pre-packaged examples; please read the MNIST blog to get them. Once the examples or dataset are downloaded, the structure should look like the figure below.


The Cifar-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. The dataset structure contains an image folder together with training and test map files.
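The map files mentioned above are plain text: each line pairs an image path with its class label, separated by a tab (this is the format the CNTK ImageDeserializer in section 7 expects). A minimal parsing sketch, with hypothetical file names:

```python
# Each map-file line is "<relative image path>\t<class index>".
sample_map = "train/00001.png\t6\ntrain/00002.png\t9\ntrain/00003.png\t9\n"

def parse_map(text):
    """Parse map-file text into (path, label) pairs."""
    pairs = []
    for line in text.strip().splitlines():
        path, label = line.split("\t")
        pairs.append((path, int(label)))
    return pairs

entries = parse_map(sample_map)
assert entries[0] == ("train/00001.png", 6)
assert len(entries) == 3
```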

In this article, only a small subset of the Cifar-10 dataset is used: 23,973 images for training and 139 images for testing.


3. Load the Cifar-10 dataset into DLHUB

DLHUB supports the Cifar-10 dataset described in part 2. Loading it takes a few simple steps: browse to the dataset location (1), select the training map file (2), and make sure the input shape for the Cifar-10 images is set correctly (3). Since Cifar-10 images are in 32x32 colour format, the input shape is 32x32x3. The loading page also offers an image augmentation option, which can improve accuracy during training.
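As a sanity check on the input shape, a 32x32 colour image holds 32 * 32 * 3 = 3072 values. A typical augmentation, similar in spirit to the option DLHUB offers (though not its actual pipeline), pads the image, takes a random crop back to 32x32, and occasionally flips it horizontally:

```python
import numpy as np

def augment(img, rng, pad=4):
    # pad with reflection, random-crop back to the original size, maybe flip
    h, w, c = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    crop = padded[top:top + h, left:left + w]
    if rng.random() < 0.5:
        crop = crop[:, ::-1]          # horizontal flip
    return crop

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3))
assert img.size == 3072               # 32 x 32 x 3 input shape
assert augment(img, rng).shape == (32, 32, 3)
```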





4. Configure a deep learning Residual Neural Network in DLHUB

Constructing a Residual Neural Network (ResNet) in DLHUB is a simple task because DLHUB provides a built-in ResNet layer. First, select the ResNet layer in the Select Function palette (1). This layer implements a deep Residual Neural Network architecture. Its parameters can be configured in the configuration panel (2); in this example, we only need to change the number of output nodes, but advanced users can tweak the ResNet architecture by ticking the Advance checkbox in the configuration panel. After the ResNet layer is added, a fully connected (Dense) layer with 10 nodes is used as the output layer for the 10 classes, with Softmax as its activation function (3). The model is then verified before we move to the training page.




5. Train a deep learning ResNet in DLHUB

In this example, the training algorithm is Stochastic Gradient Descent (SGD) with a learning rate of 0.0079, and Softmax Cross Entropy is used as the loss function (1). Mini-batch training is used with a batch size of 64, and the number of training epochs is set to 200 (2).
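To make these settings concrete, one SGD step on a single example works like this: compute the softmax cross-entropy loss, take its gradient with respect to the logits, and move the weights against that gradient scaled by the 0.0079 learning rate. The following is a NumPy sketch of the update rule for a linear classifier only, not the full mini-batch loop DLHUB runs:

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())   # shift for numerical stability
    return z / z.sum()

def cross_entropy(logits, label):
    return -np.log(softmax(logits)[label])

def sgd_step(W, x, label, lr=0.0079):
    grad_logits = softmax(W @ x)
    grad_logits[label] -= 1.0           # d(loss)/d(logits) for softmax CE
    return W - lr * np.outer(grad_logits, x)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(10, 3072))  # 10 classes, flattened 32x32x3 input
x = rng.normal(size=3072)
label = 3

W_new = sgd_step(W, x, label)
# a single step moves the weights downhill, reducing the loss on this example
assert cross_entropy(W_new @ x, label) < cross_entropy(W @ x, label)
```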

After around 29 minutes, the training process is finished. As can be seen in the training graph, the loss value is still decreasing at the end of training, which suggests that accuracy would continue to improve if we trained the deep ResNet for longer.



6. Evaluate the trained deep learning ResNet in DLHUB

The trained deep ResNet achieves around 80% accuracy on the test dataset after only 200 training epochs. The accuracy should improve further as the number of training epochs increases.
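The reported figure is top-1 accuracy: the fraction of test images whose highest-scoring class matches the true label. With this article's test set of 139 images, roughly 80% corresponds to about 111 correct predictions (a minimal sketch):

```python
def top1_accuracy(predicted, actual):
    # fraction of predictions that exactly match the true label
    correct = sum(int(p == a) for p, a in zip(predicted, actual))
    return correct / len(actual)

# e.g. 111 correct out of 139 test images is ~79.9% accuracy
assert round(111 / 139, 3) == 0.799
assert top1_accuracy([0, 1, 2, 2], [0, 1, 2, 3]) == 0.75
```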



7. Test and export the trained deep learning ResNet in DLHUB

The trained ResNet model can be tested on the built-in test page by browsing to the test image folder (1), selecting an image for testing (2), visualizing the selected image (3), and confirming the predicted result (4).




Once the trained model has been evaluated, tested, and verified, it can be exported to a format that can be used in supported programming languages such as LabVIEW or C# (5). The exported model can also be viewed in the Netron software, as shown below:


8. Construct a deep learning ResNet in Python code

To demonstrate how difficult it is to construct a similar deep learning ResNet in Python, we will use the sample Python code provided by KeDengMs in the Microsoft CNTK GitHub repository.

First, we need to construct the deep learning ResNet model with the following Python code:

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See file in the project root
# for full license information.
# ==============================================================================

import numpy as np
from cntk.initializer import he_normal, normal
from cntk.layers import AveragePooling, MaxPooling, BatchNormalization, Convolution, Dense
from cntk.ops import element_times, relu

# assembly components
def conv_bn(input, filter_size, num_filters, strides=(1, 1), init=he_normal(), bn_init_scale=1):
    c = Convolution(filter_size, num_filters, activation=None, init=init, pad=True, strides=strides, bias=False)(input)
    r = BatchNormalization(map_rank=1, normalization_time_constant=4096, use_cntk_engine=False, init_scale=bn_init_scale, disable_regularization=True)(c)
    return r

def conv_bn_relu(input, filter_size, num_filters, strides=(1, 1), init=he_normal()):
    r = conv_bn(input, filter_size, num_filters, strides, init, 1)
    return relu(r)

# ResNet components
def resnet_basic(input, num_filters):
    c1 = conv_bn_relu(input, (3, 3), num_filters)
    c2 = conv_bn(c1, (3, 3), num_filters, bn_init_scale=1)
    p = c2 + input
    return relu(p)

def resnet_basic_inc(input, num_filters, strides=(2, 2)):
    c1 = conv_bn_relu(input, (3, 3), num_filters, strides)
    c2 = conv_bn(c1, (3, 3), num_filters, bn_init_scale=1)
    s = conv_bn(input, (1, 1), num_filters, strides) # Shortcut
    p = c2 + s
    return relu(p)

def resnet_basic_stack(input, num_stack_layers, num_filters): 
    assert(num_stack_layers >= 0)
    l = input
    for _ in range(num_stack_layers):
        l = resnet_basic(l, num_filters)
    return l

def resnet_bottleneck(input, out_num_filters, inter_out_num_filters):
    c1 = conv_bn_relu(input, (1, 1), inter_out_num_filters)
    c2 = conv_bn_relu(c1, (3, 3), inter_out_num_filters)
    c3 = conv_bn(c2, (1, 1), out_num_filters, bn_init_scale=0)
    p = c3 + input
    return relu(p)

def resnet_bottleneck_inc(input, out_num_filters, inter_out_num_filters, stride1x1, stride3x3):
    c1 = conv_bn_relu(input, (1, 1), inter_out_num_filters, strides=stride1x1)
    c2 = conv_bn_relu(c1, (3, 3), inter_out_num_filters, strides=stride3x3)
    c3 = conv_bn(c2, (1, 1), out_num_filters, bn_init_scale=0)
    stride = np.multiply(stride1x1, stride3x3)
    s = conv_bn(input, (1, 1), out_num_filters, strides=stride) # Shortcut
    p = c3 + s
    return relu(p)

def resnet_bottleneck_stack(input, num_stack_layers, out_num_filters, inter_out_num_filters): 
    assert(num_stack_layers >= 0)
    l = input
    for _ in range(num_stack_layers):
        l = resnet_bottleneck(l, out_num_filters, inter_out_num_filters)
    return l

# Defines the residual network model for classifying images
def create_cifar10_model(input, num_stack_layers, num_classes):
    c_map = [16, 32, 64]

    conv = conv_bn_relu(input, (3, 3), c_map[0])
    r1 = resnet_basic_stack(conv, num_stack_layers, c_map[0])

    r2_1 = resnet_basic_inc(r1, c_map[1])
    r2_2 = resnet_basic_stack(r2_1, num_stack_layers-1, c_map[1])

    r3_1 = resnet_basic_inc(r2_2, c_map[2])
    r3_2 = resnet_basic_stack(r3_1, num_stack_layers-1, c_map[2])

    # Global average pooling and output
    pool = AveragePooling(filter_shape=(8, 8), name='final_avg_pooling')(r3_2)
    z = Dense(num_classes, init=normal(0.01))(pool)
    return z

def create_imagenet_model_basic(input, num_stack_layers, num_classes):
    c_map = [64, 128, 256, 512]

    conv = conv_bn_relu(input, (7, 7), c_map[0], strides=(2, 2))
    pool1 = MaxPooling((3, 3), strides=(2, 2), pad=True)(conv)
    r1 = resnet_basic_stack(pool1, num_stack_layers[0], c_map[0])

    r2_1 = resnet_basic_inc(r1, c_map[1])
    r2_2 = resnet_basic_stack(r2_1, num_stack_layers[1], c_map[1])

    r3_1 = resnet_basic_inc(r2_2, c_map[2])
    r3_2 = resnet_basic_stack(r3_1, num_stack_layers[2], c_map[2])

    r4_1 = resnet_basic_inc(r3_2, c_map[3])
    r4_2 = resnet_basic_stack(r4_1, num_stack_layers[3], c_map[3])

    # Global average pooling and output
    pool = AveragePooling(filter_shape=(7, 7), name='final_avg_pooling')(r4_2)
    z = Dense(num_classes, init=normal(0.01))(pool)
    return z

def create_imagenet_model_bottleneck(input, num_stack_layers, num_classes, stride1x1, stride3x3):
    c_map = [64, 128, 256, 512, 1024, 2048]

    # conv1 and max pooling
    conv1 = conv_bn_relu(input, (7, 7), c_map[0], strides=(2, 2))
    pool1 = MaxPooling((3,3), strides=(2,2), pad=True)(conv1)

    # conv2_x
    r2_1 = resnet_bottleneck_inc(pool1, c_map[2], c_map[0], (1, 1), (1, 1))
    r2_2 = resnet_bottleneck_stack(r2_1, num_stack_layers[0], c_map[2], c_map[0])

    # conv3_x
    r3_1 = resnet_bottleneck_inc(r2_2, c_map[3], c_map[1], stride1x1, stride3x3)
    r3_2 = resnet_bottleneck_stack(r3_1, num_stack_layers[1], c_map[3], c_map[1])

    # conv4_x
    r4_1 = resnet_bottleneck_inc(r3_2, c_map[4], c_map[2], stride1x1, stride3x3)
    r4_2 = resnet_bottleneck_stack(r4_1, num_stack_layers[2], c_map[4], c_map[2])

    # conv5_x
    r5_1 = resnet_bottleneck_inc(r4_2, c_map[5], c_map[3], stride1x1, stride3x3)
    r5_2 = resnet_bottleneck_stack(r5_1, num_stack_layers[3], c_map[5], c_map[3])

    # Global average pooling and output
    pool = AveragePooling(filter_shape=(7, 7), name='final_avg_pooling')(r5_2)
    z = Dense(num_classes, init=normal(0.01))(pool)
    return z
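It is worth noting how the num_stack_layers parameter of create_cifar10_model maps to the usual ResNet naming: each of the three stages holds num_stack_layers residual blocks of two convolutions, and the initial convolution plus the final Dense layer add two more weighted layers, giving a depth of 6n + 2. This is why the training script passes 3 for resnet20 and 18 for resnet110:

```python
def cifar_resnet_depth(num_stack_layers):
    # 3 stages x num_stack_layers blocks x 2 conv layers each = 6n,
    # plus the initial conv_bn_relu and the final Dense layer
    return 6 * num_stack_layers + 2

assert cifar_resnet_depth(3) == 20    # resnet20
assert cifar_resnet_depth(18) == 110  # resnet110
```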


Then we need to train this ResNet:

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See file in the project root
# for full license information.
# ==============================================================================

from __future__ import print_function
import os
import argparse
import cntk as C
import numpy as np

from cntk import cross_entropy_with_softmax, classification_error, reduce_mean
from cntk import Trainer, cntk_py
from cntk.io import MinibatchSource, ImageDeserializer, StreamDef, StreamDefs
from cntk.learners import momentum_sgd, learning_parameter_schedule_per_sample, momentum_schedule
from cntk.debugging import *
from cntk.logging import *
from resnet_models import *
import cntk.io.transforms as xforms

# Paths relative to current python file.
abs_path   = os.path.dirname(os.path.abspath(__file__))
data_path  = os.path.join(abs_path, "..", "..", "..", "DataSets", "CIFAR-10")

# model dimensions
image_height = 32
image_width  = 32
num_channels = 3 # RGB
num_classes  = 10

# Define the reader for both training and evaluation action.
def create_image_mb_source(map_file, mean_file, train, total_number_of_samples):
    if not os.path.exists(map_file) or not os.path.exists(mean_file):
        raise RuntimeError("File '%s' or '%s' does not exist. Please run install_cifar10.py from DataSets/CIFAR-10 to fetch them" %
                           (map_file, mean_file))

    # transformation pipeline for the features has jitter/crop only when training
    transforms = []
    if train:
        transforms += [
            xforms.crop(crop_type='randomside', side_ratio=(0.8, 1.0), jitter_type='uniratio') # train uses jitter
        ]
    transforms += [
        xforms.scale(width=image_width, height=image_height, channels=num_channels, interpolations='linear'),
        xforms.mean(mean_file)
    ]
    # deserializer
    return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
        features=StreamDef(field='image', transforms=transforms), # first column in map file is referred to as 'image'
        labels=StreamDef(field='label', shape=num_classes))),     # and second as 'label'
        randomize=train,
        max_samples=total_number_of_samples)

# Train and evaluate the network.
def train_and_evaluate(reader_train, reader_test, network_name, epoch_size, max_epochs, profiler_dir=None,
                       model_dir=None, log_dir=None, tensorboard_logdir=None, gen_heartbeat=False, fp16=False):


    # Input variables denoting the features and label data
    input_var = C.input_variable((num_channels, image_height, image_width), name='features')
    label_var = C.input_variable((num_classes))

    dtype = np.float16 if fp16 else np.float32
    if fp16:
        graph_input = C.cast(input_var, dtype=np.float16)
        graph_label = C.cast(label_var, dtype=np.float16)
    else:
        graph_input = input_var
        graph_label = label_var

    with C.default_options(dtype=dtype):
        # create model, and configure learning parameters
        if network_name == 'resnet20':
            z = create_cifar10_model(graph_input, 3, num_classes)
            lr_per_mb = [1.0]*80 + [0.1]*40 + [0.01]
        elif network_name == 'resnet110':
            z = create_cifar10_model(graph_input, 18, num_classes)
            lr_per_mb = [0.1]*1 + [1.0]*80 + [0.1]*40 + [0.01]
        else:
            raise RuntimeError("Unknown model name!")

        # loss and metric
        ce = cross_entropy_with_softmax(z, graph_label)
        pe = classification_error(z, graph_label)

    if fp16:
        ce = C.cast(ce, dtype=np.float32)
        pe = C.cast(pe, dtype=np.float32)

    # shared training parameters
    minibatch_size = 128
    l2_reg_weight = 0.0001

    # Set learning parameters
    lr_per_sample = [lr/minibatch_size for lr in lr_per_mb]
    lr_schedule = learning_parameter_schedule_per_sample(lr_per_sample, epoch_size=epoch_size)
    mm_schedule = momentum_schedule(0.9, minibatch_size)

    # progress writers
    progress_writers = [ProgressPrinter(tag='Training', log_to_file=log_dir, num_epochs=max_epochs, gen_heartbeat=gen_heartbeat)]
    tensorboard_writer = None
    if tensorboard_logdir is not None:
        tensorboard_writer = TensorBoardProgressWriter(freq=10, log_dir=tensorboard_logdir, model=z)

    # trainer object
    learner = momentum_sgd(z.parameters, lr_schedule, mm_schedule, minibatch_size=minibatch_size,
                           l2_regularization_weight=l2_reg_weight)
    trainer = Trainer(z, (ce, pe), learner, progress_writers)

    # define mapping from reader streams to network inputs
    input_map = {
        input_var: reader_train.streams.features,
        label_var: reader_train.streams.labels
    }

    log_number_of_parameters(z) ; print()

    # perform model training
    if profiler_dir:
        start_profiler(profiler_dir, True)

    for epoch in range(max_epochs):       # loop over epochs
        sample_count = 0
        while sample_count < epoch_size:  # loop over minibatches in the epoch
            data = reader_train.next_minibatch(min(minibatch_size, epoch_size-sample_count), input_map=input_map) # fetch minibatch.
            trainer.train_minibatch(data)                                   # update model with it
            sample_count += trainer.previous_minibatch_sample_count         # count samples processed so far

        trainer.summarize_training_progress()


        # Log mean of each parameter tensor, so that we can confirm that the parameters change indeed.
        if tensorboard_writer:
            for parameter in z.parameters:
                tensorboard_writer.write_value(parameter.uid + "/mean", reduce_mean(parameter).eval(), epoch)

        if model_dir:
            z.save(os.path.join(model_dir, network_name + "_{}.dnn".format(epoch)))
        enable_profiler() # begin to collect profiler data after first epoch

    if profiler_dir:
        stop_profiler()

    # Evaluation parameters
    test_epoch_size = 10000
    minibatch_size = 16

    # process minibatches and evaluate the model
    metric_numer = 0
    metric_denom = 0
    sample_count = 0

    while sample_count < test_epoch_size:
        current_minibatch = min(minibatch_size, test_epoch_size - sample_count)
        # Fetch next test min batch.
        data = reader_test.next_minibatch(current_minibatch, input_map=input_map)
        # minibatch data to be trained with
        metric_numer += trainer.test_minibatch(data) * current_minibatch
        metric_denom += current_minibatch
        # Keep track of the number of samples processed so far.
        sample_count += data[label_var].num_samples

    trainer.summarize_test_progress()


    return metric_numer/metric_denom

if __name__=='__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-n', '--network', help='network type, resnet20 or resnet110', required=False, default='resnet20')
    parser.add_argument('-e', '--epochs', help='total epochs', type=int, required=False, default='160')
    parser.add_argument('-es', '--epoch_size', help='Size of epoch in samples', type=int, required=False, default='50000')
    parser.add_argument('-p', '--profiler_dir', help='directory for saving profiler output', required=False, default=None)
    parser.add_argument('-tensorboard_logdir', '--tensorboard_logdir', help='Directory where TensorBoard logs should be created', required=False, default=None)
    parser.add_argument('-datadir', '--datadir', help='Data directory where the CIFAR dataset is located', required=False, default=data_path)
    parser.add_argument('-outputdir', '--outputdir', help='Output directory for checkpoints and models', required=False, default=None)
    parser.add_argument('-logdir', '--logdir', help='Log file', required=False, default=None)
    parser.add_argument('-genheartbeat', '--genheartbeat', help="Turn on heart-beat for philly", action='store_true', default=False)
    parser.add_argument('-fp16', '--fp16', help="use float16", action='store_true', default=False)

    args = vars(parser.parse_args())
    epochs = args['epochs']
    epoch_size = args['epoch_size']
    network_name = args['network']

    model_dir = args['outputdir']
    if not model_dir:
        model_dir = os.path.join(abs_path, "Models")

    data_path = args['datadir']

    reader_train = create_image_mb_source(os.path.join(data_path, 'train_map.txt'), os.path.join(data_path, 'CIFAR-10_mean.xml'), True, total_number_of_samples=epochs * epoch_size)
    reader_test = create_image_mb_source(os.path.join(data_path, 'test_map.txt'), os.path.join(data_path, 'CIFAR-10_mean.xml'), False,
                                         total_number_of_samples=C.io.FULL_DATA_SWEEP)

    train_and_evaluate(reader_train, reader_test, network_name, epoch_size, epochs, args['profiler_dir'], model_dir,
                       args['logdir'], args['tensorboard_logdir'], gen_heartbeat=args['genheartbeat'], fp16=args['fp16'])

9. Conclusion

In this article, we have shown how to design a deep learning ResNet in DLHUB to classify the 10 classes of the Cifar-10 dataset. Compared to the equivalent Python code for the Microsoft CNTK deep learning framework, DLHUB is much simpler and easier to use. The trained model can be exported directly into supported programming languages such as LabVIEW or C# for real-time applications and deployment.

