Fruit Recognition Using Transfer Learning

This article provides a step-by-step tutorial on designing a deep learning neural network that classifies 4 different types of fruit from a small dataset using the transfer learning technique.

Tuan Nghia Nguyen · 6/28/2019 6:23:21 PM


Fruit_Transfer_Learning.jpg

 

Deep learning has become a powerful technique for pattern recognition and regression problems, and the convolutional neural network has proven to be the most effective neural network structure for image recognition. However, deep learning often requires a very large dataset to achieve reasonable accuracy, and because deep learning architectures contain many layers, the training process is often very time-consuming. In a deep learning architecture, the last few layers typically act as the main classifier, while the preceding layers act as a feature extractor that distils the important information in the dataset into features fed to that classifier.

Traditionally, in machine learning applications, feature extraction and classification are separated into two processes. Feature extraction is always the difficult part, as it depends on the structure of the dataset, and the feature extractor's parameters are fixed. In a deep learning architecture, the two processes are combined in one model, and the feature extractor's parameters are adjusted during the training process.

This architecture is what makes the transfer learning technique possible, and transfer learning addresses both the small-dataset problem and the training-time problem. The idea is to take an existing deep learning model that has been trained on a similar problem and keep all the layers from the input layer up to the last few layers to serve as a feature extractor. New output layers are then added on top of this feature extractor for the classification task at hand, and only these new layers, which form the classifier, are trained during the training process. The feature extractor itself is not retrained; its parameters stay fixed. For more information on the transfer learning technique, please visit https://cntk.ai/pythondocs/CNTK_301_Image_Recognition_with_Deep_Transfer_Learning.html.
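The core of this idea can be written in a few lines of CNTK Python code. The sketch below is illustrative only (the node names "features" and "z.x" are those used by the CNTK ResNet models, and the model file name is a placeholder); a complete, runnable example is given later in this article:

import cntk as C
from cntk import load_model, placeholder
from cntk.layers import Dense
from cntk.logging.graph import find_by_name
from cntk.ops import combine
from cntk.ops.functions import CloneMethod

# Load a pre-trained network and locate its input node and last hidden node.
base_model = load_model("ResNet18_ImageNet_CNTK.model")
feature_node = find_by_name(base_model, "features")
last_node = find_by_name(base_model, "z.x")

# Clone everything up to the last hidden layer with frozen weights,
# so the feature extractor keeps its pre-trained parameters.
feature_extractor = combine([last_node.owner]).clone(
    CloneMethod.freeze, {feature_node: placeholder(name='features')})

# Add a new, trainable output layer for the new classification task
# (4 output nodes for the 4 fruit classes in this article).
image_input = C.input_variable((3, 224, 224))
z = Dense(4, activation=None)(feature_extractor(image_input))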

The purpose of this article is to demonstrate how easy it is to design a deep learning model with the transfer learning technique for a fruit recognition application.

 

1. Prepare a fruit dataset.

In this article, we are going to design a deep learning neural network that can classify 4 different types of fruit: Apple, Orange, Kiwi, and Grape. First, we need to prepare datasets for the training and testing processes. You can download fruit images from the internet, but we have prepared a small fruit dataset that contains only 10 images per fruit type for both the training and the testing data. This dataset can be downloaded here.

 

Once this dataset is downloaded, please extract it with an unzip tool and place it into the DLHUBData\Examples folder as shown below:

Fruit_Dataset_For_Transfer_Learning.jpg

2. Download a pre-trained deep learning model

As stated above, the transfer learning technique requires a pre-trained model. A number of websites host popular pre-trained models such as AlexNet, ResNet, and VGG, and you can download these models manually for this application; Microsoft, for example, hosts pre-trained model images on its website (https://github.com/microsoft/CNTK/blob/master/PretrainedModels/Image.md). However, DLHUB can download popular pre-trained models itself when it cannot detect any in C:\DLHUBData\PretrainedModels. This happens automatically when you first launch DLHUB: it will ask you to download pre-trained models for transfer learning, as shown below:

Download_Pre_trained_Models_In_DLHUB.jpg

 

DLHUB downloads the required pre-trained models in parallel, and this process is only required once, when DLHUB is first used. After that, DLHUB detects the downloaded models in the correct folder location and uses them.
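If you prefer to fetch a model yourself rather than let DLHUB do it, a pre-trained CNTK model can be downloaded with a few lines of Python. This is a sketch only; the URL below is the one used by the CNTK sample download scripts at the time of writing and may change:

import os
import urllib.request

# Destination folder that DLHUB scans for pre-trained models.
model_dir = r"C:\DLHUBData\PretrainedModels"
os.makedirs(model_dir, exist_ok=True)

# Model URL as used by the CNTK sample scripts (may change over time).
model_url = "https://www.cntk.ai/Models/CNTK_Pretrained/ResNet18_ImageNet_CNTK.model"
model_path = os.path.join(model_dir, "ResNet18_ImageNet_CNTK.model")

if not os.path.exists(model_path):
    print("Downloading %s ..." % model_url)
    urllib.request.urlretrieve(model_url, model_path)
    print("Saved to %s" % model_path)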

 

3. Load the fruit dataset (training set) into DLHUB.

The fruit dataset structure in this example is simply a folder structure in which each sub-folder acts as a fruit category and contains the images for that category, as follows:

Fruit_Datastet_Structure.jpg
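This folder-per-class layout maps directly onto the tab-separated map files that the CNTK code later in this article consumes. A small helper like the following can generate such a map file (a sketch; the paths in the usage example are assumptions based on the dataset layout above):

import os

def write_map_file(dataset_root, map_file_path):
    """Write a CNTK-style map file: one '<image path>\t<label index>' line
    per image, treating each sub-folder of dataset_root as one class."""
    classes = sorted(d for d in os.listdir(dataset_root)
                     if os.path.isdir(os.path.join(dataset_root, d)))
    with open(map_file_path, 'w') as f:
        for label, class_name in enumerate(classes):
            class_dir = os.path.join(dataset_root, class_name)
            for image_name in sorted(os.listdir(class_dir)):
                f.write("%s\t%d\n" % (os.path.join(class_dir, image_name), label))
    return classes

# Example usage (hypothetical paths):
# classes = write_map_file(r"C:\DLHUBData\Examples\Fruits\Train", "train_map.txt")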

 

DLHUB supports this simple data structure, and loading the dataset into DLHUB is very simple: first, browse to the correct folder (1) and select the current folder that includes all the sub-folders (2), then select the pre-trained model to be used (3) before proceeding to the model configuration page.

In image classification applications, the image augmentation technique is often used during training to improve accuracy, and DLHUB supports it: simply tick the Image Augmentation option (3). For more information on how image augmentation works, please visit this website.
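Under the hood, augmentation of this kind corresponds to adding random crop and colour transforms to the CNTK minibatch source used later in this article. The following is a minimal sketch; the exact augmentation parameters DLHUB applies are internal to its engine:

import cntk.io.transforms as xforms

# Augmented training transforms: random-side crop with scale jitter,
# colour jitter, then scaling to the network's 224x224x3 input size.
train_transforms = [
    xforms.crop(crop_type='randomside', side_ratio=0.8, jitter_type='uniratio'),
    xforms.color(brightness_radius=0.2, contrast_radius=0.2, saturation_radius=0.2),
    xforms.scale(width=224, height=224, channels=3, interpolations='linear')
]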

 

Load_Fruit_Dataset_Into_DLHUB.jpg

 

4. Configure a deep learning neural network that uses a pre-trained model in DLHUB.

Configuring a deep learning neural network structure in DLHUB is simple, as all the hard work has been done inside the DLHUB engine. First, a TLModel layer is selected in the Select Functions palette (1) and configured (2). This pre-trained model is frozen, which means its parameters are preserved during the training process, and its original output layer is removed. A new output layer is then added on top of the pre-trained model to form a new deep learning model for this fruit application. In this example, we use only 1 fully connected (Dense) layer with 4 output nodes to classify the 4 fruit types, and a linear activation function is configured for this Dense layer. The resulting deep learning model is shown in (3). The model can then be verified by hitting the Verify Model button (4) before proceeding to the training page.

Configure_Deep_learning_In_DLHUB.jpg

   

For comparison with the few clicks required in DLHUB, the equivalent Python code, which uses the Microsoft CNTK engine with transfer learning to classify flowers, is shown below:

# Copyright (c) Microsoft. All rights reserved.

# Licensed under the MIT license. See LICENSE.md file in the project root
# for full license information.
# ==============================================================================

from __future__ import print_function
import numpy as np
import cntk as C
import os
from PIL import Image
from cntk.device import try_set_default_device, gpu
from cntk import load_model, placeholder, Constant
from cntk import Trainer
from cntk.logging.graph import find_by_name, get_node_outputs
from cntk.io import MinibatchSource, ImageDeserializer, StreamDefs, StreamDef
import cntk.io.transforms as xforms
from cntk.layers import Dense
from cntk.learners import momentum_sgd, learning_parameter_schedule, momentum_schedule
from cntk.ops import combine, softmax
from cntk.ops.functions import CloneMethod
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error
from cntk.logging import log_number_of_parameters, ProgressPrinter


################################################
################################################
# general settings
make_mode = False
freeze_weights = False
base_folder = os.path.dirname(os.path.abspath(__file__))
tl_model_file = os.path.join(base_folder, "Output", "TransferLearning.model")
output_file = os.path.join(base_folder, "Output", "predOutput.txt")
features_stream_name = 'features'
label_stream_name = 'labels'
new_output_node_name = "prediction"

# Learning parameters
max_epochs = 20
mb_size = 50
lr_per_mb = [0.2]*10 + [0.1]
momentum_per_mb = 0.9
l2_reg_weight = 0.0005

# define base model location and characteristics
_base_model_file = os.path.join(base_folder, "..", "..", "..", "PretrainedModels", "ResNet18_ImageNet_CNTK.model")
_feature_node_name = "features"
_last_hidden_node_name = "z.x"
_image_height = 224
_image_width = 224
_num_channels = 3

# define data location and characteristics
_data_folder = os.path.join(base_folder, "..", "DataSets", "Flowers")
_train_map_file = os.path.join(_data_folder, "6k_img_map.txt")
_test_map_file = os.path.join(_data_folder, "1k_img_map.txt")
_num_classes = 102
################################################
################################################


# Creates a minibatch source for training or testing
def create_mb_source(map_file, image_width, image_height, num_channels, num_classes, randomize=True):
    transforms = [xforms.scale(width=image_width, height=image_height, channels=num_channels, interpolations='linear')] 
    return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
            features =StreamDef(field='image', transforms=transforms),
            labels   =StreamDef(field='label', shape=num_classes))),
            randomize=randomize)


# Creates the network model for transfer learning
def create_model(base_model_file, feature_node_name, last_hidden_node_name, num_classes, input_features, freeze=False):
    # Load the pretrained classification net and find nodes
    base_model   = load_model(base_model_file)
    feature_node = find_by_name(base_model, feature_node_name)
    last_node    = find_by_name(base_model, last_hidden_node_name)

    # Clone the desired layers with fixed weights
    cloned_layers = combine([last_node.owner]).clone(
        CloneMethod.freeze if freeze else CloneMethod.clone,
        {feature_node: placeholder(name='features')})

    # Add new dense layer for class prediction
    feat_norm  = input_features - Constant(114)
    cloned_out = cloned_layers(feat_norm)
    z          = Dense(num_classes, activation=None, name=new_output_node_name) (cloned_out)

    return z


# Trains a transfer learning model
def train_model(base_model_file, feature_node_name, last_hidden_node_name,
                image_width, image_height, num_channels, num_classes, train_map_file,
                num_epochs, max_images=-1, freeze=False):
    epoch_size = sum(1 for line in open(train_map_file))
    if max_images > 0:
        epoch_size = min(epoch_size, max_images)

    # Create the minibatch source and input variables
    minibatch_source = create_mb_source(train_map_file, image_width, image_height, num_channels, num_classes)
    image_input = C.input_variable((num_channels, image_height, image_width))
    label_input = C.input_variable(num_classes)

    # Define mapping from reader streams to network inputs
    input_map = {
        image_input: minibatch_source[features_stream_name],
        label_input: minibatch_source[label_stream_name]
    }

    # Instantiate the transfer learning model and loss function
    tl_model = create_model(base_model_file, feature_node_name, last_hidden_node_name, num_classes, image_input, freeze)
    ce = cross_entropy_with_softmax(tl_model, label_input)
    pe = classification_error(tl_model, label_input)

    # Instantiate the trainer object
    lr_schedule = learning_parameter_schedule(lr_per_mb)
    mm_schedule = momentum_schedule(momentum_per_mb)
    learner = momentum_sgd(tl_model.parameters, lr_schedule, mm_schedule, l2_regularization_weight=l2_reg_weight)
    progress_printer = ProgressPrinter(tag='Training', num_epochs=num_epochs)
    trainer = Trainer(tl_model, (ce, pe), learner, progress_printer)

    # Get minibatches of images and perform model training
    print("Training transfer learning model for {0} epochs (epoch_size = {1}).".format(num_epochs, epoch_size))
    log_number_of_parameters(tl_model)
    for epoch in range(num_epochs):       # loop over epochs
        sample_count = 0
        while sample_count < epoch_size:  # loop over minibatches in the epoch
            data = minibatch_source.next_minibatch(min(mb_size, epoch_size-sample_count), input_map=input_map)
            trainer.train_minibatch(data)                                    # update model with it
            sample_count += trainer.previous_minibatch_sample_count          # count samples processed so far
            if sample_count % (100 * mb_size) == 0:
                print ("Processed {0} samples".format(sample_count))

        trainer.summarize_training_progress()

    return tl_model


# Evaluates a single image using the provided model
def eval_single_image(loaded_model, image_path, image_width, image_height):
    # load and format image (resize, RGB -> BGR, CHW -> HWC)
    img = Image.open(image_path)
    if image_path.endswith("png"):
        temp = Image.new("RGB", img.size, (255, 255, 255))
        temp.paste(img, img)
        img = temp
    resized = img.resize((image_width, image_height), Image.ANTIALIAS)
    bgr_image = np.asarray(resized, dtype=np.float32)[..., [2, 1, 0]]
    hwc_format = np.ascontiguousarray(np.rollaxis(bgr_image, 2))

    ## Alternatively: if you want to use opencv-python
    # cv_img = cv2.imread(image_path)
    # resized = cv2.resize(cv_img, (image_width, image_height), interpolation=cv2.INTER_NEAREST)
    # bgr_image = np.asarray(resized, dtype=np.float32)
    # hwc_format = np.ascontiguousarray(np.rollaxis(bgr_image, 2))

    # compute model output
    arguments = {loaded_model.arguments[0]: [hwc_format]}
    output = loaded_model.eval(arguments)

    # return softmax probabilities
    sm = softmax(output[0])
    return sm.eval()


# Evaluates an image set using the provided model
def eval_test_images(loaded_model, output_file, test_map_file, image_width, image_height, max_images=-1, column_offset=0):
    num_images = sum(1 for line in open(test_map_file))
    if max_images > 0:
        num_images = min(num_images, max_images)
    print("Evaluating model output node '{0}' for {1} images.".format(new_output_node_name, num_images))

    pred_count = 0
    correct_count = 0
    np.seterr(over='raise')
    with open(output_file, 'wb') as results_file:
        with open(test_map_file, "r") as input_file:
            for line in input_file:
                tokens = line.rstrip().split('\t')
                img_file = tokens[0 + column_offset]
                probs = eval_single_image(loaded_model, img_file, image_width, image_height)

                pred_count += 1
                true_label = int(tokens[1 + column_offset])
                predicted_label = np.argmax(probs)
                if predicted_label == true_label:
                    correct_count += 1

                np.savetxt(results_file, probs[np.newaxis], fmt="%.3f")
                if pred_count % 100 == 0:
                    print("Processed {0} samples ({1} correct)".format(pred_count, (float(correct_count) / pred_count)))
                if pred_count >= num_images:
                    break

    print ("{0} out of {1} predictions were correct {2}.".format(correct_count, pred_count, (float(correct_count) / pred_count)))


if __name__ == '__main__':
    try_set_default_device(gpu(0))
    # check for model and data existence
    if not (os.path.exists(_base_model_file) and os.path.exists(_train_map_file) and os.path.exists(_test_map_file)):
        print("Please run 'python install_data_and_model.py' first to get the required data and model.")
        exit(0)

    # You can use the following to inspect the base model and determine the desired node names
    # node_outputs = get_node_outputs(load_model(_base_model_file))
    # for out in node_outputs: print("{0} {1}".format(out.name, out.shape))

    # Train only if no model exists yet or if make_mode is set to False
    if os.path.exists(tl_model_file) and make_mode:
        print("Loading existing model from %s" % tl_model_file)
        trained_model = load_model(tl_model_file)
    else:
        trained_model = train_model(_base_model_file, _feature_node_name, _last_hidden_node_name,
                                    _image_width, _image_height, _num_channels, _num_classes, _train_map_file,
                                    max_epochs, freeze=freeze_weights)
        trained_model.save(tl_model_file)
        print("Stored trained model at %s" % tl_model_file)

    # Evaluate the test set
    eval_test_images(trained_model, output_file, _test_map_file, _image_width, _image_height)

    print("Done. Wrote output to %s" % output_file)

 

5. Train a deep learning neural network that uses a pre-trained model in DLHUB.

Training the deep learning model in DLHUB is a very simple process. First, the training algorithm and its parameters are configured (1) and the stopping criteria are specified (2); then the training process is started by clicking the Start Training button (3). If the host computer has a supported GPU, it is used during training to improve training speed. Thanks to the transfer learning technique, training is very fast, as only the last layer is trained on the small dataset: it completes in less than 7 seconds.

 

Train_Deep_Learning_Model_In_DLHUB.jpg 

 

6. Evaluate a deep learning neural network that uses a pre-trained model in DLHUB.

After the deep learning model has been trained, we can verify its generalization (its ability to correctly classify new data) by evaluating it on the test dataset, which is not used during the training process. The evaluation is done simply by browsing to the test dataset (1), performing the evaluation (2), and checking the accuracy result (3).

Evaluate_Fruit_dataset_In_DLHUB.jpg

7. Test the trained deep learning model in DLHUB.

Before deciding to use a trained model in production, the user can test how the trained model performs on new data. This gives the user a feel for how the model behaves, and it can easily be done in the built-in test interface by choosing the test dataset folder (1), selecting an image to be evaluated (2), visualizing the selected image (3), and confirming the classification result (4).
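The same kind of spot check can be done in Python against the trained CNTK model by re-using the eval_single_image helper from the code above. This sketch assumes the script's variables are in scope, that the class order follows the (assumed alphabetical) map-file label indices, and uses a hypothetical image path:

from cntk import load_model

# Class names in label-index order (assumed alphabetical, matching the
# map-file generation sketch earlier in this article).
class_names = ["Apple", "Grape", "Kiwi", "Orange"]

trained_model = load_model(tl_model_file)
probs = eval_single_image(trained_model,
                          r"C:\DLHUBData\Examples\Fruits\Test\Apple\apple_01.jpg",
                          _image_width, _image_height)
print("Predicted: %s (%.1f%%)" % (class_names[np.argmax(probs)], 100 * np.max(probs)))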

 

Test_Fruit_Dataset_in_DLHUB.jpg 

8. Export the trained deep learning model in DLHUB for deployment.

Once the trained model has been evaluated, tested, and verified, it can be exported directly into the supported programming languages for real-time applications and deployment.
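Outside DLHUB, a trained CNTK model can also be saved in the portable ONNX format for deployment in other runtimes (a sketch; whether DLHUB itself uses ONNX for its exports is not stated in this article):

import cntk as C

# Save the trained model in ONNX format so it can be loaded from other
# languages and runtimes (e.g. via ONNX Runtime).
trained_model = C.load_model("Output/TransferLearning.model")
trained_model.save("Output/TransferLearning.onnx", format=C.ModelFormat.ONNX)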

Export_Trained_Model_For_Production_In_DLHUB.jpg

 

 

Conclusion

In this article, we have demonstrated the capability of the DLHUB software for a fruit recognition application using the transfer learning technique. All the heavy lifting is done inside the DLHUB engine to simplify the deep learning design process. With only a few clicks, a proper deep learning model using the transfer learning technique was constructed on a very small training dataset, and high accuracy (97.5%) was achieved on the test dataset. Training was incredibly fast, completing in less than 7 seconds.

 

 
