Digit Recognition: Kaggle Challenge

Digit recognition is a classic problem in the field of computer vision, with a wide range of practical applications such as optical character recognition (OCR), handwriting recognition, and digit-based security systems. The challenge in digit recognition is to train a machine learning model that can accurately classify handwritten digits from a given dataset.

Recently, the Kaggle Digit Recognizer competition has gained widespread attention from machine learning enthusiasts and practitioners alike. In this competition, participants are tasked with building a model that can correctly identify handwritten digits from a dataset of tens of thousands of training images. The competition dataset is a subset of the larger MNIST database, which is widely used as a benchmark for digit recognition models.

In this blog post, we will explore the process of building a winning solution for the Kaggle Digit Recognizer competition. We will walk through the key steps involved in developing a high-performance digit recognition model, from data preprocessing and feature engineering to model selection and hyperparameter tuning. We will also dive into the details of the winning model architecture and explain the strategies that helped it achieve top results.

By the end of this blog post, you will have a solid understanding of how to approach the digit recognition problem and the tools and techniques required to build a high-performance model. So let’s dive in!

Loading Data
Python
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

def data_Preprocessing():
    # load the data
    train = pd.read_csv("data/train.csv")
    test = pd.read_csv("data/test.csv")

    # Drop 'label' column
    X_train = train.drop(labels = ["label"],axis = 1)
    Y_train = train["label"]
    X_test = test.values
    # free some space
    del train 

    # reshape to 28x28x1 images and scale pixel values to [0, 1]
    X_train = X_train.values.reshape(-1,28,28,1)/255.0
    X_test = X_test.reshape(-1,28,28,1)/255.0

    # convert to one-hot-encoding
    Y_train = to_categorical(Y_train, num_classes=10)

    # hold out 10% of the data for validation and use 90% for training
    X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size = 0.1, random_state=2)
    return Y_train,X_train,X_val,Y_val,X_test

This code defines a function called data_Preprocessing() that performs several preprocessing steps on the digit recognition data before feeding it to a machine learning model. Let’s walk through the steps one by one:

  1. Load the data: The function reads in the training and test datasets from CSV files in a local directory using pandas’ read_csv() method.
  2. Drop the “label” column: Since the label column represents the ground truth labels for each image, it is separated from the features and stored in a separate variable called Y_train.
  3. Normalize the data: The pixel values of the images in the dataset are in the range of 0 to 255. To normalize these values, the function divides each pixel value by 255. This scales the pixel values to the range of 0 to 1, making it easier for the machine learning algorithm to learn from the data.
  4. Reshape the data: The images in the dataset are originally stored as a 1D array of 784 (28×28) pixel values. To represent the images in a more suitable format for the model, the function reshapes each image into a 28x28x1 3D matrix.
  5. Convert labels to one-hot encoding: The function converts the label values from a single integer representing the digit to a one-hot encoded vector of length 10, where each element of the vector represents one possible digit.
  6. Split the data: The function splits the training data into training and validation sets using scikit-learn’s train_test_split() method. The test set is not split from the original data as it is provided separately. The validation set is used during training to evaluate the model’s performance on unseen data and prevent overfitting.
  7. Return the processed data: The function returns the processed data in the order of Y_train, X_train, X_val, Y_val, and X_test.

By performing these preprocessing steps, the data is now in a suitable format for training a machine learning model for the digit recognition task.
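
To confirm the preprocessing worked as expected, the function can be called and the shapes of the returned arrays checked. The numbers below assume the standard Kaggle Digit Recognizer files (42,000 labeled training images and 28,000 unlabeled test images) and the 90/10 split used above.

Python
# Quick sanity check of the arrays returned by data_Preprocessing().
Y_train, X_train, X_val, Y_val, X_test = data_Preprocessing()

print(X_train.shape)  # expected: (37800, 28, 28, 1) - 90% of the labeled images
print(X_val.shape)    # expected: (4200, 28, 28, 1)  - 10% held out for validation
print(Y_train.shape)  # expected: (37800, 10)        - one-hot encoded labels
print(X_test.shape)   # expected: (28000, 28, 28, 1) - Kaggle test images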

Data Augmentation

Data augmentation is a common technique used in computer vision to artificially increase the size of a training dataset by applying various transformations to the original images. This helps to improve the model’s ability to generalize to new data and reduce overfitting.

Python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images by up to 10 degrees
        zoom_range = 0.1, # Randomly zoom image 
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images
        vertical_flip=False)  # randomly flip images

In this code, the ImageDataGenerator class from the keras.preprocessing.image module is used to apply various data augmentation techniques to the training dataset. The purpose of each parameter is as follows:

  • featurewise_center: This parameter sets the mean of the input to 0 over the dataset.
  • samplewise_center: This parameter sets the mean of each sample to 0.
  • featurewise_std_normalization: This parameter divides the input by the standard deviation of the dataset.
  • samplewise_std_normalization: This parameter divides each input by its standard deviation.
  • zca_whitening: This parameter applies ZCA whitening to the input.
  • rotation_range: This parameter randomly rotates images within the specified range of degrees (here, up to ±10 degrees).
  • zoom_range: This parameter randomly zooms in or out on images by up to the specified fraction (here, 10%).
  • width_shift_range: This parameter randomly shifts images horizontally by a fraction of the total width.
  • height_shift_range: This parameter randomly shifts images vertically by a fraction of the total height.
  • horizontal_flip: This parameter randomly flips images horizontally.
  • vertical_flip: This parameter randomly flips images vertically.

All of these parameters serve to augment the training data and increase its variability, making it more difficult for the model to overfit to the training set. By randomly applying these transformations during training, the model learns to be robust to variations in the input data.

The resulting ImageDataGenerator object can then be passed to a Keras model during training via its flow() method and the model’s fit() method (or fit_generator() in older versions of Keras).
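
Before training, it can be reassuring to preview a few augmented digits and check that the transformations still look like plausible handwriting. The sketch below assumes matplotlib is installed and that X_train and Y_train come from data_Preprocessing(); it is not part of the original pipeline.

Python
# Preview one batch of augmented digits.
import matplotlib.pyplot as plt

batch_x, batch_y = next(datagen.flow(X_train, Y_train, batch_size=9))
fig, axes = plt.subplots(3, 3, figsize=(4, 4))
for img, ax in zip(batch_x, axes.ravel()):
    ax.imshow(img.reshape(28, 28), cmap="gray")
    ax.axis("off")
plt.show()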

Define the Model

The model is created using the Keras Sequential API. It starts with three convolutional layers. The first two have 32 filters of size 3×3 and are each followed by a PReLU activation function and a batch normalization layer; the first also takes an input shape of 28x28x1 (i.e., a grayscale image of size 28×28). The third convolutional layer has 32 filters of size 5×5 with a stride of 1 and “same” padding, and is followed by a max pooling layer with pool size 2 and stride 1, a PReLU activation function, a batch normalization layer, and a dropout layer with rate 0.4 to help prevent overfitting.

The next block mirrors the first, but with 64 filters in each of its three convolutional layers, again ending in a max pooling layer, PReLU activation, batch normalization, and dropout. A final convolutional layer with 128 filters of size 4×4, followed by PReLU and batch normalization, feeds into a flatten layer and another dropout layer. The output layer is a dense layer with 10 units and a softmax activation function, which produces the probabilities for each of the 10 possible digits (0-9).

The kernel_regularizer argument in the last layer applies L2 regularization to the kernel weights of the layer, with a regularization strength of 0.01.

Finally, the function returns the model, which is compiled separately before training and evaluation.

Python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, PReLU, BatchNormalization,
                                     MaxPooling2D, Dropout, Flatten, Dense)

def model_Generator():
    model = Sequential()
    model.add(Conv2D(32, (3, 3),  input_shape=(28, 28, 1)))
    model.add(PReLU())
    model.add(BatchNormalization())
    model.add(Conv2D(32, (3, 3)))
    model.add(PReLU())
    model.add(BatchNormalization())
    model.add(Conv2D(32, (5, 5), strides=1, padding='same'))
    model.add(MaxPooling2D(pool_size = 2, strides = 1))
    model.add(PReLU())
    model.add(BatchNormalization())
    model.add(Dropout(0.4))

    model.add(Conv2D(64, (3, 3)))
    model.add(PReLU())
    model.add(BatchNormalization())
    model.add(Conv2D(64, (3, 3)))
    model.add(PReLU())
    model.add(BatchNormalization())
    model.add(Conv2D(64, (5, 5), strides=1, padding='same'))
    model.add(MaxPooling2D(pool_size = 2, strides = 1))
    model.add(PReLU())
    model.add(BatchNormalization())
    model.add(Dropout(0.4))

    model.add(Conv2D(128, (4, 4)))
    model.add(PReLU())
    model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dropout(0.4))
    model.add(Dense(10, activation="softmax", kernel_regularizer= tf.keras.regularizers.l2(0.01)))
    return model
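
Before compiling, the function can be called to build the model and print a layer-by-layer summary, which is a quick way to verify the architecture and parameter counts described above.

Python
# Build the model and inspect its layers.
model = model_Generator()
model.summary()
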
Compile the Model
Python
model.compile(optimizer = 'adam' , loss = "categorical_crossentropy", metrics=["accuracy"])

This code compiles the neural network model generated by the model_Generator() function using the Keras API.

The optimizer argument specifies the optimization algorithm to be used during training, in this case the Adam optimizer. The Adam optimizer is a popular optimization algorithm for deep learning models that adapts the learning rate during training and is known for its fast convergence.

The loss argument specifies the loss function to be used during training, in this case categorical cross-entropy. Categorical cross-entropy is commonly used for multi-class classification problems like digit recognition, where each input can belong to one of multiple classes.

The metrics argument specifies the evaluation metric(s) to be used during training and testing, in this case accuracy. The accuracy metric is a common evaluation metric for classification problems and measures the fraction of correctly classified samples out of the total number of samples.

Once the model is compiled, it can be trained on the training data using the fit() method and evaluated on the held-out validation data using the evaluate() method (the Kaggle test set has no labels, so it is only used for generating predictions).
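
As a small illustration (assuming training has already finished and that X_val and Y_val come from data_Preprocessing()), the held-out validation split can be scored like this:

Python
# Score the held-out validation split after training.
val_loss, val_acc = model.evaluate(X_val, Y_val, verbose=0)
print(f"Validation loss: {val_loss:.4f}, validation accuracy: {val_acc:.4f}")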

Prevent Overfitting

This code adds three callbacks to the model training process to improve the model’s performance and prevent overfitting.

Python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

callbacks = [
    EarlyStopping(patience=10, restore_best_weights=True),
    ModelCheckpoint("best_model.h5", save_best_only=True),
    ReduceLROnPlateau(factor=0.5, patience=3, min_lr=0.00001, verbose=1)
]

The EarlyStopping callback stops the training process if the validation loss does not improve for a specified number of epochs (patience argument). The restore_best_weights argument ensures that the best weights of the model are used, which are determined based on the validation loss.

The ModelCheckpoint callback saves the best weights of the model during training to a file named “best_model.h5” using the save_best_only argument. This allows the best performing model to be loaded and used later without the need for retraining.

The ReduceLROnPlateau callback reduces the learning rate of the optimizer if the validation loss does not improve for a specified number of epochs (patience argument). The factor argument specifies the factor by which the learning rate is reduced, and the min_lr argument specifies the minimum learning rate allowed. The verbose argument specifies whether to print a message when the learning rate is reduced.

These three callbacks can help to prevent overfitting, improve the generalization of the model, and increase its accuracy.

Apply Data Augmentation

The following code fits the data generator on the training data so that any statistics needed for the configured transformations are computed before training begins.

Python
datagen.fit(X_train)

The datagen object is an instance of the ImageDataGenerator class from the Keras API, which provides a way to apply data augmentation to image data. The fit() method of the ImageDataGenerator class calculates the statistics required for data augmentation based on the training data.

Data augmentation is a technique used to artificially increase the size of the training set by creating new versions of the existing images. This can help to improve the model’s ability to generalize to new data by exposing it to a wider range of variations in the training data.

Some common data augmentation techniques include random rotations, horizontal and vertical flips, zooming, and shifting. The specific data augmentation techniques used are specified when creating the ImageDataGenerator object.

After the data augmentation statistics are calculated using datagen.fit(), the actual data augmentation is applied during training using the flow() method of the ImageDataGenerator class. This generates batches of augmented data on the fly during training, rather than storing them all in memory at once.
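
As a minimal illustration of what flow() produces, each call yields one batch of randomly augmented images together with their one-hot labels:

Python
# Draw a single augmented batch and check its shapes.
batch_x, batch_y = next(datagen.flow(X_train, Y_train, batch_size=32))
print(batch_x.shape)  # (32, 28, 28, 1)
print(batch_y.shape)  # (32, 10)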

Training the Model

The goal of training a neural network is to teach it to perform a specific task by adjusting its weights and biases based on examples of input-output pairs, with the ultimate goal of creating a model that can accurately generalize to new, unseen data.

Python
# batch_size and epochs are assumed to be defined earlier (their exact values are not shown here)
history = model.fit(datagen.flow(X_train, Y_train, batch_size=batch_size),
                    epochs=epochs,
                    validation_data=(X_val, Y_val),
                    callbacks=callbacks)

This code trains the neural network model using the fit() method from the Keras API. The datagen.flow() method is used to generate batches of augmented data during training, with the X_train and Y_train data used as inputs.

The batch_size argument specifies the number of samples per batch. Using batches during training can help to speed up the training process and reduce the memory requirements.

The epochs argument specifies the number of times the entire training dataset is passed through the model during training. Each epoch consists of multiple batches of data being fed through the model.

The validation_data argument specifies the validation dataset, which is used to evaluate the model’s performance after each epoch. This helps to monitor the model’s ability to generalize to new data and prevent overfitting.

The callbacks argument specifies the callbacks to be used during training, which were defined earlier in the code. These callbacks can help to improve the performance of the model and prevent overfitting.

The fit() method returns a history object, which contains information about the training process, such as the loss and accuracy on the training and validation data at each epoch. This object can be used to plot graphs and analyze the performance of the model during training.
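
The post does not include plotting code, but as a minimal sketch (assuming matplotlib is available), the learning curves stored in the history object can be visualized like this:

Python
# Plot training and validation accuracy per epoch.
import matplotlib.pyplot as plt

plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()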

Submit the Result
Python
import numpy as np
import pandas as pd

# predict results
prediction = model.predict(X_test)

# select the index with the maximum probability
Y_pred = np.argmax(prediction,axis = 1)

results = pd.Series(Y_pred,name="Label")
submission = pd.concat([pd.Series(range(1,28001),name = "ImageId"),results],axis = 1)
submission.to_csv("submission.csv",index=False)

The code above is making predictions on the test dataset using the trained neural network model.

The model.predict(X_test) line predicts the probabilities of each class for each test image in the X_test dataset. The np.argmax(prediction,axis = 1) line selects the index of the class with the highest probability for each test image, which will be used as the predicted label.

The predicted labels are then stored in a Pandas Series called results, with the name “Label”. A submission file is created by concatenating the image ID numbers (ranging from 1 to 28,000) with the predicted labels using pd.concat(), and then saving the resulting dataframe as a CSV file called “submission.csv”. The submission file can then be uploaded to the Kaggle competition for evaluation of the model’s performance.
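
One optional refinement not shown in the original code: because ModelCheckpoint saved the best weights to “best_model.h5”, the checkpoint can be reloaded before predicting, for example when the submission is generated in a separate session. Since EarlyStopping was configured with restore_best_weights=True, the in-memory model already holds the best weights, so this is mainly a convenience.

Python
# Reload the best checkpoint saved during training before predicting.
from tensorflow.keras.models import load_model

best_model = load_model("best_model.h5")
prediction = best_model.predict(X_test)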

You can download and inspect the full implementation for this post in this GitHub repository. Once you’ve reviewed the contents, feel free to share the link with others who may be interested. Happy exploring!