In [1]:
# Install necessary libraries if not already present in Colab environment
!pip install tensorflow scikit-learn matplotlib seaborn numpy
Requirement already satisfied: tensorflow in /usr/local/lib/python3.11/dist-packages (2.18.0)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.11/dist-packages (1.6.1)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.11/dist-packages (3.10.0)
Requirement already satisfied: seaborn in /usr/local/lib/python3.11/dist-packages (0.13.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.11/dist-packages (2.0.2)
Requirement already satisfied: absl-py>=1.0.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.4.0)
Requirement already satisfied: astunparse>=1.6.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.6.3)
Requirement already satisfied: flatbuffers>=24.3.25 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (25.2.10)
Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.6.0)
Requirement already satisfied: google-pasta>=0.1.1 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.2.0)
Requirement already satisfied: libclang>=13.0.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (18.1.1)
Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (3.4.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from tensorflow) (24.2)
Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.3 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (5.29.5)
Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.32.3)
Requirement already satisfied: setuptools in /usr/local/lib/python3.11/dist-packages (from tensorflow) (75.2.0)
Requirement already satisfied: six>=1.12.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.17.0)
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (3.1.0)
Requirement already satisfied: typing-extensions>=3.6.6 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (4.14.1)
Requirement already satisfied: wrapt>=1.11.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.17.2)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.73.1)
Requirement already satisfied: tensorboard<2.19,>=2.18 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.18.0)
Requirement already satisfied: keras>=3.5.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (3.8.0)
Requirement already satisfied: h5py>=3.11.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (3.14.0)
Requirement already satisfied: ml-dtypes<0.5.0,>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.4.1)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.37.1)
Requirement already satisfied: scipy>=1.6.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (1.15.3)
Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (1.5.1)
Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (3.6.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.3.2)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (4.58.5)
Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.4.8)
Requirement already satisfied: pillow>=8 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (11.2.1)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (3.2.3)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (2.9.0.post0)
Requirement already satisfied: pandas>=1.2 in /usr/local/lib/python3.11/dist-packages (from seaborn) (2.2.2)
Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/local/lib/python3.11/dist-packages (from astunparse>=1.6.0->tensorflow) (0.45.1)
Requirement already satisfied: rich in /usr/local/lib/python3.11/dist-packages (from keras>=3.5.0->tensorflow) (13.9.4)
Requirement already satisfied: namex in /usr/local/lib/python3.11/dist-packages (from keras>=3.5.0->tensorflow) (0.1.0)
Requirement already satisfied: optree in /usr/local/lib/python3.11/dist-packages (from keras>=3.5.0->tensorflow) (0.16.0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.2->seaborn) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.2->seaborn) (2025.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2.21.0->tensorflow) (3.4.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2.21.0->tensorflow) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2.21.0->tensorflow) (2.4.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2.21.0->tensorflow) (2025.7.9)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.19,>=2.18->tensorflow) (3.8.2)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.19,>=2.18->tensorflow) (0.7.2)
Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.19,>=2.18->tensorflow) (3.1.3)
Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.11/dist-packages (from werkzeug>=1.0.1->tensorboard<2.19,>=2.18->tensorflow) (3.0.2)
Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.11/dist-packages (from rich->keras>=3.5.0->tensorflow) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.11/dist-packages (from rich->keras>=3.5.0->tensorflow) (2.19.2)
Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.11/dist-packages (from markdown-it-py>=2.2.0->rich->keras>=3.5.0->tensorflow) (0.1.2)
In [3]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# TensorFlow and Keras for CNNs
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10 # A common image dataset

# Scikit-learn for preprocessing and evaluation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

print(f"TensorFlow Version: {tf.__version__}")
print(f"Keras Version: {keras.__version__}")
print(f"NumPy Version: {np.__version__}")
TensorFlow Version: 2.18.0
Keras Version: 3.8.0
NumPy Version: 2.0.2

Part 1: Data Preparation for CNNs (CIFAR-10 Dataset)

We'll use the CIFAR-10 dataset, which consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. This dataset is a good step up from MNIST, as its images are in color and slightly more complex.

Tasks:

  • Load the CIFAR-10 dataset.
  • Normalize pixel values to the range [0, 1].
  • Convert target labels to one-hot encoded format.
  • Verify data shapes and types.

Data Loading and Splitting

If you want to train on your own images instead of CIFAR-10, organize the dataset as one sub-folder per class:

own_dataset/
│
├── airplane/
│   ├── img001.jpg
│   ├── img002.jpg
│   └── ...
│
├── automobile/
│   ├── img001.jpg
│   ├── img002.jpg
│   └── ...
│
├── .../
The commented-out cell below:

  • Counts the images in each class folder (e.g., airplane, automobile) and prints the number of images per class.

  • Splits the dataset into:

    • 80% training set
    • 20% test set
  • Creates lists containing the file paths of images for the training and test sets, along with corresponding numeric labels for each class (e.g., 0 for airplane, 1 for automobile, etc.).

In [ ]:
# use this when you are using your own dataset

# import os
# import random
# from sklearn.model_selection import train_test_split

# # Set seed for reproducibility
# random.seed(42)

# # Path to your dataset folder containing class folders
# dataset_path = 'path_to_your_dataset_folder'

# # List all class folders (assumed to be the class names)
# class_folders = sorted([f for f in os.listdir(dataset_path) if os.path.isdir(os.path.join(dataset_path, f))])

# # Dictionary to hold all image file paths per class
# image_paths_per_class = {}

# for class_name in class_folders:
#     class_folder_path = os.path.join(dataset_path, class_name)
#     images = os.listdir(class_folder_path)
#     image_files = [os.path.join(class_folder_path, f) for f in images if f.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif'))]
#     image_paths_per_class[class_name] = image_files
#     print(f"Class '{class_name}' has {len(image_files)} images.")

# # Now split the dataset into train/test sets (80/20 split)
# train_paths = []
# train_labels = []
# test_paths = []
# test_labels = []

# for class_index, class_name in enumerate(class_folders):
#     paths = image_paths_per_class[class_name]

#     # Shuffle paths (optional because train_test_split shuffles anyway)
#     random.shuffle(paths)

#     # Split
#     train, test = train_test_split(paths, test_size=0.2, random_state=42)

#     # Append to global train/test lists with labels
#     train_paths.extend(train)
#     train_labels.extend([class_index] * len(train))

#     test_paths.extend(test)
#     test_labels.extend([class_index] * len(test))

# print(f"Total training images: {len(train_paths)}")
# print(f"Total testing images: {len(test_paths)}")
In [4]:
# 1. Load the CIFAR-10 dataset
(X_train_raw, y_train_raw), (X_test_raw, y_test_raw) = cifar10.load_data()

# Define class names for visualization
cifar10_class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                       'dog', 'frog', 'horse', 'ship', 'truck']

print(f"Original Training data shape: {X_train_raw.shape}") # (50000, 32, 32, 3) -> N, H, W, C
print(f"Original Test data shape: {X_test_raw.shape}")   # (10000, 32, 32, 3)
print(f"Original Training labels shape: {y_train_raw.shape}")
print(f"Original Test labels shape: {y_test_raw.shape}")
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 11s 0us/step
Original Training data shape: (50000, 32, 32, 3)
Original Test data shape: (10000, 32, 32, 3)
Original Training labels shape: (50000, 1)
Original Test labels shape: (10000, 1)
In [5]:
# Display a few sample images
plt.figure(figsize=(10, 5))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_train_raw[i])
    plt.title(f"{cifar10_class_names[y_train_raw[i][0]]}")
    plt.axis('off')
plt.suptitle('Sample CIFAR-10 Images', fontsize=16)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
[Figure: 'Sample CIFAR-10 Images', a 2x5 grid of training images with their class names]
In [6]:
# 2. Normalize pixel values to [0, 1]
# Image pixel values typically range from 0-255. Normalizing helps training stability.
X_train = X_train_raw.astype('float32') / 255.0
X_test = X_test_raw.astype('float32') / 255.0

print(f"\nNormalized X_train min: {X_train.min()}, max: {X_train.max()}")
Normalized X_train min: 0.0, max: 1.0
In [7]:
# 3. Convert target labels to one-hot encoded format
# This is necessary for multi-class classification with categorical cross-entropy loss.
num_classes = len(cifar10_class_names)
y_train = to_categorical(y_train_raw, num_classes)
y_test = to_categorical(y_test_raw, num_classes)

print(f"One-hot encoded y_train shape: {y_train.shape}")
print(f"One-hot encoded y_test shape: {y_test.shape}")
print(f"First 5 one-hot encoded labels:\n{y_train[:5]}")
One-hot encoded y_train shape: (50000, 10)
One-hot encoded y_test shape: (10000, 10)
First 5 one-hot encoded labels:
[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]

Discussion Points:

  • Explain why normalizing pixel values (scaling them to [0, 1]) is a common and beneficial preprocessing step for CNNs.
  • Why is one-hot encoding necessary for the target labels when training a multi-class classification CNN with a Softmax output layer?
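
A note on the second point: one-hot labels are what the categorical_crossentropy loss expects, one probability target per class to compare against the Softmax output. Keras also offers sparse_categorical_crossentropy, which accepts the integer labels directly, so the one-hot step could be skipped. The cell below is only a sketch of that alternative (kept commented out, like the custom-dataset cell above; cnn_model is defined in Part 2):

In [ ]:
# Alternative (sketch): skip one-hot encoding and use the sparse loss instead.
# The Softmax output layer stays exactly the same.
# cnn_model.compile(optimizer='adam',
#                   loss='sparse_categorical_crossentropy',
#                   metrics=['accuracy'])
# cnn_model.fit(X_train, y_train_raw, epochs=20, batch_size=64, validation_split=0.1)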

Part 2: Building and Training a Simple CNN

We'll construct a basic CNN model, combining convolutional, pooling, and fully connected layers.

Tasks:

  • Define the CNN model architecture using keras.Sequential, including:
    • Conv2D layers (specifying filters, kernel size, activation, input shape).
    • MaxPooling2D layers.
    • Flatten layer.
    • Dense (fully connected) layers for classification.
  • Compile the model with an appropriate optimizer, loss function, and metrics.
  • Train the model and visualize the training history.

Architecture of a traditional CNN:

Convolutional neural networks (CNNs) are a specific type of neural network, generally composed of the following layers:

[Figure: typical CNN architecture, with convolution and pooling layers followed by fully connected layers]

The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.

Convolution layer (CONV):

The convolution layer (CONV) uses filters that perform convolution operations as they scan the input I across its dimensions. Its hyperparameters include the filter size F and the stride S. The resulting output O is called a feature map or activation map.

[Figure: a filter sliding over the input to produce a feature map]

Remark: the convolution step can be generalized to the 1D and 3D cases as well.
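
To make the effect of F and S concrete: with no padding and stride S, an input of size I convolved with a filter of size F produces an output of size O = (I - F)/S + 1 (with zero-padding P, the numerator becomes I - F + 2P). The small sketch below (not part of the lab cells) verifies the 32 → 30 shrink that appears later in the model summary:

In [ ]:
# Sketch: output size of a 'valid' (unpadded) convolution, O = (I - F)/S + 1.
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 32, 32, 3))         # one 32x32 RGB image
conv = layers.Conv2D(32, (3, 3), strides=1)  # F = 3, S = 1, padding='valid' by default
print(conv(x).shape)                         # (1, 30, 30, 32): (32 - 3)/1 + 1 = 30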

Pooling (POOL)

The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, that provides some spatial invariance. In particular, max pooling and average pooling are special kinds of pooling, where the maximum and the average value are taken, respectively.

Pooling Types in CNN

| Type         | Max Pooling                                                           | Average Pooling                                                 |
|--------------|-----------------------------------------------------------------------|-----------------------------------------------------------------|
| Purpose      | Each pooling operation selects the maximum value of the current view  | Each pooling operation averages the values of the current view  |
| Illustration | [Figure: max pooling]                                                 | [Figure: average pooling]                                       |
| Comments     | • Preserves detected features • Most commonly used                    | • Downsamples the feature map • Used in LeNet                   |
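
The short sketch below (with made-up values, not part of the lab cells) applies both pooling types to the same 4x4 feature map with a 2x2 window, so the difference in the table above is visible numerically:

In [ ]:
# Sketch: max vs. average pooling on a single 4x4 feature map (2x2 window, stride 2).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

fmap = np.array([[1, 2, 5, 6],
                 [3, 4, 7, 8],
                 [9, 8, 1, 2],
                 [7, 6, 3, 4]], dtype='float32').reshape(1, 4, 4, 1)

print(layers.MaxPooling2D((2, 2))(fmap)[0, :, :, 0].numpy())      # [[4. 8.] [9. 4.]]
print(layers.AveragePooling2D((2, 2))(fmap)[0, :, :, 0].numpy())  # [[2.5 6.5] [7.5 2.5]]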

Fully Connected (FC)

The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.

[Figure: fully connected layer, every input connected to every output neuron]

For More Reading

In [8]:
# Define the CNN model architecture
# A sequential model is a linear stack of layers.
cnn_model = Sequential([
    # Input Layer: Must match the shape of our images (32, 32, 3)
    # Conv2D: Applies 2D convolution over images.
    #   - filters: Number of output filters in the convolution.
    #   - kernel_size: Dimensions of the convolution window (e.g., 3x3).
    #   - activation: Rectified Linear Unit (ReLU) is standard for hidden layers.
    #   - input_shape: Only for the first layer, defines the shape of input images.
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),

    # MaxPooling2D: Downsamples the input representation by taking the maximum value over
    # a spatial window (e.g., 2x2). Reduces computational load and helps with translation invariance.
    MaxPooling2D((2, 2)),

    # Second Convolutional Block
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),

    # Third Convolutional Block (optional, can make it deeper)
    Conv2D(64, (3, 3), activation='relu'),

    # Flatten Layer: Converts the 3D output of the convolutional layers (height, width, filters)
    # into a 1D vector to be fed into the fully connected layers.
    Flatten(),

    # Fully Connected Layers: Standard dense layers for classification.
    #   - Dropout: Regularization technique that randomly sets a fraction of input units to 0
    #     at each update during training time, which helps prevent overfitting.
    Dropout(0.5), # Add dropout before the final classification layer
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax') # Output layer: Softmax for multi-class classification
])

# Display the model summary
cnn_model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D)                 │ (None, 30, 30, 32)     │           896 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d (MaxPooling2D)    │ (None, 15, 15, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_1 (Conv2D)               │ (None, 13, 13, 64)     │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_1 (MaxPooling2D)  │ (None, 6, 6, 64)       │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_2 (Conv2D)               │ (None, 4, 4, 64)       │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten)               │ (None, 1024)           │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 1024)           │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 64)             │        65,600 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 10)             │           650 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 122,570 (478.79 KB)
 Trainable params: 122,570 (478.79 KB)
 Non-trainable params: 0 (0.00 B)
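
As a quick check on the summary, each count follows directly from the layer shapes: a Conv2D layer has (kernel height x kernel width x input channels + 1 bias) x filters parameters, and a Dense layer has (inputs + 1 bias) x units. The sketch below reproduces every non-zero entry in the table:

In [ ]:
# Sketch: reproducing the parameter counts reported by model.summary().
print((3 * 3 * 3  + 1) * 32)   # conv2d:   896
print((3 * 3 * 32 + 1) * 64)   # conv2d_1: 18,496
print((3 * 3 * 64 + 1) * 64)   # conv2d_2: 36,928
print((1024 + 1) * 64)         # dense:    65,600
print((64 + 1) * 10)           # dense_1:  650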
In [9]:
# Visualize the model architecture
# Install graphviz if needed
!pip install graphviz
!apt-get install graphviz

tf.keras.utils.plot_model(cnn_model, show_shapes=True, show_layer_names=True)
Requirement already satisfied: graphviz in /usr/local/lib/python3.11/dist-packages (0.21)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
graphviz is already the newest version (2.42.2-6ubuntu0.1).
0 upgraded, 0 newly installed, 0 to remove and 35 not upgraded.
Out[9]:
[Figure: plot_model diagram of the CNN, showing each layer and its output shape]
In [10]:
# Compile the model
# - Optimizer: 'adam' is generally a good default choice.
# - Loss Function: 'categorical_crossentropy' for multi-class classification with one-hot encoded labels.
# - Metrics: 'accuracy' to monitor performance.
cnn_model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

# Train the model
print("\nStarting CNN model training...")
# Use a smaller subset of training data for faster experimentation in Colab,
# or train on full data if you have more time/resources.
# For CIFAR-10, even 10 epochs can take a few minutes on CPU.
history_cnn = cnn_model.fit(X_train, y_train,
                            epochs=20, # Increased epochs slightly for better learning
                            batch_size=64, # Standard batch size
                            validation_split=0.1, # Use 10% of training data for validation
                            verbose=1)          # Display training progress

print("\nCNN training complete.")
Starting CNN model training...
Epoch 1/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 78s 105ms/step - accuracy: 0.2989 - loss: 1.8766 - val_accuracy: 0.5050 - val_loss: 1.3695
Epoch 2/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 80s 114ms/step - accuracy: 0.4979 - loss: 1.3774 - val_accuracy: 0.5334 - val_loss: 1.2891
Epoch 3/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 68s 95ms/step - accuracy: 0.5514 - loss: 1.2479 - val_accuracy: 0.6138 - val_loss: 1.0839
Epoch 4/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 72s 81ms/step - accuracy: 0.6002 - loss: 1.1337 - val_accuracy: 0.6386 - val_loss: 1.0173
Epoch 5/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 83s 82ms/step - accuracy: 0.6273 - loss: 1.0506 - val_accuracy: 0.6686 - val_loss: 0.9494
Epoch 6/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 80s 79ms/step - accuracy: 0.6444 - loss: 1.0099 - val_accuracy: 0.6812 - val_loss: 0.9310
Epoch 7/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 85s 84ms/step - accuracy: 0.6640 - loss: 0.9486 - val_accuracy: 0.6970 - val_loss: 0.8666
Epoch 8/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 57s 81ms/step - accuracy: 0.6751 - loss: 0.9265 - val_accuracy: 0.7092 - val_loss: 0.8358
Epoch 9/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 82s 81ms/step - accuracy: 0.6859 - loss: 0.8797 - val_accuracy: 0.6994 - val_loss: 0.8780
Epoch 10/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 80s 79ms/step - accuracy: 0.6965 - loss: 0.8606 - val_accuracy: 0.7200 - val_loss: 0.8078
Epoch 11/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 56s 79ms/step - accuracy: 0.7044 - loss: 0.8365 - val_accuracy: 0.7198 - val_loss: 0.8166
Epoch 12/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 82s 79ms/step - accuracy: 0.7152 - loss: 0.8109 - val_accuracy: 0.7230 - val_loss: 0.8053
Epoch 13/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 83s 81ms/step - accuracy: 0.7188 - loss: 0.7978 - val_accuracy: 0.7204 - val_loss: 0.8233
Epoch 14/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 57s 81ms/step - accuracy: 0.7222 - loss: 0.7799 - val_accuracy: 0.7362 - val_loss: 0.7693
Epoch 15/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 81s 81ms/step - accuracy: 0.7280 - loss: 0.7674 - val_accuracy: 0.7426 - val_loss: 0.7525
Epoch 16/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 57s 81ms/step - accuracy: 0.7419 - loss: 0.7317 - val_accuracy: 0.7504 - val_loss: 0.7376
Epoch 17/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 82s 80ms/step - accuracy: 0.7439 - loss: 0.7247 - val_accuracy: 0.7478 - val_loss: 0.7400
Epoch 18/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 83s 82ms/step - accuracy: 0.7494 - loss: 0.7052 - val_accuracy: 0.7280 - val_loss: 0.7789
Epoch 19/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 57s 81ms/step - accuracy: 0.7537 - loss: 0.6951 - val_accuracy: 0.7524 - val_loss: 0.7426
Epoch 20/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 82s 81ms/step - accuracy: 0.7569 - loss: 0.6891 - val_accuracy: 0.7422 - val_loss: 0.7813

CNN training complete.
In [16]:
# Save entire model
cnn_model.save('cnn_model.keras')
print(" Model saved to cnn_model.keras")
 Model saved to cnn_model.keras
In [17]:
from tensorflow.keras.models import load_model
# Load the model
cnn_model = load_model('cnn_model.keras')
print(" Model loaded from cnn_model.keras")
 Model loaded from cnn_model.keras
In [18]:
# Retrain the model
history_cnn = cnn_model.fit(X_train, y_train,
                            epochs=1,         # Add more epochs as needed
                            batch_size=64,
                            validation_split=0.1,
                            verbose=1)
704/704 ━━━━━━━━━━━━━━━━━━━━ 57s 80ms/step - accuracy: 0.7560 - loss: 0.6965 - val_accuracy: 0.7482 - val_loss: 0.7403
In [11]:
# Visualize training history (loss and accuracy)
plt.figure(figsize=(12, 5))

# Plot training & validation accuracy values
plt.subplot(1, 2, 1)
plt.plot(history_cnn.history['accuracy'])
plt.plot(history_cnn.history['val_accuracy'])
plt.title('CNN Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.grid(True, linestyle='--', alpha=0.6)

# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history_cnn.history['loss'])
plt.plot(history_cnn.history['val_loss'])
plt.title('CNN Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.grid(True, linestyle='--', alpha=0.6)

plt.tight_layout()
plt.show()
[Figure: training vs. validation accuracy (left) and loss (right) over the 20 epochs]

Discussion Points:

  • Describe the flow of data through the CNN architecture you defined, from the input image to the final prediction. How do the Conv2D, MaxPooling2D, and Flatten layers transform the data's shape?
  • What is the purpose of MaxPooling2D? How does it contribute to translation invariance and computational efficiency?
  • Analyze the training history plots. Does the model appear to be learning effectively? Is there any sign of overfitting (e.g., large gap between training and validation accuracy/loss)?
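
For the last question, the gap between training and validation accuracy can also be read directly from the history object rather than estimated from the plots. A small sketch (it assumes history_cnn from the 20-epoch training cell is still in memory):

In [ ]:
# Sketch: quantify the train/validation gap at the final epoch.
final_train_acc = history_cnn.history['accuracy'][-1]
final_val_acc   = history_cnn.history['val_accuracy'][-1]
print(f"Final train accuracy: {final_train_acc:.4f}")
print(f"Final val accuracy:   {final_val_acc:.4f}")
print(f"Gap (train - val):    {final_train_acc - final_val_acc:.4f}")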

Part 3: Model Evaluation and Prediction

After training, it's crucial to evaluate the model's performance on the unseen test set to assess its generalization capabilities.

Tasks:

  • Evaluate the trained CNN model on the test set using cnn_model.evaluate.
  • Make predictions on a subset of the test data.
  • Convert probabilistic predictions back to class labels.
  • Generate and plot a confusion matrix.
  • Display example misclassified images to understand model errors.
In [19]:
# Evaluate the model on the test set
print("\nEvaluating CNN model on test data...")
test_loss, test_accuracy = cnn_model.evaluate(X_test, y_test, verbose=1)

print(f"\nTest Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")
Evaluating CNN model on test data...
313/313 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.7407 - loss: 0.7654

Test Loss: 0.7672
Test Accuracy: 0.7419
In [20]:
# Make predictions on the test set
y_pred_proba_cnn = cnn_model.predict(X_test)
y_pred_labels_cnn = np.argmax(y_pred_proba_cnn, axis=1) # Convert probabilities to class labels
y_true_labels_cnn = np.argmax(y_test, axis=1) # Convert one-hot true labels back to single integers

print(f"\nFirst 5 predicted labels: {y_pred_labels_cnn[:5].tolist()}")
print(f"First 5 true labels: {y_true_labels_cnn[:5].tolist()}")

# Classification Report
print("\nClassification Report (CNN):")
print(classification_report(y_true_labels_cnn, y_pred_labels_cnn, target_names=cifar10_class_names))

# Confusion Matrix
cm_cnn = confusion_matrix(y_true_labels_cnn, y_pred_labels_cnn)
plt.figure(figsize=(10, 8))
sns.heatmap(cm_cnn, annot=True, fmt='d', cmap='Blues',
            xticklabels=cifar10_class_names, yticklabels=cifar10_class_names)
plt.title('Confusion Matrix (CNN)')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
313/313 ━━━━━━━━━━━━━━━━━━━━ 7s 21ms/step

First 5 predicted labels: [3, 1, 8, 0, 6]
First 5 true labels: [3, 8, 8, 0, 6]

Classification Report (CNN):
              precision    recall  f1-score   support

    airplane       0.78      0.78      0.78      1000
  automobile       0.88      0.83      0.86      1000
        bird       0.71      0.59      0.64      1000
         cat       0.54      0.60      0.57      1000
        deer       0.72      0.69      0.71      1000
         dog       0.60      0.67      0.63      1000
        frog       0.81      0.79      0.80      1000
       horse       0.71      0.84      0.77      1000
        ship       0.88      0.81      0.84      1000
       truck       0.85      0.82      0.83      1000

    accuracy                           0.74     10000
   macro avg       0.75      0.74      0.74     10000
weighted avg       0.75      0.74      0.74     10000

[Figure: confusion matrix heatmap over the 10 CIFAR-10 classes]
In [14]:
# Display some misclassified images (optional)
print("\n--- Displaying Some Misclassified Images ---")
misclassified_indices = np.where(y_pred_labels_cnn != y_true_labels_cnn)[0]
num_display = min(10, len(misclassified_indices)) # Display up to 10 misclassified images

plt.figure(figsize=(12, 8))
for i, idx in enumerate(np.random.choice(misclassified_indices, num_display, replace=False)):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_test_raw[idx])
    plt.title(f"True: {cifar10_class_names[y_true_labels_cnn[idx]]}\nPred: {cifar10_class_names[y_pred_labels_cnn[idx]]}")
    plt.axis('off')
plt.suptitle('Misclassified CIFAR-10 Images', fontsize=16)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
--- Displaying Some Misclassified Images ---
[Figure: 'Misclassified CIFAR-10 Images', a 2x5 grid with true and predicted labels]

Discussion Points:

  • Based on the classification report and confusion matrix, which classes did the CNN perform well on, and which did it struggle with? What might be reasons for these differences (e.g., similarity between classes)?
  • Observe some of the misclassified images. Can you identify any common patterns in why the model might have made incorrect predictions (e.g., blurry images, unusual angles, background clutter)?
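
For the first question, per-class accuracy (recall) can be read off the confusion matrix directly: the diagonal entry divided by the row sum. A small sketch, assuming cm_cnn and cifar10_class_names from the cells above are still defined:

In [ ]:
# Sketch: per-class recall from the confusion matrix, sorted from worst to best.
per_class_acc = cm_cnn.diagonal() / cm_cnn.sum(axis=1)
for name, acc in sorted(zip(cifar10_class_names, per_class_acc), key=lambda p: p[1]):
    print(f"{name:>10}: {acc:.2f}")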

Part 4: Advanced Topics & Discussion

This section is for broader discussion and conceptual understanding, extending beyond direct coding.

Discussion Topics:

  • Hierarchical Feature Learning: Explain how CNNs learn features at different levels of abstraction across their layers.

  • Data Augmentation: What is data augmentation, why is it crucial for training CNNs, and what are some common augmentation techniques?

  • Transfer Learning: Explain the concept of transfer learning in the context of CNNs. When and why would you use it, and how is it typically implemented?

  • Popular CNN Architectures: Briefly discuss the key ideas behind famous CNN architectures like VGGNet, ResNet, and Inception (GoogLeNet).

  • Limitations of CNNs: What are some inherent limitations or challenges when using CNNs, and how are researchers trying to address them?

  • Hierarchical Feature Learning: CNNs excel at hierarchical feature learning, meaning they learn features at different levels of abstraction as data progresses through their layers:

    • Early Layers (closer to input): These layers tend to learn very basic, low-level features, such as edges (horizontal, vertical, diagonal), corners, blobs, and simple color gradients. The receptive fields of neurons in these layers are small, focusing on local patterns.
    • Middle Layers: As the data passes through more convolutional layers, neurons in these layers combine the simpler features from previous layers to detect more complex, mid-level patterns. For example, combinations of edges might form textures, circles, or parts of objects like eyes, noses, or wheels. Their receptive fields are larger, encompassing wider regions of the input.
    • Later Layers (closer to output): The deepest convolutional layers learn highly abstract, high-level features. These features might correspond to entire objects (e.g., "cat face," "car body") or complex parts of objects. The neurons in these layers have very large receptive fields, effectively seeing a large portion of the original input image.
    • Analogy: Think of building blocks: basic shapes (edges) are combined to form more complex shapes (windows, doors), which are then combined to form an entire house. This automatic, layered feature extraction is a significant advantage over traditional methods that required manual feature engineering.
  • Data Augmentation (see the combined code sketch after this list):

    • What it is: Data augmentation is a technique used to artificially increase the size and diversity of a training dataset by applying various transformations to the existing data. For images, this means creating new training samples by modifying the original images in ways that preserve their class label.
    • Why it's crucial for CNNs:
      • Prevent Overfitting: Deep CNNs have millions of parameters and require vast amounts of data to learn robustly without overfitting. Augmentation expands the dataset, making the model more generalized.
      • Improve Generalization: By exposing the model to variations of the same image (e.g., rotated, flipped, slightly zoomed), it learns to recognize objects irrespective of minor changes in their appearance, position, or orientation.
      • Enhance Robustness: Makes the model more resilient to real-world variations in lighting, scale, position, etc.
    • Common Techniques for Images:
      • Flipping: Horizontal or vertical flips.
      • Rotation: Rotating the image by a certain degree.
      • Scaling/Zooming: Zooming in or out.
      • Translation (Shifting): Shifting the image horizontally or vertically.
      • Shearing: Tilting the image.
      • Brightness/Contrast Adjustment: Changing lighting conditions.
      • Adding Noise: Introducing random noise.
      • Cutout/Random Erasing: Masking out a random square region of the image.
  • Transfer Learning (also illustrated in the sketch after this list):

    • Concept: Transfer learning involves taking a pre-trained model (a CNN that has already been trained on a very large and general dataset, like ImageNet, which contains millions of images across 1000 categories) and reusing it as a starting point for a new, related task.
    • When and Why to Use It:
      • Limited Data: When you have a relatively small dataset for your specific task, training a deep CNN from scratch is often impossible due to overfitting. A pre-trained model has already learned powerful, generic features (edges, textures, shapes) from a vast dataset.
      • Reduced Training Time: Fine-tuning a pre-trained model is significantly faster and requires less computational power than training from scratch.
      • Improved Performance: Leveraging the knowledge gained from a large dataset often leads to better performance on the new task, even with limited task-specific data.
    • How it's Implemented:
      • Feature Extraction (Frozen Layers): The most common approach. The convolutional base (all layers except the final classification layers) of the pre-trained model is kept frozen (its weights are not updated). Only new, randomly initialized fully connected layers are added on top and trained on the new dataset. This uses the pre-trained model as a powerful feature extractor.
      • Fine-Tuning: Unfreeze some (typically the top few) or all of the layers of the pre-trained convolutional base and train them along with the new classification layers, usually with a very small learning rate. This allows the pre-trained features to adapt slightly to the new task.
  • Popular CNN Architectures:

    • LeNet-5 (1998): One of the earliest CNNs, used for digit recognition. Established the basic Conv-Pool-FC pattern.
    • AlexNet (2012): Broke ImageNet records, popularizing CNNs. Deeper than LeNet, used ReLU activation, and employed dropout.
    • VGGNet (2014): Emphasized simplicity by using mostly 3x3 convolutional filters and max-pooling, but stacked many layers to achieve great depth (e.g., VGG-16, VGG-19). Showed that depth is crucial.
    • GoogLeNet (Inception) (2014): Introduced "Inception Modules" which perform multiple parallel convolutions (1x1, 3x3, 5x5 filters, and pooling) at the same level, concatenating their outputs. This allowed for wider and deeper networks without a massive increase in parameters.
    • ResNet (Residual Networks) (2015): Solved the "degradation problem" (accuracy saturating and then degrading with increasing depth) in very deep networks. Introduced "skip connections" (or residual connections) that allow the input of a layer to be added directly to its output, helping gradients flow through hundreds of layers.
    • DenseNet (2017): Connects each layer to every other layer in a feed-forward fashion, leading to very efficient feature propagation and parameter usage.
  • Limitations of CNNs:

    • Data Hunger: Despite advantages like transfer learning, deep CNNs still require substantial amounts of labeled data for optimal performance, which can be expensive or unavailable for niche tasks.
    • Computational Cost: Training very deep CNNs is computationally intensive and requires powerful hardware (GPUs/TPUs) and significant time.
    • Lack of Spatial Invariance to Rotation/Scale (Beyond Local): While max-pooling helps with small translations, CNNs are not inherently robust to large rotations, changes in scale, or other geometric transformations unless specifically trained with extensive data augmentation.
    • Interpretability (Black Box): While techniques like Grad-CAM can highlight important regions, understanding why a CNN makes a specific decision is still challenging, leading to "black box" concerns in critical applications.
    • Vulnerability to Adversarial Attacks: CNNs can be surprisingly vulnerable to tiny, imperceptible perturbations in input images (adversarial examples) that cause them to misclassify with high confidence.
    • Hierarchy, but not always "Understanding": CNNs excel at pattern recognition but don't possess genuine "understanding" of objects or scenes in the human sense (e.g., understanding causality, 3D structure, or physics).
    • Fixed Resolution Inputs: Standard CNNs typically require fixed-size input images. Preprocessing (resizing, cropping) is often needed, which can sometimes lead to loss of information or distortion.
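
To make the data augmentation and transfer learning points above concrete, here is a minimal sketch for CIFAR-10. The choice of MobileNetV2 as the pre-trained base, the 96x96 resize, and all hyperparameters are illustrative assumptions, not part of the lab; it reuses X_train and y_train from Part 1.

In [ ]:
# Sketch: Keras preprocessing layers for augmentation + a frozen ImageNet-pretrained base.
import tensorflow as tf
from tensorflow.keras import layers, models

# Data augmentation expressed as layers (active only during training).
augmentation = models.Sequential([
    layers.RandomFlip('horizontal'),
    layers.RandomRotation(0.1),   # rotate by up to +/- 10% of a full turn
    layers.RandomZoom(0.1),
])

# Transfer learning, feature-extraction style: freeze the pre-trained convolutional base.
base = tf.keras.applications.MobileNetV2(input_shape=(96, 96, 3),
                                         include_top=False,
                                         weights='imagenet')
base.trainable = False

inputs = layers.Input(shape=(32, 32, 3))
x = augmentation(inputs)
x = layers.Resizing(96, 96)(x)                                       # MobileNetV2 expects larger inputs
x = tf.keras.applications.mobilenet_v2.preprocess_input(x * 255.0)   # undo our /255, rescale to [-1, 1]
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(10, activation='softmax')(x)

transfer_model = models.Model(inputs, outputs)
transfer_model.compile(optimizer='adam',
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])
# transfer_model.fit(X_train, y_train, epochs=5, batch_size=64, validation_split=0.1)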

Prepared By

Md. Atikuzzaman
Lecturer
Department of Computer Science and Engineering
Green University of Bangladesh
Email: atik@cse.green.edu.bd