# Install necessary libraries if not already present in Colab environment
!pip install tensorflow scikit-learn matplotlib seaborn numpy
Requirement already satisfied: tensorflow (2.18.0), scikit-learn (1.6.1), matplotlib (3.10.0), seaborn (0.13.2), numpy (2.0.2), and their dependencies in /usr/local/lib/python3.11/dist-packages (full pip output omitted)
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# TensorFlow and Keras for CNNs
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10 # A common image dataset
# Scikit-learn for preprocessing and evaluation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix
# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')
print(f"TensorFlow Version: {tf.__version__}")
print(f"Keras Version: {keras.__version__}")
print(f"NumPy Version: {np.__version__}")
TensorFlow Version: 2.18.0
Keras Version: 3.8.0
NumPy Version: 2.0.2
Part 1: Data Preparation for CNNs (CIFAR-10 Dataset)
We'll use the CIFAR-10 dataset, which consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. This dataset is a good step up from MNIST, as its images are in color and slightly more complex.
Tasks:
- Load the CIFAR-10 dataset.
- Normalize pixel values to the range [0, 1].
- Convert target labels to one-hot encoded format.
- Verify data shapes and types.
Data Loading and Splitting
own_dataset/
│
├── airplane/
│   ├── img001.jpg
│   ├── img002.jpg
│   └── ...
│
├── automobile/
│   ├── img001.jpg
│   ├── img002.jpg
│   └── ...
│
└── .../
The commented-out script below:
- Counts the images in each class folder (e.g., airplane, automobile) and prints the number of images per class.
- Splits the dataset into an 80% training set and a 20% test set.
- Creates lists containing the file paths of the training and test images, along with a corresponding numeric label for each class (e.g., 0 for airplane, 1 for automobile).
# use this when you are using your own dataset
# import os
# import random
# from sklearn.model_selection import train_test_split
#
# # Set seed for reproducibility
# random.seed(42)
#
# # Path to your dataset folder containing class folders
# dataset_path = 'path_to_your_dataset_folder'
#
# # List all class folders (assumed to be the class names)
# class_folders = sorted([f for f in os.listdir(dataset_path) if os.path.isdir(os.path.join(dataset_path, f))])
#
# # Dictionary to hold all image file paths per class
# image_paths_per_class = {}
# for class_name in class_folders:
#     class_folder_path = os.path.join(dataset_path, class_name)
#     images = os.listdir(class_folder_path)
#     image_files = [os.path.join(class_folder_path, f) for f in images if f.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif'))]
#     image_paths_per_class[class_name] = image_files
#     print(f"Class '{class_name}' has {len(image_files)} images.")
#
# # Now split the dataset into train/test sets (80/20 split)
# train_paths = []
# train_labels = []
# test_paths = []
# test_labels = []
# for class_index, class_name in enumerate(class_folders):
#     paths = image_paths_per_class[class_name]
#     # Shuffle paths (optional because train_test_split shuffles anyway)
#     random.shuffle(paths)
#     # Split 80/20 within each class so the class balance is preserved
#     train, test = train_test_split(paths, test_size=0.2, random_state=42)
#     # Append to global train/test lists with labels
#     train_paths.extend(train)
#     train_labels.extend([class_index] * len(train))
#     test_paths.extend(test)
#     test_labels.extend([class_index] * len(test))
#
# print(f"Total training images: {len(train_paths)}")
# print(f"Total testing images: {len(test_paths)}")
# 1. Load the CIFAR-10 dataset
(X_train_raw, y_train_raw), (X_test_raw, y_test_raw) = cifar10.load_data()
# Define class names for visualization
cifar10_class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                       'dog', 'frog', 'horse', 'ship', 'truck']
print(f"Original Training data shape: {X_train_raw.shape}") # (50000, 32, 32, 3) -> N, H, W, C
print(f"Original Test data shape: {X_test_raw.shape}") # (10000, 32, 32, 3)
print(f"Original Training labels shape: {y_train_raw.shape}")
print(f"Original Test labels shape: {y_test_raw.shape}")
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 11s 0us/step
Original Training data shape: (50000, 32, 32, 3)
Original Test data shape: (10000, 32, 32, 3)
Original Training labels shape: (50000, 1)
Original Test labels shape: (10000, 1)
# Display a few sample images
plt.figure(figsize=(10, 5))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_train_raw[i])
    plt.title(f"{cifar10_class_names[y_train_raw[i][0]]}")
    plt.axis('off')
plt.suptitle('Sample CIFAR-10 Images', fontsize=16)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
# 2. Normalize pixel values to [0, 1]
# Image pixel values typically range from 0-255. Normalizing helps training stability.
X_train = X_train_raw.astype('float32') / 255.0
X_test = X_test_raw.astype('float32') / 255.0
print(f"\nNormalized X_train min: {X_train.min()}, max: {X_train.max()}")
Normalized X_train min: 0.0, max: 1.0
# 3. Convert target labels to one-hot encoded format
# This is necessary for multi-class classification with categorical cross-entropy loss.
num_classes = len(cifar10_class_names)
y_train = to_categorical(y_train_raw, num_classes)
y_test = to_categorical(y_test_raw, num_classes)
print(f"One-hot encoded y_train shape: {y_train.shape}")
print(f"One-hot encoded y_test shape: {y_test.shape}")
print(f"First 5 one-hot encoded labels:\n{y_train[:5]}")
One-hot encoded y_train shape: (50000, 10)
One-hot encoded y_test shape: (10000, 10)
First 5 one-hot encoded labels:
[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
Discussion Point:
- Explain why normalizing pixel values (scaling them to [0, 1]) is a common and beneficial preprocessing step for CNNs.
- Why is one-hot encoding necessary for the target labels when training a multi-class classification CNN with a Softmax output layer? (A small loss-function sketch follows.)
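As a pointer for the second question: categorical_crossentropy expects one-hot targets, while sparse_categorical_crossentropy computes the same quantity directly from integer labels. A minimal sketch (the probabilities are made up for illustration):

# Minimal sketch: the two cross-entropy losses agree, but expect different label formats
import tensorflow as tf

probs = tf.constant([[0.1, 0.7, 0.2]])       # softmax output for one sample, 3 classes
one_hot_target = tf.constant([[0., 1., 0.]]) # one-hot label for class 1
integer_target = tf.constant([1])            # the same label as an integer

# Both print -log(0.7) ≈ 0.3567
print(tf.keras.losses.categorical_crossentropy(one_hot_target, probs).numpy())
print(tf.keras.losses.sparse_categorical_crossentropy(integer_target, probs).numpy())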
Part 2: Building and Training a Simple CNN
We'll construct a basic CNN model, combining convolutional, pooling, and fully connected layers.
Tasks:
- Define the CNN model architecture using keras.Sequential, including:
  - Conv2D layers (specifying filters, kernel size, activation, input shape).
  - MaxPooling2D layers.
  - A Flatten layer.
  - Dense (fully connected) layers for classification.
- Compile the model with an appropriate optimizer, loss function, and metrics.
- Train the model and visualize the training history.
- Compile the model with an appropriate optimizer, loss function, and metrics.
- Train the model and visualize the training history.
Architecture of a traditional CNN:
Convolutional neural networks, also known as CNNs, are a specific type of neural network that is generally composed of convolution (CONV), pooling (POOL), and fully connected (FC) layers.
The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.
Convolution layer (CONV):
The convolution layer (CONV) uses filters that perform convolution operations as they scan the input I with respect to its dimensions. Its hyperparameters include the filter size F and the stride S. The resulting output O is called a feature map or activation map.
Remark: the convolution step can be generalized to the 1D and 3D cases as well.
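Given these hyperparameters, the spatial size of the output feature map follows a standard formula (stated here for a square input of size I with padding P; the model below uses P = 0):

O = \frac{I - F + 2P}{S} + 1

For example, a 32×32 CIFAR-10 input convolved with a 3×3 filter at stride 1 and no padding gives O = (32 − 3)/1 + 1 = 30, which matches the (None, 30, 30, 32) output shape in the model summary later in this notebook.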
Pooling (POOL)
The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, that provides some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and the average value of each window is taken, respectively.
Pooling Types in CNN

Type | Max Pooling | Average Pooling
---|---|---
Purpose | Each pooling operation selects the maximum value of the current window | Each pooling operation averages the values of the current window
Comments | Preserves detected features; most commonly used | Downsamples the feature map; used in LeNet
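To make the difference concrete, the following minimal sketch applies both pooling types to a single 4×4, single-channel feature map (the values are made up for illustration):

# Minimal sketch: max vs. average pooling on one 4x4 single-channel feature map
import numpy as np
import tensorflow as tf

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 2],
              [7, 2, 8, 3],
              [1, 0, 4, 9]], dtype='float32').reshape(1, 4, 4, 1)  # (batch, H, W, C)

max_pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=(2, 2))(x)

print(max_pool.numpy().squeeze())  # [[6. 5.] [7. 9.]]   -- max of each 2x2 window
print(avg_pool.numpy().squeeze())  # [[3.5 2.5] [2.5 6.]] -- mean of each 2x2 window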
Fully Connected (FC)
The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.
# Define the CNN model architecture
# A sequential model is a linear stack of layers.
cnn_model = Sequential([
    # Input Layer: Must match the shape of our images (32, 32, 3)
    # Conv2D: Applies 2D convolution over images.
    # - filters: Number of output filters in the convolution.
    # - kernel_size: Dimensions of the convolution window (e.g., 3x3).
    # - activation: Rectified Linear Unit (ReLU) is standard for hidden layers.
    # - input_shape: Only for the first layer, defines the shape of input images.
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),

    # MaxPooling2D: Downsamples the input representation by taking the maximum value over
    # a spatial window (e.g., 2x2). Reduces computational load and helps with translation invariance.
    MaxPooling2D((2, 2)),

    # Second Convolutional Block
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),

    # Third Convolutional Block (optional, can make it deeper)
    Conv2D(64, (3, 3), activation='relu'),

    # Flatten Layer: Converts the 3D output of the convolutional layers (height, width, filters)
    # into a 1D vector to be fed into the fully connected layers.
    Flatten(),

    # Dropout: Regularization technique that randomly sets a fraction of input units to 0
    # at each update during training, which helps prevent overfitting.
    Dropout(0.5),  # Add dropout before the dense classification layers
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax')  # Output layer: Softmax for multi-class classification
])
# Display the model summary
cnn_model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D)                 │ (None, 30, 30, 32)     │           896 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d (MaxPooling2D)    │ (None, 15, 15, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_1 (Conv2D)               │ (None, 13, 13, 64)     │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_1 (MaxPooling2D)  │ (None, 6, 6, 64)       │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_2 (Conv2D)               │ (None, 4, 4, 64)       │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten)               │ (None, 1024)           │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 1024)           │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 64)             │        65,600 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 10)             │           650 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 122,570 (478.79 KB)
Trainable params: 122,570 (478.79 KB)
Non-trainable params: 0 (0.00 B)
# Visualize the model architecture
# Install graphviz if needed
!pip install graphviz
!apt-get install graphviz
tf.keras.utils.plot_model(cnn_model, show_shapes=True, show_layer_names=True)
Requirement already satisfied: graphviz in /usr/local/lib/python3.11/dist-packages (0.21) Reading package lists... Done Building dependency tree... Done Reading state information... Done graphviz is already the newest version (2.42.2-6ubuntu0.1). 0 upgraded, 0 newly installed, 0 to remove and 35 not upgraded.
# Compile the model
# - Optimizer: 'adam' is generally a good default choice.
# - Loss Function: 'categorical_crossentropy' for multi-class classification with one-hot encoded labels.
# - Metrics: 'accuracy' to monitor performance.
cnn_model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
# Train the model
print("\nStarting CNN model training...")
# Use a smaller subset of training data for faster experimentation in Colab,
# or train on full data if you have more time/resources.
# For CIFAR-10, even 10 epochs can take a few minutes on CPU.
history_cnn = cnn_model.fit(X_train, y_train,
                            epochs=20,             # Increased epochs slightly for better learning
                            batch_size=64,         # Standard batch size
                            validation_split=0.1,  # Use 10% of training data for validation
                            verbose=1)             # Display training progress
print("\nCNN training complete.")
Starting CNN model training...
Epoch 1/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 78s 105ms/step - accuracy: 0.2989 - loss: 1.8766 - val_accuracy: 0.5050 - val_loss: 1.3695
Epoch 2/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 80s 114ms/step - accuracy: 0.4979 - loss: 1.3774 - val_accuracy: 0.5334 - val_loss: 1.2891
Epoch 3/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 68s 95ms/step - accuracy: 0.5514 - loss: 1.2479 - val_accuracy: 0.6138 - val_loss: 1.0839
Epoch 4/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 72s 81ms/step - accuracy: 0.6002 - loss: 1.1337 - val_accuracy: 0.6386 - val_loss: 1.0173
Epoch 5/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 83s 82ms/step - accuracy: 0.6273 - loss: 1.0506 - val_accuracy: 0.6686 - val_loss: 0.9494
Epoch 6/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 80s 79ms/step - accuracy: 0.6444 - loss: 1.0099 - val_accuracy: 0.6812 - val_loss: 0.9310
Epoch 7/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 85s 84ms/step - accuracy: 0.6640 - loss: 0.9486 - val_accuracy: 0.6970 - val_loss: 0.8666
Epoch 8/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 57s 81ms/step - accuracy: 0.6751 - loss: 0.9265 - val_accuracy: 0.7092 - val_loss: 0.8358
Epoch 9/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 82s 81ms/step - accuracy: 0.6859 - loss: 0.8797 - val_accuracy: 0.6994 - val_loss: 0.8780
Epoch 10/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 80s 79ms/step - accuracy: 0.6965 - loss: 0.8606 - val_accuracy: 0.7200 - val_loss: 0.8078
Epoch 11/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 56s 79ms/step - accuracy: 0.7044 - loss: 0.8365 - val_accuracy: 0.7198 - val_loss: 0.8166
Epoch 12/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 82s 79ms/step - accuracy: 0.7152 - loss: 0.8109 - val_accuracy: 0.7230 - val_loss: 0.8053
Epoch 13/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 83s 81ms/step - accuracy: 0.7188 - loss: 0.7978 - val_accuracy: 0.7204 - val_loss: 0.8233
Epoch 14/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 57s 81ms/step - accuracy: 0.7222 - loss: 0.7799 - val_accuracy: 0.7362 - val_loss: 0.7693
Epoch 15/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 81s 81ms/step - accuracy: 0.7280 - loss: 0.7674 - val_accuracy: 0.7426 - val_loss: 0.7525
Epoch 16/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 57s 81ms/step - accuracy: 0.7419 - loss: 0.7317 - val_accuracy: 0.7504 - val_loss: 0.7376
Epoch 17/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 82s 80ms/step - accuracy: 0.7439 - loss: 0.7247 - val_accuracy: 0.7478 - val_loss: 0.7400
Epoch 18/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 83s 82ms/step - accuracy: 0.7494 - loss: 0.7052 - val_accuracy: 0.7280 - val_loss: 0.7789
Epoch 19/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 57s 81ms/step - accuracy: 0.7537 - loss: 0.6951 - val_accuracy: 0.7524 - val_loss: 0.7426
Epoch 20/20
704/704 ━━━━━━━━━━━━━━━━━━━━ 82s 81ms/step - accuracy: 0.7569 - loss: 0.6891 - val_accuracy: 0.7422 - val_loss: 0.7813

CNN training complete.
# Save entire model
cnn_model.save('cnn_model.keras')
print(" Model saved to cnn_model.keras")
Model saved to cnn_model.keras
from tensorflow.keras.models import load_model
# Load the model
cnn_model = load_model('cnn_model.keras')
print(" Model loaded from cnn_model.keras")
Model loaded from cnn_model.keras
# Retrain the model
history_cnn = cnn_model.fit(X_train, y_train,
                            epochs=1,  # Add more epochs as needed
                            batch_size=64,
                            validation_split=0.1,
                            verbose=1)
704/704 ━━━━━━━━━━━━━━━━━━━━ 57s 80ms/step - accuracy: 0.7560 - loss: 0.6965 - val_accuracy: 0.7482 - val_loss: 0.7403
# Visualize training history (loss and accuracy)
plt.figure(figsize=(12, 5))
# Plot training & validation accuracy values
plt.subplot(1, 2, 1)
plt.plot(history_cnn.history['accuracy'])
plt.plot(history_cnn.history['val_accuracy'])
plt.title('CNN Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.grid(True, linestyle='--', alpha=0.6)
# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history_cnn.history['loss'])
plt.plot(history_cnn.history['val_loss'])
plt.title('CNN Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
Discussion Points:
- Describe the flow of data through the CNN architecture you defined, from the input image to the final prediction. How do the Conv2D, MaxPooling2D, and Flatten layers transform the data's shape?
- What is the purpose of MaxPooling2D? How does it contribute to translation invariance and computational efficiency?
- Analyze the training history plots. Does the model appear to be learning effectively? Is there any sign of overfitting (e.g., a large gap between training and validation accuracy/loss)?
Part 3: Model Evaluation and Prediction
After training, it's crucial to evaluate the model's performance on the unseen test set to assess its generalization capabilities.
Tasks:
- Evaluate the trained CNN model on the test set using cnn_model.evaluate.
- Make predictions on a subset of the test data.
- Convert probabilistic predictions back to class labels.
- Generate and plot a confusion matrix.
- Display example misclassified images to understand model errors.
# Evaluate the model on the test set
print("\nEvaluating CNN model on test data...")
test_loss, test_accuracy = cnn_model.evaluate(X_test, y_test, verbose=1)
print(f"\nTest Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")
Evaluating CNN model on test data...
313/313 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.7407 - loss: 0.7654

Test Loss: 0.7672
Test Accuracy: 0.7419
# Make predictions on the test set
y_pred_proba_cnn = cnn_model.predict(X_test)
y_pred_labels_cnn = np.argmax(y_pred_proba_cnn, axis=1) # Convert probabilities to class labels
y_true_labels_cnn = np.argmax(y_test, axis=1) # Convert one-hot true labels back to single integers
print(f"\nFirst 5 predicted labels: {y_pred_labels_cnn[:5].tolist()}")
print(f"First 5 true labels: {y_true_labels_cnn[:5].tolist()}")
# Classification Report
print("\nClassification Report (CNN):")
print(classification_report(y_true_labels_cnn, y_pred_labels_cnn, target_names=cifar10_class_names))
# Confusion Matrix
cm_cnn = confusion_matrix(y_true_labels_cnn, y_pred_labels_cnn)
plt.figure(figsize=(10, 8))
sns.heatmap(cm_cnn, annot=True, fmt='d', cmap='Blues',
            xticklabels=cifar10_class_names, yticklabels=cifar10_class_names)
plt.title('Confusion Matrix (CNN)')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
313/313 ━━━━━━━━━━━━━━━━━━━━ 7s 21ms/step
First 5 predicted labels: [3, 1, 8, 0, 6]
First 5 true labels: [3, 8, 8, 0, 6]

Classification Report (CNN):
              precision    recall  f1-score   support

    airplane       0.78      0.78      0.78      1000
  automobile       0.88      0.83      0.86      1000
        bird       0.71      0.59      0.64      1000
         cat       0.54      0.60      0.57      1000
        deer       0.72      0.69      0.71      1000
         dog       0.60      0.67      0.63      1000
        frog       0.81      0.79      0.80      1000
       horse       0.71      0.84      0.77      1000
        ship       0.88      0.81      0.84      1000
       truck       0.85      0.82      0.83      1000

    accuracy                           0.74     10000
   macro avg       0.75      0.74      0.74     10000
weighted avg       0.75      0.74      0.74     10000
# Display some misclassified images (optional)
print("\n--- Displaying Some Misclassified Images ---")
misclassified_indices = np.where(y_pred_labels_cnn != y_true_labels_cnn)[0]
num_display = min(10, len(misclassified_indices)) # Display up to 10 misclassified images
plt.figure(figsize=(12, 8))
for i, idx in enumerate(np.random.choice(misclassified_indices, num_display, replace=False)):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_test_raw[idx])
    plt.title(f"True: {cifar10_class_names[y_true_labels_cnn[idx]]}\nPred: {cifar10_class_names[y_pred_labels_cnn[idx]]}")
    plt.axis('off')
plt.suptitle('Misclassified CIFAR-10 Images', fontsize=16)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
--- Displaying Some Misclassified Images ---
Discussion Points:
- Based on the classification report and confusion matrix, which classes did the CNN perform well on, and which did it struggle with? What might be reasons for these differences (e.g., similarity between classes)?
- Observe some of the misclassified images. Can you identify any common patterns in why the model might have made incorrect predictions (e.g., blurry images, unusual angles, background clutter)?
Part 4: Advanced Topics & Discussion
This section is for broader discussion and conceptual understanding, extending beyond direct coding.
Discussion Topics:
Hierarchical Feature Learning: Explain how CNNs learn features at different levels of abstraction across their layers.
Data Augmentation: What is data augmentation, why is it crucial for training CNNs, and what are some common augmentation techniques?
Transfer Learning: Explain the concept of transfer learning in the context of CNNs. When and why would you use it, and how is it typically implemented?
Popular CNN Architectures: Briefly discuss the key ideas behind famous CNN architectures like VGGNet, ResNet, and Inception (GoogLeNet).
Limitations of CNNs: What are some inherent limitations or challenges when using CNNs, and how are researchers trying to address them?
Hierarchical Feature Learning: CNNs excel at hierarchical feature learning, meaning they learn features at different levels of abstraction as data progresses through their layers (a visualization sketch follows this list):
- Early Layers (closer to input): These layers tend to learn very basic, low-level features, such as edges (horizontal, vertical, diagonal), corners, blobs, and simple color gradients. The receptive fields of neurons in these layers are small, focusing on local patterns.
- Middle Layers: As the data passes through more convolutional layers, neurons in these layers combine the simpler features from previous layers to detect more complex, mid-level patterns. For example, combinations of edges might form textures, circles, or parts of objects like eyes, noses, or wheels. Their receptive fields are larger, encompassing wider regions of the input.
- Later Layers (closer to output): The deepest convolutional layers learn highly abstract, high-level features. These features might correspond to entire objects (e.g., "cat face," "car body") or complex parts of objects. The neurons in these layers have very large receptive fields, effectively seeing a large portion of the original input image.
- Analogy: Think of building blocks: basic shapes (edges) are combined to form more complex shapes (windows, doors), which are then combined to form an entire house. This automatic, layered feature extraction is a significant advantage over traditional methods that required manual feature engineering.
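This hierarchy can be inspected directly. Here is a minimal sketch (assuming the trained cnn_model and normalized X_test from earlier in this notebook) that plots the activations of each Conv2D layer for one test image; shallow layers tend to show edge-like maps, deeper ones more abstract patterns:

# Minimal sketch: visualize intermediate Conv2D activations of the trained model
import matplotlib.pyplot as plt
from tensorflow.keras.models import Model

conv_layers = [layer for layer in cnn_model.layers if 'conv' in layer.name]
activation_model = Model(inputs=cnn_model.inputs,
                         outputs=[layer.output for layer in conv_layers])

activations = activation_model.predict(X_test[:1])  # activations for one image

for layer, act in zip(conv_layers, activations):
    plt.figure(figsize=(12, 2))
    for ch in range(8):  # first 8 channels of each feature map
        plt.subplot(1, 8, ch + 1)
        plt.imshow(act[0, :, :, ch], cmap='viridis')
        plt.axis('off')
    plt.suptitle(f'{layer.name}: first 8 feature maps')
    plt.show()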
Data Augmentation:
- What it is: Data augmentation is a technique used to artificially increase the size and diversity of a training dataset by applying various transformations to the existing data. For images, this means creating new training samples by modifying the original images in ways that preserve their class label.
- Why it's crucial for CNNs:
- Prevent Overfitting: Deep CNNs have millions of parameters and require vast amounts of data to learn robustly without overfitting. Augmentation expands the dataset, making the model more generalized.
- Improve Generalization: By exposing the model to variations of the same image (e.g., rotated, flipped, slightly zoomed), it learns to recognize objects irrespective of minor changes in their appearance, position, or orientation.
- Enhance Robustness: Makes the model more resilient to real-world variations in lighting, scale, position, etc.
- Common Techniques for Images (several are sketched in code after this list):
- Flipping: Horizontal or vertical flips.
- Rotation: Rotating the image by a certain degree.
- Scaling/Zooming: Zooming in or out.
- Translation (Shifting): Shifting the image horizontally or vertically.
- Shearing: Tilting the image.
- Brightness/Contrast Adjustment: Changing lighting conditions.
- Adding Noise: Introducing random noise.
- Cutout/Random Erasing: Masking out a random square region of the image.
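A minimal sketch of several of these transformations, using the Keras preprocessing layers available in the TensorFlow version installed above (the parameter values are illustrative, not tuned):

# Minimal sketch: on-the-fly image augmentation with Keras preprocessing layers
from tensorflow import keras
from tensorflow.keras import layers

data_augmentation = keras.Sequential([
    layers.RandomFlip('horizontal'),     # flipping
    layers.RandomRotation(0.05),         # small rotations (fraction of a full turn)
    layers.RandomZoom(0.1),              # scaling / zooming
    layers.RandomTranslation(0.1, 0.1),  # shifting
    layers.RandomContrast(0.1),          # contrast adjustment
])

# Applied to a batch during training (training=True activates the randomness):
augmented_batch = data_augmentation(X_train[:8], training=True)

Placing such a block as the first layers of the CNN means augmentation runs only during fit() and is automatically disabled at inference time.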
Transfer Learning:
- Concept: Transfer learning involves taking a pre-trained model (a CNN that has already been trained on a very large and general dataset, like ImageNet, which contains millions of images across 1000 categories) and reusing it as a starting point for a new, related task.
- When and Why to Use It:
- Limited Data: When you have a relatively small dataset for your specific task, training a deep CNN from scratch is often impossible due to overfitting. A pre-trained model has already learned powerful, generic features (edges, textures, shapes) from a vast dataset.
- Reduced Training Time: Fine-tuning a pre-trained model is significantly faster and requires less computational power than training from scratch.
- Improved Performance: Leveraging the knowledge gained from a large dataset often leads to better performance on the new task, even with limited task-specific data.
- How it's Implemented (the feature-extraction approach is sketched after this list):
- Feature Extraction (Frozen Layers): The most common approach. The convolutional base (all layers except the final classification layers) of the pre-trained model is kept frozen (its weights are not updated). Only new, randomly initialized fully connected layers are added on top and trained on the new dataset. This uses the pre-trained model as a powerful feature extractor.
- Fine-Tuning: Unfreeze some (typically the top few) or all of the layers of the pre-trained convolutional base and train them along with the new classification layers, usually with a very small learning rate. This allows the pre-trained features to adapt slightly to the new task.
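A minimal feature-extraction sketch, using MobileNetV2 pretrained on ImageNet as one possible base (the 96×96 input size and other hyperparameters here are illustrative assumptions, not prescribed by this lab):

# Minimal sketch: transfer learning via feature extraction with a frozen base
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.MobileNetV2(input_shape=(96, 96, 3),
                                         include_top=False,
                                         weights='imagenet')
base.trainable = False  # freeze the convolutional base (feature extraction)

transfer_model = models.Sequential([
    layers.Resizing(96, 96),                 # upscale 32x32 CIFAR-10 images for the base
    layers.Rescaling(2.0, offset=-1.0),      # MobileNetV2 expects inputs in [-1, 1]
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')   # new task-specific classification head
])

transfer_model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])

# For fine-tuning, later set base.trainable = True and recompile with a much
# smaller learning rate (e.g., 1e-5) so the pretrained weights change only slightly.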
Popular CNN Architectures:
- LeNet-5 (1998): One of the earliest CNNs, used for digit recognition. Established the basic Conv-Pool-FC pattern.
- AlexNet (2012): Broke ImageNet records, popularizing CNNs. Deeper than LeNet, used ReLU activation, and employed dropout.
- VGGNet (2014): Emphasized simplicity by using mostly 3x3 convolutional filters and max-pooling, but stacked many layers to achieve great depth (e.g., VGG-16, VGG-19). Showed that depth is crucial.
- GoogLeNet (Inception) (2014): Introduced "Inception Modules" which perform multiple parallel convolutions (1x1, 3x3, 5x5 filters, and pooling) at the same level, concatenating their outputs. This allowed for wider and deeper networks without a massive increase in parameters.
- ResNet (Residual Networks) (2015): Solved the "degradation problem" (accuracy saturating and then degrading with increasing depth) in very deep networks. Introduced "skip connections" (or residual connections) that allow the input of a layer to be added directly to its output, helping gradients flow through hundreds of layers. A minimal residual block is sketched after this list.
- DenseNet (2017): Connects each layer to every other layer in a feed-forward fashion, leading to very efficient feature propagation and parameter usage.
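The skip-connection idea behind ResNet can be sketched in a few lines with the Keras functional API (illustrative, not the exact block from the paper; it assumes x already has `filters` channels so the addition shapes match):

# Minimal sketch: a ResNet-style residual block
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                       # identity branch
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.Add()([shortcut, y])    # skip connection: output = x + F(x)
    return layers.Activation('relu')(y)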
Limitations of CNNs:
- Data Hunger: Despite advantages like transfer learning, deep CNNs still require substantial amounts of labeled data for optimal performance, which can be expensive or unavailable for niche tasks.
- Computational Cost: Training very deep CNNs is computationally intensive and requires powerful hardware (GPUs/TPUs) and significant time.
- Lack of Spatial Invariance to Rotation/Scale (Beyond Local): While max-pooling helps with small translations, CNNs are not inherently robust to large rotations, changes in scale, or other geometric transformations unless specifically trained with extensive data augmentation.
- Interpretability (Black Box): While techniques like Grad-CAM can highlight important regions, understanding why a CNN makes a specific decision is still challenging, leading to "black box" concerns in critical applications. A minimal Grad-CAM sketch follows this list.
- Vulnerability to Adversarial Attacks: CNNs can be surprisingly vulnerable to tiny, imperceptible perturbations in input images (adversarial examples) that cause them to misclassify with high confidence.
- Hierarchy, but not always "Understanding": CNNs excel at pattern recognition but don't possess genuine "understanding" of objects or scenes in the human sense (e.g., understanding causality, 3D structure, or physics).
- Fixed Resolution Inputs: Standard CNNs typically require fixed-size input images. Preprocessing (resizing, cropping) is often needed, which can sometimes lead to loss of information or distortion.
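For the interpretability point, Grad-CAM can be sketched with a gradient tape. A minimal version (assuming the trained cnn_model above; 'conv2d_2' names its last Conv2D layer and may differ if the model was rebuilt or reloaded):

# Minimal Grad-CAM sketch: a heatmap of where the model 'looks' for its prediction
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index=None):
    # Model mapping the input to (last conv feature maps, class probabilities)
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))  # explain the predicted class
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)    # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))    # average the gradients per channel
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
    return tf.nn.relu(cam)[0].numpy()               # keep positive influence only

heatmap = grad_cam(cnn_model, X_test[0], 'conv2d_2')  # low-res map over the 4x4 grid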
Prepared By
Md. Atikuzzaman
Lecturer
Department of Computer Science and Engineering
Green University of Bangladesh
Email: atik@cse.green.edu.bd