# Install necessary libraries if not already present in Colab environment
!pip install tensorflow scikit-learn matplotlib seaborn pandas numpy
Requirement already satisfied: tensorflow (2.18.0), scikit-learn (1.6.1), matplotlib (3.10.0), seaborn (0.13.2), pandas (2.2.2), numpy (2.0.2) and their dependencies in /usr/local/lib/python3.11/dist-packages
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
# Scikit-learn for data preprocessing and evaluation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix
# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')
print(f"TensorFlow Version: {tf.__version__}")
print(f"Keras Version: {keras.__version__}")
TensorFlow Version: 2.18.0
Keras Version: 3.8.0
Part 1: Data Preparation for ANN¶
We'll use the famous Iris dataset for a multi-class classification task. This dataset is small but excellent for demonstrating the basics of ANN classification. We'll perform standard preprocessing steps: feature scaling and one-hot encoding for the target variable.
Tasks:
- Load the Iris dataset.
- Separate features (X) and target (y).
- Encode the categorical target variable using LabelEncoder and then to_categorical (one-hot encoding).
- Split the data into training and testing sets.
- Scale the numerical features using StandardScaler.
# Load the Iris dataset from scikit-learn
from sklearn.datasets import load_iris
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)
target_names = iris.target_names
print(f"Original dataset shape: {X.shape}, Target shape: {y.shape}")
print("\nFirst 5 rows of features (X):")
print(X.head())
print("\nFirst 5 rows of target (y):")
print(y.head())
print(f"Target classes: {target_names}")
Original dataset shape: (150, 4), Target shape: (150,)

First 5 rows of features (X):
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2

First 5 rows of target (y):
0    0
1    0
2    0
3    0
4    0
dtype: int64
Target classes: ['setosa' 'versicolor' 'virginica']
1. Encode the target variable (y)¶
# Label Encoding: Convert string labels to numerical (e.g., 'setosa' -> 0)
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
# One-hot Encoding: Convert numerical labels to binary vectors (e.g., 0 -> [1,0,0])
# This is required for multi-class classification with categorical cross-entropy loss.
num_classes = len(np.unique(y_encoded))
y_one_hot = to_categorical(y_encoded, num_classes=num_classes)
print(f"\nOriginal y (first 5): {y.head().tolist()}")
print(f"Encoded y (first 5): {y_encoded[:5].tolist()}")
print(f"One-hot encoded y (first 5):\n{y_one_hot[:5]}")
print(f"Number of classes: {num_classes}")
Original y (first 5): [0, 0, 0, 0, 0]
Encoded y (first 5): [0, 0, 0, 0, 0]
One-hot encoded y (first 5):
[[1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]]
Number of classes: 3
2. Split data into training and testing sets¶
X_train, X_test, y_train_oh, y_test_oh = train_test_split(X, y_one_hot, test_size=0.2, random_state=42, stratify=y_encoded)
# Recover the integer-encoded test labels for later evaluation; using the same
# test_size, random_state, and stratify values guarantees an identical split.
_, _, _, y_test_original = train_test_split(X, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded)
print(f"\nTraining features shape: {X_train.shape}")
print(f"Testing features shape: {X_test.shape}")
print(f"Training target (one-hot) shape: {y_train_oh.shape}")
print(f"Testing target (one-hot) shape: {y_test_oh.shape}")
Training features shape: (120, 4)
Testing features shape: (30, 4)
Training target (one-hot) shape: (120, 3)
Testing target (one-hot) shape: (30, 3)
3. Scale numerical features¶
- Scaling is crucial for ANNs as it helps gradient descent converge faster and prevents features with larger values from dominating the learning process.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test) # Use transform (not fit_transform) on test set
print("\nFirst 5 rows of scaled X_train:")
print(X_train_scaled[:5])
First 5 rows of scaled X_train:
[[-1.72156775 -0.33210111 -1.34572231 -1.32327558]
 [-1.12449223 -1.22765467  0.41450518  0.6517626 ]
 [ 1.14439475 -0.5559895   0.58484978  0.25675496]
 [-1.12449223  0.11567567 -1.28894078 -1.45494479]
 [-0.40800161 -1.22765467  0.13059752  0.12508575]]
Discussion Point:¶
- Explain why both Label Encoding and One-Hot Encoding were necessary for the target variable in this multi-class classification problem.
- Why is feature scaling important for training a neural network? How might the absence of scaling affect the training process and model performance? (See the short sketch below.)
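As a quick aid for these discussion points, the short sketch below is illustrative only; it reuses the variables y_encoded, y_one_hot, X_train, and X_train_scaled defined above. It shows that one-hot encoding is a reversible expansion of the integer labels, and contrasts feature scales before and after standardization.
# Illustrative sketch reusing variables defined earlier in this notebook.
# 1) One-hot encoding is a reversible expansion of the integer labels:
#    argmax recovers the original class indices.
recovered = np.argmax(y_one_hot, axis=1)
print("One-hot round-trip matches integer labels:", np.array_equal(recovered, y_encoded))
# 2) Raw features sit on different scales; after standardization each feature
#    has roughly zero mean and unit variance, so no feature dominates the
#    gradient updates simply because of its units.
print("\nRaw feature means:\n", X_train.mean().round(2))
print("\nRaw feature standard deviations:\n", X_train.std().round(2))
print("\nScaled feature means:", X_train_scaled.mean(axis=0).round(2))
print("Scaled feature standard deviations:", X_train_scaled.std(axis=0).round(2))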
Part 2: Building and Training a Simple Feedforward ANN (MLP)¶
We will now build a basic Multilayer Perceptron (MLP) using Keras, train it on our preprocessed Iris dataset, and observe its performance.
Tasks:
- Define the ANN model architecture (input layer, hidden layers, output layer) using keras.Sequential.
- Compile the model with an appropriate loss function, optimizer, and metrics.
- Train the model using the fit method.
- Visualize the training history (loss and accuracy over epochs).
# Define the model architecture
# We'll create a simple MLP with one hidden layer.
model = keras.Sequential([
# Input Layer: Corresponds to the number of features in X_train_scaled
keras.Input(shape=(X_train_scaled.shape[1],), name='input_layer'),
# Hidden Layer 1: A dense (fully connected) layer with ReLU activation
# ReLU is a common choice for hidden layers due to its computational efficiency
# and ability to mitigate vanishing gradients.
layers.Dense(units=10, activation='relu', name='hidden_layer_1'), # 10 neurons in the hidden layer
# Output Layer:
# Number of units = num_classes (3 for Iris)
# Softmax activation: For multi-class classification, outputs probabilities that sum to 1.
layers.Dense(units=num_classes, activation='softmax', name='output_layer')
])
# Display the model summary
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ hidden_layer_1 (Dense)          │ (None, 10)             │            50 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ output_layer (Dense)            │ (None, 3)              │            33 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 83 (332.00 B)
Trainable params: 83 (332.00 B)
Non-trainable params: 0 (0.00 B)
# Visualize the model architecture
# Install graphviz if needed
!pip install graphviz
!apt-get install -y graphviz
tf.keras.utils.plot_model(model, show_shapes=True, show_layer_names=True)
Requirement already satisfied: graphviz in /usr/local/lib/python3.11/dist-packages (0.21)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
graphviz is already the newest version (2.42.2-6ubuntu0.1).
0 upgraded, 0 newly installed, 0 to remove and 35 not upgraded.
# Compile the model
# Optimizer: 'adam' is a popular choice for its efficiency and good performance in many scenarios.
# Loss Function: 'categorical_crossentropy' is used for multi-class classification with one-hot encoded labels.
# Metrics: 'accuracy' to monitor classification performance during training.
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
# Train the model
# epochs: Number of complete passes through the training dataset.
# batch_size: Number of samples per gradient update.
# validation_split: Reserve a portion of training data for validation during training.
print("\nStarting model training...")
history = model.fit(X_train_scaled, y_train_oh,
epochs=50, # You can adjust the number of epochs
batch_size=8, # You can adjust the batch size
validation_split=0.1, # Use 10% of training data for validation
verbose=1) # Display training progress
print("\nTraining complete.")
# Visualize training history (loss and accuracy)
plt.figure(figsize=(12, 5))
# Plot training & validation accuracy values
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.grid(True, linestyle='--', alpha=0.6)
# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
Starting model training...
Epoch 1/50
14/14 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.4286 - loss: 1.2129 - val_accuracy: 0.1667 - val_loss: 1.3825
...
Epoch 25/50
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9032 - loss: 0.4188 - val_accuracy: 0.7500 - val_loss: 0.4912
...
Epoch 50/50
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9096 - loss: 0.2676 - val_accuracy: 0.9167 - val_loss: 0.2916

Training complete.
Discussion Points:¶
- Describe the architecture of the ANN you built (number of layers, neurons per layer, activation functions).
- What is the purpose of the compile step in Keras? Explain the role of the optimizer and the loss function. (See the sketch after this list.)
- Analyze the training history plots. Does the model appear to be learning effectively? Is there any sign of overfitting or underfitting?
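For the compile question, here is a hedged, illustrative sketch: demo_model is a throwaway copy of the architecture above, and the learning rate shown is simply the Keras default. It performs the same compile step with explicit optimizer, loss, and metric objects so their roles are visible.
# Illustrative sketch: an equivalent compile step written with explicit objects
# instead of string shortcuts. The learning_rate value is an assumption
# (0.001 is the Keras default for Adam).
from tensorflow import keras
from tensorflow.keras import layers
demo_model = keras.Sequential([
    keras.Input(shape=(4,)),                 # 4 Iris features
    layers.Dense(10, activation='relu'),
    layers.Dense(3, activation='softmax')    # 3 Iris classes
])
demo_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),    # how the weights get updated
    loss=keras.losses.CategoricalCrossentropy(),             # what "wrong" means for one-hot labels
    metrics=[keras.metrics.CategoricalAccuracy()]            # monitored during training, not optimized directly
)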
Part 3: Model Evaluation and Prediction¶
After training, it's crucial to evaluate the model's performance on unseen data (the test set) to ensure it generalizes well.
Tasks:
- Make predictions on the test set.
- Convert probabilistic predictions back to class labels.
- Evaluate the model using accuracy_score, precision_score, recall_score, f1_score, and classification_report.
- Plot a confusion matrix.
# Make predictions on the test set
# predict returns probabilities for each class
y_pred_proba = model.predict(X_test_scaled)
print(f"\nShape of predicted probabilities: {y_pred_proba.shape}")
print(f"First 5 predicted probabilities:\n{y_pred_proba[:5]}")
# Convert probabilities to class labels (index of the highest probability)
y_pred_labels = np.argmax(y_pred_proba, axis=1)
print(f"\nFirst 5 predicted labels: {y_pred_labels[:5].tolist()}")
print(f"First 5 true labels (original encoded): {y_test_original[:5].tolist()}")
# Evaluate the model
print(f"\n--- Model Evaluation (on Test Set) ---")
print(f"Accuracy: {accuracy_score(y_test_original, y_pred_labels):.4f}")
print(f"Precision (macro): {precision_score(y_test_original, y_pred_labels, average='macro'):.4f}")
print(f"Recall (macro): {recall_score(y_test_original, y_pred_labels, average='macro'):.4f}")
print(f"F1-Score (macro): {f1_score(y_test_original, y_pred_labels, average='macro'):.4f}")
# Classification Report provides detailed per-class metrics
print("\nClassification Report:")
print(classification_report(y_test_original, y_pred_labels, target_names=target_names))
# Plot Confusion Matrix
cm = confusion_matrix(y_test_original, y_pred_labels)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
xticklabels=target_names, yticklabels=target_names)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step

Shape of predicted probabilities: (30, 3)
First 5 predicted probabilities:
[[0.9673828  0.0260301  0.00658699]
 [0.01299555 0.22098392 0.76602054]
 [0.22556487 0.71684253 0.05759247]
 [0.11088667 0.84604895 0.04306443]
 [0.9775967  0.01553797 0.0068653 ]]

First 5 predicted labels: [0, 2, 1, 1, 0]
First 5 true labels (original encoded): [0, 2, 1, 1, 0]

--- Model Evaluation (on Test Set) ---
Accuracy: 0.8667
Precision (macro): 0.8750
Recall (macro): 0.8667
F1-Score (macro): 0.8653

Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       0.88      0.70      0.78        10
   virginica       0.75      0.90      0.82        10

    accuracy                           0.87        30
   macro avg       0.88      0.87      0.87        30
weighted avg       0.88      0.87      0.87        30
Discussion Points:¶
- What do the precision, recall, and F1-score metrics tell you about the model's performance compared to simple accuracy?
- Analyze the confusion matrix. Which classes (if any) is the model struggling to differentiate? What does a perfect confusion matrix look like? (See the sketch below.)
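To connect these metrics with the confusion matrix, the following hedged sketch (reusing cm and target_names computed above) derives per-class precision and recall directly from the matrix counts.
# Illustrative sketch: per-class precision and recall recovered from the
# confusion matrix itself (rows = true classes, columns = predicted classes).
import numpy as np
per_class_recall = np.diag(cm) / cm.sum(axis=1)     # TP / (TP + FN), computed row-wise
per_class_precision = np.diag(cm) / cm.sum(axis=0)  # TP / (TP + FP), computed column-wise
for name, p, r in zip(target_names, per_class_precision, per_class_recall):
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    print(f"{name:>12}: precision={p:.2f}, recall={r:.2f}, f1={f1:.2f}")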
Part 4: Hyperparameter Tuning and Overfitting/Regularization¶
Neural network performance is highly dependent on hyperparameters (e.g., number of layers, neurons, activation functions, learning rate, epochs, batch size). Overfitting is a common problem where the model learns the training data too well, failing to generalize to new data. Regularization techniques can help mitigate this.
Tasks:
- Experiment with different architectures:
  - Add more hidden layers.
  - Change the number of neurons in hidden layers.
  - Try different activation functions (e.g., tanh or sigmoid for hidden layers, though relu is generally preferred for deeper networks).
- Implement Early Stopping: monitor validation loss and stop training when it no longer improves.
- (Optional) Implement Dropout: add layers.Dropout layers to introduce regularization.
Experiment 1: Deeper Network with More Neurons¶
model_deep = keras.Sequential([
keras.Input(shape=(X_train_scaled.shape[1],)),
layers.Dense(units=64, activation='relu', name='hidden_layer_1_deep'), # More neurons
layers.Dense(units=32, activation='relu', name='hidden_layer_2_deep'), # Second hidden layer
layers.Dense(units=num_classes, activation='softmax')
])
model_deep.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model_deep.summary()
# Use Early Stopping to prevent overfitting
# Monitors 'val_loss' and stops if it doesn't improve for 'patience' epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
print("\nStarting training for Deeper Network with Early Stopping...")
history_deep = model_deep.fit(X_train_scaled, y_train_oh,
epochs=100, # Set a higher max epoch, EarlyStopping will manage it
batch_size=8,
validation_split=0.1,
callbacks=[early_stopping], # Add the EarlyStopping callback
verbose=0) # Set to 0 to suppress per-epoch output for brevity
print(f"Deeper network training stopped at epoch {len(history_deep.history['loss'])} (best epoch was restored).")
# Evaluate the deeper model
y_pred_deep = np.argmax(model_deep.predict(X_test_scaled), axis=1)
print(f"Deeper Model Accuracy: {accuracy_score(y_test_original, y_pred_deep):.4f}")
print("Deeper Model Classification Report:\n", classification_report(y_test_original, y_pred_deep, target_names=target_names))
# Plot training history for the deeper model
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history_deep.history['accuracy'])
plt.plot(history_deep.history['val_accuracy'])
plt.title('Deeper Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.grid(True, linestyle='--', alpha=0.6)
plt.subplot(1, 2, 2)
plt.plot(history_deep.history['loss'])
plt.plot(history_deep.history['val_loss'])
plt.title('Deeper Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ hidden_layer_1_deep (Dense)     │ (None, 64)             │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ hidden_layer_2_deep (Dense)     │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 3)              │            99 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 2,499 (9.76 KB)
Trainable params: 2,499 (9.76 KB)
Non-trainable params: 0 (0.00 B)
Starting training for Deeper Network with Early Stopping...
Deeper network training stopped at epoch 83 (best epoch was restored).
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 59ms/step
Deeper Model Accuracy: 0.9333
Deeper Model Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       0.90      0.90      0.90        10
   virginica       0.90      0.90      0.90        10

    accuracy                           0.93        30
   macro avg       0.93      0.93      0.93        30
weighted avg       0.93      0.93      0.93        30
Experiment 2: Network with Dropout for Regularization¶
model_dropout = keras.Sequential([
keras.Input(shape=(X_train_scaled.shape[1],)),
layers.Dense(units=64, activation='relu', name='hidden_layer_1_dropout'),
layers.Dropout(0.3), # Dropout layer: randomly sets a fraction of input units to 0 at each update during training
layers.Dense(units=32, activation='relu', name='hidden_layer_2_dropout'),
layers.Dropout(0.3),
layers.Dense(units=num_classes, activation='softmax')
])
model_dropout.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model_dropout.summary()
print("\nStarting training for Dropout Network with Early Stopping...")
history_dropout = model_dropout.fit(X_train_scaled, y_train_oh,
epochs=100,
batch_size=8,
validation_split=0.1,
callbacks=[early_stopping], # Re-use the early stopping callback
verbose=0)
print(f"Dropout network training stopped at epoch {len(history_dropout.history['loss'])} (best epoch was restored).")
# Evaluate the dropout model
y_pred_dropout = np.argmax(model_dropout.predict(X_test_scaled), axis=1)
print(f"Dropout Model Accuracy: {accuracy_score(y_test_original, y_pred_dropout):.4f}")
print("Dropout Model Classification Report:\n", classification_report(y_test_original, y_pred_dropout, target_names=target_names))
# Plot training history for the dropout model
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history_dropout.history['accuracy'])
plt.plot(history_dropout.history['val_accuracy'])
plt.title('Dropout Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.grid(True, linestyle='--', alpha=0.6)
plt.subplot(1, 2, 2)
plt.plot(history_dropout.history['loss'])
plt.plot(history_dropout.history['val_loss'])
plt.title('Dropout Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
Model: "sequential_3"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ hidden_layer_1_dropout (Dense)  │ (None, 64)             │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ hidden_layer_2_dropout (Dense)  │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 32)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 3)              │            99 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 2,499 (9.76 KB)
Trainable params: 2,499 (9.76 KB)
Non-trainable params: 0 (0.00 B)
Starting training for Dropout Network with Early Stopping...
Dropout network training stopped at epoch 86 (best epoch was restored).
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 65ms/step
Dropout Model Accuracy: 0.9333
Dropout Model Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       0.90      0.90      0.90        10
   virginica       0.90      0.90      0.90        10

    accuracy                           0.93        30
   macro avg       0.93      0.93      0.93        30
weighted avg       0.93      0.93      0.93        30
Discussion Points:¶
- Compare the performance of the simple model from Part 2 with the deeper/regularized models from Part 4. Did the changes in architecture or regularization help? Why or why not?
- Explain the concept of "overfitting" in neural networks. How does Early Stopping help to prevent it?
- How does Dropout work as a regularization technique, and what is its intuition? (See the sketch below.)
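As a quick illustration for the Dropout question, the standalone sketch below (not part of the models above) shows that a Dropout layer zeroes a random subset of activations only in training mode and passes values through unchanged at inference time.
# Illustrative sketch: Dropout is only active in training mode.
import tensorflow as tf
dropout = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 8))  # a dummy activation vector
print("Training mode (random units zeroed, survivors scaled by 1/(1 - rate)):")
print(dropout(x, training=True).numpy())
print("Inference mode (values pass through unchanged):")
print(dropout(x, training=False).numpy())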
Part 5: Advanced Topics & Discussion¶
This section is for broader discussion and conceptual understanding, extending beyond direct coding.
Discussion Topics:
- Activation Functions: Beyond ReLU, Sigmoid, and Softmax, discuss other common activation functions (e.g., Leaky ReLU, ELU, Swish) and their typical use cases.
- Optimizers: Briefly explain how optimizers like Adam, RMSprop, and SGD differ from basic Gradient Descent and why they are often preferred.
- Loss Functions: Discuss how the choice of loss function is tied to the type of problem (e.g., Binary Cross-Entropy vs. Categorical Cross-Entropy, MSE).
- Common Challenges: What are some common challenges encountered when training neural networks (e.g., vanishing/exploding gradients, local minima, computational cost), and how are they addressed?
- Beyond MLPs: Briefly introduce (conceptually) when you would use CNNs vs. RNNs vs. Transformers compared to a simple MLP.
1. Activation Functions:¶
ReLU (Rectified Linear Unit): $f(x) = \max(0, x)$
- Pros: Computationally efficient, helps mitigate vanishing gradient problem, introduces non-linearity.
- Cons: Dying ReLU problem (neurons can become inactive).
- Use Case: Most common activation for hidden layers in deep networks.
Sigmoid: $f(x) = 1 / (1 + e^{-x})$
- Pros: Outputs probabilities between 0 and 1, useful for binary classification output layer.
- Cons: Suffers from vanishing gradients (especially for very large/small inputs), outputs are not zero-centered.
- Use Case: Output layer for binary classification.
Tanh (Hyperbolic Tangent): $f(x) = (e^x - e^{-x}) / (e^x + e^{-x})$
- Pros: Outputs between -1 and 1 (zero-centered), which often helps optimization.
- Cons: Still suffers from vanishing gradients for extreme values.
- Use Case: Hidden layers (historically), sometimes output layer for values between -1 and 1.
Softmax: $f(x_i) = e^{x_i} / \sum_j e^{x_j}$
- Pros: Outputs probabilities for multiple classes that sum to 1.
- Cons: Only suitable for output layer in multi-class classification.
- Use Case: Output layer for multi-class classification.
Leaky ReLU: $f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{otherwise} \end{cases}$
- Pros: Addresses dying ReLU by allowing a small gradient for negative inputs.
- Cons: Still not always guaranteed to prevent dying neurons.
ELU (Exponential Linear Unit): $ f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha (e^x - 1) & \text{otherwise} \end{cases} $
- Pros: Addresses dying ReLU, produces negative outputs which push mean activations closer to zero (less bias shift).
Swish: $f(x) = x \cdot \text{sigmoid}(x)$
- Pros: Self-gated, performs better than ReLU on deeper models, smoother than ReLU.
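To make the less common activations concrete, here is a short, hedged sketch; the function names and the negative_slope argument follow the Keras 3 API reported above (Keras 3.8.0), and the sample inputs are arbitrary.
# Illustrative sketch: comparing activation outputs on the same inputs.
# Function names / signatures assume Keras 3 (as reported above: Keras 3.8.0).
import tensorflow as tf
from tensorflow import keras
x = tf.linspace(-3.0, 3.0, 7)  # sample pre-activations from -3 to 3
print("x          :", x.numpy())
print("relu       :", keras.activations.relu(x).numpy())
print("leaky_relu :", keras.activations.leaky_relu(x, negative_slope=0.1).numpy())
print("elu        :", keras.activations.elu(x).numpy())
print("silu/swish :", keras.activations.silu(x).numpy())
# In hidden layers these are usually selected by name, e.g.
# keras.layers.Dense(10, activation='leaky_relu')  # string names assume Keras 3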
2. Optimizers:¶
Gradient Descent (Vanilla GD):
- Updates weights based on the gradient of the loss function calculated over the entire training dataset. Slow for large datasets.
Stochastic Gradient Descent (SGD):
- Updates weights based on the gradient from one randomly chosen training example at a time. Faster, but updates are noisy.
Mini-Batch Gradient Descent:
- Updates weights using a small batch of examples. Balances speed and stability.
Adam (Adaptive Moment Estimation):
- How it differs: Uses both first (mean) and second (variance) moment estimates of gradients.
- Why preferred: Combines AdaGrad and RMSprop; performs well out-of-the-box.
RMSprop (Root Mean Square Propagation):
- How it differs: Uses exponentially decaying average of squared gradients.
- Why preferred: Good for RNNs and non-stationary objectives.
Adagrad (Adaptive Gradient Algorithm):
- How it differs: Adapts learning rate per parameter, with larger updates for infrequent parameters.
- Why preferred: Effective on sparse data.
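The hedged sketch below shows how these optimizers are instantiated and configured explicitly in Keras; the hyperparameter values are illustrative placeholders, not tuned settings.
# Illustrative sketch: configuring Keras optimizers explicitly.
# All hyperparameter values here are placeholders for demonstration.
from tensorflow import keras
sgd     = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)                # mini-batch SGD with momentum
rmsprop = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)                # decaying average of squared gradients
adagrad = keras.optimizers.Adagrad(learning_rate=0.01)                          # per-parameter adaptive learning rates
adam    = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)  # first + second moment estimates
# Any of these can be passed to compile, e.g.:
# model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])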
3. Loss Functions:¶
Categorical Cross-Entropy:
- Use Case: Multi-class classification with one-hot encoded labels.
Sparse Categorical Cross-Entropy:
- Use Case: Multi-class classification with integer-encoded labels.
Binary Cross-Entropy:
- Use Case: Binary classification (0 or 1 target labels). Often used with Sigmoid.
Mean Squared Error (MSE):
- Use Case: Regression; measures the square of prediction errors.
Mean Absolute Error (MAE):
- Use Case: Regression; measures average absolute errors. More robust to outliers than MSE.
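To illustrate the categorical vs. sparse distinction, the standalone sketch below (toy probabilities, not model outputs from this notebook) computes the same loss twice: once from one-hot labels and once from integer labels.
# Illustrative sketch: categorical vs. sparse categorical cross-entropy.
# They measure the same quantity and differ only in the label format expected.
import numpy as np
from tensorflow import keras
probs = np.array([[0.8, 0.15, 0.05],
                  [0.1, 0.7,  0.2 ]], dtype="float32")     # toy predicted class probabilities
y_int = np.array([0, 1])                                   # integer labels
y_oh  = keras.utils.to_categorical(y_int, num_classes=3)   # the same labels, one-hot encoded
cce  = keras.losses.CategoricalCrossentropy()
scce = keras.losses.SparseCategoricalCrossentropy()
print("categorical_crossentropy        :", float(cce(y_oh, probs)))
print("sparse_categorical_crossentropy :", float(scce(y_int, probs)))
# Both print the same value (about 0.29), since the two label formats encode identical information.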
4. Common Challenges when Training Neural Networks and Solutions:¶
Vanishing/Exploding Gradients:
- Solutions: He/Xavier initialization, ReLU-family activations, Batch Norm, Gradient Clipping.
Overfitting:
- Solutions: More data, Data Augmentation, Regularization (L1/L2, Dropout), Early Stopping, simpler model.
Underfitting:
- Solutions: More complex models, longer training, better features.
Computational Cost:
- Solutions: Use GPUs/TPUs, efficient optimizers, smaller batch sizes, model pruning.
Local Minima (and Plateaus):
- Solutions: Better optimizers (Adam, RMSprop), momentum, good initialization.
Data Requirements:
- Solutions: Data Augmentation, Transfer Learning (pre-trained models).
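As a concrete reference for a few of these remedies, the hedged sketch below (an illustrative configuration not used elsewhere in this notebook; the regularization strength and clip value are assumptions) wires He initialization, L2 weight decay, batch normalization, dropout, and gradient clipping into one small model.
# Illustrative sketch: common training remedies combined in one small model.
from tensorflow import keras
from tensorflow.keras import layers, regularizers
model_reg = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(32,
                 activation='relu',
                 kernel_initializer='he_normal',             # initialization that suits ReLU-family activations
                 kernel_regularizer=regularizers.l2(1e-3)),  # L2 penalty against overfitting
    layers.BatchNormalization(),                             # stabilizes the distribution of activations
    layers.Dropout(0.3),                                     # randomly drops units during training
    layers.Dense(3, activation='softmax')
])
model_reg.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0),  # clipnorm caps the gradient norm
    loss='categorical_crossentropy',
    metrics=['accuracy']
)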
5. Beyond MLPs (When to use other architectures):¶
Convolutional Neural Networks (CNNs):
- When to use: Image, video, or spatial data.
- Why: Learn local patterns (edges, textures); pooling provides translation invariance.
Recurrent Neural Networks (RNNs, LSTMs, GRUs):
- When to use: Sequential data (e.g., time series, language, speech).
- Why: Maintain memory of past inputs; handle variable-length sequences.
Transformer Networks:
- When to use: NLP (and increasingly vision tasks).
- Why: Self-attention captures long-range dependencies, enables parallel processing.
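For contrast with the MLPs above, the hedged sketch below (untrained skeletons with assumed input shapes) shows how the same Keras Sequential style expresses a small CNN for images and a small LSTM for sequences.
# Illustrative sketch: minimal CNN and RNN skeletons (architecture only, not trained here).
from tensorflow import keras
from tensorflow.keras import layers
# CNN for, e.g., 28x28 grayscale images (the input shape is an assumption)
cnn = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, kernel_size=3, activation='relu'),  # learns local spatial patterns
    layers.MaxPooling2D(pool_size=2),                      # downsamples, adds tolerance to small shifts
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])
# RNN (LSTM) for, e.g., sequences of 50 timesteps with 8 features each (shape is an assumption)
rnn = keras.Sequential([
    keras.Input(shape=(50, 8)),
    layers.LSTM(32),                                       # keeps a memory of earlier timesteps
    layers.Dense(1, activation='sigmoid')
])
cnn.summary()
rnn.summary()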
Prepared By
Md. Atikuzzaman
Lecturer
Department of Computer Science and Engineering
Green University of Bangladesh
Email: atik@cse.green.edu.bd