
American Sign Language (ASL) Recognition Using CNN

Overview

American Sign Language (ASL) is a lifeline for millions of deaf and hard-of-hearing individuals. Yet, for many, communication barriers remain when others don’t understand ASL.
This project tackles that gap by developing a Convolutional Neural Network (CNN) model capable of recognizing 36 static ASL gestures (A–Z, 0–9) from images with 94% accuracy.

By replacing an earlier, less effective landmark-detection approach with a CNN, we gained:

  • Higher accuracy across all classes.
  • Better robustness to lighting, angles, and hand orientation.
  • A scalable base for real-time sign language translation tools.

Problem Statement

  • The Challenge: Enable machines to interpret ASL hand gestures from images.
  • Goal: Build a system that can classify static ASL gestures accurately and efficiently.
  • Why It Matters: This technology can empower inclusive communication for everyone.

πŸ› οΈ Tools & Dependencies

Core Libraries:

  • TensorFlow / Keras → Designing, training, and evaluating the CNN.
  • Matplotlib / Seaborn → Visualizing training curves & confusion matrices.
  • Scikit-learn → Accuracy, precision, recall, F1-score.
  • NumPy / Pandas → Data handling and preprocessing.

System Requirements:

  • Python 3.7+
  • GPU-enabled system (T4 GPU recommended in Google Colab)
  • 20 GB storage & 8 GB RAM

Dataset:

  • 36 classes (A–Z, 0–9)
  • 80% training / 10% validation / 10% testing
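
Assuming the raw images are organized in one folder per class, a split like this can be produced with the split-folders utility; this is only a sketch of one way to do it, not necessarily how the original split was made (paths and seed are placeholders):

```python
# Hypothetical 80/10/10 split using the split-folders package (pip install split-folders).
# Assumes images live in asl_data/<class_name>/*.jpg; folder names are placeholders.
import splitfolders

splitfolders.ratio(
    "asl_data",          # input: one sub-folder per class (A-Z, 0-9)
    output="asl_split",  # creates asl_split/train, asl_split/val, asl_split/test
    seed=42,
    ratio=(0.8, 0.1, 0.1),
)
```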

🧩 Model Architecture

The model is a Sequential CNN with three convolutional blocks, regularization, and fully connected layers; a code sketch follows the highlights below.

Highlights:

  • Input: 200×200 RGB images
  • Conv Blocks: 3 sets of two convolutional layers + ReLU + MaxPooling + Dropout
  • Dense Layers: Flatten → Dense(512) → Dense(128) → Output(36, softmax)
  • Optimizer: Adam
  • Loss Function: Categorical Cross-Entropy
  • Regularization: Dropout (0.2–0.4)
  • Callbacks: EarlyStopping, ReduceLROnPlateau
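
A minimal Keras sketch of this architecture is shown below. The overall structure follows the highlights above; the filter counts and the exact placement of the dropout rates are assumptions, since the post only lists the general layout.

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 36  # A-Z and 0-9

# Sketch of the described architecture: three conv blocks (two conv layers each
# with ReLU, then MaxPooling and Dropout), followed by Dense(512) -> Dense(128)
# -> softmax output. Filter counts (32/64/128) are assumed, not from the post.
model = models.Sequential([
    layers.Input(shape=(200, 200, 3)),

    # Block 1
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),

    # Block 2
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Dropout(0.3),

    # Block 3
    layers.Conv2D(128, 3, activation="relu", padding="same"),
    layers.Conv2D(128, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Dropout(0.4),

    # Classifier head
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```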

🔄 Methodology

  1. Data Preparation
    • Resize images → 200×200
    • Rescale pixel values → [0,1]
    • Split into train, val, and test sets
  2. Training
    • 30 epochs
    • Early stopping to avoid overfitting
    • Dynamic learning rate adjustment
  3. Evaluation
    • Accuracy, precision, recall, F1-score
    • Confusion matrix analysis
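
Putting steps 1 and 2 together, a hedged sketch of the training pipeline is shown below, using the model from the architecture sketch above. Directory names, batch size, and callback thresholds are assumptions.

```python
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (200, 200)
BATCH_SIZE = 32  # assumed

# 1. Data preparation: resize to 200x200 and rescale pixel values to [0, 1].
datagen = ImageDataGenerator(rescale=1.0 / 255)

train_gen = datagen.flow_from_directory(
    "asl_split/train", target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="categorical")
val_gen = datagen.flow_from_directory(
    "asl_split/val", target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="categorical")

# 2. Training: up to 30 epochs with early stopping and dynamic LR reduction.
#    Patience values and the LR factor are assumed. `model` is the Sequential
#    CNN from the architecture sketch above.
callbacks = [
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
]

history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=30,
    callbacks=callbacks,
)
```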

📊 Results

Performance Metrics:

  • Test Accuracy: 94%
  • Macro Avg: Precision 96%, Recall 94%, F1-score 95%
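
These metrics and the confusion matrix can be computed with scikit-learn as sketched below, assuming the trained model, data generator, and folder layout from the earlier sketches.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix

# 3. Evaluation on the held-out test split.
#    shuffle=False keeps the labels aligned with the predictions.
test_gen = datagen.flow_from_directory(
    "asl_split/test", target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="categorical", shuffle=False)

probs = model.predict(test_gen)
y_pred = np.argmax(probs, axis=1)
y_true = test_gen.classes

# Per-class precision, recall, and F1-score plus macro averages.
print(classification_report(y_true, y_pred,
                            target_names=list(test_gen.class_indices)))

# Confusion matrix across the 36 classes.
cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, cmap="Blues")
plt.xlabel("Predicted class")
plt.ylabel("True class")
plt.show()
```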

Figure 1: Accuracy and loss trends, showing stable convergence with no signs of overfitting.

Figure 2: Confusion matrix, showing minimal misclassifications across the 36 classes.


🚀 Conclusion & Future Work

This model is a solid foundation for ASL recognition tools and can be extended for:

  • Dynamic gesture recognition using sequences.
  • Mobile/web real-time applications.
  • Robustness improvements via diverse training datasets.

“Technology’s real power lies in making the world more inclusive.”

📂 GitHub Repository

The GitHub repository can be found here: American Sign Language DECODER
