Streaming Digit Classifier (Lightweight)

How to Use

1. Click "Start Recording" and clearly say a digit (0-9)

2. Select a processing method from the cards below

3. Watch the real-time audio visualization

4. Compare inference times and accuracy across methods

5. Use the Robustness toggle to test with noise
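
Step 5 can be approximated offline by mixing noise into a recording at a chosen signal-to-noise ratio. The sketch below is one way to do that with white Gaussian noise; the function name `add_noise`, the 8 kHz sample rate, and the 10 dB level are illustrative assumptions, and the noise model the Robustness toggle actually uses is not stated on this page.

```python
import numpy as np

def add_noise(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise at the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Stand-in for a recorded digit: a 1-second tone at an assumed 8 kHz rate
sr = 8000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = add_noise(clean, snr_db=10, rng=np.random.default_rng(0))
```

Running each pipeline on `clean` and `noisy` at a few SNR levels gives a rough picture of how quickly accuracy degrades.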

Select Processing Pipeline

MFCC + Dense NN

Mel Frequency Cepstral Coefficients

Predicted Digit: ?

Confidence: --

Inference Time: 0.0ms

Testing Acc: 98.52%
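
The Predicted Digit and Confidence fields are typically derived from the network's 10 output scores. A minimal sketch, assuming the confidence is the softmax probability of the top class (this page does not say how it is computed); `predict_with_confidence` and the example scores are hypothetical:

```python
import numpy as np

def predict_with_confidence(logits):
    """Return (digit, confidence) from raw class scores via a stable softmax."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()          # avoid overflow in exp
    probs = np.exp(shifted) / np.exp(shifted).sum()
    digit = int(np.argmax(probs))
    return digit, float(probs[digit])

# Made-up scores for digits 0-9, with digit 2 scoring highest
example_logits = [0.1, 0.3, 4.2, 0.0, -1.0, 0.5, 0.2, 0.1, 0.0, -0.3]
digit, confidence = predict_with_confidence(example_logits)
```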

Mel CNN (2D)

Mel Spectrogram Convolutional Neural Network

Predicted Digit: ?

Confidence: --

Inference Time: 0.0ms

Testing Acc: 97.22%

Raw CNN (1D)

Raw Waveform Convolutional Neural Network

Predicted Digit: ?

Confidence: --

Inference Time: 0.0ms

Testing Acc: 91.30%

External API

Whisper API (Disabled)

Predicted Digit: N/A

Confidence: --

Inference Time: N/A

Status: Disabled
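
The Inference Time figures on these cards can be reproduced offline with a small timing harness. The sketch below discards a few warm-up runs and reports the median, which is less sensitive to one-off stalls than a single measurement; `time_inference` and `dummy_predict` are hypothetical names, and the stand-in model should be swapped for a real pipeline's predict function.

```python
import statistics
import time

def time_inference(predict_fn, sample, warmup=3, runs=20):
    """Return (median, best, worst) latency in milliseconds for predict_fn(sample)."""
    for _ in range(warmup):                  # warm caches before measuring
        predict_fn(sample)
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings_ms), min(timings_ms), max(timings_ms)

def dummy_predict(sample):
    """Hypothetical stand-in for a model's predict function."""
    return sum(sample) % 10

median_ms, best_ms, worst_ms = time_inference(dummy_predict, list(range(1000)))
```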

Model Architecture & Training Metrics

MFCC + Dense NN

Architecture	13 MFCC → Dense(128) → Dense(64) → Dense(10)
Test Accuracy	98.52%
Validation Accuracy	97.89%
Parameters	10,314
Training Time	3.2 minutes
Inference Time	~1-2ms
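
As a sanity check on the table above, the parameter count of a fully connected stack follows directly from the layer sizes. This sketch assumes every Dense layer has a bias vector. Under that assumption a 13-value input gives 10,698 parameters, while the listed 10,314 corresponds to a 10-value input, so the deployed input size may differ from what the Architecture row suggests.

```python
def dense_param_count(layer_sizes, use_bias=True):
    """Count weights (and optional biases) in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + (n_out if use_bias else 0)
    return total

count_13_inputs = dense_param_count([13, 128, 64, 10])
count_10_inputs = dense_param_count([10, 128, 64, 10])
```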

Mel CNN (2D)

Architecture	2D CNN → MaxPool → Dense(128) → Dense(10)
Test Accuracy	97.22%
Validation Accuracy	96.45%
Parameters	45,782
Training Time	8.7 minutes
Inference Time	~3-5ms
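
The mel spectrogram fed to this model groups FFT bins into bands that are equally spaced on the mel scale, so low frequencies get finer resolution than high ones. A sketch of that spacing using the common HTK-style formula; whether this app uses that variant, and the 40 bands and 4 kHz upper limit, are assumptions.

```python
import numpy as np

def hz_to_mel(f_hz):
    """HTK-style conversion from Hz to mel."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=np.float64) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)

def mel_band_edges(n_mels, f_min, f_max):
    """Edge frequencies (Hz) of n_mels bands equally spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2)
    return mel_to_hz(mels)

edges = mel_band_edges(n_mels=40, f_min=0.0, f_max=4000.0)
```

The gaps between successive entries of `edges` widen with frequency, which is what shrinks the input relative to a linear-frequency spectrogram.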

Raw CNN (1D)

Architecture	1D CNN → Conv1D → GlobalMaxPool → Dense(10)
Test Accuracy	91.30%
Validation Accuracy	89.67%
Parameters	28,954
Training Time	12.1 minutes
Inference Time	~5-8ms
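
The GlobalMaxPool step is what lets a raw-waveform model accept clips of different lengths: each filter's response is reduced to its single largest value, so the vector passed to Dense(10) has one entry per filter. A NumPy sketch of the idea, with a hypothetical filter bank (16 filters of width 9) and random data standing in for audio:

```python
import numpy as np

def conv1d_valid(x, kernels):
    """Cross-correlate a 1-D signal with each kernel, no padding.

    x: (T,), kernels: (n_filters, k) -> output (n_filters, T - k + 1)
    """
    return np.stack([np.correlate(x, k, mode="valid") for k in kernels])

def global_max_pool(feature_maps):
    """Keep the maximum activation of each filter across time."""
    return feature_maps.max(axis=1)

rng = np.random.default_rng(0)
kernels = rng.normal(size=(16, 9))           # hypothetical filter bank

short_clip = rng.normal(size=4000)
long_clip = rng.normal(size=8000)

# ReLU, then pool: both clips end up as 16-value vectors
feat_short = global_max_pool(np.maximum(conv1d_valid(short_clip, kernels), 0.0))
feat_long = global_max_pool(np.maximum(conv1d_valid(long_clip, kernels), 0.0))
```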

External API

Model	Whisper (Hugging Face)
Parameters	~39M (External)
Language	English Speech Recognition
Connection	HTTPS API Call
Latency	~1-3 seconds
Test Accuracy	Variable (Network dependent)

Raw Spectrogram (Dropped)

Architecture	STFT → 2D CNN → Dense Layers
Status	Not Implemented
Reason	High Dimensionality
Issue	Memory intensive (~64k features)
Alternative	Mel-scale features used instead
Estimated Performance	Similar to Mel CNN but slower
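
The ~64k figure is easy to reach with ordinary STFT settings. The sketch below counts input features for a 1-second clip. The 16 kHz sample rate, 1024-point FFT, hop of 128, centered framing, and 40 mel bands are all assumptions, chosen as one combination that lands near the quoted number; the settings considered for this pipeline are not given.

```python
def stft_shape(n_samples, n_fft, hop_length, center=True):
    """Return (frequency_bins, frames) of a one-sided STFT."""
    n_bins = n_fft // 2 + 1
    if center:
        # Signal padded by n_fft // 2 on both sides
        n_frames = 1 + n_samples // hop_length
    else:
        n_frames = 1 + (n_samples - n_fft) // hop_length
    return n_bins, n_frames

n_bins, n_frames = stft_shape(n_samples=16000, n_fft=1024, hop_length=128)
raw_features = n_bins * n_frames             # full linear-frequency spectrogram
mel_features = 40 * n_frames                 # same frames, 40 mel bands
```

With these settings the raw spectrogram has roughly 13 times as many input values as the mel version.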

MFCC + SVM (Alternative)

Architecture	13 MFCC → Support Vector Machine
Status	Not Implemented
Estimated Accuracy	85-90%
Advantages	Lightweight, Fast Training
Parameters	~1000 Support Vectors
Inference Time	~0.5ms (Estimated)
