SignLingo AI

01 · Problem The Challenge

Learning sign language in Indonesia is prohibitively expensive due to the high cost of private tutors. The only free alternatives are passive methods like static videos, where learners receive no feedback on whether they are signing correctly.

Humans are social creatures, and communication is a fundamental right — being disabled should not mean being disconnected from society. However, while modern Computer Vision is accurate, it typically requires high computational power that the average Indonesian, often relying on budget devices, simply does not possess.

02 · Solution The Approach

We built SignLingo to make learning accessible to everyone, everywhere. Since it is web-based, anyone can access it easily. To solve the hardware constraint, I researched and implemented a specialized lightweight Hybrid CNN-LSTM architecture. This ensures high accuracy without the need for expensive GPUs, allowing the AI to act as a personal tutor that gives instant feedback directly in the browser.

03 · Leadership My Role

SignLingo started with a simple goal: making sign language education accessible. As Product Owner, Scrum Master, Lead Developer, and Researcher, I managed a 10-person team across our business, research, and engineering tracks. While I guided our agile sprints and product strategy, my impact was heavily technical. I personally spearheaded the AI development, building the temporal feature extraction models using MediaPipe, LSTM, and GRU architectures. Alongside the machine learning, I engineered the entire backend architecture from the ground up. By bridging deep technical execution with our business roadmap, I ensured our vision translated into a robust, working application.

Me and the team behind SignLingo

04 · Platform The SignLingo Ecosystem

A holistic, two-part platform designed to combine state-of-the-art AI inference with an engaging user experience — real-time detection paired with gamified learning mechanics.

Real-time inference and active predictions

Personalized user dashboard

Real-Time Landmark Detection

Ultra-Low Latency

Inference times of 40–50ms for seamless, real-time sign language translation in the browser.

Robust Tracking

Accurate landmark detection even in low-light conditions and cluttered visual backgrounds.

Spatial-Temporal Logic

Combines spatial point coordinates extracted via MediaPipe with temporal gesture dynamics.

05 · Architecture Key Technical Architecture

To achieve high performance on constrained devices, we designed a lightweight hybrid architecture. Spatial landmarks extracted via MediaPipe are fed into a sequential CNN-LSTM network. For an in-depth dive into the methodology, data processing, and evaluation metrics, you can read the connected research paper.

Hybrid CNN-LSTM Architecture overview

Hybrid Model

MobileNetV2 for efficient spatial feature extraction combined with LSTM for temporal gesture dynamics.

Optimized for Low-Resource

L2 regularization and post-training quantization compressed the model to under 25MB — runs on 4GB RAM devices.

Read the Connected Research Paper

Gamified Learning

To move away from boring, passive learning, we designed SignLingo to be entertaining. We infused interactive "Duolingo-style" mechanics to keep users engaged while our AI acts as a private tutor giving instant feedback on their gestures.

Daily Streaks & EXP

Keep motivation high by tracking continuous learning days and rewarding experience points.

Health System

A forgiving yet challenging life system that requires active performance accuracy to progress.

Leaderboards

Compete with friends and the broader community for the top spot globally.

06 · Results The Outcome

The optimized model proved that inclusive communication technology can be made accessible to low-resource communities — without sacrificing accuracy or responsiveness.

95.3%

Validation Accuracy

BISINDO alphabet

45ms

Avg Inference Time

40–50ms range

<25MB

Model Size

Post-quantization

4GB

Min RAM Target

No GPU required

07 · Roadmap Future Works

The journey with SignLingo doesn't stop here. Currently, I am expanding the system to support Korean Sign Language (KSL) during my exchange program at Sungkyunkwan University (SKKU). For my undergraduate thesis, I am actively upgrading the core model architecture from the current CNN-LSTM approach to a more efficient MediaPipe-GRU hybrid, aiming to further decrease latency while retaining high spatial-temporal accuracy.

Role

Timeline

Team Size

Core Tech