AI-Driven Real-Time Sign Language Translation App

HandsTalk

HandsTalk is a mobile app that uses advanced computer vision and Generative AI (a Large Language Model) to provide real-time translation from sign language to text. It relies only on a device's built-in camera, with no specialized hardware, making it accessible to a wider audience.


Current Progress

We are collaborating with HKSTP and HKUST as a startup (InnovateAI Ltd.). HandsTalk began as a university student project focused on American Sign Language translation.

We have now paused work on American Sign Language translation to concentrate on Cantonese Sign Language translation, especially for medical applications.

Our aim is to create a real-world solution. In the future, we plan to explore other domains and aspects of Cantonese Sign Language translation.

System Architecture

HandsTalk is composed of four main components that work together to facilitate effective communication, as the pipeline sketch after this list illustrates:

  • Seamless User Interface
  • Real-time Body Posture Detection
  • Sign Language Deep Learning
  • AI-Driven Sentence Completion

Sign Language Translation (Sentences)

HandsTalk not only translates individual signed words, but also uses Generative AI (a Large Language Model) to convert the identified words into coherent sentences for precise translation, handling the distinct grammar and structure of sign language.

Two-way Communication

HandsTalk aims to enable both signers and non-signers to communicate effectively, allowing each to use their own methods of communication. Non-signers can communicate without needing to learn sign language.


Translation AI Model

We trained the model on a dataset of 100 words and phrases, including the 26 letters of the alphabet. The trained model achieves 91.8% accuracy, correctly predicting the large majority of signs.

In real-world scenarios, the model also reliably recognizes most of the trained words, phrases, and letters.

User “Select” Action

When the app identifies a word or phrase from the user, the result appears on the screen and the upper background turns yellow, indicating that the translated result is “Ready to Select.”

When both of the user’s hands move out of the frame, the “select” action is triggered, confirming that the translation is correct.


Real-Time AI Translation

HandsTalk employs advanced computer vision to deliver precise real-time translation from sign language to text, and a Generative AI (Large Language Model) refines the detected signs into coherent sentences.

Seamless Platform

We provide a seamless platform for sign language learners to improve their skills and for sign language users to communicate, with a user-friendly interface.

Data-driven Insights

We train the sign language translation model using deep learning.

Gratitude to WLASL for Their Invaluable Resource

Our prototype for American Sign Language translation was trained in part on WLASL, the largest video dataset for word-level American Sign Language (ASL) recognition. We would like to express our gratitude to the creators of WLASL for making this resource available.