Indonesian Language Sign Detection using Mediapipe with Long Short-Term Memory (LSTM) Algorithm
Main Article Content
Abstract
People with disabilities mostly communicate using sign language, but the public still has little understanding of the Indonesian Sign Language System (ISLS), which creates obstacles in daily interactions. Advances in artificial intelligence, especially artificial neural networks, open opportunities for sign language recognition, but such systems are still under development. This study aims to build an ISLS sign language recognition model using an LSTM approach and MediaPipe Hands. Hand keypoint data were collected with MediaPipe Hands, with 25 sequences per gesture across 36 alphabetic and numeric gestures. The dataset was divided into three subsets: 80% for training, 10% for validation, and 10% for testing. The model was built on an LSTM architecture to handle the sequential data produced by hand gestures. The model achieved an accuracy of 97.1%, a macro-averaged precision of 97%, recall of 96.6%, and F1-score of 96.4%, and a weighted-average precision of 97.4%, recall of 97.1%, and F1-score of 97%. These results show that the combination of LSTM and MediaPipe can detect ISLS gestures with high accuracy, making it a potential solution for automatic sign language translation that can improve the inclusiveness of communication for people with disabilities. Future research could use a more accurate hand recognition framework, improve data pre-processing, and explore deep learning (DL) methods such as SSD, YOLO, or Faster R-CNN. In addition, pose and facial recognition could be added to make gesture recognition more comprehensive and accurate.
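As a concrete illustration of the pipeline summarized above, the following Python sketch pairs MediaPipe Hands keypoint extraction with a small Keras LSTM classifier over 36 gesture classes. It is a minimal sketch under stated assumptions: the 30-frame sequence length, the 126-dimensional two-hand feature layout, and the layer sizes are illustrative choices, since the abstract does not specify the authors' exact configuration.

```python
# Illustrative sketch of the described pipeline:
# MediaPipe Hands keypoints -> fixed-length sequences -> LSTM classifier.
# Sequence length, feature layout, and layer sizes are assumptions,
# not the authors' exact configuration.
import cv2
import mediapipe as mp
import numpy as np
import tensorflow as tf

NUM_CLASSES = 36           # alphabetic + numeric gestures (from the abstract)
SEQ_LEN = 30               # frames per sequence (assumed; not stated in the abstract)
NUM_FEATURES = 2 * 21 * 3  # two hands x 21 landmarks x (x, y, z)

mp_hands = mp.solutions.hands

def extract_keypoints(frame_bgr, hands):
    """Return a flat (126,) keypoint vector for one frame (zeros where a hand is missing)."""
    results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    keypoints = np.zeros(NUM_FEATURES, dtype=np.float32)
    if results.multi_hand_landmarks:
        for h, hand_landmarks in enumerate(results.multi_hand_landmarks[:2]):
            coords = np.array(
                [[lm.x, lm.y, lm.z] for lm in hand_landmarks.landmark],
                dtype=np.float32,
            ).flatten()
            keypoints[h * 63:(h + 1) * 63] = coords
    return keypoints

def build_model():
    """A small stacked-LSTM classifier over keypoint sequences (layer sizes are illustrative)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(SEQ_LEN, NUM_FEATURES)),
        tf.keras.layers.LSTM(64, return_sequences=True),
        tf.keras.layers.LSTM(128),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Example: capture one keypoint sequence from a webcam and classify it
    # (the model here is untrained, so the prediction is only a shape check).
    model = build_model()
    cap = cv2.VideoCapture(0)
    frames = []
    with mp_hands.Hands(static_image_mode=False, max_num_hands=2,
                        min_detection_confidence=0.5) as hands:
        while len(frames) < SEQ_LEN:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(extract_keypoints(frame, hands))
    cap.release()
    if len(frames) == SEQ_LEN:
        sequence = np.expand_dims(np.stack(frames), axis=0)  # (1, SEQ_LEN, NUM_FEATURES)
        print("Predicted class:", int(np.argmax(model.predict(sequence), axis=-1)[0]))
```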
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All articles published in JIWE are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License. Readers are allowed to
- Share — copy and redistribute the material in any medium or format under the following conditions:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use;
- NonCommercial — You may not use the material for commercial purposes;
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
References
A.R. Syulistyo, D.S. Hormansyah, and P.Y. Saputra, “SIBI (Sistem Isyarat Bahasa Indonesia) translation using Convolutional Neural Network (CNN),” IOP Conf. Ser. Mater. Sci. Eng., vol. 732, no. 1, 2020, doi: 10.1088/1757-899X/732/1/012082.
A. Aljabar and Suharjito, “BISINDO (Bahasa isyarat indonesia) sign language recognition using CNN and LSTM,” Adv. Sci. Technol. Eng. Syst., vol. 5, no. 5, pp. 282–287, 2020, doi: 10.25046/AJ050535.
M. Al-Qurishi, T. Khalid, and R. Souissi, “Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues,” IEEE Access, vol. 9, pp. 126917–126951, 2021, doi: 10.1109/ACCESS.2021.3110912.
A. M. Buttar, U. Ahmad, A. H. Gumaei, A. Assiri, M.A. Akbar, and B.F. Alkhamees, “Deep Learning in Sign Language Recognition: A Hybrid Approach for the Recognition of Static and Dynamic Signs,” Mathematics, vol. 11, no. 17, pp. 1–20, 2023, doi: 10.3390/math11173729.
Z. Zhou, V.W.L. Tam, and E.Y. Lam, “SignBERT: A BERT-Based Deep Learning Framework for Continuous Sign Language Recognition,” IEEE Access, vol. 9, pp. 161669–161682, 2021, doi: 10.1109/ACCESS.2021.3132668.
Z. Zhang, Y. Tang, S. Zhao, and X. Zhang, “Real-time surface EMG pattern recognition for hand gestures based on support vector machine,” IEEE Int. Conf. Robot. Biomimetics, ROBIO 2019, pp. 1258–1262, 2019, doi: 10.1109/ROBIO49542.2019.8961436.
M. Haroon, S. Altaf, S. Ahmad, M. Zaindin, S. Huda, and S. Iqbal, “Hand Gesture Recognition with Symmetric Pattern under Diverse Illuminated Conditions Using Artificial Neural Network,” Symmetry (Basel)., vol. 14, no. 10, 2022, doi: 10.3390/sym14102045.
Y. Wang, Y. Zhao, and S. Addepalli, “Practical Options for Adopting Recurrent Neural Network and Its Variants on Remaining Useful Life Prediction,” Chinese J. Mech. Eng. (English Ed.), vol. 34, no. 1, 2021, doi: 10.1186/s10033-021-00588-x.
M. Zulqarnain, R. Ghazali, M. G. Ghouse, Y.M.M. Hassim, and I. Javid, “Predicting Financial Prices of Stock Market using Recurrent Convolutional Neural Networks,” Int. J. Intell. Syst. Appl., vol. 12, no. 6, pp. 21–32, 2020, doi: 10.5815/ijisa.2020.06.02.
J. Liu and X. Gong, “Attention mechanism enhanced LSTM with residual architecture and its application for protein-protein interaction residue pairs prediction,” BMC Bioinformatics, vol. 20, no. 1, pp. 1–11, 2019, doi: 10.1186/s12859-019-3199-1.
Q. Ma, Z. Lin, E. Chen, and G.W. Cottrell, “Temporal pyramid recurrent neural network,” AAAI 2020 - 34th AAAI Conf. Artif. Intell., pp. 5061–5068, 2020, doi: 10.1609/aaai.v34i04.5947.
C. Avci, B. Tekinerdogan, and C. Catal, “Analyzing the performance of long short-term memory architectures for malware detection models,” Concurr. Comput. Pract. Exp., vol. 35, no. 6, p. 1, 2023, doi: 10.1002/cpe.7581.
H.M. Radha, A.K.A. Hassan, and A.H. Al-Timemy, “Enhanced Prosthesis Control Through Improved Shoulder Girdle Motion Recognition Using Time-Dependent Power Spectrum Descriptors and Long Short-Term Memory,” Math. Model. Eng. Probl., vol. 10, no. 3, pp. 861–870, 2023, doi: 10.18280/mmep.100316.
M.H. Ismail, S.A. Dawwd, and F.H. Ali, “Dynamic hand gesture recognition of Arabic sign language by using deep convolutional neural networks,” Indones. J. Electr. Eng. Comput. Sci., vol. 25, no. 2, pp. 952–962, 2022, doi: 10.11591/ijeecs.v25.i2.pp952-962.
J.R. Gonzalez-Rodriguez, D.M. Cordova-Esparza, J. Terven, and J.A. Romero-Gonzalez, “Towards a Bidirectional Mexican Sign Language–Spanish Translation System: A Deep Learning Approach,” Technologies, vol. 12, no. 1, pp. 1–16, 2024, doi: 10.3390/technologies12010007.
M. De Coster, E. Rushe, R. Holmes, A. Ventresque, and J. Dambre, “Towards the extraction of robust sign embeddings for low resource sign language recognition,” 2023. [Online]. Available: http://arxiv.org/abs/2306.17558.
G.H. Samaan et al., “MediaPipe’s Landmarks with RNN for Dynamic Sign Language Recognition,” Electron., vol. 11, no. 19, pp. 1–15, 2022, doi: 10.3390/electronics11193228.
J. Shin, A.S.M. Miah, K. Suzuki, K. Hirooka, and M.A.M. Hasan, “Dynamic Korean Sign Language Recognition Using Pose Estimation Based and Attention-Based Neural Network,” IEEE Access, vol. 11, pp. 143501–143513, 2023, doi: 10.1109/ACCESS.2023.3343404.
D.K. Singh, “3D-CNN based Dynamic Gesture Recognition for Indian Sign Language Modeling,” Procedia Comput. Sci., vol. 189, pp. 76–83, 2021, doi: 10.1016/j.procs.2021.05.071.
M. Kakizaki, A.S.M. Miah, K. Hirooka, and J. Shin, “Dynamic Japanese Sign Language Recognition Throw Hand Pose Estimation Using Effective Feature Extraction and Classification Approach,” Sensors, vol. 24, no. 3, 2024, doi: 10.3390/s24030826.
X. Dang, W. Li, J. Zou, B. Cong, and Y. Guan, “Assessing the impact of body location on the accuracy of detecting daily activities with accelerometer data,” iScience, vol. 27, no. 2, pp. 108626, 2024, doi: 10.1016/j.isci.2023.108626.
M.A. Khatun et al., “Deep CNN-LSTM With Self-Attention Model for Human Activity Recognition Using Wearable Sensor,” IEEE J. Transl. Eng. Heal. Med., vol. 10, pp. 1–16, 2022, doi: 10.1109/JTEHM.2022.3177710.
H. Heydarian, P.V. Rouast, M.T.P. Adam, T. Burrows, C.E. Collins, and M.E. Rollo, “Deep learning for intake gesture detection from wrist-worn inertial sensors: The effects of data preprocessing, sensor modalities, and sensor positions,” IEEE Access, vol. 8, pp. 164936–164949, 2020, doi: 10.1109/ACCESS.2020.3022042.
I. Gusti A. Oka Aryananda and F. Samopa, “Comparison of the Accuracy of The Bahasa Isyarat Indonesia (BISINDO) Detection System Using CNN and RNN Algorithm for Implementation on Android,” MALCOM: Indones. J. Mach. Learn. Comput. Sci., vol. 4, no. 3, pp. 1111–1119, 2024, doi: 10.57152/malcom.v4i3.1465.
I.D.M.B. Atmaja Darmawan, Linawati, G. Sukadarmika, N.M.A.E.D. Wirastuti, and R. Pulungan, “Temporal Action Segmentation in Sign Language System for Bahasa Indonesia (SIBI) Videos Using Optical Flow-Based Approach,” J. Ilmu Komput. dan Inf., vol. 17, no. 2, pp. 195–202, 2024, doi: 10.21609/jiki.v17i2.1284.
F. Wijaya, L. Dahendra, E.S. Purwanto, and M.K. Ario, “Quantitative analysis of sign language translation using artificial neural network model,” Procedia Comput. Sci., vol. 245, no. C, pp. 998–1009, 2024, doi: 10.1016/j.procs.2024.10.328.
A.S.M. Miah, M.A.M. Hasan, S. Nishimura, and J. Shin, “Sign Language Recognition Using Graph and General Deep Neural Network Based on Large Scale Dataset,” IEEE Access, vol. 12, pp. 34553–34569, 2024, doi: 10.1109/ACCESS.2024.3372425.
D. Kumari and R.S. Anand, Electron., vol. 13, no. 7, p. 1229, 2024.
A.S.M. Miah, M.A.M. Hasan, Y. Tomioka, and J. Shin, “Hand Gesture Recognition for Multi-Culture Sign Language Using Graph and General Deep Learning Network,” IEEE Open J. Comput. Soc., vol. 5, pp. 144–155, 2024, doi: 10.1109/OJCS.2024.3370971.
S. Arooj, S. Altaf, S. Ahmad, H. Mahmoud, and A.S.N. Mohamed, “Enhancing sign language recognition using CNN and SIFT: A case study on Pakistan sign language,” J. King Saud Univ. - Comput. Inf. Sci., vol. 36, no. 2, pp. 101934, 2024, doi: 10.1016/j.jksuci.2024.101934.
“Norwegian Sign Language Recognition,” 2023.
J.P. Sahoo, A.J. Prakash, P. Plawiak, and S. Samantray, “Real-time hand gesture recognition using fine-tuned convolutional neural network,” Sensors, vol. 22, no. 3, p. 706, 2022, doi: 10.3390/s22030706.
M.A. Rahim, J. Shin, and K.S. Yun, “Hand gesture-based sign alphabet recognition and sentence interpretation using a convolutional neural network,” Ann. Emerg. Technol. Comput., vol. 4, no. 4, pp. 20–27, 2020, doi: 10.33166/AETiC.2020.04.003.
W. Jintanachaiwat et al., “Using LSTM to translate Thai sign language to text in real time,” Discov. Artif. Intell., vol. 4, no. 1, 2024, doi: 10.1007/s44163-024-00113-8.
Q.M. Areeb, M. Nadeem, R. Alroobaea, and F. Anwer, “Helping Hearing-Impaired in Emergency Situations: A Deep Learning-Based Approach,” IEEE Access, vol. 10, 2022, doi: 10.1109/ACCESS.2022.3142918.
D. Das Chakladar, P. Kumar, S. Mandal, P.P. Roy, M. Iwamura, and B.G. Kim, “3D Avatar Approach for Continuous Sign Movement Using Speech/Text,” Appl. Sci., vol. 11, no. 8, 2021, doi: 10.3390/app11083439.
I. Jayaweerage et al., “Motion Capturing in cricket with bare minimum hardware and optimised software: A comparison of MediaPipe and OpenPose,” 2024 1st Int. Conf. Software, Syst. Inf. Technol. (SSITCON), 2024, doi: 10.1109/SSITCON62437.2024.10796169.
M. Latyshev, G. Lopatenko, V. Shandryhos, O. Yarmoliuk, M. Pryimak, and I. Kvasnytsia, “Computer Vision Technologies for Human Pose Estimation in Exercise: Accuracy and Practicality,” Soc. Integr. Educ. Proc. Int. Sci. Conf., vol. 2, pp. 626–636, 2024, doi: 10.17770/sie2024vol2.7842.
N.H.M. Dhuzuki et al., “Design and Implementation of a Deep Learning Based Hand Gesture Recognition System for Rehabilitation Internet-of-Things (Riot) Environments Using Mediapipe,” IIUM Eng. J., vol. 26, no. 1, pp. 353–372, 2025, doi: 10.31436/IIUMEJ.V26I1.3455.
J.M. Joshi, and D.U. Patel, “Dynamic Indian Sign Language Recognition Based on Enhanced LSTM with Custom Attention Mechanism,” SSRG Int. J. Electron. Commun. Eng., vol. 11, no. 2, pp. 60–68, 2024, doi: 10.14445/23488549/IJECE-V11I2P107.
A. Sugiharto et al., “Comparison of SVM, Random Forest and KNN Classification By Using HOG on Traffic Sign Detection,” 2022 6th Int. Conf. Informatics Comput. Sci., pp. 60–65, 2022, doi: 10.1109/ICICoS56336.2022.9930588.
A.G. Mahmoud, A.M. Hasan, and N.M. Hassan, “Convolutional neural networks framework for human hand gesture recognition,” Bull. Electr. Eng. Informatics, vol. 10, no. 4, pp. 2223–2230, 2021, doi: 10.11591/EEI.V10I4.2926.
I.D. Mienye, T.G. Swart, and G. Obaido, “Recurrent Neural Networks: A Comprehensive Review of Architectures, Variants, and Applications,” Information, vol. 15, no. 9, pp. 517, 2024, doi: 10.3390/info15090517.
M.R. Abdurrahman, H. Al-aziz, F.A. Zayn, and M. Agus, “Development of Robot Feature for Stunting Analysis Using Long-Short Term Memory (LSTM) Algorithm,” J. Informatics Web Eng., vol. 3, no. 3, pp. 164–175, 2024, doi: 10.33093/jiwe.2024.3.3.10.
S.K. Paul et al., “An Adam based CNN and LSTM approach for sign language recognition in real time for deaf people,” Bull. Electr. Eng. Informatics, vol. 13, no. 1, pp. 499–509, 2024, doi: 10.11591/eei.v13i1.6059.
A. Khan, K. Khan, W. Khan, S.N. Khan, and R. Haq, “Knowledge-based Word Tokenization System for Urdu,” J. Informatics Web Eng., vol. 3, no. 2, pp. 86–97, 2024, doi: 10.33093/jiwe.2024.3.2.6.