Original Article
Human Action Recognition (HAR) and Speech Recognition (SR) using Data Science
Dr. Sumithra Devi K A¹
Swet raj Shrivastava²
Pranav Ranjan³
Romit Dev⁴
¹ Dean Academics and Head, Computer Engineering & Engineering in Data Science, Dayananda Sagar Academy of Technology and Management, Bengaluru, Karnataka, India.
²,³,⁴ Students, Computer Engineering & Engineering in Data Science, Dayananda Sagar Academy of Technology and Management, Bengaluru, Karnataka, India.
Published Online: May-August 2025
Pages: 206-209
DOI: https://www.doi.org/10.59256/indjcst.20250402026