Binit Bhattarai

ML Engineer, Speech & NLP Lab.

Hello there! I'm Binit Bhattarai, a Machine Learning Engineer passionate about advancing audio and speech technologies.

Through research internships at A*STAR Singapore and Samsung Research Institute, I have developed and fine-tuned state-of-the-art models for Automatic Speech Recognition (ASR), singing voice synthesis, audio classification, and speech enhancement. My work has included training models such as Wav2Vec2, Whisper, WavLM, AST, and Bark on large-scale datasets, including an Indonesian corpus, ATC data, and a singing corpus. I have also built classifiers for early mild cognitive impairment (MCI) detection and for detecting audio manipulated via splicing.

Beyond my professional work, I have a deep interest in neural networks, particularly in designing creative solutions to real-world challenges. My academic background in Computer Science and Engineering at VIT, supported by a COMPEX Scholarship, has given me a strong foundation in algorithms, statistics, and machine learning principles.

My research interests include:

ASR for low-resource and multilingual languages
Speech enhancement
Audio classification for early MCI detection and audio manipulation detection
Generative models for creative AI, including music and voice synthesis

In my free time, I enjoy contributing to open-source projects, exploring advancements in AI, and creating innovative solutions to complex problems.


Selected Publications

Self-Attention Siamese Network for Unsupervised Few-Shot Learning Tasks (ISPR 2024)
VietSing: A High-quality Vietnamese Singing Voice Corpus (APSIPA 2024)