Bio
I am a first-year Ph.D. student at the Language Technologies Institute, Carnegie Mellon University, advised by Prof. Shinji Watanabe. My research focuses mainly on speech and language, and I have recently become interested in developing spoken language models. Previously, I was a research assistant at the Speech Processing Lab, National Taiwan University. I was also an R&D engineer at MediaTek Inc., working on computer vision tasks such as super-resolution and frame-rate conversion (MEMC), where I designed and trained lightweight networks that run on mobile devices in real time. I received my M.S. degree from National Taiwan University in 2021. During that time, I was a member of the Speech Processing Laboratory led by Prof. Lin-shan Lee and Prof. Hung-yi Lee.
Publications
Bagpiper: Solving Open-Ended Audio Tasks via Rich Captions
Preprint 2026 [paper]
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
Preprint 2025 [paper]
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
The Thirteenth International Conference on Learning Representations 2025 [paper]
SpeechCaps: Advancing Instruction-Based Universal Speech Models with Multi-Talker Speaking Style Captioning
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025 [paper]
A Preliminary Exploration with GPT-4o Voice Mode
Preprint 2025 [paper]
Fusion Of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
IEEE Spoken Language Technology Workshop 2024 [paper]
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model
IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2023 [paper]
Toward Degradation-Robust Voice Conversion
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
How Far Are We from Robust Voice Conversion: A Survey
IEEE Spoken Language Technology Workshop 2021 [paper]
Honors
2nd Place, M2VoC Challenge
Advanced Speech Technologies Scholarship
Excellence Achievement in AI CUP Competition
Dean’s List (Twice)
