M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-yi Lee, David Harwath
International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2023

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath
IEEE Spoken Language Technology Workshop (SLT) 2022

Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer
Yi-Jen Shih, Shih-Lun Wu, Frank Zalkow, Meinard Müller, Yi-Hsuan Yang
IEEE Transactions on Multimedia (TMM) 2022