Biography

I am currently a Research Scientist at Google DeepMind, working on audio-visual generation from the Paris office. I completed my PhD at Telecom Paris in March 2024, under the supervision of Slim Essid and Titouan Parcollet. Before that, I graduated from Ecole Polytechnique in Applied Mathematics and Computer Science, and I also hold a master's degree in Machine Learning from ENS Paris-Saclay (MVA Program).

During my PhD, I had the chance to do two internships abroad. The first was with Google Research in Zurich, supervised by Zalan Borsos and Félix de Chaumont-Quitry, on generative audio technologies. The second was at MILA in Montréal, Canada, supervised by Mirco Ravanelli, working on the evaluation and use of speech self-supervision within the SpeechBrain library. I have also been trying to contribute to the SpeechBrain library through my work. If you are starting out in speech, or are fed up with your current deep learning framework for speech, please have a look here: [SpeechBrain]

Research

I am broadly interested in Machine Learning applied to Language, Speech, and Audio. More precisely, my PhD work focused on understanding and motivating the design choices in self-supervised learning pipelines for speech. My current work focuses on video-conditioned audio generation.

Recent Works

I have recently been involved in two audio-visual generative works at Google DeepMind:

Veo 2: State-of-the-art video generation. [Veo 2: Blogpost link]
Video-to-audio generation. [Audio for Video: Blogpost link]

Salah Zaiem
