Hey, I'm Gallil Maimon

I am a Computer Science PhD student at the SLP lab at the Hebrew University of Jerusalem, under the supervision of Dr. Yossi Adi. My research interests broadly span Language Modelling and learning. My current research focuses on Speech Language Modelling, from evaluation to representation and modelling. I'm also a self-proclaimed beer geek and homebrewer.


Publications

An image illustrating the SALMon benchmark.

A Suite for Acoustic Language Model Evaluation

ICASSP 2025
Gallil Maimon*, Amit Roth*, Yossi Adi

Speech language models have recently demonstrated great potential as universal speech processing systems. Such models can capture the rich acoustic information present in audio signals beyond the spoken content, such as emotion and background noise. Despite this, benchmarks that evaluate awareness of a wide range of acoustic aspects are lacking. To help bridge this gap, we introduce SALMon, a novel evaluation suite encompassing background noise, emotion, speaker identity and room impulse response. The proposed benchmarks evaluate both the consistency of the inspected element and how well it matches the spoken text. We follow a modelling-based approach, measuring whether a model gives correct samples higher scores than incorrect ones. This approach makes the benchmark fast to compute even for large models. We evaluate several speech language models on SALMon, highlighting the strengths and weaknesses of each evaluated method. Code and data are publicly available.
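
The modelling-based metric described above can be illustrated with a minimal sketch. This is not the released SALMon code: the function name `pairwise_accuracy` and the toy scores are hypothetical, and the sketch assumes each benchmark item provides one score (e.g. a log-likelihood) for a correct sample and one for an incorrect sample.

```python
from typing import Sequence, Tuple


def pairwise_accuracy(score_pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (correct, incorrect) pairs where the correct sample scores higher."""
    if not score_pairs:
        raise ValueError("need at least one pair")
    wins = sum(1 for pos, neg in score_pairs if pos > neg)
    return wins / len(score_pairs)


# Hypothetical scores a speech LM assigns to (correct, incorrect) samples.
toy_scores = [(-10.2, -11.5), (-8.0, -7.4), (-9.1, -9.9), (-12.3, -12.8)]
accuracy = pairwise_accuracy(toy_scores)
print(accuracy)  # 0.75
```

Because each item needs only two scoring passes and no generation, a metric of this shape stays cheap to compute even for large models.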

An image illustrating speaking style conversion vs. traditional VC.

Speaking Style Conversion With Discrete Self-Supervised Units

EMNLP 2023 (Findings)
Gallil Maimon, Yossi Adi

Voice Conversion (VC) is the task of making a spoken utterance by one speaker sound as if uttered by a different speaker, while keeping other aspects like content unchanged. Current VC methods focus primarily on spectral features like timbre, while ignoring the unique speaking style of people, which often impacts prosody. In this study, we introduce a method for converting not only the timbre, but also prosodic information (i.e., rhythm and pitch changes) to those of the target speaker. The proposed approach is based on a pretrained, self-supervised model for encoding speech to discrete units, which makes it simple, effective, and easy to optimise. We consider the many-to-many setting with no paired data. We introduce a suite of quantitative and qualitative evaluation metrics for this setup, and empirically demonstrate that the proposed approach is significantly superior to the evaluated baselines. Code and samples can be found on the project page.

An image illustrating the Silent Killer attack.

Silent Killer: A Stealthy, Clean-Label, Black-Box Backdoor Attack

arXiv preprint
Tzvi Lederer, Gallil Maimon, Lior Rokach

Backdoor poisoning attacks pose a well-known risk to neural networks. However, most studies have focused on lenient threat models. We introduce Silent Killer, a novel attack that operates in clean-label, black-box settings, uses a stealthy poison and trigger, and outperforms existing methods. We investigate the use of universal adversarial perturbations as triggers in clean-label attacks, following the success of such approaches under poison-label settings. We analyze the success of a naive adaptation and find that gradient alignment for crafting the poison is required to ensure high success rates. We conduct thorough experiments on MNIST, CIFAR10, and a reduced version of ImageNet, and achieve state-of-the-art results.

An image illustrating Universal Adversarial Policies.

A Universal Adversarial Policy for Text Classifiers

Neural Networks 2022
Gallil Maimon, Lior Rokach

We introduce and formally define a new adversarial setup against text classifiers named universal adversarial policies. Under this setup, one learns a single perturbation policy which, given a text and a classifier, selects the optimal perturbations (which words to replace) in order to reach an adversarial text. It is universal in the sense that one policy must generalise to many unseen texts. We introduce LUNATC, which learns such a policy with reinforcement learning and successfully generalises to unseen texts after training on as few as 500 texts.