Hi, I'm Jonathan Kahana

I am a Computer Science PhD student at the Hebrew University of Jerusalem, under the supervision of Prof. Yedid Hoshen. My research interests lie at the intersection of machine learning and computer vision. Currently, I am working on weight space learning, aiming to understand what information can be extracted from pre-trained neural networks. Previously, I focused on representation learning, specifically learning disentangled representations.


Publications

An image showing an overview of ProbeGen

Deep Linear Probe Generators for Weight Space Learning

arXiv Preprint
Jonathan Kahana, Eliahu Horwitz, Imri Shuval, Yedid Hoshen

We conduct a study of weight space analysis methods and observe that probing is a promising approach for such tasks. However, we find that a vanilla probing approach performs no better than probing a neural network with random data. To address this, we propose "Deep Linear Probe Generators" (ProbeGen), a simple and effective modification to probing-based methods of weight space analysis. ProbeGen introduces a shared generator module with a deep linear architecture, providing an inductive bias toward structured probes. ProbeGen significantly outperforms the state-of-the-art and is highly efficient, requiring 30 to 1,000 times fewer FLOPs than other leading approaches.
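
To make the probing idea concrete, here is a minimal sketch of a deep linear probe generator feeding a queried network; the dimensions, class names, and feature head are illustrative assumptions, not the exact ProbeGen implementation:

```python
# Illustrative sketch of probing with a deep linear probe generator.
# Dimensions and names are assumptions for the example, not ProbeGen's exact code.
import torch
import torch.nn as nn


class DeepLinearProbeGenerator(nn.Module):
    """Maps learned latent codes to probe inputs through stacked linear layers
    (no nonlinearities), giving an inductive bias toward structured probes."""

    def __init__(self, n_probes=16, latent_dim=64, probe_dim=3 * 32 * 32, depth=3):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_probes, latent_dim))
        dims = [latent_dim] + [256] * (depth - 1) + [probe_dim]
        self.generator = nn.Sequential(
            *[nn.Linear(dims[i], dims[i + 1]) for i in range(depth)]
        )

    def forward(self):
        return self.generator(self.latents)  # (n_probes, probe_dim)


def probe_features(queried_model, probe_gen):
    """Feed generated probes through the analyzed network and flatten its responses."""
    probes = probe_gen().view(-1, 3, 32, 32)   # assumes the analyzed model takes 32x32 RGB inputs
    responses = queried_model(probes)          # (n_probes, n_outputs)
    return responses.flatten()                 # one feature vector describing the model
```

In this sketch, the generator and a small head on top of the probe responses would be trained jointly on a set of models whose labels (e.g., which dataset they were trained on) are known.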

An image showing an overview of ProbeX

Representing Model Weights with Language using Tree Experts

arXiv Preprint
Eliahu Horwitz*, Bar Cavia*, Jonathan Kahana*, Yedid Hoshen

We identify a key property of real-world models: most public models belong to a small set of Model Trees, where all models within a tree are fine-tuned from a common ancestor (e.g., a foundation model). Importantly, we find that there is less nuisance variation between models within the same tree than between models from different trees. We introduce Probing Experts (ProbeX), a theoretically motivated, lightweight probing method. Notably, ProbeX is the first probing method designed to learn from the weights of just a single model layer. Our results show that ProbeX can effectively map the weights of large models into a shared weight-language embedding space. Furthermore, we demonstrate the impressive generalization of our method, achieving zero-shot model classification and retrieval.
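
As a rough illustration of single-layer probing, the sketch below applies learned probe vectors to one weight matrix and projects the responses into a shared embedding space that can be matched against text embeddings; the shapes and cosine-similarity matching are assumptions for the example, not the exact ProbeX architecture:

```python
# Illustrative sketch of probing a single weight matrix and mapping it to a
# shared weight-language embedding space. Shapes and the similarity-based
# matching are assumptions, not the exact ProbeX design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SingleLayerWeightProbe(nn.Module):
    def __init__(self, in_dim, out_dim, n_probes=32, embed_dim=512):
        super().__init__()
        self.probes = nn.Parameter(torch.randn(n_probes, in_dim))   # learned probe vectors
        self.project = nn.Linear(n_probes * out_dim, embed_dim)     # maps responses to the shared space

    def forward(self, layer_weight):              # layer_weight: (out_dim, in_dim)
        responses = self.probes @ layer_weight.T  # (n_probes, out_dim)
        return F.normalize(self.project(responses.flatten()), dim=-1)


def zero_shot_classify(weight_embedding, text_embeddings):
    """Pick the class whose (normalized) text embedding is closest to the weight embedding."""
    return (text_embeddings @ weight_embedding).argmax().item()
```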

Dataset Size Recovery from LoRA Weights

arXiv Preprint
Mohammad Salama, Jonathan Kahana, Eliahu Horwitz, Yedid Hoshen

We introduce the task of dataset size recovery, which aims to determine the number of samples used to train a model based on its weights. We then propose DSiRe, a method for recovering the number of images used to fine-tune a model, in the common case where fine-tuning uses LoRA. We discover that both the norm and the spectrum of the LoRA matrices are closely linked to the fine-tuning dataset size. To evaluate dataset size recovery from LoRA weights, we develop and release a new benchmark, LoRA-WiSE, consisting of over 25,000 weight snapshots.
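
A minimal sketch of the signal DSiRe builds on, assuming a simple nearest-neighbor predictor over spectral features of the LoRA update (the feature layout and predictor are illustrative, not the paper's exact pipeline):

```python
# Illustrative sketch: the norm and spectrum of a LoRA update as features
# for dataset size recovery. The 1-NN predictor is an assumption for the example.
import numpy as np


def lora_spectral_features(lora_A, lora_B):
    """Per-layer features from a LoRA update delta_W = B @ A."""
    delta_w = lora_B @ lora_A
    singular_values = np.linalg.svd(delta_w, compute_uv=False)
    return np.concatenate([[np.linalg.norm(delta_w)], singular_values])


def predict_dataset_size(query_features, train_features, train_sizes):
    """1-nearest-neighbor regression over feature vectors of models with known sizes."""
    dists = np.linalg.norm(train_features - query_features, axis=1)
    return train_sizes[np.argmin(dists)]
```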

Recovering the Pre-Fine-Tuning Weights of Generative Models

ICML 2024
Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen

The dominant paradigm in generative modeling consists of two steps: i) pre-training on a large-scale but unsafe dataset, and ii) aligning the pre-trained model with human values via fine-tuning. This practice is considered safe, as no current method can recover the unsafe, pre-fine-tuning model weights. In this paper, we demonstrate that this assumption is often false. Concretely, we present Spectral DeTuning, a method that can recover the weights of the pre-fine-tuning model using a few low-rank (LoRA) fine-tuned models. In contrast to previous attacks that attempt to recover pre-fine-tuning capabilities, our method aims to recover the exact pre-fine-tuning weights. We demonstrate this new vulnerability on large-scale models such as a personalized Stable Diffusion and an aligned Mistral.
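
The sketch below illustrates the underlying idea with a simplified alternating low-rank minimization over a single weight matrix; it is an assumption-laden toy reconstruction, not the exact Spectral DeTuning algorithm:

```python
# Illustrative sketch: recover a shared pre-fine-tuning matrix from several
# LoRA fine-tuned versions of it. This is a simplified reconstruction of the
# idea, not the published Spectral DeTuning algorithm.
import numpy as np


def low_rank(m, rank):
    """Best rank-`rank` approximation of a matrix via SVD."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def recover_pretrained(finetuned_weights, rank, n_iters=100):
    """finetuned_weights: list of matrices W_i = W_pre + B_i A_i with rank-`rank` updates."""
    w_est = np.mean(finetuned_weights, axis=0)          # initial guess
    for _ in range(n_iters):
        # Given the current estimate, fit each model's low-rank residual ...
        residuals = [low_rank(w_i - w_est, rank) for w_i in finetuned_weights]
        # ... then re-estimate the shared pre-fine-tuning weights.
        w_est = np.mean([w_i - r_i for w_i, r_i in zip(finetuned_weights, residuals)], axis=0)
    return w_est
```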

Improving Zero-Shot Models with Label Distribution Priors

arXiv Preprint
Jonathan Kahana, Niv Cohen, Yedid Hoshen

We propose a new approach for zero-shot labeling of large image datasets, CLIPPR (CLIP with Priors), which adapts zero-shot models for regression and classification on unlabelled datasets. Our method does not use any annotated images. Instead, we assume a prior over the label distribution in the dataset. We then train an adapter network on top of CLIP under two competing objectives: i) minimal change of predictions from the original CLIP model, and ii) minimal distance between the predicted and prior label distributions. Our method is effective and presents a significant improvement over the original zero-shot model.
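
A minimal sketch of the two competing objectives for the classification case, assuming KL-divergence losses and a simple weighting (the adapter architecture and the exact loss forms in the paper may differ):

```python
# Illustrative sketch of the two CLIPPR objectives; the KL forms and
# loss weighting are assumptions for the example.
import torch
import torch.nn.functional as F


def clippr_loss(adapter_logits, clip_logits, prior, weight=1.0):
    """adapter_logits / clip_logits: (batch, n_classes); prior: (n_classes,) label distribution."""
    # i) stay close to the original CLIP zero-shot predictions
    consistency = F.kl_div(
        F.log_softmax(adapter_logits, dim=-1),
        F.softmax(clip_logits, dim=-1),
        reduction="batchmean",
    )
    # ii) match the batch-averaged predicted label distribution to the assumed prior
    predicted_marginal = F.softmax(adapter_logits, dim=-1).mean(dim=0)
    prior_match = F.kl_div(predicted_marginal.log(), prior, reduction="sum")
    return consistency + weight * prior_match
```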

An image showing anomalies and pseudo-anomalies from Red PANDA

Red PANDA: Disambiguating Anomaly Detection by Removing Nuisance Factors

ICLR 2023
Niv Cohen, Jonathan Kahana, Yedid Hoshen

We present a new anomaly detection method that allows operators to exclude an attribute from being considered relevant for anomaly detection. Our approach then learns representations that do not contain information about the nuisance attributes. Anomaly scoring is performed using a density-based approach. Importantly, our approach does not require specifying the attributes that are relevant for detecting anomalies, which is typically impossible in anomaly detection, only the attributes to ignore. We present an empirical investigation verifying the effectiveness of our approach.
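
For the scoring step, a standard density-based choice is k-nearest-neighbor distance in representation space; the sketch below assumes this choice for illustration and is not necessarily the exact scoring used in the paper:

```python
# Illustrative density-based anomaly scoring: samples whose (nuisance-free)
# representations are far from the normal training data get high scores.
# The kNN scoring is a standard choice assumed for the example.
import torch


def knn_anomaly_scores(train_feats, test_feats, k=2):
    """Mean distance to the k nearest normal training representations."""
    dists = torch.cdist(test_feats, train_feats)        # (n_test, n_train)
    knn_dists, _ = dists.topk(k, dim=1, largest=False)
    return knn_dists.mean(dim=1)                        # higher = more anomalous
```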

An image illustrating DCoDR representations.

A Contrastive Objective for Learning Disentangled Representations

ECCV 2022
Jonathan Kahana, Yedid Hoshen

We present a new approach for domain disentanglement, proposing a domain-wise contrastive objective for ensuring invariant representations. In an extensive evaluation, our method convincingly outperforms the state-of-the-art in terms of representation invariance, representation informativeness, and training speed. Furthermore, we find that in some cases our method achieves excellent results even without the reconstruction constraint, leading to much faster and more resource-efficient training.
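
As a rough illustration, the sketch below computes an InfoNCE-style loss separately within each domain, so that discriminating between samples cannot rely on domain identity; the positive-pair definition and temperature are assumptions, not the exact DCoDR objective:

```python
# Illustrative domain-wise contrastive loss: negatives are drawn only from the
# same domain as the anchor. Pairing and temperature are assumptions for the example.
import torch
import torch.nn.functional as F


def domain_wise_info_nce(anchors, positives, domains, temperature=0.1):
    """anchors/positives: (batch, dim) paired representations; domains: (batch,) domain ids."""
    loss, n_domains = 0.0, 0
    for d in domains.unique():
        idx = (domains == d).nonzero(as_tuple=True)[0]
        if idx.numel() < 2:
            continue                                     # need at least one negative
        a = F.normalize(anchors[idx], dim=-1)
        p = F.normalize(positives[idx], dim=-1)
        logits = a @ p.T / temperature                   # negatives come from the same domain only
        targets = torch.arange(idx.numel(), device=a.device)
        loss = loss + F.cross_entropy(logits, targets)
        n_domains += 1
    return loss / max(n_domains, 1)
```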