Abstract
Date and Time: May 21, 2025, 10:00 – 11:00 (JST)
Venue: Online and Meeting Room 5 at Nihonbashi
*Open Space is available to AIP researchers only
Title: Towards Efficient Prune-Retrain Pipelines: Sparse Model Soups and Parameter-Efficient Retraining after Pruning
Speaker: Max Zimmer (Technische Universität Berlin)
Abstract: Large neural networks, such as LLMs, come with substantial computational and memory costs due to their massive parameter counts. A common strategy to mitigate these challenges is pruning—removing parameters from pretrained networks to reduce their size—which typically requires retraining to recover pruning-induced performance degradation. My recent research focuses on improving the efficiency of this process through two complementary contributions. First, in “Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging”, we show that sparse subnetworks, obtained by varying hyperparameters along prune-retrain trajectories, can be averaged into a single model without compromising sparsity, leading to improved generalization and robustness. Second, in “PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs”, we demonstrate that retraining only a tiny but highly expressive subset of parameters—often less than 1% of the entire model—can be sufficient to recover or even surpass the performance of full retraining, drastically reducing the compute and memory burden. Together, these methods contribute toward making large-scale neural networks more resource-efficient and sustainable without sacrificing accuracy.
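The two ideas above can be illustrated with a minimal sketch (not the authors' code; the array sizes, the shared-mask setup, and the 1% trainable fraction are illustrative assumptions): averaging sparse models that share a pruning mask yields a single model with the same sparsity pattern, and PERP-style retraining marks only a tiny subset of the surviving parameters as trainable.

```python
import numpy as np

rng = np.random.default_rng(0)

# A shared pruning mask over a flattened weight vector: ~80% of weights removed.
mask = rng.random(1000) > 0.8  # True = weight kept

# Several retrained candidate models along prune-retrain trajectories,
# all sharing the same sparsity mask (pruned entries stay exactly zero).
candidates = [rng.normal(size=1000) * mask for _ in range(5)]

# Sparse Model Soup: a plain average of models with identical masks
# produces one merged model whose pruned entries remain zero,
# i.e. sparsity is not compromised by averaging.
soup = np.mean(candidates, axis=0)
assert not soup[~mask].any()  # all pruned weights are still zero

# PERP-style retraining sketch: flag only a small, fixed subset of the
# surviving weights (here, a hypothetical ~1%) as trainable, freezing the rest.
trainable = np.zeros_like(mask)  # boolean flags, all frozen initially
kept_idx = np.flatnonzero(mask)
trainable[kept_idx[: max(1, len(kept_idx) // 100)]] = True
print(trainable.sum() / mask.size)  # fraction of all parameters retrained
```

Which subset is "highly expressive" is the crux of PERP (the paper's point); the index slice above is only a placeholder for that selection step.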
Bio: I am a fourth-year PhD candidate in Mathematics at Technische Universität Berlin, supervised by Prof. Dr. Sebastian Pokutta. Since Summer 2024, I have been the Research Area Lead of iol.LEARN, the Machine Learning subgroup of Sebastian Pokutta’s IOL Lab at the Zuse Institute Berlin (ZIB). My research primarily focuses on enhancing the computational and memory efficiency of large neural networks, including LLMs, through techniques such as sparsity, pruning, and quantization. Beyond improving the sustainability of deep learning, I leverage large models to tackle environmental challenges, including the development of global canopy height maps for monitoring deforestation and forest degradation. I am also interested in using deep learning for scientific discovery, particularly in solving pure mathematical problems. Since 2022, I have been a member of the Berlin Mathematical School (BMS) graduate school, part of the MATH+ Cluster of Excellence. You can find my CV and publication list on my website at maxzimmer.org.
Details
Date and Time | 2025/05/21 (Wed) 10:00 - 11:00 |
URL | https://c5dc59ed978213830355fc8978.doorkeeper.jp/events/184306 |