Deep learning models, especially those trained on large-scale datasets such as ImageNet, require extensive computational resources for training and optimization. For hardware acceleration, FPGAs offer energy-efficient and customizable platforms for deploying these models. However, preparing deep learning models for FPGA deployment with techniques such as quantization and pruning introduces additional computational overhead, because the models must be retrained or fine-tuned to recover accuracy.
My PhD research focuses on optimizing deep learning models for FPGA accelerators using fixed-point quantization, pruning, and model compression. Quantization-Aware Training (QAT), model pruning, and dynamic fixed-point representation demand intensive GPU resources because of the size of the models and datasets involved. These optimizations require extensive retraining and fine-tuning, making access to high-performance compute resources essential.
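To illustrate why these steps are GPU-intensive, the sketch below shows a minimal quantization-aware training loop with simulated fixed-point (fake-quantization) weights and simple magnitude pruning in PyTorch. This is not my actual training pipeline; the bit-width, layer sizes, sparsity level, and toy data are illustrative assumptions, and the real workloads retrain ImageNet-scale models over many epochs.

```python
import torch
import torch.nn as nn

class FakeQuant(torch.autograd.Function):
    """Simulate fixed-point quantization in the forward pass and pass
    gradients through unchanged (straight-through estimator)."""
    @staticmethod
    def forward(ctx, x, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # gradient w.r.t. x; none for num_bits

class QuantLinear(nn.Linear):
    """Linear layer whose weights are fake-quantized during training (QAT)."""
    def forward(self, x):
        w_q = FakeQuant.apply(self.weight, 8)  # 8-bit fixed point (assumed width)
        return nn.functional.linear(x, w_q, self.bias)

def magnitude_prune(model, sparsity=0.5):
    """Zero out the smallest-magnitude weights (unstructured pruning sketch)."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            threshold = module.weight.abs().flatten().quantile(sparsity)
            mask = (module.weight.abs() > threshold).float()
            module.weight.data.mul_(mask)

# Toy QAT loop with random data; real runs retrain full models for many epochs,
# which is what drives the GPU requirement.
model = nn.Sequential(QuantLinear(128, 64), nn.ReLU(), QuantLinear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()
for _ in range(3):                        # placeholder for many training epochs
    x = torch.randn(32, 128)              # stand-in for real training data
    y = torch.randint(0, 10, (32,))
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
magnitude_prune(model, sparsity=0.5)      # in practice, fine-tune again after pruning
```

Even in this toy form, every optimizer step runs a full forward and backward pass through the quantized model, and pruning is followed by further fine-tuning; at ImageNet scale these loops are what consume the bulk of the GPU time.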