Deep learning models, especially those trained on large-scale datasets such as ImageNet, require extensive computational resources for training and optimization. For hardware acceleration, FPGAs offer energy-efficient and customizable platforms for deploying these models. However, preparing deep learning models for FPGA deployment with techniques such as quantization and pruning introduces additional computational overhead, because the models must be retrained or fine-tuned to recover accuracy.
My PhD research focuses on optimizing deep learning models for FPGA accelerators using fixed-point quantization, pruning, and model compression. Quantization-Aware Training (QAT), model pruning, and dynamic fixed-point representation demand intensive GPU resources because of the size of the models and datasets involved. These optimizations require extensive retraining and fine-tuning, making access to high-performance compute resources essential.
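To illustrate why these steps are GPU-intensive, the sketch below shows a minimal quantization-aware training loop with simulated fixed-point (fake-quantization) weights and simple magnitude pruning in PyTorch. This is not my actual training pipeline; the bit-width, layer sizes, sparsity level, and toy data are illustrative assumptions, and the real workloads retrain ImageNet-scale models over many epochs.

```python
import torch
import torch.nn as nn

class FakeQuant(torch.autograd.Function):
    """Simulate fixed-point quantization in the forward pass and pass
    gradients through unchanged (straight-through estimator)."""
    @staticmethod
    def forward(ctx, x, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # gradient w.r.t. x; none for num_bits

class QuantLinear(nn.Linear):
    """Linear layer whose weights are fake-quantized during training (QAT)."""
    def forward(self, x):
        w_q = FakeQuant.apply(self.weight, 8)  # 8-bit fixed point (assumed width)
        return nn.functional.linear(x, w_q, self.bias)

def magnitude_prune(model, sparsity=0.5):
    """Zero out the smallest-magnitude weights (unstructured pruning sketch)."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            threshold = module.weight.abs().flatten().quantile(sparsity)
            mask = (module.weight.abs() > threshold).float()
            module.weight.data.mul_(mask)

# Toy QAT loop with random data; real runs retrain full models for many epochs,
# which is what drives the GPU requirement.
model = nn.Sequential(QuantLinear(128, 64), nn.ReLU(), QuantLinear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()
for _ in range(3):                        # placeholder for many training epochs
    x = torch.randn(32, 128)              # stand-in for real training data
    y = torch.randint(0, 10, (32,))
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
magnitude_prune(model, sparsity=0.5)      # in practice, fine-tune again after pruning
```

Even in this toy form, every optimizer step runs a full forward and backward pass through the quantized model, and pruning is followed by further fine-tuning; at ImageNet scale these loops are what consume the bulk of the GPU time.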