The Swemper project aims to develop and deploy advanced AI models for efficient, accurate metadata extraction and comprehensive image description generation. These models will significantly enhance our ability to process large visual datasets of historical medical print, improving search functionality and automated content organization.
Our core objectives are:
- Object Detection for Metadata Extraction: To train robust object detection models that identify and categorize key image elements, thereby automating metadata extraction. This involves fine-tuning pre-trained object detection models, such as those available through the Detectron2 framework and the Co-DETR architecture.
- Image Description Generation using Vision-Language Models (VLMs): To fine-tune VLMs and run inference with them to generate human-like image descriptions. Access to the Alvis cluster will allow us to explore various open-source state-of-the-art (SOTA) VLMs.
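As a rough illustration of how detector output could become searchable metadata, the sketch below assumes that a detector (for example, one fine-tuned with Detectron2) returns a label, a confidence score, and a bounding box for each element on a page. The `Detection` class, the `detections_to_metadata` helper, the 0.5 score threshold, and the category names `illustration` and `caption` are all hypothetical. They are not part of Detectron2, Co-DETR, or the project's actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Detection:
    """One (hypothetical) detector output: class label, confidence, box (x0, y0, x1, y1) in pixels."""
    label: str
    score: float
    box: tuple


def detections_to_metadata(image_id: str, detections: list, min_score: float = 0.5) -> dict:
    """Turn raw detections into a per-image metadata record (illustrative only).

    Detections scoring below `min_score` are discarded. The rest are grouped
    by label, counted, and their boxes kept so elements can later be cropped
    or indexed for search.
    """
    grouped = defaultdict(list)
    for det in detections:
        if det.score >= min_score:
            grouped[det.label].append(det.box)
    # Sort by label so records are deterministic across runs.
    ordered = dict(sorted(grouped.items()))
    return {
        "image_id": image_id,
        "element_counts": {label: len(boxes) for label, boxes in ordered.items()},
        "element_boxes": ordered,
    }


# Usage with made-up detections for a single page scan.
dets = [
    Detection("illustration", 0.92, (10, 20, 300, 400)),
    Detection("caption", 0.81, (10, 410, 300, 450)),
    Detection("caption", 0.30, (10, 460, 300, 480)),  # below threshold, dropped
]
record = detections_to_metadata("page_0001", dets)
print(record["element_counts"])  # {'caption': 1, 'illustration': 1}
```

Keeping the thresholding and grouping outside the detector means the same post-processing could be applied unchanged to outputs from different detection models.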