We want to participate in SemEval-2025 Task 1, a shared task investigating literal and metaphorical uses of words in a multimodal context (https://semeval2025-task1.github.io). We wish to investigate how well large multimodal language models perform on this task; in particular, we intend to use the Llama model. The model will be prompted to perform the desired task, which involves ranking pictures according to how well they match the literal or metaphorical use of a phrase such as "bad apple". Prompting a multimodal LLM in this way is a reasonable methodology for probing to what degree such models innately understand metaphor. We wish to use the resources of NAISS (applied for via SUPR), as we do not possess servers with enough GPUs to run the model with its full parameter set.
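
As a rough illustration of the intended setup, the sketch below scores each candidate image against the target phrase by prompting a multimodal Llama model served through the Hugging Face transformers library, and then ranks the images by the returned scores. The model checkpoint, the prompt wording, the score parsing, and the file paths are our assumptions for illustration only; the actual prompts and model variant will be finalised during the project.

# Minimal sketch, not the final experimental code: rank candidate images for a
# phrase such as "bad apple" by prompting a multimodal Llama model.
# The checkpoint name, prompt wording, and score parsing are illustrative assumptions.
import re
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed checkpoint

model = MllamaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

def score_image(image_path: str, phrase: str, sense: str) -> float:
    """Ask the model how well one image depicts the given sense of the phrase."""
    image = Image.open(image_path).convert("RGB")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text",
             "text": f"On a scale from 0 to 10, how well does this image depict the "
                     f"{sense} meaning of the phrase '{phrase}'? Answer with a single number."},
        ],
    }]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=10)
    # Decode only the newly generated tokens and extract the first number as the score.
    answer = processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    match = re.search(r"\d+(\.\d+)?", answer)
    return float(match.group()) if match else 0.0

# Example: rank five candidate images (hypothetical paths) for the metaphorical sense.
candidates = [f"images/bad_apple_{i}.png" for i in range(1, 6)]
ranking = sorted(candidates, key=lambda p: score_image(p, "bad apple", "metaphorical"), reverse=True)
print(ranking)

Scoring one image per prompt keeps the context small; an alternative design would present all candidate images in a single prompt and ask for an ordering directly, which we may also explore.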