This project aims to develop a machine-learning model that estimates acoustic room impulse responses (RIRs) from recorded speech signals. An RIR captures all linear time-invariant characteristics of an acoustic environment, including its reverberation and reflection patterns. By estimating RIRs, our system can recreate the acoustic “fingerprint” of an environment, enabling more immersive experiences in remote communication and in virtual and augmented reality (VR/AR). For example, in a telepresence application, accurately reproducing a room's acoustics can enhance the sense of presence and natural interaction; in VR/AR, realistic acoustics likewise contribute to an immersive, lifelike experience. However, estimating an RIR directly from audio is challenging because of the complex interaction of sound with the surfaces and objects in a room. To address this, we propose a neural network architecture capable of extracting and modeling these subtle acoustic features from speech recordings. Access to high-performance computing resources will allow us to train and evaluate our models on larger datasets, refining our approach to achieve high accuracy and generalizability.
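To make the underlying model concrete: under the linear time-invariant assumption, a reverberant recording is the dry source signal convolved with the room's RIR, and an estimator must invert this mapping from the recording alone. The minimal Python sketch below illustrates that forward model with a synthetic, purely illustrative RIR; the sample rate, decay constants, and signal shapes are assumptions for demonstration, not our actual data or method.

    # Forward model assumed by RIR estimation (illustrative sketch only):
    # a reverberant recording y is the dry source x convolved with the RIR h.
    import numpy as np
    from scipy.signal import fftconvolve

    fs = 16_000                      # assumed sample rate (Hz)
    rng = np.random.default_rng(0)

    # Stand-in for 1 s of dry speech: noise shaped by a slow envelope.
    t = np.arange(fs) / fs
    x = rng.standard_normal(fs) * np.sin(2 * np.pi * 3 * t) ** 2

    # Toy RIR: a direct-path impulse plus an exponentially decaying
    # diffuse tail (~0.3 s), mimicking reverberation and reflections.
    n = np.arange(int(0.3 * fs))
    h = rng.standard_normal(n.size) * np.exp(-n / (0.05 * fs))
    h[0] = 1.0                       # direct-path impulse

    # Reverberant observation under the LTI assumption: y = x * h.
    y = fftconvolve(x, h, mode="full")

    # An RIR estimator is trained to recover h (or perceptually relevant
    # features of it) given only a recording like y.
    print(x.shape, h.shape, y.shape)

Recovering h from y alone is ill-posed because the dry source x is also unknown, which is one reason a learned model that exploits the statistical structure of speech is well suited to the task.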