The interaction between whole-slide images (WSI) and molecular data derived from multi-omics profiling remains unclear. Here we present a multi-modal co-attention transformer framework that integrates omics data with WSIs from longitudinal biopsies. The metrics generated by the multi-modal model would then be used to predict response to treatment and survival.
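To make the co-attention idea concrete, the following is a minimal sketch in which each omics token attends over a set of WSI patch embeddings using scaled dot-product attention. It is an illustration under stated assumptions, not the framework's actual implementation: the function name `co_attention`, the toy two-dimensional embeddings, and the omission of learned query/key/value projections and multiple heads are all hypothetical simplifications.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def co_attention(omics_tokens, wsi_patches):
    """Hypothetical sketch: each omics token (query) attends over all
    WSI patch embeddings (keys and values) with scaled dot-product attention.
    Learned projections are omitted for brevity (an assumption)."""
    d = len(wsi_patches[0])
    attended, weights = [], []
    for q in omics_tokens:
        # Similarity of this omics token to every patch, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in wsi_patches]
        w = softmax(scores)
        # Attention-weighted average of the patch embeddings.
        ctx = [sum(w[j] * wsi_patches[j][i] for j in range(len(wsi_patches)))
               for i in range(d)]
        attended.append(ctx)
        weights.append(w)
    return attended, weights

# Toy inputs (made up for illustration): 2 omics tokens, 3 WSI patches.
omics_tokens = [[1.0, 0.0], [0.0, 1.0]]
wsi_patches = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
attended, weights = co_attention(omics_tokens, wsi_patches)
```

In this sketch, `weights` indicates which patches each omics token focuses on, and `attended` holds the omics-guided slide representations that a downstream head could use for response or survival prediction.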