Robust prediction of citywide traffic flows across different time periods plays a crucial role in intelligent transportation systems. Although previous work has made considerable progress in modeling spatio-temporal correlations, existing methods still suffer from two key limitations: i) Most models predict the flows of all regions collectively without accounting for spatial heterogeneity, i.e., different regions may have skewed traffic flow distributions. ii) These models fail to capture the temporal heterogeneity induced by time-varying traffic patterns, as they typically model temporal correlations with a single parameter space shared across all time periods. To tackle these challenges, we propose a novel spatio-temporal self-supervised learning traffic prediction framework that uses auxiliary self-supervised learning paradigms to make the learned traffic pattern representations reflect both spatial and temporal heterogeneity. The framework combines Graph Convolutional Networks (GCNs) with a Transformer to capture traffic patterns and to support traffic-level augmentation, which requires GPU resources.