Road traffic management systems are an essential component of modern Intelligent Transport Systems (ITS). The arterial roads near large Swedish cities, especially the motorways near Stockholm, are conventionally equipped with several types of sensors for traffic monitoring and advanced traffic management. Compared with urban road networks, which are densely connected, a highway has a serial topology that limits prediction based on spatial interaction. This motivates a shift from graph-based to image-based approaches. In this study, we treat motorway traffic patterns as images whose axes are location and time (so-called ST-images), and we thereby convert the traffic forecasting task into a conditional image generation task. We identify the inherent properties of ST-images from the perspective of physical meaning and traffic dynamics, and we propose a novel architecture to process them. Furthermore, we present generative deep learning models, including a diffusion model named Difforecast, that obtain traffic forecasts by generating the future ST-image conditioned on the historical ST-image. In addition, we use representation learning to enhance the deep learning models, which may also provide insight into the patterns of motorway traffic flows. The project is funded by the Swedish Transport Administration (STA) and works with large volumes of traffic data owned by STA. The algorithms developed may contribute directly to next-generation traffic forecasting for Swedish motorways and major arterials in large cities.
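To make the ST-image idea concrete, the following is a minimal sketch of how sensor readings could be arranged into a location-by-time grid and split into a historical part (the condition) and a future part (the generation target). It is an illustration only, not the project's implementation: the record format `(location_index, time_index, value)`, the function names `build_st_image` and `split_history_future`, and the example speeds are all assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

# Hypothetical record format (an assumption for this sketch):
# (location index along the motorway, time step index, measured value)
Reading = Tuple[int, int, float]
STImage = List[List[float]]


def build_st_image(readings: Sequence[Reading], n_locations: int,
                   n_steps: int, fill: float = float("nan")) -> STImage:
    """Arrange readings into a grid: rows are locations, columns are time steps."""
    image = [[fill] * n_steps for _ in range(n_locations)]
    for loc, step, value in readings:
        image[loc][step] = value
    return image


def split_history_future(image: STImage,
                         n_history: int) -> Tuple[STImage, STImage]:
    """Split along the time axis into the conditioning part and the target part."""
    history = [row[:n_history] for row in image]
    future = [row[n_history:] for row in image]
    return history, future


# Toy example with 2 sensor locations and 4 time steps (made-up values).
readings = [(loc, t, 100.0 - 10.0 * t - 5.0 * loc)
            for loc in range(2) for t in range(4)]
image = build_st_image(readings, n_locations=2, n_steps=4)
history, future = split_history_future(image, n_history=3)
```

Under this framing, a generative model would receive `history` as its condition and be trained to produce `future`; cells with no reading keep the `fill` value so that missing sensor data remains visible rather than being silently replaced.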