SUPR
Efficient storage and stay detection of big human mobility data for urban analytics (ESHUDA)
Dnr:

NAISS 2023/23-267

Type:

NAISS Small Storage

Principal Investigator:

Yuan Liao

Affiliation:

Chalmers tekniska högskola

Start Date:

2023-05-10

End Date:

2024-06-01

Primary Classification:

50701: Human Geography

Webpage:


Abstract

Increasingly available big geolocation data with extensive population coverage offer new opportunities in urban analytics. For example, such data can reveal the fundamental mechanisms of human mobility, the dynamics of social phenomena (such as socio-spatial segregation that accounts for the daily activities of large populations), and fine-grained measures of urban spaces (such as land-use patterns). Big geolocation data on human mobility come from various sources, including call detail records (CDR), smartphone tracking apps, GPS-enabled devices, and geotagged social media. They can cover the geolocations of millions of individuals over months and years at a resolution of meters and seconds. For instance, mobile application data are a source of anonymized population mobility data, consisting of GPS records collected through location-enabled applications installed on people's smartphones.

These novel datasets bring new opportunities but also pose challenges due to their large volume, varying formats, and sparsity, which call for efficient storage and robust processing tools. One essential processing step for human mobility data is detecting stays in individual trajectories: a stay is a geolocation where an individual spends some time. Detected stays are used in further analyses, e.g., to characterize mobility and to infer land-use patterns. Several tools and algorithms exist for detecting stays, e.g., Infostop, DBSCAN, and scikit-mobility, but scaling them up to process large datasets, e.g., billions of trajectories, requires careful design.

This project aims to develop a transferable and sustainable data storage and preprocessing pipeline for big geolocation data on human mobility, such as mobile application data, for urban analytics. It will also test how the parameters of different stay detection approaches affect the identified stays.
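To make "detecting stays" and "the impact of parameters" concrete, the sketch below implements a generic distance-and-duration threshold rule in the spirit of classic stay-point detection. A stay is reported when consecutive GPS records remain within a radius of an anchor record for at least a minimum duration. This is an illustrative assumption only. It is not the algorithm used by Infostop, DBSCAN, or scikit-mobility, and the names `detect_stays`, `max_dist_m`, and `min_duration_s` and their defaults are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record layout: (unix_time_seconds, latitude_deg, longitude_deg)
Record = Tuple[float, float, float]

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Stay:
    lat: float    # mean latitude of the records in the stay
    lon: float    # mean longitude of the records in the stay
    start: float  # time of the first record in the stay
    end: float    # time of the last record in the stay


def detect_stays(traj: List[Record], max_dist_m: float = 200.0,
                 min_duration_s: float = 600.0) -> List[Stay]:
    """Hypothetical threshold-based stay detection (not any library's API)."""
    traj = sorted(traj)  # tuples sort by timestamp first
    stays: List[Stay] = []
    i, n = 0, len(traj)
    while i < n:
        t_i, lat_i, lon_i = traj[i]
        # Grow the window while records stay within max_dist_m of anchor i.
        j = i + 1
        while j < n and haversine_m(lat_i, lon_i, traj[j][1], traj[j][2]) <= max_dist_m:
            j += 1
        # Records i..j-1 lie within the radius; check how long they span.
        if traj[j - 1][0] - t_i >= min_duration_s:
            window = traj[i:j]
            lat = sum(p[1] for p in window) / len(window)
            lon = sum(p[2] for p in window) / len(window)
            stays.append(Stay(lat, lon, t_i, traj[j - 1][0]))
            i = j  # continue after the detected stay
        else:
            i += 1  # anchor i does not start a stay; try the next record
    return stays
```

For a synthetic trajectory that pauses for 15 minutes at one location, travels about 2 km, and then pauses for 5 minutes at another location, `min_duration_s=600` yields one stay while `min_duration_s=300` yields two. This is the kind of parameter sensitivity the project proposes to test. The quadratic worst case of this naive loop also shows why scaling such methods to billions of records requires careful design.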