
Lund University Publications

LUND UNIVERSITY LIBRARIES

Fine-grained urban land use simulation : integrating spatial dynamic modeling with a pre-trained vision-language model

Cai, Zipan; Karvonen, Andrew (LU); Cong, Cong and Huang, Weiming (2026) In Computers, Environment and Urban Systems 126.
Abstract
Accurate prediction of urban land use changes at fine spatial scales is essential for developing healthy and sustainable cities, yet traditional simulation models struggle to capture local dynamics due to limited availability of fine-grained data and insufficient complexity in modeling urban systems. To address these limitations, we propose a novel approach that leverages advances in pre-trained vision-language foundation models combined with spatial dynamic modeling to forecast detailed urban land use patterns. Specifically, we collected a spatially dense collection of street view images (SVIs) throughout Shenzhen, China, and applied UrbanCLIP, a specialized vision-language prompting framework, to perform zero-shot inference of urban land use directly from images without labeled datasets or model retraining. The resulting fine-grained classifications delineate eight distinct urban land use types, producing a detailed urban functional map. These high-resolution patterns were then integrated into a spatial dynamic model enhanced by polynomial regression to simulate urban evolution toward 2035. This approach effectively captures neighborhood influences, socioeconomic drivers, and urban planning policies. Our simulation provides actionable insights for sustainable development in Shenzhen by identifying areas for balanced growth, targeted infrastructure investments, and ecological preservation. Compared to conventional methods, our methodology significantly improves predictive accuracy and spatial granularity. By incorporating foundation models, our approach addresses traditional data constraints, offering scalable and robust tools for informed urban governance and decision-making.
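The zero-shot classification step the abstract describes can be sketched as follows. A CLIP-style vision-language model embeds an image and a set of text prompts (one per land use class) into a shared space, and the class whose prompt is most similar to the image wins. This is a minimal NumPy sketch of that mechanism only: the random vectors below stand in for the UrbanCLIP image and text encoder outputs, and the class names and prompt wording are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Eight land use classes, per the abstract; the specific names here are
# illustrative assumptions, not taken from the paper.
LAND_USE_TYPES = [
    "residential", "commercial", "industrial", "institutional",
    "transportation", "green space", "water", "vacant land",
]

def zero_shot_classify(image_emb: np.ndarray, prompt_embs: np.ndarray) -> int:
    """Return the index of the class prompt most similar to the image.

    Both inputs are L2-normalized so the dot product equals cosine
    similarity, mirroring how CLIP-style models score image-text pairs.
    """
    image_emb = image_emb / np.linalg.norm(image_emb)
    prompt_embs = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    sims = prompt_embs @ image_emb  # one cosine similarity per class prompt
    return int(np.argmax(sims))

# Toy demonstration: random vectors play the role of encoder outputs, and
# the "image" embedding is placed near the "commercial" prompt embedding.
rng = np.random.default_rng(0)
prompts = rng.normal(size=(len(LAND_USE_TYPES), 64))
image = prompts[1] + 0.05 * rng.normal(size=64)
print(LAND_USE_TYPES[zero_shot_classify(image, prompts)])  # prints "commercial"
```

In practice the embeddings would come from the pre-trained encoders, so no labeled training set or fine-tuning is needed — which is the "zero-shot, no retraining" property the abstract emphasizes.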
author
Cai, Zipan; Karvonen, Andrew (LU); Cong, Cong and Huang, Weiming
organization
publishing date
2026-02
type
Contribution to journal
publication status
published
subject
keywords
Land use change, Vision-language models, Foundation models, Spatial dynamic modeling, Street view images
in
Computers, Environment and Urban Systems
volume
126
article number
102416
pages
16 pages
publisher
Elsevier
ISSN
0198-9715
DOI
10.1016/j.compenvurbsys.2026.102416
project
Urban Arena
language
English
LU publication?
yes
id
0009a7c5-3dc1-462d-b55c-321d424a0466
date added to LUP
2026-02-26 18:24:52
date last changed
2026-03-24 09:51:04
@article{0009a7c5-3dc1-462d-b55c-321d424a0466,
  abstract     = {{Accurate prediction of urban land use changes at fine spatial scales is essential for developing healthy and sustainable cities, yet traditional simulation models struggle to capture local dynamics due to limited availability of fine-grained data and insufficient complexity in modeling urban systems. To address these limitations, we propose a novel approach that leverages advances in pre-trained vision-language foundation models combined with spatial dynamic modeling to forecast detailed urban land use patterns. Specifically, we collected a spatially dense collection of street view images (SVIs) throughout Shenzhen, China, and applied UrbanCLIP, a specialized vision-language prompting framework, to perform zero-shot inference of urban land use directly from images without labeled datasets and model retraining. The resulting fine-grained classifications delineate eight distinct urban land use types, producing a detailed urban functional map. These high-resolution patterns were then integrated into a spatial dynamic model enhanced by polynomial regression to simulate urban evolution toward 2035. This approach effectively captures neighborhood influences, socioeconomic drivers, and urban planning policies. Our simulation provides actionable insights for sustainable development in Shenzhen by identifying areas for balanced growth, targeted infrastructure investments, and ecological preservation. Compared to conventional methods, our methodology significantly improves predictive accuracy and spatial granularity. By incorporating foundation models, our approach addresses traditional data constraints, offering scalable and robust tools for informed urban governance and decision-making.}},
  author       = {{Cai, Zipan and Karvonen, Andrew and Cong, Cong and Huang, Weiming}},
  issn         = {{0198-9715}},
  keywords     = {{Land use change; Vision-language models; Foundation models; Spatial dynamic modeling; Street view images}},
  language     = {{eng}},
  month        = {{02}},
  publisher    = {{Elsevier}},
  series       = {{Computers, Environment and Urban Systems}},
  title        = {{Fine-grained urban land use simulation : integrating spatial dynamic modeling with a pre-trained vision-language model}},
  url          = {{http://dx.doi.org/10.1016/j.compenvurbsys.2026.102416}},
  doi          = {{10.1016/j.compenvurbsys.2026.102416}},
  volume       = {{126}},
  year         = {{2026}},
}