Feature Mining of Shanghai Metro Commuters Based On K-Prototypes Clustering Method

Authors

  • Xin Yang, School of Economics and Management, NUST, Nanjing, China
  • Jinbing Ha, School of Economics and Management, NUST, Nanjing, China
  • Yuting Tian, School of Economics and Management, NUST, Nanjing, China

DOI:

https://doi.org/10.54097/7haayg94

Keywords:

Travelling features; K-prototypes algorithm; Passenger feature mining; OD chains.

Abstract

Passenger travel behavior in urban rail transit is strongly heterogeneous in space and time, which makes it difficult to capture passenger flow dynamics accurately and to forecast short-term travel precisely. Whereas most existing studies focus on station-level prediction, this study examines the spatio-temporal characteristics of travel behavior at the level of the individual passenger. Using one-card (smart card) transaction data from Shanghai's rail transit system for April 1 to 30, 2015, we extract each passenger's travel trajectory as an OD (origin-destination) chain with an OD relationship matching algorithm. We then construct a 13-dimensional clustering feature vector organized along three dimensions: temporal attributes, spatial attributes, and travel intensity. In particular, we introduce the share of trips made on frequently traveled OD pairs as a key metric of how regular a passenger's travel pattern is. Because the features mix categorical and numerical variables, we apply the K-Prototypes clustering algorithm, which the study finds handles such heterogeneous data better than traditional methods. The clustering divides passengers into four distinct groups; the commuter group, which accounts for 40% of all passengers, shows the strongest spatial regularity. Further analysis of travel patterns across the groups provides empirical evidence on the primary sources of metro passenger flow and offers actionable insights for urban transit planning and demand management.
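
The abstract does not give the details of the OD matching step, the frequent OD share metric, or the clustering distance, so the following is only a minimal sketch under stated assumptions rather than the authors' implementation. It assumes each card record carries a card ID, a sortable timestamp, a station, and an in/out flag (the `Tap` fields are hypothetical), pairs each tap-in with the next tap-out of the same card, and defines "frequent" as the `top_n` most common OD pairs (an illustrative choice). The `mixed_dissimilarity` function shows the standard K-Prototypes distance: squared Euclidean distance on numerical features plus a weight `gamma` times the number of mismatched categorical features.

```python
from collections import Counter
from typing import NamedTuple


class Tap(NamedTuple):
    """One card transaction; all field names are hypothetical."""
    card_id: str
    time: int      # any sortable timestamp, e.g. minutes since start of month
    station: str
    kind: str      # "in" for entry, "out" for exit


def build_od_chains(taps):
    """Pair each tap-in with the next tap-out of the same card, in time order.

    Returns {card_id: [(origin, destination), ...]}. Unmatched records are
    dropped, which is an assumption and not a rule stated in the abstract.
    """
    chains = {}
    pending = {}  # card_id -> most recent unmatched tap-in
    for tap in sorted(taps, key=lambda t: (t.card_id, t.time)):
        if tap.kind == "in":
            pending[tap.card_id] = tap  # a newer tap-in replaces an unmatched one
        elif tap.card_id in pending:
            origin = pending.pop(tap.card_id)
            chains.setdefault(tap.card_id, []).append((origin.station, tap.station))
    return chains


def frequent_od_share(trips, top_n=2):
    """Fraction of a passenger's trips that fall on their top_n most common OD pairs."""
    if not trips:
        return 0.0
    counts = Counter(trips)
    return sum(count for _, count in counts.most_common(top_n)) / len(trips)


def mixed_dissimilarity(a_num, a_cat, b_num, b_cat, gamma=1.0):
    """K-Prototypes distance between two records with mixed-type features."""
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    categorical = sum(x != y for x, y in zip(a_cat, b_cat))
    return numeric + gamma * categorical


# Toy data: one card making four trips.
taps = [
    Tap("c1", 1, "A", "in"), Tap("c1", 2, "B", "out"),
    Tap("c1", 3, "B", "in"), Tap("c1", 4, "A", "out"),
    Tap("c1", 5, "A", "in"), Tap("c1", 6, "B", "out"),
    Tap("c1", 7, "A", "in"), Tap("c1", 8, "C", "out"),
]
chains = build_od_chains(taps)
share = frequent_od_share(chains["c1"], top_n=2)
distance = mixed_dissimilarity([1.0, 2.0], ["am"], [1.0, 4.0], ["pm"], gamma=0.5)
```

A passenger whose trips concentrate on a small number of OD pairs gets a share close to 1, which is the kind of spatial regularity the abstract attributes to the commuter group. In a full pipeline this share would be one of the numerical features passed to a K-Prototypes implementation, with `gamma` controlling how much the categorical features count relative to the numerical ones.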

Published

17-11-2025

How to Cite

Yang, X., Ha, J., & Tian, Y. (2025). Feature Mining of Shanghai Metro Commuters Based On K-Prototypes Clustering Method. Highlights in Science, Engineering and Technology, 158, 83-95. https://doi.org/10.54097/7haayg94