CN109508815B - General activity spatial measure analysis method based on subway IC card data - Google Patents

Publication number
CN109508815B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811224346.4A
Other languages
Chinese (zh)
Other versions
CN109508815A (en)
Inventor
季彦婕
余佳洁
刘阳
高良鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201811224346.4A priority Critical patent/CN109508815B/en
Publication of CN109508815A publication Critical patent/CN109508815A/en
Application granted granted Critical
Publication of CN109508815B publication Critical patent/CN109508815B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315Needs-based resource requirements planning or analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Educational Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for measuring and analyzing school-commute activity space based on subway IC card data, and belongs to the field of traffic data mining. The method uses subway IC card swipe data to measure the activity space of the school-commuting population (students who travel between home and school), extracts the corresponding measurement indexes, and thereby describes that activity space quantitatively. The method is the first to mine school-commute activity space measurement indexes from IC card data and to classify school commuters by their commuting characteristics, and the resulting classes have clearly distinct characteristics. The method exploits the objectivity of IC card data and provides a basis for research on how different school-commute patterns form.

Description

General activity spatial measure analysis method based on subway IC card data
Technical Field
The invention relates to a public transport data mining method, and in particular to a method for measuring and analyzing the activity space of subway school commuters based on subway IC card data.
Background
School commuters are an important component of the traveling population. In China, primary and secondary school students commonly live at a distance from the schools they attend, so they need transport for medium- and long-distance trips. Ground traffic is congested during the hours when students travel, so the subway has become an important mode of transport for them. Studying the rail-transit trips of school commuters in terms of activity space reveals the temporal and spatial characteristics of those trips and makes it easier to grasp the intrinsic relations among activities at the macro level. It also gives a better understanding of how rail transit and existing urban space are used.
Compared with traditional traffic survey data, subway IC card data are accurate, have large samples and wide coverage, are available in near real time and are cheaper to acquire, and so provide a higher-quality basis for studying the spatiotemporal behavior of travelers. Existing research nevertheless has several shortcomings. First, personal activity space is mostly measured with traditional data, in which spatiotemporal information on travel activity is missing or inaccurate and the observation period is short. Second, studies that use trajectory data cover short periods, analyze characteristics inaccurately and do not make full use of comprehensive big-data information. Third, existing research mainly explores the activity space of work commuters and neglects school students, whose travel times and places are relatively fixed and who have a definite influence on urban spatial structure. Fourth, a great deal of research in China and abroad combines activity space with other factors such as working distance and community diversity, but rarely considers how characteristics of the activity space such as shape, area and intensity relate to travelers and to urban spatial structure. As a result, the findings address only specific problems: they cannot be used to evaluate urban spatial structure, and they cannot relate the traffic load generated by a large group of travelers to urban public transport construction so as to provide a reliable basis for adjusting current plans and making future ones. No research on measuring and analyzing school-commute activity space has appeared in the prior art.
Disclosure of Invention
Purpose of the invention: in view of the shortcomings of the prior art, the invention provides a method for measuring and analyzing school-commute activity space based on subway IC card data.
Technical solution: to achieve this purpose, the method uses subway card swipe data to measure the activity space of school commuters, extracts the corresponding measurement indexes, and analyzes the similarities and differences among the activity spaces of school commuters. The method comprises the following steps:
(1) acquiring subway IC card swipe data and the longitude and latitude of the subway stations, extracting the information relevant to school commuting from the swipe data, matching the longitude and latitude data to a geographic map by spatial position, identifying school commuters, and establishing a school-commuter travel database;
(2) preprocessing the travel data of school commuters;
(3) analyzing the spatiotemporal travel characteristics of school commuters and defining their activity space;
(4) extracting activity space measurement indexes in the two dimensions of time and space, based on that definition;
(5) performing cluster analysis on the activity space measurement index data, dividing school commuters into categories according to those indexes, and obtaining the activity space patterns of school commuters who travel by subway.
Preferably, the spatial position information of the subway stations in step (1) is matched as follows: the longitude and latitude of the subway stations are crawled from an electronic map with a collector and converted to the WGS-84 coordinate system with a universal coordinate converter, which yields the position of each subway station.
Preferably, the travel data preprocessing in step (2) mainly comprises:
21) extracting the travel data of school commuters separately for workdays and for holidays (including winter and summer vacations), and counting their travel times, origins and destinations, and number of trips in each case;
22) extracting the travel data that have a school station as an endpoint, matching them to IC card numbers, and counting the travel times, frequencies and endpoint distribution of school commuters.
Preferably, step (3) comprises: comparing the school-commuter information counted in steps 21) and 22) with the corresponding data of other social groups to obtain the spatiotemporal characteristics of school commuters; because travel distance differs by direction, defining the geometric shape of the activity space as an ellipse that takes the home as a focus and whose major and minor axis lengths are calculated from the distribution of travel points; and because travel times are fixed, adding school-commute frequency and school-commute distance to the definition of the activity space.
Preferably, the activity space measurement indexes extracted in step (4) in the two dimensions of time and space mainly comprise: first, the flattening and the area of a constructed confidence ellipse, as spatial indexes; second, the school-commute distance, as a spatial index; and third, the school-commute frequency, as a temporal index.
Preferably, in step (4) a confidence ellipse is constructed from the spatial coordinates of all activity points and the activity space measurement indexes are extracted, mainly as follows:
41) outputting a confidence ellipse at the 95% confidence level to obtain, for the ellipse formed by each individual's activity points, the center coordinates, the lengths of the semi-major and semi-minor axes, and the rotation angle;
42) calculating the characteristic indexes of the confidence ellipse: the attribute table of the ellipse layer is exported, and two indexes, flattening and area, are introduced to measure and describe the activity space, wherein the flattening α of the ellipse is calculated by the following formula:
α = (a − b) / a
wherein a is the semi-major axis of the ellipse and b is the semi-minor axis; the larger α is, the flatter the ellipse and the stronger the directionality of the activity points, which are then probably located on the same subway line; the smaller α is, the rounder the ellipse and the more dispersed the activity points are in space;
the area of the ellipse is calculated by the formula S = π·a·b, and the larger the area, the wider the daily activity range of the primary or secondary school student;
43) adding school-commute frequency and school-commute distance to complete the activity space measurement indexes, wherein:
the school-commute frequency is obtained by counting trips per IC card number in the established school-commuter travel database;
the school-commute distance is calculated from the longitude and latitude of the stations according to the Euclidean distance calculation formula, as follows:
|d12| = 6368.16 × arccos(sinX + cosX)
wherein
sinX = sin(lat1)·sin(lat2), cosX = cos(lat1)·cos(lat2)·cos(long1 − long2),
and long1, lat1 and long2, lat2 are the longitude and latitude of the subway stations of the home and the school, respectively.
Preferably, step 41) specifically comprises:
411) extracting from the travel database each IC card number, the entry station number, the exit station number and the corresponding longitude and latitude, and combining the entry and exit stations to obtain the activity points of each traveler;
412) adding the longitude and latitude data of the subway stations to ArcGIS and converting all coordinates, including those of home and school stations, from the WGS-84 geodetic reference system to Gauss-Krüger coordinates by projection transformation;
413) selecting the Directional Distribution tool in ArcGIS, inputting the converted coordinates, and outputting a confidence ellipse at the 95% confidence level, which yields the center coordinates, the lengths of the semi-major and semi-minor axes, and the rotation angle of the confidence ellipse formed by each individual's activity points over one week.
Preferably, in step (5) the activity spaces are classified by K-means clustering: candidate numbers of clusters are determined with the elbow rule, the ratio of the between-group sum of squares to the total sum of squares is calculated in R for each candidate, and the candidate with the larger ratio is taken as the number of clusters; the classes are then analyzed in terms of the confidence ellipse characteristics, travel frequency and other features of their activity spaces, to identify what school commuters of the same class have in common and how the classes differ.
Advantageous effects:
1. The invention measures and analyzes the activity space of school commuters using subway IC card data that cover more days and a longer time span, so it reflects the relation between that activity space and the urban spatial layout more comprehensively and accurately, and provides a decision basis for relieving unbalanced urban traffic loads and optimizing the urban layout. It helps in understanding the behavior of school commuters and passenger demand at different times, so that the spatial layout of the city can be planned better and the quality of life of students improved.
2. The method is the first to mine school-commute activity space measurement indexes from IC card data and to classify school commuters by their commuting characteristics, and the resulting classes have clearly distinct characteristics. The method exploits the objectivity of IC card data and provides a basis for research on how different school-commute patterns form.
3. The method analyzes the spatial measures quantitatively. Its output is the set of characteristic values of the confidence ellipse: the ellipse represents a student's activity space to a certain extent, and its characteristic values measure that space, so the actual activity range is converted into a mathematical model. This provides a new way of processing home and school station data and a new method of analyzing school-commute activity space, and it is well founded because the data match the focus of the method.
4. By applying the confidence ellipse method to subway card swipe data to express the school-commute activity space, the invention also addresses the shortcomings of existing research: the limited time span and accuracy of the underlying data, the restricted choice of research subjects, and the insufficient attention paid to the characteristics of the activity space.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram illustrating elbow rule clustering results according to an embodiment of the present invention;
FIG. 3 shows the school-commute activity spaces of the clustering results according to an embodiment of the present invention.
Detailed Description
The technical solution of the invention is further explained with reference to the drawings.
Referring to FIG. 1, the method for measuring and analyzing school-commute activity space based on subway IC card data provided by the invention comprises the following steps:
Step (1): subway IC card data and the longitude and latitude of the subway stations are acquired, the information relevant to school commuting is extracted from the card swipe data, the longitude and latitude data are matched to a geographic map by spatial position, the locations of the homes and schools of school commuters are determined, school commuters are thereby identified, and a travel database of subway IC card student users is established.
The data used in the embodiment are workday card swipe data from the Nanjing subway operator for three consecutive weeks, from November 2 to November 20, 2015. The Nanjing subway offers students a discounted fare, so student cards form a distinct ticket card type. All student card travel records are first screened out by the condition that the ticket card type is 54. Following the school-commute identification method based on subway card swipe data (application number: 201711043136.0), the valid travel records whose card type is student card are retained to obtain a database of valid student card trips, whose data structure is shown in Table 1.
TABLE 1 subway IC card data structure
[Table 1 appears as an image (BDA0001835565540000051) in the original publication and is not reproduced in this text.]
The longitude and latitude of 122 subway stations were crawled from Baidu Maps with a collector and converted to the WGS-84 coordinate system with a universal coordinate converter, so that they could be matched by spatial position to a Nanjing map in the same coordinate system. The resulting position of each subway station is shown in Table 2.
TABLE 2 Subway station longitude and latitude data
Station number    x1 (longitude)    y1 (latitude)
1                 118.7128906       32.0112915
2                 118.7164917       31.9979248
3                 118.7277222       31.9901123
4                 118.7382813       31.98449707
5                 118.7562866       31.99371338
Following the school-commute identification method based on subway card swipe data (application number: 201711043136.0), for every card number whose home station and school station have been identified, trip records that depart from the school station before 9:00 and trip records that depart from the home station after 16:00 are deleted. In addition, because the identification of home and school uses card swipe records from three consecutive workdays as samples, card numbers with fewer than 3 school-commute days are deleted together with their records. This leaves 317828 school-commute trip records and completes the identification of school commuters.
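The record filtering described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the field names `card_id`, `entry_station` and `entry_time`, the `home` and `school` lookup tables, and the reading of "commute days" as distinct calendar days with at least one retained trip are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, time

def filter_commute_trips(trips, home, school, min_days=3):
    """Drop non-commute records, then drop cards with too few commute days.

    trips: list of dicts with hypothetical keys card_id, entry_station, entry_time.
    home, school: dicts mapping card_id to the identified station number.
    """
    kept = []
    for t in trips:
        card, clock = t["card_id"], t["entry_time"].time()
        # A departure from the school station before 09:00 is not a trip home.
        if t["entry_station"] == school[card] and clock < time(9, 0):
            continue
        # A departure from the home station after 16:00 is not a trip to school.
        if t["entry_station"] == home[card] and clock > time(16, 0):
            continue
        kept.append(t)
    # Count distinct days with a retained trip for each card (assumed definition).
    days = defaultdict(set)
    for t in kept:
        days[t["card_id"]].add(t["entry_time"].date())
    return [t for t in kept if len(days[t["card_id"]]) >= min_days]

# Made-up example: card A commutes on three days, card B on only two.
home = {"A": 1, "B": 2}
school = {"A": 5, "B": 6}
trips = [
    {"card_id": "A", "entry_station": 1, "entry_time": datetime(2015, 11, 2, 7, 0)},
    {"card_id": "A", "entry_station": 1, "entry_time": datetime(2015, 11, 3, 7, 0)},
    {"card_id": "A", "entry_station": 1, "entry_time": datetime(2015, 11, 4, 7, 0)},
    {"card_id": "A", "entry_station": 5, "entry_time": datetime(2015, 11, 4, 8, 0)},
    {"card_id": "A", "entry_station": 1, "entry_time": datetime(2015, 11, 4, 17, 0)},
    {"card_id": "B", "entry_station": 2, "entry_time": datetime(2015, 11, 2, 7, 0)},
    {"card_id": "B", "entry_station": 2, "entry_time": datetime(2015, 11, 3, 7, 0)},
]
kept = filter_commute_trips(trips, home, school)
```

In this made-up example only the three morning trips of card A survive: its early departure from school and late departure from home are removed, and card B is dropped for having fewer than three commute days.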
Step (2): the travel data of school commuters are preprocessed to facilitate the analysis of their travel behavior, to find the features that clearly separate school commuters from other groups, and to refine the definition of their activity space. The preprocessing mainly comprises:
21) extracting the travel data of school commuters separately for workdays and for holidays (including winter and summer vacations), and counting their travel times, origins and destinations, and number of trips in each case;
22) extracting the travel data that have a school station as an endpoint, matching them to IC card numbers, and counting the travel times, frequencies and endpoint distribution of school commuters.
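The workday and holiday split in step 21) might be sketched as below. The input shape, a list of `(card_id, date)` pairs, and the `holidays` set are assumptions made for illustration; the source does not specify how holidays are encoded.

```python
from collections import Counter
from datetime import date

def trip_counts_by_day_type(trips, holidays=frozenset()):
    """Count trips per (card_id, day type).

    trips: iterable of (card_id, date) pairs (hypothetical input shape).
    holidays: extra non-working dates such as vacations (assumed input).
    """
    counts = Counter()
    for card_id, day in trips:
        # Saturdays and Sundays (weekday() >= 5) are treated as holidays too.
        is_rest = day.weekday() >= 5 or day in holidays
        counts[(card_id, "holiday" if is_rest else "workday")] += 1
    return counts

# 2015-11-02 and 2015-11-03 are a Monday and Tuesday; 2015-11-07 is a Saturday.
counts = trip_counts_by_day_type([
    ("A", date(2015, 11, 2)),
    ("A", date(2015, 11, 3)),
    ("A", date(2015, 11, 7)),
])
```

The same counter keyed by origin and destination station instead of day type would give the origin-destination statistics mentioned in the step.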
Step (3): the spatiotemporal travel characteristics of school commuters are analyzed and their activity space is defined.
To analyze these characteristics, the school-commuter information counted in steps 21) and 22) is compared with the corresponding data of other social groups. Specifically, the analysis includes:
comparing the workday and holiday travel behavior of school commuters: the distributions of travel time, travel origin-destination (OD) pairs and travel frequency differ markedly; holiday trips start later overall than workday trips, their frequency is not fixed, and their OD pairs show little regularity;
comparing the travel behavior of school commuters with that of other groups: the travel range and travel times of school commuters are clearly fixed, card swipes at school stations are concentrated at fixed times, the volume of card swipes is normally distributed over the duration, the activity range is small, and the activity boundary and regularity are pronounced;
comparing trips of school commuters and of other groups that have a school station as an endpoint: the trajectories of school commuters are markedly concentrated and fixed, trip start times are concentrated, durations are fixed, the number of trips generated is normally distributed over the duration, travel frequency is stable, and workday and holiday patterns differ clearly.
The analysis shows that school commuters have distinct spatiotemporal characteristics relative to other social groups:
temporal characteristics: the trip time period is fixed, the duration of each trip is fixed, trips at school stations are concentrated, the peak time is fixed, the duration is stable, and the travel frequency is fixed;
spatial characteristics: the travel OD pairs are fixed, the activity range is small and fixed, the order in which activity points are visited is fixed, and the activity range differs clearly by direction.
Compared with other social groups, the origins and destinations of school commuters are highly predictable, and their commute frequency and commute distance differ clearly from those of other groups and are more stable. The definition of the activity space of school commuters is therefore refined on the basis of these distinctive travel characteristics, and travel characteristics shared with other social groups are not used as main indexes.
Activity space emphasizes the objective movement and activity of travelers under constraints in the two dimensions of time and space. The main activities of school commuters are based on school and home, and the number of activity places is small, so their activity space cannot be determined from the density distribution of trip origins and destinations. Moreover, the travel path of a school commuter on the subway lies underground, and no buffer zone can be formed around high-density origins and destinations, so the activity space cannot be defined from travel paths and buffer zones either. The activity points of school commuters are concentrated and strongly directional, and home-based activity distances differ by direction. Therefore, for travelers whose travel distances differ clearly by direction, the geometric shape of the activity pattern is defined as an ellipse that takes the home as a focus and whose major and minor axis lengths are calculated from the distribution of travel points. Because school commuters also have more pronounced spatiotemporal travel characteristics than other groups, their commute frequency and commute distance are included in the definition and, together with the geometric shape of the activity pattern, represent the activity space of school commuters.
Step (4): based on the definition of the school-commuter activity space, activity space measurement indexes are extracted in the two dimensions of time and space. The indexes comprise: first, the flattening and the area of a constructed confidence ellipse, as spatial indexes; second, the school-commute distance, as a spatial index; and third, the school-commute frequency, as a temporal index.
The activity range of primary and secondary school students is generally small, and on weekends in particular a student may make no trips or no subway trips. Confidence ellipses are therefore not constructed separately for workdays and rest days; all of an individual's travel points in one week are used instead. Construction requires only the coordinates of each individual's activity points, and entries and exits need not be distinguished. A confidence ellipse is output at the 95% confidence level, which yields the center coordinates, the lengths of the semi-major and semi-minor axes, and the rotation angle of the ellipse formed by each individual's activity points over one week. The 95% confidence ellipse can be calculated directly from the longitude and latitude of the key activity points. Ellipses at the 68%, 95% and 99% confidence levels can be generated, corresponding to 1, 2 and 3 standard deviations respectively; based on a suitability analysis for the study, this embodiment uses 2 standard deviations, that is, a 95% confidence ellipse.
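For readers without ArcGIS, the idea behind the confidence ellipse can be illustrated with a covariance-based standard deviational ellipse. The sketch below is an assumption-laden approximation and not the ArcGIS Directional Distribution algorithm itself: it takes projected planar coordinates, uses the population covariance, scales the axes by `n_std` standard deviations, and reports the rotation counterclockwise from the x-axis, whereas ArcGIS uses its own conventions.

```python
import math

def confidence_ellipse(points, n_std=2.0):
    """Return (center, semi_major, semi_minor, rotation_deg) for planar points.

    points: list of (x, y) in projected coordinates, e.g. meters.
    n_std: number of standard deviations (2 in the embodiment).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Population covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (sxx + syy) / 2
    diff = math.hypot((sxx - syy) / 2, sxy)
    lam_major, lam_minor = mean + diff, max(mean - diff, 0.0)
    a = n_std * math.sqrt(lam_major)
    b = n_std * math.sqrt(lam_minor)
    # Orientation of the major axis, counterclockwise from the x-axis.
    rotation = math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))
    return (cx, cy), a, b, rotation

# Made-up activity points, spread more widely along x than along y.
center, a, b, rotation = confidence_ellipse([(-2, 0), (2, 0), (0, -1), (0, 1)])
```

For these four points the ellipse is centered at the origin with its major axis along the x-axis, which matches the intuition that activity points strung along one line produce a flat, strongly directional ellipse.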
The flattening α of the ellipse is calculated by the following formula:
α = (a − b) / a
where a is the semi-major axis of the ellipse and b is the semi-minor axis. The larger α is, the flatter the ellipse and the stronger the directionality of the activity points, which are then probably located on the same subway line; the smaller α is, the rounder the ellipse and the more dispersed the activity points are in space.
The area of the ellipse is calculated by the following formula:
S = π·a·b
The larger the area, the wider the daily activity range of the primary or secondary school student.
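The two ellipse indexes can be computed as in the short sketch below. The area formula S = π·a·b is taken from the text; the flattening formula is an assumption, namely the standard definition α = (a − b)/a, because the original formula is only available as an image.

```python
import math

def flattening(a, b):
    """Flattening of an ellipse with semi-major axis a and semi-minor axis b.

    Assumes the standard definition (a - b) / a; 0 is a circle, values near 1 are very flat.
    """
    return (a - b) / a

def ellipse_area(a, b):
    """Area S = pi * a * b, as given in the text."""
    return math.pi * a * b

# Made-up semi-axes in kilometers.
alpha = flattening(4.0, 1.0)
area = ellipse_area(4.0, 1.0)
```

Under this definition a circle gives α = 0, and α approaches 1 as the activity points line up along a single direction.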
School-commute frequency and school-commute distance are added to complete the activity space measurement indexes. Objective analysis of a large volume of travel data shows that school commuters differ clearly from other groups in commute distance and frequency, so these two quantities are included in the analysis of their spatiotemporal travel characteristics as measurement indexes that supplement the characteristic values of the confidence ellipse. The school-commute frequency is obtained by counting trips per IC card number in the school-commuter travel database; the school-commute distance is obtained with the Euclidean distance calculation method, by the following formula:
|d12| = 6368.16 × arccos(sinX + cosX)
where
sinX = sin(lat1)·sin(lat2), cosX = cos(lat1)·cos(lat2)·cos(long1 − long2),
and long1, lat1 and long2, lat2 are the longitude and latitude of the subway stations of the home and the school, respectively.
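The distance formula can be sketched in Python as follows. Although the text calls it a Euclidean distance, the arccos form with a radius constant has the shape of the spherical law of cosines, and this sketch assumes that reading: the two terms inside arccos are taken to be sin(lat1)·sin(lat2) and cos(lat1)·cos(lat2)·cos(long1 − long2). The constant 6368.16 (kilometers) is taken from the text.

```python
import math

EARTH_RADIUS_KM = 6368.16  # constant used in the text

def commute_distance_km(long1, lat1, long2, lat2):
    """Great-circle distance in km between two stations given in decimal degrees."""
    lo1, la1, lo2, la2 = map(math.radians, (long1, lat1, long2, lat2))
    cos_angle = (math.sin(la1) * math.sin(la2)
                 + math.cos(la1) * math.cos(la2) * math.cos(lo1 - lo2))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)

# Stations 1 and 2 from Table 2, used here only as sample coordinates.
d = commute_distance_km(118.7128906, 32.0112915, 118.7164917, 31.9979248)
```

The clamp matters for stations that are very close together, where floating-point rounding can make the cosine marginally exceed 1.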
Step (5): cluster analysis is performed on the activity space measurement index data, and school commuters are divided into categories according to those indexes.
K-means clustering is used for the cluster analysis. K-means requires the number of clusters to be fixed before classification. Because the method computes distances between points from the index values, the data must be standardized to remove the effect of the indexes' different scales on the result.
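Standardization followed by K-means can be sketched with the Python standard library as below. The embodiment uses R, so this is only an illustration of the technique: the z-score standardization, the random choice of initial centers from the data and the fixed seed are assumptions, and the sample rows are made up.

```python
import math
import random

def standardize(rows):
    """Z-score each column so that indexes on different scales are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def sq_dist(p, q):
    """Squared Euclidean distance between two index vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(rows, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Made-up index rows forming two well-separated groups.
rows = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
z = standardize(rows)
labels, centers = kmeans(z, k=2)
```

In practice the rows would hold the four indexes per card (flattening, area, commute distance, commute frequency), and the clustering would be repeated for each candidate number of clusters.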
The elbow rule is currently the most commonly cited method for determining the number of clusters. It plots the within-group sum of squares for different numbers of clusters, where the within-group sum of squares is the sum of squared distances from all data points in a cluster to the cluster center; the smaller it is, the more similar the index values of the data within the group. The results are shown in FIG. 2. As the figure shows, the within-group sum of squares decreases rapidly as the number of clusters goes from one to five and more slowly thereafter, so 5 or 6 is selected as the number of clusters in this embodiment.
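The elbow rule can be sketched with a plain implementation of Lloyd's K-means algorithm that records the within-group sum of squares for each candidate number of clusters. This is an illustrative standard-library version, not the R routine used in the embodiment; the function names, the random initialization and the sample points are all assumptions.

```python
import random


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable, centers already match them
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels


def within_ss(points, centers, labels):
    """Sum of squared distances from each point to its cluster center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def elbow_curve(points, k_max, seed=0):
    """Map each k in 1..k_max to its within-group sum of squares."""
    return {k: within_ss(points, *kmeans(points, k, seed=seed))
            for k in range(1, k_max + 1)}


# Hypothetical, already-normalized index vectors forming two loose groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
for k, wss in elbow_curve(points, 3).items():
    print(k, round(wss, 3))
```

Plotting the returned values against k gives the elbow plot; the "elbow" is the point after which adding clusters reduces the within-group sum of squares only slowly.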
The elbow plot, however, reflects only the similarity within groups and says nothing about the differences between groups. The population is therefore clustered into both 5 and 6 classes, and the R software automatically calculates the ratio of the between-group sum of squares to the total sum of squares; the larger this ratio, the greater the differences between classes. As shown in Table 3, the ratio for 6 clusters is slightly larger than the ratio for 5 clusters, so 6 is finally selected as the number of clusters in this embodiment.
TABLE 3 Ratio of the between-group sum of squares to the total sum of squares for different numbers of clusters
Number of clusters    Between-group sum of squares / total sum of squares
5                     62.4%
6                     65.1%
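The ratio reported in Table 3 can be reproduced in principle from any set of cluster labels. The sketch below computes the between-group sum of squares divided by the total sum of squares, which is the quantity R's `kmeans` output reports as between_SS / total_SS. The function name and the sample data are hypothetical, and the values 62.4% and 65.1% in the table come from the patent's own data, not from this code.

```python
def between_total_ratio(points, labels):
    """Share of the total sum of squares explained by the grouping (0 to 1)."""
    dim = len(points[0])
    n = len(points)
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    total_ss = sum(sum((p[d] - grand[d]) ** 2 for d in range(dim))
                   for p in points)
    # Collect the members of each cluster.
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    # Between-group SS: cluster size times squared distance of the
    # cluster mean from the grand mean, summed over clusters.
    between_ss = 0.0
    for members in groups.values():
        center = [sum(p[d] for p in members) / len(members)
                  for d in range(dim)]
        between_ss += len(members) * sum((center[d] - grand[d]) ** 2
                                         for d in range(dim))
    return between_ss / total_ss


# Hypothetical one-dimensional example with two obvious groups.
print(between_total_ratio([[0], [2], [10], [12]], [0, 0, 1, 1]))
```

Because the total sum of squares equals the within-group plus the between-group sum of squares, a higher ratio means the clusters are both tighter and further apart, which is the comparison the text uses to choose between 5 and 6 clusters.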
Finally, the students who travel to school by subway (the general population) are divided into six categories by the K-means clustering method; the mean index values of the six general groups are shown in Table 4.
TABLE 4 Mean index values of the six general groups
Figure BDA0001835565540000082
Figure BDA0001835565540000091
A typical individual is selected from each general group, the locations of the home, school and other activity points are marked on the map using ArcGIS, and a confidence ellipse representing the activity space is drawn from the activity points, as shown in FIG. 3, where (a)-(f) correspond to the activity spaces of the typical individuals in the six general groups. The activity ranges of the typical individuals differ markedly between groups. The confidence ellipse gives an intuitive representation of the activity space of a general group, and its characteristic values serve as the activity space measurement indexes. The classification shows which groups have a reasonable general activity range and which have an excessively large one; this reflects how reasonable the distribution of schools is and allows the existing urban layout, and the match between urban traffic and urban layout, to be evaluated.
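For readers without ArcGIS, the idea of a 95% confidence ellipse over one individual's activity points can be sketched from the sample covariance matrix. This is a swapped-in textbook construction that assumes roughly bivariate-normal points in projected (planar) coordinates; it is not claimed to reproduce the ArcGIS tool's output. The function name and the sample points are hypothetical.

```python
import math

# 95% quantile of the chi-square distribution with 2 degrees of freedom.
CHI2_95_2DOF = -2.0 * math.log(0.05)


def confidence_ellipse(points):
    """Return (center, a, b, angle) for planar (x, y) activity points.

    a and b are the semi-major and semi-minor axes; angle is the
    major-axis direction in radians, measured from the x-axis.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - cy) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - cx) * (y - cy) for x, y in points) / (n - 1)
    # Eigenvalues of the 2x2 covariance matrix in closed form.
    half_trace = (sxx + syy) / 2
    root = math.hypot((sxx - syy) / 2, sxy)
    lam1 = half_trace + root
    lam2 = max(half_trace - root, 0.0)
    a = math.sqrt(CHI2_95_2DOF * lam1)
    b = math.sqrt(CHI2_95_2DOF * lam2)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), a, b, angle


# Hypothetical projected station coordinates (km) for one cardholder.
center, a, b, angle = confidence_ellipse(
    [(0, 0), (2, 0), (4, 0), (2, 1), (2, -1)])
print(center, a, b, angle)
```

The returned a and b are the inputs needed for the flattening and area indexes, and the angle indicates the dominant direction of travel.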

Claims (4)

1. A general activity space measure analysis method based on subway IC card data, characterized by comprising the following steps:
(1) acquiring subway IC card swiping data and the longitude and latitude data of subway stations, extracting valid general information from the card swiping data, spatially matching the longitude and latitude data to a geographic map, identifying the general population, and establishing a general population travel database;
(2) preprocessing the general population travel data, comprising:
21) separately extracting the travel data of the general population for working days and for holidays, including winter and summer vacations, and counting the travel time, travel origin and destination, and number of trips of the general population in each state;
22) extracting travel data with school stations as endpoints, matching IC card numbers, and counting the travel time, frequency and endpoint location distribution of the general population;
(3) analyzing the travel space-time characteristics of the general population and defining the activity space of the general population, wherein the analysis method comprises: comparing the general population information counted in steps 21) and 22) with the corresponding data of other social groups to obtain the space-time characteristics of the general population; defining, according to the characteristic that travel distance differs by direction, the activity space of the general population as an ellipse with the home as a focus, whose major and minor axis lengths are calculated from the distribution of travel points; and, according to the characteristic that travel time is fixed, adding general frequency and general distance to the definition of the general population's activity space;
(4) based on the definition of the general population's activity space, extracting activity space measurement indexes in the two dimensions of time and space, wherein the flattening and the area of the ellipse and the general distance serve as spatial measurement indexes characterizing the activity space, and the general frequency serves as a temporal measurement index characterizing the activity space, specifically comprising:
41) selecting a 95% confidence level and outputting a confidence ellipse to obtain the center coordinates, the lengths of the semi-major and semi-minor axes, and the rotation angle of the confidence ellipse formed by each individual's activity points;
42) calculating the confidence ellipse characteristic indexes: exporting the attribute table of the ellipse layer and introducing two indexes, flattening and area, to measure and describe the activity space, wherein the flattening α of the ellipse is calculated by the following formula:
Figure FDA0003114535900000011
wherein a is the semi-major axis of the ellipse and b is the semi-minor axis of the ellipse;
the area of the ellipse is calculated by the formula S = π·a·b;
43) adding general frequency and general distance to complete the activity space measurement indexes,
wherein the general frequency is obtained by counting the number of trips per IC card number in the established general population travel database;
the general distance is calculated from the longitude and latitude of the stations according to a Euclidean distance calculation formula, as follows:
|d12|=6368.16×arccos(sinX+cosX)
wherein
Figure FDA0003114535900000021
long1, long2, lat1 and lat2 are the longitudes and latitudes of the home and school subway stations, respectively;
(5) performing cluster analysis on the general population's activity space measurement index data, and dividing the general population into categories according to the activity space measurement indexes to obtain the general population's activity space patterns based on subway travel.
2. The general activity space measure analysis method based on subway IC card data according to claim 1, wherein the spatial positions of the subway stations in step (1) are matched as follows: crawling the longitude and latitude data of the subway stations from an electronic map using a collector, converting the data into the WGS-84 coordinate system using a universal coordinate converter, and finally obtaining the position information of each subway station.
3. The general activity space measure analysis method based on subway IC card data according to claim 1, wherein step 41) comprises the following steps:
411) extracting from the travel database the IC card number, the entry station number, the exit station number and the corresponding longitudes and latitudes, and combining the entry and exit stations to obtain the activity point information of each traveler;
412) after adding the longitude and latitude data of the subway stations to ArcGIS, converting all coordinates, including those of the home and school stations, from the WGS84 geodetic reference system to Gauss-Krüger coordinates by projection transformation;
413) selecting the direction distribution tool in ArcGIS, inputting the converted coordinates, selecting a 95% confidence level and outputting the confidence ellipse, finally obtaining the center coordinates, the lengths of the semi-major and semi-minor axes, and the rotation angle of the confidence ellipse formed by each individual's activity points.
4. The general activity space measure analysis method based on subway IC card data according to claim 1, wherein in step (5) the activity space is classified using the K-means clustering method: candidate numbers of clusters are determined by the elbow rule, the ratio of the between-group sum of squares to the total sum of squares is calculated using the R software, and the number of clusters with the larger ratio is adopted; measure analysis is then performed according to the activity space confidence ellipse and the trip frequency, and the commonalities within each type of general travel behavior and the differences between types are analyzed.
CN201811224346.4A 2018-10-19 2018-10-19 General activity spatial measure analysis method based on subway IC card data Active CN109508815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811224346.4A CN109508815B (en) 2018-10-19 2018-10-19 General activity spatial measure analysis method based on subway IC card data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811224346.4A CN109508815B (en) 2018-10-19 2018-10-19 General activity spatial measure analysis method based on subway IC card data

Publications (2)

Publication Number Publication Date
CN109508815A CN109508815A (en) 2019-03-22
CN109508815B true CN109508815B (en) 2021-08-10

Family

ID=65746846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811224346.4A Active CN109508815B (en) 2018-10-19 2018-10-19 General activity spatial measure analysis method based on subway IC card data

Country Status (1)

Country Link
CN (1) CN109508815B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222411A (en) * 2021-05-12 2021-08-06 北京百度网讯科技有限公司 Analysis method, device, equipment and storage medium for activity space distribution

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279534A (en) * 2013-05-31 2013-09-04 西安建筑科技大学 Public transport card passenger commuter OD (origin and destination) distribution estimation method based on APTS (advanced public transportation systems)
CN105701180A (en) * 2016-01-06 2016-06-22 北京航空航天大学 Commuting passenger feature extraction and determination method based on public transportation IC card data
CN105718946A (en) * 2016-01-20 2016-06-29 北京工业大学 Passenger going-out behavior analysis method based on subway card-swiping data
CN107818415A (en) * 2017-10-31 2018-03-20 东南大学 A kind of recognition methods of attending a school by taking daily trips based on subway brushing card data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8682342B2 (en) * 2009-05-13 2014-03-25 Microsoft Corporation Constraint-based scheduling for delivery of location information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《School Commuting Pattern in Metro System Across Different Loyalty Groups》;顾宇等;《In Transportation Research Board 97th Annual Meeting》;20180111;第1-6页 *
《基于GPS 数据的北京市郊区巨型社区居民日常活动空间》;申悦等;《地理学报》;20130430;第506-516页 *

Also Published As

Publication number Publication date
CN109508815A (en) 2019-03-22

Similar Documents

Publication Publication Date Title
Schirmer et al. The role of location in residential location choice models: a review of literature
CN106096631A (en) A kind of recurrent population's Classification and Identification based on the big data of mobile phone analyze method
CN110533038A (en) A method of urban vitality area and inner city Boundary Recognition based on information data
CN110110902B (en) Accessibility measuring and calculating method for shared bicycle connection rail transit station
Kumar et al. Fast and scalable big data trajectory clustering for understanding urban mobility
CN112966899B (en) Urban public service facility construction decision method influencing population density
CN113159364A (en) Passenger flow prediction method and system for large-scale traffic station
WO2023050955A1 (en) Urban functional zone identification method based on function mixing degree and ensemble learning
Li et al. The spatiotemporal evolution and influencing factors of hotel industry in the metropolitan area: An empirical study based on China
CN107766983B (en) Method for setting emergency rescue parking point of urban rail transit station
CN112036757A (en) Parking transfer parking lot site selection method based on mobile phone signaling and floating car data
CN117056823A (en) Method and system for identifying occupation type of shared bicycle commuter user
CN112000755A (en) Regional trip corridor identification method based on mobile phone signaling data
Zhang et al. How road network transformation may be associated with reduced carbon emissions: An exploratory analysis of 19 major Chinese cities
CN109508815B (en) General activity spatial measure analysis method based on subway IC card data
CN114723596A (en) Urban functional area identification method based on multi-source traffic travel data and theme model
CN114662774A (en) City block vitality prediction method, storage medium and terminal
CN108681741B (en) Subway commuting crowd information fusion method based on IC card and resident survey data
Binoy et al. Spatial variation of the determinants affecting urban land value in Thiruvananthapuram, India
CN111008730B (en) Crowd concentration prediction model construction method and device based on urban space structure
Zhou et al. Big data for intrametropolitan human movement studies A case study of bus commuters based on smart card data
Cui et al. Research on the driving forces of urban hot spots based on exploratory analysis and binary logistic regression model
Yue et al. Classification and determinants of high-speed rail stations using multi-source data: A case study in Jiangsu Province, China
CN114861975A (en) Urban tourism traffic demand joint prediction method based on attraction strength
Taran Measuring Accessibility to Health Care Centers in the City of Al-Mafraq Using Geographic Information Systems.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant