CN107818415B - General recognition method based on subway card swiping data - Google Patents



Publication number
CN107818415B
Authority
CN
China
Prior art keywords
station
card
stations
school
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711043136.0A
Other languages
Chinese (zh)
Other versions
CN107818415A (en)
Inventor
季彦婕
顾宇
刘阳
刘攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201711043136.0A priority Critical patent/CN107818415B/en
Publication of CN107818415A publication Critical patent/CN107818415A/en
Application granted granted Critical
Publication of CN107818415B publication Critical patent/CN107818415B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40Business processes related to the transportation industry
    • GPHYSICS
    • G07CHECKING-DEVICES
    • G07BTICKET-ISSUING APPARATUS; FARE-REGISTERING APPARATUS; FRANKING APPARATUS
    • G07B15/00Arrangements or apparatus for collecting fares, tolls or entrance fees at one or more control points
    • G07B15/02Arrangements or apparatus for collecting fares, tolls or entrance fees at one or more control points taking into account a variable factor such as distance or time, e.g. for passenger transport, parking systems or car rental systems
    • G07B15/04Arrangements or apparatus for collecting fares, tolls or entrance fees at one or more control points taking into account a variable factor such as distance or time, e.g. for passenger transport, parking systems or car rental systems comprising devices to free a barrier, turnstile, or the like
    • GPHYSICS
    • G07CHECKING-DEVICES
    • G07CTIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00Individual registration on entry or exit
    • G07C9/20Individual registration on entry or exit involving the use of a pass

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Technology (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a school-commute identification method based on subway card swiping data, comprising the following steps: 1) collecting and preprocessing data, including IC card numbers, stations and card swiping times, from subway card swiping records; 2) for each card number, finding the most frequently used station and the station most frequently paired with it, and taking these as candidates for the home station and the school station; 3) classifying the trip records between the two candidate stations of each card number according to the timetable of the city's primary and secondary schools, and determining the home station and the school station by time rules; 4) finding and deleting non-school-commute trip records, as well as card numbers that are difficult to judge, together with their records. By integrating and processing a large volume of subway card swiping data from the temporal and spatial perspectives, the invention provides, for the first time, a method for identifying the school-commuting population from subway card swiping data, solves a basic problem in studying school-commute behavior with big data, and makes up for the shortcomings of traditional survey methods.

Description

General recognition method based on subway card swiping data
Technical Field
The invention relates to methods for collecting and analyzing travel behavior data in traffic planning, and in particular to a school-commute identification method based on subway card swiping data.
Background
In recent years, students' choice of travel mode has attracted increasing attention from scholars. Under China's nearby-enrollment policy, primary and junior middle school students are expected to attend schools near their homes. However, because educational resources are unevenly distributed, more and more parents choose schools outside their neighborhood to secure high-quality education for their children, which often results in long school-commute distances. The subway is the main mode for medium- and long-distance travel in cities. However, even students who use the subway are still often driven to school by their parents in daily life. To guide these students to use the subway more, their daily school-commute patterns need to be better understood.
School commuting, that is, travel between home and school (the counterpart of work commuting), has traditionally been analyzed using questionnaire surveys of school-commute trips. Such surveys consume a great deal of manpower and time, and their samples are small with limited and incomplete coverage, so the analysis results are biased or apply only to a limited local area. Furthermore, school commuting is a long-term behavior, and complete data are difficult to obtain through short-term traditional surveys. Fortunately, subway smart card systems designed for fare collection provide detailed card swiping records, including card types, card swiping dates and subway stations. These records can serve many purposes and can replace, often to better effect, data obtained by many traditional survey methods. However, although commute identification based on smart card data is maturing, research on identifying school-commute behavior remains scarce. Conventional commute identification methods focus on temporal and spatial patterns and identify commute trips from weekly riding frequency, fixed boarding and alighting stations, the time interval between two rides, and the like. These criteria transfer poorly to school commuting: because it is very common for parents to drive students to and from school, few primary and secondary school students use the subway consistently over a long period, and relying on trip frequency alone would wrongly exclude a large number of genuine school commuters; meanwhile, because of the schools' noon break system, the time spent at one place is not a good criterion either.
Therefore, strict and reasonable temporal and spatial constraints must be adopted to judge accurately whether the travel behavior on a given working day is school-commute behavior.
Disclosure of Invention
Purpose of the invention: in view of the above shortcomings, the invention provides a method, based on subway card swiping data, for identifying the people who commute to school by subway and their trip records, so that school-commute behavior can be judged accurately.
Technical scheme: the school-commute identification method based on subway card swiping data according to the invention comprises the following steps:
(1) Data acquisition and preprocessing: IC card data and subway station coordinate data are required for all subway stations in one city on all working days of three or more consecutive weeks. The IC card data include the card number, entry date, entry time, exit date, exit time, entry station number, exit station number and card type. After the subway IC card data are collected, all trip records on consecutive working days are combined in chronological order for each card holder, the records are filtered by card type so that only student-card records are retained, and abnormal data are deleted.
(2) For each card number, calculate the frequency of occurrence of each station used by the card, find the station or stations with the highest frequency, and count how many stations share the highest frequency.
(3) If exactly one station has the highest frequency, take it as candidate station Si1 for the home station or school station of the card number. If two stations share the highest frequency, take them as candidate stations Si1 and Si2 for the home station and school station of the card number. If more than two stations share the highest frequency, merge adjacent stations and then take the station with the highest frequency as candidate station Si1; if two stations are still tied for the highest frequency after merging, take them as candidate stations Si1 and Si2; and if more than two stations are still tied after merging, delete the card number and its trip records.
(4) For the card numbers for which the number of highest-frequency stations in step (3) was 1 or more than 2, count the stations with the highest frequency among the stations corresponding to the selected candidate station: if there is exactly one, take it as the other candidate station Si2 for the home station or school station; if there are two or more, merge adjacent stations and then take the station with the highest frequency as Si2, and if stations are still tied after merging, delete the card number and its trip records.
(5) According to the timetable of the primary and secondary schools, divide the trip records between the candidate stations of each card number into four classes by entry time: (I) am: the entry time is before the latest morning school start time; (II) pm: the entry time is after the earliest afternoon dismissal time; (III) noon 1: the entry time is between the earliest morning dismissal time and the earliest afternoon school start time; (IV) noon 2: the entry time is between the earliest afternoon school start time and the latest afternoon school start time.
(6) Sort each of the four classes by card number and entry time. For a card number with trip records in class (I), the entry station of the first trip record is determined to be the home station and the corresponding exit station the school station; for class (II), the exit station of the last trip record is the home station and the corresponding entry station the school station; for class (III), the entry station of the first trip record is the school station and the corresponding exit station the home station; for class (IV), the exit station of the last trip record is the school station and the corresponding entry station the home station.
(7) From the trip records of all card numbers whose home station and school station have been identified, delete the records departing from the school station before the latest morning school start time and the records departing from the home station after the latest afternoon school time.
(8) Count the school-commute days of each card number, and delete the card numbers whose school-commute days are fewer than a specified threshold, together with their records.
Advantageous effects: compared with the prior art, the invention has the following advantages:
The subway card swiping data used by the invention are easy to obtain, comprehensive and objective, and regularities in subway use are easier to reveal with big data. Although identification of subway commuting behavior is relatively mature, students' daily subway use is diverse because their travel modes are influenced by those of their parents, and conventional commute identification methods cannot identify students' school-commute behavior. The invention therefore provides a school-commute identification method based on subway card swiping data that takes into account possible pick-up and drop-off by parents and incorporates the schools' start and dismissal times. Compared with existing commute identification methods, the temporal and spatial constraints of the invention are stricter and more reasonable, which improves the accuracy of the identification results.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical scheme of the invention is further explained below with reference to the drawing.
As shown in FIG. 1, the method for identifying school-commute behavior based on subway card swiping data comprises three stages: first, data acquisition and preprocessing, corresponding to step 1 in FIG. 1; second, identifying the home station and school station for each card number, corresponding to steps 2 to 6 in FIG. 1; and third, deleting abnormal trip records and card numbers with too few school-commute days together with their trip records, corresponding to steps 7 and 8 in FIG. 1. Each stage is described in detail below.
First, data acquisition and preprocessing
The invention requires IC card data and subway station coordinate data for all subway stations in one city on all working days of three or more consecutive weeks. In the embodiment, the raw data are the card swiping data of all subway stations in Nanjing from October 10 to October 28, 2016. In step 1, the raw data in the database are first saved in CSV format and read with R software, and the columns named "card number", "entry date", "entry time", "exit date", "exit time", "entry station number", "exit station number", "card type", "entry station longitude", "entry station latitude", "exit station longitude" and "exit station latitude" are extracted. The raw data are then preprocessed so that only records whose card type is student card and whose entry date is a working day are retained. In the invention, card type 54 denotes a student card. Records whose entry and exit dates differ and records whose entry and exit station numbers are the same are deleted, yielding the original trip record data X. The data format is shown in Table 1.
Table 1 Example of subway card swiping data
[Table 1: image BDA0001451690880000041 in the original publication]
Note: in practice, coordinate data are kept to 9 decimal places
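The preprocessing rules of step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the embodiment's R code: the record field names (`card_type`, `entry_date`, and so on) are hypothetical, and "working day" is approximated as Monday to Friday with no holiday calendar.

```python
from datetime import date

STUDENT_CARD_TYPE = 54  # card type code for student cards, as stated in the embodiment


def preprocess(records):
    """Keep student-card, working-day trips with same-day entry/exit and distinct stations.

    Each record is a dict with the hypothetical keys card_type, entry_date,
    exit_date, entry_station and exit_station.
    """
    kept = []
    for r in records:
        if r["card_type"] != STUDENT_CARD_TYPE:
            continue  # not a student card
        if r["entry_date"].weekday() >= 5:
            continue  # Saturday (5) or Sunday (6); assumption: no holiday calendar
        if r["entry_date"] != r["exit_date"]:
            continue  # entry and exit dates differ
        if r["entry_station"] == r["exit_station"]:
            continue  # entered and exited at the same station
        kept.append(r)
    return kept
```

Only records that pass all four checks survive, which corresponds to the original trip record data X.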
Second, identifying the home station and school station for each card number
Candidate stations Si1 and Si2 for the home station and school station are first found for each card number i. The trip records between the two candidate stations of each card number are then classified by entry time according to the timetable of the city's primary and secondary schools, and the home station and school station are determined by time rules. The home station represents the location of the student's home, and the school station represents the location of the student's school.
In step 2, the usage frequency of every station is calculated for each card number, the station with the highest frequency is found, and the number of stations sharing the highest frequency is determined for each card number. Specifically, the original trip record data X are sorted by card number, and the three columns card number, entry station number and exit station number are extracted from X and stored as database O. The data in database O are copied to a new database N, whose columns are renamed card number, exit station number and entry station number. Databases O and N are then merged row-wise: the entry station number in O and the exit station number in N are merged into one column, called station number A, and the exit station number in O and the entry station number in N are merged into one column, called station number B, forming database M, whose data format is shown in Table 2. The data in database M are sorted by card number, a loop is used to count how often each station occurs in station number A or station number B for each card number, new columns Si1 and Si2 are defined for all card numbers and initialized to 0 (no Nanjing subway station is numbered 0), and the result is stored in database P, whose data format is shown in Table 3.
Table 2 Data examples in database M
[Table 2: images BDA0001451690880000042 and BDA0001451690880000051 in the original publication]
Table 3 Data examples in database P
Card number     Station number  Frequency  Si1  Si2
9726xxxxxx78    2               1          0    0
9726xxxxxx78    18              7          0    0
9726xxxxxx78    21              2          0    0
9726xxxxxx78    19              4          0    0
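The per-card station frequency count of step 2 can be sketched as follows. This sketch assumes that stacking databases O and N amounts to counting every station once for each trip in which it appears, whether as entry or exit; the function name and the tuple layout are illustrative and not taken from the patent.

```python
from collections import Counter, defaultdict


def station_frequencies(trips):
    """Count, per card, how often each station appears as an entry or exit station.

    trips: iterable of (card_number, entry_station, exit_station) tuples.
    Returns a dict mapping card_number to a Counter of station -> frequency.
    """
    freq = defaultdict(Counter)
    for card, entry_station, exit_station in trips:
        freq[card][entry_station] += 1
        freq[card][exit_station] += 1
    return freq
```

For a card with trips 18 to 19, 19 to 18 and 18 to 21, the counts are 3 for station 18, 2 for station 19 and 1 for station 21.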
In step 3, candidate stations for the home station and school station are found for the first time. The station number with the highest usage frequency is found for each card number in database P, and the number of station numbers sharing the highest frequency is calculated with a loop statement and stored in a new column n. For example, in Table 3, among the stations used by card number 9726xxxxxx78, only station 18 has the highest frequency, so the number of highest-frequency stations is 1. Candidate stations for the home station or school station are then found according to the number of highest-frequency stations, as follows: if there is one such station, it is candidate station Si1 of the card number; if there are two, they are candidate stations Si1 and Si2 of the card number; if there are more than two, adjacent stations are merged and the station with the highest frequency after merging is Si1, and if more than two stations still share the highest frequency after merging, the card number and its data are deleted.
Specifically, if n is 1, Si1 of these card numbers is set to the station number with the highest frequency; all data of these card numbers, including the column "Si1", are stored as nrep and merged with X column-wise by card number (so that, for every card number in the pre-merge nrep, the final nrep contains all columns of X and of the pre-merge nrep). As shown in Table 4, one station has the highest frequency, its station number is 18, and Si1 is set to 18.
Table 4 Data examples in database nrep
[Table 4: images BDA0001451690880000052 and BDA0001451690880000061 in the original publication]
Note: for brevity, the station longitude and latitude columns, which are the same as in X, are omitted from Table 4
If n is 2, Si1 and Si2 of these card numbers are set to the two station numbers (in no particular order), and all data of these card numbers, including the columns "Si1" and "Si2", are stored in rep. As shown in Table 5, when two stations share the highest frequency, the two station numbers are assigned arbitrarily to Si1 and Si2 of the card number; in this embodiment, Si1 = 9 and Si2 = 73.
Table 5 Data examples in database rep
Card number     Station number  Frequency  Si1  Si2
9726xxxxxx52    9               8          9    73
9726xxxxxx52    73              8          9    73
9726xxxxxx52    23              1          9    73
9726xxxxxx52    24              1          9    73
Note: for brevity, the station longitude and latitude columns, which are the same as in X, are omitted from Table 5
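The case analysis on n for the first candidate search can be illustrated with a short sketch. The helper names are hypothetical; the value 0 marks a candidate that has not yet been assigned, mirroring the initialization of Si1 and Si2 in database P, and the n > 2 case is only signalled here because it requires merging adjacent stations.

```python
def top_stations(freq):
    """Return the sorted list of stations tied for the highest frequency."""
    best = max(freq.values())
    return sorted(s for s, c in freq.items() if c == best)


def first_candidates(freq):
    """Return (Si1, Si2) for n = 1 or n = 2, or None when n > 2.

    freq: dict mapping station number -> frequency for one card.
    Si2 is 0 when only one candidate has been found so far.
    """
    top = top_stations(freq)
    if len(top) == 1:
        return top[0], 0
    if len(top) == 2:
        return top[0], top[1]  # the order is arbitrary, as in the embodiment
    return None  # n > 2: adjacent stations must be merged first
```

Applied to the frequencies in Table 3 this gives Si1 = 18 with Si2 still unassigned, and applied to Table 5 it gives Si1 = 9 and Si2 = 73.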
If n is greater than 2, as in Table 6, where station numbers 14, 15 and 23 each appear 3 times, the Euclidean distance between each pair of these stations is calculated from the station longitudes and latitudes by the following formula
[Formula: image BDA0001451690880000062 in the original publication]
and stored in the new column d12.
Table 6 Data examples for n greater than 2
Card number     Station number  Frequency  Station longitude  Station latitude
9961xxxxxx61    14              3          118.79168700       32.08972168
9961xxxxxx61    15              3          118.79687500       32.09790039
9961xxxxxx61    23              3          118.75372310       32.03948975
9961xxxxxx61    9               1          118.77893070       32.04327393
[Formula: image BDA0001451690880000063 in the original publication]
where
[Formula: image BDA0001451690880000064 in the original publication]
Long1 is the longitude of the entry station, Lat1 the latitude of the entry station, Long2 the longitude of the exit station, and Lat2 the latitude of the exit station.
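The patent's distance formula survives here only as image references, so it cannot be reproduced. As a stand-in, the sketch below computes the great-circle (haversine) distance between two longitude/latitude points. The choice of haversine and the Earth radius of 6371 km are assumptions of this sketch and may differ from the formula actually used in the patent, so values near the 1 km threshold may not match the embodiment.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(long1, lat1, long2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    dlat = radians(lat2 - lat1)
    dlong = radians(long2 - long1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlong / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

The argument order follows the symbols defined above (Long1, Lat1, Long2, Lat2).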
If d12 is less than or equal to a specified threshold (set to 1 km in the invention), the two stations are merged: either station is kept, its frequency is replaced by the sum of the two stations' frequencies, and Si1 is then set to the station number with the highest frequency. In the example above, stations 14 and 15 can be merged; the result is shown in Table 7, where Si1 = 14.
Table 7 Data examples for n greater than 2 after processing
Card number     Station number  Frequency  Si1
9961xxxxxx61    14              6          14
9961xxxxxx61    23              3          14
9961xxxxxx61    9               1          14
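The merging of adjacent tied stations can be sketched as below. The pairwise distances are passed in as a precomputed dictionary; the function name, the rule of keeping the lower-numbered station of a merged pair, and the distances used in the usage note are illustrative assumptions (the patent only says that either station may be kept).

```python
def merge_adjacent(freq, dist_km, threshold_km=1.0):
    """Merge stations tied for the top frequency that lie within threshold_km.

    freq: dict mapping station -> frequency for one card.
    dist_km: dict mapping (a, b) with a < b -> distance in km between stations.
    Returns a new dict in which each merged pair is represented by the
    lower-numbered station carrying the summed frequency.
    """
    merged = dict(freq)
    best = max(merged.values())
    tied = sorted(s for s, c in merged.items() if c == best)
    for i, a in enumerate(tied):
        for b in tied[i + 1:]:
            # Skip stations that were already absorbed by an earlier merge.
            if a in merged and b in merged and dist_km[(a, b)] <= threshold_km:
                merged[a] += merged.pop(b)
    return merged
```

With the frequencies of Table 6 and hypothetical distances in which only stations 14 and 15 are within 1 km of each other, the result is station 14 with frequency 6, station 23 with 3 and station 9 with 1, which matches Table 7.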
If two stations are still tied for the highest frequency after merging, Si1 and Si2 are set to these two station numbers; if more than two stations are tied after merging, the trip OD points are considered too dispersed to judge, and the card number and its trip records are deleted. For the card numbers processed in this way, all data in database P including the column "Si1" are stored in nrep1 and merged with X column-wise by card number; nrep1 is then merged row-wise into nrep.
After the above processing, some card numbers already have both candidate stations (Si1 and Si2), while others have only one candidate station (Si1), and the other candidate must still be found. This is step 4, described below, in which candidate stations are searched for a second time. For the card numbers for which only one candidate station Si1 was found in step 3, the frequency of occurrence of the stations corresponding to Si1 is calculated. A corresponding station is defined in the invention as follows: among the trip records of each card number, some involve the station Si1 and some do not; only the records involving Si1 are selected, and the stations in them other than Si1 are the stations corresponding to Si1. The other candidate station Si2 for the home station or school station is found according to the number of corresponding stations with the highest frequency, as follows: if there is one such station, it is the other candidate station Si2 of the card number; if there are two or more, adjacent stations are merged and the station with the highest frequency after merging is Si2, and if stations are still tied after merging, the card number and its records are deleted.
Specifically, for each card number in nrep, the trip records whose entry or exit station number is the candidate station Si1 are selected and stored as O2. As in step 2, the entry and exit station numbers of each card number in O2 are merged and stored as M2; M2 is sorted by card number, all rows whose station number is Si1 are deleted, and the frequency of each remaining station number is counted for each card number in M2. The station number with the highest frequency is found for each card number in M2, the number of station numbers sharing the highest frequency is calculated with a loop statement and stored in a new column n', and a conditional statement is then used for the judgment. If n' is 1, Si2 of these card numbers is set to the station number with the highest frequency. If n' is greater than or equal to 2, then, as in step 3, the Euclidean distance between the stations is calculated by the formula shown in step 3
[Formula: image BDA0001451690880000071 in the original publication]
and stored in the new column d12'. If d12' is less than or equal to 1 km, the two stations are merged and Si2 is set to the station number with the highest frequency after merging; if stations are still tied for the highest frequency after merging, the card number and its trip records are deleted. For the card numbers processed in this way, all data in M2 including the column "Si2" are stored in nrep' and merged with X column-wise by card number, and nrep' is then merged row-wise into nrep.
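The notion of a station "corresponding to" Si1 can be made concrete with a small sketch that counts, for one card, the stations at the other end of every trip involving Si1. The function name and tuple layout are illustrative assumptions; the tie handling for n' of 2 or more (merging adjacent stations) is not repeated here.

```python
from collections import Counter


def corresponding_station_counts(trips, si1):
    """Count the stations paired with si1 in the trips of one card.

    trips: iterable of (entry_station, exit_station) tuples.
    Trips that do not involve si1 are ignored.
    """
    counts = Counter()
    for entry_station, exit_station in trips:
        if entry_station == exit_station:
            continue  # such records are already removed in preprocessing
        if entry_station == si1:
            counts[exit_station] += 1
        elif exit_station == si1:
            counts[entry_station] += 1
    return counts
```

When exactly one station has the highest count, it becomes Si2; for example, `counts.most_common(1)[0][0]` returns that station.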
rep and nrep are merged row-wise and stored in database Q. Trip records in database Q whose entry or exit station number is a station other than Si1 and Si2 are deleted, yielding database R, which contains all school-commute trip records made only between the candidate stations.
The next task is to determine, for each card number i in database R, which of the candidate stations Si1 and Si2 is the home station and which is the school station.
In step 5, the trip records between the two candidate stations of all card numbers are selected and sorted by entry time, and the class of each trip record of each card number is judged in turn from its entry time. According to the timetable of the primary and secondary schools in Nanjing (see Table 8), the trip records between the two candidate stations Si1 and Si2 of each card number i are divided into four classes by entry time: (I) am: the entry time is before the latest morning school start time; (II) pm: the entry time is after the earliest afternoon dismissal time; (III) noon 1: the entry time is in the morning dismissal range, that is, from the earliest morning dismissal time to the earliest afternoon school start time; (IV) noon 2: the entry time is in the afternoon school start range, that is, from the earliest afternoon school start time to the latest afternoon school start time. To tolerate errors, the ranges can be relaxed appropriately; they are set as follows: (I) am: the entry time is before 9:00; (II) pm: the entry time is after 14:00; (III) noon 1: the entry time is between 11:30 and 13:00; (IV) noon 2: the entry time is between 13:00 and 14:00.
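Restricting a card's trips to those made between its two candidate stations can be written as a one-line filter. The dictionary keys below are hypothetical names for the entry and exit station columns.

```python
def between_candidates(trips, si1, si2):
    """Keep only trips whose two endpoints are exactly the stations si1 and si2.

    trips: list of dicts with the hypothetical keys entry_station and exit_station.
    """
    pair = {si1, si2}
    return [t for t in trips if {t["entry_station"], t["exit_station"]} == pair]
```

Because records with identical entry and exit stations were removed during preprocessing, comparing the endpoint set with the candidate pair is equivalent to requiring that both endpoints be candidate stations.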
Table 8 School start and dismissal times of primary and secondary schools in Nanjing
[Table 8: image BDA0001451690880000081 in the original publication]
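The relaxed time classes of the embodiment can be expressed as a small classifier. The thresholds (9:00, 11:30, 13:00, 14:00) come from the text; how the exact boundary instants are assigned is not fully specified there, so the inclusive and exclusive comparisons below are assumptions of this sketch.

```python
from datetime import time


def classify_trip(entry_time):
    """Return 'am', 'noon1', 'noon2', 'pm', or None for an entry time.

    Boundary handling (<=, <, >=) is an assumption; the text only gives the
    relaxed thresholds 9:00, 11:30, 13:00 and 14:00.
    """
    if entry_time <= time(9, 0):
        return "am"
    if time(11, 30) <= entry_time < time(13, 0):
        return "noon1"
    if time(13, 0) <= entry_time < time(14, 0):
        return "noon2"
    if entry_time >= time(14, 0):
        return "pm"
    return None  # between 9:00 and 11:30: not used to identify stations
```

An entry at 7:30 is classified as am, one at 12:00 as noon1, one at 13:30 as noon2, and one at 16:45 as pm.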
In step 6, the home station and the school station are determined from time conditions. The specific procedure is as follows:
(1) Create a "home station" column and a "school station" column in database R and initialize both to 0.
(2) Select from database R the travel records whose arrival time (the time of entering the station) is earlier than or equal to 9:00 and store them in am. Sort am by card number and then by arrival time, and for each card number i select the travel record with the earliest arrival time. Store its arrival station number in the "home station" column and its departure station number in the "school station" column.
(3) Select from database R the travel records whose arrival time is later than or equal to 14:00 and store them in pm. Delete from pm every card number that also appears in am, together with its travel records. Sort pm by card number and then by arrival time, and for each card number i select the travel record with the latest arrival time. Store its departure station number in the "home station" column and its arrival station number in the "school station" column.
(4) Select from database R the travel records whose arrival time is between 11:30 and 13:00 and store them in noon1. Delete from noon1 every card number that also appears in am or pm, together with its travel records. Sort noon1 by card number and then by arrival time, and for each card number i select the travel record with the earliest arrival time. Store its arrival station number in the "school station" column and its departure station number in the "home station" column.
(5) Select from database R the travel records whose arrival time is between 13:00 and 14:00 and store them in noon2. Delete from noon2 every card number that also appears in am, noon1 or pm, together with its travel records. Sort noon2 by card number and then by arrival time, and for each card number i select the travel record with the earliest arrival time. Store its arrival station number in the "home station" column and its departure station number in the "school station" column.
(6) Combine am, pm, noon1 and noon2 by rows and store the result in database R1.
The "home station" and "school station" columns in database R are then deleted, and R and R1 are merged on the card number column and stored in database S, which contains the home station and the school station of each student. In this step, only card numbers whose trips fall in the 9:00 to 11:30 window, and their records, are eliminated.
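The window logic of step 6 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: the record layout (`card`, `arrive`, `arrival_station`, `departure_station`) and the function name `assign_home_school` are hypothetical, 13:00 is assumed to belong to noon2, and for the pm window the departure station of the latest record is assumed to be the home station. The window bounds are those of the embodiment.

```python
from datetime import time

# Windows of the embodiment, checked in the order am, pm, noon1, noon2.
# Fields: name, membership test on arrival time, take earliest record?,
#         is the arrival (entry) station the home station?
WINDOWS = [
    ("am",    lambda t: t <= time(9, 0),                 True,  True),
    ("pm",    lambda t: t >= time(14, 0),                False, False),
    ("noon1", lambda t: time(11, 30) <= t < time(13, 0), True,  False),
    ("noon2", lambda t: time(13, 0) <= t < time(14, 0),  True,  True),
]


def assign_home_school(records):
    """Map each card number to (home_station, school_station).

    records: list of dicts with the hypothetical keys 'card',
    'arrive' (datetime.time), 'arrival_station', 'departure_station'.
    """
    assigned = {}
    for _name, in_window, take_earliest, arrival_is_home in WINDOWS:
        chosen = {}
        for rec in records:
            card = rec["card"]
            # Cards settled by an earlier window are skipped, mirroring the
            # removal of card numbers already present in am, pm or noon1.
            if card in assigned or not in_window(rec["arrive"]):
                continue
            best = chosen.get(card)
            if best is None:
                chosen[card] = rec
            elif take_earliest and rec["arrive"] < best["arrive"]:
                chosen[card] = rec
            elif not take_earliest and rec["arrive"] > best["arrive"]:
                chosen[card] = rec
        for card, rec in chosen.items():
            if arrival_is_home:
                assigned[card] = (rec["arrival_station"], rec["departure_station"])
            else:
                assigned[card] = (rec["departure_station"], rec["arrival_station"])
    return assigned
```

Cards that have no record in any of the four windows (for example, a card that only travels at 10:00) never enter the result, which corresponds to the elimination described at the end of step 6.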
Thirdly, deleting abnormal trip records, and card numbers with too few general (school-commuting) days together with their trip records
In step 7, based on the daily schedule of primary and middle schools, the following are deleted from the trip records of every card number whose home station and school station have been identified: records departing from the school station before the latest morning study time, and records departing from the home station after the latest afternoon study time. In this embodiment the time requirement is relaxed appropriately, so records departing from the school station before 9:00 and records departing from the home station after 16:00 are deleted.
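The filter of step 7 amounts to a predicate applied to each trip record. The sketch below is a hedged illustration: the record keys (`arrive`, `arrival_station`) and the function names are hypothetical, "departing from a station" is read as entering the subway at that station, and the 9:00 and 16:00 bounds are the relaxed values of the embodiment.

```python
from datetime import time

MORNING_LIMIT = time(9, 0)     # relaxed latest morning study time (embodiment)
AFTERNOON_LIMIT = time(16, 0)  # relaxed latest afternoon study time (embodiment)


def is_abnormal(record, home, school):
    """True if the trip leaves school before 9:00 or leaves home after 16:00."""
    leaves_school_early = (record["arrival_station"] == school
                           and record["arrive"] < MORNING_LIMIT)
    leaves_home_late = (record["arrival_station"] == home
                        and record["arrive"] > AFTERNOON_LIMIT)
    return leaves_school_early or leaves_home_late


def drop_abnormal(records, home, school):
    """Keep only the records of one card that pass the step 7 check."""
    return [r for r in records if not is_abnormal(r, home, school)]
```

Here `home` and `school` are the stations found for the card in step 6, so the function would be called once per card number.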
In step 8, a loop is used to count the number of general (school-commuting) days of each card number in database S, and the count is stored in a new column, "general days". The minimum threshold on general days is set according to the date span of the collected data: when data from n consecutive weeks are collected, the minimum threshold is n. Because this embodiment uses subway card swiping records from the working days of three consecutive weeks as the sample, card numbers with fewer than 3 general days are deleted together with their records. This yields the final database Z. Part of the identification results is shown in Table 9, which includes the columns "card number", "date of arrival", "time of arrival", "date of departure", "time of departure", "station number of arrival", "station number of departure", "station longitude of arrival", "station latitude of arrival", "station longitude of departure", "station latitude of departure", "station of home", "station of school" and "general days". The school-commuting students identified in this example account for 40% of the total number of students in the raw data.
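The day-count filter of step 8 can be illustrated as follows. This is a minimal sketch with assumptions: the record keys (`card`, `date`) and the function name are hypothetical, and a "general day" is taken to be any distinct date on which the card still has at least one record after step 7, which the text does not state explicitly.

```python
from collections import defaultdict


def filter_by_general_days(records, min_days):
    """Drop cards seen on fewer than min_days distinct dates.

    Returns (kept_records, days_per_card). With data from n consecutive
    weeks, the embodiment sets min_days = n (3 in the example).
    """
    dates = defaultdict(set)
    for rec in records:
        dates[rec["card"]].add(rec["date"])
    days_per_card = {card: len(ds) for card, ds in dates.items()}
    kept = [r for r in records if days_per_card[r["card"]] >= min_days]
    return kept, days_per_card
```

Using a set per card means that several trips on the same date count as one day.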
Table 9. Example of partial identification results
Figure BDA0001451690880000101
Table 9 (continued). Example of partial identification results
Figure BDA0001451690880000102

Claims (8)

1. A general identification method based on subway card swiping data, characterized by comprising the following steps:
(1) collecting IC card swiping data and subway station coordinate data of each subway station within a certain time period, preprocessing original card swiping data, and removing invalid data;
(2) for each card number, calculating the occurrence frequency of every station in the records of that card number, finding the station or stations with the highest occurrence frequency, and counting how many stations share that highest frequency;
(3) finding the candidate home and school stations of each card number according to the number of stations with the highest occurrence frequency, specifically comprising the following steps:
(31) determining the number of stations with the highest occurrence frequency for each card number: if the number is 1, the corresponding station is the candidate station Si1 of the card number; if the number is 2, the two corresponding stations are the candidate stations Si1 and Si2 of the card number; if the number is greater than 2, merging the stations whose distance does not exceed a specified threshold and then taking the station with the highest frequency as Si1; if two stations have the highest frequency after merging, those two stations are the candidate stations Si1 and Si2 of the card number; and if more than two stations still have the highest frequency after merging, deleting the card number and its records;
(32) for each card number for which only one candidate station Si1 was found in step (31), calculating the occurrence frequency of the stations corresponding to the candidate station Si1, and finding the other candidate station Si2 of the home station or school station according to the number of corresponding stations with the highest occurrence frequency: if that number is 1, the station is the other candidate station Si2 of the card number; if that number is greater than or equal to 2, merging the stations whose distance does not exceed a specified threshold and then taking the station with the highest frequency as Si2; and if stations are still tied after merging, deleting the card number and its records;
(4) dividing the travel records between the candidate stations of every card number into several categories by arrival time, based on the daily schedule of primary and middle schools;
(5) for the travel records falling into each category, determining the home station and the school station according to the earliest or latest arrival time, thereby obtaining the general (school-commuting) travel records.
2. The method for general identification based on subway card swiping data according to claim 1, wherein the original card swiping data in step (1) comprises the following fields: card number, arrival date, arrival time, departure date, departure time, arrival station number, departure station number, and card type.
3. The method for general identification based on subway card swiping data according to claim 2, wherein the preprocessing of the raw data in step (1) comprises: retaining only the records in which the card type is a student card and the arrival date is a working day, and deleting abnormal records in which the arrival station number is the same as the departure station number as well as abnormal records in which the arrival date differs from the departure date.
4. A general identification method based on subway card swiping data as claimed in claim 1, wherein the time period in step (1) is not shorter than three weeks.
5. The method for general identification based on subway card swiping data according to claim 1, wherein the trip category in the step (4) comprises:
am: the arrival time is before the latest study time in the morning;
pm: the arrival time is after the earliest school time in the afternoon;
noon1: the arrival time is within the range from the earliest school time in the morning to the earliest school time in the afternoon;
noon2: the arrival time is within the range from the earliest time of study in the afternoon to the latest time of study in the afternoon.
6. A general identification method based on subway card swiping data according to claim 5, wherein said step (5) comprises: sorting each of the four categories by card number and arrival time; for a card number with travel records belonging to am, determining the station entered in its first travel record as the home station and the station exited in that record as the school station; for a card number with travel records belonging to pm, determining the station exited in its last travel record as the home station and the station entered in that record as the school station; for a card number with travel records belonging to noon1, determining the station entered in its first travel record as the school station and the station exited in that record as the home station; for a card number with travel records belonging to noon2, determining the station exited in its last travel record as the school station and the station entered in that record as the home station.
7. The method for general identification based on subway card swiping data according to claim 1, characterized by further comprising: based on the daily schedule of primary and middle schools, deleting, from the trip records of all card numbers whose home station and school station have been identified, the records departing from the school station before the latest morning study time and the records departing from the home station after the latest afternoon study time.
8. The method for general identification based on subway card swiping data according to claim 1, characterized by further comprising: deleting card numbers whose number of general (school-commuting) days is below a specified threshold, together with their trip records.
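The frequency counting behind steps (2), (31) and (32) of claim 1 can be sketched as below. This is an illustration only, under several assumptions: the record keys and function names are hypothetical, a station's occurrence count is taken to include both entries and exits, the "stations corresponding to Si1" are read as the stations at the other end of trips through Si1, and the distance-based merging of tied stations is not shown.

```python
from collections import Counter


def top_stations(records, card):
    """Stations tied for the highest occurrence count of one card number."""
    counts = Counter()
    for rec in records:
        if rec["card"] == card:
            counts[rec["arrival_station"]] += 1
            counts[rec["departure_station"]] += 1
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(s for s, c in counts.items() if c == highest)


def partner_counts(records, card, si1):
    """Occurrence counts of the stations at the other end of trips via Si1."""
    partners = Counter()
    for rec in records:
        if rec["card"] != card:
            continue
        if rec["arrival_station"] == si1:
            partners[rec["departure_station"]] += 1
        elif rec["departure_station"] == si1:
            partners[rec["arrival_station"]] += 1
    return partners
```

The length of the list returned by `top_stations` selects the branch of step (31): one station gives Si1 only, two give Si1 and Si2, and more than two would trigger the merging step.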
CN201711043136.0A 2017-10-31 2017-10-31 General recognition method based on subway card swiping data Active CN107818415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711043136.0A CN107818415B (en) 2017-10-31 2017-10-31 General recognition method based on subway card swiping data

Publications (2)

Publication Number Publication Date
CN107818415A CN107818415A (en) 2018-03-20
CN107818415B true CN107818415B (en) 2021-07-09

Family

ID=61604455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711043136.0A Active CN107818415B (en) 2017-10-31 2017-10-31 General recognition method based on subway card swiping data

Country Status (1)

Country Link
CN (1) CN107818415B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681741B (en) * 2018-04-08 2021-11-12 东南大学 Subway commuting crowd information fusion method based on IC card and resident survey data
CN109508815B (en) * 2018-10-19 2021-08-10 东南大学 General activity spatial measure analysis method based on subway IC card data
CN109784636A (en) * 2018-12-13 2019-05-21 中国平安财产保险股份有限公司 Fraudulent user recognition methods, device, computer equipment and storage medium
CN110472813B (en) * 2019-06-24 2023-12-22 广东浤鑫信息科技有限公司 Self-adaptive adjustment method and system for school bus station

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1237129A1 (en) * 2001-03-02 2002-09-04 Hitachi, Ltd. Service providing method
CN103198104A (en) * 2013-03-25 2013-07-10 东南大学 Bus station origin-destination (OD) obtaining method based on urban advanced public transportation system
CN103279534A (en) * 2013-05-31 2013-09-04 西安建筑科技大学 Public transport card passenger commuter OD (origin and destination) distribution estimation method based on APTS (advanced public transportation systems)
CN105718946A (en) * 2016-01-20 2016-06-29 北京工业大学 Passenger going-out behavior analysis method based on subway card-swiping data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
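Influence of the degree of commuting constraint on the decision-making process for escorting children on trips (通勤制约度对儿童陪伴出行决策过程的影响); He Baohong et al.; Journal of Transportation Systems Engineering and Information Technology (《交通运输系统工程与信息》); 2014-12-31; Vol. 14, No. 6; pp. 223-230 *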

Also Published As

Publication number Publication date
CN107818415A (en) 2018-03-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant