CN103699601A - Temporal-spatial data mining-based metro passenger classification method - Google Patents

Info

Publication number
CN103699601A
Authority
CN
China
Prior art keywords
time
passenger
days
smart card
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310683227.6A
Other languages
Chinese (zh)
Other versions
CN103699601B (en)
Inventor
赵娟娟
张帆
田臣
须成忠
白雪
邹瑜斌
罗俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Beidou Intelligent Technology Co., Ltd.
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201310683227.6A
Publication of CN103699601A
Application granted
Publication of CN103699601B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a metro passenger classification method based on spatio-temporal data mining, comprising the steps of: 1, basic data calculation, including valid smart card statistics and inter-station similarity calculation; 2, data preprocessing; and 3, passenger classification. The method analyzes passengers' travel characteristics with time-based and space-based travel-regularity algorithms, clusters passengers with similar characteristics, and finally classifies them into five classes: class 1, passengers who rarely travel; class 2, passengers regular in time only; class 3, passengers regular in space only; class 4, passengers regular in both time and space; and class 5, passengers regular in neither time nor space. The classification method is effective and accurate, and classifying passengers in this way makes it possible to understand their living patterns.

Description

Metro passenger classification method based on spatio-temporal data mining
Technical field
The present invention belongs to the field of information and data processing, and specifically relates to a metro passenger classification method based on spatio-temporal data mining.
Background art
Smart cards are now widely used in banking, healthcare, transport, campuses and many other fields because they are convenient and fast. As a means of paying for public transport, they are widely used in urban transport, including buses, metro, taxis, ferries and parking lots. Because transport smart cards are economical and convenient, more and more passengers use them to pay for public transport trips, and the recorded ride transaction data are becoming more complete and accurate.
In the era of big data, the transport smart card, as a means of payment, collects abundant user travel data. Analyzing these data effectively and applying them to transport decision-making and to the analysis of passengers' living patterns is of significant research value for building high-quality public transport services, making residents' daily travel easier, and understanding residents' living characteristics and routines. Replacing traditional manual surveys of passenger behavior with research based on transport smart card transaction data is becoming increasingly feasible.
According to statistics, in Shenzhen the number of passengers who pay for public transport trips by swiping an SZT card has reached 10 million. The metro is highly attractive to passengers because of its large capacity, speed, punctuality, safety and reliability, and immunity to weather and surface traffic, and Shenzhen Metro has become the preferred mode of public transport for Shenzhen residents. By the end of 2013, Shenzhen Metro had 5 lines and 131 stations in operation, with a passenger volume of 2.5 million passenger-trips per day, accounting for one third of all public transport trips in Shenzhen.
Existing passenger classification mainly analyzes similarities and differences by smart card type, for example student cards, senior citizen cards and ordinary cards; it does not cluster passengers according to their own riding characteristics.
Summary of the invention
The technical problem to be solved by the present invention is to provide a metro passenger classification method based on spatio-temporal data mining that is both effective and accurate.
The technical solution of the present invention is a metro passenger classification method based on spatio-temporal data mining, comprising the steps of:
S1, basic data calculation, including valid smart card statistics and inter-station similarity calculation;
S2, data preprocessing:
S21, filtering out transaction records with missing fields;
S22, sorting all transaction records of the smart card by time, and deriving each ride record of the smart card;
S23, calculating the total number of swipe days of each smart card;
S24, aggregating the results of S22 and S23;
S25, performing steps S21-S24 for each smart card until all smart cards have been processed;
S26, collecting statistics on the output and calculating the probability distribution of riding days;
S3, passenger classification:
S31, extracting the ride records of each smart card;
S32, determining whether the number of swipe days is less than a threshold; if so, outputting class 1 (passenger who rarely travels) and going to S36; otherwise performing S33;
S33, using the time-based user travel-regularity algorithm Tm-ODCluster to calculate the densest time period Sm and the time density probability Pt, and determining whether the passenger is regular in time; if so, going to S34; otherwise going to S35;
S34, within the time period Sm, using the space-based user travel-regularity algorithm Sp-ODCluster to determine whether the travel locations are regular; if so, outputting class 4 (passenger regular in both time and space); otherwise outputting class 2 (passenger regular in time only); then going to S36;
S35, using the space-based user travel-regularity algorithm Sp-ODCluster to determine whether the travel locations are regular over the whole day; if so, outputting class 3 (passenger regular in space only); otherwise outputting class 5 (passenger regular in neither time nor space);
S36, determining whether all smart cards have been processed; if so, going to S37; otherwise going to S31;
S37, passenger classification ends.
Preferably, the basic data comprise a smart card transaction table, a metro terminal table and a metro line table;
The smart card transaction table comprises CardID, TrmnlID, TrnsctTime and TrnsctyType, where CardID is the unique identifier of the smart card, TrmnlID is the unique identifier of the card-swiping terminal at the metro station, TrnsctTime is the transaction (swipe) time, and TrnsctyType is the entry/exit type;
The metro line table comprises RouteID, PathInfo and Type, where RouteID is the line name, PathInfo is the list of stations along the line, and Type is the line type.
Preferably, the valid smart card statistics select the smart cards that have transaction records before, during and after the test period; the inter-station similarity calculation determines whether two stations are no more than one stop apart.
Preferably, deriving each ride record in step S22 means matching the origin and destination of each ride, the format of a ride record being: entry station name, exit station name, entry time, exit time, ride duration.
Preferably, the time-based user travel-regularity algorithm Tm-ODCluster comprises:
S331, taking one day as the cycle and 30 minutes as the slot length, calculating the riding state (0 or 1) of every time slot of each day;
S332, calculating the number of riding days T_i of each time period, each period spanning three consecutive 30-minute slots (one and a half hours):
$$T_i = \sum_{j=1}^{Dnum} \left( t_{j,i} \mid t_{j,(i+1)\%48} \mid t_{j,(i+2)\%48} \right),$$
where t_{j,i} is the riding state of slot i on day j, | denotes logical OR, Dnum is the total number of transaction days, and i = 1, 2, 3, ..., 48;
S333, finding the time period Sm with the densest card swiping, and calculating the time density probability Pt = Sm/DNUM, where Sm in this ratio stands for the number of riding days of that period and DNUM is the passenger's total number of swipe days;
S334, if Pt is greater than the time density threshold Thrt, the passenger is regular in time and the method goes to step S34; if Pt is less than the time density threshold Thrt, the passenger is irregular in time and the method goes to step S35.
Preferably, the space-based user travel-regularity algorithm Sp-ODCluster comprises:
S341, querying all ride records within the time period T and labeling each ride record as (O, D), where O is the entry station and D is the exit station, and counting the number of days on which the passenger entered at O and exited at D;
forming the data record set ODLIST(O, D, daynum, timelst), where daynum is the number of days and timelst is the set of times;
S342, clustering the OD pairs with the OD-cluster algorithm, the similarity between two stations being 1 if they are adjacent stations and 0 otherwise;
S343, taking the total number of days Dmax of the largest cluster and calculating the spatial density probability Ps = Dmax/DNUM, where DNUM is the passenger's total number of swipe days;
if Ps is greater than the spatial density threshold Thrs, the passenger is regular in space; otherwise the passenger is irregular in space;
S344, the spatial regularity analysis ends.
Preferably, the OD-cluster algorithm comprises:
S3421, extracting objects P in order from the data record set ODLIST and determining whether any cluster already exists; if so, going to step S3422;
otherwise creating a cluster: a new cluster C is created with the object P as its center and the number of days of P as its total number of days, P is added to the new cluster C, and P is marked as processed;
S3422, calculating the distance between the object P and the center of each cluster;
if the center of some cluster Ci and the object P satisfy the similarity criterion, P is assigned to the cluster Ci, and the total number of days of Ci = total number of days of Ci + number of days of P - (number of members in the intersection of the time set of P and the time set of Ci);
otherwise a new cluster C is created with P as its center and the number of days of P as its total number of days, and P is added to the new cluster C;
S3423, repeating the above steps until all records have been processed and assigned to some cluster Ci, and sorting all clusters Ci by total number of days in descending order.
The beneficial effect of the present invention is as follows. Using user travel-regularity algorithms based on time and space, the method analyzes passengers' travel characteristics, clusters passengers with similar characteristics, and finally classifies them into class 1, passengers who rarely travel; class 2, passengers regular in time only; class 3, passengers regular in space only; class 4, passengers regular in both time and space; and class 5, passengers regular in neither time nor space. The classification method is effective and accurate, and dividing passengers in this way makes it possible to understand their living patterns.
Brief description of the drawings
Fig. 1 is the overall flowchart of the metro passenger classification method of the present invention.
Fig. 2 is the data preprocessing flowchart of the metro passenger classification method of the present invention.
Fig. 3 is the Tm-ODCluster flowchart of the metro passenger classification method of the present invention.
Fig. 4 is the Sp-ODCluster flowchart of the metro passenger classification method of the present invention.
Fig. 5 shows the relationship between the number of riding days and the number of passengers.
Fig. 6 shows the classification of passengers who had transactions during the test period.
Fig. 7 summarizes the classes of passengers with transaction records on August 25.
Fig. 8 summarizes the classes of passengers with transaction records on August 24.
Fig. 9 summarizes the classes of passengers with transaction records on August 21.
Fig. 10 compares the passenger classification for the days from 2013-8-19 to 2013-8-25.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
The invention provides a metro passenger classification method based on spatio-temporal data mining which, as shown in Fig. 1, comprises the steps of:
Step S1, basic data calculation, including valid smart card statistics and inter-station similarity calculation;
Step S2, data preprocessing, as shown in Fig. 2:
S21, filtering out transaction records with missing fields, such as records lacking the ride time, the card number or the station information.
S22, taking the smart card number as the passenger's unique identifier, sorting all transaction records of the smart card by time, the transaction records comprising exit and entry information, and deriving each ride record of the smart card by matching the origin and destination of each ride; each ride record has the format: entry station name, exit station name, entry time, exit time, ride duration, where the ride duration is in minutes.
S23, calculating the total number of swipe days of each smart card; for example, if card number 1234567 was swiped on 20 days in total, the count is 20.
S24, aggregating the results of S22 and S23; the result has the format (card number, {ride record 1}, {ride record 2}, ..., {ride record N}, number of riding days).
S25, performing steps S21-S24 for each smart card until all smart cards have been processed;
S26, collecting statistics on the output and calculating the probability distribution of riding days;
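Steps S22 and S23 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tuple layout, the function names build_rides and swipe_days, and the rule that a newer entry swipe replaces an unmatched earlier one are all hypothetical; only the entry/exit codes 21 and 22 come from the transaction table description.

```python
from datetime import datetime

ENTER, EXIT = 21, 22  # entry/exit type codes from the transaction table


def build_rides(transactions):
    """Pair each entry swipe of one card with the next exit swipe.

    transactions: list of (station, time, type) tuples for a single card.
    Returns (entry_station, exit_station, entry_time, exit_time, minutes).
    """
    rides = []
    pending = None  # last unmatched entry swipe (assumed pairing rule)
    for station, time, kind in sorted(transactions, key=lambda t: t[1]):
        if kind == ENTER:
            pending = (station, time)  # a newer entry replaces an unmatched one
        elif kind == EXIT and pending is not None:
            minutes = int((time - pending[1]).total_seconds() // 60)
            rides.append((pending[0], station, pending[1], time, minutes))
            pending = None
    return rides


def swipe_days(transactions):
    """Number of distinct calendar days on which the card was swiped (S23)."""
    return len({time.date() for _, time, _ in transactions})


# Hypothetical sample data: one complete ride and one unmatched entry.
sample = [
    ("Luohu", datetime(2013, 5, 2, 8, 0), ENTER),
    ("Guomao", datetime(2013, 5, 2, 8, 12), EXIT),
    ("Guomao", datetime(2013, 5, 3, 18, 0), ENTER),
]
sample_rides = build_rides(sample)
sample_days = swipe_days(sample)
```

The unmatched entry on the second day produces no ride record but still counts as a swipe day, which mirrors the distinction the method draws between ride records and swipe days.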
Step S3, passenger classification:
S31, extracting the ride records of each smart card;
S32, determining whether the number of swipe days is less than a threshold; if so, outputting class 1 (passenger who rarely travels) and going to S36; otherwise performing S33;
S33, using the time-based user travel-regularity algorithm Tm-ODCluster to calculate the densest time period Sm and the time density probability Pt, and determining whether the passenger is regular in time; if so, going to S34; otherwise going to S35;
S34, within the time period Sm, using the space-based user travel-regularity algorithm Sp-ODCluster to determine whether the travel locations are regular; if so, outputting class 4 (passenger regular in both time and space); otherwise outputting class 2 (passenger regular in time only); then going to S36;
S35, using the space-based user travel-regularity algorithm Sp-ODCluster to determine whether the travel locations are regular over the whole day; if so, outputting class 3 (passenger regular in space only); otherwise outputting class 5 (passenger regular in neither time nor space);
S36, determining whether all smart cards have been processed; if so, going to S37; otherwise going to S31;
S37, passenger classification ends.
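The branching in S32 to S35 amounts to a small decision tree over three quantities: the number of swipe days, the time density probability Pt, and the spatial density probability Ps. The sketch below illustrates that logic only. The function name classify and all three default threshold values are hypothetical, since the description does not state concrete thresholds.

```python
def classify(swipe_days, pt, ps_in_sm, ps_all_day,
             min_days=5, thr_t=0.5, thr_s=0.5):
    """Return the passenger class (1 to 5) for one smart card.

    pt          time density probability from Tm-ODCluster
    ps_in_sm    spatial density probability within the densest period Sm
    ps_all_day  spatial density probability over the whole day
    The threshold defaults are placeholders, not values from the patent.
    """
    if swipe_days < min_days:                 # S32
        return 1                              # rarely travels
    if pt > thr_t:                            # S33: regular in time
        return 4 if ps_in_sm > thr_s else 2   # S34
    return 3 if ps_all_day > thr_s else 5     # S35


# Hypothetical inputs covering each of the five classes.
class_rare = classify(2, 0.9, 0.9, 0.9)
class_time_only = classify(30, 0.8, 0.2, 0.2)
class_space_only = classify(30, 0.3, 0.1, 0.7)
class_both = classify(30, 0.8, 0.9, 0.9)
class_neither = classify(30, 0.3, 0.1, 0.2)
```

Note that the spatial test uses a different input depending on the branch: rides inside Sm for passengers regular in time, and all rides of the day otherwise.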
In this embodiment of the present invention, user travel-regularity algorithms based on time and space are used to analyze passengers' travel characteristics, cluster passengers with similar characteristics, and finally classify them into class 1, passengers who rarely travel; class 2, passengers regular in time only; class 3, passengers regular in space only; class 4, passengers regular in both time and space; and class 5, passengers regular in neither time nor space. The classification method is effective and accurate, and dividing passengers in this way makes it possible to understand their living patterns.
The basic data comprise a smart card transaction table, a metro terminal table and a metro line table;
The smart card transaction table comprises CardID, TrmnlID, TrnsctTime and TrnsctyType, where CardID is the unique identifier of the smart card, TrmnlID is the unique identifier of the card-swiping terminal at the metro station, TrnsctTime is the transaction (swipe) time, and TrnsctyType is the entry/exit type; entry and exit are denoted by 21 and 22, respectively.
The metro line table comprises RouteID, PathInfo and Type, where RouteID is the line name, PathInfo is the list of stations along the line, and Type is the line type, i.e. the direction: 1 for up, 2 for down.
The valid smart card statistics select the smart cards that have transaction records before, during and after the test period. For example, if the transaction data from 2013-05-01 to 2013-07-01 are used as the data source, a valid smart card must have transaction records before 2013-05-01, after 2013-07-01, and between 2013-05-01 and 2013-07-01.
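The valid-card rule can be expressed as three existence checks. The following is a minimal sketch assuming that each card's transactions are available as a list of dates; the function name is_valid_card and the inclusive treatment of the period boundaries are assumptions.

```python
from datetime import date


def is_valid_card(transaction_dates, start, end):
    """True if the card has transactions before, during and after [start, end]."""
    has_before = any(d < start for d in transaction_dates)
    has_during = any(start <= d <= end for d in transaction_dates)
    has_after = any(d > end for d in transaction_dates)
    return has_before and has_during and has_after


# Test period taken from the example in the description.
period_start, period_end = date(2013, 5, 1), date(2013, 7, 1)
card_kept = is_valid_card(
    [date(2013, 4, 20), date(2013, 6, 3), date(2013, 7, 15)],
    period_start, period_end)
card_dropped = is_valid_card(
    [date(2013, 6, 3), date(2013, 7, 15)],  # no record before the period
    period_start, period_end)
```

A plausible reading of this filter is that it keeps cards that were in use throughout the test period, so that a low swipe-day count reflects travel behavior rather than a card issued or retired partway through.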
The inter-station similarity calculation determines whether two stations are no more than one stop apart; the similarity between two stations is represented by a Boolean value (1 for identical or adjacent stations, 0 otherwise), as shown in Table 1:
Table 1
                 Luohu   Guomao   Laojie   Grand Theater   Science Museum
Luohu              1        1        0           0                0
Guomao             1        1        1           0                0
Laojie             0        1        1           1                0
Grand Theater      0        0        1           1                1
Science Museum     0        0        0           1                1
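A similarity matrix of this kind can be derived from the ordered station list of each line. The sketch below assumes that PathInfo can be read as a list of station names in running order; the function names build_similarity and station_similarity are hypothetical.

```python
def build_similarity(paths):
    """Map each station to the set of stations considered similar to it.

    paths: list of station-name lists, each in running order along one line.
    A station is similar to itself and to its immediate neighbours.
    """
    similar = {}
    for path in paths:
        for station in path:
            similar.setdefault(station, set()).add(station)
        for a, b in zip(path, path[1:]):  # consecutive stations are adjacent
            similar[a].add(b)
            similar[b].add(a)
    return similar


def station_similarity(similar, a, b):
    """Boolean similarity as 1 or 0, matching the table layout."""
    return 1 if b in similar.get(a, ()) else 0


# The five stations of the table, in running order.
line = ["Luohu", "Guomao", "Laojie", "Grand Theater", "Science Museum"]
sim = build_similarity([line])
row_luohu = [station_similarity(sim, "Luohu", s) for s in line]
row_laojie = [station_similarity(sim, "Laojie", s) for s in line]
```

Because the sets are merged across all paths, an interchange station that appears on several lines automatically becomes similar to its neighbours on each of them.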
Preferably, deriving each ride record in step S22 means matching the origin and destination of each ride, the format of a ride record being: entry station name, exit station name, entry time, exit time, ride duration. Three kinds of ride records are further filtered out: 1, records with a missing origin; 2, records with a missing destination; 3, records whose ride duration exceeds a threshold, for example a ride lasting 23 hours. Finally, taking one entry-to-exit trip as the unit, all rides of each passenger are sorted by entry time.
Preferably, as shown in Fig. 3, the time-based user travel-regularity algorithm Tm-ODCluster comprises:
S331, taking one day as the cycle and 30 minutes as the slot length, calculating the riding state (0 or 1) of every time slot of each day;
In this step, (9:30-10:00, 0) indicates that there is no ride record between 9:30 and 10:00, and (9:30-10:00, 1) indicates that there is a ride record between 9:30 and 10:00. As shown in Table 2, each row has 48 columns, written (ti1, ti2, ti3, ..., tij, ...), where i denotes the i-th day and j the time slot; ti1 is the riding state of day i for 0:00-0:29, ti2 is the riding state of day i for 0:30-0:59, and so on.
Table 2
day\time 1 14 15 16 17 18 19 20 35 36 37 38 39 43 44 45
1 0 ? 0 ? 0 0 0 1 0 ? 0 0 0 0 0 ? 1 0 0 ?
2 0 ? 0 ? 0 0 0 1 0 ? 0 0 0 0 0 ? 0 1 0 ?
3 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
4 0 ? 0 ? 0 1 0 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
5 0 ? 0 ? 0 1 0 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
6 0 ? 0 ? 1 0 0 0 0 ? 0 0 0 0 0 ? 0 0 0 ?
7 0 ? 0 ? 0 0 0 0 1 ? 0 0 0 0 0 ? 0 0 1 ?
8 0 ? 0 ? 0 1 0 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
9 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
10 0 ? 1 ? 0 0 0 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
11 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
12 0 ? 0 ? 0 0 0 1 0 ? 0 0 0 0 0 ? 0 0 1 ?
13 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 0 0 ? 1 0 0 ?
14 0 ? 0 ? 0 0 0 1 0 ? 0 0 0 0 0 ? 0 0 1 ?
15 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
16 0 ? 0 ? 0 1 0 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
17 0 ? 0 ? 0 0 0 1 0 ? 0 0 0 0 0 ? 0 0 1 ?
18 0 ? 0 ? 0 0 0 1 0 ? 0 0 0 0 0 ? 0 0 0 ?
19 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
20 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
21 0 ? 0 ? 0 1 0 0 0 ? 0 0 1 0 0 ? 0 0 0 ?
22 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 0 0 ? 0 0 0 ?
23 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
24 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
25 0 ? 0 ? 0 1 0 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
26 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
27 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
28 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
29 0 ? 0 ? 0 0 0 1 0 ? 0 0 0 1 0 ? 0 0 0 ?
30 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
31 0 ? 0 ? 0 0 0 1 0 ? 0 0 0 1 0 ? 0 0 0 ?
32 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
33 0 ? 0 ? 0 0 0 0 1 ? 0 0 0 1 0 ? 0 0 0 ?
34 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
35 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
36 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
37 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
38 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 1 0 ? 0 0 0 ?
39 0 ? 0 ? 0 0 0 1 0 ? 0 0 0 1 0 ? 0 0 0 ?
40 0 ? 0 ? 0 0 0 1 0 ? 0 0 0 0 0 ? 0 0 1 ?
41 0 ? 0 ? 0 1 0 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
42 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
43 0 ? 0 ? 0 0 0 1 0 ? 0 0 0 0 0 ? 0 0 1 ?
44 0 ? 0 ? 0 0 1 0 0 ? 0 0 0 0 0 ? 0 0 1 ?
45 0 ? 0 ? 0 0 0 0 0 ? 0 0 0 0 0 ? 0 0 0 ?
S332, calculating the number of riding days T_i of each time period. Each time period is one and a half hours long (three consecutive 30-minute slots), so a day has 48 overlapping time periods T1, T2, T3, ..., T48, namely 0:00-1:29, 0:30-1:59, 1:00-2:29, ..., 23:30-00:59:
$$T_i = \sum_{j=1}^{Dnum} \left( t_{j,i} \mid t_{j,(i+1)\%48} \mid t_{j,(i+2)\%48} \right),$$
where t_{j,i} is the riding state of slot i on day j, | denotes logical OR, Dnum is the total number of transaction days, and i = 1, 2, 3, ..., 48. The results are shown in Table 3:
Table 3
1 14 15 16 17 18 19 20 35 36 37 38 39 43 44 45
0 ? 2 8 30 40 35 13 2 0 1 18 18 17 0 3 23 21 20 ?
S333, comparing the numbers of swipe days of all time periods to find the time period Sm with the densest card swiping, i.e. the period in which the card is swiped most; in Table 3 this is T17 (8:00 to 9:30). The time density probability Pt = Sm/DNUM is then calculated, where Sm in this ratio stands for the number of riding days of that period and DNUM is the passenger's total number of swipe days; for the data in Table 2, Pt = 40/45 ≈ 0.889.
S334, if Pt is greater than the time density threshold Thrt, the passenger is regular in time and the method goes to step S34; if Pt is less than the time density threshold Thrt, the passenger is irregular in time and the method goes to step S35.
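Steps S331 to S334 can be sketched as follows. This is an illustration under assumptions rather than the patented implementation: the riding state of a slot is taken from the entry time of each ride, slots are indexed from 0 internally so that the modulo wrap-around is well defined, each period is read as three consecutive 30-minute slots, and the names slot_states and tm_odcluster are hypothetical.

```python
from datetime import time


def slot_states(entry_times):
    """S331: 48 half-hour flags for one day, 1 if a ride starts in the slot."""
    states = [0] * 48
    for t in entry_times:
        states[t.hour * 2 + t.minute // 30] = 1
    return states


def tm_odcluster(day_states, thr_t):
    """S332 to S334 for one card.

    day_states: one 48-element 0/1 list per swipe day.
    Returns (densest period number 1..48, Pt, regular_in_time).
    """
    dnum = len(day_states)
    if dnum == 0:
        raise ValueError("card has no swipe days")
    counts = []
    for i in range(48):
        # A 1.5-hour period covers three consecutive slots, wrapping at midnight.
        window = (i, (i + 1) % 48, (i + 2) % 48)
        counts.append(sum(1 for day in day_states
                          if any(day[k] for k in window)))
    best = max(range(48), key=lambda i: counts[i])  # first maximum on ties
    pt = counts[best] / dnum
    return best + 1, pt, pt > thr_t


# Hypothetical card: three morning rides at slightly different times, one evening ride.
days = [
    slot_states([time(8, 10)]),
    slot_states([time(8, 40)]),
    slot_states([time(9, 10)]),
    slot_states([time(20, 0)]),
]
period, pt, regular = tm_odcluster(days, thr_t=0.5)
```

In this example the three morning rides fall in different 30-minute slots but in the same 1.5-hour period T17 (8:00 to 9:29), which is the purpose of the overlapping window: small day-to-day shifts in departure time still count as one habit.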
As shown in Fig. 4, the space-based user travel-regularity algorithm Sp-ODCluster comprises:
S341, querying all ride records within the time period T and labeling each ride record as (O, D), where O is the entry station and D is the exit station, and counting the number of days on which the passenger entered at O and exited at D;
forming the data record set ODLIST(O, D, daynum, timelst), where daynum is the number of days and timelst is the set of times;
S342, clustering the OD pairs with the OD-cluster algorithm, the similarity between two stations being 1 if they are adjacent stations and 0 otherwise;
Passengers generally board or alight only at the stations nearest to their location on either side. For example, Tanglang and University Town are two adjacent stations, so (Tanglang, Window of the World) and (University Town, Window of the World) may be two similar trips.
S343, taking the total number of days Dmax of the largest cluster and calculating the spatial density probability Ps = Dmax/DNUM, where DNUM is the passenger's total number of swipe days;
if Ps is greater than the spatial density threshold Thrs, the passenger is regular in space; otherwise the passenger is irregular in space;
S344, the spatial regularity analysis ends.
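The record set ODLIST formed in S341 can be built with a simple grouping pass. The sketch below assumes rides are given as (origin, destination, day) tuples and represents the time set as a Python set of day labels; the name build_odlist is hypothetical.

```python
from collections import defaultdict


def build_odlist(rides):
    """S341: group rides by (O, D) and count the distinct days of each pair.

    rides: iterable of (origin, destination, day) tuples.
    Returns [(O, D, daynum, dayset)] sorted by daynum, largest first.
    """
    days = defaultdict(set)
    for origin, destination, day in rides:
        days[(origin, destination)].add(day)  # repeated rides on one day count once
    odlist = [(o, d, len(s), s) for (o, d), s in days.items()]
    odlist.sort(key=lambda rec: rec[2], reverse=True)
    return odlist


# Hypothetical rides; the second and third fall on the same day.
sample_rides = [
    ("Tanglang", "Window of the World", "2013-05-02"),
    ("Tanglang", "Window of the World", "2013-05-03"),
    ("Tanglang", "Window of the World", "2013-05-03"),
    ("Tanglang", "Window of the World", "2013-05-06"),
    ("University Town", "Window of the World", "2013-05-07"),
]
odlist = build_odlist(sample_rides)
```

Sorting by number of days in descending order means that the most frequent OD pair is processed first by the clustering step and therefore becomes the center of its cluster.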
The center point of each cluster is defined as the ride (an entry station and an exit station) with the most riding days in that cluster. The OD-cluster algorithm is as follows:
Input: the data record set ODLIST(entry station, exit station, number of days, time set), sorted by number of days in descending order.
Output: clusters of the form (center point, total number of days, records of the cluster), where the center point has the form {entry station, exit station}.
S3421, extracting objects P in order from the data record set ODLIST and determining whether any cluster already exists; if so, going to step S3422;
otherwise creating a cluster: a new cluster C is created with the object P as its center and the number of days of P as its total number of days, P is added to the new cluster C, and P is marked as processed;
S3422, calculating the distance between the object P and the center of each cluster;
if the center of some cluster Ci and the object P satisfy the similarity criterion, P is assigned to the cluster Ci, and the total number of days of Ci = total number of days of Ci + number of days of P - (number of members in the intersection of the time set of P and the time set of Ci), which avoids counting the same day more than once;
otherwise a new cluster C is created with P as its center and the number of days of P as its total number of days, and P is added to the new cluster C;
S3423, repeating the above steps until all records have been processed and assigned to some cluster Ci, and sorting all clusters Ci by total number of days in descending order.
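The clustering loop and the spatial density probability Ps can be sketched as below. Two points are assumptions: the similarity criterion is read as "entry stations similar and exit stations similar", which fits the Tanglang and University Town example but is not spelled out in the text, and the day total is kept as the size of a set union, which equals the stated formula of total + days(P) - size of the intersection. The names od_cluster, spatial_density and the toy similar function are hypothetical.

```python
def od_cluster(odlist, similar):
    """Greedy single-pass clustering of OD pairs.

    odlist:  [(O, D, daynum, dayset)] sorted by daynum, largest first.
    similar: function (station_a, station_b) -> bool.
    Returns clusters sorted by total number of days, largest first.
    """
    clusters = []
    for origin, dest, _daynum, dayset in odlist:
        for cluster in clusters:
            c_origin, c_dest = cluster["center"]
            if similar(origin, c_origin) and similar(dest, c_dest):
                cluster["days"] |= dayset        # union avoids double counting
                cluster["records"].append((origin, dest))
                break
        else:  # no existing cluster matched: start a new one centred on P
            clusters.append({"center": (origin, dest),
                             "days": set(dayset),
                             "records": [(origin, dest)]})
    clusters.sort(key=lambda c: len(c["days"]), reverse=True)
    return clusters


def spatial_density(clusters, dnum):
    """Ps = Dmax / DNUM, using the largest cluster's total number of days."""
    if not clusters or dnum == 0:
        return 0.0
    return len(clusters[0]["days"]) / dnum


ADJACENT = {frozenset(("Tanglang", "University Town"))}  # toy adjacency data


def similar(a, b):
    return a == b or frozenset((a, b)) in ADJACENT


records = [
    ("Tanglang", "Window of the World", 3, {1, 2, 3}),
    ("University Town", "Window of the World", 2, {3, 4}),
    ("Luohu", "Guomao", 1, {5}),
]
clusters = od_cluster(records, similar)
ps = spatial_density(clusters, dnum=5)
```

Here day 3 appears in both similar OD pairs but is counted once, so the largest cluster has 4 days rather than 5.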
Experimental tests
Experiment 1: by analyzing the raw data, the passengers' boarding/alighting OD matrix and number of riding days are calculated. Statistics of the form (number of riding days, number of passengers, percentage) give the probability distribution of riding days shown in Fig. 5. The number of passengers falls as the number of riding days rises, which indicates that most passengers rarely use the metro or rarely travel, for example elderly people.
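The (number of riding days, number of passengers, percentage) statistic is a plain frequency count. A minimal sketch, assuming the input is one riding-day count per card and expressing the percentage as a fraction; the function name ride_day_distribution and the sample values are illustrative only and are not data from the experiment.

```python
from collections import Counter


def ride_day_distribution(days_per_card):
    """Return [(riding_days, passengers, fraction)] sorted by riding days."""
    counts = Counter(days_per_card)
    total = len(days_per_card)
    return [(d, n, n / total) for d, n in sorted(counts.items())]


# Made-up riding-day counts for four cards.
distribution = ride_day_distribution([1, 1, 2, 5])
```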
Experiment 2: combining the time-based user travel-regularity algorithm Tm-ODCluster with the space-based user travel-regularity algorithm Sp-ODCluster, all passengers in the two months (assuming one card corresponds to one passenger) are classified, as shown in Fig. 6. Passengers who rarely take the metro or rarely travel are the majority, which agrees with the result of Experiment 1.
Experiment 3: using the per-passenger classification result (card number, class) from Experiment 2 and comparing it against each day's transaction records, the passenger classes of each day are counted. Figs. 7, 8 and 9 show the passenger class totals for Sunday (2013-8-25), Saturday (2013-8-24) and Wednesday (2013-8-21), respectively; Fig. 10 compares the passenger classification on working days for 2013-8-21 to 2013-8-25.
These results show that on working days the passenger classification is fairly regular: passengers regular in both time and space are the majority, and passengers regular in space only are the fewest. This indicates that most metro passengers are commuting to and from work or school, while passengers with flexible schedules are a minority.
Using user travel-regularity algorithms based on time and space, the method analyzes passengers' travel characteristics, clusters passengers with similar characteristics, and finally classifies them into class 1, passengers who rarely travel; class 2, passengers regular in time only; class 3, passengers regular in space only; class 4, passengers regular in both time and space; and class 5, passengers regular in neither time nor space. The classification method is effective and accurate; dividing passengers in this way makes it possible to understand their living patterns, and helps in understanding and guiding people's travel planning and in formulating matching metro management measures.
The specific embodiments of the present invention described above do not limit the scope of protection of the present invention. Any other corresponding changes and modifications made according to the technical concept of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (7)

1. A metro passenger classification method based on spatio-temporal data mining, characterized by comprising the steps of:
S1, basic data calculation, including valid smart card statistics and inter-station similarity calculation;
S2, data preprocessing:
S21, filtering out transaction records with missing fields;
S22, sorting all filtered transaction records of the smart card by time, and computing each ride record of the smart card;
S23, computing the total number of card-swipe days of each smart card;
S24, aggregating the results of S22 and S23;
S25, performing steps S21-S24 for each smart card until all smart cards have been processed;
S26, compiling statistics on the output and computing the probability distribution of riding days;
S3, passenger classification:
S31, extracting the transaction records of each smart card;
S32, determining whether the number of card-swipe days is less than a threshold; if so, outputting class 1: passenger who seldom travels, and going to S36; otherwise performing S33;
S33, using the time-based user travel-regularity algorithm Tm-ODCluster to compute the densest time period Sm and the time density probability Pt, and determining whether the passenger is regular in time; if so, going to S34; otherwise going to S35;
S34, within the time period Sm, using the space-based user travel-regularity algorithm Sp-ODCluster to determine whether the travel locations are regular; if so, outputting class 4: passenger regular in both time and space; otherwise outputting class 2: passenger regular in time only; then going to S36;
S35, using the space-based user travel-regularity algorithm Sp-ODCluster to determine whether the travel locations are regular over the whole day; if so, outputting class 3: passenger regular in space only; otherwise outputting class 5: passenger irregular in both time and space;
S36, determining whether all smart cards have been processed; if so, going to S37; otherwise going to S31;
S37, passenger classification ends.
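The branching in steps S32 to S35 can be sketched as a small decision function. This is only an illustrative sketch: the names `PassengerClass` and `classify` and the boolean inputs are assumptions made here, and the regularity flags are assumed to have been computed beforehand by Tm-ODCluster and Sp-ODCluster.

```python
from enum import IntEnum


class PassengerClass(IntEnum):
    """The five output classes named in the claims (enum names are illustrative)."""
    INFREQUENT = 1              # class 1: seldom travels
    TIME_REGULAR_ONLY = 2       # class 2: regular in time only
    SPACE_REGULAR_ONLY = 3      # class 3: regular in space only
    TIME_AND_SPACE_REGULAR = 4  # class 4: regular in time and space
    IRREGULAR = 5               # class 5: irregular in time and space


def classify(swipe_days, min_days, time_regular,
             space_regular_in_sm, space_regular_all_day):
    """Follow S32-S35 for one smart card.

    swipe_days: total card-swipe days of the card (S23).
    min_days: the threshold of S32 (its value is not given in the source).
    time_regular: outcome of Tm-ODCluster (S33).
    space_regular_in_sm: outcome of Sp-ODCluster restricted to period Sm (S34).
    space_regular_all_day: outcome of Sp-ODCluster over the whole day (S35).
    """
    if swipe_days < min_days:                      # S32
        return PassengerClass.INFREQUENT
    if time_regular:                               # S33 -> S34
        if space_regular_in_sm:
            return PassengerClass.TIME_AND_SPACE_REGULAR
        return PassengerClass.TIME_REGULAR_ONLY
    if space_regular_all_day:                      # S33 -> S35
        return PassengerClass.SPACE_REGULAR_ONLY
    return PassengerClass.IRREGULAR
```

In a full pipeline the two spatial checks would be evaluated lazily, since only one of them is needed for any given card; booleans are used here to keep the sketch short.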
2. The metro passenger classification method according to claim 1, characterized in that the basic data comprises a smart card transaction table, a subway terminal table, and a subway line table;
the smart card transaction table comprises CardID, TrmnlID, TrnsctTime, and TrnsctyType, wherein CardID is the unique identifier of the smart card, TrmnlID is the unique identifier of the subway station card-swipe terminal, TrnsctTime is the card-swipe time, and TrnsctyType is the entry/exit type;
the subway line table comprises RouteID, PathInfo, and Type, wherein RouteID is the line name, PathInfo is the stations along the line, and Type is the line type.
3. The metro passenger classification method according to claim 1, characterized in that the valid smart card statistics count the smart cards that have transaction records before, during, and after the test; and the inter-station similarity calculation determines whether the number of stations between two stations is less than or equal to 1.
4. The metro passenger classification method according to claim 1, characterized in that computing each ride record of the smart card in step S22 consists of matching the origin and destination of the ride record, the ride record having the format: entry station name, exit station name, entry time, exit time, ride duration.
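The matching of origins and destinations in step S22 might look like the following sketch. The pairing rule (each exit is paired with the most recent unmatched entry, and a second entry replaces an unmatched earlier one) is an assumption made for illustration; the claims do not spell it out. The names `Ride` and `match_rides` and the `"in"`/`"out"` type values are likewise hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ride:
    """One ride record in the format listed in claim 4."""
    entry_station: str
    exit_station: str
    entry_time: datetime
    exit_time: datetime

    @property
    def duration(self) -> timedelta:
        # Ride duration is derived from the two timestamps.
        return self.exit_time - self.entry_time


def match_rides(transactions):
    """Pair entry and exit swipes of ONE card into ride records.

    transactions: iterable of (station, time, kind) tuples, where kind is
    "in" or "out" (assumed encoding of the entry/exit type field).
    """
    rides = []
    pending = None  # last entry swipe that has not been matched yet
    for station, time, kind in sorted(transactions, key=lambda t: t[1]):
        if kind == "in":
            pending = (station, time)  # assumption: newer entry wins
        elif kind == "out" and pending is not None:
            rides.append(Ride(pending[0], station, pending[1], time))
            pending = None
        # an exit with no pending entry is dropped as unmatched
    return rides
```

Sorting inside the function mirrors the "sort by time" part of S22, so the caller can pass records in any order.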
5. The metro passenger classification method according to claim 1, characterized in that the time-based user travel-regularity algorithm Tm-ODCluster comprises:
S331, taking one day as the cycle and 30 minutes as the time-slot interval, computing the riding state (0, 1) of every time slot of each day;
S332, computing the number of riding days T_i of each time slot,
$$T_i = \sum_{j=1}^{Dnum} \left( t_{ji} \mid t_{j((i+1)\%48)} \mid t_{j((i+1)\%48)} \right),$$
where t_{ji} is the riding state of time slot i on day j from S331, Dnum is the total number of transaction days, and i takes the values (1, 2, 3, ..., 48);
S333, finding the time period Sm with the densest card swipes, and computing the time density probability Pt = Sm/DNUM, where DNUM is the passenger's total number of card-swipe days;
S334, if Pt is greater than the time density threshold Thrt, the passenger is regular in time, and the method goes to step S34; if Pt is less than the time density threshold Thrt, the passenger is irregular in time, and the method goes to step S35.
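Steps S331 to S334 can be sketched as follows. Several points are assumptions made for this sketch rather than statements of the patented implementation: slots are indexed 0 to 47 instead of 1 to 48, the per-slot count combines slot i with slot (i+1) % 48 as the formula is printed, and Pt is taken as the riding-day count of the densest slot divided by the total number of riding days. The function name `tm_odcluster` is hypothetical.

```python
from datetime import datetime
from typing import Iterable

SLOTS_PER_DAY = 48  # 30-minute slots, as in S331


def tm_odcluster(ride_times: Iterable[datetime], thr_t: float):
    """Return (densest_slot, pt, is_time_regular) for one card."""
    # S331: riding state per day and slot, stored as the set of active slots.
    slots_by_day = {}
    for t in ride_times:
        slot = (t.hour * 60 + t.minute) // 30
        slots_by_day.setdefault(t.date(), set()).add(slot)

    dnum = len(slots_by_day)
    if dnum == 0:
        return None, 0.0, False

    # S332: T_i = number of days with a ride in slot i or slot (i+1) % 48.
    counts = [
        sum(1 for slots in slots_by_day.values()
            if i in slots or (i + 1) % SLOTS_PER_DAY in slots)
        for i in range(SLOTS_PER_DAY)
    ]

    # S333: densest slot and time density probability.
    sm = max(range(SLOTS_PER_DAY), key=counts.__getitem__)
    pt = counts[sm] / dnum

    # S334: compare with the time density threshold Thrt.
    return sm, pt, pt > thr_t
```

Combining adjacent slots makes the count tolerant of a passenger who enters a few minutes either side of a half-hour boundary on different days.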
6. The metro passenger classification method according to claim 1, characterized in that the space-based user travel-regularity algorithm Sp-ODCluster comprises:
S341, querying all ride records within time period T and labeling each ride record as (O, D), where O is the entry station and D is the exit station, and counting the number of days with rides entering at O and exiting at D;
forming the data record set ODLIST(O, D, daynum, timelst), where daynum is the number of days and timelst is the time set;
S342, clustering the OD pairs with the OD-cluster algorithm, where the similarity between two stations is judged as follows: if the two stations are adjacent, the similarity is 1; otherwise it is 0;
S343, taking the total number of days Dmax of the largest cluster and computing the spatial density probability Ps = Dmax/DNUM, where DNUM is the passenger's total number of card-swipe days;
if Ps is greater than the spatial density threshold Thrs, the passenger is regular in space; otherwise, the passenger is irregular in space;
S344, the spatial regularity analysis ends.
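Building ODLIST in S341 and applying the threshold test of S343 can be sketched as below. The sketch assumes that timelst is the set of distinct riding days for an OD pair, which the claims do not state explicitly; the function names `build_odlist` and `spatial_regularity` are hypothetical, and the clustering of S342 is left out here.

```python
def build_odlist(rides):
    """S341: group rides by (O, D) and count distinct riding days.

    rides: iterable of (origin, destination, day) tuples for one card,
    where day is any hashable day label (for example a datetime.date).
    Returns a list of (O, D, daynum, timelst) tuples.
    """
    days_by_od = {}
    for origin, destination, day in rides:
        days_by_od.setdefault((origin, destination), set()).add(day)
    return [
        (o, d, len(days), days)  # assumption: timelst is the set of days
        for (o, d), days in days_by_od.items()
    ]


def spatial_regularity(dmax, dnum, thr_s):
    """S343: Ps = Dmax / DNUM, compared with the threshold Thrs."""
    ps = dmax / dnum
    return ps, ps > thr_s
```

Two rides on the same day between the same stations count as one day, which is why a set rather than a list is used for the day labels.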
7. The metro passenger classification method according to claim 6, characterized in that the OD-cluster algorithm comprises:
S3421, extracting objects P in order from the data record set ODLIST and determining whether any cluster exists; if so, going to step S3422;
otherwise creating a cluster: a new cluster C is created with object P as its center and the number of days of object P as its total number of days, object P is added to the new cluster C, and object P is marked as processed;
S3422, computing the distance between object P and the center of each cluster;
if the center of some cluster Ci and object P satisfy the similarity criterion, object P is assigned to cluster Ci, and the total number of days of cluster Ci = total number of days + number of days of object P - (number of members in the intersection of the time set of object P and the time set of cluster Ci);
otherwise a new cluster C is created with object P as its center and the number of days of object P as its total number of days, and object P is added to the new cluster C;
S3423, repeating the above steps until all records have been processed and assigned to some cluster Ci, and sorting all clusters Ci by total number of days in descending order.
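The OD-cluster steps describe a single-pass, leader-style clustering, which can be sketched as follows. The sketch makes several assumptions: an OD record is similar to a cluster center when both the origins and the destinations are the same or adjacent stations, the first matching cluster is chosen, and a cluster's time set is the union of its members' day sets. The names `make_od_similarity` and `od_cluster` and the dictionary layout of a cluster are hypothetical.

```python
def make_od_similarity(adjacent_pairs):
    """Build an OD similarity test from a list of adjacent station pairs."""
    adjacent = {frozenset(pair) for pair in adjacent_pairs}

    def stations_similar(a, b):
        # Similarity 1 if identical or adjacent, else 0 (see S342).
        return a == b or frozenset((a, b)) in adjacent

    def od_similar(od1, od2):
        # Assumption: both ends of the trip must be similar.
        return (stations_similar(od1[0], od2[0])
                and stations_similar(od1[1], od2[1]))

    return od_similar


def od_cluster(odlist, od_similar):
    """Cluster (O, D, daynum, timelst) records; largest cluster first."""
    clusters = []
    for o, d, daynum, timelst in odlist:           # S3421: objects in order
        for c in clusters:                         # S3422: compare to centers
            if od_similar(c["center"], (o, d)):
                overlap = len(c["days"] & timelst)
                c["total_days"] += daynum - overlap  # days shared are counted once
                c["days"] |= timelst
                c["members"].append((o, d))
                break
        else:
            # No similar center (or no cluster yet): P starts a new cluster.
            clusters.append({"center": (o, d), "total_days": daynum,
                             "days": set(timelst), "members": [(o, d)]})
    # S3423: sort by total days, descending.
    clusters.sort(key=lambda c: c["total_days"], reverse=True)
    return clusters
```

The total of the first cluster after sorting plays the role of Dmax when the spatial density probability Ps = Dmax/DNUM is computed.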
CN201310683227.6A 2013-12-12 2013-12-12 Temporal-spatial data mining-based metro passenger classification method Active CN103699601B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310683227.6A CN103699601B (en) 2013-12-12 2013-12-12 Temporal-spatial data mining-based metro passenger classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310683227.6A CN103699601B (en) 2013-12-12 2013-12-12 Temporal-spatial data mining-based metro passenger classification method

Publications (2)

Publication Number Publication Date
CN103699601A true CN103699601A (en) 2014-04-02
CN103699601B CN103699601B (en) 2017-02-08

Family

ID=50361129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310683227.6A Active CN103699601B (en) 2013-12-12 2013-12-12 Temporal-spatial data mining-based metro passenger classification method

Country Status (1)

Country Link
CN (1) CN103699601B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361502A (en) * 2014-04-24 2015-02-18 科技谷(厦门)信息技术有限公司 Analysis method of passenger behavior data
CN105091889A (en) * 2014-04-23 2015-11-25 华为技术有限公司 Hotspot path determination method and hotspot path determination equipment
CN105718946A (en) * 2016-01-20 2016-06-29 北京工业大学 Passenger going-out behavior analysis method based on subway card-swiping data
CN106529711A (en) * 2016-11-02 2017-03-22 东软集团股份有限公司 Method and apparatus for predicting user behavior
CN106549993A (en) * 2015-09-21 2017-03-29 阿里巴巴集团控股有限公司 A kind of Bus stop planning method and apparatus
CN106571056A (en) * 2015-10-10 2017-04-19 上海宝信软件股份有限公司 Method for monitoring big data of internal vehicle system
CN106571059A (en) * 2015-10-10 2017-04-19 上海宝信软件股份有限公司 Internal vehicle system big data monitoring system
CN106779116A (en) * 2016-11-29 2017-05-31 清华大学 A kind of net based on spatiotemporal data structure about car client reference method
CN107463564A (en) * 2016-06-02 2017-12-12 华为技术有限公司 The characteristic analysis method and device of data in server
CN107657006A (en) * 2017-09-22 2018-02-02 东南大学 Public bicycles IC-card and subway IC card matching process based on space-time characterisation
CN107844805A (en) * 2017-11-15 2018-03-27 中国联合网络通信集团有限公司 Method and device based on public transport card information identification a suspect
CN110097138A (en) * 2019-05-11 2019-08-06 北京京投亿雅捷交通科技有限公司 A kind of gauze passenger representation data library application system and method
CN110134865A (en) * 2019-04-26 2019-08-16 重庆大学 A kind of commuting passenger's social recommendation method and platform based on urban public transport trip big data
CN110533483A (en) * 2019-09-05 2019-12-03 中国联合网络通信集团有限公司 A kind of occupant classification method and system based on trip characteristics
CN113128282A (en) * 2019-12-31 2021-07-16 深圳云天励飞技术有限公司 Crowd category dividing method and device and terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1731456A (en) * 2005-08-04 2006-02-08 浙江大学 Bus passenger traffic statistical method based on stereoscopic vision and system therefor
US20100098289A1 (en) * 2008-07-09 2010-04-22 Florida Atlantic University System and method for analysis of spatio-temporal data
CN102097002A (en) * 2010-11-22 2011-06-15 东南大学 Method and system for acquiring bus stop OD based on IC card data
CN103020284A (en) * 2012-12-28 2013-04-03 刘建勋 Method for recommending taxi pickup point based on time-space clustering
CN103279534A (en) * 2013-05-31 2013-09-04 西安建筑科技大学 Public transport card passenger commuter OD (origin and destination) distribution estimation method based on APTS (advanced public transportation systems)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105091889A (en) * 2014-04-23 2015-11-25 华为技术有限公司 Hotspot path determination method and hotspot path determination equipment
CN105091889B (en) * 2014-04-23 2018-10-02 华为技术有限公司 A kind of determination method and apparatus of hotspot path
CN104361502A (en) * 2014-04-24 2015-02-18 科技谷(厦门)信息技术有限公司 Analysis method of passenger behavior data
CN106549993A (en) * 2015-09-21 2017-03-29 阿里巴巴集团控股有限公司 A kind of Bus stop planning method and apparatus
CN106571056B (en) * 2015-10-10 2019-06-21 上海宝信软件股份有限公司 Internal vehicle system big data monitoring method
CN106571056A (en) * 2015-10-10 2017-04-19 上海宝信软件股份有限公司 Method for monitoring big data of internal vehicle system
CN106571059A (en) * 2015-10-10 2017-04-19 上海宝信软件股份有限公司 Internal vehicle system big data monitoring system
CN106571059B (en) * 2015-10-10 2019-06-21 上海宝信软件股份有限公司 Internal vehicle system big data monitors system
CN105718946A (en) * 2016-01-20 2016-06-29 北京工业大学 Passenger going-out behavior analysis method based on subway card-swiping data
CN107463564A (en) * 2016-06-02 2017-12-12 华为技术有限公司 The characteristic analysis method and device of data in server
CN106529711A (en) * 2016-11-02 2017-03-22 东软集团股份有限公司 Method and apparatus for predicting user behavior
CN106529711B (en) * 2016-11-02 2020-06-19 东软集团股份有限公司 User behavior prediction method and device
CN106779116A (en) * 2016-11-29 2017-05-31 清华大学 A kind of net based on spatiotemporal data structure about car client reference method
CN106779116B (en) * 2016-11-29 2020-11-10 清华大学 Online taxi appointment customer credit investigation method based on time-space data mining
CN107657006A (en) * 2017-09-22 2018-02-02 东南大学 Public bicycles IC-card and subway IC card matching process based on space-time characterisation
CN107657006B (en) * 2017-09-22 2020-12-11 东南大学 Public bicycle IC card and subway IC card matching method based on time-space characteristics
CN107844805A (en) * 2017-11-15 2018-03-27 中国联合网络通信集团有限公司 Method and device based on public transport card information identification a suspect
CN107844805B (en) * 2017-11-15 2020-10-27 中国联合网络通信集团有限公司 Method and device for identifying suspicious personnel based on bus card information
CN110134865A (en) * 2019-04-26 2019-08-16 重庆大学 A kind of commuting passenger's social recommendation method and platform based on urban public transport trip big data
CN110134865B (en) * 2019-04-26 2023-03-24 重庆大学 Commuting passenger social contact recommendation method and platform based on urban public transport trip big data
CN110097138A (en) * 2019-05-11 2019-08-06 北京京投亿雅捷交通科技有限公司 A kind of gauze passenger representation data library application system and method
CN110533483A (en) * 2019-09-05 2019-12-03 中国联合网络通信集团有限公司 A kind of occupant classification method and system based on trip characteristics
CN113128282A (en) * 2019-12-31 2021-07-16 深圳云天励飞技术有限公司 Crowd category dividing method and device and terminal

Also Published As

Publication number Publication date
CN103699601B (en) 2017-02-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200107

Address after: Room 201, Building A, No. 1 Qianwan 1st Road, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong 518000

Patentee after: Shenzhen Beidou Intelligent Technology Co., Ltd.

Address before: No. 1068 Xueyuan Avenue, Xili University Town, Nanshan District, Shenzhen, Guangdong 518055

Patentee before: Shenzhen Advanced Technology Research Inst.
