CN107316108A - A sliding-window multi-feature prediction method for citizens' bus route selection - Google Patents
- Publication number
- CN107316108A (application CN201710463795.3A)
- Authority
- CN
- China
- Prior art keywords
- passenger
- feature
- nearest
- behavior
- different
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Abstract
The invention discloses a sliding-window multi-feature prediction method for citizens' bus route selection. The method constructs samples with a sliding window: sample feature attributes are built from passengers' historical behavior records within a fixed 139-day time window, and the window is slid several times to cover different time intervals and produce multiple training samples. Within each time window, feature attributes are designed in seven aspects, including the feature class for passengers' historical travel behavior, the feature class for the characteristics of different routes, the interaction feature class for passengers on specific bus routes, the feature class for the different bus card types, and so on. The classical precision, recall and F1 score serve as evaluation criteria, and the final score is ranked by F1. The invention trains the model on the constructed samples; the method is fast, reliable and effective.
Description
Technical Field
The invention relates to bus route selection, and in particular to sliding-window multi-feature prediction of citizens' bus route selection. It belongs to the field of data mining technology.
Background
As China's economy grows and its urbanization rate rises, citizens' travel demand keeps increasing while traffic congestion grows more serious. Analyzing and mining the historical travel behavior of regular passengers in order to predict how they will travel on fixed routes in the future is of significant guidance value for giving passengers symmetric information and a safe travel environment.
Summary of the Invention
The object of the invention is to provide a method for predicting citizens' bus route selection that constructs sliding-window multi-features from the historical behavior of regular citizens, which is significant for giving passengers symmetric information and a safe, comfortable travel environment.
The object of the invention is achieved through the following technical solution:
A sliding-window multi-feature prediction method for citizens' bus route selection comprises the following steps:
1) Samples are constructed with a sliding window. Sample feature attributes are built from passengers' historical behavior records within a fixed 139-day time window, and the window is slid several times to cover different time intervals and produce multiple training samples.
Within each time window, feature attributes are designed in seven aspects: the feature class for passengers' historical travel behavior, the feature class for the characteristics of different routes, the interaction feature class for passengers on specific bus routes, the feature class for the different bus card types, the interaction feature class for the behavior patterns of different passenger types on specific bus routes, the feature class for the bus card issuing location, the feature class for the relationship between the card issuing location and fixed bus routes, and ratio-type features that reflect change trends. Each class of features is in turn designed in detail across several categories: statistical, temporal, ratio-trend and encoding.
2) Model evaluation uses the classical precision, recall and F1 score as evaluation criteria, and the final score is ranked by F1. The formulas are as follows:
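A sketch of the three metrics in LaTeX. The precision formula follows the form given in claim 1; the recall and F1 forms are the standard definitions and are assumed here rather than taken from the text. PredictionSet is read as the set of predicted (card_id, line_name) pairs and ReferenceSet as the set of pairs actually observed in the label period; that reading is also an assumption.

```latex
\begin{aligned}
\mathrm{Precision} &= \frac{|\cap(\mathrm{PredictionSet},\ \mathrm{ReferenceSet})|}{|\mathrm{PredictionSet}|} \\
\mathrm{Recall} &= \frac{|\cap(\mathrm{PredictionSet},\ \mathrm{ReferenceSet})|}{|\mathrm{ReferenceSet}|} \\
F_1 &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\end{aligned}
```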
Each sample is jointly determined by card_id and line_name, where card_id is the unique ID of the bus card and line_name is the name of the bus route. The class label of a sample is determined by whether the cardholder travels on route line_name within the future fixed time period.
Preferably, the feature class for passengers' historical travel behavior includes time-series features of passenger behavior across all bus routes, passenger temporal features, ratio-trend features of changes in passenger travel, and features of the passenger's category attribute.
The time-series features of passenger behavior across all bus routes count, for each passenger, the total number of rides on all bus routes within the last 12 hours and within the last 1, 3, 7, 14, 28, 56, 84, 112 and 139 days.
The passenger temporal features are the passenger's average interval between rides in days, the time of the passenger's most recent card transaction, the number of active hours, the number of weeks with more than 1 trip, the number of weeks with more than 2 trips, the average charge-time interval in days, and the average number of card swipes per week.
The ratio-trend features of changes in passenger travel take the trend of the passenger's historical behavior into account: the share of weeks in which the passenger made more than 2 trips, the ratio of the passenger's card swipes in the last 1, 2 and 4 weeks to those in the last 2, 4 and 8 weeks, the share of weekend rides in total rides, the share of working-day rides in total card swipes, and so on. Features of this kind characterize the passenger's riding pattern.
The features of the passenger's category attribute reflect that the passenger's category also influences future travel: commuters travel at regular times, whereas elderly passengers' travel depends more on other factors. The 7 bus card types are therefore mapped to distinct features.
Preferably, the feature class for the characteristics of different routes includes route time-series statistical features, trend features of historical route ridership, and route encoding features.
The route time-series statistical features reflect that the historical passenger flow of a route influences passengers' travel. For each route, passenger flow is counted over the last 12 hours and over the last 1, 3, 7, 14, 28, 56, 84, 112 and 139 days; total weekend and working-day passenger flow within the given time window is counted; and the daily average and historical maximum passenger flow on weekends and working days are counted.
The trend features of historical route ridership reflect that changes in historical passenger flow influence passengers' travel. For each route, features are built from the ratio of passenger flow in the last 1, 2 and 4 weeks to that in the last 2, 4 and 8 weeks.
The route encoding features reflect that the location of a route and its number of stops influence which route a passenger will choose in the future. They mainly comprise the different route characteristics and the number of stops on each route.
Preferably, the interaction feature class for passengers on specific bus routes includes time-series statistical features for each route on which the passenger has a ride history, temporal features for each such route, and ratio-trend features of the passenger's riding behavior on previously ridden routes.
The time-series statistical features characterize how active the passenger has historically been on each specific route. Within the fixed time window, the passenger's bus transactions on each route with a ride history are counted over the last 12 hours and over the last 1, 3, 7, 14, 28, 56, 84, 112 and 139 days, together with the passenger's maximum number of rides, number of weekend rides and number of working-day rides.
The temporal features are the time since the passenger's most recent ride on the route, the intervals between the passenger's rides within the given time window, the number of active days (days with a ride record) and active hours, the minimum number of days before a return ride, and the average number of days before a return ride.
The ratio-trend features are the ratio of the passenger's rides on the specific route in the last 1 week to those in the last 2 weeks, the share of the passenger's active hours on the route in the total active hours across all routes, the share of weekend rides in total rides, and the share of working-day rides in total rides.
Preferably, the feature class for the different bus card types includes time-series statistical features and trend features for different passenger types.
The time-series statistical features for different passenger types characterize the travel patterns of those types: for each passenger type, weekend and working-day ride counts on all routes are tallied over the last 12 hours and over the last 1, 3, 7, 14, 28, 56, 84, 112 and 139 days.
The trend features for different passenger types count, for each passenger group, the ratio of travel volume in the last 1, 2 and 4 weeks to that in the last 2, 4 and 8 weeks.
Preferably, the feature class for the bus card issuing location includes time-series statistical features for passengers from different locations, travel-trend features for passengers from different issuing locations, and encoding features for passengers from different locations.
The time-series statistical features count, for the passengers of each location, the total number of rides over the last 12 hours and over the last 1, 3, 7, 14, 28, 56, 84, 112 and 139 days.
The travel-trend features count, for the passengers of each location, the ratio of travel volume in the last 1, 2 and 4 weeks to that in the last 2, 4 and 8 weeks, and the share of weekend trips in total trips.
The encoding features reflect that riding patterns and available routes differ between card issuing locations; the issuing locations are therefore mapped to features.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1) The invention builds training and test samples with a sliding window. It compresses the passenger's travel history in segments and emphasizes recent behavior: recent travel patterns correlate strongly with future travel, while the influence of older behavior gradually weakens. Recent features are therefore extracted at fine granularity and older features at coarse granularity, supplemented by overlapping extraction. When sample features and class labels are constructed, the time window size is fixed, and multiple samples are constructed by sliding the window.
2) The invention extracts features from regular citizens' historical bus card data at different granularities, producing feature attributes that differ from one another. Features are extracted at different granularities from passengers' historical behavior data, each route's past daily passenger flow, passengers' active days, travel routes and the final prediction label.
3) The invention designs multiple types of feature attributes. Citizens' travel feature attributes are designed in seven aspects: the feature class for passengers' historical travel behavior, the feature class for the characteristics of different routes, the interaction feature class for passengers on specific bus routes, the feature class for the different bus card types, the interaction feature class for the riding patterns of different passenger types on specific bus routes, the feature class for the bus card issuing location, the feature class for the relationship between the card issuing location and fixed bus routes, and ratio-type features for certain other change trends.
Detailed Description
For a better understanding of the invention, it is described in detail below with reference to an embodiment, but the claimed scope is not limited thereto.
The invention predicts citizens' future bus route selection from the historical behavior of a fixed group of citizens. The scheme is designed as follows.
To avoid inconsistent travel-data distributions across the constructed samples, the invention constructs samples with a sliding window. The experimental data are five months of bus card transaction histories on bus routes in Guangdong, from August 1, 2014 to December 31, 2014. Sample feature attributes are built from passengers' historical behavior records within a fixed 139-day time window, and the class label of a sample is determined by whether the passenger travels on the fixed bus route in the following 7 days. The window is slid several times to cover different time intervals and produce multiple training samples.
Within each time window, feature attributes are designed in seven aspects: the feature class for passengers' historical travel behavior, the feature class for the characteristics of different routes, the interaction feature class for passengers on specific bus routes, the feature class for the different bus card types, the interaction feature class for the behavior patterns of different passenger types on specific bus routes, the feature class for the bus card issuing location, the feature class for the relationship between the card issuing location and fixed bus routes, and ratio-type features that reflect change trends. Each class of features is in turn designed in detail across several categories: statistical, temporal, ratio-trend and encoding.
1) Design the feature set of passenger behavior attributes:
Time-series features of passenger behavior across all bus routes: recent ride statistics across all routes describe the passenger's riding pattern. The shorter the time distance, the greater the influence on future travel; as the time distance of a historical transaction grows, its influence shrinks, so the extraction interval is made progressively wider. For each passenger, the total number of rides on all bus routes is counted over the last 12 hours and over the last 1, 3, 7, 14, 28, 56, 84, 112 and 139 days.
Passenger temporal features: these describe how active the passenger is within the given time window. They are the passenger's average interval between rides in days, the time of the passenger's most recent card transaction, the number of active hours, the number of weeks with more than 1 trip, the number of weeks with more than 2 trips, the average charge-time interval in days, and the average number of card swipes per week.
Ratio-trend features of changes in passenger travel: these take the trend of the passenger's historical behavior into account. They are the share of weeks in which the passenger made more than 2 trips, the ratio of the passenger's card swipes in the last 1, 2 and 4 weeks to those in the last 2, 4 and 8 weeks, the share of weekend rides in total rides, the share of working-day rides in total card swipes, and so on. Features of this kind characterize the passenger's riding pattern.
Features of the passenger's category attribute: the passenger's category also influences future travel. Commuters travel at regular times, whereas elderly passengers' travel depends more on other factors, so the 7 bus card types are mapped to distinct features.
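The multi-scale ride counts described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `ride_counts` is hypothetical, transaction times are assumed to be strings in the `YYYYMMDDHH` form of the example value `2014090108` in Table 1, and each lookback is assumed to be a half-open interval ending at the end of the feature window.

```python
from datetime import datetime, timedelta

# Lookback horizons from the text: the last 12 hours, then 1..139 days.
LOOKBACKS = [timedelta(hours=12)] + [
    timedelta(days=d) for d in (1, 3, 7, 14, 28, 56, 84, 112, 139)
]

def ride_counts(deal_times, window_end):
    """Count rides inside each lookback interval [window_end - lookback, window_end).

    deal_times: iterable of 'YYYYMMDDHH' strings (assumed format).
    window_end: datetime marking the end of the feature window (exclusive).
    """
    times = [datetime.strptime(t, "%Y%m%d%H") for t in deal_times]
    return [
        sum(1 for t in times if window_end - lookback <= t < window_end)
        for lookback in LOOKBACKS
    ]
```

The nested intervals overlap by construction, so recent rides contribute to every count while older rides appear only in the coarse ones, which matches the emphasis on recent behavior.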
2) Design the feature class for the characteristics of different routes:
Route time-series statistical features: the historical passenger flow of a route influences passengers' travel. For each route, passenger flow is counted over the last 12 hours and over the last 1, 3, 7, 14, 28, 56, 84, 112 and 139 days; total weekend and working-day passenger flow within the given time window is counted; and the daily average and historical maximum passenger flow on weekends and working days are counted.
Trend features of historical route ridership: changes in historical passenger flow influence passengers' travel. For each route, features are built from the ratio of passenger flow in the last 1, 2 and 4 weeks to that in the last 2, 4 and 8 weeks.
Route encoding features: the location of a route and its number of stops influence which route a passenger will choose in the future. These features mainly comprise the different route characteristics and the number of stops on each route.
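The ratio-trend features (last 1, 2 and 4 weeks against last 2, 4 and 8 weeks) can be sketched as below. The function name `trend_ratios` is hypothetical, the numerator period is assumed to be contained in the denominator period, and returning 0.0 when the longer period has no traffic is an assumption made only to keep the sketch total.

```python
def trend_ratios(weekly_counts):
    """Return three ratios: last 1 vs 2 weeks, last 2 vs 4 weeks, last 4 vs 8 weeks.

    weekly_counts: list of per-week counts, most recent week first,
    with at least 8 entries.
    """
    ratios = []
    for short, long_ in ((1, 2), (2, 4), (4, 8)):
        recent = sum(weekly_counts[:short])
        baseline = sum(weekly_counts[:long_])
        # Assumed convention: no traffic in the longer period gives a ratio of 0.0.
        ratios.append(recent / baseline if baseline else 0.0)
    return ratios
```

A ratio near 0.5 suggests steady traffic, a value near 1.0 suggests traffic concentrated in the most recent half of the period, and a value near 0 suggests traffic that has faded.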
3) Design the interaction feature class for passengers on specific bus routes:
Time-series statistical features for each route on which the passenger has a ride history: these characterize how active the passenger has historically been on each specific route. Within the fixed time window, the passenger's bus transactions on each route with a ride history are counted over the last 12 hours and over the last 1, 3, 7, 14, 28, 56, 84, 112 and 139 days, together with the passenger's maximum number of rides, number of weekend rides and number of working-day rides.
Temporal features for each route on which the passenger has a ride history: the time since the passenger's most recent ride on the route, the intervals between the passenger's rides within the given time window, the number of days with a ride record (active days) and the number of active hours, the minimum number of days before a return ride, the average number of days before a return ride, and similar features.
Ratio-trend features of the passenger's riding behavior on previously ridden routes: the ratio of the passenger's rides on the specific route in the last 1 week to those in the last 2 weeks, the share of the passenger's active hours on the route in the total active hours across all routes, the share of weekend rides in total rides, the share of working-day rides in total rides, and similar features.
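As an illustration of the per-route temporal features, the sketch below groups rides by (card_id, line_name) and derives active days and return-ride gaps. The function name and the output keys are hypothetical, and a "return ride" is assumed to mean the next distinct day on which the same card is used on the same route.

```python
from collections import defaultdict
from datetime import date

def return_ride_features(rides):
    """Compute active days and return-ride gaps per (card_id, line_name).

    rides: iterable of (card_id, line_name, ride_date) tuples.
    """
    ride_days = defaultdict(set)
    for card_id, line_name, ride_date in rides:
        ride_days[(card_id, line_name)].add(ride_date)

    features = {}
    for key, days in ride_days.items():
        ordered = sorted(days)
        # Gaps in days between consecutive active days on this route.
        gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
        features[key] = {
            "active_days": len(ordered),
            "min_return_days": min(gaps) if gaps else None,
            "avg_return_days": sum(gaps) / len(gaps) if gaps else None,
        }
    return features
```

Pairs with a single active day have no gap to measure, so the sketch reports `None` for them rather than inventing a value.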
4) Design the feature class for the different bus card types:
Time-series statistical features for different passenger types: the travel patterns of different passenger groups differ. To characterize them, weekend and working-day ride counts on all routes are tallied for each passenger type over the last 12 hours and over the last 1, 3, 7, 14, 28, 56, 84, 112 and 139 days.
Trend features for different passenger types: these reflect the behavior trends of different groups. For example, the travel patterns of elderly passengers change with the seasons, and those of students change with the winter and summer vacations. For each group, the ratio of travel volume in the last 1, 2 and 4 weeks to that in the last 2, 4 and 8 weeks is counted.
5) Design the feature class for the bus card issuing location:
Time-series statistical features for passengers from different locations: the travel patterns of passengers from different locations differ. For the passengers of each location, the total number of rides is counted over the last 12 hours and over the last 1, 3, 7, 14, 28, 56, 84, 112 and 139 days (weekends and working days counted separately).
Travel-trend features for passengers from different issuing locations: for the passengers of each location, the ratio of travel volume in the last 1, 2 and 4 weeks to that in the last 2, 4 and 8 weeks is counted, together with the share of weekend trips in total trips.
Encoding features for passengers from different locations: riding patterns and available routes differ between card issuing locations. To carry this information into the samples, the 20 bus card issuing locations are mapped to features.
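The text does not say how categorical values such as the 7 card types or the 20 issuing locations are mapped to features. One common choice, assumed here purely for illustration, is one-hot encoding; the function name and the category lists in the usage note are hypothetical.

```python
def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of value.

    A value not present in categories yields an all-zero vector.
    """
    return [1 if value == category else 0 for category in categories]
```

For example, with the hypothetical list `["Guangzhou", "Foshan", "other"]`, a card issued in Foshan is encoded as a vector whose second entry is 1. An integer code per category would be an equally plausible reading of "mapped to features".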
Model evaluation uses the classical precision, recall and F1 score as evaluation criteria, and the final score is ranked by F1. The formulas are as follows:
Each sample is jointly determined by card_id and line_name, where card_id is the unique ID of the bus card and line_name is the name of the bus route. The class label of a sample is determined by whether the cardholder travels on route line_name within the future fixed time period.
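Because each sample is a (card_id, line_name) pair, the evaluation can be sketched as set operations over such pairs. The function name is hypothetical, and treating the prediction and reference sets as sets of positive pairs, with 0.0 returned when a denominator is empty, is an assumption of this sketch.

```python
def precision_recall_f1(predicted, reference):
    """Score predicted (card_id, line_name) pairs against the observed ones.

    predicted: pairs the model predicts will ride in the label period.
    reference: pairs that actually rode in the label period.
    """
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Working on pairs rather than on passengers alone means a passenger predicted on the wrong route counts as both a false positive and a missed ride.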
Embodiment: prediction of bus travel by Guangdong citizens in 2015
The embodiment data are five months of bus card transaction histories on some bus routes, provided by a bus company in Guangdong Province. The structure of the historical data is shown in Table 1.
Table 1. Data table fields
Field symbol | Field name | Field value |
Line_name | Route name | 7 routes |
Terminal_id | Card terminal ID | Encrypted field |
Card_id | IC card ID | Encrypted field |
Create_city | Card issuing location | 20 locations, e.g. Guangzhou, Foshan |
Deal_time | Transaction time | 2014090108 |
Card_type | Bus card type | 7 types, e.g. ordinary card, senior card |
Stop_cnt | Number of stops on the route | 24 |
Line_type | Route type | Within Guangzhou / Guangzhou-Foshan cross-region |
From the ride history of a fixed group of passengers on the bus routes, the behavior patterns of that group in public transport are mined, passengers' travel habits and preferences are inferred, and the bus routes that passengers will take in a future fixed time period are predicted.
The modeling data are five months of Lingnan Tong bus card user history on some bus routes in Guangdong, from August 1, 2014 to December 31, 2014. The sliding window is fixed at 139 days for extracting the designed sample features, and the sample label (1 or 0, indicating whether the passenger rode) is extracted according to whether the passenger travels on a given route in the following 7 days. Following this idea, sample_1 is constructed from passenger features extracted from August 1 to December 17 and labels extracted from December 18 to December 24; sample_2 from features extracted from August 3 to December 19 and labels from December 20 to December 26; sample_3 from features extracted from August 6 to December 22 and labels from December 23 to December 29; and sample_4 from features extracted from August 8 to December 24 and labels from December 25 to December 31. This yields several sufficiently varied samples. The unlabeled sample sample_0 is constructed from passenger features extracted from August 15 to December 31 and is used to predict passengers' travel on the fixed bus routes from January 1 to January 7, 2015. Each sample has 291 feature attributes, obtained with the feature construction method above.
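The window layout above can be reproduced with a short sketch. The 139-day and 7-day lengths and the five start dates come from the text; the function and variable names are hypothetical.

```python
from datetime import date, timedelta

FEATURE_DAYS = 139  # length of the feature window, from the text
LABEL_DAYS = 7      # length of the label (prediction) window, from the text

def make_window(feature_start):
    """Return (feature_start, feature_end, label_start, label_end), all inclusive."""
    feature_end = feature_start + timedelta(days=FEATURE_DAYS - 1)
    label_start = feature_end + timedelta(days=1)
    label_end = label_start + timedelta(days=LABEL_DAYS - 1)
    return feature_start, feature_end, label_start, label_end

# Feature-window start dates of the four training samples and the unlabeled sample_0.
STARTS = {
    "sample_1": date(2014, 8, 1),
    "sample_2": date(2014, 8, 3),
    "sample_3": date(2014, 8, 6),
    "sample_4": date(2014, 8, 8),
    "sample_0": date(2014, 8, 15),
}

windows = {name: make_window(start) for name, start in STARTS.items()}
```

For sample_0 the "label" window is the prediction target, January 1 to January 7, 2015, for which no labels exist in the data.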
The feature attributes of the constructed samples are highly correlated and of differing granularity, and the random forest algorithm handles such attributes well. The invention therefore trains the model on the samples with the random forest algorithm, which improves the model's predictive performance. The resulting model performance metrics are shown in Table 2.
Table 2. Model performance results
Claims (6)
1. A sliding-window multi-feature prediction method for citizens' bus route selection, characterized by comprising the following steps:
1) Samples are constructed with a sliding window. Sample feature attributes are built from passengers' historical behavior records within a fixed 139-day time window, and the window is slid several times to cover different time intervals and produce multiple training samples.
Within each time window, feature attributes are designed in seven aspects: the feature class for passengers' historical travel behavior, the feature class for the characteristics of different routes, the interaction feature class for passengers on specific bus routes, the feature class for the different bus card types, the interaction feature class for the behavior patterns of different passenger types on specific bus routes, the feature class for the bus card issuing location, the feature class for the relationship between the card issuing location and fixed bus routes, and ratio-type features that reflect change trends. Each class of features is in turn designed in detail across several categories: statistical, temporal, ratio-trend and encoding.
2) Model evaluation uses the classical precision, recall and F1 score as evaluation criteria, and the final score is ranked by F1. The formulas are as follows:
$$\mathrm{Precision} = \frac{|\cap(\mathrm{PredictionSet}, \mathrm{ReferenceSet})|}{|\mathrm{PredictionSet}|}$$

$$\mathrm{Recall} = \frac{|\cap(\mathrm{PredictionSet}, \mathrm{ReferenceSet})|}{|\mathrm{ReferenceSet}|}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
Each sample is jointly identified by card_id and line_name, where card_id is the unique ID number of the
bus card and line_name is the name of the bus line; the class label of the sample is determined by whether
the passenger holding the bus card travels on the line_name line in the following fixed time period.
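As a minimal sketch of the evaluation in step 2), the three metrics can be computed directly on sets of (card_id, line_name) pairs. The function name `evaluate` and the sample identifiers below are hypothetical and serve only to illustrate the formulas.

```python
def evaluate(prediction_set: set, reference_set: set) -> tuple:
    """Return (precision, recall, f1) for two sets of (card_id, line_name) pairs."""
    hits = len(prediction_set & reference_set)
    precision = hits / len(prediction_set) if prediction_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: pairs predicted to ride vs. pairs that actually rode.
predicted = {("c1", "line_A"), ("c1", "line_B"), ("c2", "line_A"), ("c3", "line_C")}
actual = {("c1", "line_A"), ("c2", "line_A"), ("c2", "line_B")}

precision, recall, f1 = evaluate(predicted, actual)
print(precision, recall, f1)
```

Here two of the four predicted pairs are correct and two of the three actual pairs are found, so precision is 1/2, recall is 2/3, and F1 is 4/7.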
2. The sliding-window multi-feature prediction method for citizens' choice of bus lines according to claim 1,
characterized in that the feature attribute class of passengers' historical travel behavior comprises
sequential features of passenger behavior on all bus lines, passenger time features, ratio trend features
of changes in passenger travel, and features of the passenger's category attribute;
wherein the sequential features of passenger behavior on all bus lines count each passenger's total number
of rides on all bus lines in the most recent 12 hours and the most recent 1, 3, 7, 14, 28, 56, 84, 112,
and 139 days;
the passenger time features refer to the passenger's average interval between rides in days, the time of
the passenger's most recent bus card transaction, the user's number of active hours, the number of weeks
with more than 1 trip, the number of weeks with more than 2 trips, the average charge interval in days,
and the average number of card swipes per week;
the ratio trend features of changes in passenger travel take into account the influence of trends in the
passenger's historical behavior: the share of weeks in which the passenger's behavior count exceeds 2,
the ratio of the passenger's card swipes in the most recent 1, 2, and 4 weeks to those in the most recent
2, 4, and 8 weeks, the share of weekend behavior in the total behavior count, the share of working-day
behavior in the total number of card swipes, and so on; features of this class characterize the
passenger's riding pattern;
the features of the passenger's category attribute reflect that different categories of passenger
influence future travel differently: commuters travel at regular times, while elderly passengers' travel
is more strongly affected by other factors; the 7 different bus card types are mapped to different features.
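The sequential count features of claim 2 can be illustrated with a short Python sketch. It assumes each passenger's rides are available as a list of timestamps; the function name `ride_count_features`, the feature names, and the half-open interval convention are assumptions made for this example only.

```python
from datetime import datetime, timedelta

WINDOW_DAYS = (1, 3, 7, 14, 28, 56, 84, 112, 139)  # day spans listed in claim 2


def ride_count_features(ride_times, window_end):
    """Count rides in the most recent 12 hours and N days ending at window_end."""
    spans = {"last_12h": timedelta(hours=12)}
    spans.update({f"last_{d}d": timedelta(days=d) for d in WINDOW_DAYS})
    features = {}
    for name, span in spans.items():
        start = window_end - span
        # Assumed convention: a ride counts if start < time <= window_end.
        features[name] = sum(1 for t in ride_times if start < t <= window_end)
    return features


# Hypothetical ride history for one card, all lines combined.
window_end = datetime(2014, 12, 17, 23, 59, 59)
rides = [
    datetime(2014, 12, 17, 18, 0),
    datetime(2014, 12, 17, 8, 0),
    datetime(2014, 12, 15, 8, 0),
    datetime(2014, 12, 1, 8, 0),
    datetime(2014, 8, 5, 8, 0),
]
features = ride_count_features(rides, window_end)
print(features)
```

The same counting routine could be reused per line, per passenger type, or per issuing location by filtering the timestamps before calling it.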
3. The sliding-window multi-feature prediction method for citizens' choice of bus lines according to claim 1,
characterized in that the feature attribute class of different lines comprises line sequential
statistical features, trend features of changes in line historical ridership, and bus line encoding features;
wherein the line sequential statistical features reflect that the historical passenger flow of different
lines influences passengers' travel: for each line, the passenger flow in the most recent 12 hours and
the most recent 1, 3, 7, 14, 28, 56, 84, 112, and 139 days is counted; the total passenger flow on
weekends and on working days within the given time window is counted; and the daily average passenger
flow on weekends and working days and the historical maximum passenger flow are counted;
the trend features of changes in line historical ridership reflect that changes in historical passenger
flow influence passengers' travel: for each line, features are constructed from the ratio of the passenger
flow in the most recent 1, 2, and 4 weeks to that in the most recent 2, 4, and 8 weeks;
the bus line encoding features reflect that the location of each line and its number of stops influence
the passenger's future choice of travel route; they mainly comprise features distinguishing the different
lines and the number of stops on each line.
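A possible reading of the trend features in claim 3 is sketched below. It assumes the ratios pair the most recent 1, 2, and 4 weeks with the most recent 2, 4, and 8 weeks respectively, which the claim does not state explicitly; the function name `trend_ratios` and the feature names are hypothetical.

```python
def trend_ratios(daily_counts):
    """Ratios of recent passenger flow; daily_counts[-1] is the most recent day."""
    ratios = {}
    # Assumed pairing: 1 week vs 2 weeks, 2 weeks vs 4 weeks, 4 weeks vs 8 weeks.
    for short_weeks, long_weeks in ((1, 2), (2, 4), (4, 8)):
        short_total = sum(daily_counts[-7 * short_weeks:])
        long_total = sum(daily_counts[-7 * long_weeks:])
        key = f"{short_weeks}w_over_{long_weeks}w"
        ratios[key] = short_total / long_total if long_total else 0.0
    return ratios


# Hypothetical line whose daily flow doubled four weeks ago.
flow = [100] * 28 + [200] * 28
ratios = trend_ratios(flow)
print(ratios)
```

A ratio near 0.5 indicates stable flow over the longer span, while a value above 0.5 indicates that the more recent half carries more of the traffic.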
4. The sliding-window multi-feature prediction method for citizens' choice of bus lines according to claim 1,
characterized in that the interaction feature attribute class of passengers on specific bus lines
comprises sequential statistical features for each line the passenger has historically ridden, time
features for each line the passenger has historically ridden, and ratio trend features of the passenger's
riding behavior on historically ridden lines;
wherein the sequential statistical features for each historically ridden line characterize how active the
passenger's riding history is on each specific line: within the fixed time window, the passenger's bus
transactions on each historically ridden line in the most recent 12 hours and the most recent 1, 3, 7,
14, 28, 56, 84, 112, and 139 days are counted, and the passenger's maximum number of rides, number of
weekend rides, and number of working-day rides are counted;
the time features for each historically ridden line refer to the interval since the passenger's most
recent ride on that line, the intervals between the passenger's rides within the given time window, the
number of active days and active hours in which the passenger has riding records, the minimum number of
days before a repeat ride, and the average number of days before a repeat ride;
the ratio trend features of riding behavior on historically ridden lines refer to the ratio of the
passenger's rides on the specific line in the most recent 1 week to those in the most recent 2 weeks, the
share of the passenger's active hours on the line in the total active hours over all lines, the share of
weekend rides in total rides, and the share of working-day rides in total rides.
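The time features of claim 4 can be sketched for a single (passenger, line) pair as follows. The function name `line_time_features`, the feature names, and the choice to measure repeat-ride gaps between distinct riding days are assumptions of this example, not details given in the claim.

```python
from datetime import date


def line_time_features(ride_dates, window_end):
    """Time features for one passenger on one line; None if there are no rides."""
    days = sorted(set(ride_dates))  # distinct riding days
    if not days:
        return None
    # Gaps in days between consecutive riding days (assumed "repeat ride" interval).
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    return {
        "days_since_last_ride": (window_end - days[-1]).days,
        "active_days": len(days),
        "min_return_days": min(gaps) if gaps else None,
        "avg_return_days": sum(gaps) / len(gaps) if gaps else None,
    }


# Hypothetical riding days of one card on one line.
rides_on_line = [
    date(2014, 12, 1),
    date(2014, 12, 3),
    date(2014, 12, 3),
    date(2014, 12, 10),
    date(2014, 12, 16),
]
feats = line_time_features(rides_on_line, date(2014, 12, 17))
print(feats)
```

Returning None for the gap features when a passenger has ridden the line on only one day leaves the handling of missing values to the downstream model.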
5. The sliding-window multi-feature prediction method for citizens' choice of bus lines according to claim 1,
characterized in that the feature attribute class of different passenger bus card types comprises
sequential statistical features of different passenger types and trend features of different passenger types;
wherein the sequential statistical features of different passenger types characterize the travel patterns
of the different passenger types: the weekend and working-day behavior counts of each passenger type on
all lines in the most recent 12 hours and the most recent 1, 3, 7, 14, 28, 56, 84, 112, and 139 days
are counted;
the trend features of different passenger types count, for each passenger group, the ratio of travel
volume in the most recent 1, 2, and 4 weeks to that in the most recent 2, 4, and 8 weeks.
6. The sliding-window multi-feature prediction method for citizens' choice of bus lines according to claim 1,
characterized in that the feature attribute class of the passenger bus card issuing location comprises
sequential statistical features of passengers from different locations, travel trend features of
passengers from different card issuing locations, and encoding features of passengers from different locations;
the sequential statistical features of passengers from different locations count the total behavior of
passengers from each location in the most recent 12 hours and the most recent 1, 3, 7, 14, 28, 56, 84,
112, and 139 days;
the travel trend features of passengers from different card issuing locations count the ratio of the
travel volume of passengers from each location in the most recent 1, 2, and 4 weeks to that in the most
recent 2, 4, and 8 weeks, and the share of weekend trips in total trips;
the encoding features of passengers from different locations reflect that passengers from different bus
card issuing locations have different riding patterns and lines; the bus card issuing location is mapped
to a feature.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710463795.3A CN107316108A (en) | 2017-06-19 | 2017-06-19 | A kind of citizens' activities public bus network chooses sliding window multiple features Forecasting Methodology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107316108A true CN107316108A (en) | 2017-11-03 |
Family
ID=60183854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710463795.3A Pending CN107316108A (en) | 2017-06-19 | 2017-06-19 | A kind of citizens' activities public bus network chooses sliding window multiple features Forecasting Methodology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107316108A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019367A (en) * | 2017-12-28 | 2019-07-16 | 北京京东尚科信息技术有限公司 | A kind of method and apparatus of statistical data feature |
CN110019367B (en) * | 2017-12-28 | 2022-04-12 | 北京京东尚科信息技术有限公司 | Method and device for counting data characteristics |
CN108615096A (en) * | 2018-05-10 | 2018-10-02 | 平安科技(深圳)有限公司 | Server, the processing method of Financial Time Series and storage medium |
CN109741114A (en) * | 2019-01-10 | 2019-05-10 | 博拉网络股份有限公司 | A kind of user under big data financial scenario buys prediction technique |
CN111126681A (en) * | 2019-12-12 | 2020-05-08 | 华侨大学 | Bus route adjusting method based on historical passenger flow |
CN111126681B (en) * | 2019-12-12 | 2022-06-07 | 华侨大学 | Bus route adjusting method based on historical passenger flow |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171103 |