CN109784416A - Mode of transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data


Info

Publication number: CN109784416A
Authority: CN (China)
Prior art keywords: mode, transportation, sample, classifier, trip
Legal status: Granted
Application number: CN201910076104.3A
Other languages: Chinese (zh)
Other versions: CN109784416B (en)
Inventors: 张锦, 唐劲松, 冯雨庭, 肖斌, 罗静
Current assignee: Southwest Jiaotong University
Original assignee: Southwest Jiaotong University
Application filed by: Southwest Jiaotong University
Priority: CN201910076104.3A
Publications: CN109784416A (application), CN109784416B (granted)
Current legal status: Expired - Fee Related

Abstract

The present invention relates to computer recognition technology, and more particularly to a mode of transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data. The method comprises: (1) preparing and preprocessing data; (2) designing label types; (3) extracting trip characteristics; (4) establishing an improved manual identification process for the mode of transportation; (5) training an initial classifier; (6) discriminating the mode of transportation of unmarked samples; (7) judging whether the classifier meets the termination condition; (8) updating the data set with high-confidence samples; (9) optimizing the semi-supervised SVM classifier based on Tri-training; (10) discriminating the mode of transportation of unmarked samples; (11) judging whether the classifier meets the termination condition; (12) updating the data set with low-confidence samples; and (13) optimizing the semi-supervised SVM classifier based on shell vectors. The invention reduces the cost of acquiring information and improves data utilization; discrimination is flexible, comprehensive and accurate, and the method applies to a wider range of scenarios.

Description

Mode of transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data
Technical field
The present invention relates to computer recognition technology, and in particular to a mode of transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data.
Background
Information on the mode of transportation of trips plays an important role in traffic planning and related areas. Current methods of obtaining this information fall into two categories: traditional surveys and data mining. Traditional survey methods such as questionnaires and telephone interviews are difficult to carry out frequently or at large scale, and cannot accurately reflect the actual mode of transportation. Data mining methods mainly establish rules on mobile phone data to mine mode of transportation information. Compared with traditional surveys, they not only overcome drawbacks such as the difficulty of organizing surveys, low sampling rates and the one-dimensional presentation of information, but can also keep pace with the rapidly changing transport demand and supply in China. In terms of data sources, data mining mainly uses satellite positioning data and mobile phone signaling data. Satellite positioning data is widely used, but it is difficult to obtain and its full-day data completeness is low, whereas mobile phone signaling data offers low acquisition cost, a high sampling rate, high full-day data completeness and fast updating, so mode of transportation information can be mined more comprehensively and flexibly. The present invention therefore uses mobile phone signaling data to discriminate the user's mode of transportation.
Scholars in China and abroad have mainly studied the application of methods such as fuzzy recognition and Bayesian decision trees to discriminating the mode of transportation from mobile phone signaling data. In recent years, machine learning methods have been widely applied to multi-class classification and pattern discrimination problems with good results, and have gradually been applied to mode of transportation discrimination, with research concentrating on supervised and unsupervised learning. Mobile phone signaling data is unmarked: supervised learning requires a large amount of it to be marked manually, which is laborious and leaves data utilization low; unsupervised learning needs no manual marking of traffic information, but its output classes are hard to identify and its accuracy is low.
Summary of the invention
In view of the above technical problems, the present invention provides a mode of transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data. It uses the classification method of the semi-supervised support vector machine to discriminate the user's mode of transportation from a small amount of marked data and a large amount of unmarked data. It offers high classification accuracy and low training cost, and has good generality and promotional value.
The method provided by the invention is a mode of transportation discrimination method using a semi-supervised support vector machine based on mobile phone signaling data, comprising the following steps:
Step 1: Prepare and preprocess data. User trip chains already constructed from mobile phone signaling data are processed into trip subchains that each contain a single mode of transportation, and converted into the data set used by the invention.
Step 2: Design label types. The number of modes of transportation to be determined is k. According to urban transport factors, the specific modes of transportation to be discriminated are designed, a label is designed for each mode, and a directed acyclic graph is established according to how easily each mode is distinguished;
The invention designs a directed acyclic graph that helps improve classification efficiency. The graph is designed according to how easily each mode of transportation is distinguished: the labels 1-6 are assigned in order to walking, bicycle, electric vehicle/motorcycle, bus, car and other, where 1 and 6 are the two modes that differ most, followed by 1 and 5, 2 and 5, and so on. In addition, the structure of the directed acyclic graph allows the classification categories to be adjusted flexibly: adding or deleting a category only requires adding or removing a column of nodes at the side, without affecting the classification at the remaining nodes.
Step 3: Extract trip characteristics. From existing travel mode conditions, traditional resident trip survey data and mobile phone signaling data, the semi-supervised learning features are extracted. They fall into five categories: speed, time consumed, distance, POI category of the origin and destination, and user age. The trip characteristics of each trip subchain are calculated and extracted one by one.
Step 4: Establish an improved manual identification process for the mode of transportation. The mode of transportation of a trip subchain is identified manually with the aid of the Bayesian decision tree method and third-party map data.
Step 5: Train the initial classifier. Trip subchains are selected at random and their mode of transportation is identified with the improved manual identification process of step 4. If the k modes are not all covered, the sample size is increased and identification continues until every one of the k modes has manually identified samples. The manually identified trip subchains form the marked sample set L, and the remaining trip subchains form the unmarked sample set U. The initial semi-supervised SVM classifier is trained using L and U.
Step 6: Discriminate the mode of transportation of unmarked samples. The classifier is applied to the unmarked sample set U to obtain the mode of transportation of each sample.
Step 7: Judge whether the classifier meets the termination condition, for example whether it reaches the required accuracy or whether the unmarked sample set U is empty. If so, the classifier is the optimum classifier and the mode of transportation of each trip subchain is output; otherwise, go to step 8.
Step 8: Update the data set with high-confidence samples. Some samples with high confidence are selected from the pre-labeling results and added to the marked sample set L, updating the sample set.
Step 9: Optimize the semi-supervised SVM classifier based on Tri-training. The Tri-training semi-supervised SVM classifier is built using the sample set updated in step 8.
Step 10: Discriminate the mode of transportation of unmarked samples. The current classifier is applied to the unmarked sample set U to obtain the mode of transportation of each sample.
Step 11: Judge whether the classifier meets the termination condition, using the method of step 7. If so, the classifier is the optimum classifier and the mode of transportation of each trip subchain is output; otherwise, go to step 12.
Step 12: Update the data set with low-confidence samples. From the discrimination results obtained for the unmarked sample set U with the classifier of step 9, some samples with low confidence are selected and added to the marked sample set L, updating the sample set.
Step 13: Optimize the semi-supervised SVM classifier based on shell vectors. A classifier based on CHB-ASVM active learning is built using the sample set updated in step 12, and the process returns to step 6.
The detailed substeps of some of the steps are as follows:
Step 1: Prepare and preprocess data
1.1 Collect and prepare data;
The invention studies mode of transportation discrimination using mobile phone signaling data that contains trip chain information. After cleaning and mining, the mobile phone signaling data is converted into the data set of the invention, whose fields include user code, timestamp, longitude and latitude of the track point, track point type, user age, etc.
1.2 Extract trip subchains;
Based on the track point type, the user's full-day trip chain is split into trip subchains that each contain only one trip.
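The splitting in step 1.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's procedure: the field names `timestamp` and `point_type` and the type codes `"OD"`, `"STAY"` and `"PASS"` are hypothetical, and it assumes that origin/destination points alternate between the start and the end of a trip.

```python
from typing import Dict, List

Point = Dict[str, object]


def split_trip_chain(points: List[Point]) -> List[List[Point]]:
    """Split a full-day trip chain into subchains holding one trip each.

    Assumption: a trip opens at an "OD" point and closes at the next "OD" point.
    Points seen while no trip is open (for example dwell points) are skipped.
    """
    subchains: List[List[Point]] = []
    current: List[Point] = []
    for point in sorted(points, key=lambda p: p["timestamp"]):
        if point["point_type"] == "OD":
            if not current:
                current = [point]           # first OD point opens a trip
            else:
                current.append(point)       # second OD point closes it
                subchains.append(current)
                current = []
        elif current:
            current.append(point)           # point travelled through during a trip
    return subchains
```

A chain that ends with an unmatched origin point yields no subchain for that open trip; whether such fragments should be kept is a design choice the text does not settle.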
Step 2: Design label types
Step 3: Extract trip characteristics
Step 4: Establish the improved manual identification process for the mode of transportation
Each sample $x_i$ ($i = 1, 2, \ldots, n$) is identified separately with the identification process based on the Bayesian decision tree and with the identification process based on third-party map data, giving two mode of transportation results. The two results are then compared: if they are identical, that result is taken as the sample's mode of transportation; otherwise, the sample is handed to an expert for comparison and judgment, who identifies the mode of transportation $X_i$ of the trip subchain.
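The agreement rule of step 4 can be sketched as a small function. The three callables are hypothetical placeholders for the Bayesian decision tree process, the third-party map process and the expert review; none of them is specified in this text.

```python
from typing import Callable, TypeVar

Sample = TypeVar("Sample")


def identify_mode(
    sample: Sample,
    by_decision_tree: Callable[[Sample], int],
    by_map_data: Callable[[Sample], int],
    ask_expert: Callable[[Sample, int, int], int],
) -> int:
    """Return the mode label for one trip subchain.

    The two automatic results are accepted when they agree; otherwise the
    sample, together with both candidate labels, is passed to the expert.
    """
    first = by_decision_tree(sample)
    second = by_map_data(sample)
    if first == second:
        return first
    return ask_expert(sample, first, second)
```

Only disagreements reach the expert, which is where the claimed saving in manual effort would come from.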
Step 5: Train the initial classifier
5.1 Perform simple random sampling on the trip subchains from step 1, selecting samples $x_i$ ($i = 1, 2, \ldots, n$) to form the sample set N of size n. Use the manual identification process of step 4 to determine and mark the mode of transportation of every sample in N. If not all travel modes are covered, increase the sample size and continue identifying until every label category has samples. The identified trip subchains form the marked sample set L, and the remaining trip subchains form the unmarked sample set U.
5.2 Let t = 1 and take the marked sample set L as the marked sample set of iteration t. Perform Bootstrap sampling on it to generate three training sample sets $L'_1$, $L'_2$, $L'_3$, and train three initial classifiers with the SVM algorithm.
5.3 The initial classifier of iteration t is obtained by integrating the three classifiers.
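Steps 5.2 and 5.3 can be illustrated with the standard library alone. The sketch below makes two assumptions that the text does not confirm: the three classifiers are integrated by majority vote, and `train_majority_class` is a hypothetical stand-in for SVM training so that the example stays self-contained.

```python
import random
from collections import Counter


def bootstrap_sample(samples, size, rng):
    """Draw `size` items from `samples` with replacement."""
    return [rng.choice(samples) for _ in range(size)]


def train_initial_ensemble(labeled, train, size=None, n_models=3, seed=0):
    """Train n_models classifiers, each on its own bootstrap sample of `labeled`."""
    rng = random.Random(seed)
    size = len(labeled) if size is None else size
    return [train(bootstrap_sample(labeled, size, rng)) for _ in range(n_models)]


def majority_vote(classifiers, x):
    """Integrate the classifiers by majority vote (assumed combining rule)."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def train_majority_class(samples):
    """Hypothetical stand-in for SVM training: always predicts the commonest label."""
    commonest = Counter(label for _, label in samples).most_common(1)[0][0]
    return lambda x: commonest
```

In practice `train` would wrap a real SVM fit, and `size` lets each bootstrap sample be smaller than the marked set.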
Step 6: Discriminate the mode of transportation of unmarked samples
Use the current classifier to discriminate the mode of transportation of the trip subchains in the unmarked sample set U.
Step 7: Judge whether the classifier meets the termination condition
7.1 Determine the termination condition. According to the training objective, data precision, sample size and similar considerations, determine the termination condition from indices such as the classification accuracy required of the optimum classifier and the semi-supervised support vector machine's utilization rate of unmarked samples.
7.2 Judge whether the current classifier meets the termination condition. If so, the classifier is the optimum classifier, and the mode of transportation results it determines are output; otherwise, go to step 8.
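A termination test of the kind described in step 7 might look like the sketch below. The default target of 0.85 is the value used later in the embodiment; treating the condition as "accuracy at or above the target, or no unmarked samples left" is an assumption about how the two criteria combine.

```python
def should_stop(accuracy: float, n_unmarked: int, target_accuracy: float = 0.85) -> bool:
    """Return True when training can end.

    Assumed rule: stop once validation accuracy reaches the target or the
    unmarked sample set U has been used up.
    """
    return accuracy >= target_accuracy or n_unmarked == 0
```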
Step 8: Update the data set with high-confidence samples
8.1 Calculate the confidence $\mathrm{conf}(x_i)$ of each pre-labeling result.
8.2 From the pre-labeling results, select m samples whose class-label confidence is greater than the threshold $\varepsilon_1$, i.e. samples whose class label the current classifier is relatively certain of, denoted $U_{\varepsilon_1}$.
8.3 Add the high-confidence samples $U_{\varepsilon_1}$ to the marked sample set, updating the sample set.
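The selection in step 8 can be sketched for a linear decision function f(x) = ω·x + b. The distance |f(x)| / ‖ω‖ to the separating hyperplane is standard geometry; the squashing d / (1 + d) used for `confidence` is an assumption made only so that the value lies in [0, 1), since the patent's own confidence formula is not reproduced in this text. Picking the m most confident samples first is likewise an assumed tie-breaking rule.

```python
import math


def decision_value(weights, bias, x):
    """f(x) = w . x + b for a linear classifier."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def distance_to_hyperplane(weights, bias, x):
    """Geometric distance |f(x)| / ||w|| from x to the separating hyperplane."""
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(decision_value(weights, bias, x)) / norm


def confidence(weights, bias, x):
    """Assumed confidence in [0, 1): grows with distance from the hyperplane."""
    d = distance_to_hyperplane(weights, bias, x)
    return d / (1.0 + d)


def select_high_confidence(unmarked, conf_of, threshold, m):
    """Return at most m samples with confidence above threshold, most confident first."""
    scored = [(conf_of(x), x) for x in unmarked]
    confident = [pair for pair in scored if pair[0] > threshold]
    confident.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in confident[:m]]
```

The selected samples would then be moved from U to the marked set together with their predicted labels.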
Step 9: Optimize the semi-supervised SVM classifier based on Tri-training
9.1 Perform Tri-training semi-supervised learning on the sample set updated in step 8 to generate three classifiers.
9.2 Calculate the classification error of each of the three classifiers, and from it the weight of each classifier in the integrated classifier.
9.3 Generate the integrated classifier from the three weighted classifiers.
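Steps 9.2 and 9.3 can be illustrated with an error-weighted vote. The weighting rule below (weight proportional to 1 minus the error rate, normalized to sum to 1) is an assumption chosen for simplicity; the patent's weight formula is not reproduced in this text and may differ.

```python
from collections import defaultdict


def error_rate(clf, validation):
    """Fraction of (x, label) pairs in `validation` that clf gets wrong."""
    wrong = sum(1 for x, label in validation if clf(x) != label)
    return wrong / len(validation)


def ensemble_weights(errors):
    """Assumed rule: weight_i is proportional to (1 - error_i)."""
    scores = [1.0 - e for e in errors]
    total = sum(scores)
    if total == 0:                      # every classifier is always wrong
        return [1.0 / len(errors)] * len(errors)
    return [s / total for s in scores]


def weighted_vote(classifiers, weights, x):
    """Return the label with the largest total weight among the votes."""
    tally = defaultdict(float)
    for clf, weight in zip(classifiers, weights):
        tally[clf(x)] += weight
    return max(tally, key=tally.get)
```

With equal weights this reduces to a plain majority vote, so the weights only matter when the three classifiers differ noticeably in accuracy.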
Step 10: Discriminate the mode of transportation of unmarked samples
Use the current classifier to discriminate the mode of transportation of the trip subchains in the unmarked sample set U.
Step 11: Judge whether the classifier meets the termination condition
Judge whether the current classifier meets the termination condition. If so, the classifier is the optimum classifier, and the mode of transportation results it determines are output; otherwise, go to step 12.
Step 12: Update the data set with low-confidence samples
12.1 Calculate the confidence $\mathrm{conf}(x_i)$ of each pre-labeling result.
12.2 From the pre-labeling results, select j samples whose class-label confidence is less than the threshold $\varepsilon_2$, i.e. samples the current classifier is relatively uncertain about, denoted $U_{\varepsilon_2}$.
12.3 Compute the shell vectors in $U_{\varepsilon_2}$; they form the sample set to be marked manually.
12.4 Mark this sample set with the improved manual identification process of step 4, and remove these samples from the unmarked sample set U.
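Step 12 can be sketched in two dimensions. The sketch assumes that "shell vectors" are the vertices of the convex hull of the selected samples, which is how the term is usually understood but is not spelled out in this text; real trip features have more than two dimensions, so the planar hull (Andrew's monotone chain) is only an illustration.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Vertices of the 2D convex hull, counter-clockwise (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def select_shell_vectors(unmarked, conf_of, threshold, j):
    """Take the j least confident samples below threshold and keep their hull vertices."""
    uncertain = sorted((x for x in unmarked if conf_of(x) < threshold), key=conf_of)[:j]
    return convex_hull(uncertain)
```

Only the hull vertices go to manual marking, so interior low-confidence samples stay in U and cost no expert time.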
Step 13: Optimize the semi-supervised SVM classifier based on shell vectors
13.1 Resample the marked sample set three times and carry out SVM incremental learning on each of the resulting sample sets together with the newly marked sample set, giving three trained classifiers.
13.2 Calculate the classification error of each of the three classifiers and the weight of each classifier in the integrated classifier.
13.3 Generate the integrated classifier.
13.4 Let t = t + 1 and go to step 6.
The method provided by the invention improves the manual identification process: by combining the Bayesian decision tree discrimination method with third-party map data, it raises the efficiency of manually identifying the mode of transportation. It provides a way of designing the classification labels as a variable: the number of modes of transportation is treated as a variable and a directed acyclic graph with an adjustable number of nodes is established, so that the number of classification labels and the structure of the graph, and hence the discrimination results, can be adjusted according to data precision, the city's modes of transportation and similar conditions. The semi-supervised support vector machine classifier based on Tri-training adds unmarked samples with higher confidence to the marked sample set, continually optimizing classifier performance and improving the ability to discriminate the mode of transportation by updating the sample set. In particular, the semi-supervised support vector machine classifier based on shell vectors takes unmarked data with low confidence, determines its mode of transportation by manual marking and adds it to the marked sample set; the lower-confidence samples make the determination of support vectors more efficient, fix the decision boundary of the support vector machine, and improve discrimination.
The technical solution of the present invention has the following advantages:
1. The cost of acquiring mode of transportation information is reduced and data utilization is improved. The source data of mobile phone signaling is simple and convenient to obtain, which reduces cost. The semi-supervised support vector machine is trained with a small amount of manually marked data and a large amount of unmarked data, which improves data utilization.
2. Discrimination is flexible and comprehensive. The results essentially cover the main modes of transportation used by urban residents, and the classification categories can be adjusted to the types of urban transport infrastructure, so the method adapts well to urban development.
3. Discrimination accuracy is higher. The classifier can optimize its performance and raise classification accuracy by continually adjusting the composition of the sample data set.
4. Application scenarios are broader. The discrimination results can provide data support for urban traffic management and planning and for urban construction planning and development.
Description of the drawings
Fig. 1 is the overall flow of the discrimination method of the invention;
Fig. 2 is the directed acyclic graph of the embodiment;
Fig. 3 is the manual identification process for the travel mode of a trip chain in the embodiment;
Fig. 4 is the Bayesian decision tree of the embodiment;
Fig. 5 is the mode of transportation identification process using third-party map data in the embodiment;
Fig. 6 shows how the classifier accuracy changes in the embodiment.
Specific embodiment
The specific technical solution of the invention is described with reference to an embodiment.
Following the process shown in Fig. 1, the present embodiment comprises the following steps:
Step 1: Prepare and preprocess data
1.1 Collect and prepare data
The invention studies mode of transportation discrimination using mobile phone signaling data that contains trip chain information. After cleaning and mining, the mobile phone signaling data is converted into the data set of the invention, whose fields include user code, timestamp, longitude and latitude of the track point, track point type, user age, etc. The trip chain of user A0000001 on Wednesday, September 14, 2016 is shown in Table 1.
Table 1: A user's full-day trip chain
Here an origin/destination point marks the start or end of a trip, a dwell point indicates that the user stayed at that place, and an ordinary track point indicates that the user passed through that point.
1.2 Extract trip subchains
Based on the track point type, the user's full-day trip chain is split into trip subchains that each contain only one trip. A trip subchain of user 958fea201 is shown in Table 2; the trip starts at 9:19:11 and ends at 9:52:01.
Table 2: Trip subchain of a single trip by a user
Step 2: Design label types
2.1 Determine the number of labels
The data used in the example comes from GY city, GZ province, for September 2016. The main modes of transportation include walking, bicycle, electric vehicle, motorcycle, private car, bus, taxi, etc. In this example electric vehicle and motorcycle share one label, and private car and taxi share one label. The number of modes of transportation is set to k = 6, comprising walking, bicycle, electric vehicle/motorcycle, private car/taxi, bus and other, labeled 1, 2, 3, ..., 6 respectively.
2.2 Establish the directed acyclic graph
With k = 6 labels, the directed acyclic graph established according to how easily each mode of transportation is distinguished is shown in Fig. 2.
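How a directed acyclic graph of pairwise classifiers reaches one of k labels can be sketched as below. Fig. 2 is not reproduced in this text, so the evaluation order used here (compare the two outermost remaining labels and discard the loser, as in the common decision-DAG scheme for multi-class SVMs) is an assumption, and `prefers_first` is a hypothetical stand-in for a trained pairwise SVM.

```python
from typing import Callable, Sequence


def dag_predict(labels: Sequence[int], prefers_first: Callable[[int, int], bool]) -> int:
    """Walk the decision DAG and return the single surviving label.

    At each node the first and last remaining labels are compared;
    prefers_first(a, b) is True when the sample looks more like a than b.
    Exactly len(labels) - 1 pairwise decisions are made.
    """
    remaining = list(labels)
    if not remaining:
        raise ValueError("at least one label is required")
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        if prefers_first(first, last):
            remaining.pop()       # sample is "not last": drop the last label
        else:
            remaining.pop(0)      # sample is "not first": drop the first label
    return remaining[0]
```

Because each decision removes one label, adding or removing a category changes the list of labels without retraining the pairwise classifiers between the other categories.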
Step 3: Extract trip characteristics
Trip characteristics are divided into five categories: speed, time, distance, POI of the origin and destination, and user characteristics. Each category is subdivided, and the trip characteristics are further determined according to data precision and the characteristics of the city.
3.1 Speed features. Trip speed can be subdivided into features such as average speed, maximum speed, the 75th percentile of speed and the variance of speed; on this basis, identification can also use the share of each speed range. The speed characteristics of different travel modes are shown in Table 3. The shares of speed in the ranges [0.5, 5], [1, 10] and [5, 15] are selected as learning features.
Table 3: Cumulative frequency distribution (%) of speed for different travel modes
3.2 Time features. Travel time can be divided into features such as the time of day of the trip and the psychological travel time.
3.3 Distance features. The trip distance and the straight-line distance between origin and destination are selected as trip characteristics.
3.4 Origin and destination POI features. Another piece of information in mobile phone signaling data is the base station location, i.e. longitude and latitude. Once the trip chain is built, the longitude and latitude of the start and arrival points of each trip can be determined and converted through Baidu POI into attributes of the trip location. These are divided into six classes, such as residential area, government agency, office building, food, life services, hospital and public scenic spot, and are used as learning features with feature values 1, 2, 3, ..., 6.
3.5 User age.
The analysis shows that features such as speed, time, distance and trip location attributes can serve as input features for semi-supervised learning. The specific features are listed in Table 4, and the trip characteristics of one trip by user A0000001 are shown in Table 5.
Table 4: Trip characteristics
Table 5: Trip characteristics of one trip by user A0000001
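The speed and distance features of step 3 can be computed with the Python standard library, as in the sketch below. The function and key names are hypothetical, the units are whatever the input speeds use, and the great-circle (haversine) formula is used as an assumed reading of the "straight-line distance" between origin and destination.

```python
import math
import statistics


def speed_features(speeds):
    """Mean, maximum, 75th percentile and variance of a trip's speed samples."""
    ordered = sorted(speeds)
    _, _, p75 = statistics.quantiles(ordered, n=4)   # needs at least 2 samples
    return {
        "mean": statistics.fmean(ordered),
        "max": ordered[-1],
        "p75": p75,
        "variance": statistics.pvariance(ordered),
    }


def share_in_range(speeds, low, high):
    """Fraction of speed samples falling inside the closed range [low, high]."""
    return sum(1 for s in speeds if low <= s <= high) / len(speeds)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0                                # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))
```

The exact percentile depends on the interpolation method; `statistics.quantiles` defaults to the exclusive method, which may differ slightly from the convention behind Table 3.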
Step 4: Establish the improved manual identification process for the mode of transportation
Each sample $x_i$ ($i = 1, 2, \ldots, n$) is identified separately with the identification process based on the Bayesian decision tree and with the identification process based on third-party map data, giving two mode of transportation results. The two results are then compared: if they are identical, that result is taken as the sample's mode of transportation; otherwise, the sample is handed to an expert for comparison and judgment, who identifies the mode of transportation $X_i$ of the trip subchain. The improved manual identification process is shown in Fig. 3, the Bayesian decision tree in Fig. 4, and the identification process using the Baidu API in Fig. 5.
Manual identification is carried out for the first trip subchain of user A0000001 on September 14, 2016: mode of transportation 1 is bus (label 5) and mode of transportation 2 is bus (label 5). The two results are identical, so the mode of transportation of this trip subchain is bus.
Step 5: Train the initial classifier
5.1 Perform simple random sampling on the trip subchains from step 1, selecting samples $x_i$ ($i = 1, 2, \ldots, n$) to form the sample set N of size n. Use the manual identification process of step 4 to determine and mark the mode of transportation of every sample in N. If not all travel modes are covered, increase the sample size and continue identifying until every label category has samples. The identified trip subchains form the marked sample set L, and the remaining trip subchains form the unmarked sample set U.
5.2 Let t = 1 and take the marked sample set L as the marked sample set of iteration t. Perform Bootstrap sampling on it to generate three training sample sets $S_1$, $S_2$, $S_3$, and train three initial classifiers with the SVM algorithm.
5.3 The initial classifier of iteration t is obtained by integrating the three classifiers.
Step 6: Discriminate the mode of transportation of unmarked samples
Use the current classifier to discriminate the mode of transportation of the trip subchains in the unmarked sample set U.
Step 7: Judge whether the classifier meets the termination condition
7.1 Determine the termination condition. According to the training objective, data precision, sample size and similar considerations, determine the termination condition from indices such as the classification accuracy required of the optimum classifier and the semi-supervised support vector machine's utilization rate of unmarked samples.
7.2 Judge whether the current classifier meets the termination condition. If so, the classifier is the optimum classifier, and the mode of transportation results it determines are output; otherwise, go to step 8.
Step 8: Update the data set with high-confidence samples
8.1 Calculate the confidence $\mathrm{conf}(x_i)$ of each pre-labeling result.
8.2 From the pre-labeling results, select m samples whose class-label confidence is greater than the threshold $T_{confth}$, i.e. samples whose class label the current classifier is relatively certain of.
8.3 Add these high-confidence samples to the marked sample set, updating the sample set.
Step 9: Optimize the semi-supervised SVM classifier based on Tri-training
9.1 Perform Tri-training semi-supervised learning on the sample set updated in step 8 to generate three classifiers.
9.2 Calculate the classification error of each of the three classifiers, and from it the weight of each classifier in the integrated classifier.
9.3 Generate the integrated classifier from the three weighted classifiers.
Step 10: Discriminate the mode of transportation of unmarked samples
Use the current classifier to discriminate the mode of transportation of the trip subchains in the unmarked sample set U.
Step 11: Judge whether the classifier meets the termination condition
Judge whether the current classifier meets the termination condition. If so, the classifier is the optimum classifier, and the mode of transportation results it determines are output; otherwise, go to step 12.
Step 12: Update the data set with low-confidence samples
12.1 Calculate the confidence $\mathrm{conf}(x_i)$ of each pre-labeling result.
12.2 From the pre-labeling results, select j samples whose class-label confidence is less than the threshold $C_{confth}$, i.e. samples the current classifier is relatively uncertain about.
12.3 Compute the shell vectors among these samples; they form the sample set to be marked manually.
12.4 Mark this sample set with the improved manual identification process of step 4, and remove these samples from the unmarked sample set U.
Step 13: Optimize the semi-supervised SVM classifier based on shell vectors
13.1 Perform Bootstrap sampling on the marked sample set three times and carry out SVM incremental learning on each of the three resulting sample sets together with the newly marked sample set, giving three trained classifiers.
13.2 Calculate the classification error of each of the three classifiers and the weight of each classifier in the integrated classifier.
13.3 Generate the integrated classifier.
13.4 Let t = t + 1 and go to step 6.
Steps 5 to 13 are the implementation steps of the semi-supervised support vector machine studied in the invention. To better illustrate the implementation, an example with 10000 trip subchains from GY city is described in detail.
The semi-supervised support vector machine example is detailed as follows:
Taking 10000 trip subchains from GY city as an example, the detailed process of the semi-supervised support vector machine studied in the invention is illustrated.
Step 5: Randomly sample the whole data set and select 300 trip subchains for manual identification of the mode of transportation; these form the marked sample set L, and the remaining trip subchains form the unmarked sample set U. Resample the marked sample set L of 300 samples, drawing 200 samples each time, to form three training sample sets. Train three initial classifiers with the SVM algorithm and integrate them into one initial classifier.
Step 6: Use the current classifier to discriminate the mode of transportation of the trip subchains in the unmarked sample set U.
Step 7: Set the termination condition as a classifier accuracy of 0.85 or the unmarked sample set U being empty. Resample the remaining trip subchains to form a validation set. The initial classifier has accuracy = 0.45 and U is not empty, so the termination condition is not met; go to step 8.
Step 8: Calculate the confidence $\mathrm{conf}(x_i)$ of each trip subchain's pre-labeling result using formula (2), select m = 30 samples whose confidence is greater than the threshold $T_{confth}$, add them to the marked sample set, and update the sample set.
The invention uses the distance between a sample and the optimal classification surface as a measure of the probability that the sample belongs to each class:
$$d(x) = \frac{|f(x)|}{\|\omega\|} \qquad (1)$$
where $f(x) = \omega \cdot x + b$. To make it convenient to measure the class-label confidence of samples that the SVM active learning is uncertain about, and to simplify the calculation without affecting the measurement, the class-label confidence $\mathrm{conf}(x_i)$ of sample $x_i$ is measured on the basis of formula (1).
Step 9: the semi-supervised SVM classifier according to the sample set updated in step 8 based on Tri-training
Step 10: utilizing classifierThe mode of transportation for subchain of going on a journey in unmarked sample set U is differentiated.
Step 11: judging classifierPrecision accuracy=0.47 andIt does not meet termination condition and goes to step 12。
Step 12: The confidence conf(x_i) of each trip subchain's pre-labelling result is calculated using formula (2); j = 10 samples whose confidence is less than the threshold C_confth = 0.50 are chosen, and the shell vectors among these samples are computed. The mode of transportation of the shell vectors is determined with the improved manual mode-of-transportation identification process of step 4, and they are added to the marked sample set, updating the sample set.
Step 13: A classifier based on CHB-ASVM active learning is constructed using the sample set updated in step 12, the unmarked sample set U is discriminated, and the procedure goes to step 6.
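The resampling used in step 5 to build three training sets can be sketched as below. This is an assumption-laden illustration: the text does not say whether sampling is with or without replacement, so the sketch draws with replacement (as is usual in Tri-training), and it treats the samples never drawn as a held-out validation set, which is one possible reading of step 7. The name `resample_three` is hypothetical.

```python
import random


def resample_three(marked, size, seed=0):
    """Draw three training sets of `size` items from `marked`, with replacement.

    Returns (training_sets, held_out), where held_out contains the items that
    were not drawn into any of the three training sets.
    """
    rng = random.Random(seed)
    used = set()
    training_sets = []
    for _ in range(3):
        drawn = [rng.randrange(len(marked)) for _ in range(size)]
        used.update(drawn)
        training_sets.append([marked[i] for i in drawn])
    held_out = [marked[i] for i in range(len(marked)) if i not in used]
    return training_sets, held_out
```

With the embodiment's figures this would be called as `resample_three(L, 200)` on a marked set of 300 trip subchains; each of the three sets would then be passed to an SVM trainer.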
Steps 7 to 16 are repeated; at the 18th cycle, after 35 optimizations, the classifier meets the termination condition. Training then ends, and the mode of transportation of each trip subchain is output. The variation of classifier accuracy with the number of cycles is shown in Figure 6.
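The control flow of this loop, in which the termination condition is checked before each of the two alternating optimizations, can be sketched as follows. It is only a skeleton under stated assumptions: the classifier, its accuracy measurement, and the two update steps are passed in as caller-supplied callables, and the names `run_until_done`, `accuracy_of`, `unmarked_is_empty`, and `optimizers` are hypothetical.

```python
def run_until_done(accuracy_of, unmarked_is_empty, optimizers,
                   target_accuracy=0.85, max_optimizations=1000):
    """Alternate the given optimizers until the termination condition holds.

    accuracy_of       -- callable returning the current classifier accuracy
    unmarked_is_empty -- callable returning True when U has no samples left
    optimizers        -- sequence of callables run in order within each cycle,
                         e.g. (tri_training_update, shell_vector_update)
    Returns (cycle, optimizations): the cycle in which training stopped and
    the number of optimizations performed before stopping.
    """
    optimizations = 0
    cycle = 0
    while True:
        cycle += 1
        for optimize in optimizers:
            # Termination check precedes every optimization (steps 7 and 11).
            if accuracy_of() >= target_accuracy or unmarked_is_empty():
                return cycle, optimizations
            if optimizations >= max_optimizations:
                raise RuntimeError("termination condition not reached")
            optimize()
            optimizations += 1
```

With two optimizers per cycle, a run that stops after an odd number of optimizations ends at the second check of its final cycle, which is consistent with 35 optimizations falling in the 18th cycle.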
The thresholds and stopping conditions used in the present invention are shown in Table 6. Embodiments in which a person of ordinary skill in the art, starting from the threshold values of the present invention, adopts other threshold values without creative effort fall within the scope of protection of the present invention.
Table 6. Suggested threshold values and stopping-condition values
The values of these thresholds are influenced by factors such as urban land use, commercial activity, local customs, and public transport network layout; they differ between cities and between periods, so the values in Table 6 are recommendations for reference only.
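Because these values are meant to be tuned per city and period, it can help to keep them in one place. The sketch below gathers the figures stated in steps 5 to 12 of the embodiment as defaults; it does not reproduce Table 6. The class name `LoopSettings` and its field names are hypothetical, the high-confidence threshold has no default because no value for it is given in the text above, and the check that the low threshold lies below the high one is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopSettings:
    high_conf_threshold: float        # caller must supply; no value stated above
    low_conf_threshold: float = 0.50  # step 12 threshold
    target_accuracy: float = 0.85     # step 7 termination accuracy
    high_conf_count: int = 30         # m in step 8
    low_conf_count: int = 10          # j in step 12
    marked_sample_size: int = 300     # step 5
    resample_size: int = 200          # step 5

    def __post_init__(self):
        if not 0.0 < self.target_accuracy <= 1.0:
            raise ValueError("target_accuracy must be in (0, 1]")
        # Assumed: both thresholds are on the same confidence scale.
        if self.low_conf_threshold >= self.high_conf_threshold:
            raise ValueError("low threshold must be below high threshold")
        for name in ("high_conf_count", "low_conf_count",
                     "marked_sample_size", "resample_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
```

A different city could then be configured with, for example, `LoopSettings(high_conf_threshold=0.9, target_accuracy=0.80)`, where both numbers are purely illustrative.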

Claims (10)

1. A mode-of-transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data, characterized by comprising the following steps:
Step 1: Prepare and preprocess the data;
User trip chains already derived from mobile phone signaling data are processed into trip chains that each contain only a single mode of transportation, and converted into the data set used by the present invention;
Step 2: Design the label types;
The number of modes of transportation to be determined is k; the specific mode-of-transportation discrimination scenario is designed according to factors such as the state of urban transport infrastructure construction and data precision; a label is designed for each mode of transportation; and a directed acyclic graph is established according to how easily each mode of transportation is distinguished;
Step 3: Extract the trip characteristics;
Based on existing travel-mode conditions, traditional resident travel survey data, and mobile phone signaling data, features for semi-supervised learning are extracted, covering five categories: speed, travel time, distance, POI category of origin and destination, and user age; the trip characteristics of each trip subchain are calculated and extracted one by one;
Step 4: Establish the improved manual mode-of-transportation identification process;
The mode of transportation of a trip subchain is identified manually with the aid of a Bayesian decision tree method and third-party map data;
Step 5: Train the initial classifier;
Trip subchains are randomly selected and their modes of transportation identified according to the improved manual identification process of step 4; if the k modes of transportation are not all covered, the sample size is increased and identification continues until every one of the k modes has manually identified samples; the manually identified trip subchains form the marked sample set L, and the remaining trip subchains form the unmarked sample set U; the initial semi-supervised SVM classifier is trained using the marked sample set L and the unmarked sample set U;
Step 6: Discriminate the mode of transportation of the unmarked samples;
The unmarked sample set U is discriminated with the classifier to obtain the mode of transportation of each sample;
Step 7: Judge whether the classifier meets the termination condition;
Judge whether the classifier meets the termination condition, namely that it reaches the required accuracy or that the unmarked sample set U is empty; if so, the classifier is the optimal classifier, and the mode of transportation of each trip subchain is output; otherwise, go to step 8;
Step 8: Update the data set with high-confidence samples;
Some of the high-confidence samples in the pre-discrimination results are selected and added to the marked sample set L, updating the sample set;
Step 9: Optimize the semi-supervised SVM classifier based on Tri-training;
The Tri-training semi-supervised SVM classifier is built using the sample set updated in step 8;
Step 10: Discriminate the mode of transportation of the unmarked samples;
The unmarked sample set U is discriminated with the current classifier to obtain the mode of transportation of each sample;
Step 11: Judge whether the classifier meets the termination condition;
The termination condition is judged according to the method of step 7; if it is met, the classifier is the optimal classifier, and the mode of transportation of each trip subchain is output; otherwise, go to step 12;
Step 12: Update the data set with low-confidence samples;
Some of the low-confidence samples are selected from the discrimination results for the unmarked sample set U in step 9 and added to the marked sample set L, updating the sample set;
Step 13: Optimize the semi-supervised SVM classifier based on shell vectors;
The shell-vector-based semi-supervised SVM classifier is constructed using the sample set updated in step 11, and the procedure goes to step 6.
2. The mode-of-transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data according to claim 1, characterized in that step 2 establishes a directed acyclic graph that optimizes the judgment order, including the following procedure: the modes of transportation are ranked according to how easily each is distinguished; the two most easily distinguished classes, walking versus others, are set as the top vertex of the directed acyclic graph; walking versus car is set as the second-layer judgment; and so on.
3. The mode-of-transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data according to claim 2, characterized by the method established in step 3 for extracting trip-mode features.
4. The mode-of-transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data according to claim 3, characterized in that step 4 establishes the improved manual mode-of-transportation identification process, including the following procedure: each sample x_i (i = 1, 2, …, n) is identified separately by the mode-of-transportation identification process based on a Bayesian decision tree and by the mode-of-transportation identification process based on third-party map data, yielding two mode-of-transportation results; the two results are then compared; if they are the same, that result is the mode of transportation of the sample; otherwise, the sample is handed to an expert for comparison and judgment to identify the mode of transportation of the trip subchain x_i.
5. The mode-of-transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data according to claim 4, characterized in that step 5, training the initial classifier, includes the following sub-steps:
(5.1) Simple random sampling is carried out on the trip subchains of step 1, and samples x_i (i = 1, 2, …, n) are chosen to form the sample set N of size n; the mode of transportation of every sample in N is determined and labelled with the manual mode-of-transportation identification process of step 4; if not all trip modes are covered, the sample size is increased and identification continues until every label category has samples; the identified trip subchains form the marked sample set L, and the remaining trip subchains form the unmarked sample set U;
(5.2) Let t = 1, with the marked sample set for this round initially equal to L; this set is resampled to generate three training sample sets L1′, L2′, L3′, and three initial classifiers are trained with the SVM algorithm;
(5.3) The initial classifier is the ensemble of the three classifiers.
6. The mode-of-transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data according to claim 5, characterized in that step 7, judging whether the classifier meets the termination condition, includes the following sub-steps:
(7.1) Determine the termination condition: according to the training objective, data precision, and sample size, the classification accuracy of the optimal classifier and the utilization rate of unmarked samples by the semi-supervised support vector machine are determined as the termination-condition indices;
(7.2) Judge whether the current classifier meets the termination condition; if so, the classifier is the optimal classifier, and the mode-of-transportation results it determines are output; otherwise, go to step 8.
7. The mode-of-transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data according to claim 6, characterized in that step 8, updating the data set with high-confidence samples, includes the following sub-steps:
(8.1) Calculate the confidence conf(x_i) of the pre-labelling results;
(8.2) From the pre-labelling results, select m samples whose class-label confidence is greater than the threshold ε1, i.e., samples whose class label the current classifier is relatively sure of, denoted U_ε1;
(8.3) Add the high-confidence samples U_ε1 to the marked sample set, updating the sample set.
8. The mode-of-transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data according to claim 7, characterized in that step 9, optimizing the semi-supervised SVM classifier based on Tri-training, includes the following sub-steps:
(9.1) Carry out Tri-training semi-supervised learning using the sample set updated in step 8 to generate the classifiers;
(9.2) Calculate the classification errors of the three classifiers, and calculate the weights of the three classifiers in the integrated classifier;
(9.3) Generate the integrated classifier.
9. The mode-of-transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data according to claim 8, characterized in that step 12, updating the data set with low-confidence samples, includes the following sub-steps:
(12.1) Calculate the confidence conf(x_i) of the pre-labelling results;
(12.2) From the pre-labelling results, select j samples whose class-label confidence is less than the threshold ε2, i.e., samples about which the current classifier is relatively uncertain, denoted U_ε2;
(12.3) Compute the shell vectors in U_ε2; these form the sample set to be labelled;
(12.4) Label this sample set with the improved manual mode-of-transportation identification process of step 4, and remove these samples from the unmarked sample set.
10. The mode-of-transportation discrimination method using a semi-supervised SVM based on mobile phone signaling data according to claim 9, characterized in that step 13, optimizing the semi-supervised SVM classifier based on shell vectors, includes the following sub-steps:
(13.1) Perform SVM incremental learning on each of the sample sets obtained by resampling the sample set three times, together with the newly labelled sample set, to train the classifiers;
(13.2) Calculate the classification errors of the three classifiers and the weights of the three classifiers in the integrated classifier;
(13.3) Generate the classifier;
(13.4) Let t = t + 1, and go to step 6.
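Sub-steps (9.2) to (9.3) and (13.2) to (13.3) combine three classifiers using weights derived from their classification errors. The weighting formula itself is not shown in the text, so the sketch below is only one plausible scheme under an explicit assumption: each weight is proportional to the classifier's accuracy (1 − error), and the ensemble label is the one with the largest total weight. The names `weights_from_errors` and `weighted_vote` are hypothetical.

```python
from collections import defaultdict


def weights_from_errors(errors):
    """Assumed scheme: weight_i proportional to (1 - error_i), normalized to sum to 1."""
    scores = [1.0 - e for e in errors]
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one classifier must have error below 1")
    return [s / total for s in scores]


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the most weight."""
    tally = defaultdict(float)
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return max(tally, key=tally.get)
```

For a single trip subchain, the three classifiers' predicted modes would be passed as `predictions`; a classifier with a lower error then has more say in the final label than its peers.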
CN201910076104.3A 2019-01-26 2019-01-26 Traffic mode discrimination method of semi-supervised SVM (support vector machine) based on mobile phone signaling data Expired - Fee Related CN109784416B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910076104.3A CN109784416B (en) 2019-01-26 2019-01-26 Traffic mode discrimination method of semi-supervised SVM (support vector machine) based on mobile phone signaling data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910076104.3A CN109784416B (en) 2019-01-26 2019-01-26 Traffic mode discrimination method of semi-supervised SVM (support vector machine) based on mobile phone signaling data

Publications (2)

Publication Number Publication Date
CN109784416A true CN109784416A (en) 2019-05-21
CN109784416B CN109784416B (en) 2020-08-04

Family

ID=66502430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910076104.3A Expired - Fee Related CN109784416B (en) 2019-01-26 2019-01-26 Traffic mode discrimination method of semi-supervised SVM (support vector machine) based on mobile phone signaling data

Country Status (1)

Country Link
CN (1) CN109784416B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200804

Termination date: 20210126