CN107301433A - Method and system for identifying online ride-hailing vehicles based on a clustering discriminant model - Google Patents
- Publication number
- CN107301433A (CN201710573249.5A)
- Authority
- CN
- China
- Prior art keywords
- driver
- clustering
- net
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/02—Reservations, e.g. for tickets, services or events
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a method and system for identifying online ride-hailing vehicles based on a clustering discriminant model. The method comprises the following steps. Step (1): obtain the raw data, randomly select several known taxi driver users as sample set M, and randomly select several driver users of unknown class as sample set N. Step (2): extract features. Step (3): analyze the features. Step (4): build the model. Step (5): import the collected signaling data of unknown drivers into the model built in step (4) for judgment. The method extracts drivers' movement features from mobile phone signaling data and, knowing the label of only one class of data, can quickly and conveniently determine whether data with an unknown label belong to the known class. The identification result can serve the traffic administration department in cracking down on illegal ride-hailing vehicles, helping it locate suspect vehicles quickly, reducing the manpower cost of law enforcement, and improving work efficiency.
Description
Technical field
The invention belongs to the technical field of online ride-hailing vehicle management, and in particular relates to a method and system for identifying online ride-hailing vehicles based on a clustering discriminant model.
Background
Driven by the "Internet Plus" background and market trends, online ride-hailing, as an emerging way of travelling by car, has quickly become a market favorite and an important component of smart travel.
Online ride-hailing, i.e. online taxi booking, is a travel mode that connects passengers, drivers and vehicles: passengers use a smartphone app to book a driver's pick-up and drop-off service. Its emergence meets the public's diversified travel needs and improves the utilization of motor vehicles, but as its scale keeps expanding, the series of social regulation problems it brings cannot be ignored.
Online ride-hailing vehicles both differ from and are related to traditional taxis. In terms of vehicle color and model, taxis usually have a uniform color and markings, whereas ride-hailing vehicles vary widely. In terms of operation, taxis can pick up passengers by cruising, by waiting at stands, or by accepting reservations, whereas ride-hailing vehicles may not cruise for passengers and can only provide booked services through the online platform. In terms of supervision, taxis are usually managed centrally by taxi companies, whereas ride-hailing vehicles lack a comparable oversight mechanism.
Initially, ride-hailing was a supplement to taxis. As the number of full-time ride-hailing drivers grew, ride-hailing had a certain impact on the traditional taxi industry and met some resistance from taxi drivers. In addition, because ride-hailing platforms do not vet drivers and vehicles strictly, market disorder has arisen, with disputes, accidents and other social problems occurring one after another, and the ride-hailing market urgently needs standardized management.
To address the disorder in the ride-hailing market, the "Interim Measures for the Administration of Online Taxi Booking Business Operations and Services" took effect on November 1, 2016. They clearly stipulate that, while operating, drivers must not cruise the streets for passengers, and must not pick up passengers at the unified cruising-taxi dispatch stations or queuing areas set up at airports, railway stations and similar places.
Against the background of the new ride-hailing operating rules, the transport department, as the body that manages public travel services, needs to strengthen its management of ride-hailing vehicles. At present, ride-hailing vehicles are managed by manual patrols, which consumes a great deal of manpower. The transport department therefore urgently needs an automated screening method that helps it lock onto suspect vehicles and enforce the law quickly and efficiently.
Summary of the invention
The problem to be solved by the invention is to provide an online ride-hailing vehicle identification method based on a clustering discriminant model, which extracts drivers' movement features from mobile phone signaling data.
To solve the above technical problem, the technical solution adopted by the invention is an online ride-hailing vehicle identification method based on a clustering discriminant model, comprising the following steps:
Step (1): obtain the raw data, randomly select several known taxi driver users as sample set M, and randomly select several driver users of unknown class as sample set N;
Step (2): obtain the signaling data, over a period of time, of the driver users in sample set M and sample set N from step (1), and extract features;
Step (3): analyze the features extracted in step (2) to confirm that ride-hailing drivers and taxi drivers differ to some extent;
Step (4): build the model: randomly divide sample set M into a clustering training set P and a validation set Q, and use sample set N as test set N;
perform cluster analysis on training set P, compute the optimal number of clusters K, remove the outlier sample points from training set P, obtain the cluster center points, compute for each valid sample point in training set P the sum of its distances to the cluster center points, and derive the classification threshold from how that distance increases;
Step (5): import the collected signaling data of unknown drivers into the model built in step (4) for judgment.
The invention extracts drivers' movement features from mobile phone signaling data and, knowing the label of only one class of data, can quickly and conveniently determine whether data with an unknown label belong to the known class. The feature analysis in step (3) shows whether the features extracted in step (2) are appropriate: if no difference appears, the feature extraction is flawed. Step (4) builds a clustering model with taxi drivers as samples, so that step (5) can determine quickly and efficiently whether the signaling data of an unknown driver user belong to the known taxi class.
Preferably, in step (4), the model obtained in step (4) is verified with validation set Q and tested with test set N.
Using validation set Q and test set N can improve the accuracy of the clustering model.
Preferably, in step (2), the extracted features include cell switching and dwell time. The cell switching features include the mean daily cell switch count, the standard deviation of the daily cell switch count, the mean busy-period cell switch count, the standard deviation of the busy-period cell switch count, the mean idle-period cell switch count, and the standard deviation of the idle-period cell switch count. The dwell time features include the busy-period dwell median, busy-period dwell mean, busy-period dwell standard deviation, idle-period dwell median, idle-period dwell mean, and idle-period dwell standard deviation.
Preferably, in step (4), the optimal number of clusters K for training set P is computed with the silhouette coefficient, an evaluation index of how compact and how well separated the clusters are. The formula is as follows:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
a(i) is the average dissimilarity between vector i and the other points in the same cluster, which measures within-cluster similarity;
b(i) is the minimum, over the other clusters, of the average dissimilarity between vector i and that cluster, which measures between-cluster similarity;
s(i) ranges from -1 to 1; a larger value indicates better within-cluster cohesion and between-cluster separation.
Preferably, in step (4), the sum of the distances from each valid sample point in training set P to the cluster center points is computed and sorted, and an increment graph is drawn in which the X axis is the sample index in training set P and the Y axis is the sum of the distances from the sample point to the center points. The inflection point of training set P is read from the graph, and the Y value at the inflection point is the classification threshold y:
Threshold = y(x = 101) = 2.239995.
Another problem solved by the invention is to provide an online ride-hailing vehicle identification system based on a clustering discriminant model. The system includes a data collection module, a data cluster analysis module, and a data processing module;
The data collection module receives the signaling data of ride-hailing drivers and taxi drivers;
The data cluster analysis module randomly selects the signaling data of several taxi drivers collected by the data collection module as sample set M, randomly selects several driver users of unknown class collected by the data collection module as sample set N, extracts features, and builds the clustering discriminant model based on sample set M;
The data processing module imports the obtained driver user signaling data and uses the clustering discriminant model to judge the class of the driver user.
The identification system of the invention is based on mobile phone signaling data provided by a mobile operator and uses the clustering-based discriminant model to distinguish taxi drivers from ride-hailing drivers. The identification result can serve the traffic administration department in cracking down on illegal ride-hailing vehicles, helping it locate suspect vehicles quickly, reducing the manpower cost of law enforcement, and improving work efficiency.
Brief description of the drawings
Embodiments of the invention are further described below with reference to the accompanying drawings:
Fig. 1 is a scatter plot of the sample distribution drawn from the daily cell switch standard deviation feature and the idle-period dwell standard deviation feature;
Fig. 2 is a scatter plot of the sample distribution drawn from the daily cell switch mean feature and the daily cell switch standard deviation feature;
Fig. 3 is the sample distribution after t-SNE feature dimensionality reduction;
Fig. 4 is the modeling analysis flow chart;
Fig. 5 is a schematic diagram of obtaining the optimal number of clusters;
Fig. 6 is a schematic diagram of the cluster analysis result;
Fig. 7 is a schematic diagram of the cluster analysis result after outliers are removed;
Fig. 8 is a line chart of the cluster center distribution;
Fig. 9 shows box plots of the cluster sample distribution on the salient features;
Fig. 10 is the increment graph of the sorted sums of distances from each valid sample point x in training set P to the center points;
Fig. 11 is a simplified flow chart of the online ride-hailing vehicle identification method based on a clustering discriminant model;
Fig. 12 is a structure diagram of the online ride-hailing vehicle identification system based on a clustering discriminant model.
Detailed description of the embodiments
As shown in Fig. 11, the online ride-hailing vehicle identification method based on a clustering discriminant model according to an embodiment of the invention comprises the following steps:
Step (1): obtain the raw data, randomly select several known taxi driver users as sample set M, and randomly select several driver users of unknown class as sample set N;
Step (2): obtain the signaling data, over a period of time, of the driver users in sample set M and sample set N from step (1), and extract features;
Step (3): analyze the features extracted in step (2) to confirm that ride-hailing drivers and taxi drivers differ to some extent;
Step (4): build the model: randomly divide sample set M into a clustering training set P and a validation set Q, and use sample set N as test set N;
perform cluster analysis on training set P, compute the optimal number of clusters K, remove the outlier sample points from training set P, obtain the cluster center points, compute for each valid sample point in training set P the sum of its distances to the cluster center points, and derive the classification threshold from how that distance increases;
Step (5): import the collected signaling data of unknown drivers into the model built in step (4) for judgment.
In step (4), the model obtained in step (4) is verified with validation set Q and tested with test set N.
In step (2), the extracted features include cell switching and dwell time. The cell switching features include the mean daily cell switch count, the standard deviation of the daily cell switch count, the mean busy-period cell switch count, the standard deviation of the busy-period cell switch count, the mean idle-period cell switch count, and the standard deviation of the idle-period cell switch count. The dwell time features include the busy-period dwell median, busy-period dwell mean, busy-period dwell standard deviation, idle-period dwell median, idle-period dwell mean, and idle-period dwell standard deviation.
In addition, in step (4), the optimal number of clusters K for training set P is computed with the silhouette coefficient, an evaluation index of how compact and how well separated the clusters are. The formula is as follows:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
a(i) is the average dissimilarity between vector i and the other points in the same cluster, which measures within-cluster similarity;
b(i) is the minimum, over the other clusters, of the average dissimilarity between vector i and that cluster, which measures between-cluster similarity;
s(i) ranges from -1 to 1; a larger value indicates better within-cluster cohesion and between-cluster separation.
In step (4), the sum of the distances from each valid sample point in training set P to the cluster center points is computed and sorted, and an increment graph is drawn in which the X axis is the sample index in training set P and the Y axis is the sum of the distances from the sample point to the center points. The inflection point of training set P is read from the graph, and the Y value at the inflection point is the classification threshold y:
Threshold = y(x = 101) = 2.239995.
The specific operation of the identification method of this embodiment is as follows:
Data acquisition:
As shown in Table 1, driver users are obtained from the following 3 raw data sets:
Table 1
Data set | Description |
A | Taxi driver user list provided by the transportation department |
B | Taxi group user list |
C | Driver users who appeared at base stations near the South Railway Station and used the DiDi driver app |
The taxi driver user data set is:
D = A ∩ B ∩ C
From data set D, 150 known taxi driver users are randomly selected as sample set M.
E = C - D
From data set E, 150 driver users of unknown class are randomly selected as sample set N.
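The set definitions above can be sketched with Python sets. This is only an illustration: the function name `build_sample_sets`, the seed, and the toy user identifiers are assumptions, and the patent does not say how the random draw is implemented.

```python
import random

def build_sample_sets(a, b, c, m_size=150, n_size=150, seed=0):
    """Return (M, N): sampled known taxi drivers and drivers of unknown class."""
    d = set(a) & set(b) & set(c)   # D = A ∩ B ∩ C, known taxi drivers
    e = set(c) - d                 # E = C - D, drivers of unknown class
    rng = random.Random(seed)      # fixed seed only to make the sketch repeatable
    m = rng.sample(sorted(d), min(m_size, len(d)))
    n = rng.sample(sorted(e), min(n_size, len(e)))
    return m, n

# Toy identifiers, invented for illustration only.
A = {"u1", "u2", "u3", "u4"}
B = {"u2", "u3", "u4", "u5"}
C = {"u3", "u4", "u6", "u7"}
M, N = build_sample_sets(A, B, C, m_size=2, n_size=2)
```

With the real lists, `m_size` and `n_size` would both be 150, as in the text.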
Feature extraction:
The signaling data of the above 300 users for the two weeks from March 6 to March 19, 2017 are extracted as the raw data for feature extraction.
The busy period is defined as 9:00-17:00 on Monday to Friday, and the idle period as 17:00-24:00 and 0:00-9:00 on Monday to Friday.
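The busy and idle periods defined above could be assigned to each signaling timestamp roughly as follows. The function name `time_period` is hypothetical, and two details are assumptions because the text does not state them: weekend records are left unlabeled, and a record at exactly 17:00 counts as idle.

```python
from datetime import datetime

def time_period(ts):
    """Return 'busy', 'idle', or None for one signaling timestamp."""
    if ts.weekday() >= 5:
        # Saturday and Sunday are not covered by the definition in the text.
        return None
    # Busy: 9:00 up to (not including) 17:00; everything else on a weekday is idle.
    return "busy" if 9 <= ts.hour < 17 else "idle"

monday_morning = time_period(datetime(2017, 3, 6, 10, 0))
monday_evening = time_period(datetime(2017, 3, 6, 20, 0))
saturday = time_period(datetime(2017, 3, 11, 10, 0))
```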
The extracted features mainly include cell switching and dwell time, as shown in Table 2:
Table 2
After the above features are extracted, scatter plots are drawn from any two feature dimensions, as shown in Figs. 1 and 2. In Fig. 1, the abscissa is the standardized daily cell switch standard deviation feature and the ordinate is the standardized idle-period dwell standard deviation feature. In Fig. 2, the abscissa is the standardized daily cell switch mean feature and the ordinate is the standardized daily cell switch standard deviation feature. Red points represent sample set M, i.e. taxi drivers, and blue points represent sample set N, i.e. driver users of unknown class. Figs. 1 and 2 show intuitively that the distributions of sample set M and sample set N differ to some extent, which indirectly indicates that the features reflect, to a certain degree, the behavioral difference between the two classes of drivers.
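As a rough illustration of how the daily cell switching features might be derived from raw signaling, the sketch below counts cell changes per day for one user and returns their mean and standard deviation. The record layout `(day, cell_id)`, the function name, and the use of the population standard deviation are all assumptions; the patent does not specify them.

```python
from statistics import mean, pstdev

def daily_switch_features(records):
    """Return (mean, standard deviation) of one user's daily cell switch counts.

    records: time-ordered (day, cell_id) pairs; both field names are hypothetical.
    """
    switches = {}    # day -> number of cell changes observed that day
    last_cell = {}   # day -> most recent cell seen that day
    for day, cell in records:
        switches.setdefault(day, 0)
        if day in last_cell and last_cell[day] != cell:
            switches[day] += 1
        last_cell[day] = cell
    counts = list(switches.values())
    return mean(counts), pstdev(counts)

# Toy records: two switches on the first day, none on the second.
records = [("03-06", "a"), ("03-06", "b"), ("03-06", "b"), ("03-06", "c"),
           ("03-07", "a"), ("03-07", "a")]
daily_mean, daily_sd = daily_switch_features(records)
```

The busy-period and idle-period variants would apply the same counting to records filtered by period.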
Feature analysis:
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a manifold dimensionality reduction method proposed by Laurens van der Maaten and Geoffrey Hinton. It was developed from SNE and uses a heavier-tailed t distribution in the low-dimensional space to avoid the crowding problem and the difficulty of optimization.
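The algorithm first converts Euclidean distances into conditional probabilities that express the similarity between points. Given N high-dimensional data points x_1, ..., x_N, the probability p_{j|i} is computed as:
$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$
For the low-dimensional points y_i, the similarity between two points under the t distribution is:
$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$
The gradient of the optimization, with p_{ij} the symmetrized joint probability (p_{j|i} + p_{i|j}) / 2N, is:
$$\frac{\delta C}{\delta y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$$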
t-SNE is used to reduce the dimensionality of the features and visualize them, as shown in Fig. 3. The visualization in Fig. 3 shows that, for the selected features, the distributions of the two classes of drivers differ to some extent.
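The first stage of t-SNE, turning Euclidean distances into conditional probabilities p_{j|i}, can be sketched in plain Python. This is a simplified illustration, not the full algorithm: it uses one shared bandwidth `sigma` for every point (an assumption), whereas t-SNE tunes a separate bandwidth per point from a perplexity setting, and it omits the low-dimensional optimization entirely.

```python
import math

def conditional_probabilities(points, sigma=1.0):
    """Return matrix P with P[i][j] = p_{j|i}, using one shared bandwidth."""
    n = len(points)
    p = []
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)   # p_{i|i} is defined as zero
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights.append(math.exp(-d2 / (2 * sigma ** 2)))
        total = sum(weights)
        p.append([w / total for w in weights])
    return p

# Toy points: the second point is much closer to the first than the third is.
P = conditional_probabilities([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
```

Each row sums to 1, and nearer neighbors receive a larger share of the probability.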
Model building:
A clustering-based discriminant model is used to determine whether an unknown driver user is a taxi driver or a ride-hailing driver. The specific analysis flow is shown in Fig. 4.
1. Selecting the number of clusters
Sample set M is randomly divided at a ratio of 8:2 into clustering training set P and validation set Q, and sample set N is used as test set N.
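A minimal sketch of the 8:2 random split, assuming a simple shuffle-and-cut procedure (the patent does not describe the mechanism). The function name and seed are hypothetical; with the 150 samples of M this yields 120 training samples and 30 validation samples.

```python
import random

def split_train_validation(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and cut them into (training set, validation set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed only for repeatability
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Integers stand in for the 150 driver users of sample set M.
P, Q = split_train_validation(range(150))
```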
For training set P, the optimal number of clusters K is computed with the silhouette coefficient (Silhouette Coefficient), an evaluation index of how compact and how well separated the clusters are:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
Where:
a(i) is the average dissimilarity between vector i and the other points in the same cluster, which measures within-cluster similarity;
b(i) is the minimum, over the other clusters, of the average dissimilarity between vector i and that cluster, which measures between-cluster similarity;
s(i) ranges from -1 to 1; a larger value indicates better within-cluster cohesion and between-cluster separation.
As Fig. 5 shows, s(i) is largest when the number of clusters is 3. Therefore, the optimal number of clusters is K = 3.
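The silhouette coefficient can be computed directly from its definition. The sketch below averages s(i) over all points for a given labeling, using Euclidean distance as the dissimilarity (an assumption; the text only says "dissimilarity"). The function name is hypothetical, and at least two clusters are required.

```python
import math

def silhouette_mean(points, labels):
    """Mean silhouette s(i) over all points; labels[i] is the cluster of points[i]."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)   # singleton cluster: s(i) is taken as 0
            continue
        # a(i): mean distance to the other points in the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b(i): smallest mean distance to the points of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_mean(pts, [0, 0, 1, 1])   # compact, well separated clusters
bad = silhouette_mean(pts, [0, 1, 0, 1])    # clusters that mix the two groups
```

To choose K, one would cluster the training set for several candidate values and keep the K with the largest mean silhouette, which the text reports as K = 3.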
2. Cluster analysis
Training set P is clustered with the K-Means algorithm.
K-Means is a partitioning clustering algorithm in which cluster similarity is computed with respect to a center obtained as the mean of the objects in each cluster. Its main procedure is as follows. First, k objects are chosen arbitrarily from the n data objects as the initial cluster centers. Each remaining object is assigned to the cluster it is most similar to, according to its similarity (distance) to the cluster centers. The center of each new cluster is then recomputed as the mean of all objects in that cluster. This process is repeated until the criterion function converges. The mean squared error is generally used as the criterion function.
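The procedure just described can be sketched in plain Python. This is a minimal illustration, not the implementation used in the patent: the function name, the seed, the iteration cap, and the choice to stop when the centers no longer move are assumptions.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Return (centers, labels) after a basic K-Means run on tuples of floats."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # k arbitrary objects as initial centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])   # keep an empty cluster's center
        if new_centers == centers:
            break   # centers stopped moving: converged
        centers = new_centers
    return centers, labels

# Two well separated toy groups.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(pts, k=2)
```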
Training set P is clustered into 3 classes; the resulting clusters are shown in Fig. 6.
On the basis of this clustering result, outliers are handled, leaving 108 valid sample points. Their distribution is shown in Table 3.
Table 3
Category | cluster1 | cluster2 | cluster3 | Total |
Number of samples | 46 | 45 | 17 | 108 |
The result after outlier removal is shown in Fig. 7. From it, the feature value in each dimension at the center point of each cluster can be obtained.
3. User behavior feature analysis
A line chart is drawn with the features on the abscissa and the feature values on the ordinate to inspect the distribution of the three cluster center points, as shown in Fig. 8. Fig. 8 shows that the three clusters differ substantially on 6 indices: mean_worktime (mean busy-period cell switch count); sd_worktime (standard deviation of the busy-period cell switch count); mean_nonworktime (mean idle-period cell switch count); sd_nonworktime (standard deviation of the idle-period cell switch count); switch_cell_number_daily_mean (mean daily cell switch count); switch_cell_number_daily_sd (standard deviation of the daily cell switch count).
Box plots of the three classes of samples on these 6 features are drawn (see Fig. 9). In Fig. 9, the abscissa is the class. For each box plot, the lower end represents the minimum, the upper end represents the maximum, the bottom of the box represents the first quartile, the top of the box represents the third quartile, and the line inside the box represents the median. The width of the box reflects the number of samples in the class. Overall, the box plots show the distribution of the samples in each class.
On the 6 features above, cluster1 and cluster2 show similar overall trends, and the feature values of cluster2 are all lower than those of cluster1; cluster3, however, is on the whole opposite to cluster1 in trend. Specifically:
(1) For the drivers in cluster1, the following conclusions can be drawn:
The mean_worktime (mean busy-period cell switch count) index is the highest, which shows that these taxi drivers are most active from 9:00 to 17:00 on Monday to Friday, i.e. in the daytime;
The mean_nonworktime (mean idle-period cell switch count) index is relatively low, which shows that these taxi drivers are less active from 17:00 to 24:00 and from 0:00 to 9:00 on Monday to Friday, i.e. at night;
The switch_cell_number_daily_mean (mean daily cell switch count) index is the highest, which shows that the overall activity of these taxi drivers is frequent.
Therefore, these taxi drivers show the behavior typical of cruising taxis.
(2) For the drivers in cluster2, the following conclusions can be drawn:
The mean_worktime (mean busy-period cell switch count) index is relatively low, which shows that these taxi drivers are less active from 9:00 to 17:00 on Monday to Friday, i.e. in the daytime;
The mean_nonworktime (mean idle-period cell switch count) index is relatively low, which shows that these taxi drivers are also less active from 17:00 to 24:00 and from 0:00 to 9:00 on Monday to Friday, i.e. at night;
The switch_cell_number_daily_mean (mean daily cell switch count) index is likewise relatively low, which shows that the overall activity of these taxi drivers is infrequent.
These taxi drivers switch cells relatively rarely, that is, they tend to stay in certain areas and wait for passengers. In terms of behavioral features, they are therefore fairly similar to ride-hailing drivers, who wait in place for orders.
(3) For the drivers in cluster3, the following conclusions can be drawn:
The mean_worktime (mean busy-period cell switch count) index is relatively low, which shows that these taxi drivers are less active from 9:00 to 17:00 on Monday to Friday, i.e. in the daytime;
The mean_nonworktime (mean idle-period cell switch count) index is relatively high, which shows that these taxi drivers are more active from 17:00 to 24:00 and from 0:00 to 9:00 on Monday to Friday, i.e. at night;
The switch_cell_number_daily_mean (mean daily cell switch count) index is relatively high, which shows that the overall activity of these taxi drivers tends to be frequent.
These taxi drivers rest by day and work at night. In terms of behavioral features, they are therefore fairly similar to typical ride-hailing drivers, who also rest by day and work at night.
(4) Overall:
The users in cluster1 have typical taxi driver behavioral features;
The users in cluster2 and cluster3, although taxi drivers, are fairly similar to ride-hailing drivers in their behavioral features.
4. Setting the threshold
The sum of the distances from each valid sample point x in training set P to the center points is computed and sorted, and the increment graph is drawn, as shown in Fig. 10. In Fig. 10, the x axis is the training sample index and the y axis is the sum of the distances from the sample point to the center points.
The figure shows that:
When x < 101, the distance grows slowly;
When x > 101, the distance grows quickly.
It follows that x = 101 is the inflection point of the sample set. Its corresponding distance, i.e. the y value, is therefore set as the classification threshold:
Threshold = y(x = 101) = 2.239995.
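The threshold step can be sketched as two small functions: one builds the sorted distance sums, the other picks an inflection point. The patent reads the inflection point from Fig. 10 and does not state a detection rule, so the "largest change in slope" heuristic below, along with both function names, is an assumption for illustration only.

```python
import math

def distance_sums(points, centers):
    """Sorted sums of distances from each point to all cluster centers."""
    return sorted(sum(math.dist(p, c) for c in centers) for p in points)

def knee_threshold(sorted_sums):
    """Return (index, value) where the slope of the sorted curve jumps the most.

    Needs at least three values. This is a simple heuristic, not the
    patent's stated procedure.
    """
    best_index, best_change = 1, float("-inf")
    for i in range(1, len(sorted_sums) - 1):
        before = sorted_sums[i] - sorted_sums[i - 1]
        after = sorted_sums[i + 1] - sorted_sums[i]
        if after - before > best_change:
            best_index, best_change = i, after - before
    return best_index, sorted_sums[best_index]

# Toy curve: flat growth, then a sharp rise after the fourth value.
sums = [1.0, 1.1, 1.2, 1.3, 2.5, 4.0]
knee_index, threshold = knee_threshold(sums)
```

On the real training data, the analogous step yields the reported threshold of 2.239995 at x = 101.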
5. Outputting the result
To decide which class an unknown sample belongs to, this patent combines clustering with a threshold to classify taxi drivers and illegal ride-hailing drivers.
When the sum of the distances from a sample point in the test set to the three cluster center points is greater than the threshold, the sample is judged to be an illegal ride-hailing vehicle; otherwise, it is judged to be a taxi.
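The decision rule above amounts to a single comparison. In this sketch the threshold is the value reported in the text, while the function name, the class labels, and the toy cluster centers are invented for illustration.

```python
import math

THRESHOLD = 2.239995   # classification threshold reported in the text

def classify(sample, centers, threshold=THRESHOLD):
    """Label a feature vector by its summed distance to the cluster centers."""
    total = sum(math.dist(sample, c) for c in centers)
    return "illegal ride-hailing" if total > threshold else "taxi"

# Toy two-dimensional centers; the real model uses three centers in feature space.
toy_centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
near = classify((0.3, 0.3), toy_centers)
far = classify((3.0, 3.0), toy_centers)
```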
The result drawn is as shown in table 4 to be judged to checking collection Q and test set N:
Table 4
(1) It can be seen that:
For the 30 samples in validation set Q, the model judges 23 driver users to be taxis, achieving a recall of 76.7%.
For the 150 samples in test set N, the cluster-based discriminant model finds that 97 driver users are taxi drivers, i.e. 64.7% of the drivers are identified as taxi drivers.
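The validation figure can be checked with a few lines of Python. Because every sample in validation set Q is a known taxi driver, recall reduces to the share of Q that the model labels as taxi. The function name and label strings below are illustrative assumptions.

```python
def taxi_recall(predicted_labels, positive="taxi"):
    """Share of known-taxi validation samples that were predicted as taxi."""
    if not predicted_labels:
        raise ValueError("validation set is empty")
    hits = sum(1 for label in predicted_labels if label == positive)
    return hits / len(predicted_labels)

# 23 of the 30 validation samples were judged to be taxis, as reported above.
validation_labels = ["taxi"] * 23 + ["unlicensed ride-hailing"] * 7
recall_pct = round(100 * taxi_recall(validation_labels), 1)  # 76.7
```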
(2) Further:
The 97 users in test set N judged to be taxis are further classified according to their distances to the three center points; the summarized results are shown in Table 5:
Table 5
Category | cluster1 | cluster2 | cluster3 | Total |
---|---|---|---|---|
Sample count | 11 | 86 | 0 | 97 |
Share of test set N | 7.3% | 57.3% | 0 | 64.7% |
Therefore, the above classification results show that only 7.3% of the drivers in test set N are typical taxi drivers, while the remaining 57.3% judged to be taxi drivers are relatively similar to unlicensed ride-hailing drivers in their behavioral features.
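The further classification and the percentages in Table 5 can be reproduced with a minimal Python sketch. Assigning each sample to its closest center is how "classified according to their distances" is read here, which is an assumption; the helper names are also illustrative. The counts and the test-set size of 150 come from the text.

```python
import math

def nearest_cluster(sample, centers):
    """Return the 1-based index of the closest cluster center."""
    dists = [math.dist(sample, c) for c in centers]
    return dists.index(min(dists)) + 1

def cluster_shares(counts, test_set_size):
    """Percentage of the whole test set in each cluster, to one decimal."""
    return {name: round(100 * n / test_set_size, 1) for name, n in counts.items()}

# Counts from Table 5; test set N has 150 samples.
table5_counts = {"cluster1": 11, "cluster2": 86, "cluster3": 0}
shares = cluster_shares(table5_counts, 150)
total_share = round(100 * sum(table5_counts.values()) / 150, 1)  # 64.7
```

Note that the shares are taken over all 150 test samples, not over the 97 samples judged to be taxis, which is why they add up to 64.7% rather than 100%.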
As shown in Figure 12, the ride-hailing vehicle identification system of the present invention, based on a cluster discrimination model, includes a data collection module, a data cluster analysis module, and a data processing module;
Wherein, the data collection module: receives the signaling data of online ride-hailing drivers and taxi drivers;
Data cluster analysis module: randomly selects the signaling data of several taxi drivers collected by the data collection module as sample set M; randomly selects several driver users of unknown category collected by the data collection module as sample set N; extracts features and, based on sample set M, builds the cluster discrimination model;
Data processing module: imports the obtained driver user signaling data and uses the cluster discrimination model to judge the category of the driver user.
A traffic administration department can import newly obtained driver user signaling data directly into the system and obtain the identification result. The system can thus support the department in cracking down on unlicensed ride-hailing vehicles, helping it locate suspect vehicles quickly, reduce the labor cost of law enforcement, and improve working efficiency.
The specific embodiments described above explain the purpose, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit the invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (6)
1. A ride-hailing vehicle identification method based on a cluster discrimination model, characterized by comprising the following steps:
Step (1): obtain the raw data, randomly select several known taxi driver users as sample set M, and randomly select several driver users of unknown category as sample set N;
Step (2): obtain the signaling data, over a period of time, of the driver users in sample set M and sample set N from step (1), and perform feature extraction;
Step (3): by analyzing the features extracted in step (2), establish that online ride-hailing drivers and taxi drivers differ to a certain extent;
Step (4): build the model: randomly divide sample set M into clustering training set P and validation set Q, and use sample set N as test set N;
perform cluster analysis on training set P, calculate the optimal number of clusters K, remove the abnormal sample points from training set P, obtain the cluster center points, calculate the sum of the distances from each valid sample point in training set P to the cluster center points, and derive the classification threshold from how the distance increment changes;
Step (5): import the collected signaling data of unknown drivers into the model built in step (4) for judgment.
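The random division of sample set M into training set P and validation set Q in step (4) can be sketched as follows. This is a minimal illustration: the function name, the seed, and the size of M used in the example are assumptions, while the validation size of 30 matches the figure reported in the description.

```python
import random

def split_sample_set(sample_ids, validation_size, seed=0):
    """Randomly split sample set M into (training set P, validation set Q)."""
    ids = list(sample_ids)
    if not 0 < validation_size < len(ids):
        raise ValueError("validation_size must leave both sets non-empty")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    return ids[validation_size:], ids[:validation_size]

# Hypothetical sample set M of 200 driver IDs; 30 are held out as Q.
training_p, validation_q = split_sample_set(range(200), 30)
```

Test set N is not produced by this split; per the claim it is the separately drawn sample set of drivers whose category is unknown.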
2. The ride-hailing vehicle identification method based on a cluster discrimination model according to claim 1, characterized in that, in step (4), the model obtained in step (4) is verified using validation set Q and tested using test set N.
3. The ride-hailing vehicle identification method based on a cluster discrimination model according to claim 1, characterized in that, in step (2), the extracted features include cell switching and dwell duration, wherein the cell switching features include the daily mean of cell switches, the daily standard deviation of cell switches, the busy-hour mean of the number of cell switches, the busy-hour standard deviation of the number of cell switches, the idle-hour mean of the number of cell switches, and the idle-hour standard deviation of the number of cell switches; the dwell duration features include the busy-hour dwell median, the busy-hour dwell mean, the busy-hour dwell standard deviation, the idle-hour dwell median, the idle-hour dwell mean, and the idle-hour dwell standard deviation.
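The twelve features listed in this claim can be computed from pre-bucketed observations, as in the Python sketch below. It assumes the signaling data has already been reduced to per-day switch counts and to busy-hour and idle-hour lists of switch counts and dwell durations. The names `Switch_cell_number_daily_mean` and `Mean_nonworktime` appear in the description; every other key, the function name, and the choice of population standard deviation are assumptions.

```python
import statistics

def extract_features(daily_switches, busy_switches, idle_switches,
                     busy_dwell, idle_dwell):
    """Build the 12-element feature record for one driver."""
    mean, std, median = statistics.fmean, statistics.pstdev, statistics.median
    return {
        # Cell switching features
        "Switch_cell_number_daily_mean": mean(daily_switches),
        "switch_daily_std": std(daily_switches),          # hypothetical name
        "busy_switch_mean": mean(busy_switches),          # hypothetical name
        "busy_switch_std": std(busy_switches),            # hypothetical name
        "Mean_nonworktime": mean(idle_switches),
        "idle_switch_std": std(idle_switches),            # hypothetical name
        # Dwell duration features (all names hypothetical)
        "busy_dwell_median": median(busy_dwell),
        "busy_dwell_mean": mean(busy_dwell),
        "busy_dwell_std": std(busy_dwell),
        "idle_dwell_median": median(idle_dwell),
        "idle_dwell_mean": mean(idle_dwell),
        "idle_dwell_std": std(idle_dwell),
    }

# Toy numbers for a single driver, for illustration only.
features = extract_features([10, 20, 30], [4, 6], [1, 3],
                            [5.0, 7.0, 9.0], [2.0, 4.0])
```

Since the description also mentions normalization, the raw values would presumably be rescaled before clustering; that step is not shown here.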
4. The ride-hailing vehicle identification method based on a cluster discrimination model according to claim 1, characterized in that, in step (4), for training set P, the optimal number of clusters K is calculated using the silhouette coefficient, which is an evaluation index of how compact and how dispersed the classes are, with the following formula:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},$$
a(i) is the average dissimilarity between vector i and the other points in the same cluster, i.e. it measures the similarity within a group;
b(i) is the minimum, over the other clusters, of the average dissimilarity between vector i and that cluster, i.e. it measures the similarity between groups;
s(i) ranges from -1 to 1; a larger value indicates better cohesion within groups and better separation between groups.
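The silhouette formula can be illustrated with a small pure-Python sketch. It uses Euclidean distance as the dissimilarity and assigns 0 to points in singleton clusters; both choices are common conventions but are assumptions here, as are the function names. Choosing K as the candidate with the highest mean silhouette is one reading of "calculated using the silhouette coefficient".

```python
import math

def silhouette_values(points, labels):
    """Per-point s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # assumed convention for singleton clusters
            continue
        # a(i): mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

def mean_silhouette(points, labels):
    vals = silhouette_values(points, labels)
    return sum(vals) / len(vals)

def best_k(points, labelings_by_k):
    """Pick the K whose labeling has the highest mean silhouette."""
    return max(labelings_by_k, key=lambda k: mean_silhouette(points, labelings_by_k[k]))
```

`best_k` expects a mapping from each candidate K to the cluster labels produced by whatever clustering algorithm is in use; the clustering itself is outside this sketch.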
5. The ride-hailing vehicle identification method based on a cluster discrimination model according to claim 1, characterized in that, in step (4), the sum of the distances from each valid sample point in training set P to the cluster center points is calculated and sorted, and an increment graph is drawn, in which the X-axis represents the sample index in training set P and the Y-axis represents the sum of the distances from the sample point to the center points; the inflection point of training set P is obtained, and the Y-axis value corresponding to this inflection point is the classification threshold y;
Threshold=y(x=101)=2.239995.
6. A ride-hailing vehicle identification system based on a cluster discrimination model, characterized in that the system includes a data collection module, a data cluster analysis module, and a data processing module;
Wherein, the data collection module: receives the signaling data of online ride-hailing drivers and taxi drivers;
Data cluster analysis module: randomly selects the signaling data of several taxi drivers collected by the data collection module as sample set M; randomly selects several driver users of unknown category collected by the data collection module as sample set N; extracts features and, based on sample set M, builds the cluster discrimination model;
Data processing module: imports the obtained driver user signaling data and uses the cluster discrimination model to judge the category of the driver user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710573249.5A CN107301433A (en) | 2017-07-14 | 2017-07-14 | Net based on clustering and discriminant model about car discrimination method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710573249.5A CN107301433A (en) | 2017-07-14 | 2017-07-14 | Net based on clustering and discriminant model about car discrimination method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107301433A true CN107301433A (en) | 2017-10-27 |
Family
ID=60133952
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710573249.5A Pending CN107301433A (en) | 2017-07-14 | 2017-07-14 | Net based on clustering and discriminant model about car discrimination method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107301433A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156792A (en) * | 2016-06-24 | 2016-11-23 | 中国电力科学研究院 | A kind of low-voltage platform area clustering method based on platform district electric characteristic parameter |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109086793A (en) * | 2018-06-27 | 2018-12-25 | 东北大学 | A kind of abnormality recognition method of wind-driven generator |
CN109086793B (en) * | 2018-06-27 | 2021-11-16 | 东北大学 | Abnormity identification method for wind driven generator |
CN109145982A (en) * | 2018-08-17 | 2019-01-04 | 上海汽车集团股份有限公司 | The personal identification method and device of driver, storage medium, terminal |
CN111275507A (en) * | 2018-12-04 | 2020-06-12 | 北京嘀嘀无限科技发展有限公司 | Order abnormity identification and order risk management and control method and system |
CN111368858A (en) * | 2018-12-25 | 2020-07-03 | 中国移动通信集团广东有限公司 | User satisfaction evaluation method and device |
CN111368858B (en) * | 2018-12-25 | 2023-11-24 | 中国移动通信集团广东有限公司 | User satisfaction evaluation method and device |
CN109727452A (en) * | 2019-01-08 | 2019-05-07 | 江苏交科能源科技发展有限公司 | Trip proportion accounting method based on mobile phone signaling data |
CN110473085A (en) * | 2019-08-13 | 2019-11-19 | 优必爱信息技术(北京)有限公司 | A kind of vehicle risk method of discrimination and device |
CN110544151A (en) * | 2019-08-20 | 2019-12-06 | 北京市天元网络技术股份有限公司 | Method and equipment for determining whether user is online car booking driver |
CN111091215A (en) * | 2019-12-11 | 2020-05-01 | 浙江大搜车软件技术有限公司 | Vehicle identification method and device, computer equipment and storage medium |
CN111091215B (en) * | 2019-12-11 | 2023-10-20 | 浙江大搜车软件技术有限公司 | Vehicle identification method, device, computer equipment and storage medium |
CN111310721A (en) * | 2020-03-12 | 2020-06-19 | 中设设计集团股份有限公司 | Intelligent detection method and system for illegal network car booking |
CN111310721B (en) * | 2020-03-12 | 2024-03-26 | 华设设计集团股份有限公司 | Illegal network vehicle-closing intelligent detection method and system |
CN113775929A (en) * | 2021-09-28 | 2021-12-10 | 上海天麦能源科技有限公司 | Urban gas pipe network layout area division method |
CN115631632A (en) * | 2022-12-19 | 2023-01-20 | 北京码牛科技股份有限公司 | Vehicle-based track feature identification network car booking method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171027 |