CN107301433A - Method and system for identifying online ride-hailing vehicles based on a clustering discriminant model - Google Patents

Method and system for identifying online ride-hailing vehicles based on a clustering discriminant model

Publication number
CN107301433A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710573249.5A
Other languages
Chinese (zh)
Inventor
冷婷
谈炜
石路路
王计斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Hua Su Science And Technology Ltd
Original Assignee
Nanjing Hua Su Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Hua Su Science And Technology Ltd
Priority to CN201710573249.5A
Publication of CN107301433A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/02 Reservations, e.g. for tickets, services or events
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Marketing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and system for identifying online ride-hailing vehicles based on a clustering discriminant model. The method comprises the following steps. Step (1): obtain raw data, randomly select a number of known taxi driver users as sample set M, and randomly select a number of driver users of unknown class as sample set N. Step (2): extract features. Step (3): analyze the features. Step (4): build the model. Step (5): import the collected signaling data of unknown drivers into the model built in step (4) for classification. The method extracts drivers' movement features from mobile phone signaling data and, knowing the labels of only one class, determines quickly and conveniently whether unlabeled data belong to that known class. The identification results can support traffic administration departments in cracking down on irregular online ride-hailing vehicles by helping them locate suspect vehicles quickly, which reduces the labor cost of enforcement and improves efficiency.

Description

Method and system for identifying online ride-hailing vehicles based on a clustering discriminant model
Technical field
The invention belongs to the technical field of online ride-hailing management, and more particularly relates to a method and system for identifying online ride-hailing vehicles based on a clustering discriminant model.
Background art
Driven by the "Internet Plus" initiative and by market trends, online ride-hailing, an emerging way of travelling by car, has quickly become a market favorite and an important component of smart mobility.
An online ride-hailing vehicle is a taxi booked online. It is a mode of travel that connects passengers, drivers, and vehicles: passengers use a smartphone application to book a driver's pick-up and drop-off service. Online ride-hailing meets the public's diverse travel needs and improves the utilization of motor vehicles, but as its scale keeps expanding, the regulatory problems it brings cannot be ignored.
Online ride-hailing vehicles both differ from and are related to traditional taxis. In vehicle color and model, taxis generally have a uniform color and markings, whereas ride-hailing vehicles vary widely. In mode of operation, taxis may cruise for passengers, wait for passengers at stands, and accept bookings, whereas ride-hailing vehicles may not cruise for passengers and may only provide booked services through an online platform. In supervision, taxis are generally managed by taxi companies, whereas ride-hailing vehicles lack a comparable oversight mechanism.
Initially, online ride-hailing supplemented taxis. As the number of full-time ride-hailing drivers grew, ride-hailing had a certain impact on the traditional taxi industry and met with some resistance from taxi drivers. In addition, because ride-hailing platforms do not vet drivers and vehicles strictly, disorder has arisen in the market, and social problems such as disputes and accidents keep emerging. The ride-hailing market urgently needs standardized management.
To address this disorder, the Interim Measures for the Administration of Online Taxi Booking Business Operations and Services came into effect on November 1, 2016. The Measures clearly stipulate that, when providing services, drivers must not cruise the streets for passengers and must not solicit passengers at airports, railway stations, and similar places where unified cruising-taxi dispatch stations or passenger queuing areas have been set up.
Against the background of these new rules, the transport authorities, as the bodies that manage public travel services, must strengthen the management of online ride-hailing. At present, ride-hailing is managed through manual patrols, which consume a great deal of manpower. The transport authorities therefore urgently need an automated screening method that helps them pinpoint suspect vehicles and enforce the rules quickly and efficiently.
Summary of the invention
The problem to be solved by the present invention is to provide a method for identifying online ride-hailing vehicles based on a clustering discriminant model, which extracts drivers' movement features from mobile phone signaling data.
To solve the above technical problem, the present invention adopts the following technical solution: a method for identifying online ride-hailing vehicles based on a clustering discriminant model, comprising the following steps:
Step (1): obtain raw data, randomly select a number of known taxi driver users as sample set M, and randomly select a number of driver users of unknown class as sample set N;
Step (2): obtain the signaling data, over a period of time, of the driver users in sample set M and sample set N from step (1), and extract features;
Step (3): analyze the features extracted in step (2) to confirm that online ride-hailing drivers and taxi drivers differ to some extent;
Step (4): build the model: randomly divide sample set M into a clustering training set P and a validation set Q, and use sample set N as test set N;
perform cluster analysis on training set P, calculate the optimal number of clusters K, remove the abnormal sample points from training set P, obtain the cluster centers, calculate for each valid sample point in training set P the sum of its distances to the cluster centers, and derive the classification threshold from how the increments of this distance sum change;
Step (5): import the collected signaling data of unknown drivers into the model built in step (4) for classification.
In the present invention, drivers' movement features are extracted from mobile phone signaling data, so that, knowing the labels of only one class, it can be determined quickly and conveniently whether unlabeled data belong to that known class. The feature analysis in step (3) shows whether the features extracted in step (2) are appropriate: if the two groups show no difference, the feature extraction is flawed. Step (4) builds a clustering model with taxi drivers as samples, so that step (5) can determine quickly and efficiently whether the signaling data of an unknown driver user belong to the known taxi class.
Preferably, in step (4), the model obtained is verified using validation set Q and tested using test set N.
Using validation set Q and test set N can improve the accuracy of the clustering model.
Preferably, in step (2), the extracted features include cell switching and dwell time. The cell-switching features comprise the daily mean of cell switches, the daily standard deviation of cell switches, the mean number of busy-time cell switches, the standard deviation of busy-time cell switches, the mean number of idle-time cell switches, and the standard deviation of idle-time cell switches. The dwell-time features comprise the busy-time dwell median, busy-time dwell mean, busy-time dwell standard deviation, idle-time dwell median, idle-time dwell mean, and idle-time dwell standard deviation.
Preferably, in step (4), the optimal number of clusters K for training set P is calculated using the silhouette coefficient, an evaluation index of how compact and how well separated the clusters are. The formula is as follows:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
a(i) is the average dissimilarity between vector i and the other points in the same cluster, which measures similarity within a cluster;
b(i) is the minimum of the average dissimilarities between vector i and each of the other clusters, which measures separation between clusters;
s(i) ranges from -1 to 1; the larger the value, the better the cohesion within clusters and the separation between clusters.
Preferably, in step (4), the sum of the distances from each valid sample point in training set P to the cluster centers is calculated and sorted, and an increment plot is drawn in which the X axis represents the sample sequence number in training set P and the Y axis represents the sum of the distances from the sample point to the cluster centers. The inflection point of training set P is read from the plot, and the Y value at the inflection point is the classification threshold y:
Threshold = y(x = 101) = 2.239995.
A further problem solved by the present invention is to provide a system for identifying online ride-hailing vehicles based on a clustering discriminant model. The system comprises a data collection module, a data cluster analysis module, and a data processing module;
wherein the data collection module receives the signaling data of online ride-hailing drivers and taxi drivers;
the data cluster analysis module randomly selects the signaling data of a number of taxi drivers collected by the data collection module as sample set M, randomly selects a number of driver users of unknown class collected by the data collection module as sample set N, extracts features, and builds the clustering discriminant model on the basis of sample set M;
the data processing module imports the acquired driver-user signaling data and determines the class of each driver user with the clustering discriminant model.
The identification system of the present invention is based on the mobile phone signaling data provided by mobile operators and uses a clustering-based discriminant model to distinguish taxi drivers from online ride-hailing drivers. The identification results can support traffic administration departments in cracking down on irregular online ride-hailing vehicles by helping them locate suspect vehicles quickly, which reduces the labor cost of enforcement and improves efficiency.
Brief description of the drawings
The embodiments of the present invention are further described below with reference to the accompanying drawings:
Fig. 1 is a scatter plot of the sample distribution drawn from the daily cell-switch standard-deviation feature and the idle-time dwell standard-deviation feature;
Fig. 2 is a scatter plot of the sample distribution drawn from the daily cell-switch mean feature and the daily cell-switch standard-deviation feature;
Fig. 3 shows the sample distribution after t-SNE dimensionality reduction of the features;
Fig. 4 is the flow chart of the modeling analysis;
Fig. 5 is a schematic diagram of how the optimal number of clusters is obtained;
Fig. 6 is a schematic diagram of the cluster analysis result;
Fig. 7 is a schematic diagram of the cluster analysis result after the outliers are removed;
Fig. 8 is a line chart of the cluster centers;
Fig. 9 shows box plots of the clustered samples on the salient features;
Fig. 10 is the increment plot of the sorted sums of distances from each valid sample point x in training set P to the cluster centers;
Fig. 11 is a simplified flow chart of the method of the present invention for identifying online ride-hailing vehicles based on a clustering discriminant model;
Fig. 12 is a structural diagram of the system of the present invention for identifying online ride-hailing vehicles based on a clustering discriminant model.
Detailed description of the embodiments
As shown in Fig. 11, the method of this embodiment for identifying online ride-hailing vehicles based on a clustering discriminant model comprises the following steps:
Step (1): obtain raw data, randomly select a number of known taxi driver users as sample set M, and randomly select a number of driver users of unknown class as sample set N;
Step (2): obtain the signaling data, over a period of time, of the driver users in sample set M and sample set N from step (1), and extract features;
Step (3): analyze the features extracted in step (2) to confirm that online ride-hailing drivers and taxi drivers differ to some extent;
Step (4): build the model: randomly divide sample set M into a clustering training set P and a validation set Q, and use sample set N as test set N;
perform cluster analysis on training set P, calculate the optimal number of clusters K, remove the abnormal sample points from training set P, obtain the cluster centers, calculate for each valid sample point in training set P the sum of its distances to the cluster centers, and derive the classification threshold from how the increments of this distance sum change;
Step (5): import the collected signaling data of unknown drivers into the model built in step (4) for classification.
In step (4), the model obtained is verified using validation set Q and tested using test set N.
In step (2), the extracted features include cell switching and dwell time. The cell-switching features comprise the daily mean of cell switches, the daily standard deviation of cell switches, the mean number of busy-time cell switches, the standard deviation of busy-time cell switches, the mean number of idle-time cell switches, and the standard deviation of idle-time cell switches. The dwell-time features comprise the busy-time dwell median, busy-time dwell mean, busy-time dwell standard deviation, idle-time dwell median, idle-time dwell mean, and idle-time dwell standard deviation.
In addition, in step (4), the optimal number of clusters K for training set P is calculated using the silhouette coefficient, an evaluation index of how compact and how well separated the clusters are. The formula is as follows:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
a(i) is the average dissimilarity between vector i and the other points in the same cluster, which measures similarity within a cluster;
b(i) is the minimum of the average dissimilarities between vector i and each of the other clusters, which measures separation between clusters;
s(i) ranges from -1 to 1; the larger the value, the better the cohesion within clusters and the separation between clusters.
In step (4), the sum of the distances from each valid sample point in training set P to the cluster centers is calculated and sorted, and an increment plot is drawn in which the X axis represents the sample sequence number in training set P and the Y axis represents the sum of the distances from the sample point to the cluster centers. The inflection point of training set P is read from the plot, and the Y value at the inflection point is the classification threshold y:
Threshold = y(x = 101) = 2.239995.
The specific operation of the identification method of this embodiment is as follows:
Data acquisition:
As shown in Table 1, driver users are obtained from the following three raw data sets:
Table 1
Data set | Description
A | Taxi driver user list provided by the transport department
B | Taxi group user list
C | Driver users who appeared at base stations near the South Station and use the Didi driver app
The taxi driver user data set is:
D = A ∩ B ∩ C
From data set D, 150 known taxi driver users are randomly selected as sample set M.
E = C - D
From data set E, 150 driver users of unknown class are randomly selected as sample set N.
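The set operations above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `build_sample_sets`, the toy user IDs, and the fixed random seed are assumptions made for the example and do not come from the patent.

```python
import random

def build_sample_sets(a, b, c, sample_size=150, seed=0):
    """Return (M, N): known taxi drivers and drivers of unknown class."""
    d = set(a) & set(b) & set(c)      # D = A ∩ B ∩ C: confirmed taxi drivers
    e = set(c) - d                    # E = C - D: drivers of unknown class
    rng = random.Random(seed)         # fixed seed only for reproducibility
    m = rng.sample(sorted(d), min(sample_size, len(d)))
    n = rng.sample(sorted(e), min(sample_size, len(e)))
    return m, n

# Toy user IDs (hypothetical); real lists would come from data sets A, B and C.
A = {"u1", "u2", "u3", "u4"}
B = {"u2", "u3", "u4", "u5"}
C = {"u3", "u4", "u6", "u7"}
M, N = build_sample_sets(A, B, C, sample_size=2)
print(sorted(M), sorted(N))
```

Because E is defined as C minus D, the two sample sets can never share a user, which is what allows M to serve as the single labeled class.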
Feature extraction:
The signaling data of the above 300 users for the two weeks from March 6 to March 19, 2017 are extracted as the raw data for feature extraction.
Busy time is defined as 9:00-17:00 from Monday to Friday; idle time is defined as 17:00-24:00 and 0:00-9:00 from Monday to Friday.
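As a rough sketch, the busy/idle definition can be expressed as a small Python function. The function name `time_period` is hypothetical. Two points are assumptions, because the text does not settle them: records that fall on a weekend are returned as `None`, and the boundary 17:00 is counted as idle.

```python
from datetime import datetime

def time_period(ts: datetime):
    """Return 'busy', 'idle', or None (weekends are not defined in the text)."""
    if ts.weekday() >= 5:          # Saturday = 5, Sunday = 6
        return None
    # Busy: 9:00 up to (not including) 17:00; everything else on a weekday is idle.
    return "busy" if 9 <= ts.hour < 17 else "idle"

print(time_period(datetime(2017, 3, 6, 10, 30)))   # a Monday morning
print(time_period(datetime(2017, 3, 6, 22, 0)))    # a Monday night
print(time_period(datetime(2017, 3, 11, 10, 0)))   # a Saturday
```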
The extracted features mainly comprise cell switching and dwell time, as shown in Table 2:
Table 2
After the above features are extracted, scatter plots are drawn from pairs of features, as shown in Figs. 1 and 2. In Fig. 1, the abscissa is the standardized daily cell-switch standard deviation and the ordinate is the standardized idle-time dwell standard deviation. In Fig. 2, the abscissa is the standardized daily cell-switch mean and the ordinate is the standardized daily cell-switch standard deviation. Red points represent sample set M (taxi drivers), and blue points represent sample set N (driver users of unknown class). Figs. 1 and 2 show intuitively that the distributions of sample sets M and N differ to some extent, which indirectly indicates that the features reflect, to a degree, the behavioral differences between the two classes of drivers.
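To make the cell-switching features concrete, the sketch below computes the daily mean and daily standard deviation of cell switches for one user. The record layout (time-ordered `(date, cell_id)` pairs), the function name `daily_switch_stats`, and the use of the population standard deviation are all assumptions; the patent does not describe the signaling format or the exact estimator.

```python
from collections import defaultdict
from statistics import mean, pstdev

def daily_switch_stats(records):
    """records: time-ordered (date_string, cell_id) pairs for one user.
    Returns (mean, standard deviation) of the number of cell switches per day."""
    per_day = defaultdict(int)
    prev_day, prev_cell = None, None
    for day, cell in records:
        per_day[day] += 0                      # register the day even if no switch occurs
        if day == prev_day and cell != prev_cell:
            per_day[day] += 1                  # the serving cell changed within the day
        prev_day, prev_cell = day, cell
    counts = list(per_day.values())
    return mean(counts), pstdev(counts)

# Hypothetical records: two switches on the first day, one on the second.
records = [("03-06", "a"), ("03-06", "b"), ("03-06", "b"), ("03-06", "c"),
           ("03-07", "c"), ("03-07", "a")]
print(daily_switch_stats(records))
```

The busy-time and idle-time variants would follow the same pattern after filtering the records by time period.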
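Feature analysis:
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a manifold dimensionality-reduction method proposed by Laurens van der Maaten and Geoffrey Hinton. It was developed from SNE and uses the heavier-tailed t distribution in the low-dimensional space to avoid the crowding problem and the optimization difficulties of SNE.
The algorithm first converts Euclidean distances into conditional probabilities that express the similarity between points. Given N high-dimensional data points x_1, ..., x_N, the probability p_{j|i} is calculated as (standard t-SNE form):
$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$
For the low-dimensional points y_i, the similarity between two points under the t distribution is:
$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$
With the symmetrized probability $p_{ij} = (p_{j|i} + p_{i|j}) / (2N)$ and the cost C defined as the Kullback-Leibler divergence between the distributions p and q, the gradient used for optimization is:
$$\frac{\partial C}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$$
t-SNE is used to reduce the dimensionality of the features for visualization, as shown in Fig. 3. The visualization in Fig. 3 shows that, for the selected features, the distributions of the two classes of drivers differ to some extent.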
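The two similarity definitions used by t-SNE can be illustrated with a short pure-Python sketch. It is not a full t-SNE implementation: it uses a single fixed bandwidth `sigma` for every point (real t-SNE tunes a separate bandwidth per point from a perplexity target), and the function names are hypothetical.

```python
import math

def conditional_p(xs, sigma=1.0):
    """p[i][j] = p_{j|i}: Gaussian similarity of x_j as seen from x_i (fixed sigma)."""
    n = len(xs)
    p = []
    for i in range(n):
        w = [0.0 if k == i else
             math.exp(-math.dist(xs[i], xs[k]) ** 2 / (2 * sigma ** 2))
             for k in range(n)]
        total = sum(w)
        p.append([v / total for v in w])   # each row sums to 1
    return p

def student_t_q(ys):
    """q[i][j]: Student-t similarity in the low-dimensional space (sums to 1 overall)."""
    n = len(ys)
    w = [[0.0 if i == j else 1.0 / (1.0 + math.dist(ys[i], ys[j]) ** 2)
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
p = conditional_p(points)
print(p[0][1] > p[0][2])   # the nearer neighbour gets the larger probability
```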
Building the model:
A clustering-based discriminant model is used to determine whether an unknown driver user is a taxi driver or an online ride-hailing driver. The analysis flow is shown in Fig. 4.
1. Selecting the number of clusters
Sample set M is randomly divided in a ratio of 8:2 into clustering training set P and validation set Q, and sample set N is used as test set N.
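The 8:2 split can be sketched as follows. The helper name `split_train_validation`, the placeholder IDs, and the fixed seed are assumptions for illustration; only the ratio and the sample count of 150 come from the text.

```python
import random

def split_train_validation(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and cut them into a training part and a validation part."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)     # seeded only so the run is repeatable
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

M = [f"taxi_{i}" for i in range(150)]    # placeholder IDs for the 150 samples in M
P, Q = split_train_validation(M)
print(len(P), len(Q))                    # 120 30
```

With 150 samples this gives 120 training samples and 30 validation samples, which agrees with the 30 samples reported for validation set Q.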
For training set P, the optimal number of clusters K is calculated using the silhouette coefficient, an evaluation index of how compact and how well separated the clusters are:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
where:
a(i) is the average dissimilarity between vector i and the other points in the same cluster, which measures similarity within a cluster;
b(i) is the minimum of the average dissimilarities between vector i and each of the other clusters, which measures separation between clusters;
s(i) ranges from -1 to 1; the larger the value, the better the cohesion within clusters and the separation between clusters.
As shown in Fig. 5, the silhouette value is largest when the number of clusters is 3. Therefore, the optimal number of clusters is K = 3.
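The silhouette coefficient defined above can be computed directly from its definition. The sketch below is a plain-Python illustration that assumes Euclidean distance as the dissimilarity and at least two clusters; the function name and the toy points are hypothetical. To choose K, one would cluster the training set for several values of K and keep the one with the highest mean score.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        own = [math.dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
               if lj == li and j != i]
        if not own:                       # singleton cluster: s(i) is taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)           # a(i): mean distance within the own cluster
        b = min(                          # b(i): smallest mean distance to another cluster
            sum(math.dist(p, q) for q, lj in zip(points, labels) if lj == other)
            / labels.count(other)
            for other in set(labels) if other != li
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette(pts, [0, 0, 1, 1]) > silhouette(pts, [0, 1, 0, 1]))   # True
```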
2. Cluster analysis
Training set P is clustered with the K-Means algorithm.
K-Means is a partitioning clustering algorithm in which cluster similarity is computed against a center obtained as the mean of the objects in each cluster. It works as follows. First, k objects are chosen arbitrarily from the n data objects as the initial cluster centers. Each remaining object is then assigned to the cluster whose center it is most similar to (closest to). Next, the center of each resulting cluster is recalculated as the mean of all objects in that cluster. This process is repeated until the criterion function converges. The mean squared error is generally used as the criterion function.
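The procedure just described corresponds to Lloyd's algorithm, sketched below in plain Python. To keep the example deterministic, the initial centers are passed in rather than chosen at random, and convergence is detected when the centers stop moving; both choices, like the function name, are assumptions of this sketch rather than details from the patent.

```python
import math

def k_means(points, centers, max_iter=100):
    """Plain K-Means (Lloyd's algorithm) starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)      # an empty cluster keeps its old center
        if new_centers == centers:           # converged: the centers no longer move
            break
        centers = new_centers
    return centers, labels

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
centers, labels = k_means(pts, centers=[(0, 0), (5, 5), (10, 0)])
print(labels)     # [0, 0, 1, 1, 2, 2]
```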
Training set P is grouped into 3 clusters; the clustering result is shown in Fig. 6.
On the basis of this clustering result, the abnormal points are handled, leaving 108 valid sample points. Their distribution is shown in Table 3.
Table 3
Class | cluster1 | cluster2 | cluster3 | Total
Number of samples | 46 | 45 | 17 | 108
As shown in Fig. 7, the value of every feature dimension at the center of each cluster can then be obtained.
3. User behavior feature analysis
A line chart is drawn with the features on the abscissa and the feature values on the ordinate to examine the distribution of the three cluster centers, as shown in Fig. 8. Fig. 8 shows that the three clusters differ markedly on six indicators: mean_worktime (mean number of busy-time cell switches); sd_worktime (standard deviation of busy-time cell switches); mean_nonworktime (mean number of idle-time cell switches); sd_nonworktime (standard deviation of idle-time cell switches); switch_cell_number_daily_mean (daily mean of cell switches); switch_cell_number_daily_sd (daily standard deviation of cell switches).
Box plots of the three classes of samples on these six features are drawn (see Fig. 9). The abscissa in Fig. 9 is the class. The lower whisker of each box represents the minimum and the upper whisker the maximum; the bottom of the box represents the first quartile, the top of the box the third quartile, and the line inside the box the median. The width of the box indicates the number of samples in the class. Overall, the box plots show the distribution of the samples within each class.
On the six features above, cluster1 and cluster2 follow broadly similar trends, with the feature values of cluster2 all lower than those of cluster1, whereas cluster3 shows a trend broadly opposite to that of cluster1. Specifically:
(1) For the drivers in cluster1, the following conclusions can be drawn:
The mean_worktime (mean number of busy-time cell switches) indicator is the highest, showing that these taxi drivers are most active in the daytime, 9:00-17:00 from Monday to Friday;
The mean_nonworktime (mean number of idle-time cell switches) indicator is relatively low, showing that these taxi drivers are less active at night, 17:00-24:00 and 0:00-9:00 from Monday to Friday;
The switch_cell_number_daily_mean (daily mean of cell switches) indicator is the highest, showing that these taxi drivers are highly active overall.
These taxi drivers therefore show typical taxi cruising behavior.
(2) For the drivers in cluster2, the following conclusions can be drawn:
The mean_worktime (mean number of busy-time cell switches) indicator is relatively low, showing that these taxi drivers are not very active in the daytime, 9:00-17:00 from Monday to Friday;
The mean_nonworktime (mean number of idle-time cell switches) indicator is relatively low, showing that these taxi drivers are also not very active at night, 17:00-24:00 and 0:00-9:00 from Monday to Friday;
The switch_cell_number_daily_mean (daily mean of cell switches) indicator is likewise relatively low, showing that these taxi drivers are not very active overall.
These taxi drivers switch cells relatively rarely; that is, they tend to wait for passengers in a few areas. In terms of behavior, they therefore resemble online ride-hailing drivers, who wait in place for passengers.
(3) For the drivers in cluster3, the following conclusions can be drawn:
The mean_worktime (mean number of busy-time cell switches) indicator is relatively low, showing that these taxi drivers are not very active in the daytime, 9:00-17:00 from Monday to Friday;
The mean_nonworktime (mean number of idle-time cell switches) indicator is relatively high, showing that these taxi drivers are more active at night, 17:00-24:00 and 0:00-9:00 from Monday to Friday;
The switch_cell_number_daily_mean (daily mean of cell switches) indicator is relatively high, showing that these taxi drivers tend to be active overall.
These taxi drivers rest by day and work at night. In terms of behavior, they therefore also resemble typical online ride-hailing drivers, who rest by day and work at night.
(4) Overall:
The users in cluster1 show typical taxi driver behavior;
The users in cluster2 and cluster3, although they are taxi drivers, behave in ways that resemble online ride-hailing drivers.
4. Setting the threshold
For each valid sample point x in training set P, the sum of its distances to the cluster centers is calculated. The sums are sorted and an increment plot is drawn, as shown in Fig. 10, where the x axis represents the training sample sequence number and the y axis represents the sum of the distances from the sample point to the cluster centers.
The figure shows that:
when x < 101, the distance sum grows slowly;
when x > 101, the distance sum grows quickly.
It follows that x = 101 is the inflection point of the sample set. The corresponding distance, i.e. the y value, is therefore set as the classification threshold:
Threshold = y(x = 101) = 2.239995.
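The text reads the inflection point off the increment plot by eye. One way to automate a similar choice, offered here purely as an assumption and not as the patent's method, is to take the point of the sorted curve that lies farthest below the straight line joining its first and last points. The helper names and the toy values are hypothetical, and the index returned is 0-based.

```python
import math

def distance_sum(point, centers):
    """Sum of Euclidean distances from one sample to every cluster center."""
    return sum(math.dist(point, c) for c in centers)

def knee_threshold(sorted_sums):
    """Return (index, value) of the point farthest below the chord joining the
    first and last points of an ascending curve (needs at least two values)."""
    n = len(sorted_sums)
    y0, y1 = sorted_sums[0], sorted_sums[-1]

    def gap(i):
        chord = y0 + (y1 - y0) * i / (n - 1)   # height of the chord at position i
        return chord - sorted_sums[i]

    knee = max(range(n), key=gap)
    return knee, sorted_sums[knee]

# Toy distance sums: slow growth first, then a sharp rise.
sums = sorted([1.0, 1.1, 1.2, 1.3, 1.4, 2.5, 4.0])
print(knee_threshold(sums))
```

On real data the automatically chosen point should still be checked against the plot, since the heuristic can disagree with a visual reading.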
5. Outputting the results
To decide the class of an unknown sample, this patent combines clustering with a threshold to separate taxi drivers from irregular online ride-hailing drivers.
When the sum of the distances from a test sample point to the three cluster centers is greater than the threshold, the sample is classified as an irregular online ride-hailing vehicle; otherwise, it is classified as a taxi.
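The decision rule can be written as a short function. The threshold value is the one reported in the text; the cluster centers and sample points below are invented toy values, and the label strings are an assumption of this sketch.

```python
import math

THRESHOLD = 2.239995   # value reported in the text for x = 101

def classify(point, centers, threshold=THRESHOLD):
    """Compare the summed distance to all cluster centers with the threshold."""
    total = sum(math.dist(point, c) for c in centers)
    return "irregular ride-hailing" if total > threshold else "taxi"

# Hypothetical centers in a two-feature space (real centers have one value per feature).
toy_centers = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]
print(classify((0.1, 0.1), toy_centers))   # close to all centers
print(classify((3.0, 3.0), toy_centers))   # far from all centers
```

A sample is accepted as a taxi only if it lies close to the taxi clusters as a whole, which is how a model trained on one labeled class can reject samples of an unseen class.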
The results of classifying validation set Q and test set N are shown in Table 4:
Table 4
(1) It can be seen that:
For the 30 samples in validation set Q, the model classifies 23 driver users as taxi drivers, a recall of 76.7%.
For the 150 samples in test set N, the clustering-based discriminant model classifies 97 driver users as taxi drivers; that is, 64.7% of the drivers are identified as taxi drivers.
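The two percentages follow directly from the counts reported above, as the short check below shows. The helper name `rate` is hypothetical.

```python
def rate(hits, total):
    """Share of hits in percent, rounded to one decimal place."""
    return round(100 * hits / total, 1)

print(rate(23, 30))    # validation set Q: recall on known taxi drivers
print(rate(97, 150))   # test set N: share classified as taxi drivers
```

Since every sample in Q is a known taxi driver, the first figure is a recall; the second is only a share, because the true classes in N are unknown.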
(2) Further:
The 97 users in test set N classified as taxi drivers are assigned to a cluster according to their distances to the three cluster centers. The results are summarized in Table 5:
Table 5
Class | cluster1 | cluster2 | cluster3 | Total
Number of samples | 11 | 86 | 0 | 97
Share of test set N | 7.3% | 57.3% | 0 | 64.7%
These results show that only 7.3% of the drivers in test set N are typical taxi drivers; the remaining 57.3% classified as taxi drivers resemble irregular online ride-hailing drivers in their behavior.
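The further breakdown by cluster can be sketched as a nearest-center assignment followed by a share calculation. This assumes that "classified according to distance" means assignment to the closest center; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def nearest_cluster(point, centers):
    """Index of the cluster center closest to the point."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

def cluster_shares(points, centers, population):
    """Percentage of the whole population that falls into each cluster."""
    counts = Counter(nearest_cluster(p, centers) for p in points)
    return {k: round(100 * counts.get(k, 0) / population, 1)
            for k in range(len(centers))}

# Toy data: 4 accepted samples out of a population of 10.
toy_centers = [(0, 0), (5, 5), (10, 0)]
accepted = [(0, 1), (1, 0), (4, 5), (9, 1)]
print(cluster_shares(accepted, toy_centers, population=10))
```

With the counts in Table 5, the same calculation gives 11 / 150 = 7.3% for cluster1 and 86 / 150 = 57.3% for cluster2.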
As shown in Figure 12, the ride-hailing vehicle identification system of the invention, based on a clustering discriminant model, comprises a data collection module, a data clustering analysis module, and a data processing module;
Wherein, the data collection module: receives the signaling data of ride-hailing drivers and taxi drivers;
Data clustering analysis module: randomly selects the signaling data of a number of taxi drivers collected by the data collection module as sample set M; randomly selects a number of driver users of unknown category collected by the data collection module as sample set N; extracts features and, based on sample set M, builds the clustering discriminant model;
Data processing module: imports the obtained driver-user signaling data and uses the clustering discriminant model to judge the category of each driver user.
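The division into modules can be illustrated with a small Python sketch. All class and method names here are hypothetical, and the clustering analysis module is represented only by its output, a `DiscriminantModel` holding cluster centers and a threshold; the centers and feature vectors are placeholder values, not data from the patent.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DataCollectionModule:
    """Holds the feature vector received for each driver (in-memory stand-in)."""
    records: dict = field(default_factory=dict)

    def receive(self, driver_id, features):
        self.records[driver_id] = tuple(features)


@dataclass
class DiscriminantModel:
    """Stand-in for what the clustering analysis module produces."""
    centers: list
    threshold: float

    def judge(self, features):
        total = sum(math.dist(features, c) for c in self.centers)
        return "taxi" if total <= self.threshold else "illegal ride-hailing"


class DataProcessingModule:
    """Applies the model to every driver held by the collection module."""

    def __init__(self, model):
        self.model = model

    def identify(self, collector):
        return {d: self.model.judge(f) for d, f in collector.records.items()}


collector = DataCollectionModule()
collector.receive("driver-1", (0.3, 0.3))
collector.receive("driver-2", (3.0, 3.0))
model = DiscriminantModel(centers=[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
                          threshold=2.239995)
print(DataProcessingModule(model).identify(collector))
```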
The traffic administration department can import newly obtained driver-user signaling data directly into the system to get the identification result. The system can thus support the department's crackdown on illegal ride-hailing vehicles, helping it locate suspect vehicles quickly, reduce the labor cost of law enforcement, and improve working efficiency.
The specific embodiments described above explain the purpose, technical solutions, and beneficial effects of the invention in further detail. It should be understood that the foregoing are only specific embodiments of the invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (6)

1. A ride-hailing vehicle identification method based on a clustering discriminant model, characterized by comprising the following steps:
Step (1): obtain the raw data, randomly select a number of known taxi driver users as sample set M, and randomly select a number of driver users of unknown category as sample set N;
Step (2): obtain the signaling data, over a period of time, of the driver users in sample set M and sample set N from step (1), and perform feature extraction;
Step (3): by analyzing the features extracted in step (2), establish that ride-hailing drivers and taxi drivers differ to a certain degree;
Step (4): build the model: randomly divide sample set M into a clustering training set P and a validation set Q, and use sample set N as test set N;
perform cluster analysis on training set P, compute the optimal number of clusters K, remove the abnormal sample points from training set P, obtain the cluster center points, compute for each valid sample point in training set P the sum of its distances to the cluster center points, and derive the classification threshold from the change in the distance increments;
Step (5): import the collected signaling data of unknown drivers into the model built in step (4) for judgment.
2. The ride-hailing vehicle identification method based on a clustering discriminant model according to claim 1, characterized in that, in step (4), the model obtained in step (4) is verified using validation set Q and tested using test set N.
3. The ride-hailing vehicle identification method based on a clustering discriminant model according to claim 1, characterized in that, in step (2), the extracted features include cell handover and dwell time, wherein the cell handover feature comprises the daily mean of the number of cell handovers, the daily standard deviation of the number of cell handovers, the mean busy-hour handover count, the standard deviation of the busy-hour handover count, the mean idle-hour handover count, and the standard deviation of the idle-hour handover count; and the dwell time feature comprises the busy-hour dwell median, the busy-hour dwell mean, the busy-hour dwell standard deviation, the idle-hour dwell median, the idle-hour dwell mean, and the idle-hour dwell standard deviation.
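The twelve features listed in this claim can be computed as in the following Python sketch. The input layout (one list of daily or per-period values per quantity), the feature names, and the use of the population standard deviation are assumptions; the claim does not specify them.

```python
from statistics import mean, median, pstdev


def extract_features(daily_handovers, busy_handovers, idle_handovers,
                     busy_dwell, idle_dwell):
    """Build the 6 handover features and 6 dwell-time features for one driver."""
    feats = {}
    # Cell handover: mean and standard deviation only.
    for name, values in (("handover_daily", daily_handovers),
                         ("handover_busy", busy_handovers),
                         ("handover_idle", idle_handovers)):
        feats[name + "_mean"] = mean(values)
        feats[name + "_std"] = pstdev(values)
    # Dwell time: median, mean, and standard deviation.
    for name, values in (("dwell_busy", busy_dwell),
                         ("dwell_idle", idle_dwell)):
        feats[name + "_median"] = median(values)
        feats[name + "_mean"] = mean(values)
        feats[name + "_std"] = pstdev(values)
    return feats


# Hypothetical three-day observations for one driver.
features = extract_features([10, 20, 30], [6, 8, 10], [2, 4, 6],
                            [5, 7, 9], [30, 40, 80])
print(len(features), features["handover_daily_mean"], features["dwell_busy_median"])
```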
4. The ride-hailing vehicle identification method based on a clustering discriminant model according to claim 1, characterized in that, in step (4), the optimal number of clusters K for training set P is computed using the silhouette coefficient, which is an evaluation index of how compact and how dispersed the clusters are, with the following formula:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},$$
a(i) is the average dissimilarity between vector i and the other points in the same cluster, i.e. it measures the similarity within a cluster;
b(i) is the minimum of the average dissimilarities between vector i and each of the other clusters, i.e. it measures the similarity between clusters;
s(i) ranges from -1 to 1; the larger the value, the better the cohesion within clusters and the separation between clusters.
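The silhouette coefficient defined by this formula can be computed directly, as in the Python sketch below. Euclidean distance as the dissimilarity measure, the score 0 for a point that is alone in its cluster, and the toy data are assumptions; the sketch requires at least two clusters. To choose K, one would typically compare the mean of s(i) across candidate cluster counts, which is not shown here.

```python
import math


def silhouette(points, labels):
    """Return s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every point."""
    scores = []
    for i, p in enumerate(points):
        # a(i): mean distance to the other points of the same cluster.
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        a = sum(same) / len(same)
        # b(i): smallest mean distance to the points of any other cluster.
        other_means = []
        for lab in set(labels) - {labels[i]}:
            d = [math.dist(p, q) for q, l in zip(points, labels) if l == lab]
            other_means.append(sum(d) / len(d))
        b = min(other_means)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


# Two well-separated toy clusters.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
labels = [0, 0, 1, 1]
print([round(s, 3) for s in silhouette(points, labels)])
```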
5. The ride-hailing vehicle identification method based on a clustering discriminant model according to claim 1, characterized in that, in step (4), the sum of the distances from each valid sample point in training set P to the cluster center points is computed and sorted, and an increment graph is drawn in which the X axis represents the sample sequence number in training set P and the Y axis represents the sum of the distances from the sample point to the center points; the inflection point of training set P is obtained, and the Y-axis value corresponding to that inflection point is the classification threshold y;
Threshold=y(x=101)=2.239995.
6. A ride-hailing vehicle identification system based on a clustering discriminant model, characterized in that the system comprises a data collection module, a data clustering analysis module, and a data processing module;
Wherein, the data collection module: receives the signaling data of ride-hailing drivers and taxi drivers;
Data clustering analysis module: randomly selects the signaling data of a number of taxi drivers collected by the data collection module as sample set M; randomly selects a number of driver users of unknown category collected by the data collection module as sample set N; extracts features and, based on sample set M, builds the clustering discriminant model;
Data processing module: imports the obtained driver-user signaling data and uses the clustering discriminant model to judge the category of each driver user.
CN201710573249.5A 2017-07-14 2017-07-14 Net based on clustering and discriminant model about car discrimination method and system Pending CN107301433A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710573249.5A CN107301433A (en) 2017-07-14 2017-07-14 Net based on clustering and discriminant model about car discrimination method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710573249.5A CN107301433A (en) 2017-07-14 2017-07-14 Net based on clustering and discriminant model about car discrimination method and system

Publications (1)

Publication Number Publication Date
CN107301433A true CN107301433A (en) 2017-10-27

Family

ID=60133952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710573249.5A Pending CN107301433A (en) 2017-07-14 2017-07-14 Net based on clustering and discriminant model about car discrimination method and system

Country Status (1)

Country Link
CN (1) CN107301433A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086793A (en) * 2018-06-27 2018-12-25 东北大学 A kind of abnormality recognition method of wind-driven generator
CN109145982A (en) * 2018-08-17 2019-01-04 上海汽车集团股份有限公司 The personal identification method and device of driver, storage medium, terminal
CN109727452A (en) * 2019-01-08 2019-05-07 江苏交科能源科技发展有限公司 Trip proportion accounting method based on mobile phone signaling data
CN110473085A (en) * 2019-08-13 2019-11-19 优必爱信息技术(北京)有限公司 A kind of vehicle risk method of discrimination and device
CN110544151A (en) * 2019-08-20 2019-12-06 北京市天元网络技术股份有限公司 Method and equipment for determining whether user is online car booking driver
CN111091215A (en) * 2019-12-11 2020-05-01 浙江大搜车软件技术有限公司 Vehicle identification method and device, computer equipment and storage medium
CN111275507A (en) * 2018-12-04 2020-06-12 北京嘀嘀无限科技发展有限公司 Order abnormity identification and order risk management and control method and system
CN111310721A (en) * 2020-03-12 2020-06-19 中设设计集团股份有限公司 Intelligent detection method and system for illegal network car booking
CN111368858A (en) * 2018-12-25 2020-07-03 中国移动通信集团广东有限公司 User satisfaction evaluation method and device
CN113775929A (en) * 2021-09-28 2021-12-10 上海天麦能源科技有限公司 Urban gas pipe network layout area division method
CN115631632A (en) * 2022-12-19 2023-01-20 北京码牛科技股份有限公司 Vehicle-based track feature identification network car booking method and system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156792A (en) * 2016-06-24 2016-11-23 中国电力科学研究院 A kind of low-voltage platform area clustering method based on platform district electric characteristic parameter


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086793A (en) * 2018-06-27 2018-12-25 东北大学 A kind of abnormality recognition method of wind-driven generator
CN109086793B (en) * 2018-06-27 2021-11-16 东北大学 Abnormity identification method for wind driven generator
CN109145982A (en) * 2018-08-17 2019-01-04 上海汽车集团股份有限公司 The personal identification method and device of driver, storage medium, terminal
CN111275507A (en) * 2018-12-04 2020-06-12 北京嘀嘀无限科技发展有限公司 Order abnormity identification and order risk management and control method and system
CN111368858A (en) * 2018-12-25 2020-07-03 中国移动通信集团广东有限公司 User satisfaction evaluation method and device
CN111368858B (en) * 2018-12-25 2023-11-24 中国移动通信集团广东有限公司 User satisfaction evaluation method and device
CN109727452A (en) * 2019-01-08 2019-05-07 江苏交科能源科技发展有限公司 Trip proportion accounting method based on mobile phone signaling data
CN110473085A (en) * 2019-08-13 2019-11-19 优必爱信息技术(北京)有限公司 A kind of vehicle risk method of discrimination and device
CN110544151A (en) * 2019-08-20 2019-12-06 北京市天元网络技术股份有限公司 Method and equipment for determining whether user is online car booking driver
CN111091215A (en) * 2019-12-11 2020-05-01 浙江大搜车软件技术有限公司 Vehicle identification method and device, computer equipment and storage medium
CN111091215B (en) * 2019-12-11 2023-10-20 浙江大搜车软件技术有限公司 Vehicle identification method, device, computer equipment and storage medium
CN111310721A (en) * 2020-03-12 2020-06-19 中设设计集团股份有限公司 Intelligent detection method and system for illegal network car booking
CN111310721B (en) * 2020-03-12 2024-03-26 华设设计集团股份有限公司 Illegal network vehicle-closing intelligent detection method and system
CN113775929A (en) * 2021-09-28 2021-12-10 上海天麦能源科技有限公司 Urban gas pipe network layout area division method
CN115631632A (en) * 2022-12-19 2023-01-20 北京码牛科技股份有限公司 Vehicle-based track feature identification network car booking method and system

Similar Documents

Publication Publication Date Title
CN107301433A (en) Net based on clustering and discriminant model about car discrimination method and system
CN111462488B (en) Intersection safety risk assessment method based on deep convolutional neural network and intersection behavior characteristic model
CN104268599B (en) Intelligent unlicensed vehicle finding method based on vehicle track temporal-spatial characteristic analysis
CN103632168B (en) Classifier integration method for machine learning
CN102346847B (en) License plate character recognizing method of support vector machine
CN106846538B (en) Cross vehicle record treating method and apparatus
CN105931068A (en) Cardholder consumption figure generation method and device
CN108492557A (en) Highway jam level judgment method based on multi-model fusion
CN106651027B (en) Internet regular bus route optimization method based on social network
CN108764366A (en) Feature selecting and cluster for lack of balance data integrate two sorting techniques
CN108847022B (en) Abnormal value detection method of microwave traffic data acquisition equipment
CN108764375B (en) Highway goods stock transprovincially matching process and device
CN105809193B (en) A kind of recognition methods of the illegal vehicle in use based on kmeans algorithm
CN108682153B (en) Urban road traffic jam state discrimination method based on RFID electronic license plate data
CN104750800A (en) Motor vehicle clustering method based on travel time characteristic
CN102324038A (en) A kind of floristics recognition methods based on digital picture
CN107274066B (en) LRFMD model-based shared traffic customer value analysis method
CN110562261B (en) Method for detecting risk level of driver based on Markov model
CN111046937A (en) Two-segment passenger crowd trip purpose analysis method fusing public transportation data and POI data
CN101655909A (en) Device and method for calculating matching degree
CN102360434B (en) Target classification method of vehicle and pedestrian in intelligent traffic monitoring
CN105654574A (en) Vehicle equipment-based driving behavior evaluation method and vehicle equipment-based driving behavior evaluation device
CN109191828B (en) Traffic participant accident risk prediction method based on ensemble learning
CN103235954A (en) Improved AdaBoost algorithm-based foundation cloud picture identification method
CN107194815B (en) Client segmentation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171027
