CN103927874A - Automatic incident detection method based on under-sampling and used for unbalanced data set - Google Patents

Automatic incident detection method based on under-sampling and used for unbalanced data set

Info

Publication number
CN103927874A
Authority
CN
China
Prior art keywords
training set
traffic
sample
sampling
penalty factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410177414.1A
Other languages
Chinese (zh)
Inventor
陈淑燕
李苗华
王炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201410177414.1A priority Critical patent/CN103927874A/en
Publication of CN103927874A publication Critical patent/CN103927874A/en
Pending legal-status Critical Current

Abstract

The invention discloses an automatic incident detection method based on under-sampling and used for an unbalanced data set. The automatic incident detection method comprises the steps of (1) using a maximum and minimum normalization method to carry out normalization processing on actually-measured traffic flow data, carrying out under-sampling processing on a majority class in a training set on the basis of a neighborhood cleaning rule to obtain a new training set which is relatively balanced, (2) selecting a radial basis function as a kernel function of a support vector machine, using an improved grid search algorithm to optimize a penalty factor C and a kernel parameter g of the support vector machine, and (3) training the support vector machine through the training set which is relatively balanced so as to obtain an automatic incident detection model used for the unbalanced data set. According to the automatic incident detection method based on under-sampling and used for the unbalanced data set, the problem that an existing traffic incident detection algorithm is not applicable to unbalanced traffic data in reality is solved, detection performance of the traffic incident detection algorithm is remarkably improved, the average detection time is shortened, and the requirement of traffic incident detection for real-time performance is met.

Description

Automatic traffic incident detection method based on under-sampling for unbalanced data sets
Technical field
The invention belongs to the technical field of intelligent traffic management and control, and relates to an automatic traffic incident detection method based on under-sampling for unbalanced data sets.
Background technology
Traffic incidents not only cause congestion and delay but also easily lead to secondary accidents. Detecting traffic incidents accurately and quickly, and carrying out rescue and handling in time, can effectively reduce the congestion and delay they produce and prevent secondary accidents. Automatic incident detection (AID) is an important component of modern traffic monitoring systems and the basis of advanced traffic control systems and traveler information systems. It is of great significance for substantially reducing the delay, congestion, and accidents caused by traffic incidents and for improving traffic safety and the level of service.
In recent years, research on AID algorithms has concentrated on applying new techniques such as neural networks, fuzzy theory, wavelet analysis, and support vector machines. Compared with traditional incident detection algorithms, these algorithms improve detection performance to some extent. In the real world, however, normal traffic operating states far outnumber incident states, so traffic incident detection is in essence an imbalanced classification problem, and earlier AID algorithms seldom took this into account. Most of the above algorithms classify on the assumption of a balanced data set. When used for traffic incident detection they often produce a higher false alarm rate, a lower detection rate, and a longer mean time to detect, and their detection results are unsatisfactory.
The support vector machine (SVM) has been used for traffic incident detection, but it is markedly biased when handling imbalanced classification problems, which hinders the learning of minority-class samples. To overcome this defect, the invention combines the neighborhood cleaning rule with the support vector machine and proposes an automatic traffic incident detection method based on under-sampling for unbalanced data sets. The majority class in the training set is first under-sampled with an under-sampling method based on the neighborhood cleaning rule to reduce the imbalance. The relatively balanced training set is then used to train a support vector machine, which serves as the classifier for automatic traffic incident detection.
Summary of the invention
Technical problem: the invention provides an automatic traffic incident detection method based on under-sampling for unbalanced data sets that reduces the imbalance in the number of samples between classes in the training set and can cope with the unbalanced traffic data found in the real world.
Technical solution: the automatic traffic incident detection method based on under-sampling for unbalanced data sets according to the invention comprises the following steps:
1) normalizing the measured traffic flow data with the max-min normalization method to obtain an original training set and a test set;
2) under-sampling the majority class in the original training set obtained in step 1) on the basis of the neighborhood cleaning rule, reducing the imbalance of the training set and obtaining a new, relatively balanced training set;
3) on the basis of the original training set obtained in step 1), adopting the radial basis function as the kernel function of the support vector machine and optimizing the penalty factor C and the kernel parameter g of the support vector machine with an improved grid search algorithm, to obtain the optimal values of the penalty factor C and the kernel parameter g;
4) using the optimal values of the penalty factor C and the kernel parameter g obtained in step 3), training the support vector machine with the new, relatively balanced training set obtained in step 2), to obtain an automatic traffic incident detection model for unbalanced data sets;
5) using the trained automatic traffic incident detection model for unbalanced data sets to perform automatic traffic incident detection on the test set obtained in step 1), and determining from the output of the model whether a traffic incident has occurred.
In a preferred embodiment of the method, the measured traffic flow data in step 1) comprise three kinds of data, namely speed, occupancy, and flow, measured by the detectors upstream and downstream of the detection section in each sampling period.
In a preferred embodiment of the method, the max-min normalization method in step 1) processes the measured traffic flow data according to the following formula:
$$\bar{x}_{ij} = \frac{x_{ij} - x_{\min j}}{x_{\max j} - x_{\min j}},$$
where $\bar{x}_{ij}$ is the value of the raw datum $x_{ij}$ after normalization; $x_{ij}$ is the j-th attribute value of the i-th sample; $x_{\max j}$ and $x_{\min j}$ are the maximum and minimum values of attribute j, respectively; and j = 1, 2, ..., 6 correspond to the six attributes upstream speed, downstream speed, upstream occupancy, downstream occupancy, upstream flow, and downstream flow, respectively.
In a preferred embodiment of the method, step 2) proceeds as follows: for each sample $x_i$ in the training set, find its three nearest neighbors and compare the class of $x_i$ with the classes of those three neighbors. If $x_i$ belongs to the majority class and two or three of its three neighbors are minority-class samples, remove $x_i$ from the training set; otherwise take no action on $x_i$ and move on to the next sample in the training set. If $x_i$ belongs to the minority class and two or three of its three neighbors are majority-class samples, remove the majority-class samples among those three neighbors from the training set; otherwise take no action on $x_i$ and move on to the next sample in the training set. Here i is the index of the sample in the training set, i = 1, 2, ..., n, and n is the total number of samples in the training set.
In a preferred embodiment of the method, step 3) proceeds as follows:
First, let the penalty factor C and the kernel parameter g vary over the ranges $C = [2^{-10}, 2^{10}]$ and $g = [2^{-10}, 2^{10}]$ with a step of 1.0, and use cross-validation to find the penalty factor C and kernel parameter g that give the highest classification accuracy, thereby determining the optimal value range of C and g;
Then, let them vary over the ranges $C = [2^{-10}, 2^{0}]$ and $g = [2^{0}, 2^{10}]$ with a step of 0.5, and use cross-validation to find the best values of the penalty factor C and kernel parameter g within that optimal value range.
In a preferred embodiment of the method, in step 5), if the output of the automatic traffic incident detection model for unbalanced data sets is -1, the traffic operating state in the detection area is currently normal; otherwise a traffic incident has occurred.
Based on the neighborhood cleaning rule, the method proposes a new under-sampling method that under-samples the majority class in the training set and reduces the imbalance in the number of samples between classes. On this basis, a support vector machine is trained as the classifier for automatic traffic incident detection. The method can be used for real-time automatic detection of traffic incidents and remedies the inability of existing traffic incident detection algorithms to cope with the unbalanced traffic data of the real world.
Beneficial effects: compared with the prior art, the invention has the following advantages:
Most traffic incident detection algorithms in general use today classify on the basis of balanced data sets and are not suited to the unbalanced traffic data of the real world. Some algorithms over-sample the minority class in the training set to add minority-class samples and reduce the imbalance, but the added samples may make the minority-class information redundant and cause over-fitting. Some algorithms randomly under-sample the majority class in the training set to address the data imbalance, but removing majority-class samples at random is blind and limited, and takes no account of noise samples and boundary samples. Given that incident data are in practice rare compared with normal traffic data, the method proposes a new under-sampling method based on the neighborhood cleaning rule that under-samples the majority class in the training set, reduces the imbalance of the training set, and overcomes the unsuitability of existing traffic incident detection algorithms for unbalanced real-world traffic data. The proposed under-sampling method removes majority-class samples by searching for nearest neighbors, which raises the probability of retaining safe interior samples, improves the quality of data cleaning, and reduces the effect of noise samples on minority-class classification performance. A support vector machine is used as the classifier, and the penalty factor C and kernel parameter g are optimized with an improved grid search algorithm, which improves the classification performance of the model. By combining an under-sampling method based on the neighborhood cleaning rule with a support vector machine for automatic traffic incident detection, the invention raises the incident detection rate, shortens the mean time to detect, meets the real-time requirement of traffic incident detection, and is easy to implement.
Brief description of the drawings
Fig. 1 is a flow chart of the automatic traffic incident detection method based on under-sampling for unbalanced data sets according to the invention;
Fig. 2 is a schematic diagram of the sampling of the measured traffic flow data used by the invention;
Fig. 3(a) and Fig. 3(b) are plots of how the classification accuracy varies with C and g when the penalty factor C and kernel parameter g of the support vector machine are optimized; the search ranges and step sizes of C and g differ between Fig. 3(a) and Fig. 3(b).
Embodiment
The technical solution is further described below with reference to the drawings and a specific embodiment. It should be understood that the embodiment serves only to illustrate the invention and not to limit its scope; after reading this disclosure, modifications of the invention in various equivalent forms made by those skilled in the art all fall within the scope defined by the claims of this application.
The flow chart of the proposed automatic traffic incident detection method based on under-sampling for unbalanced data sets is shown in Fig. 1. The method mainly comprises the following steps:
1) Normalize the measured traffic flow data with the max-min normalization method to obtain an original training set and a test set.
Both the training and the testing of the proposed algorithm use measured I-880 traffic flow data; a schematic of the traffic flow data sampling is shown in Fig. 2. The I-880 data consist of the traffic flow data collected by 35 groups of detectors (18 groups on the northbound roadway and 17 groups on the southbound roadway) during the collection periods (February 16 to March 19, 1993, and September 27 to October 29, 1993). The traffic flow data comprise three kinds of data, namely traffic flow, occupancy, and speed, and the data collection interval is 30 s.
The normal traffic flow data (5272 groups in total) of 4 pairs of adjacent detectors, 2 pairs in each of the northbound and southbound directions, and 8 traffic incidents (762 groups in total) on February 16 are selected at random as the original training set. The normal traffic flow data (10320 groups in total) of 8 pairs of detectors, namely 4 pairs on February 17 and 4 pairs on February 18 with 2 pairs in each direction on each day, and 35 incidents (3167 groups in total) are selected at random as the test set.
The test set and the training set are merged into one whole and normalized with the max-min normalization method, as follows:
$$\bar{x}_{ij} = \frac{x_{ij} - x_{\min j}}{x_{\max j} - x_{\min j}} \tag{1}$$
where $\bar{x}_{ij}$ is the value of the raw datum $x_{ij}$ after normalization; $x_{ij}$ is the j-th attribute value of the i-th sample; $x_{\max j}$ and $x_{\min j}$ are the maximum and minimum values of attribute j; and j = 1, 2, ..., 6 denote the six attributes, namely upstream and downstream speed, occupancy, and flow.
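The normalization in formula (1) can be sketched as follows. This is a minimal illustration, assuming each sample is a list of numeric attribute values; the function name `min_max_normalize` and the treatment of a constant attribute (mapped to 0.0) are assumptions made for the example and are not stated in the patent.

```python
def min_max_normalize(samples):
    """Scale every attribute (column) of `samples` to [0, 1] as in formula (1)."""
    n_attrs = len(samples[0])
    # Per-attribute minimum and maximum over all samples
    col_min = [min(row[j] for row in samples) for j in range(n_attrs)]
    col_max = [max(row[j] for row in samples) for j in range(n_attrs)]
    normalized = []
    for row in samples:
        new_row = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            # Assumption: a constant attribute is mapped to 0.0 to avoid dividing by zero
            new_row.append((value - col_min[j]) / span if span else 0.0)
        normalized.append(new_row)
    return normalized
```

Because the text merges the training and test sets before normalizing, the minima and maxima in this sketch would be taken over the merged data.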
2) Under-sample the majority class in the training set on the basis of the neighborhood cleaning rule, reducing the imbalance of the training set and obtaining a new, relatively balanced training set.
To retain as many of the safe interior samples in the training set as possible and to remove only majority-class samples, the invention borrows the idea of the neighborhood cleaning rule and proposes a new under-sampling method: for each sample $x_i$ in the training set, find its three nearest neighbors and compare the class of $x_i$ with the classes of those three neighbors. If $x_i$ belongs to the majority class and two or three of its three neighbors are minority-class samples, remove $x_i$ from the training set; otherwise take no action on $x_i$ and move on to the next sample in the training set. If $x_i$ belongs to the minority class and two or three of its three neighbors are majority-class samples, remove the majority-class samples among those three neighbors from the training set; otherwise take no action on $x_i$ and move on to the next sample in the training set. Here i is the index of the sample in the training set, i = 1, 2, ..., n, and n is the total number of samples in the training set.
This under-sampling method uses the nearest-neighbor idea to remove majority-class samples from the training set. Nearest neighbors are found using the pairwise Euclidean distance between samples, which is computed as follows:
$$d(x_a, x_b) = \left( \sum_{j=1}^{n} (x_{aj} - x_{bj})^2 \right)^{1/2} \tag{2}$$
where $d(x_a, x_b)$ is the Euclidean distance between the two samples; $x_{aj}$ is the j-th attribute value of sample a; $x_{bj}$ is the j-th attribute value of sample b; and all values are the normalized data. The sum runs over the attributes of a sample, so n in this formula is the number of attributes.
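The under-sampling rule and the distance in formula (2) can be sketched as below. This is an illustrative reading rather than the patented implementation: the function names are hypothetical, the label -1 is assumed to mark the majority (normal) class, and neighbors are looked up in the original training set with all removals applied at the end, a choice the text leaves open.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two samples, as in formula (2)."""
    return math.sqrt(sum((aj - bj) ** 2 for aj, bj in zip(a, b)))

def neighborhood_cleaning_undersample(X, y, majority=-1, k=3):
    """Return (X, y) after removing majority-class samples by the rule above."""
    to_remove = set()
    for i, xi in enumerate(X):
        # Indices of the k nearest other samples by Euclidean distance
        others = [j for j in range(len(X)) if j != i]
        neighbours = sorted(others, key=lambda j: euclidean(xi, X[j]))[:k]
        n_major = sum(1 for j in neighbours if y[j] == majority)
        n_minor = len(neighbours) - n_major
        if y[i] == majority and n_minor >= 2:
            # Majority sample surrounded by minority samples: drop it
            to_remove.add(i)
        elif y[i] != majority and n_major >= 2:
            # Minority sample surrounded by majority samples: drop those neighbours
            to_remove.update(j for j in neighbours if y[j] == majority)
    keep = [i for i in range(len(X)) if i not in to_remove]
    return [X[i] for i in keep], [y[i] for i in keep]
```

Only majority-class samples are ever removed, so every incident sample survives the cleaning step.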
After the original training set is under-sampled, the proportion of incident samples among all samples in the training set rises from 12.63% to 33.63%. The traffic data samples used in the embodiment of the invention are described in Table 1:
Table 1. Description of the traffic data samples used in the embodiment of the invention
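As a quick arithmetic check, the 12.63% share follows from the sample counts quoted for the original training set (762 incident groups and 5272 normal groups). The second computation is an inference only: it estimates how many normal samples would have to remain for the share to reach the quoted 33.63%, a count the text itself does not state. The names used are illustrative.

```python
def incident_share(n_incident, n_normal):
    """Percentage of incident samples among all samples."""
    return 100.0 * n_incident / (n_incident + n_normal)

# Counts quoted for the original training set in this embodiment
original_share = incident_share(762, 5272)

# Inference only: normal-sample count that would give the quoted 33.63% share
target_share = 33.63
implied_normal = 762 * (100.0 - target_share) / target_share

print(round(original_share, 2), round(implied_normal))
```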
3) On the basis of the original training set obtained in step 1), adopt the radial basis function as the kernel function of the support vector machine and optimize the penalty factor C and the kernel parameter g of the support vector machine with an improved grid search algorithm, to obtain the optimal values of the penalty factor C and the kernel parameter g.
Traffic flow data are high-dimensional and nonlinear, so the raw data must be mapped by the kernel technique into a high-dimensional feature space, in which the problem is solved as a linear classification problem. The radial basis function (RBF) is the most widely used SVM kernel function and performs more stably than other types of kernel function; the invention therefore selects the radial basis function as the kernel function of the SVM.
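The text does not write the radial basis function out. A common form, used by LIBSVM-style tools in which the kernel parameter is called g or gamma, is K(x_a, x_b) = exp(-g * ||x_a - x_b||^2). The sketch below assumes that parameterization; whether the patent uses exactly this form is an assumption.

```python
import math

def rbf_kernel(xa, xb, g):
    """RBF kernel value exp(-g * squared Euclidean distance); assumed form."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xa, xb))
    return math.exp(-g * sq_dist)
```

Under this form a larger g makes the kernel decay faster with distance, so each training sample influences a smaller neighborhood of the normalized feature space.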
The penalty factor C and kernel parameter g of the support vector machine are optimized with the improved grid search algorithm as follows:
First, the penalty factor C and kernel parameter g are varied within given ranges ($C = [2^{-10}, 2^{10}]$, $g = [2^{-10}, 2^{10}]$) with a step of 1.0, and cross-validation is used to find the C and g that give the highest classification accuracy, thereby determining the optimal value range of C and g. As shown in Fig. 3(a), the highest classification accuracy, 100%, yields a classification-accuracy contour line that corresponds to a series of combinations of C and g. Because a higher penalty factor can cause over-fitting, the group with the smaller C is selected as the optimal value. From Fig. 3(a), the optimal value range of C and g lies within $C = [2^{-6}, 2^{0}]$, $g = [2^{0}, 2^{6}]$;
Then, starting from the optimal value range already determined ($C = [2^{-6}, 2^{0}]$, $g = [2^{0}, 2^{6}]$), this range is suitably widened to ensure that the best values of C and g lie inside the search range, giving the search range $C = [2^{-10}, 2^{0}]$, $g = [2^{0}, 2^{10}]$. The search step is reduced to 0.5 over the new search range, and cross-validation is used to find the best values of C and g. As shown in Fig. 3(b), the highest classification accuracy is 100%, and this accuracy again yields a contour line that corresponds to a series of different combinations of C and g. Although a very high penalty factor can raise the cross-validation accuracy, a larger C tends to cause over-fitting, so the group with the smallest penalty factor is selected as the optimal value.
The C and g that give the highest classification accuracy are found by cross-validation as follows: the original training set obtained in step 1) is divided at random into two groups, one used as a training set and the other as a test set; the classifier is trained on the training set and the model is then validated on the test set, and the resulting classification accuracy is recorded as the performance index of the classifier.
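The two-group validation described above can be sketched as a simple hold-out split. The names are hypothetical: `fit` stands for any routine that trains a classifier and returns a prediction function, and splitting into two equal halves is an assumption, since the text only says the data are divided at random into two groups.

```python
import random

def holdout_accuracy(X, y, fit, seed=0):
    """Randomly split (X, y) in two, train on one part, score on the other."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2  # assumption: two groups of equal size
    train, test = idx[:half], idx[half:]
    # `fit` is a hypothetical callable: fit(X_train, y_train) -> predict(x)
    predict = fit([X[i] for i in train], [y[i] for i in train])
    hits = sum(1 for i in test if predict(X[i]) == y[i])
    return hits / len(test)
```

For an SVM, `fit` would train with one candidate pair (C, g), so the returned accuracy can serve as the score of that pair during the grid search.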
Fig. 3(a) and Fig. 3(b) show how the classification accuracy varies with C and g during the search for the optimal values of the penalty factor C and kernel parameter g of the support vector machine. In both figures the horizontal axis is the base-2 logarithm of the penalty factor C and the vertical axis is the base-2 logarithm of the kernel parameter g; the curves are classification-accuracy contour lines, and the number on each curve is the corresponding classification accuracy.
The search ranges and step sizes of C and g differ between Fig. 3(a) and Fig. 3(b). In Fig. 3(a), C and g vary over $C = [2^{-10}, 2^{10}]$, $g = [2^{-10}, 2^{10}]$, with a search step of 1.0; in Fig. 3(b), they vary over $C = [2^{-10}, 2^{0}]$, $g = [2^{0}, 2^{10}]$, with a search step of 0.5.
The optimal values of the penalty factor C and kernel parameter g of the support vector machine obtained by this optimization are C = 0.022097 and g = 16, respectively.
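The coarse-then-fine search can be sketched as follows. The ranges and step sizes come from the text; everything else is an assumption made for illustration: the step is applied to the base-2 exponent (consistent with the log-scaled axes of Fig. 3), `score` is a hypothetical callback returning the cross-validated accuracy for a pair (C, g), ties in accuracy go to the smaller C, and the second-stage range is fixed rather than read off a contour plot.

```python
def exponent_grid(lo, hi, step):
    """Exponents lo, lo + step, ..., hi (inclusive)."""
    count = int(round((hi - lo) / step)) + 1
    return [lo + k * step for k in range(count)]

def grid_search(score, c_exps, g_exps):
    """Return (C, g) with the highest score; equal scores go to the smaller C."""
    best = None
    for ce in c_exps:
        for ge in g_exps:
            acc = score(2.0 ** ce, 2.0 ** ge)
            # Higher accuracy wins; at equal accuracy the smaller exponent of C wins
            key = (acc, -ce)
            if best is None or key > best[0]:
                best = (key, 2.0 ** ce, 2.0 ** ge)
    return best[1], best[2]

def two_stage_search(score):
    """Coarse pass with step 1.0, then a finer pass with step 0.5."""
    coarse = grid_search(score, exponent_grid(-10, 10, 1.0), exponent_grid(-10, 10, 1.0))
    fine = grid_search(score, exponent_grid(-10, 0, 0.5), exponent_grid(0, 10, 0.5))
    return coarse, fine
```

On a 0.5-step exponent grid, the reported optimum C = 0.022097 corresponds to an exponent of -5.5 and g = 16 to an exponent of 4.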
4) Using the optimal values of the penalty factor C and the kernel parameter g obtained in step 3), train the support vector machine with the new, relatively balanced training set obtained in step 2), to obtain an automatic traffic incident detection model for unbalanced data sets.
The SVM is trained with the relatively balanced training set obtained above. Its input is a 6-dimensional vector comprising the six attributes, namely speed, occupancy, and flow, measured by the detectors upstream and downstream of the detection section at time t. Its output is a state flag, in which 1 denotes the traffic incident state and -1 denotes the normal traffic state.
5) Use the trained automatic traffic incident detection model for unbalanced data sets to perform automatic traffic incident detection on the test set obtained in step 1), and determine from the output of the model whether a traffic incident has occurred. If the output of the support vector machine for unbalanced data sets is -1, the traffic operating state in the detection area is currently normal; otherwise a traffic incident has occurred.
To demonstrate the effectiveness of the under-sampling method based on the neighborhood cleaning rule for automatic traffic incident detection, a comparative experiment was designed: support vector machines were trained on the original training set and on the new, relatively balanced training set obtained by under-sampling, and the detection performance of the two incident detectors was compared on the same test set.
The following four evaluation indexes are selected: detection rate (DR), false alarm rate (FAR), mean time to detect (MTTD), and classification rate (CR). The indexes are computed as follows:
$$\mathrm{MTTD} = \frac{1}{n} \sum_{i=1}^{n} \left[ TI(i) - AI(i) \right] \tag{5}$$
where TI(i) is the actual occurrence time of incident i detected by the AID algorithm; AI(i) is the time at which the AID algorithm detects incident i; and n is the number of incidents detected by the AID algorithm.
Using the same test set, the incident detector trained on the original training set and the incident detector trained on the relatively balanced training set were each tested; the results are given in Table 2 below.
Table 2. Detection results of the different incident detectors

| Training set | DR/% | FAR/% | MTTD/min | CR/% |
| --- | --- | --- | --- | --- |
| Original training set | 85.71 | 0.48 | 3.30 | 92.29 |
| After under-sampling | 91.43 | 1.74 | 0.52 | 95.15 |
As Table 2 shows, the SVM classifier trained on the relatively balanced training set obtained with the under-sampling method based on the neighborhood cleaning rule classifies the same test set better: the detection rate DR rises from 85.71% to 91.43%, the mean time to detect MTTD falls from 3.30 min to 0.52 min, and the classification rate CR rises from 92.29% to 95.15%. The false alarm rate FAR, however, rises from 0.48% to 1.74%. This may be because under-sampling the majority class of the training set discards some useful information, so that when majority-class samples are classified some normal variations in the traffic parameters are mistaken for traffic incidents; the classification accuracy on majority-class samples falls and the false alarm rate rises accordingly.
The invention proposes an automatic traffic incident detection method based on under-sampling for unbalanced data sets. By reconstructing the training set with the under-sampling method based on the neighborhood cleaning rule, the imbalance in the number of samples between classes in the training set is reduced, and the three indexes DR, MTTD, and CR are all improved, overcoming the unsuitability of existing AID algorithms for unbalanced real-world traffic data. The detection rate rises and the detection time is greatly shortened, which meets the real-time requirement of traffic incident detection, so the method can be applied to the automatic detection of road traffic incidents.

Claims (6)

1. based on owing the traffic event automatic detection method of sampling towards unbalanced data collection, it is characterized in that, the method comprises the steps:
1) utilize maximum-minimum specification method to carry out standardization processing to actual measurement traffic flow data, obtain original training set and test set;
2) based on neighborhood cleaning rule to described step 1) most classes in the original training set that obtains owe sample process, reduce the unbalancedness of training set, obtain the training set of new relative equilibrium;
3) based on step 1) the original training set that obtains, the kernel function of support vector machine adopts radial basis function, adopt penalty factor and the nuclear parameter g of improved grid search algorithm optimization support vector machine, the optimum value of the optimum value of supported vector machine penalty factor and nuclear parameter g;
4) according to described step 3) optimum value of support vector machine penalty factor and the optimum value of nuclear parameter g that obtain, use described step 2) the training set Training Support Vector Machines of the new relative equilibrium that obtains, obtain the automatic detection model of traffic events towards unbalanced data collection;
5) using the automatic detection model of the traffic events towards unbalanced data collection that trains, to described step 1) test set that obtains carries out traffic events and automatically detects, and determines whether generation traffic events according to the Output rusults of model.
2. according to claim 1 based on owing sampling towards the traffic event automatic detection method of unbalanced data collection, it is characterized in that: described step 1) in actual measurement traffic flow data comprise speed, occupation rate and the flow three class data of the detection section upstream and downstream that detecting device detects in each sampling period.
3. according to claim 1 based on owing sampling towards the traffic event automatic detection method of unbalanced data collection, it is characterized in that: described step 1) in maximum-minimum specification method for actual measurement traffic flow data being processed according to following formula:
x ij ‾ = x ij - x min j x max j - x min j ,
In formula, raw data x ijvalue after standardization processing; x ijrepresent j property value of i sample; x maxjand x minjbe respectively maximal value and the minimum value of attribute j; J=1,2 ..., 6, corresponding upstream speed, velocity of downstream, upstream occupation rate, downstream occupation rate, upstream flowrate, downstream flow totally 6 attributes of representing respectively.
4. according to claim 1 based on owing sampling towards the traffic event automatic detection method of unbalanced data collection, it is characterized in that described step 2) method idiographic flow be: to the sample x in training set i, find three neighbours nearest with it, comparative sample x iclassification and described three nearest neighbours' classification, if x ibe most classes, and have two or three to be minority class sample in its three neighbours, in training set, remove sample x i, otherwise not to x ido any processing, continue to find the next sample in training set; If x ibe minority class, and in its three neighbours, to have two or three be most class samples, in training set, remove the most class samples in these three neighbours, otherwise not to x ido any processing, continue to find the next sample in training set; Wherein i is the sample sequence number in training set, i=1,2 ..., n, n is the total sample number in training set.
According to described in claim 1,2,3 or 4 based on owing sampling towards the traffic event automatic detection method of unbalanced data collection, it is characterized in that: described step 3) idiographic flow be:
First allow penalty factor and nuclear parameter g at C=[2 -10, 2 10], g=[2 -10, 2 10] scope in the variation of 1.0 step-length, find penalty factor and the nuclear parameter g of corresponding maximum classification accuracy rate by cross validation, determine the optimum valuing range of penalty factor and nuclear parameter g with this;
Then at C=[2 -10, 2 0], g=[2 0, 2 10] scope in the variation of 0.5 step-length, by cross validation, in the optimum valuing range of described penalty factor and nuclear parameter g, find the best value of penalty factor and nuclear parameter g.
6. The under-sampling-based automatic traffic incident detection method for unbalanced data sets according to claim 1, 2, 3 or 4, characterized in that: in step 5), if the output of the automatic traffic incident detection model for unbalanced data sets is -1, the traffic in the detection zone is currently operating normally; otherwise a traffic incident has occurred.
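The decision rule in this claim amounts to a single comparison, sketched below in Python. The function name `interpret_output` and the returned strings are illustrative assumptions; only the meaning of the value -1 comes from the claim.

```python
def interpret_output(prediction):
    """Map the detection model's output to a traffic state.

    Per the claim, -1 means normal operation; any other output
    is treated as a traffic incident.
    """
    return "normal" if prediction == -1 else "incident"


states = [interpret_output(p) for p in (-1, 1, -1)]
```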
CN201410177414.1A 2014-04-29 2014-04-29 Automatic incident detection method based on under-sampling and used for unbalanced data set Pending CN103927874A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410177414.1A CN103927874A (en) 2014-04-29 2014-04-29 Automatic incident detection method based on under-sampling and used for unbalanced data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410177414.1A CN103927874A (en) 2014-04-29 2014-04-29 Automatic incident detection method based on under-sampling and used for unbalanced data set

Publications (1)

Publication Number Publication Date
CN103927874A (en) 2014-07-16

Family

ID=51146083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410177414.1A Pending CN103927874A (en) 2014-04-29 2014-04-29 Automatic incident detection method based on under-sampling and used for unbalanced data set

Country Status (1)

Country Link
CN (1) CN103927874A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR0155317B1 (en) * 1995-11-03 1998-12-15 양승택 Communication system for its
US20030097217A1 (en) * 2001-05-07 2003-05-22 Wells Charles Hilliary AVL software specifications
CN101075377A (en) * 2007-05-30 2007-11-21 东南大学 Method for automatically inspecting highway traffic event based on offset minimum binary theory
CN101271625A (en) * 2008-04-03 2008-09-24 东南大学 Method for detecting freeway traffic event by integration supporting vector machine
CN102682601A (en) * 2012-05-04 2012-09-19 南京大学 Expressway traffic incident detection method based on optimized support vector machine (SVM)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
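孙晓燕, 张化祥, 计华: "An under-sampling ensemble learning algorithm based on AdaBoost", 《山东大学学报》 *
赵自翔 et al.: "An improved under-sampling method for unbalanced data classification based on support vector machines", 《中山大学学报(自然科学版)》 *
郑文昌, 陈淑燕, 王宣强: "A SMOTE-SVM traffic incident detection algorithm for unbalanced data sets", 《武汉理工大学学报》 *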

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239516A (en) * 2014-09-17 2014-12-24 南京大学 Unbalanced data classification method
CN105975992A (en) * 2016-05-18 2016-09-28 天津大学 Unbalanced data classification method based on adaptive upsampling
CN106056130A (en) * 2016-05-18 2016-10-26 天津大学 Combined downsampling linear discrimination classification method for unbalanced data sets
CN106372655A (en) * 2016-08-26 2017-02-01 南京邮电大学 Synthetic method for minority class samples in non-balanced IPTV data set
CN106933805A (en) * 2017-03-14 2017-07-07 陈飞 The recognition methods of biological event trigger word in a kind of large data sets
CN106933805B (en) * 2017-03-14 2020-04-28 陈一飞 Method for identifying biological event trigger words in big data set
CN107563453A (en) * 2017-09-19 2018-01-09 马上消费金融股份有限公司 A kind of uneven sample data sorting technique and system
CN107563453B (en) * 2017-09-19 2018-07-06 马上消费金融股份有限公司 A kind of imbalance sample data sorting technique and system
CN111860638A (en) * 2020-07-17 2020-10-30 湖南大学 Parallel intrusion detection method and system based on unbalanced data deep belief network
WO2022012144A1 (en) * 2020-07-17 2022-01-20 湖南大学 Parallel intrusion detection method and system based on unbalanced data deep belief network
CN111860638B (en) * 2020-07-17 2022-06-28 湖南大学 Parallel intrusion detection method and system based on unbalanced data deep belief network

Similar Documents

Publication Publication Date Title
CN103927874A (en) Automatic incident detection method based on under-sampling and used for unbalanced data set
CN109146705B (en) Method for detecting electricity stealing by using electricity characteristic index dimension reduction and extreme learning machine algorithm
CN102915447B (en) Binary tree-based SVM (support vector machine) classification method
CN104504901B (en) A kind of traffic abnormity point detecting method based on multidimensional data
CN103593973B (en) A kind of urban road traffic situation assessment system
CN102841131B (en) Intelligent steel cord conveyer belt defect identification method and intelligent steel cord conveyer belt defect identification system
CN102254177B (en) Bearing fault detection method for unbalanced data SVM (support vector machine)
CN101271625A (en) Method for detecting freeway traffic event by integration supporting vector machine
CN111159243B (en) User type identification method, device, equipment and storage medium
CN102592451B (en) Method for detecting road traffic incident based on double-section annular coil detector
CN102682601A (en) Expressway traffic incident detection method based on optimized support vector machine (SVM)
CN110259648B (en) Fan blade fault diagnosis method based on optimized K-means clustering
CN108765961B (en) Floating car data processing method based on improved amplitude limiting average filtering
CN104318772B (en) Freeway traffic flow data quality checking method
CN100481153C (en) Method for automatically inspecting highway traffic event based on offset minimum binary theory
CN103488800A (en) SVM (Support Vector Machine)-based power consumption abnormality detection method
CN104269057A (en) Bayonet sensor layout method based on floating car OD data
CN103593470A (en) Double-degree integrated unbalanced data stream classification algorithm
CN115691120A (en) Congestion identification method and system based on highway running water data
CN103603794A (en) Method and device for adaptive fault diagnosis of gas storage injection-production compressor unit
CN113236508B (en) Method for detecting wind speed-power abnormal data of wind generating set
CN103778782A (en) Traffic state partitioning method based on semi-supervised machine learning
CN104537392A (en) Object detection method based on distinguishing semantic component learning
CN101957941A (en) The method of discerning the problem of showing especially based on the fusion conspicuousness and the susceptibility of time trend
CN101694747B (en) Method and device for indentifying abnormal vehicle speed

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140716