CN109145953B - Adaboost algorithm-based traffic high-risk personnel identification method - Google Patents

Adaboost algorithm-based traffic high-risk personnel identification method

Info

Publication number
CN109145953B
Authority
CN
China
Prior art keywords
data
personnel
illegal
risk
traffic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810815618.1A
Other languages
Chinese (zh)
Other versions
CN109145953A (en
Inventor
吕伟韬
刘林
陈凝
饶欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Zhitong Traffic Technology Co ltd
Original Assignee
Jiangsu Zhitong Traffic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Zhitong Traffic Technology Co ltd
Priority to CN201810815618.1A
Publication of CN109145953A
Application granted
Publication of CN109145953B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/017 Detecting movement of traffic to be counted or controlled identifying vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method for identifying high-risk traffic personnel based on the AdaBoost algorithm. Using original traffic violation data and accident data, the AdaBoost algorithm is applied to train and correct a high-risk personnel identification model; a person's violation attribute information is then input into the model to identify and predict high-risk personnel. The method is of practical value for improving the efficiency of traffic safety control and for assisting the daily safety management work of traffic police.

Description

Adaboost algorithm-based traffic high-risk personnel identification method
Technical Field
The invention relates to a method for identifying high-risk traffic personnel based on the AdaBoost algorithm.
Background
In the field of road traffic safety, research has mostly focused on the association between traffic accidents and external factors such as the environment, road infrastructure and traffic flow state (for example, Chinese patents CN201710400521.X, CN201580075213.3 and CN201611051192.4), or on analysing accident patterns and characteristics in terms of the environment and traffic control measures. Internal factors such as the behavioral habits of traffic participants (motor vehicle drivers, non-motor vehicle drivers and pedestrians) have so far lacked in-depth study, because the information spans many dimensions and the means of sensing it are limited. The influence of human factors on traffic accidents is nevertheless an unavoidable topic in traffic safety research and is of great practical significance for traffic safety management.
Research shows that traffic violations and traffic accidents are correlated. Because the traffic management sector has accumulated a large volume of violation data, this data can provide reliable support for mining the characteristics of traffic accidents.
The AdaBoost algorithm repeatedly trains the same type of weak classifier, assigns each classifier a weight based on its error rate, and outputs the weighted sum of their predictions. AdaBoost provides a framework in which the sub-classifiers can be built by various methods, requires no prior feature screening, and is relatively resistant to over-fitting. It performs well in data classification and could be used to mine valuable traffic safety information from traffic violation data, but such applications are currently lacking.
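As an illustration only, the weighting scheme described above can be sketched in a few lines. This is a minimal discrete AdaBoost with one-feature threshold stumps, not the implementation used by the invention; the function names (`fit_stump`, `adaboost_fit`, `adaboost_predict`) and the toy data are assumptions made for this sketch.

```python
import numpy as np

def stump_predict(X, j, t, s):
    """Predict s where feature j <= t, and -s elsewhere."""
    return np.where(X[:, j] <= t, s, -s)

def fit_stump(X, y, w):
    """Return the (feature, threshold, sign, error) stump with the lowest weighted error."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                err = w[stump_predict(X, j, t, s) != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def adaboost_fit(X, y, n_rounds=3):
    """Discrete AdaBoost; y must hold labels in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))            # uniform sample weights to start
    ensemble = []
    for _ in range(n_rounds):
        j, t, s, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)    # classifier weight from its error rate
        w *= np.exp(-alpha * y * stump_predict(X, j, t, s))  # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def adaboost_predict(X, ensemble):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy data: the label is +1 only for x in {3, 4, 5}, which no single stump can fit.
X_toy = np.arange(10, dtype=float).reshape(-1, 1)
y_toy = np.where((X_toy[:, 0] > 2) & (X_toy[:, 0] < 6), 1, -1)
model = adaboost_fit(X_toy, y_toy, n_rounds=3)
train_accuracy = (adaboost_predict(X_toy, model) == y_toy).mean()
```

On this toy set, three weighted stumps together separate the interval that no single stump can, which is the sense in which weak classifiers are combined into a stronger one.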
The invention centers on mining the behavioral characteristics of traffic participants: it extracts the violation-related driving behavior attributes of persons involved in accidents, identifies high-risk personnel, and thereby enables data-driven, proactive traffic safety prevention.
Disclosure of Invention
The invention aims to apply AdaBoost-based data mining to identify, among traffic participants with traffic violation records, the persons who are likely to be involved in traffic accidents. This allows the traffic safety risk of individuals to be predicted and evaluated, and provides a scientific index basis for decision support in source management, on-site inspection and other traffic safety management work.
Based on original traffic violation data and accident data, the AdaBoost algorithm is used to train and correct the high-risk personnel classification model, and violation attribute information is input into the model to identify and predict high-risk personnel. The method is of practical value for improving the efficiency of traffic safety control and for assisting the daily safety management work of traffic police.
The technical solution of the invention is as follows:
A method for identifying high-risk traffic personnel based on the AdaBoost algorithm comprises the following steps:
S1, constructing an illegal data set, a serious accident data set and a slight accident data set based on the original traffic violation data and accident data;
S2, classifying the illegal data set into two categories, high-risk personnel and general personnel, determining the data label value label according to a classification rule, and accordingly dividing the illegal data set into a high-risk personnel data subset D, a general personnel data subset N and a subset U to be identified;
S3, sampling the general personnel data subset N in the illegal data set, combining it with the high-risk personnel data subset D, and splitting to obtain a training set and a test set;
S4, training a high-risk personnel identification model on the training set data based on the AdaBoost algorithm, and determining the model parameters; the model parameters comprise the learning rate, the number of weak classifiers, the maximum tree depth, the minimum number of samples required to split a node, the minimum number of samples per leaf node and the maximum number of features;
S5, evaluating the high-risk personnel identification model on the test set data, determining the classification probability threshold, and correcting the model to obtain the final traffic high-risk personnel identification model;
S6, inputting the data of the subset U to be identified from step S2 into the traffic high-risk personnel identification model obtained in step S5 to obtain the high-risk personnel identification result.
Further, the sampling method in step S3 is specifically:
S31, randomly sampling the general personnel data subset N to obtain a compressed general personnel sample N';
S32, performing variable processing and screening on the sample data of the compressed general personnel sample N';
S33, splitting the union G of the high-risk personnel data subset D and N' into a training set and a test set;
S34, applying SMOTE sampling to the training set: the oversampling and undersampling ratios for the high-risk personnel data subset and the general personnel data subset are determined, the final sample counts are obtained, and the training set samples are produced after processing.
Further, the sample data variable processing and screening method in step S32 is specifically:
S321, setting the dependent variable target, whose value is determined by the sample data label and is either high-risk or general; taking the data fields of the illegal data set as independent variables;
S322, deleting constant independent variables and independent variables with extremely small variance; a variable X is judged to have extremely small variance when:
$$\mathrm{freqcut}_X > T_f \quad \text{and} \quad \mathrm{unisequential}_X < T_u$$
where $\mathrm{freqcut}_X = x_f / x_l$, $x_f$ is the frequency of the most frequent sample value of variable X, $x_l$ is the frequency of the second most frequent sample value, and $T_f$ is the corresponding threshold, usually 19; $\mathrm{unisequential}_X = m_X / n_X$, where $m_X$ is the number of distinct sample values (after deduplication), $n_X$ is the total number of samples, and $T_u$ is the threshold for unisequential, usually 0.1;
S323, deleting independent variables whose collinearity with other independent variables exceeds a threshold, which is typically 0.75;
S324, checking the remaining independent variables for multicollinearity, and determining the final data independent variables.
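As a rough illustration of the screening in S322 and S323, the two filters might be sketched as follows. The function names are hypothetical, the thresholds are the typical values stated above, and the greedy keep-or-drop rule for collinear variables is an assumption, since the text does not specify which variable of a correlated pair is deleted.

```python
import numpy as np

def near_zero_variance(x, t_f=19.0, t_u=0.1):
    """True if x is constant, or if freqcut > t_f and the unique-value ratio < t_u."""
    values, counts = np.unique(x, return_counts=True)
    if len(values) == 1:
        return True                          # constant variable
    top_two = np.sort(counts)[::-1][:2]
    freqcut = top_two[0] / top_two[1]        # x_f / x_l
    unique_ratio = len(values) / len(x)      # m_X / n_X
    return bool(freqcut > t_f and unique_ratio < t_u)

def drop_collinear(X, names, threshold=0.75):
    """Greedy filter (assumed rule): keep a column only if its |Pearson r|
    with every previously kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic demonstration data (not from the patent).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2 * a + 0.01 * rng.normal(size=200)      # almost perfectly collinear with a
c = rng.normal(size=200)
rare = np.zeros(200)
rare[:5] = 1                                 # 195 zeros vs 5 ones: 195/5 = 39, 2/200 = 0.01

flag_rare = near_zero_variance(rare)
flag_a = near_zero_variance(a)
kept_names = drop_collinear(np.column_stack([a, b, c]), ["a", "b", "c"])
```

Here `rare` is flagged because its frequency ratio (39) exceeds 19 and its unique-value ratio (0.01) is below 0.1, while `b` is dropped for being collinear with `a`.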
Further, the method for assigning the data label value label based on the classification rule in step S2 is specifically:
High-risk personnel: either traffic participants who have violation records and a serious traffic accident record in which they bore major or full responsibility, or traffic participants who have violation records, only slight accident records, and no fewer than 2 such accident records;
General personnel: traffic participants who have violation records but no accident records;
Data that satisfy neither of the above conditions constitute the subset to be identified.
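The labelling rule above can be written as a short function. This is only a sketch: the representation of a person's accident history as a list of (severity, responsibility) tuples and the string values used are assumptions, and every person passed in is assumed to already have violation records.

```python
def label_person(accidents):
    """Assign a label to one person who has violation records.

    accidents: list of (severity, responsibility) tuples, with hypothetical
    values severity in {"serious", "minor"} and responsibility in
    {"full", "major", "secondary", "none"}.
    """
    # Rule 1: a serious accident with major or full responsibility.
    if any(sev == "serious" and resp in ("major", "full") for sev, resp in accidents):
        return "high_risk"
    # Rule 2: only slight accidents, and at least two of them.
    only_minor = bool(accidents) and all(sev == "minor" for sev, _ in accidents)
    if only_minor and len(accidents) >= 2:
        return "high_risk"
    # General personnel: violation records but no accident records.
    if not accidents:
        return "general"
    # Everything else goes to the subset U to be identified.
    return "to_identify"

labels = [
    label_person([]),
    label_person([("minor", "major")]),
    label_person([("minor", "major"), ("minor", "none")]),
    label_person([("serious", "secondary")]),
    label_person([("serious", "full")]),
]
```

A person with a single slight accident, or with a serious accident in which they bore less than major responsibility, fits neither rule and therefore falls into the subset to be identified.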
Further, the original traffic violation data and accident data in step S1 include the certificate information of the persons involved; the violation records are aggregated and classified to obtain the illegal data set; the illegal data set is the full sample of violation records, and its information comprises the person's certificate number, number of violations, violation types, penalties, occurrence of accident-related violations and violation time periods.
Further, in step S1, the occurrence of accident-related violations is obtained by correspondence analysis, and the violation types with a strong influence on traffic accidents are extracted as data attributes of the illegal data set.
Further, in step S1, the violation time period is obtained by converting the continuous time variable into a discrete variable and classifying it according to the temporal characteristics of violations.
The invention has the following beneficial effects:
First, based on the correlation between traffic violations and traffic accidents, the invention provides a high-risk personnel identification method using the AdaBoost algorithm that predicts the traffic safety risk of individuals from their traffic violation records. The safety risk label rule is easy to implement and can be adjusted flexibly in practice according to regional traffic regulations, the local safety level and the sensitivity required of the model.
Second, the AdaBoost algorithm is used to fit the high-risk personnel identification model and makes good use of cascaded weak classifiers. Compared with common ensemble algorithms it offers a low generalization error rate and high precision, meets the need to classify personnel from violation data, and helps ensure the identification accuracy for high-risk personnel.
Third, compressing the large sample before SMOTE sampling alleviates, to some extent, the effect of data set imbalance on model accuracy.
Fourth, preprocessing the original data with feature engineering methods improves the accuracy of the model.
Drawings
Fig. 1 is a schematic flow chart of a traffic high-risk person identification method based on the Adaboost algorithm in the embodiment of the present invention.
Fig. 2 is a schematic flow chart of sampling a general person data subset in the embodiment.
Fig. 3 is a schematic flowchart of a sample data variable processing and screening method in the embodiment.
FIG. 4 is an explanatory diagram of the data set in the embodiment.
FIG. 5 is a diagram of the 20 most important attribute variables in the embodiment.
FIG. 6 is a schematic representation of a test set ROC curve plotted for the examples.
FIG. 7 is a diagram of a test set PR curve plotted according to an embodiment.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
A traffic high-risk personnel identification method based on an Adaboost algorithm extracts the safety behavior characteristic attributes of traffic participants from traffic violation records and trains a model to realize high-risk personnel identification and safety risk prediction; as shown in fig. 1, the specific process flow is as follows:
s1, constructing an illegal data set, a serious accident data set and a slight accident data set based on the original traffic illegal data and accident data.
In an embodiment, the original traffic violation data and accident data in step S1 include the certificate information of the persons involved; the original violation records are preprocessed by aggregation and classification to obtain the illegal data set; the illegal data set is the full sample of personnel violation records, and its information comprises the person's certificate number, number of violations, violation types, penalties, occurrence of accident-related violations and violation time periods.
The occurrence of accident-related violations in step S1 is obtained by correspondence analysis, and the violation types with a strong influence on traffic accidents are extracted as data attributes of the illegal data set.
For the violation time period in step S1, the continuous time variable is converted into a discrete variable and classified according to the temporal characteristics of violations.
S2, classifying the illegal data set into two categories, namely high-risk personnel and general personnel, determining a data label value label according to a classification rule, and accordingly dividing the illegal data set into a high-risk personnel data subset D, a general personnel data subset N and a subset U to be identified.
The method for assigning the data label value label based on the classification rule in S2 is as follows. High-risk personnel are: (1) persons who have violation records and a serious traffic accident record in which they bore major or full responsibility; (2) persons who have violation records, only slight accident records, and no fewer than 2 accident records. General personnel are persons with violation records but no accident records. Data that satisfy neither condition constitute the subset to be identified.
S3, sampling a general personnel data subset N in the illegal data set, combining with a high-risk personnel data subset D, and splitting to obtain a training set and a test set.
S31, randomly sampling the general personnel data subset N to obtain a compressed general personnel sample N'.
S32, performing variable processing and screening on the sample data of the compressed general personnel sample N'; the processing steps are shown in Fig. 3 and are specifically:
S321, setting the dependent variable target, whose value is determined by the sample data label and is either high-risk or general; taking the data fields of the illegal data set as independent variables;
S322, deleting constant independent variables and independent variables with extremely small variance; a variable X is judged to have extremely small variance when:
$$\mathrm{freqcut}_X > T_f \quad \text{and} \quad \mathrm{unisequential}_X < T_u$$
where $\mathrm{freqcut}_X = x_f / x_l$, $x_f$ is the frequency of the most frequent sample value of variable X, $x_l$ is the frequency of the second most frequent sample value, and $T_f$ is the corresponding threshold, usually 19; $\mathrm{unisequential}_X = m_X / n_X$, where $m_X$ is the number of distinct sample values (after deduplication), $n_X$ is the total number of samples, and $T_u$ is the threshold for unisequential, usually 0.1;
S323, deleting independent variables whose collinearity with other independent variables exceeds a threshold, which is typically 0.75;
S324, checking the remaining independent variables for multicollinearity, and determining the final data independent variables.
S33, splitting the union G of the high-risk personnel data subset D and N' into a training set and a test set; in the embodiment, the split ratio is 9:1.
S34, applying SMOTE sampling to the training set: the oversampling and undersampling ratios for the high-risk personnel data subset and the general personnel data subset are determined, the final sample counts are obtained, and the training set samples are produced after processing.
S4, training a high-risk personnel identification model on the training set data based on the AdaBoost algorithm, and determining the model parameters; the model parameters comprise the learning rate, the number of weak classifiers, the maximum tree depth, the minimum number of samples required to split a node, the minimum number of samples per leaf node and the maximum number of features; in an embodiment, the AdaBoost algorithm is run in Python by calling the AdaBoostClassifier function and the DecisionTreeClassifier base function in the sklearn machine learning library.
S5, evaluating the high-risk personnel identification model on the test set data, determining the classification probability threshold, and correcting the model to obtain the final traffic high-risk personnel identification model.
S6, inputting the data of the subset to be identified from step S2 into the traffic high-risk personnel identification model obtained in step S5 to obtain the high-risk personnel identification result.
Specific examples
Step 1, obtaining two years of traffic violation records and accident records for an area by connecting to the database.
This embodiment takes motor vehicle drivers as the analysis target. Traffic accidents involving death or serious injury, and hit-and-run accidents, are treated as serious accidents; all other accidents are treated as slight accidents. The original accident records are classified accordingly, and the accident type and personnel certificate information are taken as attribute features of the serious accident data set and the slight accident data set, giving the sample data of the two data sets.
Further, the original violation data are preprocessed, and each person's violation information is aggregated, including the cumulative number of violations, violation types, cumulative penalty points, average penalty points (points per violation), maximum penalty points for a single violation, cumulative fine amount and average fine amount (yuan per violation).
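The per-person aggregation described above could look like the following sketch. The record layout (`person_id`, `points`, `fine`) and the output field names are assumptions chosen for illustration; the actual database schema is not given in the text.

```python
from collections import defaultdict

def aggregate_violations(records):
    """Summarise raw violation records per person.

    records: iterable of dicts with hypothetical keys
    'person_id', 'points' (penalty points) and 'fine' (yuan).
    """
    grouped = defaultdict(list)
    for r in records:
        grouped[r["person_id"]].append(r)

    summary = {}
    for pid, rows in grouped.items():
        points = [r["points"] for r in rows]
        fines = [r["fine"] for r in rows]
        summary[pid] = {
            "violation_count": len(rows),            # cumulative number of violations
            "total_points": sum(points),             # cumulative penalty points
            "avg_points": sum(points) / len(rows),   # points per violation
            "max_points": max(points),               # largest single deduction
            "total_fine": sum(fines),                # cumulative fine amount
            "avg_fine": sum(fines) / len(rows),      # yuan per violation
        }
    return summary

# Made-up records for two people.
records = [
    {"person_id": "A", "points": 3, "fine": 200},
    {"person_id": "A", "points": 6, "fine": 100},
    {"person_id": "B", "points": 0, "fine": 50},
]
summary = aggregate_violations(records)
```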
Correspondence analysis is applied to reduce the dimensionality of the traffic accident data and the original violation data; violation types are grouped according to their association with accident types, and the five groups with the highest association are extracted as data attributes of the accident-risk violation field, as shown in Table 1.
Table 1. Classification of accident-related violation types
Figure BDA0001740236940000061
Based on the traffic flow on the road network in the area of the embodiment and the temporal pattern of traffic violations, violation times are aggregated into analysis periods, converting the continuous variable into a nominal variable; in another embodiment, the periods are determined by other statistical means such as clustering.
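Converting a continuous timestamp into a nominal period variable can be sketched as below. The period boundaries are hypothetical: the text does not list them, and only a 19:00-22:00 period is mentioned later in the embodiment.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical period boundaries (hour of day); only 19:00-22:00 appears in the text.
PERIOD_STARTS = [0, 7, 9, 17, 19, 22]
PERIOD_LABELS = ["00:00-07:00", "07:00-09:00", "09:00-17:00",
                 "17:00-19:00", "19:00-22:00", "22:00-24:00"]

def time_period(ts):
    """Map a datetime to the label of the period containing its hour."""
    return PERIOD_LABELS[bisect_right(PERIOD_STARTS, ts.hour) - 1]

periods = [
    time_period(datetime(2018, 7, 1, 0, 30)),
    time_period(datetime(2018, 7, 1, 7, 0)),
    time_period(datetime(2018, 7, 1, 20, 15)),
    time_period(datetime(2018, 7, 1, 23, 59)),
]
```

Each period label can then be treated as a categorical attribute of the violation record, for example by counting a person's violations per period.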
For the personnel characteristic data, age, gender, and province and city codes are extracted from the certificate number; the illegal data set is generated from the information extracted in each of the above steps, as shown in Table 2.
Table 2. Partial data of the illegal data set
Figure BDA0001740236940000071
Step 2, classifying the full sample I of the illegal data set into two categories, high-risk drivers and general drivers. Referring to Fig. 4, persons who have violation records and a serious traffic accident record in which they bore major or full responsibility form the first case of high-risk driver, and the qualifying data are placed in data set D1; persons who have violation records, only slight accident records, and no fewer than 2 accident records form the second case, and the qualifying data are placed in data set D2; the high-risk driver data set is D = D1 + D2. The data of persons with violation records but no accident records form the general driver data set N.
Accordingly, a high-risk or general data label value label is assigned to the data in the illegal data set that satisfy these rules. The data that fit neither rule form the subset to be identified, U = I - N - D = U1 + U2, where U1 and U2 are its two subsets.
Step 3, sampling the general driver data subset N in the illegal data set, combining it with the high-risk driver data subset D, and splitting to obtain a training set and a test set; the specific sampling method is as follows:
Step 31, randomly sampling the general driver data subset to obtain a compressed general driver sample N'; the sampling rate is generally 2.5%-25%, and in this embodiment 4000 records are drawn from 84383.
Step 32, performing variable processing and screening on the sampled general driver data; the specific steps are:
S321, setting the dependent variable target, whose value is determined by the sample data label and is either high-risk or general; taking the data fields of the illegal data set as independent variables; the province code and city code fields are encoded as dummy variables, which increases the number of independent variables to 93;
S322, deleting constant independent variables and independent variables with extremely small variance; a variable X is judged to have extremely small variance when:
$$\mathrm{freqcut}_X > T_f \quad \text{and} \quad \mathrm{unisequential}_X < T_u$$
where $\mathrm{freqcut}_X = x_f / x_l$, $x_f$ is the frequency of the most frequent sample value of variable X, $x_l$ is the frequency of the second most frequent sample value, and $T_f$ is the corresponding threshold, set to 19; $\mathrm{unisequential}_X = m_X / n_X$, where $m_X$ is the number of distinct sample values (after deduplication), $n_X$ is the total number of samples, and $T_u$ is the threshold for unisequential, set to 0.1; in this embodiment, this step deletes several independent variables, including the cumulative number of violations, type2, type3, type5 and 19:00-22:00;
S323, deleting independent variables whose collinearity with other independent variables exceeds a threshold, which is typically 0.75; in this embodiment, this step deletes three independent variables: cumulative penalty points, average penalty points and other violations;
S324, verifying that no multicollinearity remains among the remaining independent variables, and determining the final data independent variables.
S33, splitting the union of the high-risk driver data subset D and the general driver sample N' into a training set and a test set; in this embodiment, the ratio of training set to test set sample sizes is 9:1.
S34, applying SMOTE sampling to the training set: the oversampling and undersampling ratios for the high-risk driver data subset and the general driver data subset are determined, the final sample counts are obtained, and the training set samples are produced after processing. In this embodiment, the high-risk driver data subset is oversampled to 2 times its original size, and the general driver data subset is undersampled to 2 times the original number of high-risk driver samples.
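Steps S33 and S34 can be illustrated with a small NumPy-only sketch. It is not the sampling code of the embodiment: the function names are hypothetical, the data are synthetic, and the SMOTE step is a simplified version that interpolates between a minority sample and one of its k nearest minority neighbours (k = 5 is an assumption).

```python
import numpy as np

def train_test_indices(n, test_fraction=0.1, rng=None):
    """Random 9:1 split (by default) of n row indices into train and test."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(n)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority rows on segments between nearest neighbours."""
    rng = np.random.default_rng(rng)
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)                     # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                    # position along the segment
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def rebalance(X_high, X_general, rng=None):
    """Double the high-risk class with SMOTE; cut the general class to the same size."""
    rng = np.random.default_rng(rng)
    n_high = len(X_high)
    synthetic = smote_oversample(X_high, n_new=n_high, rng=rng)
    keep = rng.choice(len(X_general), size=2 * n_high, replace=False)
    X = np.vstack([X_high, synthetic, X_general[keep]])
    y = np.r_[np.ones(2 * n_high, dtype=int), np.zeros(2 * n_high, dtype=int)]
    return X, y

# Synthetic demonstration: 30 high-risk rows, 400 general rows, 4 features.
rng = np.random.default_rng(0)
X_all = np.vstack([rng.normal(loc=2.0, size=(30, 4)), rng.normal(size=(400, 4))])
y_all = np.r_[np.ones(30, dtype=int), np.zeros(400, dtype=int)]
train_idx, test_idx = train_test_indices(len(y_all), rng=rng)
X_tr, y_tr = X_all[train_idx], y_all[train_idx]
X_bal, y_bal = rebalance(X_tr[y_tr == 1], X_tr[y_tr == 0], rng=rng)
```

Only the training portion is rebalanced; the test set keeps its original class proportions so that evaluation reflects the real imbalance.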
Step 4, training the model on the training set data with the AdaBoost algorithm, using 5-fold cross-validation. The model parameters are: learning rate learning_rate_value = 0.1, number of weak classifiers n_estimators_value = 500, maximum tree depth max_depth_value = 2, minimum number of samples to split a node min_samples_split_value = 2, minimum number of samples per leaf node min_samples_leaf_value = 2, and maximum number of features max_features_value = 5. The number of attributes selected for the model, mtry, is 47; that is, 47 feature variables such as age, average fine amount, cumulative fine amount and gender are selected from the 93 attribute variables. The 20 most important attribute variables are shown in Fig. 5.
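A hedged sketch of how these parameters might map onto scikit-learn follows. The helper name `build_model` and the synthetic data are assumptions; the demonstration uses 20 weak classifiers instead of 500 only so that it runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def build_model(n_estimators=500, learning_rate=0.1):
    """AdaBoost over shallow decision trees, using the parameter values of the embodiment."""
    base = DecisionTreeClassifier(max_depth=2, min_samples_split=2,
                                  min_samples_leaf=2, max_features=5)
    # The base estimator is passed positionally because its keyword name
    # differs between scikit-learn releases (base_estimator vs. estimator).
    return AdaBoostClassifier(base, n_estimators=n_estimators,
                              learning_rate=learning_rate, random_state=0)

# Synthetic stand-in for the training set (not the patent's data).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)

# 5-fold cross-validation, as in step 4; fewer estimators keep the demo fast.
scores = cross_val_score(build_model(n_estimators=20), X, y, cv=5)
```

After fitting on the full training set, the fitted classifier's `feature_importances_` attribute gives the kind of importance ranking that Fig. 5 reports.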
Step 5, evaluating the model on the test set data, determining the classification probability threshold, and correcting the model.
Specifically, the test set data are first input into the model trained in step 4, which outputs the predicted class Fit_class and its probability Fit_probs for each test sample; next, the ROC curve (Fig. 6) and PR curve (Fig. 7) of the test set are plotted and the accuracy and recall are determined; finally, the classification probability threshold is chosen according to the recall. In this embodiment, the model accuracy is 0.8, the recall is 0.403, and the corresponding probability threshold separating high-risk personnel from general personnel is 0.736.
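One way to choose a probability threshold from a recall requirement is sketched below. The selection rule (the highest threshold whose recall still meets a target) is an assumption, since the text only states that the threshold is determined according to the recall; the labels and probabilities are made up.

```python
def precision_recall_at(y_true, probs, threshold):
    """Precision and recall when samples with prob >= threshold are called high-risk."""
    pred = [p >= threshold for p in probs]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def threshold_for_recall(y_true, probs, min_recall):
    """Highest candidate threshold whose recall is still >= min_recall (assumed rule)."""
    for thr in sorted(set(probs), reverse=True):
        if precision_recall_at(y_true, probs, thr)[1] >= min_recall:
            return thr
    return min(probs)

# Made-up test-set labels (1 = high-risk) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.35, 0.75]

threshold = threshold_for_recall(y_true, probs, min_recall=0.6)
precision, recall = precision_recall_at(y_true, probs, threshold)
```

Lowering the recall target raises the threshold and usually the precision, which is the trade-off the PR curve makes visible.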
Step 6, inputting the data of the subset U to be identified obtained in step 2 into the high-risk personnel identification model trained in the preceding steps, and predicting the target value with the model; part of the results are shown in Table 3.
Table 3. High-risk personnel identification results obtained with the method of the invention
Figure BDA0001740236940000091
Figure BDA0001740236940000101

Claims (4)

1. A traffic high-risk personnel identification method based on an Adaboost algorithm is characterized by comprising the following steps: judging the traffic accident risk according to the illegal attribute of the road traffic participant, comprising the following steps,
s1, constructing an illegal data set, a serious accident data set and a slight accident data set based on the original traffic illegal data and accident data;
s2, classifying the illegal data set into two categories, namely high-risk personnel and general personnel, determining a data label value label according to a classification rule, and accordingly dividing the illegal data set into a high-risk personnel data subset D, a general personnel data subset N and a subset U to be identified; the method for assigning the corresponding data label value label based on the classification rule in step S2 specifically includes:
high-risk personnel: either traffic participants who have violation records and a serious traffic accident record in which they bore major or full responsibility, or traffic participants who have violation records, only slight accident records, and no fewer than 2 such accident records;
general personnel: traffic participants who have violation records but no accident records;
data that satisfy neither of the above conditions constitute the subset to be identified;
S3, sampling the general personnel data subset N in the illegal data set, combining it with the high-risk personnel data subset D, and splitting to obtain a training set and a test set; specifically,
s31, randomly sampling the general personnel data subset to obtain a compressed general personnel sample N';
s32, performing variable processing and screening on the sample data of the compressed general personnel data subset N'; the method specifically comprises the following steps:
s321, setting a dependent variable target, wherein the numerical value of the dependent variable target is determined according to a sample data label, and is selected from high-risk and general values; taking a data field of the illegal data set as an independent variable;
S322, deleting constant independent variables and independent variables with extremely small variance; a variable X is judged to have extremely small variance when:
$$\mathrm{freqcut}_X > T_f \quad \text{and} \quad \mathrm{unisequential}_X < T_u$$
where $\mathrm{freqcut}_X = x_f / x_l$, $x_f$ is the frequency of the most frequent sample value of variable X, $x_l$ is the frequency of the second most frequent sample value, and $T_f$ is the corresponding threshold; $\mathrm{unisequential}_X = m_X / n_X$, where $m_X$ is the number of distinct sample values (after deduplication), $n_X$ is the total number of samples, and $T_u$ is the threshold for unisequential;
S323, deleting independent variables whose collinearity with other independent variables exceeds a threshold;
S324, checking the independent variables for multicollinearity and determining the independent variables of the data;
S33, splitting the union G of the high-risk personnel data subset D and N' into a training set and a test set;
S34, performing SMOTE sampling on the training set: determining the sample expansion and reduction ratios for the high-risk personnel data subset and the general personnel data subset, obtaining the final sample counts, and obtaining the training set samples after processing;
S4, training a high-risk personnel identification model on the training set data based on the Adaboost algorithm, and determining the model parameters; the model parameters comprise the learning rate, the number of weak classifiers, the maximum tree depth, the minimum node splitting value, the minimum number of samples per leaf node and the maximum number of features;
S5, evaluating the high-risk personnel identification model on the test set data, determining the critical classification probability threshold, and correcting the model to obtain the final traffic high-risk personnel identification model;
and S6, inputting the data of the subset U to be identified from step S2 into the traffic high-risk personnel identification model obtained in step S5, to obtain the high-risk personnel identification result.
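As an illustration only, the labeling rule of step S2 and the extremely-small-variance check of step S322 might be sketched in Python as follows. The function names, the count-based record layout and the default thresholds are hypothetical and are not taken from the claim. The sketch also assumes that the judgment condition flags a variable when freqcut exceeds T_f and unisequential falls below T_u, since the formula itself is given only as a figure.

```python
from collections import Counter


def assign_label(serious_at_fault, other_serious, minor):
    """Assign the S2 label for one person in the illegal data set.

    All three arguments are accident counts (hypothetical layout):
    serious accidents with major or full responsibility, other serious
    accidents, and minor accidents. Every person is assumed to have at
    least one illegal record. Returns "high-risk", "general", or None
    for the subset U to be identified.
    """
    if serious_at_fault >= 1:
        return "high-risk"
    if other_serious == 0 and minor >= 2:
        return "high-risk"  # only minor accidents, at least 2 of them
    if other_serious == 0 and minor == 0:
        return "general"    # illegal records but no accident records
    return None             # meets neither condition: subset U


def has_tiny_variance(values, t_f=19.0, t_u=0.1):
    """Flag a constant or extremely-small-variance variable (S322).

    t_f and t_u are assumed thresholds, not values from the claim.
    """
    top_two = Counter(values).most_common(2)
    if len(top_two) < 2:
        return True  # constant variable (or no data)
    # freqcut: frequency of the most common value over the second most common
    freqcut = top_two[0][1] / top_two[1][1]
    # unisequential: distinct values over total samples
    unisequential = len(set(values)) / len(values)
    return freqcut > t_f and unisequential < t_u
```

A person with two minor accidents and nothing else is labeled high-risk, while a person with a single minor accident falls into the subset U. A column of 99 zeros and a single one is flagged by `has_tiny_variance`, whereas an evenly split binary column is kept.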
2. The method for identifying high-risk traffic personnel based on the Adaboost algorithm as claimed in claim 1, wherein: the original traffic violation data and accident data in step S1 include the certificate information of the persons involved; the illegal records are collected and classified to obtain the illegal data set; the illegal data set is full-sample data of the illegal records, and its information comprises personnel certificate numbers, number of violations, violation types, punishment status, occurrence of accident-related violations and violation occurrence time periods.
3. The method for identifying high-risk traffic personnel based on the Adaboost algorithm as claimed in claim 2, wherein: in step S1, the occurrence of accident-related violations is obtained by a corresponding analysis method, and the violation types with a high degree of influence on traffic accidents are extracted as data attributes of the illegal data set.
4. The method for identifying high-risk traffic personnel based on the Adaboost algorithm as claimed in claim 2, wherein: in step S1, the continuous time variable is converted into a discrete variable, whose categories are divided according to the characteristics of the violation times.
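A minimal sketch of the discretization described in claim 4, assuming the continuous variable is the time of day at which a violation occurred. The five periods and their boundaries below are hypothetical; the claim does not specify how the categories are divided.

```python
from datetime import datetime

# Hypothetical periods as (start hour inclusive, end hour exclusive, name).
PERIODS = [
    (0, 6, "night"),
    (6, 10, "morning_peak"),
    (10, 16, "daytime"),
    (16, 20, "evening_peak"),
    (20, 24, "evening"),
]


def violation_period(timestamp: datetime) -> str:
    """Map a violation timestamp to a discrete time-of-day category."""
    for start, end, name in PERIODS:
        if start <= timestamp.hour < end:
            return name
    raise ValueError("hour outside 0-23")
```

For example, a violation recorded at 08:30 maps to `morning_peak` under these assumed boundaries.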
CN201810815618.1A 2018-07-16 2018-07-16 Adaboost algorithm-based traffic high-risk personnel identification method Active CN109145953B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810815618.1A CN109145953B (en) 2018-07-16 2018-07-16 Adaboost algorithm-based traffic high-risk personnel identification method

Publications (2)

Publication Number Publication Date
CN109145953A (en) 2019-01-04
CN109145953B (en) 2021-09-07

Family

ID=64798930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810815618.1A Active CN109145953B (en) 2018-07-16 2018-07-16 Adaboost algorithm-based traffic high-risk personnel identification method

Country Status (1)

Country Link
CN (1) CN109145953B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102360525A (en) * 2011-09-28 2012-02-22 东南大学 Discriminant analysis-based high road real-time traffic accident risk forecasting method
CN106780263A (en) * 2017-01-13 2017-05-31 中电科新型智慧城市研究院有限公司 High-risk personnel analysis and recognition methods based on big data platform
CN106991510A (en) * 2017-05-31 2017-07-28 福建江夏学院 A kind of method based on the traffic accident of spatial-temporal distribution characteristic predicted city
CN108275158A (en) * 2017-01-05 2018-07-13 大唐高鸿信息通信研究院(义乌)有限公司 The driving behavior evaluating method of vehicle-mounted short haul connection net

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110137684A1 (en) * 2009-12-08 2011-06-09 Peak David F System and method for generating telematics-based customer classifications
US9905131B2 (en) * 2015-12-29 2018-02-27 Thunder Power New Energy Vehicle Development Company Limited Onboard vehicle notification system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
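Dangerous Prediction by Case-Based Approach on Expressways; Fang C Y et al.; International IEEE Conference on Intelligent Transportation Systems; 2008-12-31; pp. 1118-1123 *
Research on a collision prediction system based on driver control behavior characteristics; Yu Feng (余锋); China Master's Theses Full-text Database, Engineering Science and Technology II; 2017-03-15 (No. 03); pp. C035-251 *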

Also Published As

Publication number Publication date
CN109145953A (en) 2019-01-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 211100 No. 19 Suyuan Avenue, Jiangning Economic and Technological Development Zone, Nanjing City, Jiangsu Province

Applicant after: JIANGSU ZHITONG TRAFFIC TECHNOLOGY Co.,Ltd.

Address before: 210006 3rd Floor, Building E10, Chenguang 1865 Technology Creative Industry Park, No. 388 Yingtian Street, Qinhuai District, Nanjing City, Jiangsu Province

Applicant before: JIANGSU ZHITONG TRAFFIC TECHNOLOGY Co.,Ltd.

GR01 Patent grant