CN110110085A - Traffic accident file classification method and system based on character level neural network and SVM - Google Patents
- Publication number
- CN110110085A (application number CN201910334271.3A)
- Authority
- CN
- China
- Prior art keywords
- accident
- svm
- character
- neural network
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Traffic Control Systems (AREA)
Abstract
This application discloses a traffic accident text classification method and system based on a character-level neural network and an SVM. The method includes: obtaining an accident cause corpus, dividing it into a training set and a test set, and constructing a dictionary; establishing a character-level neural network model, optimizing it with the training set, extracting the accident text features of the training set, and using the extracted features to obtain an optimized SVM model; testing the optimized character-level neural network model and the optimized SVM model with the test set; and splitting an acquired accident cause to be classified into characters, mapping those characters to a multi-dimensional matrix through the dictionary, and inputting the matrix into the optimized character-level neural network model to extract accident text features, from which the optimal SVM model produces the accident text classification result. The application is not limited by language, avoids the cost of pre-training, and avoids the blindness of SVM models in feature selection.
Description
Technical field
The application belongs to the field of intelligent traffic management, and in particular relates to a traffic accident text classification method and system based on a character-level neural network and an SVM.
Background art
In recent years, handling traffic accidents properly and determining liability fairly and impartially has consumed a great deal of manpower and money every year. An automated traffic accident text classification method is therefore urgently needed to analyze traffic accident data and help transportation departments find accident black spots for further remediation.
The problem of classifying accidents falls within the scope of text classification. An artificial neural network (Artificial Neural Network) is a data-driven nonlinear model built on the organization and operating principles of the human brain. It consists of several elements, such as a neuron structure model, a network connection model, and a learning algorithm, and is a system with a degree of intelligence. In text classification, a neural network is a set of connected input and output neurons: the input neurons represent terms, the output neurons represent text categories, and each connection between neurons carries a weight. In the training stage, the weights are adjusted by algorithms such as forward propagation and backward error correction (backpropagation), so that test texts can be classified correctly according to the adjusted weights. Several different neural network models are obtained in this way; a text of unknown category is then passed through each of them in turn to obtain different output values, and the final category of the text is determined by comparing these output values.
SVM (Support Vector Machine) is a common discriminative method. In machine learning it is a supervised learning model, commonly used for pattern recognition, classification, and regression analysis.
Many earlier models based on artificial neural networks model text or language with higher-level units, such as words (statistical information or n-grams, word2vec, etc.), phrases, or sentences, or analyze semantic and syntactic structure. For example, patent application CN201710573388.8, titled "Chinese text classification based on an ultra-deep convolutional neural network structural model", discloses a word-based classification algorithm. However, that approach requires collecting a large corpus in advance to build a word vector model, the quality of word segmentation directly affects the subsequent classification accuracy, and it can only handle Chinese.
As another example, patent application CN201810353803.3, titled "Expressway traffic accident severity prediction method based on data fusion and support vector machine", collects variable factors present when a traffic accident occurs, such as road conditions, driver condition, and vehicle condition, and builds an SVM model to predict the severity of expressway traffic accidents. However, its selection of the features that influence accident severity is overly subjective, and the listed factors such as "road conditions" and "driver condition" do not necessarily characterize an accident completely.
It can be seen that current traffic accident text classification has the following main problems that urgently need to be solved:
1) Traditional traffic accident text classification generally requires manual labeling, which consumes a great deal of manpower and money; manual work inevitably involves omissions and also struggles to meet timeliness requirements.
2) Existing convolutional neural networks all model high-level units (words, phrases, or sentences), which increases training complexity on the one hand and limits the generality of the model on the other. In addition, the accuracy of the traditional softmax classifier leaves room for improvement.
3) Existing support vector machine models lack an objective basis for feature extraction and often rely only on human experience, which constrains improvements in model accuracy.
Summary of the invention
The purpose of the application is to provide a traffic accident text classification method and system based on a character-level neural network and an SVM that avoids the cost of pre-training, is not limited by language, and avoids the blindness of SVM models in feature selection.
To achieve the above purpose, the application adopts the following technical solution:
A traffic accident text classification method based on a character-level neural network and an SVM, comprising the following steps:
obtaining an accident cause corpus, dividing it into a training set and a test set, splitting the accident cause corpus into characters, and constructing a dictionary;
establishing a character-level neural network model, optimizing the character-level neural network model with the training set, extracting the accident text features of the training set data with the optimized character-level neural network model, and training an SVM model with the extracted accident text features until an optimized SVM model is obtained;
extracting the accident text features of the test set data with the optimized character-level neural network model and inputting them into the optimized SVM model; if the error of the classification results output by the SVM model is judged to be less than a preset value, the optimal SVM model is obtained; otherwise the SVM model continues to be optimized with the training set;
splitting an acquired accident cause to be classified into characters, mapping the characters to a multi-dimensional matrix through the dictionary, inputting the multi-dimensional matrix into the optimized character-level neural network model to extract accident text features, and obtaining the accident text classification result from the optimal SVM model based on those features.
Preferably, the character-level neural network model is: starting from an input layer I, followed in sequence by a convolutional layer C, a pooling layer M, a fully connected layer F1, a fully connected layer F2, and a Softmax layer;
the accident text feature is the feature output by fully connected layer F1.
Preferably, training the SVM model with the extracted accident text features until the optimized SVM model is obtained comprises:
transforming the accident text features into a high-dimensional space through a kernel function for linear separation, the kernel function being a Gaussian kernel:
$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right)$$
where $x_i$ is an accident text feature sample, $x$ is the kernel function center, and $\sigma$ is the width parameter of the function;
using the transformed accident text features, together with the label of each accident cause in the training set, to obtain the optimal parameters of the SVM model by grid search, thereby completing the optimization of the SVM model;
the labels of the accident causes include: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian.
Preferably, obtaining the optimal SVM model if the error of the classification results output by the SVM model is judged to be less than a preset value, and otherwise continuing to optimize the SVM model with the training set, comprises:
defining a confusion matrix;
calculating accuracy and recall from the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set;
if the accuracy is greater than 95% and the recall is greater than 90%, ending training and outputting the current SVM model as the optimal SVM model; otherwise continuing to optimize the SVM model with the training set.
Preferably, splitting the acquired accident cause to be classified into characters and mapping the characters to a multi-dimensional matrix through the dictionary comprises:
letting the accident cause to be classified be split into N characters and the dictionary contain M characters, mapping each character in the dictionary to an M-dimensional vector, and mapping each character of the accident cause to be classified to its M-dimensional vector, yielding an N*M matrix.
The application also provides a traffic accident text classification system based on a character-level neural network and an SVM, comprising:
a corpus processing module, configured to obtain an accident cause corpus, divide it into a training set and a test set, split the accident cause corpus into characters, and construct a dictionary;
a training module, configured to establish a character-level neural network model, optimize it with the training set, extract the accident text features of the training set data with the optimized character-level neural network model, and train an SVM model with the extracted accident text features until an optimized SVM model is obtained;
a test module, configured to extract the accident text features of the test set data with the optimized character-level neural network model and input them into the optimized SVM model; if the error of the classification results output by the SVM model is judged to be less than a preset value, the optimal SVM model is obtained; otherwise the SVM model continues to be optimized with the training set;
a classification module, configured to split an acquired accident cause to be classified into characters, map the characters to a multi-dimensional matrix through the dictionary, input the multi-dimensional matrix into the optimized character-level neural network model to extract accident text features, and obtain the accident text classification result from the optimal SVM model based on those features.
Preferably, the character-level neural network model is: starting from an input layer I, followed in sequence by a convolutional layer C, a pooling layer M, a fully connected layer F1, a fully connected layer F2, and a Softmax layer;
the accident text feature is the feature output by fully connected layer F1.
Preferably, training the SVM model with the extracted accident text features until the optimized SVM model is obtained comprises:
transforming the accident text features into a high-dimensional space through a kernel function for linear separation, the kernel function being a Gaussian kernel:
$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right)$$
where $x_i$ is an accident text feature sample, $x$ is the kernel function center, and $\sigma$ is the width parameter of the function;
using the transformed accident text features, together with the label of each accident cause in the training set, to obtain the optimal parameters of the SVM model by grid search, thereby completing the optimization of the SVM model;
the labels of the accident causes include: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian.
Preferably, obtaining the optimal SVM model if the error of the classification results output by the SVM model is judged to be less than a preset value, and otherwise continuing to optimize the SVM model with the training set, comprises:
defining a confusion matrix;
calculating accuracy and recall from the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set;
if the accuracy is greater than 95% and the recall is greater than 90%, ending training and outputting the current SVM model as the optimal SVM model; otherwise continuing to optimize the SVM model with the training set.
Preferably, splitting the acquired accident cause to be classified into characters and mapping the characters to a multi-dimensional matrix through the dictionary comprises:
letting the accident cause to be classified be split into N characters and the dictionary contain M characters, mapping each character in the dictionary to an M-dimensional vector, and mapping each character of the accident cause to be classified to its M-dimensional vector, yielding an N*M matrix.
The traffic accident text classification method and system based on a character-level neural network and an SVM provided by the application first use neural network training to learn the features of traffic accident text, and then input those features into an SVM classifier to classify the accident. This effectively avoids the blindness of traditional SVM models in feature selection; at the same time, because of the mathematical completeness of SVM, it avoids the tendency of traditional neural networks to fall into local optima. In terms of execution efficiency, the corpus does not need word segmentation, which reduces time complexity and also avoids the cost of pre-training. In terms of robustness, the features extracted by combining a CNN with an SVM are more robust than those of traditional NLP models and better characterize the essential differences between categories; and because the algorithm classifies at the low level of characters, classification accuracy does not depend on the quality of word segmentation and is not affected by unimportant characters, so robustness is higher.
Brief description of the drawings
Fig. 1 is a flow diagram of the traffic accident text classification method based on a character-level neural network and an SVM of the application;
Fig. 2 is a structural schematic diagram of one embodiment of the character-level neural network model of the application.
Detailed description of the embodiments
The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the application without creative effort fall within the protection scope of the application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the application. The terms used in the description of the application are only for the purpose of describing specific embodiments and are not intended to limit the application.
The steps of the method of the application are not strictly limited in their order of execution and may be executed in other orders. Moreover, at least some steps may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor are these sub-steps or stages necessarily executed sequentially, as they may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
As shown in Fig. 1, one embodiment provides a traffic accident text classification method based on a character-level neural network and an SVM, comprising the following steps:
S1, obtain an accident cause corpus, divide it into a training set and a test set, split the accident cause corpus into characters, and construct a dictionary.
S1.1, obtain the training set and test set
The accident cause corpus can be obtained by receiving directly entered text, or by automatic keyword search. Because the size of the training set affects the training of the subsequent model to some extent, this embodiment divides the accident cause corpus into a training set and a test set at a ratio of 7:3.
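The 7:3 division can be sketched in a few lines of Python. This is only an illustration: the helper name `split_corpus`, the fixed random seed, and the shuffling step are assumptions, since the text specifies only the ratio.

```python
import random

def split_corpus(samples, train_parts=7, test_parts=3, seed=0):
    """Shuffle a copy of the samples and split it train_parts:test_parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical corpus of ten accident cause descriptions.
corpus = [f"accident cause {i}" for i in range(10)]
train_set, test_set = split_corpus(corpus)
print(len(train_set), len(test_set))
```

Fixing the seed keeps the split reproducible between runs, which makes later accuracy comparisons easier to interpret.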
S1.2, build the dictionary
Duplicate characters are removed from all of the acquired accident cause corpus, giving a character set that covers every character that occurs; this character set is used as the dictionary, which contains M characters. In one embodiment the dictionary contains 274 characters, which are, in order:
" vehicle people's vapour, thing do not have two to touch storage battery to hit.Wipe with mutually hurt three-wheel and 1 motor 20 with fortunately it is not a protect to cry do not rescue not
Goods, which will practise medicine to the existing side of alarm to run, knows that I of a logical back court opens that antithetical phrase is wired inner to be dragged with by the own single pole drawing of machine on column from sedan-chair point in institute
When main road surface go card without door stop thus 9 escape to give away 7 tree board ditches 5 and moved wall packet Zhejiang bolt water F: in walk public affairs be that colored gently disappeared presses anti-friendship
His cell tail Xia Chu anxious two is turned over from column class mouthful day to rent 3 (people Pi Zuo, which defends title 8 and pounds right dividing pier east and say that west etc. chases after, before the first rubbish rubbish that escapes out sees stone
Face the brick ramp of a bridge and sail ability S person's yesterday) the micro- boat foot of her ring drink two disease visitors accuse 6 area Zhang Qiaozai case seven of largo live pipe words all pieces hang it is high stirring
Feel it is black toward slip bad leakage shovel Shuai Chang building team protect ask Pu scrape viviparous road lie standby 4 half peaceful disconnected department more slopes of cable of child's reason) the big wine T of tank altar drives heap
Red iron early dig when yellow Fuan the present examine bar flower library porcelain this to mix the city Nan Qicheng extra large noon bucket Dai Bianfen Qi Lingxianmai luminous energy outer into heart oil clot
Factory's note ".
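As a rough sketch of this step, a dictionary can be built by walking the corpus and keeping each character the first time it appears. The function name `build_dictionary`, the first-seen ordering, and the two sample strings are assumptions made for illustration; the text only states that duplicates are removed.

```python
def build_dictionary(corpus):
    """Map each distinct character in the corpus to an index (first-seen order)."""
    dictionary = {}
    for text in corpus:
        for ch in text:
            # setdefault assigns the next free index only to unseen characters
            dictionary.setdefault(ch, len(dictionary))
    return dictionary

# Two made-up accident cause strings, used only as sample input.
corpus = ["两车相撞", "车撞行人"]
dictionary = build_dictionary(corpus)
M = len(dictionary)  # number of distinct characters
print(M, dictionary)
```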
S1.3, classify the accident causes
Each accident cause in the collected corpus is classified, and a label is added to each accident cause according to the classification result. Note that this classification is preferably done manually, to ensure the validity of subsequent model training.
Based on common accident types, in one embodiment the accident types to which accident causes belong are mainly: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian. Accident causes are divided into these six classes, and a label is added to each accident cause in the training set and the test set; that is, the accident cause labels likewise include: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian.
S2, establish the character-level neural network model.
As shown in Fig. 2, in one embodiment the character-level neural network model is: starting from an input layer (Embedding) I, followed in sequence by a convolutional layer (CNN) C, a pooling layer (Max Pooling) M, a fully connected layer (Fully Connected) F1, a fully connected layer (Fully Connected) F2, and a Softmax layer.
Input layer I: supports inputting multiple characters at once; each character is one-hot encoded as an M-dimensional vector (M being determined by the number of characters in the dictionary). W0~W5 in Fig. 2 denote input characters and only indicate that multiple characters can be input at once; they do not limit the number of characters input at once.
The input layer is the matrix formed by stacking, in order from top to bottom, the character vectors corresponding to the characters in the accident cause to be classified. For example, in one embodiment, let the accident cause to be classified be split into N characters and the dictionary contain M characters; each character in the dictionary is mapped to an M-dimensional vector, and each character in the accident cause to be classified is mapped to its M-dimensional vector, yielding an N*M matrix.
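The character-to-matrix mapping can be illustrated with a short Python sketch. The helper name `to_one_hot_matrix` and the three-character toy dictionary are hypothetical, and mapping characters that are missing from the dictionary to an all-zero row is an assumption that the text does not address.

```python
def to_one_hot_matrix(text, dictionary):
    """Return an N x M matrix: one one-hot row per character of `text`."""
    m = len(dictionary)
    matrix = []
    for ch in text:
        row = [0] * m
        index = dictionary.get(ch)
        if index is not None:  # assumption: unknown characters stay all-zero
            row[index] = 1
        matrix.append(row)
    return matrix

# Toy dictionary with M = 3 characters; the text "cab" has N = 3 characters.
toy_dictionary = {"a": 0, "b": 1, "c": 2}
matrix = to_one_hot_matrix("cab", toy_dictionary)
print(matrix)
```

Each row has exactly one non-zero entry, so the matrix has N rows (one per character) and M columns (one per dictionary entry).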
Convolutional layer C: the convolution operation produces several feature maps. The convolution window is 5 × M, where 5 is the convolution kernel size and M is the dimension of the character vector.
Pooling layer M: takes the maximum value from each one-dimensional feature map produced by the previous layer, which represents the most important signal. This pooling method handles alarm content input of variable length, because however many values a feature map contains, only its maximum is extracted. The output of the pooling layer is therefore the maximum of each feature map, i.e. a one-dimensional vector.
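To make the variable-length argument concrete, the following pure-Python sketch slides a k × M window down an N × M matrix and then keeps only the maximum of each feature map. The function names, the stride of 1 without padding, and the toy inputs (M = 2, two hand-written kernels) are assumptions for illustration; it also assumes N is at least the kernel height.

```python
def conv_feature_map(matrix, kernel, bias=0.0):
    """Slide a k x M kernel down an N x M matrix (stride 1, no padding)."""
    k = len(kernel)
    feature_map = []
    for start in range(len(matrix) - k + 1):
        total = bias
        for i in range(k):
            for x, w in zip(matrix[start + i], kernel[i]):
                total += x * w
        feature_map.append(total)
    return feature_map

def max_over_time(feature_maps):
    """Keep only the largest value of each feature map."""
    return [max(fm) for fm in feature_maps]

# Two toy 5 x 2 kernels and two inputs of different length (N = 6 and N = 8).
kernels = [[[1, 0]] * 5, [[0, 1]] * 5]
short_text = [[1, 0]] * 6
long_text = [[0, 1]] * 8
pooled_short = max_over_time([conv_feature_map(short_text, k) for k in kernels])
pooled_long = max_over_time([conv_feature_map(long_text, k) for k in kernels])
print(pooled_short, pooled_long)
```

Both inputs produce a pooled vector whose length equals the number of kernels, regardless of N, which is what lets the later fully connected layers have a fixed input size.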
Fully connected layer F1: the accident text feature is the feature output by fully connected layer F1; the feature values output by this layer are highly accurate.
Fully connected layer F2: maps the feature map to a fixed length (the number of classes).
Softmax layer: outputs the probability distribution of the accident cause over the six classes "two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian".
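For reference, the Softmax step turns the six outputs of F2 into a probability distribution. The sketch below is a minimal pure-Python version; the logit values are made up, and the English label strings are translations used here for illustration.

```python
import math

LABELS = [
    "two motor vehicles",
    "motor vehicle and non-motor vehicle",
    "motor vehicle and pedestrian",
    "motor vehicle and fixed object",
    "two non-motor vehicles",
    "non-motor vehicle and pedestrian",
]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical F2 outputs for one accident cause.
probabilities = softmax([2.0, 0.5, 0.1, -1.0, 0.0, 0.3])
most_likely = LABELS[probabilities.index(max(probabilities))]
print(most_likely, probabilities)
```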
Note that, to overcome the less-than-ideal accuracy of the Softmax layer, the traffic accident text classification method of the application uses the output of fully connected layer F1 as the input features of the SVM classifier when predicting text classes. This also overcomes the SVM classifier's reliance on human experience in feature selection, and so significantly improves the accuracy of text classification.
Fully connected layer F2 and the Softmax layer are used in the training stage of the character-level neural network model: to obtain the feature values output by fully connected layer F1, F2 and the Softmax layer are needed to form a closed network, so that gradient descent can be used to obtain the weight parameters that construct the features. Training of the character-level neural network model is described in detail in the following steps.
S3, optimize the character-level neural network model with the training set, and extract the accident text features of the training set data with the optimized character-level neural network model.
The labeled training set is input into the character-level neural network model constructed in step S2, and the network parameters are optimized iteratively by gradient descent.
Specifically, when optimizing the network parameters: first, the training set data is passed forward through the neural network once to obtain the prediction y_hat; second, the error gradient δ of the output layer neurons is calculated; finally, the weight change Δw_i is updated. After one traversal of the entire data set, Δw_i (the weight change) is added to w_i (the preset weight) to give the new weight w_i, which completes one weight update.
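The update cycle described above (forward pass, error gradient, accumulated Δw_i, then one weight update per traversal) can be sketched on a deliberately tiny model. The single linear neuron, the squared-error loss, the learning rate, and the two training samples are all assumptions chosen to keep the example short; the actual network is the CNN described in step S2.

```python
def train_epoch(weights, samples, learning_rate=0.1):
    """One traversal of the data set followed by a single weight update."""
    deltas = [0.0] * len(weights)  # accumulated weight changes (delta w_i)
    for features, target in samples:
        # Forward pass: prediction y_hat of a single linear neuron.
        y_hat = sum(w * x for w, x in zip(weights, features))
        # Error term of the output neuron for a squared-error loss.
        error = target - y_hat
        for i, x in enumerate(features):
            deltas[i] += learning_rate * error * x
    # After the full traversal, add delta w_i to w_i.
    return [w + d for w, d in zip(weights, deltas)]

# Toy data generated by y = 2 * x + 1; the second feature is a constant bias input.
samples = [([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
weights = [0.0, 0.0]
for _ in range(500):
    weights = train_epoch(weights, samples)
print(weights)
```

On this toy problem the weights approach 2 and 1, the slope and intercept used to generate the samples.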
The weights are updated iteratively in this way until the error meets the requirement, giving the optimal character-level neural network model. In one embodiment, the optimal network parameters obtained after continuous optimization of the character-level neural network model are as follows:
embedding_dim = 64  # character vector dimension;
seq_length = 100  # sequence length;
num_classes = 6  # number of classes;
num_filters = 256  # number of convolution kernels;
kernel_size = 5  # convolution kernel size, stride 1;
pool_size = 2  # pooling kernel size, stride 2;
vocab_size = 274  # vocabulary size;
hidden_dim = 128  # fully connected layer neurons;
dropout_keep_prob = 0.5  # dropout keep ratio;
learning_rate = 1e-3  # learning rate;
batch_size = 32  # training batch size;
num_epochs = 200  # total number of iteration rounds;
print_per_batch = 5  # output results once every this many rounds;
save_per_batch = 5  # write to tensorboard once every this many rounds.
The optimized character-level neural network model is used to extract the accident text features of the training set data. The output of fully connected layer F1 in the model is the accident text feature that, after the network's iterative optimization, best reflects the attributes of the accident.
By taking the vector output after the first fully connected layer, this embodiment replaces the original Softmax-based classification of features with an SVM-based one; and because the features are learned and extracted automatically, it also avoids the excessive subjectivity of SVM in feature selection.
That is, in this embodiment the probability distribution output by the Softmax layer of the character-level neural network model is not used directly as the accident text classification result and serves only as a reference. The character-level neural network model is used mainly to extract accident text features through the first fully connected layer.
S4, train the SVM model with the extracted accident text features until the optimized SVM model is obtained.
Because the accident classification problem is nonlinear, the samples of the original input space are mapped into a high-dimensional feature space H by a nonlinear mapping $\Phi: R^d \to H$, and the optimal separating hyperplane is then constructed in H. Solving in the high-dimensional space requires computing dot products of sample vectors, which is computationally very expensive, so a kernel function $K(x, x_i)$ satisfying the Mercer condition is used in place of the dot product.
In one embodiment, when the accident text features are transformed into a high-dimensional space by the kernel function for linear separation, the kernel function is a Gaussian kernel (RBF):
$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right)$$
where $x_i$ is an accident text feature sample, $x$ is the kernel function center, and $\sigma$ is the width parameter of the function.
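A Gaussian kernel is straightforward to compute directly. The sketch below assumes the common form exp(-||x - x_i||^2 / (2σ^2)); the function name and the sample vectors are illustrative only.

```python
import math

def gaussian_kernel(x, x_i, sigma=1.0):
    """Gaussian (RBF) kernel between a center x and a feature sample x_i."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

same = gaussian_kernel([1.0, 2.0], [1.0, 2.0])            # identical vectors
apart = gaussian_kernel([0.0, 0.0], [1.0, 1.0])           # squared distance 2
wider = gaussian_kernel([0.0, 0.0], [1.0, 1.0], sigma=2.0)
print(same, apart, wider)
```

A larger σ makes the kernel decay more slowly with distance, so distant samples are treated as more similar; this is why the width parameter is worth tuning.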
The RBF kernel maps the input accident text features into a high-dimensional feature space, which solves the linear inseparability problem. The transformed accident text features, together with the label of each accident cause in the training set, are then used to obtain the optimal parameters of the SVM model by grid search, which completes the optimization of the SVM model.
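Grid search simply evaluates every combination of candidate parameter values and keeps the best one. The sketch below assumes the searched parameters are a penalty coefficient C and the kernel width σ, which the text does not name; `toy_score` is a stand-in for the real step of training an SVM and measuring its validation accuracy.

```python
from itertools import product

def grid_search(evaluate, c_values, sigma_values):
    """Try every (C, sigma) pair and return the best pair with its score."""
    best_params, best_score = None, float("-inf")
    for c, sigma in product(c_values, sigma_values):
        score = evaluate(c, sigma)
        if score > best_score:
            best_params, best_score = (c, sigma), score
    return best_params, best_score

def toy_score(c, sigma):
    # Hypothetical score surface that peaks at C = 10, sigma = 0.5.
    return -((c - 10) ** 2) - (sigma - 0.5) ** 2

best_params, best_score = grid_search(
    toy_score, c_values=[0.1, 1, 10, 100], sigma_values=[0.1, 0.5, 1, 2]
)
print(best_params, best_score)
```

In practice the evaluation function would typically use cross-validation on the training set, so that the test set stays untouched until step S5.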
S5, SVM model is tested using test set.
The optimized character-level neural network model extracts the accident text features of the test-set data, and these features are input into the optimized SVM model. If the error of the classification results output by the SVM model is less than a preset value, the optimal SVM model is obtained; otherwise the SVM model continues to be optimized with the training set.
To judge the classification results output by the SVM model against the preset value, in one embodiment the classification results are compared with the true results of the test-set samples. For this comparison a confusion matrix is defined, and accuracy and recall are computed from the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set. If accuracy > 95% and recall > 0.9, training ends and the current SVM model is output as the optimal SVM model; otherwise the SVM model continues to be optimized with the training set.
S6: Determine the accident type.
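The acceptance check described above can be illustrated with a small sketch. It assumes that "accuracy" means overall accuracy (correct predictions over all predictions) and that "recall" is averaged over the classes present in the test set (macro recall); the application does not specify either choice, and all function names and sample labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

def macro_recall(matrix):
    # Per-class recall = diagonal entry / row sum, averaged over classes with samples
    recalls = [row[i] / sum(row) for i, row in enumerate(matrix) if sum(row)]
    return sum(recalls) / len(recalls) if recalls else 0.0

def meets_thresholds(matrix, min_accuracy=0.95, min_recall=0.9):
    """Stopping rule from the text: accuracy > 95% and recall > 0.9."""
    return accuracy(matrix) > min_accuracy and macro_recall(matrix) > min_recall

labels = ["A", "B"]
y_true = ["A", "A", "A", "B", "B"]
y_pred = ["A", "A", "B", "B", "B"]
cm = confusion_matrix(y_true, y_pred, labels)
keep_training = not meets_thresholds(cm)
```

Here one of three class-A samples is misclassified, so accuracy is 0.8 and the rule signals that optimization should continue.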
The acquired accident cause to be classified is split into characters, and the resulting characters are mapped to a multi-dimensional matrix through the dictionary. When the accident cause is split, an N*M matrix is obtained with the method of step S2; this matrix is input into the optimized character-level neural network model, where the convolutional layer, pooling layer, and fully connected layer extract the accident text features. Finally, the optimal SVM model derives the accident text classification result from these features, which is the accident type to which the accident cause to be classified belongs.
The present application classifies traffic accident text at the character level. Compared with conventional classification at higher levels such as words, phrases, or sentences, it does not require pre-trained word vectors or grammatical and syntactic structure information, is not restricted to a particular language, and is therefore broadly applicable.
In addition, the application addresses two problems: the unsatisfactory accuracy of a neural network's Softmax layer, and the reliance of an SVM classifier on manual experience for feature selection. A neural network is trained to learn traffic accident text features, and these features are then input into an SVM classifier to classify the accident. This effectively avoids the blindness of feature selection in a conventional SVM model; at the same time, owing to the mathematical completeness of the SVM, it avoids the tendency of conventional neural networks to become trapped in local optima.
In terms of execution efficiency, the corpus does not need word segmentation, which reduces time complexity and also avoids the cost of pre-training. In terms of robustness, the features extracted by combining a CNN (convolutional neural network) with an SVM are more robust than those of conventional NLP models and better characterize the essential differences between classes. Moreover, because the algorithm performs low-level classification at the character level, classification accuracy does not depend on the quality of word segmentation and is not affected by unimportant characters, so robustness is higher. The method can help traffic departments determine accident types quickly and find accident black spots efficiently, providing input and reference for further accident diagnosis and prevention.
In one embodiment, a traffic accident text classification system based on a character-level neural network and SVM is also provided. The system includes:
Corpus processing module, configured to obtain an accident cause corpus, divide it into a training set and a test set, split the accident cause corpus into characters, and build a dictionary;
Training module, configured to establish a character-level neural network model, optimize it with the training set, extract the accident text features of the training-set data with the optimized character-level neural network model, and train an SVM model with the extracted accident text features until an optimized SVM model is obtained;
Test module, configured to extract the accident text features of the test-set data with the optimized character-level neural network model and input them into the optimized SVM model; if the error of the classification results output by the SVM model is less than a preset value, the optimal SVM model is obtained; otherwise the SVM model continues to be optimized with the training set;
Classification module, configured to split the acquired accident cause to be classified into characters, map the resulting characters to a multi-dimensional matrix through the dictionary, and input the multi-dimensional matrix into the optimized character-level neural network model to extract accident text features, from which the optimal SVM model obtains the accident text classification result.
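The work of the corpus processing module (split the corpus, then build a character dictionary) might look like the following sketch. The 80/20 split ratio, the fixed random seed, the sample texts, and the function names are all assumptions made for illustration; the application does not state how the corpus is divided.

```python
import random

def split_corpus(samples, test_ratio=0.2, seed=0):
    """Shuffle the labeled samples and split them into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def build_dictionary(texts):
    """Map every distinct character in the corpus to an index 0..M-1."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: index for index, ch in enumerate(chars)}

# Hypothetical (accident cause text, label) pairs
corpus = [
    ("car hit bus", "two motor vehicles"),
    ("car hit bike", "motor vehicle and non-motor vehicle"),
    ("car hit man", "motor vehicle and pedestrian"),
    ("car hit wall", "motor vehicle and fixed object"),
    ("bike hit bike", "two non-motor vehicles"),
]
train_set, test_set = split_corpus(corpus)
dictionary = build_dictionary(text for text, _ in corpus)
```

Because the dictionary is built from individual characters, no word segmentation step is needed, which matches the efficiency argument made for the character-level approach.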
In another embodiment, the character-level neural network model used is as follows: starting from input layer I, the data passes in turn through convolutional layer C, pooling layer M, fully connected layer F1, fully connected layer F2, and the Softmax layer; the accident text feature is the feature output by fully connected layer F1.
Specifically, to train the SVM model with the extracted accident text features until an optimized SVM model is obtained, the training module in one embodiment performs the following operations:
The accident text features are mapped by a kernel function into a high-dimensional space for linear separation, the kernel function being the Gaussian kernel, conventionally written as:
$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right)$$
where x_i is an accident text feature sample, x is the kernel function center, and σ is the width parameter of the function;
Using the converted accident text features together with the label of each accident cause in the training set, the optimal parameters of the SVM model are obtained by grid search, completing the optimization of the SVM model;
The labels of the accident causes include: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, and non-motor vehicle and pedestrian.
In another embodiment, to determine that the optimal SVM model is obtained if the error of the classification results output by the SVM model is less than the preset value, and otherwise to continue optimizing the SVM model with the training set, the test module performs the following operations:
Define a confusion matrix;
Compute accuracy and recall from the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set;
If accuracy > 95% and recall > 0.9, end training and output the current SVM model as the optimal SVM model; otherwise continue to optimize the SVM model with the training set.
Specifically, to split the acquired accident cause to be classified into characters and map the resulting characters to a multi-dimensional matrix through the dictionary, the classification module in one embodiment performs the following operations:
Let the accident cause to be classified be split into N characters, and let the dictionary contain M characters. Each character in the dictionary is mapped to an M-dimensional vector, and each character of the accident cause to be classified is mapped to its M-dimensional vector, yielding an N*M matrix.
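One plausible reading of this mapping is a one-hot encoding, where each of the N characters becomes a row of length M with a single 1 at that character's dictionary index. The sketch below assumes that reading; the handling of characters missing from the dictionary (an all-zero row) is also an assumption, and the three-character dictionary is purely illustrative.

```python
def one_hot_matrix(text, dictionary):
    """Map a text of N characters to an N x M matrix, M = len(dictionary)."""
    m = len(dictionary)
    matrix = []
    for ch in text:
        row = [0] * m
        idx = dictionary.get(ch)
        if idx is not None:  # assumption: unknown characters stay all-zero
            row[idx] = 1
        matrix.append(row)
    return matrix

dictionary = {"a": 0, "b": 1, "c": 2}          # M = 3
matrix = one_hot_matrix("cab", dictionary)      # N = 3
# matrix == [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

The resulting N*M matrix is what would be fed to the input layer of the character-level network.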
For other limitations of the traffic accident text classification system based on a character-level neural network and SVM, see the specific limitations of the traffic accident text classification method described above, which are not repeated here.
In one embodiment, a computer device is provided, namely a traffic accident text classification system based on a character-level neural network and SVM. The computer device may be a terminal, and its internal structure may include a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer program implements the above traffic accident text classification method based on a character-level neural network and SVM. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device may be a touch layer covering the display screen, a key, trackball, or touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features is not contradictory, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.
Claims (10)
1. A traffic accident text classification method based on a character-level neural network and SVM, characterized in that the method comprises the following steps:
obtaining an accident cause corpus, dividing it into a training set and a test set, splitting the accident cause corpus into characters, and building a dictionary;
establishing a character-level neural network model, optimizing the character-level neural network model with the training set, extracting the accident text features of the training-set data with the optimized character-level neural network model, and training an SVM model with the extracted accident text features until an optimized SVM model is obtained;
extracting the accident text features of the test-set data with the optimized character-level neural network model and inputting the accident text features into the optimized SVM model; if the error of the classification results output by the SVM model is less than a preset value, obtaining the optimal SVM model; otherwise continuing to optimize the SVM model with the training set;
splitting the acquired accident cause to be classified into characters, mapping the resulting characters to a multi-dimensional matrix through the dictionary, and inputting the multi-dimensional matrix into the optimized character-level neural network model to extract accident text features, the optimal SVM model obtaining the accident text classification result from the accident text features.
2. The traffic accident text classification method based on a character-level neural network and SVM according to claim 1, characterized in that the character-level neural network model is: starting from input layer I, passing in turn through convolutional layer C, pooling layer M, fully connected layer F1, fully connected layer F2, and the Softmax layer;
the accident text feature is the feature output by fully connected layer F1.
3. The traffic accident text classification method based on a character-level neural network and SVM according to claim 1, characterized in that training the SVM model with the extracted accident text features until the optimized SVM model is obtained comprises:
mapping the accident text features by a kernel function into a high-dimensional space for linear separation, the kernel function being a Gaussian kernel:
wherein x_i is an accident text feature sample, x is the kernel function center, and σ is the width parameter of the function;
obtaining the optimal parameters of the SVM model by grid search, using the converted accident text features together with the label of each accident cause in the training set, thereby completing the optimization of the SVM model;
the labels of the accident causes include: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, and non-motor vehicle and pedestrian.
4. The traffic accident text classification method based on a character-level neural network and SVM according to claim 3, characterized in that obtaining the optimal SVM model if the error of the classification results output by the SVM model is less than the preset value, and otherwise continuing to optimize the SVM model with the training set, comprises:
defining a confusion matrix;
computing accuracy and recall from the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set;
if accuracy > 95% and recall > 0.9, ending training and outputting the current SVM model as the optimal SVM model; otherwise continuing to optimize the SVM model with the training set.
5. The traffic accident text classification method based on a character-level neural network and SVM according to claim 1, characterized in that splitting the acquired accident cause to be classified into characters and mapping the resulting characters to a multi-dimensional matrix through the dictionary comprises:
letting the accident cause to be classified be split into N characters and the dictionary contain M characters, mapping each character in the dictionary to an M-dimensional vector, and mapping each character of the accident cause to be classified to its M-dimensional vector, yielding an N*M matrix.
6. A traffic accident text classification system based on a character-level neural network and SVM, characterized in that the system comprises:
a corpus processing module, configured to obtain an accident cause corpus, divide it into a training set and a test set, split the accident cause corpus into characters, and build a dictionary;
a training module, configured to establish a character-level neural network model, optimize the character-level neural network model with the training set, extract the accident text features of the training-set data with the optimized character-level neural network model, and train an SVM model with the extracted accident text features until an optimized SVM model is obtained;
a test module, configured to extract the accident text features of the test-set data with the optimized character-level neural network model and input the accident text features into the optimized SVM model; if the error of the classification results output by the SVM model is less than a preset value, the optimal SVM model is obtained; otherwise the SVM model continues to be optimized with the training set;
a classification module, configured to split the acquired accident cause to be classified into characters, map the resulting characters to a multi-dimensional matrix through the dictionary, and input the multi-dimensional matrix into the optimized character-level neural network model to extract accident text features, the optimal SVM model obtaining the accident text classification result from the accident text features.
7. The traffic accident text classification system based on a character-level neural network and SVM according to claim 6, characterized in that the character-level neural network model is: starting from input layer I, passing in turn through convolutional layer C, pooling layer M, fully connected layer F1, fully connected layer F2, and the Softmax layer;
the accident text feature is the feature output by fully connected layer F1.
8. The traffic accident text classification system based on a character-level neural network and SVM according to claim 6, characterized in that, to train the SVM model with the extracted accident text features until the optimized SVM model is obtained, the training module performs the following operations:
mapping the accident text features by a kernel function into a high-dimensional space for linear separation, the kernel function being a Gaussian kernel:
wherein x_i is an accident text feature sample, x is the kernel function center, and σ is the width parameter of the function;
obtaining the optimal parameters of the SVM model by grid search, using the converted accident text features together with the label of each accident cause in the training set, thereby completing the optimization of the SVM model;
the labels of the accident causes include: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, and non-motor vehicle and pedestrian.
9. The traffic accident text classification system based on a character-level neural network and SVM according to claim 8, characterized in that, to obtain the optimal SVM model if the error of the classification results output by the SVM model is less than the preset value, and otherwise to continue optimizing the SVM model with the training set, the test module performs the following operations:
defining a confusion matrix;
computing accuracy and recall from the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set;
if accuracy > 95% and recall > 0.9, ending training and outputting the current SVM model as the optimal SVM model; otherwise continuing to optimize the SVM model with the training set.
10. The traffic accident text classification system based on a character-level neural network and SVM according to claim 6, characterized in that, to split the acquired accident cause to be classified into characters and map the resulting characters to a multi-dimensional matrix through the dictionary, the classification module performs the following operations:
letting the accident cause to be classified be split into N characters and the dictionary contain M characters, mapping each character in the dictionary to an M-dimensional vector, and mapping each character of the accident cause to be classified to its M-dimensional vector, yielding an N*M matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910334271.3A CN110110085A (en) | 2019-04-24 | 2019-04-24 | Traffic accident file classification method and system based on character level neural network and SVM |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910334271.3A CN110110085A (en) | 2019-04-24 | 2019-04-24 | Traffic accident file classification method and system based on character level neural network and SVM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110110085A (en) | 2019-08-09 |
Family
ID=67486560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910334271.3A Pending CN110110085A (en) | 2019-04-24 | 2019-04-24 | Traffic accident file classification method and system based on character level neural network and SVM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110110085A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111291552A (en) * | 2020-05-09 | 2020-06-16 | 支付宝(杭州)信息技术有限公司 | Method and system for correcting text content |
CN111599170A (en) * | 2020-04-13 | 2020-08-28 | 浙江工业大学 | Traffic running state classification method based on time sequence traffic network diagram |
CN112115965A (en) * | 2020-08-04 | 2020-12-22 | 西安交通大学 | SVM-based passive operating system identification method, storage medium and equipment |
CN112395528A (en) * | 2019-08-13 | 2021-02-23 | 阿里巴巴集团控股有限公司 | Text label distinguishing method and device, electronic equipment and storage medium |
CN113341894A (en) * | 2021-05-27 | 2021-09-03 | 河钢股份有限公司承德分公司 | Accident rule data generation method and device and terminal equipment |
CN113592040A (en) * | 2021-09-27 | 2021-11-02 | 山东蓝湾新材料有限公司 | Method and device for classifying dangerous chemical accidents |
CN114157411A (en) * | 2021-11-29 | 2022-03-08 | 中信数智(武汉)科技有限公司 | Grouping encryption identification method based on LeNet5-SVM |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107301246A (en) * | 2017-07-14 | 2017-10-27 | 河北工业大学 | Chinese Text Categorization based on ultra-deep convolutional neural networks structural model |
CN107895000A (en) * | 2017-10-30 | 2018-04-10 | 昆明理工大学 | A kind of cross-cutting semantic information retrieval method based on convolutional neural networks |
CN108710967A (en) * | 2018-04-19 | 2018-10-26 | 东南大学 | Expressway traffic accident Severity forecasting method based on data fusion and support vector machines |
CN108920586A (en) * | 2018-06-26 | 2018-11-30 | 北京工业大学 | A kind of short text classification method based on depth nerve mapping support vector machines |
CN109446332A (en) * | 2018-12-25 | 2019-03-08 | 银江股份有限公司 | A kind of people's mediation case classification system and method based on feature migration and adaptive learning |
Non-Patent Citations (4)
Title |
---|
QU HUA.ET.L: "A Character-Level Method for Text Classification", 《2018 2ND IEEE ADVANCED INFORMATION MANAGEMENT,COMMUNICATES,ELECTRONIC AND AUTOMATION CONTROL CONFERENCE》 * |
YULING CHEN.ET.L: "Research on text sentiment analysis based on CNNs and SVM", 《2018 13TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA)》 * |
ZHIQUAN WANG.ET.L: "Research on Web text classification algorithm based on improved CNN and SVM", 《2017 IEEE 17TH INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT)》 * |
刘敬学等: "字符级卷积神经网络短文本分类算法", 《计算机工程与应用》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112395528A (en) * | 2019-08-13 | 2021-02-23 | 阿里巴巴集团控股有限公司 | Text label distinguishing method and device, electronic equipment and storage medium |
CN112395528B (en) * | 2019-08-13 | 2022-10-21 | 阿里巴巴集团控股有限公司 | Text label distinguishing method and device, electronic equipment and storage medium |
CN111599170A (en) * | 2020-04-13 | 2020-08-28 | 浙江工业大学 | Traffic running state classification method based on time sequence traffic network diagram |
CN111599170B (en) * | 2020-04-13 | 2021-12-17 | 浙江工业大学 | Traffic running state classification method based on time sequence traffic network diagram |
CN111291552A (en) * | 2020-05-09 | 2020-06-16 | 支付宝(杭州)信息技术有限公司 | Method and system for correcting text content |
CN112115965A (en) * | 2020-08-04 | 2020-12-22 | 西安交通大学 | SVM-based passive operating system identification method, storage medium and equipment |
CN113341894A (en) * | 2021-05-27 | 2021-09-03 | 河钢股份有限公司承德分公司 | Accident rule data generation method and device and terminal equipment |
CN113592040A (en) * | 2021-09-27 | 2021-11-02 | 山东蓝湾新材料有限公司 | Method and device for classifying dangerous chemical accidents |
CN114157411A (en) * | 2021-11-29 | 2022-03-08 | 中信数智(武汉)科技有限公司 | Grouping encryption identification method based on LeNet5-SVM |
CN114157411B (en) * | 2021-11-29 | 2024-04-05 | 中信数智(武汉)科技有限公司 | LeNet 5-SVM-based packet encryption identification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | TA01 | Transfer of patent application right | Effective date of registration: 20200424. Address after: 314501 room 116, floor 1, building 2, No. 87 Hexi, Changfeng street, Wuzhen Town, Tongxiang City, Jiaxing City, Zhejiang Province. Applicant after: Zhejiang Haikang Zhilian Technology Co.,Ltd. Address before: Yuhang District, Hangzhou City, Zhejiang Province, 311121 West No. 1500 Building 1 room 311. Applicant before: CETHIK GROUP Co.,Ltd. |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190809 |