CN110110085A - Traffic accident text classification method and system based on character-level neural network and SVM - Google Patents

Traffic accident text classification method and system based on character-level neural network and SVM

Info

Publication number
CN110110085A
CN110110085A CN201910334271.3A
Authority
CN
China
Prior art keywords
accident
svm
character
neural network
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910334271.3A
Other languages
Chinese (zh)
Inventor
刘彦斌
智伟
温熙华
程元晖
李志伟
陈鹏飞
孙炯炯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Haikang Zhilian Technology Co ltd
Original Assignee
CETHIK Group Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETHIK Group Ltd filed Critical CETHIK Group Ltd
Priority to CN201910334271.3A priority Critical patent/CN110110085A/en
Publication of CN110110085A publication Critical patent/CN110110085A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Traffic Control Systems (AREA)

Abstract

This application discloses a traffic accident text classification method and system based on a character-level neural network and an SVM. The method includes: obtaining an accident-cause corpus, dividing it into a training set and a test set, and building a character dictionary; establishing a character-level neural network model, optimizing it with the training set, extracting accident text features from the training set, and using the extracted features to obtain an optimized SVM model; testing the optimized character-level neural network model and the optimized SVM model with the test set; and, for an accident cause to be classified, splitting it into characters, mapping the characters to a multi-dimensional matrix through the dictionary, feeding the matrix into the optimized character-level neural network model to extract accident text features, and obtaining the accident text classification result from the optimal SVM model using those features. The approach is not limited to a particular language, avoids the cost of pre-training, and avoids the blindness of the SVM model in feature selection.

Description

Traffic accident text classification method and system based on character-level neural network and SVM
Technical field
This application belongs to the field of intelligent traffic management, and in particular relates to a traffic accident text classification method and system based on a character-level neural network and an SVM.
Background art
In recent years, handling traffic accidents properly and identifying the responsible parties fairly and justly has consumed a large amount of manpower and financial resources every year. An automated traffic accident text classification method is therefore urgently needed to analyze traffic accident data, so that transport authorities can be assisted in locating accident black spots for further remediation.
The problem of classifying accidents falls within the scope of text classification. An artificial neural network (Artificial Neural Network) is a data-driven nonlinear model organized according to the structure and activity principles of the human brain. It is composed of elements such as a neuron model, a network connection model, and a learning algorithm, and forms a system with a certain degree of intelligence. In text classification, a neural network is a set of connected input and output neurons: the input neurons represent terms, the output neurons represent text categories, and the connections between neurons carry weights. In the training stage, the weights are adjusted by algorithms such as forward propagation and backward error correction, so that test texts can be correctly classified according to the adjusted weights. Multiple different neural network models can thus be obtained; a text of unknown category is then passed through these models in turn to obtain different output values, and the final category of the text is determined by comparing these outputs.
SVM (Support Vector Machine) is a common discriminative method. In machine learning it is a supervised learning model, commonly used for pattern recognition, classification, and regression analysis.
Most earlier models based on artificial neural networks model text or language at a higher-level unit, such as words (statistical information, n-grams, word2vec, etc.), phrases, or sentences, or analyze semantic and syntactic structure. For example, patent application CN201710573388.8, entitled "Chinese text categorization method based on an ultra-deep convolutional neural network structural model", discloses a word-based classification algorithm. However, it requires a large corpus collected in advance to build a word-vector model, the quality of word segmentation directly affects the subsequent classification accuracy, and it can only handle Chinese.
As another example, patent application CN201810353803.3, entitled "Expressway traffic accident severity prediction method based on data fusion and support vector machines", collects variable factors at the time of a traffic accident, such as road conditions, driver conditions, and vehicle conditions, and builds an SVM model to predict the severity of expressway traffic accidents. However, the selection of features affecting accident severity is overly subjective, and the listed factors such as "road conditions" and "driver conditions" do not necessarily characterize the accident completely.
It can be seen that current traffic accident text classification mainly has the following problems, which urgently need to be solved:
1) Traditional traffic accident text classification usually requires manual labeling, which wastes a great deal of manpower and financial resources; manual operation inevitably introduces omissions and also makes it hard to meet timeliness requirements.
2) Existing convolutional neural networks all model at a high-level unit (word, phrase, or sentence), which on the one hand increases training complexity and on the other hand limits the universality of the model. In addition, the accuracy of the traditional softmax classifier leaves room for improvement.
3) Existing support vector machine models lack an objective basis for feature extraction and often rely only on manual experience, which limits the improvement of model accuracy.
Summary of the invention
The purpose of this application is to provide a traffic accident text classification method and system based on a character-level neural network and an SVM, which avoids the cost of pre-training, is not limited to a particular language, and avoids the blindness of the SVM model in feature selection.
To achieve the above purpose, the technical solution adopted by this application is as follows:
A traffic accident text classification method based on a character-level neural network and an SVM, the method comprising the following steps:
obtaining an accident-cause corpus, dividing it to obtain a training set and a test set, splitting the accident-cause corpus into characters, and building a dictionary;
establishing a character-level neural network model, optimizing the character-level neural network model with the training set, extracting accident text features of the training-set data with the optimized character-level neural network model, and training an SVM model with the extracted accident text features until an optimized SVM model is obtained;
extracting accident text features of the test-set data with the optimized character-level neural network model and inputting the accident text features into the optimized SVM model; if the classification error of the SVM model output is judged to be less than a preset value, obtaining an optimal SVM model; otherwise, continuing to optimize the SVM model with the training set;
splitting an acquired accident cause to be classified into characters, mapping the characters obtained by splitting into a multi-dimensional matrix through the dictionary, inputting the multi-dimensional matrix into the optimized character-level neural network model to extract accident text features, and obtaining the accident text classification result from the optimal SVM model using the accident text features.
Preferably, the character-level neural network model is: starting from an input layer I, passing in turn through a convolutional layer C, a pooling layer M, a fully connected layer F1, a fully connected layer F2, and a Softmax layer;
the accident text features are the features output by the fully connected layer F1.
Preferably, training the SVM model with the extracted accident text features until the optimized SVM model is obtained comprises:
transforming the accident text features into a high-dimensional space through a kernel function for linear partition, the kernel function being a Gaussian kernel, K(x, x_i) = exp(-||x - x_i||^2 / (2σ^2)),
where x_i is an accident text feature sample, x is the kernel function center, and σ is the width parameter of the function;
using the converted accident text features together with the label of each accident cause in the training set, obtaining the optimal parameters of the SVM model by grid search, thereby completing the optimization of the SVM model;
the labels of the accident causes include: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian.
Preferably, obtaining the optimal SVM model if the classification error of the SVM model output is judged to be less than the preset value, and otherwise continuing to optimize the SVM model with the training set, comprises:
defining a confusion matrix;
calculating the accuracy and recall according to the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set;
if accuracy > 95% and recall > 0.9, ending the training and outputting the current SVM model as the optimal SVM model; otherwise, continuing to optimize the SVM model with the training set.
Preferably, splitting the acquired accident cause to be classified into characters and mapping the characters obtained by splitting into a multi-dimensional matrix through the dictionary comprises:
splitting the accident cause to be classified into N characters, the dictionary containing M characters; mapping each character in the dictionary to an M-dimensional vector, mapping each character in the accident cause to be classified to its M-dimensional vector, and obtaining an N*M matrix.
This application also provides a traffic accident text classification system based on a character-level neural network and an SVM, the system comprising:
a corpus processing module, configured to obtain an accident-cause corpus, divide it to obtain a training set and a test set, split the accident-cause corpus into characters, and build a dictionary;
a training module, configured to establish a character-level neural network model, optimize the character-level neural network model with the training set, extract accident text features of the training-set data with the optimized character-level neural network model, and train an SVM model with the extracted accident text features until an optimized SVM model is obtained;
a test module, configured to extract accident text features of the test-set data with the optimized character-level neural network model and input the accident text features into the optimized SVM model, obtain an optimal SVM model if the classification error of the SVM model output is judged to be less than a preset value, and otherwise continue to optimize the SVM model with the training set;
a classification module, configured to split an acquired accident cause to be classified into characters, map the characters obtained by splitting into a multi-dimensional matrix through the dictionary, input the multi-dimensional matrix into the optimized character-level neural network model to extract accident text features, and obtain the accident text classification result from the optimal SVM model using the accident text features.
Preferably, the character-level neural network model is: starting from an input layer I, passing in turn through a convolutional layer C, a pooling layer M, a fully connected layer F1, a fully connected layer F2, and a Softmax layer;
the accident text features are the features output by the fully connected layer F1.
Preferably, training the SVM model with the extracted accident text features until the optimized SVM model is obtained comprises:
transforming the accident text features into a high-dimensional space through a kernel function for linear partition, the kernel function being a Gaussian kernel, K(x, x_i) = exp(-||x - x_i||^2 / (2σ^2)),
where x_i is an accident text feature sample, x is the kernel function center, and σ is the width parameter of the function;
using the converted accident text features together with the label of each accident cause in the training set, obtaining the optimal parameters of the SVM model by grid search, thereby completing the optimization of the SVM model;
the labels of the accident causes include: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian.
Preferably, obtaining the optimal SVM model if the classification error of the SVM model output is judged to be less than the preset value, and otherwise continuing to optimize the SVM model with the training set, comprises:
defining a confusion matrix;
calculating the accuracy and recall according to the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set;
if accuracy > 95% and recall > 0.9, ending the training and outputting the current SVM model as the optimal SVM model; otherwise, continuing to optimize the SVM model with the training set.
Preferably, splitting the acquired accident cause to be classified into characters and mapping the characters obtained by splitting into a multi-dimensional matrix through the dictionary comprises:
splitting the accident cause to be classified into N characters, the dictionary containing M characters; mapping each character in the dictionary to an M-dimensional vector, mapping each character in the accident cause to be classified to its M-dimensional vector, and obtaining an N*M matrix.
In the traffic accident text classification method and system based on a character-level neural network and an SVM provided by this application, a neural network is first trained to learn features of the traffic accident text, and the features are then fed into an SVM classifier to classify the accident. This approach effectively avoids the blindness of the traditional SVM model in feature selection; at the same time, because of the mathematical completeness of the SVM, it avoids the awkward tendency of traditional neural networks to fall into local optima. In terms of execution efficiency, no word segmentation of the corpus is needed, which reduces time complexity and also avoids the cost of pre-training. In terms of robustness, the features extracted by combining a CNN with an SVM are more robust than those of traditional NLP models and better characterize the essential differences between categories; moreover, because the algorithm classifies at the low level of characters, the classification accuracy does not depend on the quality of word segmentation and is not affected by unimportant characters, so robustness is higher.
Detailed description of the invention
Fig. 1 is a flow diagram of the traffic accident text classification method based on a character-level neural network and an SVM of this application;
Fig. 2 is a schematic structural diagram of an embodiment of the character-level neural network model of this application.
Specific embodiment
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort shall fall within the protection scope of this application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which this application belongs. The terms used in the description of this application are only for the purpose of describing specific embodiments and are not intended to limit this application.
The steps in the method of this application are not strictly limited to the order described and may be executed in other orders. Moreover, at least some of the steps may include multiple sub-steps or stages, which need not be completed at the same moment but may be executed at different times; their execution order is also not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
As shown in Fig. 1, an embodiment provides a traffic accident text classification method based on a character-level neural network and an SVM, comprising the following steps:
S1: obtain an accident-cause corpus, divide it to obtain a training set and a test set, split the accident-cause corpus into characters, and build a dictionary.
S1.1: obtain the training set and test set
The accident-cause corpus can be obtained by receiving directly entered text, or by automatic search according to keywords. When dividing the accident-cause corpus, the size of the training set affects the training of the subsequent model to a certain extent; in this embodiment the corpus is divided into a training set and a test set at a ratio of 7:3.
S1.2: build the dictionary
From all the accident-cause corpus obtained, duplicate characters are removed to produce a character set covering every character that occurs, and this character set is used as the dictionary, which contains M characters. In one embodiment the dictionary contains 274 characters, which are, in order:
" vehicle people's vapour, thing do not have two to touch storage battery to hit.Wipe with mutually hurt three-wheel and 1 motor 20 with fortunately it is not a protect to cry do not rescue not Goods, which will practise medicine to the existing side of alarm to run, knows that I of a logical back court opens that antithetical phrase is wired inner to be dragged with by the own single pole drawing of machine on column from sedan-chair point in institute When main road surface go card without door stop thus 9 escape to give away 7 tree board ditches 5 and moved wall packet Zhejiang bolt water F: in walk public affairs be that colored gently disappeared presses anti-friendship His cell tail Xia Chu anxious two is turned over from column class mouthful day to rent 3 (people Pi Zuo, which defends title 8 and pounds right dividing pier east and say that west etc. chases after, before the first rubbish rubbish that escapes out sees stone Face the brick ramp of a bridge and sail ability S person's yesterday) the micro- boat foot of her ring drink two disease visitors accuse 6 area Zhang Qiaozai case seven of largo live pipe words all pieces hang it is high stirring Feel it is black toward slip bad leakage shovel Shuai Chang building team protect ask Pu scrape viviparous road lie standby 4 half peaceful disconnected department more slopes of cable of child's reason) the big wine T of tank altar drives heap Red iron early dig when yellow Fuan the present examine bar flower library porcelain this to mix the city Nan Qicheng extra large noon bucket Dai Bianfen Qi Lingxianmai luminous energy outer into heart oil clot Factory's note ".
S1.3: classify the accident causes
Each accident cause in the collected corpus is classified, and a label is added to each accident cause according to the classification result. It should be noted that this classification is preferably done manually, to ensure the validity of the subsequent model training.
According to common accident types, in one embodiment the accident types of the accident causes are mainly: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian. The accident causes are divided according to these six classes, and each accident cause in the training set and in the test set is given a label accordingly; that is, the labels of the accident causes likewise include: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian.
S2: establish the character-level neural network model.
As shown in Fig. 2, in one embodiment the character-level neural network model is built as follows: starting from an input (embedding) layer I, it passes in turn through a convolutional layer (CNN) C, a pooling layer (Max Pooling) M, a fully connected layer (Fully Connected) F1, a fully connected layer (Fully Connected) F2, and a Softmax layer.
Input layer I: multiple characters can be input at the same time, and each character is one-hot encoded into an M-dimensional vector (M depends on the number of characters in the dictionary). W0 to W5 in Fig. 2 denote input characters; they only illustrate that several characters can be input simultaneously and do not limit the number of characters that may be input at once.
The input layer is a matrix in which the character vectors corresponding to the characters of the accident cause to be classified are arranged in order (from top to bottom). For example, in one embodiment the accident cause to be classified is split into N characters and the dictionary contains M characters; each character in the dictionary maps to an M-dimensional vector, each character in the accident cause to be classified is mapped to its M-dimensional vector, and an N*M matrix is obtained.
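A minimal sketch of this one-hot mapping, assuming the char_to_idx dictionary from the previous sketch and NumPy for the matrix (padding and unknown-character handling are illustrative choices, not from the patent):

```python
import numpy as np

def text_to_matrix(text, char_to_idx, seq_length=None):
    """One-hot encode an accident cause of N characters into an N*M matrix (M = dictionary size)."""
    M = len(char_to_idx)
    N = seq_length if seq_length is not None else len(text)
    matrix = np.zeros((N, M), dtype=np.float32)
    for row, ch in enumerate(text[:N]):
        col = char_to_idx.get(ch)   # characters outside the dictionary are left as all-zero rows
        if col is not None:
            matrix[row, col] = 1.0
    return matrix
```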
Convolutional layer C: the convolution operation produces several feature maps. The size of the convolution window is 5 × M, where 5 is the convolution kernel size and M is the dimension of the character vector. Several feature maps are obtained through this convolution window.
Pooling layer M: the maximum value is taken from each of the preceding one-dimensional feature maps, representing the most important signal. It can be seen that this pooling scheme handles alarm content of variable length, because no matter how many values a feature map contains, only its maximum needs to be extracted. The output of the pooling layer is therefore the maximum of each feature map, i.e. a one-dimensional vector.
Fully connected layer F1: the accident text features are the features output by the fully connected layer F1; the feature values output by this layer yield high accuracy.
Fully connected layer F2: maps the feature map to a fixed length (the number of classes).
Softmax layer: outputs the probability distribution reflecting which of the six classes "two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian" the accident cause belongs to.
It should be noted that, to overcome the problem that the accuracy of the Softmax layer is not ideal, the traffic accident text classification method of this application uses the output of the fully connected layer F1 as the feature input of the SVM classifier when predicting the text class; this also overcomes the problem that the SVM classifier relies on manual experience for feature selection, thereby clearly improving the accuracy of text classification.
The fully connected layer F2 and the Softmax layer are used in the training stage of the character-level neural network model: to obtain the feature values output by the fully connected layer F1, a closed network formed by F2 and the Softmax layer is required, so that gradient descent can be used to obtain the weight parameters that construct the features. The training of the character-level neural network model is described in detail in the following steps.
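A minimal Keras sketch of this layer stack. It assumes characters are fed as integer indices into an embedding layer (consistent with the embedding_dim parameter listed in step S3 below); the one-hot N*M matrix described above is the equivalent input for a model consuming one-hot rows directly. Layer names follow Fig. 2, but all implementation details are illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_char_cnn(vocab_size=274, seq_length=100, embedding_dim=64,
                   num_filters=256, kernel_size=5, hidden_dim=128,
                   num_classes=6, dropout_keep_prob=0.5):
    inputs = layers.Input(shape=(seq_length,), dtype="int32")           # input layer I (character indices)
    x = layers.Embedding(vocab_size, embedding_dim)(inputs)             # character vectors
    x = layers.Conv1D(num_filters, kernel_size, activation="relu")(x)   # convolutional layer C (window 5 x M)
    x = layers.GlobalMaxPooling1D()(x)                                  # pooling layer M: max of each feature map
    f1 = layers.Dense(hidden_dim, activation="relu", name="fc1")(x)     # fully connected layer F1 (accident text features)
    x = layers.Dropout(1.0 - dropout_keep_prob)(f1)
    f2 = layers.Dense(num_classes, name="fc2")(x)                       # fully connected layer F2
    outputs = layers.Softmax()(f2)                                      # Softmax layer
    return keras.Model(inputs, outputs)
```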
S3: optimize the character-level neural network model with the training set, and extract the accident text features of the training-set data with the optimized character-level neural network model.
The labeled training set is input into the character-level neural network model constructed in step S2, and the network parameters are iteratively optimized with the gradient descent algorithm.
Specifically, when optimizing the network parameters: first, the training data is propagated forward once through the neural network to obtain the prediction y_hat; next, the error gradient δ of the output-layer neurons is computed; finally, the weight change Δw_i is computed. After one traversal of the entire data set, Δw_i (the weight change) is added to w_i (the current weight) to obtain the new weight w_i, completing one update of the weights.
The weights are updated iteratively in this way until the error meets the requirement, giving the optimal character-level neural network model. In one embodiment, the optimal network parameters of the character-level neural network model after continuous optimization are as follows:
embedding_dim=64 # character-vector dimension;
seq_length=100 # sequence length;
num_classes=6 # number of classes;
num_filters=256 # number of convolution kernels;
kernel_size=5 # convolution kernel size, stride 1;
pool_size=2 # pooling kernel size, stride 2;
vocab_size=274 # vocabulary (dictionary) size;
hidden_dim=128 # neurons in the fully connected layer;
dropout_keep_prob=0.5 # dropout keep probability;
learning_rate=1e-3 # learning rate;
batch_size=32 # training batch size;
num_epochs=200 # total number of training epochs;
print_per_batch=5 # output a result every this many batches;
save_per_batch=5 # write to tensorboard every this many batches.
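A minimal training sketch using the parameter values above (the model comes from the earlier build_char_cnn sketch; x_train and y_train are assumed to be integer character-index sequences of length 100 and integer labels 0..5, and the optimizer and callback choices are assumptions):

```python
from tensorflow import keras

model = build_char_cnn(vocab_size=274, seq_length=100, embedding_dim=64,
                       num_filters=256, kernel_size=5, hidden_dim=128, num_classes=6)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),   # learning_rate
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train,
          batch_size=32,                                             # batch_size
          epochs=200,                                                # num_epochs
          validation_split=0.1,
          callbacks=[keras.callbacks.TensorBoard(log_dir="tensorboard")])
```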
The accident text features of the training-set data are extracted with the optimized character-level neural network model: after the iterative optimization of the neural network, the output of the fully connected layer F1 in the model is the accident text feature that best reflects the attributes of the accident.
In this embodiment, by using the vector output after the first fully connected layer, the original approach of classifying the features with Softmax is improved to an approach based on an SVM, and this automatic learning and extraction of features also avoids the drawback that SVM feature selection is too subjective.
In other words, the probability distribution output by the Softmax layer of the character-level neural network model in this embodiment is not used directly as the accident text classification result; it is only for reference. The character-level neural network model is mainly used to extract the accident text features via the first fully connected layer.
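A minimal sketch of extracting the F1-layer output as the accident text feature, assuming the Keras model above with its F1 layer named "fc1" (x_train and x_test are the encoded character sequences):

```python
from tensorflow import keras

# Sub-model that stops at the first fully connected layer F1.
feature_extractor = keras.Model(inputs=model.input,
                                outputs=model.get_layer("fc1").output)

train_features = feature_extractor.predict(x_train)   # shape: (num_samples, 128)
test_features = feature_extractor.predict(x_test)
```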
S4: train the SVM model with the extracted accident text features until the optimized SVM model is obtained.
Since accident classification is a nonlinear problem, a nonlinear mapping Φ: R^d → H maps the samples of the original input space into a high-dimensional feature space H, and the optimal separating hyperplane is then constructed in H. Solving in the high-dimensional space requires computing dot products of the sample vectors, which is computationally expensive, so a kernel function K(x, x_i) satisfying the Mercer condition is used in place of the dot-product operation.
In one embodiment, when the accident text features are transformed into the high-dimensional space through the kernel function for linear partition, the kernel function is a Gaussian (RBF) kernel, K(x, x_i) = exp(-||x - x_i||^2 / (2σ^2)),
where x_i is an accident text feature sample, x is the kernel function center, and σ is the width parameter of the function.
The RBF kernel maps the input accident text features onto the high-dimensional feature space, solving the linear-inseparability problem. Then, using the converted accident text features together with the label of each accident cause in the training set, the optimal parameters of the SVM model are obtained by grid search, completing the optimization of the SVM model.
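A minimal scikit-learn sketch of this step (RBF kernel plus grid search over C and gamma; the candidate grid is an assumption, and gamma corresponds to 1/(2σ²) of the kernel above):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": [0.1, 1, 10, 100],           # penalty parameter candidates (assumed)
    "gamma": [1e-3, 1e-2, 1e-1, 1],   # RBF width candidates (assumed)
}
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
grid.fit(train_features, y_train)     # F1-layer features and the six accident-type labels
svm_model = grid.best_estimator_      # SVM with the optimal parameters
```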
S5: test the SVM model with the test set.
The accident text features of the test-set data are extracted with the optimized character-level neural network model and input into the optimized SVM model; if the classification error of the SVM model output is judged to be less than the preset value, the optimal SVM model is obtained; otherwise, the SVM model continues to be optimized with the training set.
When judging the relationship between the classification results of the SVM model and the preset value, in one embodiment the classification results of the model are compared with the true results of the test-set samples. For the comparison, a confusion matrix is defined, and the accuracy and recall are calculated from the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set. If accuracy > 95% and recall > 0.9, training ends and the current SVM model is output as the optimal SVM model; otherwise, the SVM model continues to be optimized with the training set.
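A minimal evaluation sketch for this test step with scikit-learn metrics; macro-averaged recall over the six classes is an assumption, as the patent only states the accuracy and recall thresholds:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

y_pred = svm_model.predict(test_features)

cm = confusion_matrix(y_test, y_pred)                  # 6x6 confusion matrix
accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred, average="macro")

if accuracy > 0.95 and recall > 0.9:
    print("Training ends; the current SVM model is taken as the optimal model.")
else:
    print("Continue optimizing the SVM model with the training set.")
```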
S6: determine the accident type.
The acquired accident cause to be classified is split into characters, and the characters obtained by splitting are mapped into a multi-dimensional matrix through the dictionary. When splitting the accident cause, an N*M matrix is obtained by the method in step S2; the N*M matrix is input into the optimized character-level neural network model, where the extraction of the accident text features is completed by the convolutional layer, the pooling layer, and the fully connected layer; finally, the optimal SVM model obtains the accident text classification result from the accident text features, i.e. the accident type to which the accident cause to be classified belongs.
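Putting the pieces together, a minimal end-to-end prediction sketch under the assumptions of the earlier sketches (the label order and the padding/unknown-character handling are illustrative):

```python
import numpy as np

LABELS = ["two motor vehicles", "motor vehicle and non-motor vehicle",
          "motor vehicle and pedestrian", "motor vehicle and fixed object",
          "two non-motor vehicles", "non-motor vehicle and pedestrian"]

def classify_accident_cause(text, char_to_idx, feature_extractor, svm_model, seq_length=100):
    """Split the accident cause into characters, map them through the dictionary,
    extract the F1-layer features, and classify with the optimal SVM model."""
    indices = [char_to_idx.get(ch, 0) for ch in text[:seq_length]]   # characters -> dictionary indices
    indices += [0] * (seq_length - len(indices))                     # pad to the fixed sequence length
    x = np.array([indices], dtype="int32")
    features = feature_extractor.predict(x)                          # accident text features from F1
    return LABELS[int(svm_model.predict(features)[0])]
```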
This application performs traffic accident text classification at the character level. Compared with traditional classification at a higher level such as words, phrases, or sentences, it does not need pre-trained word vectors or grammatical and syntactic structure information, is not limited to a particular language, and has strong universality.
In addition, to address the problems that the accuracy of the neural network's Softmax layer is not ideal and that the SVM classifier relies on manual experience for feature selection, this application trains a neural network to learn features of the traffic accident text and then feeds those features into an SVM classifier to classify the accident. This approach effectively avoids the blindness of the traditional SVM model in feature selection; at the same time, because of the mathematical completeness of the SVM, it avoids the awkward tendency of traditional neural networks to fall into local optima.
In terms of execution efficiency, no word segmentation of the corpus is needed, which reduces time complexity and also avoids the cost of pre-training. In terms of robustness, the features extracted by combining a CNN (convolutional neural network) with an SVM are more robust than those of traditional NLP models and better characterize the essential differences between categories; because the algorithm classifies at the low level of characters, the classification accuracy does not depend on the quality of word segmentation and is not affected by unimportant characters, so robustness is higher. This helps transport authorities quickly determine accident types and efficiently locate accident black spots, and provides suggestions and references for further accident diagnosis and prevention.
In one embodiment, a traffic accident text classification system based on a character-level neural network and an SVM is also provided, the system comprising:
a corpus processing module, configured to obtain an accident-cause corpus, divide it to obtain a training set and a test set, split the accident-cause corpus into characters, and build a dictionary;
a training module, configured to establish a character-level neural network model, optimize the character-level neural network model with the training set, extract accident text features of the training-set data with the optimized character-level neural network model, and train an SVM model with the extracted accident text features until an optimized SVM model is obtained;
a test module, configured to extract accident text features of the test-set data with the optimized character-level neural network model and input the accident text features into the optimized SVM model, obtain an optimal SVM model if the classification error of the SVM model output is judged to be less than a preset value, and otherwise continue to optimize the SVM model with the training set;
a classification module, configured to split an acquired accident cause to be classified into characters, map the characters obtained by splitting into a multi-dimensional matrix through the dictionary, input the multi-dimensional matrix into the optimized character-level neural network model to extract accident text features, and obtain the accident text classification result from the optimal SVM model using the accident text features.
In another embodiment, the character-level neural network model used is: starting from an input layer I, passing in turn through a convolutional layer C, a pooling layer M, a fully connected layer F1, a fully connected layer F2, and a Softmax layer; and the accident text features are the features output by the fully connected layer F1.
Specifically, the training module trains the SVM model with the extracted accident text features until the optimized SVM model is obtained; in one embodiment it performs the following operations:
transforming the accident text features into a high-dimensional space through a kernel function for linear partition, the kernel function being a Gaussian kernel, K(x, x_i) = exp(-||x - x_i||^2 / (2σ^2)),
where x_i is an accident text feature sample, x is the kernel function center, and σ is the width parameter of the function;
using the converted accident text features together with the label of each accident cause in the training set, obtaining the optimal parameters of the SVM model by grid search, thereby completing the optimization of the SVM model;
the labels of the accident causes include: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian.
In another embodiment, if the test module judges that the classification error of the SVM model output is less than the preset value, the optimal SVM model is obtained; otherwise, the SVM model continues to be optimized with the training set, and the following operations are performed:
defining a confusion matrix;
calculating the accuracy and recall according to the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set;
if accuracy > 95% and recall > 0.9, ending the training and outputting the current SVM model as the optimal SVM model; otherwise, continuing to optimize the SVM model with the training set.
Specifically, the classification module splits the acquired accident cause to be classified into characters and maps the characters obtained by splitting into a multi-dimensional matrix through the dictionary; in one embodiment it performs the following operations:
the accident cause to be classified is split into N characters and the dictionary contains M characters; each character in the dictionary maps to an M-dimensional vector, each character in the accident cause to be classified is mapped to its M-dimensional vector, and an N*M matrix is obtained.
For other limitations on the traffic accident text classification system based on a character-level neural network and an SVM, refer to the specific limitations on the traffic accident text classification method based on a character-level neural network and an SVM above; they are not repeated here.
In one embodiment, a computer device is provided, i.e. a traffic accident text classification system based on a character-level neural network and an SVM. The computer device may be a terminal, and its internal structure may include a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, the above traffic accident text classification method based on a character-level neural network and an SVM is implemented. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device may be a touch layer covering the display screen, a key, trackball, or trackpad arranged on the housing of the computer device, or an external keyboard, trackpad, or mouse.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, they should all be considered within the scope of this specification.
The above embodiments only express several implementations of this application; their description is specific and detailed, but they should not be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application patent shall be subject to the appended claims.

Claims (10)

1. A traffic accident text classification method based on a character-level neural network and an SVM, characterized in that the traffic accident text classification method based on a character-level neural network and an SVM comprises the following steps:
obtaining an accident-cause corpus, dividing it to obtain a training set and a test set, splitting the accident-cause corpus into characters, and building a dictionary;
establishing a character-level neural network model, optimizing the character-level neural network model with the training set, extracting accident text features of the training-set data with the optimized character-level neural network model, and training an SVM model with the extracted accident text features until an optimized SVM model is obtained;
extracting accident text features of the test-set data with the optimized character-level neural network model and inputting the accident text features into the optimized SVM model; if the classification error of the SVM model output is judged to be less than a preset value, obtaining an optimal SVM model; otherwise, continuing to optimize the SVM model with the training set;
splitting an acquired accident cause to be classified into characters, mapping the characters obtained by splitting into a multi-dimensional matrix through the dictionary, inputting the multi-dimensional matrix into the optimized character-level neural network model to extract accident text features, and obtaining the accident text classification result from the optimal SVM model using the accident text features.
2. The traffic accident text classification method based on a character-level neural network and an SVM according to claim 1, characterized in that the character-level neural network model is: starting from an input layer I, passing in turn through a convolutional layer C, a pooling layer M, a fully connected layer F1, a fully connected layer F2, and a Softmax layer;
the accident text features are the features output by the fully connected layer F1.
3. The traffic accident text classification method based on a character-level neural network and an SVM according to claim 1, characterized in that training the SVM model with the extracted accident text features until the optimized SVM model is obtained comprises:
transforming the accident text features into a high-dimensional space through a kernel function for linear partition, the kernel function being a Gaussian kernel, K(x, x_i) = exp(-||x - x_i||^2 / (2σ^2)),
where x_i is an accident text feature sample, x is the kernel function center, and σ is the width parameter of the function;
using the converted accident text features together with the label of each accident cause in the training set, obtaining the optimal parameters of the SVM model by grid search, thereby completing the optimization of the SVM model;
the labels of the accident causes including: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian.
4. The traffic accident text classification method based on a character-level neural network and an SVM according to claim 3, characterized in that obtaining the optimal SVM model if the classification error of the SVM model output is judged to be less than the preset value, and otherwise continuing to optimize the SVM model with the training set, comprises:
defining a confusion matrix;
calculating the accuracy and recall according to the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set;
if accuracy > 95% and recall > 0.9, ending the training and outputting the current SVM model as the optimal SVM model; otherwise, continuing to optimize the SVM model with the training set.
5. The traffic accident text classification method based on a character-level neural network and an SVM according to claim 1, characterized in that splitting the acquired accident cause to be classified into characters and mapping the characters obtained by splitting into a multi-dimensional matrix through the dictionary comprises:
splitting the accident cause to be classified into N characters, the dictionary containing M characters; mapping each character in the dictionary to an M-dimensional vector, mapping each character in the accident cause to be classified to its M-dimensional vector, and obtaining an N*M matrix.
6. A traffic accident text classification system based on a character-level neural network and an SVM, characterized in that the traffic accident text classification system based on a character-level neural network and an SVM comprises:
a corpus processing module, configured to obtain an accident-cause corpus, divide it to obtain a training set and a test set, split the accident-cause corpus into characters, and build a dictionary;
a training module, configured to establish a character-level neural network model, optimize the character-level neural network model with the training set, extract accident text features of the training-set data with the optimized character-level neural network model, and train an SVM model with the extracted accident text features until an optimized SVM model is obtained;
a test module, configured to extract accident text features of the test-set data with the optimized character-level neural network model and input the accident text features into the optimized SVM model, obtain an optimal SVM model if the classification error of the SVM model output is judged to be less than a preset value, and otherwise continue to optimize the SVM model with the training set;
a classification module, configured to split an acquired accident cause to be classified into characters, map the characters obtained by splitting into a multi-dimensional matrix through the dictionary, input the multi-dimensional matrix into the optimized character-level neural network model to extract accident text features, and obtain the accident text classification result from the optimal SVM model using the accident text features.
7. The traffic accident text classification system based on a character-level neural network and an SVM according to claim 6, characterized in that the character-level neural network model is: starting from an input layer I, passing in turn through a convolutional layer C, a pooling layer M, a fully connected layer F1, a fully connected layer F2, and a Softmax layer;
the accident text features are the features output by the fully connected layer F1.
8. The traffic accident text classification system based on a character-level neural network and an SVM according to claim 6, characterized in that the training module trains the SVM model with the extracted accident text features until the optimized SVM model is obtained, performing the following operations:
transforming the accident text features into a high-dimensional space through a kernel function for linear partition, the kernel function being a Gaussian kernel, K(x, x_i) = exp(-||x - x_i||^2 / (2σ^2)),
where x_i is an accident text feature sample, x is the kernel function center, and σ is the width parameter of the function;
using the converted accident text features together with the label of each accident cause in the training set, obtaining the optimal parameters of the SVM model by grid search, thereby completing the optimization of the SVM model;
the labels of the accident causes including: two motor vehicles, motor vehicle and non-motor vehicle, motor vehicle and pedestrian, motor vehicle and fixed object, two non-motor vehicles, non-motor vehicle and pedestrian.
9. The traffic accident text classification system based on a character-level neural network and an SVM according to claim 8, characterized in that, if the test module judges that the classification error of the SVM model output is less than the preset value, the optimal SVM model is obtained, and otherwise the SVM model continues to be optimized with the training set, performing the following operations:
defining a confusion matrix;
calculating the accuracy and recall according to the confusion matrix, the classification results output by the SVM model, and each accident cause and its corresponding label in the test set;
if accuracy > 95% and recall > 0.9, ending the training and outputting the current SVM model as the optimal SVM model; otherwise, continuing to optimize the SVM model with the training set.
10. The traffic accident text classification system based on a character-level neural network and an SVM according to claim 6, characterized in that the classification module splits the acquired accident cause to be classified into characters and maps the characters obtained by splitting into a multi-dimensional matrix through the dictionary, performing the following operations:
splitting the accident cause to be classified into N characters, the dictionary containing M characters; mapping each character in the dictionary to an M-dimensional vector, mapping each character in the accident cause to be classified to its M-dimensional vector, and obtaining an N*M matrix.
CN201910334271.3A 2019-04-24 2019-04-24 Traffic accident file classification method and system based on character level neural network and SVM Pending CN110110085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910334271.3A CN110110085A (en) 2019-04-24 2019-04-24 Traffic accident file classification method and system based on character level neural network and SVM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910334271.3A CN110110085A (en) 2019-04-24 2019-04-24 Traffic accident file classification method and system based on character level neural network and SVM

Publications (1)

Publication Number Publication Date
CN110110085A true CN110110085A (en) 2019-08-09

Family

ID=67486560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910334271.3A Pending CN110110085A (en) 2019-04-24 2019-04-24 Traffic accident file classification method and system based on character level neural network and SVM

Country Status (1)

Country Link
CN (1) CN110110085A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291552A (en) * 2020-05-09 2020-06-16 支付宝(杭州)信息技术有限公司 Method and system for correcting text content
CN111599170A (en) * 2020-04-13 2020-08-28 浙江工业大学 Traffic running state classification method based on time sequence traffic network diagram
CN112115965A (en) * 2020-08-04 2020-12-22 西安交通大学 SVM-based passive operating system identification method, storage medium and equipment
CN112395528A (en) * 2019-08-13 2021-02-23 阿里巴巴集团控股有限公司 Text label distinguishing method and device, electronic equipment and storage medium
CN113341894A (en) * 2021-05-27 2021-09-03 河钢股份有限公司承德分公司 Accident rule data generation method and device and terminal equipment
CN113592040A (en) * 2021-09-27 2021-11-02 山东蓝湾新材料有限公司 Method and device for classifying dangerous chemical accidents
CN114157411A (en) * 2021-11-29 2022-03-08 中信数智(武汉)科技有限公司 Grouping encryption identification method based on LeNet5-SVM

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301246A (en) * 2017-07-14 2017-10-27 河北工业大学 Chinese Text Categorization based on ultra-deep convolutional neural networks structural model
CN107895000A (en) * 2017-10-30 2018-04-10 昆明理工大学 A kind of cross-cutting semantic information retrieval method based on convolutional neural networks
CN108710967A (en) * 2018-04-19 2018-10-26 东南大学 Expressway traffic accident Severity forecasting method based on data fusion and support vector machines
CN108920586A (en) * 2018-06-26 2018-11-30 北京工业大学 A kind of short text classification method based on depth nerve mapping support vector machines
CN109446332A (en) * 2018-12-25 2019-03-08 银江股份有限公司 A kind of people's mediation case classification system and method based on feature migration and adaptive learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301246A (en) * 2017-07-14 2017-10-27 河北工业大学 Chinese Text Categorization based on ultra-deep convolutional neural networks structural model
CN107895000A (en) * 2017-10-30 2018-04-10 昆明理工大学 A kind of cross-cutting semantic information retrieval method based on convolutional neural networks
CN108710967A (en) * 2018-04-19 2018-10-26 东南大学 Expressway traffic accident Severity forecasting method based on data fusion and support vector machines
CN108920586A (en) * 2018-06-26 2018-11-30 北京工业大学 A kind of short text classification method based on depth nerve mapping support vector machines
CN109446332A (en) * 2018-12-25 2019-03-08 银江股份有限公司 A kind of people's mediation case classification system and method based on feature migration and adaptive learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
QU HUA et al.: "A Character-Level Method for Text Classification", 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference *
YULING CHEN et al.: "Research on text sentiment analysis based on CNNs and SVM", 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA) *
ZHIQUAN WANG et al.: "Research on Web text classification algorithm based on improved CNN and SVM", 2017 IEEE 17th International Conference on Communication Technology (ICCT) *
LIU Jingxue et al.: "Character-level convolutional neural network short text classification algorithm", Computer Engineering and Applications *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395528A (en) * 2019-08-13 2021-02-23 阿里巴巴集团控股有限公司 Text label distinguishing method and device, electronic equipment and storage medium
CN112395528B (en) * 2019-08-13 2022-10-21 阿里巴巴集团控股有限公司 Text label distinguishing method and device, electronic equipment and storage medium
CN111599170A (en) * 2020-04-13 2020-08-28 浙江工业大学 Traffic running state classification method based on time sequence traffic network diagram
CN111599170B (en) * 2020-04-13 2021-12-17 浙江工业大学 Traffic running state classification method based on time sequence traffic network diagram
CN111291552A (en) * 2020-05-09 2020-06-16 支付宝(杭州)信息技术有限公司 Method and system for correcting text content
CN112115965A (en) * 2020-08-04 2020-12-22 西安交通大学 SVM-based passive operating system identification method, storage medium and equipment
CN113341894A (en) * 2021-05-27 2021-09-03 河钢股份有限公司承德分公司 Accident rule data generation method and device and terminal equipment
CN113592040A (en) * 2021-09-27 2021-11-02 山东蓝湾新材料有限公司 Method and device for classifying dangerous chemical accidents
CN114157411A (en) * 2021-11-29 2022-03-08 中信数智(武汉)科技有限公司 Grouping encryption identification method based on LeNet5-SVM
CN114157411B (en) * 2021-11-29 2024-04-05 中信数智(武汉)科技有限公司 LeNet 5-SVM-based packet encryption identification method

Similar Documents

Publication Publication Date Title
CN110110085A (en) Traffic accident file classification method and system based on character level neural network and SVM
CN110298037A (en) The matched text recognition method of convolutional neural networks based on enhancing attention mechanism
CN107273490B (en) Combined wrong question recommendation method based on knowledge graph
CN107133960A (en) Image crack dividing method based on depth convolutional neural networks
CN106156003B (en) A kind of question sentence understanding method in question answering system
CN105975931A (en) Convolutional neural network face recognition method based on multi-scale pooling
CN110189334A (en) The medical image cutting method of the full convolutional neural networks of residual error type based on attention mechanism
CN109299262A (en) A kind of text implication relation recognition methods for merging more granular informations
CN106570513A (en) Fault diagnosis method and apparatus for big data network system
CN106650725A (en) Full convolutional neural network-based candidate text box generation and text detection method
CN107578106A (en) A kind of neutral net natural language inference method for merging semanteme of word knowledge
CN108009285A (en) Forest Ecology man-machine interaction method based on natural language processing
CN106934352A (en) A kind of video presentation method based on two-way fractal net work and LSTM
CN109992779A (en) A kind of sentiment analysis method, apparatus, equipment and storage medium based on CNN
CN110490242A (en) Training method, eye fundus image classification method and the relevant device of image classification network
CN110517790B (en) Compound hepatotoxicity early prediction method based on deep learning and gene expression data
Montalbo et al. Classification of fish species with augmented data using deep convolutional neural network
CN112990296A (en) Image-text matching model compression and acceleration method and system based on orthogonal similarity distillation
CN105740227A (en) Genetic simulated annealing method for solving new words in Chinese segmentation
CN106372630A (en) Face direction detection method based on deep learning
CN107578092A (en) A kind of emotion compounding analysis method and system based on mood and opinion mining
CN106970981A (en) A kind of method that Relation extraction model is built based on transfer matrix
CN110009030A (en) Sewage treatment method for diagnosing faults based on stacking meta learning strategy
CN113947161A (en) Attention mechanism-based multi-label text classification method and system
CN109918649A (en) A kind of suicide Risk Identification Method based on microblogging text

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20200424

Address after: 314501 room 116, floor 1, building 2, No. 87 Hexi, Changfeng street, Wuzhen Town, Tongxiang City, Jiaxing City, Zhejiang Province

Applicant after: Zhejiang Haikang Zhilian Technology Co.,Ltd.

Address before: Yuhang District, Hangzhou City, Zhejiang Province, 311121 West No. 1500 Building 1 room 311

Applicant before: CETHIK GROUP Co.,Ltd.

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20190809