CN110134963A - Method for applying text mining to road traffic accident data processing - Google Patents

Method for applying text mining to road traffic accident data processing

Info

Publication number
CN110134963A
Authority
CN
China
Prior art keywords
data
text
accident
traffic accident
road traffic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910418287.2A
Other languages
Chinese (zh)
Inventor
黄合来
周汉楚
潘震宇
张子钰
秦炜志
张馨尹
丁雨童
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN201910418287.2A
Publication of CN110134963A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for applying text mining to road traffic accident data processing. Chinese word segmentation is performed on road traffic accident data samples, the sample data set is vectorized into a three-dimensional representation by a word embedding model, a TextCNN large-scale text classification network is then built on a convolutional neural network (CNN) to construct the model, and key traffic information is output. The invention processes traffic accident record text using natural language processing techniques and combines the Python and C++ languages to develop an accident causation repair system that can process accident data records automatically and in batches, effectively improving accident data quality. The accident causation repair system is easy to operate, efficient in processing, intuitive in its presentation of information, and accurate in causation analysis. The text data are fully exploited and made convenient to use during model building. The invention is inexpensive to implement, effectively overcomes the shortcomings of structured road traffic accident records in China, and can repair traffic accident data efficiently and accurately.

Description

Method for applying text mining to road traffic accident data processing
Technical field
The present invention relates to the technical field of road accident processing, and in particular to a method for applying text mining to road traffic accident data processing.
Background art
Driven by the national strategy of building a country with strong transportation, road traffic in China has entered a period of transition from rapid growth to high-quality development, and traffic safety has received increasing attention. Traffic accident data are the core data source for traffic safety research and provide the basic information that supports road safety improvement. In recent years, the integrated application platform for public security traffic management (the "six-in-one" platform) has been fully deployed, effectively raising the level of informatization of traffic accident handling and historical accident archiving. However, according to the 2014 annual statistical report on road traffic accidents of the People's Republic of China, in the statistics on the main causes of accidents in mainland China, motor vehicle violations with an unclear cause, recorded as "other behavior affecting safety", account for 43% of accident causes. This seriously reduces the usability of the accident data and sharply degrades the accuracy and effectiveness of efforts to analyze and improve road traffic safety in China. By comparison, in the accident cause statistics for Hong Kong published by the Hong Kong Transport Department, violations in the "other" category account for only 13%. Notably, the "six-in-one" platform stores the textual accident descriptions recorded by traffic police when accidents occur, which describe the accident scene in detail. However, because natural language is unstructured, the useful information contained in accident text cannot be extracted directly in batches and is difficult to incorporate into the road safety data system.
In previous traffic accident research, there have been two main approaches to incomplete accident data. (1) Apply statistical methods that can handle data heterogeneity, using flexible, generalized parameters to reduce the parameter estimation bias caused by incompletely recorded information; these include random parameter models, latent variable models, multi-factor structural models, and the introduction of spatial structure. Such methods are simple to apply and require no new data collection, but they can hardly correct the missing information completely, cannot quantify its influence, and yield results that depend heavily on model selection. (2) Carry out in-depth accident investigation, using available video data, trace data, witness statements, and other original material together with accident reconstruction techniques to probe the mechanism of the accident. Such methods can fully reconstruct the accident scene, but they are suitable only for the analysis of small samples of cases with complete original data.
At present, a small number of studies have used text mining to predict the duration of traffic events (congestion, accidents, etc.), but basic research such as analyzing traffic accident information and improving data quality based on text mining has not yet been carried out.
In summary, the quality of traffic accident data urgently needs improvement, and existing solutions to the data incompleteness problem have shortcomings: their repair capability is unreliable, and they depend on extensive original data for each individual accident, which is difficult to obtain in full. Meanwhile, the accident description text stored in the "six-in-one" platform has yet to be put to effective use. Therefore, repairing accident data with the text mining technique proposed by the present invention offers a new and effective solution to the accident data repair problem.
Summary of the invention
The present invention aims to solve at least the technical problems existing in the prior art. To this end, the invention discloses a method for applying text mining to road traffic accident data processing: Chinese word segmentation is performed on road traffic accident data samples, the sample data set is vectorized into a three-dimensional representation by a word embedding model, a TextCNN large-scale text classification network is then built on a convolutional neural network (CNN) to construct the model, and key traffic information is output.
Further, performing Chinese word segmentation on the road traffic accident data samples comprises: on the basis of the general-purpose corpus of the open-source library jieba, importing a traffic safety corpus as a custom dictionary according to the characteristics of the scenario, segmenting the samples, and then removing stop words, which deletes text irrelevant to determining liability and strengthens disambiguation.
Further, vectorizing the sample data set into three dimensions by the word embedding model further comprises: in view of the short-text nature of road traffic accident descriptions, stacking the row vectors corresponding to each word of a text vertically, so that the text is represented by a two-dimensional matrix (x, y) in which the features of every word are preserved; after all texts have been converted into two-dimensional matrices, stacking the text planes along a further axis, so that the entire data set is represented by a three-dimensional matrix (x, y, z); and, where the texts differ in length, padding with 0 to ensure a three-dimensional matrix of consistent structure.
Further, building the TextCNN large-scale text classification network on the convolutional neural network (CNN) to construct the model further comprises: partitioning the data set; binarizing the label value y for model training and prediction; then shuffling the data and dividing them into a training data set and a test data set according to a preset ratio; modeling with the Keras framework using TensorFlow as the backend, building a Conv1D convolutional layer, a global max pooling layer, a Dropout layer to prevent overfitting, and an output layer; and, after training on the training data set for 200 epochs, testing the model on the test data set to obtain the model performance, accuracy, and error.
Further, in dividing the training data set and the test data set according to the preset ratio, the preset ratio is 8:2.
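The label binarization, shuffling, and 8:2 split described above can be sketched with the Python standard library alone. This is a minimal illustration rather than the patented implementation: the function names, the fixed random seed, and the placeholder records are assumptions, and the label set simply reuses the four cause categories on which the model is trained.

```python
import random

# The four trained cause categories; the sample records are placeholders.
CLASSES = ["state", "speed", "steering", "spacing"]


def binarize(label):
    """One-hot encode a category name, e.g. "speed" -> [0, 1, 0, 0]."""
    return [1 if label == c else 0 for c in CLASSES]


def shuffle_and_split(samples, labels, train_ratio=0.8, seed=0):
    """Shuffle (sample, label) pairs together, then split by train_ratio."""
    pairs = list(zip(samples, labels))
    random.Random(seed).shuffle(pairs)  # fixed seed: an assumption, for repeatability
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]


samples = [f"record-{i}" for i in range(10)]
labels = [binarize(CLASSES[i % 4]) for i in range(10)]
train, test = shuffle_and_split(samples, labels)
print(len(train), len(test))  # 8 2
```

Shuffling the pairs together keeps every record attached to its own label, which is the property that matters before any split.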
Further, accident causes are divided into five classes: four classes of causes, namely state factors, speed factors, steering factors, and spacing factors, are set as the model categories, and data whose classification is unclear are labeled "other". Before model training, all data labeled "other" are first extracted; after the model has been trained on the training set, the unclearly classified data are fed into the model for prediction to obtain the output category determined by the model.
Further, data visualization is performed to generate a geographic map that shows traffic accident frequency with different levels of highlighting, presenting the results of the traffic accident big data analysis visually.
Further, relevant functions are used in Excel to extract the accident location from the detailed record of each accident into a column; the table is then imported into Tableau as the accident data table to be joined with a geographic information database; Tableau joins the accident source data table with its built-in geographic information database, yielding a regional heat map of traffic accident property loss and accident scale.
Further, a traffic accident causation repair system is developed in a Java environment, which extracts the key information from traffic accident text records and repairs accident causes recorded as "other"; when a traffic accident text record is retrieved in the system, the system structures the accident text record and finally outputs the key traffic information.
Further, the key traffic information includes: final classification, accident time, driver name, license plate number, vehicle type, accident location, loss, and whether the accident is state-related.
Under conventional technology, traffic accident text records are highly unstructured data and are difficult to use effectively. Compared with the prior art, the invention processes traffic accident record text using natural language processing techniques and combines the Python and C++ languages to develop an accident causation repair system that can process accident data records automatically and in batches, effectively improving accident data quality; the accident causation repair system is easy to operate, efficient in processing, intuitive in its presentation of information, and accurate in causation analysis.
Because Chinese words cannot be identified by spaces or other delimiters, a traffic safety corpus compiled from the relevant laws, regulations, and implementing rules is added, on top of the corpus of the open-source library jieba itself, as a custom dictionary; the samples are then segmented and stop words are removed, which strengthens disambiguation and ensures segmentation accuracy.
The text is vectorized by a word2vec model, and the entire data set is represented by a three-dimensional matrix in which the features of each word are well preserved. This overcomes the limitation imposed by the unstructured nature of text data, makes full use of the text data, and facilitates their use during model building.
The invention is inexpensive to implement, effectively overcomes the shortcomings of structured road traffic accident records in China, and can repair traffic accident data efficiently and accurately. In addition, this system is the first in China to apply text mining to the repair of accident data, and the text analysis techniques applied have reached an internationally advanced level.
Brief description of the drawings
The invention will be further understood from the following description taken in conjunction with the accompanying drawings. The components in the figures are not necessarily drawn to scale; emphasis is instead placed on illustrating the principles of the embodiments. In the different views, the same reference numerals designate corresponding parts.
Fig. 1 is a diagram of the sample three-dimensional matrix data set of the present invention;
Fig. 2 is a regional heat map of accident scale in one embodiment of the present invention;
Fig. 3 is an example of the processing interface of the accident causation repair system in one embodiment of the present invention;
Fig. 4 is an example of the output interface of the accident causation repair system in one embodiment of the present invention.
Detailed description of the embodiments
Embodiment one
In this embodiment, the data are first processed and the model is constructed. The invention uses Python as the main language: the samples are segmented with the open-source library jieba, the data set is then vectorized into three dimensions with a word2vec model, and finally a TextCNN network is built on a convolutional neural network (CNN) to implement the model.
1.1 Chinese word segmentation
As shown in Figure 1, Chinese word segmentation is the process of regrouping a sequence of Chinese characters into a sequence of words according to a specification for extracting specialized terms. Stop word removal is a key step: when processing natural language data in information retrieval, characters and words that carry no real meaning for the text are filtered out, saving storage space and improving search efficiency.
On the basis of the general-purpose corpus of the open-source library jieba, a traffic safety corpus is imported as a custom dictionary according to the characteristics of the scenario, and the samples are segmented; stop words are then removed, which deletes a large amount of text irrelevant to determining liability and strengthens disambiguation.
In the original data set, the field 'x' is the original accident description, the field 'y' is the finally determined new classification category, and the field 'split' is the result after segmentation.
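The effect of a custom domain dictionary and a stop-word list can be illustrated with a small standard-library sketch. It does not use jieba: a simple forward maximum matching segmenter is substituted here purely to show the idea, and every dictionary entry, stop word, and the sample sentence are made up for illustration.

```python
# Illustrative only: forward maximum matching, NOT jieba's algorithm.
# All dictionary entries, stop words and the sample text are hypothetical.
GENERAL_DICT = {"车辆", "行驶", "发生", "事故"}
TRAFFIC_DICT = {"追尾", "未保持安全距离", "超速行驶"}  # custom domain dictionary
STOP_WORDS = {"的", "了", "，", "。", "因"}


def segment(text, dictionary, max_len=8):
    """Split text by greedily matching the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def remove_stop_words(words, stop_words):
    """Drop tokens that carry no meaning for the classification task."""
    return [w for w in words if w not in stop_words]


text = "车辆因未保持安全距离发生追尾事故。"
without_custom = segment(text, GENERAL_DICT)
with_custom = segment(text, GENERAL_DICT | TRAFFIC_DICT)
split = remove_stop_words(with_custom, STOP_WORDS)
print(split)  # ['车辆', '未保持安全距离', '发生', '追尾', '事故']
```

Without the domain dictionary, the term "未保持安全距离" falls apart into single characters, which is the kind of loss the custom traffic safety corpus is meant to prevent.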
1.2 Constructing the three-dimensional vector data set
The original data set is a conventional two-dimensional table, with one dimension representing features and the other representing the number of records. This form only allows vector averaging, in which a single row vector represents an entire text; the feature attributes become overly abstract, so the results are poor.
Given the short-text nature of this scenario, if the row vectors corresponding to each word of a text are stacked vertically, the text can be represented by a matrix (x, y) in which the features of every word are well preserved, which works better than averaging.
Because the data set contains many texts, a further dimension z is needed. If each text is represented by a plane, that is, once every text has been converted into a two-dimensional matrix, the text planes are stacked along a further axis, so that the entire data set can be represented by a three-dimensional matrix (x, y, z). Where texts differ in length, they can be padded with 0 to ensure a three-dimensional matrix of consistent structure.
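The stacking and zero padding can be sketched as follows, using nested Python lists. The tiny embedding table stands in for a trained word2vec model; its words and two-dimensional vectors are invented, and mapping unknown words to a zero vector is an assumption of this sketch rather than something the text specifies.

```python
# Illustrative only: EMBEDDING stands in for a trained word2vec model.
EMBEDDING = {
    "车辆": [0.1, 0.2],
    "追尾": [0.3, 0.4],
    "超速": [0.5, 0.6],
    "事故": [0.7, 0.8],
}
DIM = 2


def text_to_matrix(words):
    """One text -> (num_words, DIM); unknown words become zero rows (assumption)."""
    return [list(EMBEDDING.get(w, [0.0] * DIM)) for w in words]


def build_dataset(texts):
    """Pad every text matrix with zero rows to equal length, then stack."""
    max_words = max(len(t) for t in texts)
    dataset = []
    for words in texts:
        matrix = text_to_matrix(words)
        matrix += [[0.0] * DIM for _ in range(max_words - len(words))]
        dataset.append(matrix)
    return dataset


texts = [["车辆", "追尾", "事故"], ["超速"]]
data = build_dataset(texts)
shape = (len(data), len(data[0]), len(data[0][0]))
print(shape)  # (2, 3, 2): texts, words per text, embedding size
```

Each word keeps its own row, so nothing is averaged away; the padding rows only make the planes the same size so they can be stacked.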
1.3 Building the model
The data set is partitioned and the label value y is binarized for model training and prediction; the data are then shuffled and divided into a training data set and a test data set in a ratio of 8:2. Modeling uses the Keras framework with TensorFlow as the backend, building a Conv1D convolutional layer, a global max pooling layer, a Dropout layer to prevent overfitting, and an output layer. After training on the training data set for 200 epochs, the model is tested on the test data set to obtain the model performance, accuracy, and error.
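To make the roles of the Conv1D layer and the global max pooling layer concrete, here is a minimal pure-Python sketch of a forward pass over one text matrix. It is not the Keras model and involves no training: the single filter, its weights, the input values, and the use of a ReLU activation are all assumptions chosen for illustration.

```python
def conv1d(matrix, kernel, bias=0.0):
    """Slide a (k, dim) kernel down the word axis, one output per window."""
    k = len(kernel)
    outputs = []
    for start in range(len(matrix) - k + 1):
        total = bias
        for row, kernel_row in zip(matrix[start:start + k], kernel):
            total += sum(x * w for x, w in zip(row, kernel_row))
        outputs.append(max(0.0, total))  # ReLU activation (assumed)
    return outputs


def global_max_pool(feature_map):
    """Keep only the strongest response of the filter over the whole text."""
    return max(feature_map)


# One text of four words with 2-dimensional word vectors (made-up values).
text_matrix = [[0.5, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 0.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # one filter spanning two consecutive words

feature_map = conv1d(text_matrix, kernel)
feature = global_max_pool(feature_map)
print(feature_map, feature)  # [2.5, 1.0, 3.0] 3.0
```

The convolution responds to short word windows, and global max pooling reduces each filter to one number regardless of text length, which is why zero-padded texts of different lengths can share one classifier.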
The invention divides accident causes into five classes. Class 1 is "other", that is, data whose classification is unclear; the remaining four classes, 2 (state factors), 3 (speed factors), 4 (steering factors), and 5 (spacing factors), are set as the model categories. Before model training, the class 1 data, which account for about 1/3 of all accident cause data, are first extracted in full. After the model has been trained on the training set, the class 1 data are fed into the model for prediction to obtain the output category determined by the model. On inspection, the test accuracy is relatively high.
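The hold-out-and-repair workflow can be sketched as below. The class ids follow the text (1 is "other", 2 to 5 are the trained categories); the records, the function names, and above all the keyword rule that stands in for the trained TextCNN are hypothetical.

```python
# Class ids follow the text; everything else here is illustrative.
CLASS_NAMES = {1: "other", 2: "state", 3: "speed", 4: "steering", 5: "spacing"}


def split_by_clarity(records):
    """Separate class 1 records (to be repaired) from labeled training data."""
    unclear = [r for r in records if r["y"] == 1]
    labeled = [r for r in records if r["y"] != 1]
    return labeled, unclear


def repair(unclear, predict):
    """Replace the 'other' label with the category returned by predict."""
    return [{**r, "y": predict(r["x"])} for r in unclear]


def toy_predict(text):
    # Stand-in for the trained model: a keyword rule, for illustration only.
    return 3 if "超速" in text else 5


records = [
    {"x": "疲劳驾驶", "y": 2},
    {"x": "超速行驶", "y": 3},
    {"x": "超速后追尾", "y": 1},
    {"x": "未保持安全距离", "y": 1},
]
labeled, unclear = split_by_clarity(records)
repaired = repair(unclear, toy_predict)
print([CLASS_NAMES[r["y"]] for r in repaired])  # ['speed', 'spacing']
```

Keeping the class 1 records out of training means the model never learns "other" as a category, so every repaired record is assigned one of the four substantive causes.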
2 Data visualization
As shown in Fig. 2, this embodiment uses Tableau to visualize data on accidents involving passenger vehicles with 7 or more seats on Hunan Province highways from 2012 to 2018, generating a geographic map that shows the traffic accident frequency of each city in Hunan Province with different levels of highlighting and presenting the results of the traffic accident big data analysis visually.
The data source for the visualization is seven years of traffic accident record text from Hunan Province. Relevant functions are used in Excel to extract, from the detailed record of each accident, the accident location and the corresponding prefecture-level city of Hunan Province into columns. The table is then imported into Tableau as the accident data table to be joined with a geographic information database, and Tableau joins the accident source data table with its built-in geographic information database, yielding regional heat maps of traffic accident property loss and accident scale in Hunan Province.
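The text performs the city extraction in Excel and the mapping in Tableau. As a rough Python equivalent of the extraction and counting step only, the sketch below pulls a city name out of each location string and tallies accidents per city. The regular expression and the sample location strings are assumptions; real records would likely need a lookup against a list of actual city names.

```python
import re
from collections import Counter

# Matches a city name ending in "市" at the start of a location string,
# optionally preceded by "湖南省". Pattern and samples are hypothetical.
CITY_PATTERN = re.compile(r"(?:湖南省)?([\u4e00-\u9fa5]{2,3}市)")


def extract_city(location):
    """Return the leading city name, or None if no city can be found."""
    match = CITY_PATTERN.match(location)
    return match.group(1) if match else None


locations = [
    "湖南省长沙市岳麓区某路段",
    "长沙市某高速",
    "株洲市某路段",
    "地点不详",
]
cities = [extract_city(loc) for loc in locations]
counts = Counter(c for c in cities if c is not None)
print(dict(counts))  # {'长沙市': 2, '株洲市': 1}
```

A per-city count table of this kind is the sort of input a mapping tool needs in order to shade each region by accident frequency.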
Fig. 2 makes the areas of Hunan Province with a high incidence of traffic accidents clearly visible. In future work, the causes of accidents in these areas will be further classified and analyzed to construct and identify high-risk traffic accident scenarios for different regions and to plan and improve local transport facilities in a targeted manner.
3 Development of the accident causation repair system
As shown in Figs. 3 and 4, which show the front-end interface of the system, a traffic accident causation repair system is developed in a Java environment. It structures natural language text, extracts the key information from traffic accident text records, and repairs accident causes recorded as "other". When a traffic accident text record is retrieved in the system, the system parses, that is, structures, the accident text record and can finally output key information such as "final classification", "accident time", "driver name", "license plate number", "vehicle type", "accident location", "loss", and "whether state-related". Where the cause recorded in historical accident data is incomplete, the system can accurately repair it and improve data quality; it can also be applied to the causes of future traffic accident data, enhancing data completeness and usability.
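How a free-text record might be turned into structured fields can be sketched with regular expressions. This is not the repair system itself, which the text describes only at the level of its inputs and outputs; the sample record, the field names, and the patterns are all hypothetical, and only three of the listed fields are covered.

```python
import re

# Hypothetical patterns for three of the key fields.
PATTERNS = {
    "time": re.compile(r"\d{4}年\d{1,2}月\d{1,2}日"),
    "plate": re.compile(r"[\u4e00-\u9fa5][A-Z][A-Z0-9]{5}"),
    "loss": re.compile(r"损失(\d+)元"),
}


def extract(record):
    """Return a dict of field name -> matched text (None when absent)."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(record)
        if match is None:
            fields[name] = None
        else:
            # Use the capture group when the pattern has one, else the whole match.
            fields[name] = match.group(match.lastindex or 0)
    return fields


record = "2016年3月5日，张某驾驶湘A12345号大型客车超速行驶发生追尾，损失5000元。"
info = extract(record)
print(info)  # {'time': '2016年3月5日', 'plate': '湘A12345', 'loss': '5000'}
```

Fields with a regular surface form, such as dates and plate numbers, suit this kind of rule; the cause category itself is the part the text assigns to the trained classifier.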
From the classification of main traffic accident causes in the annual statistical report on road traffic accidents, the causes with the largest numbers of cases are failure to yield as required, driving without a license, drunk driving, illegal passing of oncoming vehicles, and so on. Further analysis shows, however, that some of these causes (driving without a license, drunk driving, fatigue driving) are only driving state factors and not the direct causes of the accidents. Mixing direct causes with driving state factors in this way makes accident cause analysis and traffic safety improvement work unreliable.
Embodiment two
A method for applying text mining to road traffic accident data processing: Chinese word segmentation is performed on road traffic accident data samples, the sample data set is vectorized into a three-dimensional representation by a word embedding model, a TextCNN large-scale text classification network is then built on a convolutional neural network (CNN) to construct the model, and key traffic information is output.
Further, performing Chinese word segmentation on the road traffic accident data samples comprises: on the basis of the general-purpose corpus of the open-source library jieba, importing a traffic safety corpus as a custom dictionary according to the characteristics of the scenario, segmenting the samples, and then removing stop words, which deletes text irrelevant to determining liability and strengthens disambiguation.
Further, vectorizing the sample data set into three dimensions by the word embedding model further comprises: in view of the short-text nature of road traffic accident descriptions, stacking the row vectors corresponding to each word of a text vertically, so that the text is represented by a two-dimensional matrix (x, y) in which the features of every word are preserved; after all texts have been converted into two-dimensional matrices, stacking the text planes along a further axis, so that the entire data set is represented by a three-dimensional matrix (x, y, z); and, where the texts differ in length, padding with 0 to ensure a three-dimensional matrix of consistent structure.
Further, building the TextCNN large-scale text classification network on the convolutional neural network (CNN) to construct the model further comprises: partitioning the data set; binarizing the label value y for model training and prediction; then shuffling the data and dividing them into a training data set and a test data set according to a preset ratio; modeling with the Keras framework using TensorFlow as the backend, building a Conv1D convolutional layer, a global max pooling layer, a Dropout layer to prevent overfitting, and an output layer; and, after training on the training data set for 200 epochs, testing the model on the test data set to obtain the model performance, accuracy, and error.
Further, in dividing the training data set and the test data set according to the preset ratio, the preset ratio is 8:2.
Further, accident causes are divided into five classes: four classes of causes, namely state factors, speed factors, steering factors, and spacing factors, are set as the model categories, and data whose classification is unclear are labeled "other". Before model training, all data labeled "other" are first extracted; after the model has been trained on the training set, the unclearly classified data are fed into the model for prediction to obtain the output category determined by the model.
Further, data visualization is performed to generate a geographic map that shows traffic accident frequency with different levels of highlighting, presenting the results of the traffic accident big data analysis visually.
Further, relevant functions are used in Excel to extract the accident location from the detailed record of each accident into a column; the table is then imported into Tableau as the accident data table to be joined with a geographic information database; Tableau joins the accident source data table with its built-in geographic information database, yielding a regional heat map of traffic accident property loss and accident scale.
Further, a traffic accident causation repair system is developed in a Java environment, which extracts the key information from traffic accident text records and repairs accident causes recorded as "other"; when a traffic accident text record is retrieved in the system, the system structures the accident text record and finally outputs the key traffic information.
Further, the key traffic information includes: final classification, accident time, driver name, license plate number, vehicle type, accident location, loss, and whether the accident is state-related.
In this embodiment, the method exploratorily proposes a redesign of the variables and uses the model and text processing methods to recover the direct cause of accidents under different states. In total, 9 states are identified, such as being in poor condition and fatigue driving, and for accidents whose cause was previously recorded as "state factors" the direct cause is found and counted. This distinguishes the direct cause of an accident from state factors, reduces estimation bias, and effectively improves the robustness of safety analysis results. On the basis of the statistical analysis of states, traffic safety analysis can accurately characterize driver behavior under different states, so that reasonable measures can be taken to reduce accidents.
It should also be noted that the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
Those skilled in the art will understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
Although the invention has been described above with reference to various embodiments, it should be understood that many changes and modifications can be made without departing from the scope of the invention. The foregoing detailed description is therefore intended to be regarded as illustrative rather than limiting, and it should be understood that the following claims (including all equivalents) are intended to define the spirit and scope of the invention. The above embodiments should be understood as merely illustrating the invention and not limiting its scope of protection. After reading this disclosure, a skilled person may make various changes or modifications to the invention, and such equivalent changes and modifications likewise fall within the scope defined by the claims of the invention.

Claims (10)

1. a kind of text mining is applied to the method for road traffic accident data processing, which is characterized in that road traffic accident Data sample carries out Chinese word segmentation, by word incorporation model by the sample data set three-dimensional vector, then passes through neural network CNN builds large-scale text categorization network TextCNN network struction model, exports crucial traffic information.
2. a kind of text mining as described in claim 1 is applied to the method for road traffic accident data processing, feature exists In described includes: in the pervasive corpus of library jieba itself of increasing income to road traffic accident data sample progress Chinese word segmentation On the basis of, it according to scene feature, imports traffic safety corpus and sample is segmented as customized dictionary, then remove and stop Word is left out and is sentenced the unallied text of duty, enhances ambiguity error correcting capability.
3. a kind of text mining as claimed in claim 2 is applied to the method for road traffic accident data processing, feature exists In described to further comprise by the sample data set three-dimensional vector by word incorporation model: according to road traffic accident field The corresponding row vector of each of the phrase of text word is carried out splicing up and down by the characteristics of scape short text, and the text is with one A two-dimensional matrix (x, y) indicates that the feature of each word saves in a matrix, after converting two-dimensional matrix for all texts, Multiple text planes are stacked, carry out splicing up and down in a vertical direction, indicate entire number with a stereoscopic three-dimensional matrix (x, y, z) According to collection, if text number differs, with 0 filling, the consistent three-dimensional matrice of structure is obtained with guarantee.
4. The method for applying text mining to road traffic accident data processing as claimed in claim 3, characterized in that building the large-scale text classification network (TextCNN) model with the convolutional neural network (CNN) further comprises: partitioning the data set; binarizing the label values y, which are to be predicted by the trained model; then shuffling the data and dividing it into a training data set and a test data set according to a preset ratio; modeling with the Keras framework using TensorFlow as the backend, constructing a Conv1D convolutional layer, a global max pooling layer, a Dropout layer to prevent overfitting, and an output layer; and, after training on the training data set for 200 epochs, testing the model on the test data set to obtain the model's performance, accuracy, and error.
5. The method for applying text mining to road traffic accident data processing as claimed in claim 4, characterized in that, in dividing the training data set and the test data set according to a preset ratio, the preset ratio is 8:2.
6. The method for applying text mining to road traffic accident data processing as claimed in claim 5, characterized in that accident causes are divided into five classes: four cause categories (status factor, speed factor, steering factor, and spacing factor) are set as the model classes, and data whose classification is not yet clear are set as "other"; before model training, all data of class "other" are first extracted; after the model has been trained with the training set, the data whose classification is not yet clear are fed into the model for prediction, and the class output judged by the model is obtained.
7. The method for applying text mining to road traffic accident data processing as claimed in claim 6, characterized in that data visualization is performed, generating a geographic area map that shows traffic accident frequency with different highlighting, so as to represent the results of the traffic accident big data analysis visually.
8. The method for applying text mining to road traffic accident data processing as claimed in claim 7, characterized in that relevant functions are used in Excel to extract the place where the accident occurred from the detail column of each accident record; the table is then imported into Tableau as the accident data table to be joined with a geographic information database; and Tableau joins the accident source data table with its built-in geographic information database, yielding regional distribution heat maps of traffic accident property loss and accident scale.
9. The method for applying text mining to road traffic accident data processing as claimed in claim 8, characterized in that a traffic accident cause repair system is developed which, in a Java environment, extracts the key information in traffic accident text records and repairs accident causes whose determined cause is "other"; when a traffic accident text record is retrieved in the system, the system performs structured processing on the record and finally outputs the key traffic information.
10. The method for applying text mining to road traffic accident data processing as claimed in claim 9, characterized in that the key traffic information includes: final classification, time of the accident, driver name, license plate number, vehicle type, accident scene, loss, and whether the state is involved.
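
The data preparation recited in claims 3, 5 and 6 can be illustrated with a minimal sketch. It is not the applicants' implementation. It assumes each text has already been segmented and embedded as a NumPy matrix with one row per word. The function names (`stack_texts`, `split_known_and_other`, `shuffle_and_split`), the toy dimensions, the English label strings, and the axis order (texts, words, embedding dimension) are all hypothetical choices made for illustration.

```python
import numpy as np


def stack_texts(text_matrices):
    """Stack per-text (words x dim) matrices into one zero-padded 3-D array."""
    max_len = max(m.shape[0] for m in text_matrices)
    dim = text_matrices[0].shape[1]
    data = np.zeros((len(text_matrices), max_len, dim), dtype=np.float32)
    for i, m in enumerate(text_matrices):
        # Shorter texts keep trailing all-zero rows (the "fill with 0" step).
        data[i, : m.shape[0], :] = m
    return data


def split_known_and_other(data, labels, other_label="other"):
    """Pull out every sample labelled "other" before any training happens."""
    labels = np.asarray(labels)
    is_other = labels == other_label
    return data[~is_other], labels[~is_other], data[is_other]


def shuffle_and_split(data, labels, train_ratio=0.8, seed=0):
    """Shuffle the labelled samples, then split them 8:2 by default."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    cut = int(round(train_ratio * len(data)))
    train_idx, test_idx = order[:cut], order[cut:]
    return data[train_idx], labels[train_idx], data[test_idx], labels[test_idx]


# Toy data: 12 texts of varying word count, each word a 4-dimensional vector.
# Real word vectors would come from the word embedding model (assumed here).
rng = np.random.default_rng(42)
lengths = [3, 5, 2, 4, 5, 1, 3, 2, 4, 5, 3, 2]
texts = [rng.normal(size=(n, 4)).astype(np.float32) for n in lengths]
labels = ["speed", "other", "status", "steering", "spacing", "speed",
          "other", "status", "steering", "spacing", "speed", "status"]

tensor = stack_texts(texts)  # shape (12, 5, 4)
known_x, known_y, other_x = split_known_and_other(tensor, labels)
x_train, y_train, x_test, y_test = shuffle_and_split(known_x, known_y)
print(tensor.shape, x_train.shape, x_test.shape, other_x.shape)
```

In this sketch the samples labelled "other" never enter the training or test split. They would only be passed to the trained classifier afterwards, so that the model can propose one of the four cause categories for them.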
CN201910418287.2A 2019-05-20 2019-05-20 A kind of text mining is applied to the method for road traffic accident data processing Pending CN110134963A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910418287.2A CN110134963A (en) 2019-05-20 2019-05-20 A kind of text mining is applied to the method for road traffic accident data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910418287.2A CN110134963A (en) 2019-05-20 2019-05-20 A kind of text mining is applied to the method for road traffic accident data processing

Publications (1)

Publication Number Publication Date
CN110134963A true CN110134963A (en) 2019-08-16

Family

ID=67571364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910418287.2A Pending CN110134963A (en) 2019-05-20 2019-05-20 A kind of text mining is applied to the method for road traffic accident data processing

Country Status (1)

Country Link
CN (1) CN110134963A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807930A (en) * 2019-11-07 2020-02-18 中国联合网络通信集团有限公司 Dangerous vehicle early warning method and device
CN111209472A (en) * 2019-12-24 2020-05-29 中国铁道科学研究院集团有限公司电子计算技术研究所 Railway accident fault association and accident fault reason analysis method and system
CN111914687A (en) * 2020-07-15 2020-11-10 深圳民太安智能科技有限公司 Method for actively identifying accident based on Internet of vehicles
CN112364627A (en) * 2020-10-23 2021-02-12 北京建筑大学 Safety production accident analysis method and device based on text mining, electronic equipment and storage medium
CN112732744A (en) * 2021-01-12 2021-04-30 重庆长安汽车股份有限公司 Method for efficiently processing CIDAS database based on Tcl/Tk and R languages
CN113470357A (en) * 2021-06-30 2021-10-01 中国汽车工程研究院股份有限公司 Road traffic accident information processing system and method
CN113592040A (en) * 2021-09-27 2021-11-02 山东蓝湾新材料有限公司 Method and device for classifying dangerous chemical accidents
CN114999161A (en) * 2022-07-29 2022-09-02 河北博士林科技开发有限公司 Be used for intelligent traffic jam edge management system
CN115100861A (en) * 2022-06-22 2022-09-23 公安部交通管理科学研究所 Drunk driving vehicle identification method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170308790A1 (en) * 2016-04-21 2017-10-26 International Business Machines Corporation Text classification by ranking with convolutional neural networks
CN108280173A (en) * 2018-01-22 2018-07-13 深圳市和讯华谷信息技术有限公司 A kind of key message method for digging, medium and the equipment of non-structured text
CN109410588A (en) * 2018-12-20 2019-03-01 湖南晖龙集团股份有限公司 A kind of traffic accident evolution analysis method based on traffic big data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
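PZYSEERE: "利用word2vec、textCNN、jieba对事故文本多分类及致因修复(三维向量)" [Multi-class classification and cause repair of accident texts using word2vec, textCNN and jieba (three-dimensional vectors)], 《HTTP://C.360WEBCACHE.COM》 *
韦凌翔 等 [Wei Lingxiang et al.]: "诱发道路交通事故的关键因子分析方法研究" [Research on analysis methods for the key factors inducing road traffic accidents], 《交通信息与安全》 [Journal of Transport Information and Safety] *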

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807930A (en) * 2019-11-07 2020-02-18 中国联合网络通信集团有限公司 Dangerous vehicle early warning method and device
CN111209472A (en) * 2019-12-24 2020-05-29 中国铁道科学研究院集团有限公司电子计算技术研究所 Railway accident fault association and accident fault reason analysis method and system
CN111209472B (en) * 2019-12-24 2023-08-18 中国铁道科学研究院集团有限公司电子计算技术研究所 Railway accident fault association and accident fault cause analysis method and system
CN111914687A (en) * 2020-07-15 2020-11-10 深圳民太安智能科技有限公司 Method for actively identifying accident based on Internet of vehicles
CN111914687B (en) * 2020-07-15 2023-11-17 深圳民太安智能科技有限公司 Method for actively identifying accidents based on Internet of vehicles
CN112364627A (en) * 2020-10-23 2021-02-12 北京建筑大学 Safety production accident analysis method and device based on text mining, electronic equipment and storage medium
CN112364627B (en) * 2020-10-23 2023-07-25 北京建筑大学 Text mining-based safety production accident analysis method and device, electronic equipment and storage medium
CN112732744B (en) * 2021-01-12 2023-03-14 重庆长安汽车股份有限公司 Method for efficiently processing CIDAS database based on Tcl/Tk and R languages
CN112732744A (en) * 2021-01-12 2021-04-30 重庆长安汽车股份有限公司 Method for efficiently processing CIDAS database based on Tcl/Tk and R languages
CN113470357A (en) * 2021-06-30 2021-10-01 中国汽车工程研究院股份有限公司 Road traffic accident information processing system and method
CN113592040A (en) * 2021-09-27 2021-11-02 山东蓝湾新材料有限公司 Method and device for classifying dangerous chemical accidents
CN115100861A (en) * 2022-06-22 2022-09-23 公安部交通管理科学研究所 Drunk driving vehicle identification method
CN114999161B (en) * 2022-07-29 2022-10-28 河北博士林科技开发有限公司 Be used for intelligent traffic jam edge management system
CN114999161A (en) * 2022-07-29 2022-09-02 河北博士林科技开发有限公司 Be used for intelligent traffic jam edge management system

Similar Documents

Publication Publication Date Title
CN110134963A (en) A kind of text mining is applied to the method for road traffic accident data processing
CN110019396B (en) Data analysis system and method based on distributed multidimensional analysis
CN103544255B (en) Text semantic relativity based network public opinion information analysis method
CN110781254A (en) Automatic case knowledge graph construction method, system, equipment and medium
CN113656805B (en) Event map automatic construction method and system for multi-source vulnerability information
Banchs Text mining with MATLAB®
CN106202514A (en) Accident based on Agent is across the search method of media information and system
CN103049532A (en) Method for creating knowledge base engine on basis of sudden event emergency management and method for inquiring knowledge base engine
CN114003791B (en) Depth map matching-based automatic classification method and system for medical data elements
CN111899089A (en) Enterprise risk early warning method and system based on knowledge graph
US11841839B1 (en) Preprocessing and imputing method for structural data
CN110888943A (en) Method and system for auxiliary generation of court referee document based on micro-template
KR20100127036A (en) A method for providing idea maps by using classificaion in terms of viewpoints
CN103065009B (en) Intelligent design system and method of traffic sign lines
CN116010612A (en) River basin flood control knowledge graph construction method and device and electronic equipment
CN111680506A (en) External key mapping method and device of database table, electronic equipment and storage medium
CN103793373B (en) Tracking relation recovery method based on syntax
CN112363996B (en) Method, system and medium for establishing physical model of power grid knowledge graph
CN106815320B (en) Investigation big data visual modeling method and system based on expanded three-dimensional histogram
Lesbegueries et al. Associating spatial patterns to text-units for summarizing geographic information
Terblanche et al. Ontology‐based employer demand management
Das et al. Transportation research record articles: A case study of trend mining
CN102436472B (en) Multi- category WEB object extract method based on relationship mechanism
CN113486676B (en) Geological entity semantic relation extraction method and device for geological text
CN113535810B (en) Mining method, device, equipment and medium for traffic violation objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190816