CN110232188A - Automatic text classification method for power grid user fault-repair work orders - Google Patents
- Publication number
- CN110232188A
- Authority
- CN
- China
- Prior art keywords
- text
- training
- power grid
- classification
- term vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to an automatic text classification method for power grid user fault-repair work orders. First, the text data set of power grid user fault-repair work orders is segmented into words, stop words are filtered out, and word vectors are trained. The word vectors are trained with Word2vec, which learns a distributed representation of text, so that each text is vectorized and represented by word vectors. The trained word vectors are then loaded into a convolutional neural network for feature extraction and model training. The network finally outputs a probability distribution over the labels, from which the classification result is obtained. Compared with traditional text classification methods, the method significantly improves accuracy and greatly reduces classification time. The method can also help power grid departments allocate emergency repair resources more reasonably, greatly shorten fault repair time, and improve the service quality of power grid departments.
Description
Technical field
The present invention relates to text classification technology, and in particular to an automatic text classification method for power grid user fault-repair work orders.
Background
In recent years, the economy and informatization have developed rapidly. Power grids have kept expanding and the number of users has kept growing, so the volume of fault repair requests from power grid users has also increased year by year. A power grid user fault-repair work order records important information from a user's call-in feedback. This information is mainly Chinese text that is short and has sparse features. Traditionally, fault-repair work orders have been classified manually. However, this approach is not only inefficient but also highly error-prone.
Summary of the invention
Power grid departments currently find it difficult to process and classify user fault-repair work orders automatically. To address this problem, the present invention proposes an automatic text classification method for power grid user fault-repair work orders that performs the classification automatically.
The technical solution of the present invention is an automatic text classification method for power grid user fault-repair work orders, which specifically comprises the following steps:
1) Word2vec-based data preprocessing stage: first, the text data set of power grid user fault-repair work orders is segmented into words, stop words are filtered out, and word vectors are trained with Word2vec. Word2vec learns a distributed representation of text, so each text is vectorized and represented by word vectors;
2) Text classification based on a convolutional neural network: the trained word vectors are loaded into a convolutional neural network for feature extraction and model training. The text data to be tested is then fed into the model for classification, and the classification accuracy is computed. Finally, a convolutional neural network whose training meets the accuracy standard is used to classify texts automatically.
The specific implementation steps of step 2) are as follows:
(1) Convolution: the word vectors obtained from training in the data preprocessing stage are input to the convolutional layer of the convolutional neural network for feature extraction. The convolutional layer produces feature maps through the convolution operation, converting the word vectors into a text feature representation in two-dimensional matrix form;
(2) Pooling: a k-max pooling operation is applied to the text feature maps produced by the convolutional layer. This captures the most important features of each feature map and simplifies the output of the convolutional layer;
(3) Softmax classification: the features extracted by pooling are passed to the softmax layer. A softmax classifier computes the final output probability distribution over the labels, which gives the classification result;
(4) The text data to be tested in the test set is fed into the model for evaluation to obtain the model's classification accuracy. If the accuracy meets the standard, the trained convolutional neural network is judged ready to be applied directly to text classification. If it does not, more training data is added and training is repeated.
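Step (4) amounts to a train, evaluate, and retrain loop. The following Python sketch illustrates that control flow only. Several details are hypothetical and not taken from the patent:

- the accuracy threshold (0.9);
- the names `train_until_accurate` and `toy_train`;
- supplying extra training data as a list of batches.

The patent states neither a numeric standard nor how additional data is collected. `toy_train` is a trivial stand-in for CNN training, not a real classifier.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[str, str]     # (work order text, label)
Model = Callable[[str], str]  # maps a work order to a predicted label


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of test work orders whose predicted label equals the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def train_until_accurate(
    train: Callable[[List[Example]], Model],
    train_data: List[Example],
    test_data: List[Example],
    extra_batches: List[List[Example]],
    threshold: float = 0.9,  # hypothetical "up to standard" accuracy
) -> Tuple[Model, float, bool]:
    """Train, test, and add training data until accuracy reaches the threshold.

    Returns (model, accuracy, passed); passed is False if the extra data runs out first.
    """
    data = list(train_data)
    batches = iter(extra_batches)
    while True:
        model = train(data)
        acc = accuracy([model(text) for text, _ in test_data],
                       [label for _, label in test_data])
        if acc >= threshold:
            return model, acc, True
        more: Optional[List[Example]] = next(batches, None)
        if more is None:
            return model, acc, False
        data.extend(more)


def toy_train(data: List[Example]) -> Model:
    """Hypothetical stand-in for CNN training: memorizes text -> label pairs."""
    table = dict(data)
    return lambda text: table.get(text, "other")


test_set = [("停电", "outage"), ("电压低", "low_voltage")]
model, acc, passed = train_until_accurate(
    toy_train, [("停电", "outage")], test_set, [[("电压低", "low_voltage")]]
)
print(acc, passed)  # 1.0 True
```

In this toy run, the first round scores 0.5 because the second test item is unseen. The loop then adds the extra batch and retrains, which reaches 1.0.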
The beneficial effects of the present invention are as follows. Compared with traditional text classification methods, the automatic text classification method for power grid user fault-repair work orders significantly improves accuracy and greatly reduces classification time. The method can also help power grid departments allocate emergency repair resources more reasonably, greatly shorten fault repair time, and improve the service quality of power grid departments.
Brief description of the drawings
Fig. 1 is a flow diagram of the automatic text classification method for power grid user fault-repair work orders based on word2vec and CNN according to the present invention;
Fig. 2 is a schematic diagram of CNN-based feature extraction and model training according to the present invention.
Specific embodiment
Fig. 1 shows the flow diagram of the automatic text classification method for power grid user fault-repair work orders based on word2vec and CNN. The method of the present invention comprises the following steps:
1. Word2vec-based data preprocessing stage: Word2vec is a family of related models used to generate word vectors. These models are trained to reconstruct the linguistic contexts of words. In this stage, the text data set of power grid user fault-repair work orders is first segmented into words, stop words are filtered out, and word vectors are trained. The word vectors are trained with Word2vec, which learns a distributed representation of text, so each text is vectorized and represented by word vectors.
2. Text classification based on a CNN (Convolutional Neural Network): the trained word vectors are loaded into the CNN for feature extraction and model training. The text data to be tested is fed into the model for classification, and the classification accuracy is computed. Finally, a CNN whose training meets the standard classifies texts automatically.
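Stage 1 can be roughly illustrated with the sketch below, which relies on several assumptions:

- Each work order has already been segmented into words, for example with a Chinese segmenter such as jieba's `jieba.lcut`. The patent does not name a segmentation tool.
- The stop-word list is hypothetical.
- The skip-gram variant of Word2vec is used; Word2vec also offers CBOW, and the patent does not say which is used.

Under these assumptions, the sketch filters the stop words and enumerates the (center, context) word pairs that skip-gram learns to predict. Training the vectors themselves is usually delegated to a library. The commented gensim call and its parameter values are illustrative, not the patent's settings.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical stop-word list; a deployment would load a curated Chinese list.
STOP_WORDS: Set[str] = {"了", "的", "吗", "，", "。"}


def filter_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Remove stop words and blank tokens from an already-segmented work order."""
    return [t for t in tokens if t.strip() and t not in stop_words]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Enumerate (center, context) pairs within `window` words, as used by skip-gram Word2vec."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical work order, assumed to be pre-segmented (e.g. by jieba.lcut).
segmented = ["我家", "停电", "了", "，", "整栋楼", "没电"]
clean = filter_stop_words(segmented)
print(clean)  # ['我家', '停电', '整栋楼', '没电']
print(skipgram_pairs(clean, window=1))

# Training the vectors is usually left to a library, for example gensim 4.x:
# gensim.models.Word2Vec([clean, ...], vector_size=100, window=5, min_count=1, sg=1)
```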
Text classification of fault-repair work orders based on word2vec and CNN is the core of the whole invention. Traditional text representation methods are based on the vector space model or on one-hot encoding. In the vector space model, the vector dimension grows linearly with the number of words in the dictionary, so the "curse of dimensionality" easily arises as the vocabulary grows. One-hot encoding is simple but ignores the semantic relatedness between words. Word2vec solves both problems. It maps high-dimensional sparse feature vectors to low-dimensional dense word vectors, which effectively avoids the curse of dimensionality, and it allows the semantic relatedness between words to be computed directly. For feature learning, the method uses the CNN, a deep learning algorithm that has achieved remarkable results in image processing in recent years. The main goal of the CNN here is to extract the feature information of the text, and the algorithm still maintains good training performance when the data volume is large.
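The one-hot versus dense contrast can be made concrete by comparing cosine similarities. In the sketch below, the three-word vocabulary and the dense vectors are hand-picked, hypothetical values, not trained Word2vec output. They only show two things:

- Distinct one-hot vectors are always orthogonal (similarity 0).
- Dense vectors can place related words such as 停电 (power outage) and 断电 (power cut) close together.

```python
import math
from typing import Dict, List, Sequence


def one_hot(word: str, vocab: List[str]) -> List[float]:
    """One-hot vector: its length equals the vocabulary size, with a single 1.0."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(word)] = 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocab = ["停电", "断电", "电费"]  # power outage, power cut, electricity bill
# Hypothetical dense vectors, hand-picked for illustration (not trained by Word2vec).
dense: Dict[str, List[float]] = {
    "停电": [0.9, 0.1, 0.0],
    "断电": [0.8, 0.2, 0.1],
    "电费": [0.0, 0.1, 0.9],
}

print(cosine(one_hot("停电", vocab), one_hot("断电", vocab)))  # 0.0: no relatedness captured
print(round(cosine(dense["停电"], dense["断电"]), 2))  # 0.98: related words
print(round(cosine(dense["停电"], dense["电费"]), 2))  # 0.01: unrelated words
```

In a real system, the one-hot dimension equals the vocabulary size, which is often tens of thousands of words. Word2vec vectors, by contrast, commonly have a few hundred dimensions.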
As shown in Fig. 2, the main steps of CNN-based feature extraction and model training are as follows:
(1) convolution.The convolutional layer that term vector after training in data prediction is input to CNN is subjected to feature extraction, volume
Lamination obtains Feature Mapping by convolution algorithm, can convert term vector to the Text Representation of two-dimensional matrix form.
(2) pond.Using the pond k-max operation (maximizing) in the text feature mapping that convolutional layer obtains, capture is every
The most important feature that a Feature Mapping obtains simplifies the output of convolutional layer.
(3) Softmax classifies.The feature that pondization is extracted is transmitted to softmax layers, most using softmax classifier calculated
Probability distribution of the output on each label eventually, obtains classification results.
(4) The text data to be tested in the test set is fed into the model for evaluation to obtain the model's classification accuracy. If the accuracy meets the standard, the trained CNN is judged ready for direct use in text classification. Otherwise, more training data is added and the model is retrained.
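Steps (1) to (3) form a forward pass, which can be sketched in plain Python as below. The sketch follows a common text-CNN layout:

- filters that each span several whole word vectors;
- ReLU activation;
- k-max pooling that keeps the k largest activations in their original order;
- a fully connected layer followed by softmax.

The filter sizes, the activation, the value of k, and the fully connected layer are assumptions, because the patent does not specify them. The weights are hand-set for illustration, and training by backpropagation is omitted. A practical implementation would use a deep learning framework.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # rows = words, columns = word-vector dimensions


def conv_feature_map(x: Matrix, w: Matrix, b: float = 0.0) -> List[float]:
    """Slide an h-word filter over the sentence matrix (stride 1, no padding), then apply ReLU."""
    h = len(w)
    out: List[float] = []
    for i in range(len(x) - h + 1):
        s = b + sum(w[r][c] * x[i + r][c] for r in range(h) for c in range(len(w[r])))
        out.append(max(0.0, s))
    return out


def k_max_pool(feature_map: Sequence[float], k: int) -> List[float]:
    """Keep the k largest activations, preserving their order in the sentence."""
    if len(feature_map) < k:
        raise ValueError("feature map shorter than k; pad the input work order")
    top = sorted(range(len(feature_map)), key=lambda i: feature_map[i], reverse=True)[:k]
    return [feature_map[i] for i in sorted(top)]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x: Matrix, filters: List[Matrix], k: int,
            w_out: Matrix, b_out: List[float]) -> List[float]:
    """Convolution -> k-max pooling -> fully connected layer -> softmax over labels."""
    pooled: List[float] = []
    for w in filters:
        pooled.extend(k_max_pool(conv_feature_map(x, w), k))
    logits = [b + sum(wi * f for wi, f in zip(row, pooled))
              for row, b in zip(w_out, b_out)]
    return softmax(logits)


# Toy input: 4 words with hypothetical 2-dimensional word vectors.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filters = [[[1.0, 0.0], [0.0, 1.0]],   # two filters, each spanning 2 words
           [[0.0, 1.0], [1.0, 0.0]]]
w_out = [[1.0, 0.0, 1.0, 0.0],         # 2 labels x (2 filters * k=2) pooled features
         [0.0, 1.0, 0.0, 1.0]]
probs = forward(x, filters, k=2, w_out=w_out, b_out=[0.0, 0.0])
print([round(p, 2) for p in probs])  # [0.88, 0.12]
```

With these toy weights, the first label receives about 0.88 of the probability mass. A work order shorter than h + k - 1 words would need padding. For that reason, `k_max_pool` raises an error instead of silently returning fewer values.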
Claims (2)
1. An automatic text classification method for power grid user fault-repair work orders, characterized in that it specifically comprises the following steps:
1) Word2vec-based data preprocessing stage: first, the text data set of power grid user fault-repair work orders is segmented into words, stop words are filtered out, and word vectors are trained with Word2vec. Word2vec learns a distributed representation of text, so each text is vectorized and represented by word vectors;
2) Text classification based on a convolutional neural network: the trained word vectors are loaded into a convolutional neural network for feature extraction and model training. The text data to be tested is fed into the model for classification, and the classification accuracy is computed. Finally, a convolutional neural network whose training meets the standard classifies texts automatically.
2. The automatic text classification method for power grid user fault-repair work orders according to claim 1, characterized in that the specific implementation steps of step 2) are as follows:
(1) Convolution: the word vectors obtained from training in data preprocessing are input to the convolutional layer of the convolutional neural network for feature extraction. The convolutional layer obtains feature maps through the convolution operation, converting the word vectors into a text feature representation in two-dimensional matrix form;
(2) Pooling: a k-max pooling operation is applied to the text feature maps produced by the convolutional layer, capturing the most important features of each feature map and simplifying the output of the convolutional layer;
(3) Softmax classification: the features extracted by pooling are passed to the softmax layer, and a softmax classifier computes the final output probability distribution over the labels to obtain the classification result;
(4) The text data to be tested in the test set is fed into the model for evaluation to obtain the model's classification accuracy. If the accuracy meets the standard, the trained convolutional neural network is judged ready to be applied directly to text classification. Otherwise, more training data is added and training is repeated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910480766.7A CN110232188A (en) | 2019-06-04 | 2019-06-04 | Automatic text classification method for power grid user fault-repair work orders |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110232188A true CN110232188A (en) | 2019-09-13 |
Family
ID=67859148
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910480766.7A Pending CN110232188A (en) | Automatic text classification method for power grid user fault-repair work orders | 2019-06-04 | 2019-06-04 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110232188A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107301246A (en) * | 2017-07-14 | 2017-10-27 | 河北工业大学 | Chinese Text Categorization based on ultra-deep convolutional neural networks structural model |
CN108399230A (en) * | 2018-02-13 | 2018-08-14 | 上海大学 | A kind of Chinese financial and economic news file classification method based on convolutional neural networks |
CN109241530A (en) * | 2018-08-29 | 2019-01-18 | 昆明理工大学 | A kind of more classification methods of Chinese text based on N-gram vector sum convolutional neural networks |
CN109783637A (en) * | 2018-12-12 | 2019-05-21 | 国网浙江省电力有限公司杭州供电公司 | Electric power overhaul text mining method based on deep neural network |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111553807A (en) * | 2019-10-28 | 2020-08-18 | 国网辽宁省电力有限公司抚顺供电公司 | Method for checking power failure information of key machine room |
CN111178054A (en) * | 2019-12-05 | 2020-05-19 | 国网浙江省电力有限公司杭州供电公司 | Text processing method based on neural network language model vectorization |
CN111191529A (en) * | 2019-12-17 | 2020-05-22 | 中移(杭州)信息技术有限公司 | Method and system for processing abnormal work order |
CN111191529B (en) * | 2019-12-17 | 2023-04-28 | 中移(杭州)信息技术有限公司 | Method and system for processing abnormal worksheets |
TWI777163B (en) * | 2020-04-10 | 2022-09-11 | 鴻海精密工業股份有限公司 | Form data detection method, computer device and storage medium |
CN111460164A (en) * | 2020-05-22 | 2020-07-28 | 南京大学 | Intelligent barrier judgment method for telecommunication work order based on pre-training language model |
CN111460164B (en) * | 2020-05-22 | 2023-11-03 | 南京大学 | Intelligent fault judging method for telecommunication work orders based on pre-training language model |
CN111651601A (en) * | 2020-06-02 | 2020-09-11 | 全球能源互联网研究院有限公司 | Training method and classification method for fault classification model of power information system |
CN111651601B (en) * | 2020-06-02 | 2023-04-18 | 全球能源互联网研究院有限公司 | Training method and classification method for fault classification model of power information system |
CN112036582A (en) * | 2020-06-10 | 2020-12-04 | 国网浙江省电力有限公司 | Distribution network emergency repair work order processing method and device |
CN111767398A (en) * | 2020-06-30 | 2020-10-13 | 国网新疆电力有限公司电力科学研究院 | Secondary equipment fault short text data classification method based on convolutional neural network |
CN112183782A (en) * | 2020-10-13 | 2021-01-05 | 中国联合网络通信集团有限公司 | Fault work order processing method and equipment |
CN112183782B (en) * | 2020-10-13 | 2024-04-12 | 中国联合网络通信集团有限公司 | Fault work order processing method and equipment |
CN113077118A (en) * | 2021-03-01 | 2021-07-06 | 广东电网有限责任公司广州供电局 | Work order pushing method based on Internet intelligent pushing technology |
CN115511124B (en) * | 2022-09-27 | 2023-04-18 | 上海网商电子商务有限公司 | Customer grading method based on after-sale maintenance records |
CN115511124A (en) * | 2022-09-27 | 2022-12-23 | 上海网商电子商务有限公司 | Customer grading method based on after-sale maintenance records |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110232188A (en) | Automatic text classification method for power grid user fault-repair work orders | |
CN112149316B (en) | Aero-engine residual life prediction method based on improved CNN model | |
CN110334705B (en) | Language identification method of scene text image combining global and local information | |
CN109697232A (en) | A kind of Chinese text sentiment analysis method based on deep learning | |
CN111144448A (en) | Video barrage emotion analysis method based on multi-scale attention convolutional coding network | |
CN109376242A (en) | Text classification algorithm based on Recognition with Recurrent Neural Network variant and convolutional neural networks | |
CN114092832B (en) | High-resolution remote sensing image classification method based on parallel hybrid convolutional network | |
CN109635928A (en) | A kind of voltage sag reason recognition methods based on deep learning Model Fusion | |
CN111581385A (en) | Chinese text type identification system and method for unbalanced data sampling | |
CN109376775B (en) | Online news multi-mode emotion analysis method | |
CN110781671A (en) | Knowledge mining method for intelligent IETM fault maintenance record text | |
CN109409221A (en) | Video content description method and system based on frame selection | |
CN113946677B (en) | Event identification and classification method based on bidirectional cyclic neural network and attention mechanism | |
CN116311483B (en) | Micro-expression recognition method based on local facial area reconstruction and memory contrast learning | |
CN108920446A (en) | A kind of processing method of Engineering document | |
CN111680190B (en) | Video thumbnail recommendation method integrating visual semantic information | |
CN110910175A (en) | Tourist ticket product portrait generation method | |
CN110472245A (en) | A kind of multiple labeling emotional intensity prediction technique based on stratification convolutional neural networks | |
CN113094502A (en) | Multi-granularity takeaway user comment sentiment analysis method | |
CN103336830B (en) | Image search method based on structure semantic histogram | |
CN111090747A (en) | Power communication fault emergency disposal method based on neural network classification | |
CN114548116A (en) | Chinese text error detection method and system based on language sequence and semantic joint analysis | |
CN112559741B (en) | Nuclear power equipment defect record text classification method, system, medium and electronic equipment | |
CN116631566B (en) | Medical image report intelligent generation method based on big data | |
CN103440332B (en) | A kind of image search method strengthening expression based on relational matrix regularization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190913 |