CN110232188A - Automatic text classification method for power grid user troubleshooting work orders - Google Patents

Info

Publication number
CN110232188A
CN110232188A CN201910480766.7A CN201910480766A CN110232188A CN 110232188 A CN110232188 A CN 110232188A CN 201910480766 A CN201910480766 A CN 201910480766A CN 110232188 A CN110232188 A CN 110232188A
Authority
CN
China
Prior art keywords
text
training
power grid
classification
term vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910480766.7A
Other languages
Chinese (zh)
Inventor
赵田
曹渝昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai University of Electric Power
University of Shanghai for Science and Technology
Original Assignee
Shanghai University of Electric Power
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai University of Electric Power
Priority to CN201910480766.7A
Publication of CN110232188A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an automatic text classification method for power grid user troubleshooting work orders. The text data set of the work orders is first segmented into words, stop words are filtered out, and word vectors are trained with the Word2vec method; Word2vec learns a distributed representation of the text, so that each text is vectorized and represented by word vectors. The trained word vectors are then loaded into a convolutional neural network for feature extraction and model training, and the network finally outputs a probability distribution over the labels, from which the classification result is obtained. Compared with traditional text classification methods, accuracy is significantly improved and classification time is greatly reduced. The method can also help the power grid department allocate emergency repair resources more reasonably and greatly shorten fault repair time, which helps improve the department's service quality.

Description

Automatic text classification method for power grid user troubleshooting work orders
Technical field
The present invention relates to text classification technology, and in particular to an automatic text classification method for power grid user troubleshooting work orders.
Background
In recent years, with rapid economic development and the spread of information technology, power grids have kept expanding, the number of users has kept growing, and the volume of fault reports from grid users has risen year by year. A power grid user troubleshooting work order records the important information that a user reports when calling in. This information is mainly presented as Chinese text that is brief and has sparse features. Traditionally, these work orders have been classified manually, which is both inefficient and highly error-prone.
Summary of the invention
The present invention addresses the difficulty that power grid departments currently have in automatically processing and classifying user fault-report work orders. It proposes an automatic text classification method for power grid user troubleshooting work orders that realizes automatic classification.
The technical solution of the present invention is an automatic text classification method for power grid user troubleshooting work orders, which specifically comprises the following steps:
1) Data preprocessing stage based on word2vec: the data in the text data set of power grid user troubleshooting work orders is first segmented into words, stop words are filtered out, and word vectors are trained. The word vectors are trained with the Word2vec method; Word2vec learns a distributed representation of the text, so that the text is vectorized and represented by word vectors;
2) Text classification based on a convolutional neural network: the trained word vectors are loaded into the convolutional neural network for feature extraction and model training; the text data to be tested is fed into the model and classified to obtain the classification accuracy; finally, text is classified automatically by the convolutional neural network that has passed training.
Step 2) is specifically implemented as follows:
(1) Convolution: the word vectors obtained from training in the data preprocessing stage are input to the convolutional layer of the convolutional neural network for feature extraction. The convolutional layer obtains feature maps through the convolution operation, converting the word vectors into a text feature representation in two-dimensional matrix form;
(2) Pooling: a k-max pooling operation is applied to the text feature maps obtained by the convolutional layer, capturing the most important features of each feature map and simplifying the output of the convolutional layer;
(3) Softmax classification: the features extracted by pooling are passed to the softmax layer, and the softmax classifier computes the final output probability distribution over the labels, giving the classification result;
(4) The text data to be tested in the test set is fed into the model for verification to obtain the model's classification accuracy. If the accuracy meets the standard, the trained convolutional neural network is judged fit for direct use in text classification; if it does not, the training data is increased and training is repeated.
The beneficial effects of the present invention are as follows: compared with traditional text classification methods, the automatic text classification method for power grid user troubleshooting work orders significantly improves accuracy and greatly reduces classification time. The method can also help the power grid department allocate emergency repair resources more reasonably and greatly shorten fault repair time, which helps improve the department's service quality.
Brief description of the drawings
Fig. 1 is a flow diagram of the automatic text classification method for power grid user troubleshooting work orders based on word2vec and CNN according to the present invention;
Fig. 2 is a schematic diagram of CNN-based feature extraction and model training according to the present invention.
Specific embodiment
Fig. 1 shows the flow of the automatic text classification method for power grid user troubleshooting work orders based on word2vec and CNN. The method of the present invention is implemented in the following steps:
1. Data preprocessing stage based on word2vec. Word2vec is a group of related models used to generate word vectors; the models are trained to reconstruct the linguistic contexts of words. In this stage, the data in the text data set of power grid user troubleshooting work orders is first segmented into words, stop words are filtered out, and word vectors are trained. The word vectors are trained with the Word2vec method; Word2vec learns a distributed representation of the text, so that the text is vectorized and represented by word vectors.
2. Text classification based on CNN (Convolutional Neural Networks). The trained word vectors are loaded into the CNN for feature extraction and model training; the text data to be tested is fed into the model and classified to obtain the classification accuracy; finally, text is classified automatically by the CNN that has passed training.
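The preprocessing in step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the patent does not name a segmenter, so a simple forward maximum matching segmenter is swapped in, and the dictionary, stop-word list, and 3-dimensional vectors are hypothetical placeholders. The `EMBEDDINGS` table stands in for the output of Word2vec training, which is not shown here.

```python
# Hypothetical toy dictionary and stop-word list (not from the patent).
DICTIONARY = {"小区", "停电", "电表", "故障"}   # residential area, power outage, meter, fault
STOP_WORDS = {"了", "的"}                       # common function words

def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word.
    Characters that match no dictionary word are emitted as single-character tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

def remove_stop_words(tokens, stop_words):
    """Drop tokens that carry little meaning for classification."""
    return [t for t in tokens if t not in stop_words]

def vectorize(tokens, embeddings, dim):
    """Represent a text as one word vector per token; unknown words map to zeros."""
    return [embeddings.get(t, [0.0] * dim) for t in tokens]

# Hypothetical vectors standing in for trained Word2vec output.
EMBEDDINGS = {
    "小区": [0.2, 0.1, 0.7],
    "停电": [0.9, 0.3, 0.1],
    "电表": [0.4, 0.8, 0.2],
    "故障": [0.8, 0.4, 0.2],
}

tokens = remove_stop_words(segment("小区停电了", DICTIONARY), STOP_WORDS)
matrix = vectorize(tokens, EMBEDDINGS, dim=3)   # one row per remaining word
```

Each work order becomes a matrix with one row per word, which is the form a convolutional layer can slide over. In practice a dedicated Chinese segmenter and a trained Word2vec model would replace the toy pieces.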
The text classification technique for troubleshooting work orders based on word2vec and CNN is the core of the invention. Traditional text representation methods are based on the vector space model or on one-hot encoding. In the vector space model, the vector dimension grows linearly with the number of words in the dictionary, so the "curse of dimensionality" easily arises as the number of words increases. One-hot encoding is simple but ignores the semantic relatedness between words. Word2vec addresses these problems of the vector space model and one-hot encoding: it maps high-dimensional sparse feature vectors to low-dimensional dense word vectors, which effectively avoids the curse of dimensionality and allows the semantic relatedness between words to be computed directly. The feature learning algorithm is CNN, a deep learning algorithm that has achieved remarkable results in image processing in recent years. The main goal of the CNN here is to extract the feature information of the text, and the algorithm can still maintain a good training effect when the data volume is large.
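The contrast between one-hot and dense representations can be made concrete with cosine similarity. The sketch below is illustrative only: the three-word vocabulary and the 2-dimensional dense vectors are hypothetical values chosen by hand, not vectors produced by Word2vec.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def one_hot(index, vocab_size):
    """One-hot vector: its length equals the vocabulary size."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]

vocab = ["outage", "blackout", "meter"]

# Any two distinct one-hot vectors are orthogonal, so similarity is always zero.
onehot_sim = cosine(one_hot(0, len(vocab)), one_hot(1, len(vocab)))

# Hypothetical dense vectors: related words are placed close together.
dense = {"outage": [0.9, 0.3], "blackout": [0.8, 0.4], "meter": [0.1, 0.9]}
sim_related = cosine(dense["outage"], dense["blackout"])
sim_unrelated = cosine(dense["outage"], dense["meter"])
```

With one-hot vectors the dimension grows with the vocabulary and every pair of distinct words looks equally unrelated. With dense vectors the dimension stays fixed and the similarity score can separate related words from unrelated ones.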
As shown in Fig. 2, the main steps of CNN-based feature extraction and model training are as follows:
(1) Convolution. The word vectors obtained from training in the data preprocessing stage are input to the convolutional layer of the CNN for feature extraction. The convolutional layer obtains feature maps through the convolution operation, converting the word vectors into a text feature representation in two-dimensional matrix form.
(2) Pooling. A k-max pooling operation (taking the largest values) is applied to the text feature maps obtained by the convolutional layer, capturing the most important features of each feature map and simplifying the output of the convolutional layer.
(3) Softmax classification. The features extracted by pooling are passed to the softmax layer, and the softmax classifier computes the final output probability distribution over the labels, giving the classification result.
(4) The text data to be tested in the test set is fed into the model for verification to obtain the model's classification accuracy. If the accuracy meets the standard, the trained CNN is judged fit for direct use in text classification; if it does not, the training data is increased and training is repeated.
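Steps (1) to (4) above can be sketched as a forward pass in plain Python. This is a minimal sketch under stated assumptions, not the patent's model: training is omitted, the filter weights, output weights, and input matrix are hand-picked hypothetical values, and the accuracy threshold is hypothetical because the patent does not state what counts as meeting the standard.

```python
import math

def conv1d(matrix, kernel, bias=0.0):
    """Slide a window of len(kernel) word vectors over the text matrix and
    return one activation per window position (a feature map)."""
    h = len(kernel)
    feature_map = []
    for start in range(len(matrix) - h + 1):
        total = bias
        for row, weights in zip(matrix[start:start + h], kernel):
            total += sum(x * w for x, w in zip(row, weights))
        feature_map.append(total)
    return feature_map

def k_max_pool(feature_map, k):
    """Keep the k largest activations, preserving their original order."""
    top = sorted(range(len(feature_map)), key=lambda i: feature_map[i], reverse=True)[:k]
    return [feature_map[i] for i in sorted(top)]

def softmax(scores):
    """Turn raw scores into a probability distribution over labels."""
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(matrix, kernels, weights, k=1):
    """Convolution, k-max pooling, then a linear layer followed by softmax."""
    features = []
    for kernel in kernels:
        features.extend(k_max_pool(conv1d(matrix, kernel), k))
    scores = [sum(f * w for f, w in zip(features, row)) for row in weights]
    probs = softmax(scores)
    return probs.index(max(probs)), probs

def accuracy(predictions, labels):
    """Fraction of test texts whose predicted label matches the true label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical input: 3 words with 2-dimensional word vectors.
matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Two hypothetical filters, each spanning a window of 2 words.
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
# Hypothetical output weights: one row per label, one column per pooled feature.
weights = [[1.0, 0.0], [0.0, 0.5]]

label, probs = classify(matrix, kernels, weights, k=1)

ACCURACY_THRESHOLD = 0.9                    # hypothetical; the patent gives no value
meets_standard = accuracy([0, 1, 1], [0, 1, 0]) >= ACCURACY_THRESHOLD
```

When `meets_standard` is false, step (4) calls for adding training data and retraining before the model is used for classification.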

Claims (2)

1. An automatic text classification method for power grid user troubleshooting work orders, characterized in that it specifically comprises the following steps:
1) Data preprocessing stage based on word2vec: the data in the text data set of power grid user troubleshooting work orders is first segmented into words, stop words are filtered out, and word vectors are trained; the word vectors are trained with the Word2vec method, and Word2vec learns a distributed representation of the text, so that the text is vectorized and represented by word vectors;
2) Text classification based on a convolutional neural network: the trained word vectors are loaded into the convolutional neural network for feature extraction and model training; the text data to be tested is fed into the model and classified to obtain the classification accuracy; finally, text is classified automatically by the convolutional neural network that has passed training.
2. The automatic text classification method for power grid user troubleshooting work orders according to claim 1, characterized in that step 2) is specifically implemented as follows:
(1) Convolution: the word vectors obtained from training in the data preprocessing stage are input to the convolutional layer of the convolutional neural network for feature extraction; the convolutional layer obtains feature maps through the convolution operation, converting the word vectors into a text feature representation in two-dimensional matrix form;
(2) Pooling: a k-max pooling operation is applied to the text feature maps obtained by the convolutional layer, capturing the most important features of each feature map and simplifying the output of the convolutional layer;
(3) Softmax classification: the features extracted by pooling are passed to the softmax layer, and the softmax classifier computes the final output probability distribution over the labels, giving the classification result;
(4) The text data to be tested in the test set is fed into the model for verification to obtain the model's classification accuracy; if the accuracy meets the standard, the trained convolutional neural network is judged fit for direct use in text classification; if it does not, the training data is increased and training is repeated.
CN201910480766.7A 2019-06-04 2019-06-04 Automatic text classification method for power grid user troubleshooting work orders Pending CN110232188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910480766.7A CN110232188A (en) 2019-06-04 2019-06-04 Automatic text classification method for power grid user troubleshooting work orders

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910480766.7A CN110232188A (en) 2019-06-04 2019-06-04 Automatic text classification method for power grid user troubleshooting work orders

Publications (1)

Publication Number Publication Date
CN110232188A true CN110232188A (en) 2019-09-13

Family

ID=67859148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910480766.7A Pending CN110232188A (en) 2019-06-04 2019-06-04 Automatic text classification method for power grid user troubleshooting work orders

Country Status (1)

Country Link
CN (1) CN110232188A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178054A (en) * 2019-12-05 2020-05-19 国网浙江省电力有限公司杭州供电公司 Text processing method based on neural network language model vectorization
CN111191529A (en) * 2019-12-17 2020-05-22 中移(杭州)信息技术有限公司 Method and system for processing abnormal work order
CN111460164A (en) * 2020-05-22 2020-07-28 南京大学 Intelligent barrier judgment method for telecommunication work order based on pre-training language model
CN111553807A (en) * 2019-10-28 2020-08-18 国网辽宁省电力有限公司抚顺供电公司 Method for checking power failure information of key machine room
CN111651601A (en) * 2020-06-02 2020-09-11 全球能源互联网研究院有限公司 Training method and classification method for fault classification model of power information system
CN111767398A (en) * 2020-06-30 2020-10-13 国网新疆电力有限公司电力科学研究院 Secondary equipment fault short text data classification method based on convolutional neural network
CN112036582A (en) * 2020-06-10 2020-12-04 国网浙江省电力有限公司 Distribution network emergency repair work order processing method and device
CN112183782A (en) * 2020-10-13 2021-01-05 中国联合网络通信集团有限公司 Fault work order processing method and equipment
CN113077118A (en) * 2021-03-01 2021-07-06 广东电网有限责任公司广州供电局 Work order pushing method based on Internet intelligent pushing technology
TWI777163B (en) * 2020-04-10 2022-09-11 鴻海精密工業股份有限公司 Form data detection method, computer device and storage medium
CN115511124A (en) * 2022-09-27 2022-12-23 上海网商电子商务有限公司 Customer grading method based on after-sale maintenance records

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301246A (en) * 2017-07-14 2017-10-27 河北工业大学 Chinese Text Categorization based on ultra-deep convolutional neural networks structural model
CN108399230A (en) * 2018-02-13 2018-08-14 上海大学 A kind of Chinese financial and economic news file classification method based on convolutional neural networks
CN109241530A (en) * 2018-08-29 2019-01-18 昆明理工大学 A kind of more classification methods of Chinese text based on N-gram vector sum convolutional neural networks
CN109783637A (en) * 2018-12-12 2019-05-21 国网浙江省电力有限公司杭州供电公司 Electric power overhaul text mining method based on deep neural network

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553807A (en) * 2019-10-28 2020-08-18 国网辽宁省电力有限公司抚顺供电公司 Method for checking power failure information of key machine room
CN111178054A (en) * 2019-12-05 2020-05-19 国网浙江省电力有限公司杭州供电公司 Text processing method based on neural network language model vectorization
CN111191529A (en) * 2019-12-17 2020-05-22 中移(杭州)信息技术有限公司 Method and system for processing abnormal work order
CN111191529B (en) * 2019-12-17 2023-04-28 中移(杭州)信息技术有限公司 Method and system for processing abnormal worksheets
TWI777163B (en) * 2020-04-10 2022-09-11 鴻海精密工業股份有限公司 Form data detection method, computer device and storage medium
CN111460164A (en) * 2020-05-22 2020-07-28 南京大学 Intelligent barrier judgment method for telecommunication work order based on pre-training language model
CN111460164B (en) * 2020-05-22 2023-11-03 南京大学 Intelligent fault judging method for telecommunication work orders based on pre-training language model
CN111651601A (en) * 2020-06-02 2020-09-11 全球能源互联网研究院有限公司 Training method and classification method for fault classification model of power information system
CN111651601B (en) * 2020-06-02 2023-04-18 全球能源互联网研究院有限公司 Training method and classification method for fault classification model of power information system
CN112036582A (en) * 2020-06-10 2020-12-04 国网浙江省电力有限公司 Distribution network emergency repair work order processing method and device
CN111767398A (en) * 2020-06-30 2020-10-13 国网新疆电力有限公司电力科学研究院 Secondary equipment fault short text data classification method based on convolutional neural network
CN112183782A (en) * 2020-10-13 2021-01-05 中国联合网络通信集团有限公司 Fault work order processing method and equipment
CN112183782B (en) * 2020-10-13 2024-04-12 中国联合网络通信集团有限公司 Fault work order processing method and equipment
CN113077118A (en) * 2021-03-01 2021-07-06 广东电网有限责任公司广州供电局 Work order pushing method based on Internet intelligent pushing technology
CN115511124B (en) * 2022-09-27 2023-04-18 上海网商电子商务有限公司 Customer grading method based on after-sale maintenance records
CN115511124A (en) * 2022-09-27 2022-12-23 上海网商电子商务有限公司 Customer grading method based on after-sale maintenance records

Similar Documents

Publication Publication Date Title
CN110232188A (en) Automatic text classification method for power grid user troubleshooting work orders
CN112149316B (en) Aero-engine residual life prediction method based on improved CNN model
CN110334705B (en) Language identification method of scene text image combining global and local information
CN109697232A (en) A kind of Chinese text sentiment analysis method based on deep learning
CN111144448A (en) Video barrage emotion analysis method based on multi-scale attention convolutional coding network
CN109376242A (en) Text classification algorithm based on Recognition with Recurrent Neural Network variant and convolutional neural networks
CN114092832B (en) High-resolution remote sensing image classification method based on parallel hybrid convolutional network
CN109635928A (en) A kind of voltage sag reason recognition methods based on deep learning Model Fusion
CN111581385A (en) Chinese text type identification system and method for unbalanced data sampling
CN109376775B (en) Online news multi-mode emotion analysis method
CN110781671A (en) Knowledge mining method for intelligent IETM fault maintenance record text
CN109409221A (en) Video content description method and system based on frame selection
CN113946677B (en) Event identification and classification method based on bidirectional cyclic neural network and attention mechanism
CN116311483B (en) Micro-expression recognition method based on local facial area reconstruction and memory contrast learning
CN108920446A (en) A kind of processing method of Engineering document
CN111680190B (en) Video thumbnail recommendation method integrating visual semantic information
CN110910175A (en) Tourist ticket product portrait generation method
CN110472245A (en) A kind of multiple labeling emotional intensity prediction technique based on stratification convolutional neural networks
CN113094502A (en) Multi-granularity takeaway user comment sentiment analysis method
CN103336830B (en) Image search method based on structure semantic histogram
CN111090747A (en) Power communication fault emergency disposal method based on neural network classification
CN114548116A (en) Chinese text error detection method and system based on language sequence and semantic joint analysis
CN112559741B (en) Nuclear power equipment defect record text classification method, system, medium and electronic equipment
CN116631566B (en) Medical image report intelligent generation method based on big data
CN103440332B (en) A kind of image search method strengthening expression based on relational matrix regularization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20190913)