CN110895565A - Method and system for classifying fault defect texts of power equipment - Google Patents

Method and system for classifying fault defect texts of power equipment

Info

Publication number
CN110895565A
Authority
CN
China
Prior art keywords
text
convolution
matrix
power equipment
defect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911202779.4A
Other languages
Chinese (zh)
Inventor
田峥
乔宏
黎曦
邓杰
田建伟
朱宏宇
陈中伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Hunan Electric Power Co Ltd
State Grid Hunan Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Hunan Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Hunan Electric Power Co Ltd
State Grid Hunan Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Hunan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Electric Power Research Institute of State Grid Hunan Electric Power Co Ltd, State Grid Hunan Electric Power Co Ltd, and Information and Telecommunication Branch of State Grid Hunan Electric Power Co Ltd
Priority to CN201911202779.4A
Publication of CN110895565A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for classifying fault defect texts of power equipment. The method comprises the following steps: preprocessing a fault defect text of the power equipment, constructing a paragraph matrix training model, extracting semantic features, training a softmax classifier, and inputting the extracted semantic features into the trained softmax classifier to classify the fault defect text. The method addresses the technical problems that existing manual classification incurs high labor cost, that classification results are influenced by the personal experience of different technicians, and that traditional text classification methods lack domain specificity and cannot be applied to the specialized field of power equipment.

Description

Method and system for classifying fault defect texts of power equipment
Technical Field
The invention belongs to the technical field of automatic text classification, and particularly relates to a method and a system for classifying a text of a fault defect of power equipment.
Background
The power grid is an important national infrastructure and plays an extremely important role in the life of the nation. Because the power grid has many components and a complex structure, a large amount of fault defect text data accumulates over the long-term operation of power equipment through measures such as routine inspections and tests. These fault defect records are the primary basis for fault defect handling and analysis. At present, most power grid dispatching departments record equipment error information in a simple, ad hoc manner, without strict specifications for entering fault defect information, and classification statistics on fault defect information are often compiled manually, which results in extremely high labor cost. In particular, grading the severity of a fault from the description of the fault defect is affected by differences in the subjective experience of different workers, so the classification results are biased.
With the continuous development of computer science, automatic techniques have been increasingly applied to text classification. Common algorithms in existing text classification technology include support vector machines, naive Bayes, and recurrent neural networks, but traditional text classifiers based on these machine learning algorithms have difficulty mining deep features of the text, which hinders further study and application of the classified data. Meanwhile, power industry text contains a large number of technical terms and special symbols and is highly specialized, so general-purpose word vectors and the corresponding deep learning models are difficult to transfer to it directly.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides a method for classifying fault defect texts of power equipment, aiming to solve the technical problems that existing manual classification incurs high labor cost, that classification results are influenced by the personal experience of different technicians, and that traditional text classification methods lack domain specificity and cannot be applied to the specialized field of power equipment.
A method for classifying fault defect texts of power equipment comprises the following steps:
Step (1): preprocessing the fault defect text of the power equipment;
Step (2): representing the words in the preprocessed fault defect text as word vectors, constructing a paragraph matrix D from the word vectors, and training the paragraph matrix D by stochastic gradient descent to obtain a paragraph matrix training model;
Step (3): extracting the paragraph matrix of the preprocessed power equipment fault defect text by using the paragraph matrix training model, and performing convolution by using a convolution window with the same size as the paragraph matrix to obtain the corresponding semantic feature vector;
Step (4): processing power equipment fault defect texts of known defect types according to steps (1) to (3) to obtain the corresponding semantic feature vectors as training data, training the model parameters of a softmax classifier on the training data by stochastic gradient descent, and finishing the training when the loss function is minimized;
Step (5): processing the power equipment fault defect text to be classified according to steps (1) to (3), inputting the obtained semantic feature vector into the trained softmax classifier, and classifying the power equipment fault defect text.
Further, the step (1) comprises the steps of selecting equipment fault defect record data of a certain power grid company, and performing word segmentation and stop word removal on the obtained equipment fault defect records.
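The preprocessing of step (1) can be sketched in Python as follows. This is a minimal illustration only: the stop word list, the function names, and the stand-in segmenter that splits on "+" are hypothetical, and a real system would plug in a proper Chinese word segmenter and a domain-specific stop word list.

```python
# Hypothetical stop word list; the patent does not specify one.
STOP_WORDS = {"already", "the", "of"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words from an already segmented record, preserving word order."""
    return [tok for tok in tokens if tok not in stop_words]

def preprocess(record, segment, stop_words=STOP_WORDS):
    """Segment a raw record into words, then remove stop words."""
    return remove_stop_words(segment(record), stop_words)

def plus_segmenter(record):
    """Stand-in segmenter (assumption): treats '+' as the word boundary."""
    return [part.strip() for part in record.split("+") if part.strip()]

tokens = preprocess("moisture absorber + already + totally + discoloring", plus_segmenter)
print(tokens)  # ['moisture absorber', 'totally', 'discoloring']
```

Passing the segmenter as an argument keeps the stop word filtering independent of whichever segmentation tool is actually used.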
Further, the step (2) specifically comprises the following sub-steps:
(2-1) obtaining word vectors w_1, w_2, w_3, …, w_T from the words of the preprocessed power equipment fault defect text, where T is the total number of word vectors and each text word corresponds to one word vector;
(2-2) training the word vectors w_i by stochastic gradient descent and constructing the paragraph matrix D such that each word vector w_i is mapped to a column of the paragraph matrix D, and computing the average log probability of all word vectors w_i until it reaches its maximum, thereby obtaining the paragraph matrix training model;
during training, the weight of each word vector in the paragraph matrix training model is updated along the negative gradient direction obtained during gradient descent.
When the average log probability is at its maximum, the association among the word vectors in the paragraph matrix is at its best.
Further, the calculation formula of the average log probability is as follows:
Figure BDA0002296273600000021
wherein T is the total number of word vectors,
Figure BDA0002296273600000022
p(· | w_{t-k}, …, w_{t+k}) is the conditional probability function of the word vector w_t;
Figure BDA0002296273600000023
wherein y_i is the unnormalized log probability of the output word vector w_i, where i = w_{t-k}, w_{t-k+1}, …, w_{t+k}.
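The equations behind the figure placeholders above are not reproduced in this text. As a hedged reconstruction only, the definitions given here (T, the context half-width k, the conditional probability of w_t given its neighbors, and the unnormalized log probabilities y_i) match the familiar context-window objective with a softmax output. The summation range t = k, …, T-k and the softmax form are assumptions, not transcriptions of the figures:

```latex
\frac{1}{T}\sum_{t=k}^{T-k}\log p\left(w_t \mid w_{t-k},\dots,w_{t+k}\right),
\qquad
p\left(w_t \mid w_{t-k},\dots,w_{t+k}\right)=\frac{e^{y_{w_t}}}{\sum_{i} e^{y_i}}
```

Under this reading, maximizing the first expression by stochastic gradient descent is what sub-step (2-2) describes.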
Further, the specific obtaining process of the semantic feature vector is as follows:
(3-1) inputting a paragraph matrix D corresponding to the text of the fault defect of the power equipment;
(3-2) adopting a convolution matrix window W ∈ R^{h×n} with h rows and the same number of columns n as the paragraph matrix D, and sequentially performing convolution, from top to bottom, on the mutually non-overlapping matrix blocks of h rows and n columns in the paragraph matrix D to obtain the convolution results r_i:
r_i = W · D_{i:i+h-1}
wherein i = 1, 2, …, s-h+1, and D_{i:i+h-1} is the i-th matrix block of h rows and n columns of the paragraph matrix D, counted from top to bottom;
(3-3) applying a nonlinearity to the s-h+1 convolution results to obtain the nonlinear results c_i, and arranging the s-h+1 real numbers c_i in order to form the convolution-layer vector c ∈ R^{s-h+1}:
c_i = ReLU(r_i + b)
wherein b is a bias term and ReLU is the rectified linear unit function;
(3-4) taking the largest element max{c} of the convolution-layer vector c produced by each convolution window as a feature value, extracting the feature value p_j corresponding to each convolution window, j = 1, 2, …, w, where w is the total number of convolution windows, and concatenating all feature values p_j in order to form the semantic feature vector p ∈ R^w.
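Sub-steps (3-1) to (3-4) can be sketched in plain Python as below. The sketch makes several assumptions that the text does not state: D is stored as a list of s rows with n entries each, the product W · D_{i:i+h-1} is taken as the sum of element-wise products, and the index range i = 1, …, s-h+1 is read literally, so the window moves down one row at a time. All function names and the toy numbers are illustrative.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0

def window_response(window, block):
    """Sum of element-wise products of an h x n window and an h x n block."""
    return sum(w * d
               for w_row, d_row in zip(window, block)
               for w, d in zip(w_row, d_row))

def conv_feature(D, window, bias):
    """Compute c_i = ReLU(r_i + b) for every block position, then return max{c}."""
    s, h = len(D), len(window)
    c = [relu(window_response(window, D[i:i + h]) + bias)
         for i in range(s - h + 1)]
    return max(c)

def semantic_feature_vector(D, windows, biases):
    """One feature value p_j per convolution window, concatenated in order."""
    return [conv_feature(D, W, b) for W, b in zip(windows, biases)]

# Toy paragraph matrix with s = 3 rows and n = 2 columns (made-up values).
D = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
windows = [[[1.0, 1.0], [1.0, 1.0]],   # h = 2
           [[1.0, -1.0]]]              # h = 1
biases = [0.0, -0.5]
p = semantic_feature_vector(D, windows, biases)
print(p)  # [3.0, 0.5]
```

The length of p equals the number of windows w, independent of the text length s, which is what lets texts of different lengths feed the same classifier.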
Generally, industrial fault defect texts do not contain long passages as other texts do; a text is usually short, with one sentence forming one paragraph (two or three sentences in a few cases), and the sentences describe the same matter and share uniform characteristics. For example, "a discharge signal is found at the main transformer body below the phase C of the 110kV high-voltage side bushing of the main transformer No. 1" is a single sentence, but can be treated as a paragraph.
A power equipment fault defect text classification system comprises:
a text preprocessing unit: configured to perform word segmentation and stop word removal on the power equipment fault defect text;
a paragraph matrix training model building unit: configured to represent the words in the preprocessed fault defect text as word vectors, construct a paragraph matrix D from the word vectors, and train the paragraph matrix D by stochastic gradient descent to obtain a paragraph matrix training model;
a semantic feature vector extraction unit: configured to extract the paragraph matrix of the preprocessed power equipment fault defect text by using the paragraph matrix training model, and perform convolution by using a convolution window with the same size as the paragraph matrix to obtain the corresponding semantic feature vector;
a softmax classifier construction unit: configured to preprocess power equipment fault defect texts of known defect types with the text preprocessing unit, obtain the corresponding semantic feature vectors from the semantic feature vector extraction unit as training data, train the model parameters of a softmax classifier on the training data by stochastic gradient descent, and finish the training when the loss function is minimized, thereby obtaining the trained softmax classifier;
a defect classification unit: configured to extract the semantic feature vector of the power equipment fault defect text to be classified by using the text preprocessing unit and the semantic feature vector extraction unit, input the obtained semantic feature vector into the trained softmax classifier, and classify the power equipment fault defect text.
Further, the system comprises a convolution processing unit, wherein the convolution processing unit first adopts a convolution matrix window W ∈ R^{h×n} with h rows and the same number of columns n as the paragraph matrix D, and sequentially performs convolution, from top to bottom, on the mutually non-overlapping matrix blocks of h rows and n columns in the paragraph matrix D to obtain the convolution results r_i:
r_i = W · D_{i:i+h-1}
wherein i = 1, 2, …, s-h+1, and D_{i:i+h-1} is the i-th matrix block of h rows and n columns of the paragraph matrix D, counted from top to bottom;
secondly, a nonlinearity is applied to the s-h+1 convolution results to obtain the nonlinear results c_i, and the s-h+1 real numbers c_i are arranged in order to form the convolution-layer vector c ∈ R^{s-h+1}:
c_i = ReLU(r_i + b)
wherein b is a bias term and ReLU is the rectified linear unit function;
finally, the largest element max{c} of the convolution-layer vector c produced by each convolution window is taken as a feature value, the feature value p_j corresponding to each convolution window is extracted, j = 1, 2, …, w, where w is the total number of convolution windows, and all feature values p_j are concatenated in order to form the semantic feature vector p ∈ R^w.
Advantageous effects
Compared with the prior art, the following beneficial effects can be achieved:
(1) the labor cost in the actual production of the power grid is reduced, the fault defect text classification result is prevented from being influenced by the experience of different technicians, and the classification efficiency is improved;
(2) a feature extraction classifier is constructed, and the fault defect text is classified by combining professional features of the power field, so that objective and efficient fault division reference is provided for actual production of the power industry, the stability of a power system is improved, and a foundation is laid for further application of fault defect data.
Drawings
FIG. 1 is a schematic diagram of a text classification flow of the present invention.
Detailed Description
The invention will now be further described with reference to the accompanying drawings and examples.
As shown in FIG. 1, the method for classifying fault defect texts of power equipment according to the invention is applied to an existing power grid production management system and comprises the following steps:
Step (1): preprocessing the fault defect text of the power equipment;
Equipment fault defect records of a power plant from nearly the past 10 years are selected, and the text of the records is preprocessed, including word segmentation and stop word removal. For training and testing the text classification task, all fault defect data are randomly shuffled and then divided evenly into 5 parts. In turn, 4 parts serve as the training set and the remaining 1 part serves as the test set for verification. Suppose one part of the test data consists of the records "the moisture absorber is totally discolored" and "the moisture absorber is slightly discolored".
After word segmentation, these records become "moisture absorber + already + totally + discoloring" and "moisture absorber + slightly + discoloring".
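The shuffle-and-rotate evaluation described above corresponds to ordinary 5-fold cross-validation. A minimal Python sketch follows; the function name, the fixed random seed, and the placeholder records are assumptions made for illustration.

```python
import random

def five_fold_splits(records, folds=5, seed=0):
    """Shuffle the records, cut them into `folds` near-equal parts, and yield
    (train, test) pairs in which each part serves as the test set exactly once."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::folds] for i in range(folds)]
    for held_out in range(folds):
        test = parts[held_out]
        train = [r for j, part in enumerate(parts) if j != held_out for r in part]
        yield train, test

# Placeholder records standing in for fault defect texts.
records = [f"record {i}" for i in range(10)]
for train, test in five_fold_splits(records):
    print(len(train), len(test))  # 8 2 on every round
```

Fixing the seed makes the partition reproducible, which helps when comparing classifiers on the same folds.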
Step (2): representing the words in the preprocessed fault defect text as word vectors, constructing a paragraph matrix D from the word vectors, and training the paragraph matrix D by stochastic gradient descent to obtain a paragraph matrix training model;
(2-1) obtaining word vectors w_1, w_2, w_3, …, w_T from the words of the preprocessed power equipment fault defect text, where T is the total number of word vectors and each text word corresponds to one word vector;
(2-2) training the word vectors w_i by stochastic gradient descent and constructing the paragraph matrix D such that each word vector w_i is mapped to a column of the paragraph matrix D, and computing the average log probability of all word vectors w_i until it reaches its maximum, thereby obtaining the paragraph matrix training model;
during training, the weight of each word vector in the paragraph matrix training model is updated along the negative gradient direction obtained during gradient descent.
When the average log probability is at its maximum, the association among the word vectors in the paragraph matrix is at its best.
The calculation formula of the average logarithmic probability is as follows:
Figure BDA0002296273600000051
wherein T is the total number of word vectors,
Figure BDA0002296273600000052
p(· | w_{t-k}, …, w_{t+k}) is the conditional probability function of the word vector w_t;
Figure BDA0002296273600000053
wherein y_i is the unnormalized log probability of the output word vector w_i, where i = w_{t-k}, w_{t-k+1}, …, w_{t+k}.
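A minimal Python sketch of how an average log probability of this kind could be evaluated, assuming the conditional probability is a softmax over the unnormalized log probabilities y_i. The function names, the toy scores, and the choice to average over the scored positions are illustrative assumptions and are not taken from the figures.

```python
import math

def log_softmax_at(scores, target):
    """log( exp(scores[target]) / sum_i exp(scores[i]) ), computed stably."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_z

def average_log_probability(score_rows, targets):
    """Mean log probability of the observed words.

    score_rows[j] holds the unnormalized log probabilities y_i of every
    candidate word at position j; targets[j] is the index of the observed word.
    """
    total = sum(log_softmax_at(row, tgt) for row, tgt in zip(score_rows, targets))
    return total / len(targets)

# With uniform scores over 4 candidate words, every word has probability 1/4.
uniform = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
avg = average_log_probability(uniform, [1, 3])
print(avg)
```

Raising the score of the observed word relative to the others moves this value toward 0, its upper bound, which is the direction training pushes it.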
Step (3): extracting the paragraph matrix of the preprocessed power equipment fault defect text by using the paragraph matrix training model, and performing convolution by using a convolution window with the same size as the paragraph matrix to obtain the corresponding semantic feature vector;
The semantic feature vector is obtained as follows:
(3-1) inputting the paragraph matrix D corresponding to the power equipment fault defect text;
(3-2) adopting a convolution matrix window W ∈ R^{h×n} with h rows and the same number of columns n as the paragraph matrix D, and sequentially performing convolution, from top to bottom, on the mutually non-overlapping matrix blocks of h rows and n columns in the paragraph matrix D to obtain the convolution results r_i:
r_i = W · D_{i:i+h-1}
wherein i = 1, 2, …, s-h+1, and D_{i:i+h-1} is the i-th matrix block of h rows and n columns of the paragraph matrix D, counted from top to bottom;
(3-3) applying a nonlinearity to the s-h+1 convolution results to obtain the nonlinear results c_i, and arranging the s-h+1 real numbers c_i in order to form the convolution-layer vector c ∈ R^{s-h+1}:
c_i = ReLU(r_i + b)
wherein b is a bias term and ReLU is the rectified linear unit function;
(3-4) taking the largest element max{c} of the convolution-layer vector c produced by each convolution window as a feature value, extracting the feature value p_j corresponding to each convolution window, j = 1, 2, …, w, where w is the total number of convolution windows, and concatenating all feature values p_j in order to form the semantic feature vector p ∈ R^w.
Step (4): processing power equipment fault defect texts of known defect types according to steps (1) to (3) to obtain the corresponding semantic feature vectors as training data, training the model parameters of a softmax classifier on the training data by stochastic gradient descent, and finishing the training when the loss function is minimized;
Step (5): processing the power equipment fault defect text to be classified according to steps (1) to (3), inputting the obtained semantic feature vector into the trained softmax classifier, and classifying the power equipment fault defect text.
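Steps (4) and (5) can be illustrated with a small softmax classifier trained by per-sample stochastic gradient descent on the cross-entropy loss. This is a sketch under stated assumptions: the feature values are made up, the two classes are indexed 0 ("emergency") and 1 ("normal"), and a fixed number of epochs stands in for the stopping rule "when the loss function is minimized". The learning rate and all names are hypothetical.

```python
import math
import random

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def predict_proba(weights, biases, x):
    """Class probabilities for one semantic feature vector x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

def train_softmax(features, labels, n_classes, lr=0.5, epochs=200, seed=0):
    """Per-sample SGD on the cross-entropy loss of a linear softmax model."""
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    biases = [0.0] * n_classes
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, y = features[idx], labels[idx]
            probs = predict_proba(weights, biases, x)
            for c in range(n_classes):
                # Gradient of the cross-entropy loss with respect to logit c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    weights[c][d] -= lr * grad * x[d]
                biases[c] -= lr * grad
    return weights, biases

def classify(weights, biases, x):
    """Index of the most probable class."""
    probs = predict_proba(weights, biases, x)
    return max(range(len(probs)), key=probs.__getitem__)

# Made-up semantic feature vectors; class 0 = "emergency", class 1 = "normal".
features = [[3.0, 0.5], [2.5, 0.4], [0.2, 1.5], [0.1, 2.0]]
labels = [0, 0, 1, 1]
weights, biases = train_softmax(features, labels, n_classes=2)
print(classify(weights, biases, [2.8, 0.3]), classify(weights, biases, [0.0, 1.8]))
```

In practice the loss on a held-out part of the data would normally be monitored to decide when to stop, rather than relying on a fixed epoch count.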
Because the length of defect texts varies greatly, the same information may be expressed with different numbers of words in different texts. For example, information that a short text expresses with the 2 words "all + discoloring" may be expressed with 4 words in a long text, in which case a convolution window with h = 2 rows can hardly extract the information completely. Therefore, the convolution windows are divided into several groups, and different groups use convolution windows of different sizes (different numbers of rows h) to convolve with the paragraph matrix D (that is, the window sizes are set according to the size of the paragraph matrix), so as to obtain semantic features at different word-count levels.
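The grouping of convolution windows by size can be sketched as follows. The sketch assumes that windows in different groups differ in their number of rows h while sharing the column count n of D, and that a window taller than the text contributes a feature value of 0. The dictionary layout, the function names, and the toy numbers are all hypothetical.

```python
def max_relu_response(D, window, bias=0.0):
    """Largest ReLU(r_i + b) over all positions of the window in D.

    Starting from 0.0 and taking a running maximum of r_i + b gives the same
    value as applying ReLU first and then taking the maximum.
    """
    h = len(window)
    best = 0.0
    for i in range(len(D) - h + 1):
        r = sum(w * d
                for w_row, d_row in zip(window, D[i:i + h])
                for w, d in zip(w_row, d_row))
        best = max(best, r + bias)
    return best

def multi_scale_features(D, window_groups):
    """window_groups maps a window height h to a list of (window, bias) pairs."""
    features = []
    for h in sorted(window_groups):
        for window, bias in window_groups[h]:
            features.append(max_relu_response(D, window, bias))
    return features

# Toy paragraph matrix: 4 rows (words), n = 2 columns (made-up values).
D = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [0.0, 0.0]]
window_groups = {
    2: [([[1.0, 1.0]] * 2, 0.0)],    # 2-word window
    4: [([[1.0, 1.0]] * 4, -2.0)],   # 4-word window
}
features = multi_scale_features(D, window_groups)
print(features)  # [3.0, 2.0]
```

Each group contributes features at its own phrase length, so the same information is captured whether it is stated in 2 words or in 4.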
Generally, industrial fault defect texts do not contain long passages as other texts do; a text is usually a short sentence forming one paragraph (two or three sentences in a few cases), describes a single matter, and has uniform characteristics. For example, "a discharge signal is found at the main transformer body below the phase C of the 110kV high-voltage side bushing of the main transformer No. 1" is a single sentence, but can be treated as a paragraph.
In this example, the fault defect text "the moisture absorber is totally discolored" is classified as the emergency type, and the fault defect text "the moisture absorber is slightly discolored" is classified as the normal type.
A power equipment fault defect text classification system comprises:
a text preprocessing unit: configured to perform word segmentation and stop word removal on the power equipment fault defect text;
a paragraph matrix training model building unit: configured to represent the words in the preprocessed fault defect text as word vectors, construct a paragraph matrix D from the word vectors, and train the paragraph matrix D by stochastic gradient descent to obtain a paragraph matrix training model;
a semantic feature vector extraction unit: configured to extract the paragraph matrix of the preprocessed power equipment fault defect text by using the paragraph matrix training model, and perform convolution by using a convolution window with the same size as the paragraph matrix to obtain the corresponding semantic feature vector;
a softmax classifier construction unit: configured to preprocess power equipment fault defect texts of known defect types with the text preprocessing unit, obtain the corresponding semantic feature vectors from the semantic feature vector extraction unit as training data, train the model parameters of a softmax classifier on the training data by stochastic gradient descent, and finish the training when the loss function is minimized, thereby obtaining the trained softmax classifier;
a defect classification unit: configured to extract the semantic feature vector of the power equipment fault defect text to be classified by using the text preprocessing unit and the semantic feature vector extraction unit, input the obtained semantic feature vector into the trained softmax classifier, and classify the power equipment fault defect text.
Further, the system comprises a convolution processing unit, wherein the convolution processing unit first adopts a convolution matrix window W ∈ R^{h×n} with h rows and the same number of columns n as the paragraph matrix D, and sequentially performs convolution, from top to bottom, on the mutually non-overlapping matrix blocks of h rows and n columns in the paragraph matrix D to obtain the convolution results r_i:
r_i = W · D_{i:i+h-1}
wherein i = 1, 2, …, s-h+1, and D_{i:i+h-1} is the i-th matrix block of h rows and n columns of the paragraph matrix D, counted from top to bottom;
secondly, a nonlinearity is applied to the s-h+1 convolution results to obtain the nonlinear results c_i, and the s-h+1 real numbers c_i are arranged in order to form the convolution-layer vector c ∈ R^{s-h+1}:
c_i = ReLU(r_i + b)
wherein b is a bias term and ReLU is the rectified linear unit function;
finally, the largest element max{c} of the convolution-layer vector c produced by each convolution window is taken as a feature value, the feature value p_j corresponding to each convolution window is extracted, j = 1, 2, …, w, where w is the total number of convolution windows, and all feature values p_j are concatenated in order to form the semantic feature vector p ∈ R^w.
It should be understood that the functional unit modules in the embodiments of the present invention may be integrated into one processing unit, or each unit module may exist alone physically, or two or more unit modules are integrated into one unit module, and may be implemented in the form of hardware or software.
In summary, by performing text preprocessing, text representation, text classification, and training against known classification results on the fault defect text information of power equipment, the method provided by the invention is of practical importance for reducing labor cost and improving the robustness of the power system in actual production.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A method for classifying fault and defect texts of electrical equipment is characterized by comprising the following steps:
Step (1): preprocessing the fault defect text of the power equipment;
Step (2): representing the words in the preprocessed fault defect text as word vectors, constructing a paragraph matrix D from the word vectors, and training the paragraph matrix D by stochastic gradient descent to obtain a paragraph matrix training model;
Step (3): extracting the paragraph matrix of the preprocessed power equipment fault defect text by using the paragraph matrix training model, and performing convolution by using a convolution window with the same size as the paragraph matrix to obtain the corresponding semantic feature vector;
Step (4): processing power equipment fault defect texts of known defect types according to steps (1) to (3) to obtain the corresponding semantic feature vectors as training data, training the model parameters of a softmax classifier on the training data by stochastic gradient descent, and finishing the training when the loss function is minimized;
Step (5): processing the power equipment fault defect text to be classified according to steps (1) to (3), inputting the obtained semantic feature vector into the trained softmax classifier, and classifying the power equipment fault defect text.
2. The method for classifying the text of the fault and the defect of the power equipment as claimed in claim 1, wherein the step (1) comprises the steps of selecting fault and defect record data of the equipment of a certain power grid company, and performing word segmentation and stop word removal on the obtained fault and defect record of the equipment.
3. The method for text classification of fault defects of electric power equipment according to claim 1, wherein the step (2) comprises the following sub-steps:
(2-1) obtaining word vectors w_1, w_2, w_3, …, w_T from the words of the preprocessed power equipment fault defect text, where T is the total number of word vectors and each text word corresponds to one word vector;
(2-2) training the word vectors w_i by stochastic gradient descent and constructing the paragraph matrix D such that each word vector w_i is mapped to a column of the paragraph matrix D, and computing the average log probability of all word vectors w_i until it reaches its maximum, thereby obtaining the paragraph matrix training model;
during training, the weight of each word vector in the paragraph matrix training model is updated along the negative gradient direction obtained during gradient descent.
4. The method for classifying the text of the fault defect of the power equipment as claimed in claim 1, wherein the calculation formula of the average log probability is as follows:
Figure FDA0002296273590000011
wherein T is the total number of word vectors,
Figure FDA0002296273590000012
t ∈ [k, T-k], and p(· | w_{t-k}, …, w_{t+k}) is the conditional probability function of the word vector w_t;
Figure FDA0002296273590000021
wherein y_i is the unnormalized log probability of the output word vector w_i, where i = w_{t-k}, w_{t-k+1}, …, w_{t+k}.
5. The method for classifying the text of the fault defect of the power equipment as claimed in claim 1, wherein the semantic feature vector is obtained by the following specific steps:
(3-1) inputting a paragraph matrix D corresponding to the text of the fault defect of the power equipment;
(3-2) adopting a convolution matrix window W ∈ R^{h×n} with h rows and the same number of columns n as the paragraph matrix D, and sequentially performing convolution, from top to bottom, on the mutually non-overlapping matrix blocks of h rows and n columns in the paragraph matrix D to obtain the convolution results r_i:
r_i = W · D_{i:i+h-1}
wherein i = 1, 2, …, s-h+1, and D_{i:i+h-1} is the i-th matrix block of h rows and n columns of the paragraph matrix D, counted from top to bottom;
(3-3) applying a nonlinearity to the s-h+1 convolution results to obtain the nonlinear results c_i, and arranging the s-h+1 real numbers c_i in order to form the convolution-layer vector c ∈ R^{s-h+1}:
c_i = ReLU(r_i + b)
wherein b is a bias term and ReLU is the rectified linear unit function;
(3-4) taking the largest element max{c} of the convolution-layer vector c produced by each convolution window as a feature value, extracting the feature value p_j corresponding to each convolution window, j = 1, 2, …, w, where w is the total number of convolution windows, and concatenating all feature values p_j in order to form the semantic feature vector p ∈ R^w.
6. A power equipment fault defect text classification system, characterized by comprising:
a text preprocessing unit, configured to perform word segmentation and stop-word removal on power equipment fault defect text;
a paragraph matrix training model building unit, configured to represent the words of the preprocessed fault defect text as word vectors, construct a paragraph matrix D from the word vectors, and train the paragraph matrix D by stochastic gradient descent to obtain a paragraph matrix training model;
a semantic feature vector extraction unit, configured to extract the paragraph matrix of the preprocessed power equipment fault defect text using the paragraph matrix training model, and to perform convolution using a convolution window with the same size as the paragraph matrix to obtain the corresponding semantic feature vector;
a softmax classifier construction unit, configured to take power equipment fault defect texts of known defect type, preprocess them with the text preprocessing unit, extract the corresponding semantic feature vectors with the semantic feature vector extraction unit to obtain training data, and train the model parameters of a softmax classifier on the training data by stochastic gradient descent, the training finishing when the loss function is minimized, thereby obtaining the softmax classifier;
a defect classification unit, configured to extract the corresponding semantic feature vector from a power equipment fault defect text to be classified using the text preprocessing unit and the semantic feature vector extraction unit, input the obtained semantic feature vector into the trained softmax classifier, and thereby classify the power equipment fault defect text.
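The softmax classifier construction and defect classification units of claim 6 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not name the loss function, so cross-entropy is assumed; the zero initialization, the learning rate, the fixed epoch count (used in place of a convergence test) and the names `softmax`, `train_softmax` and `predict` are all hypothetical.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def scores(theta, bias, p):
    """Linear score of each class for one semantic feature vector p."""
    return [sum(t * v for t, v in zip(row, p)) + b for row, b in zip(theta, bias)]


def train_softmax(X, y, num_classes, lr=0.1, epochs=200, seed=0):
    """Train softmax regression by stochastic gradient descent.

    Assumption: cross-entropy loss, one sample per update, fixed epoch count.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    theta = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the training samples in random order
        for idx in order:
            p, label = X[idx], y[idx]
            probs = softmax(scores(theta, bias, p))
            for k in range(num_classes):
                # Gradient of cross-entropy w.r.t. the class-k score
                g = probs[k] - (1.0 if k == label else 0.0)
                for j in range(dim):
                    theta[k][j] -= lr * g * p[j]
                bias[k] -= lr * g
    return theta, bias


def predict(theta, bias, p):
    """Return the index of the most probable defect class."""
    s = scores(theta, bias, p)
    return max(range(len(s)), key=s.__getitem__)


# Toy, made-up feature vectors standing in for semantic feature vectors p
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.8]]
y = [0, 0, 1, 1]
theta, bias = train_softmax(X, y, num_classes=2)
```

In practice the rows of `X` would be the semantic feature vectors produced by the extraction unit, and `y` the known defect types of the training texts.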
7. The power equipment fault defect text classification system according to claim 6, characterized in that the system comprises a convolution processing unit, wherein the convolution processing unit first uses a convolution matrix window $W \in \mathbb{R}^{h \times n}$, having the same number of columns $n$ as the paragraph matrix $D$ and $h$ rows, to perform convolution operations in sequence, from top to bottom, on the matrix blocks of $h$ rows and $n$ columns in the paragraph matrix $D$ that do not coincide with one another, obtaining the convolution results $r_i$:
$r_i = W \cdot D_{i:i+h-1}$
wherein $i = 1, 2, \ldots, s-h+1$, and $D_{i:i+h-1}$ is the $i$-th matrix block of $h$ rows and $n$ columns of the paragraph matrix $D$, counted from top to bottom;
secondly, a nonlinearity is applied to each of the $s-h+1$ convolution results to obtain the nonlinear results $c_i$, and the $s-h+1$ real numbers $c_i$ are arranged in sequence to form the convolutional-layer vector $c \in \mathbb{R}^{s-h+1}$:
$c_i = \mathrm{ReLU}(r_i + b)$
wherein $b$ is a bias term and ReLU is the rectified linear unit function;
finally, the maximum element $\max\{c\}$ of the convolutional-layer vector $c$ produced by each convolution window is taken as that window's feature value, thereby extracting the feature value $p_j$, $j = 1, 2, \ldots, w$, corresponding to each convolution window, where $w$ is the total number of convolution windows, and all feature values $p_j$ are concatenated in sequence to form the semantic feature vector $p \in \mathbb{R}^{w}$.
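The three stages of the convolution processing unit (convolution, ReLU, maximum selection) can be sketched in plain Python. Two points are assumptions rather than statements from the claim: $s$ is taken to be the number of rows of $D$, and the product $W \cdot D_{i:i+h-1}$ is read as the sum of element-wise products, which yields one real number per block as the claim requires. The function names and the toy matrices are hypothetical.

```python
def window_feature(D, W, b):
    """Feature value p_j of one convolution window W (h rows, n columns).

    D is the paragraph matrix as a list of s rows, each with n entries.
    """
    s, h, n = len(D), len(W), len(W[0])
    c = []
    for i in range(s - h + 1):
        # r_i: sum of element-wise products of W with rows i .. i+h-1 of D
        r_i = sum(W[a][k] * D[i + a][k] for a in range(h) for k in range(n))
        c.append(max(r_i + b, 0.0))  # c_i = ReLU(r_i + b)
    return max(c)  # p_j = max{c}


def semantic_feature_vector(D, windows, biases):
    """Concatenate the feature values of all w windows into p."""
    return [window_feature(D, W, b) for W, b in zip(windows, biases)]


# Toy paragraph matrix with s = 4 rows and n = 3 columns (made-up numbers)
D = [[0.0, 1.0, 2.0],
     [3.0, 4.0, 5.0],
     [6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0]]
windows = [
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],        # h = 2
    [[-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0]],  # h = 2
]
biases = [-20.0, 0.0]
p = semantic_feature_vector(D, windows, biases)  # one entry per window
```

Windows with different heights $h$ can be mixed freely, since each window contributes exactly one entry to $p$ regardless of how many blocks it visits.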
CN201911202779.4A 2019-11-29 2019-11-29 Method and system for classifying fault defect texts of power equipment Pending CN110895565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911202779.4A CN110895565A (en) 2019-11-29 2019-11-29 Method and system for classifying fault defect texts of power equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911202779.4A CN110895565A (en) 2019-11-29 2019-11-29 Method and system for classifying fault defect texts of power equipment

Publications (1)

Publication Number Publication Date
CN110895565A (en) 2020-03-20

Family

ID=69786963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911202779.4A Pending CN110895565A (en) 2019-11-29 2019-11-29 Method and system for classifying fault defect texts of power equipment

Country Status (1)

Country Link
CN (1) CN110895565A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8341195B1 (en) * 2007-10-04 2012-12-25 Corbis Corporation Platform for managing media assets for multi-model licensing over multi-level pricing and asset grouping
CN108021679A (en) * 2017-12-07 2018-05-11 国网山东省电力公司电力科学研究院 Parallelized power equipment defect text classification method
CN108921233A (en) * 2018-07-31 2018-11-30 武汉大学 Raman spectrum data classification method based on autoencoder network
CN109376242A (en) * 2018-10-18 2019-02-22 西安工程大学 Text classification algorithm based on recurrent neural network variant and convolutional neural network

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737993A (en) * 2020-05-26 2020-10-02 浙江华云电力工程设计咨询有限公司 Method for extracting health state of equipment from fault defect text of power distribution network equipment
CN111737993B (en) * 2020-05-26 2024-04-02 浙江华云电力工程设计咨询有限公司 Method for extracting equipment health state from fault defect text of power distribution network equipment
CN111651601A (en) * 2020-06-02 2020-09-11 全球能源互联网研究院有限公司 Training method and classification method for fault classification model of power information system
CN111651601B (en) * 2020-06-02 2023-04-18 全球能源互联网研究院有限公司 Training method and classification method for fault classification model of power information system
CN111767397A (en) * 2020-06-30 2020-10-13 国网新疆电力有限公司电力科学研究院 Electric power system secondary equipment fault short text data classification method
CN111966825A (en) * 2020-07-16 2020-11-20 电子科技大学 Power grid equipment defect text classification method based on machine learning
CN112052622A (en) * 2020-08-11 2020-12-08 国网河北省电力有限公司 Defect disposal method for deep multi-view semantic document representation under cloud platform
CN113111183A (en) * 2021-04-20 2021-07-13 通号(长沙)轨道交通控制技术有限公司 Traction power supply equipment defect grade classification method
CN113392217A (en) * 2021-06-24 2021-09-14 广东电网有限责任公司 Method and device for extracting fault defect entity relationship of power equipment
CN113392217B (en) * 2021-06-24 2022-06-10 广东电网有限责任公司 Method and device for extracting fault defect entity relationship of power equipment
CN115617990A (en) * 2022-09-28 2023-01-17 浙江大学 Electric power equipment defect short text classification method and system based on deep learning algorithm
CN115617990B (en) * 2022-09-28 2023-09-05 浙江大学 Power equipment defect short text classification method and system based on deep learning algorithm

Similar Documents

Publication Publication Date Title
CN110895565A (en) Method and system for classifying fault defect texts of power equipment
CN112149316B (en) Aero-engine residual life prediction method based on improved CNN model
CN110929918B (en) 10kV feeder fault prediction method based on CNN and LightGBM
CN107169628B (en) Power distribution network reliability assessment method based on big data mutual information attribute reduction
CN107908716A (en) 95598 work order text mining method and apparatus based on word vector model
CN113283027B (en) Mechanical fault diagnosis method based on knowledge graph and graph neural network
CN111767397A (en) Electric power system secondary equipment fault short text data classification method
CN113723010B (en) Bridge damage early warning method based on LSTM temperature-displacement correlation model
CN110188047A (en) Duplicate defect report detection method based on dual-channel convolutional neural network
CN109740164B (en) Electric power defect grade identification method based on depth semantic matching
CN111309859B (en) Scenic spot network public praise emotion analysis method and device
CN112199496A (en) Power grid equipment defect text classification method based on multi-head attention mechanism and RCNN (Rich coupled neural network)
CN111159396B (en) Method for establishing text data classification hierarchical model facing data sharing exchange
CN109710766B (en) Complaint tendency analysis early warning method and device for work order data
CN112836809A (en) Device characteristic extraction method and fault prediction method of convolutional neural network based on differential feature fusion
CN110097096A (en) Text classification method based on TF-IDF matrix and capsule network
DE102021130081A1 (en) AUTOMATIC ONTOLOGY EXTRACTION BASED ON DEEP LEARNING TO CAPTURE NEW AREAS OF KNOWLEDGE
CN115359799A (en) Speech recognition method, training method, device, electronic equipment and storage medium
CN117115581A (en) Intelligent misoperation early warning method and system based on multi-mode deep learning
CN110059938B (en) Power distribution network planning method based on association rule driving
CN111177010A (en) Software defect severity identification method
CN112559741B (en) Nuclear power equipment defect record text classification method, system, medium and electronic equipment
CN114492392A (en) Annual report risk mining system and method based on phrase vector construction
CN114357171A (en) Emergency event processing method and device, storage medium and electronic equipment
CN113111183A (en) Traction power supply equipment defect grade classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200320