CN108874984A - Quality improvement method for poor-quality power grid equipment defect text - Google Patents

Quality improvement method for poor-quality power grid equipment defect text

Info

Publication number
CN108874984A
Authority
CN
China
Prior art keywords
text
defect
standard
vector
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810597110.9A
Other languages
Chinese (zh)
Other versions
CN108874984B (en)
Inventor
王慧芳
邵冠宇
何奔腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201810597110.9A
Publication of CN108874984A
Application granted
Publication of CN108874984B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities

Abstract

The invention proposes a quality improvement method for poor-quality power grid equipment defect texts. First, poor-quality texts among the historical defect texts are corrected to improve their quality, using the Latent Dirichlet Allocation (LDA) model from Chinese text similarity calculation together with the State Grid Corporation of China's classification standard for defects of primary power transmission and transformation equipment. Next, for newly entered texts, a text quality detection method flags quality problems and a word-vector mapping method provides correction suggestions, which safeguards the quality of newly entered defect texts. Finally, using worked examples, the defect texts before and after correction are compared in quality and are classified by defect grade with machine learning and deep learning classification methods, which verifies the effectiveness of the quality improvement method. The invention standardizes the quality of defect texts at the source, so that defect texts provide more reliable and accurate data for defect text mining.

Description

Quality improvement method for poor-quality power grid equipment defect text
Technical field
The invention belongs to the field of power systems, and specifically relates to a quality improvement method for poor-quality power grid equipment defect texts.
Background
With the continued build-out of the smart grid, every part of the power system produces massive amounts of multi-source heterogeneous data, and unstructured data such as text, audio, and images is growing fastest. Among these, texts describing power grid equipment defects contain the information most closely tied to equipment and grid safety. They receive attention from technical and management staff, who, for example, classify and tally defects from multiple perspectives in order to understand defect patterns or equipment quality. Manual classification and tallying of defect texts involves a heavy workload and low efficiency, and the results depend on subjective human experience, so improving the efficiency of defect text mining is a problem to be solved.
Natural language processing technology is now increasingly mature, and mining Chinese text with machine learning or deep learning methods is feasible. In practice, however, power grid equipment defect texts often contain irregularities arising from various causes, such as incomplete or ambiguous descriptions. If texts with such quality problems are mined as if they were valid, the mining results will be biased. A method for improving the quality of poor-quality texts is therefore needed, so that defect text mining can draw on texts of assured quality.
Compared with the mining of structured power grid data, research on mining unstructured text data is still relatively scarce. Abroad, some scholars have applied data mining to historical power grid fault texts and compiled statistics on the defects they contain, but the objects of study were fault tickets with fairly strong regularity. Domestically, most mining of power grid text targets the automatic generation of operation tickets, which are highly standardized. Power grid equipment defect texts are semantically more complex and therefore harder to mine. Some studies have mined such texts for various purposes, but they share a common problem: the mining results are strongly affected by the quality of the defect texts. No method for improving the quality of these texts has yet been published.
Summary of the invention
The technical problem addressed by the invention is the bias that quality problems in power grid equipment defect texts introduce into text mining results. To address it, the invention proposes a method for improving the quality of poor-quality power grid equipment defect texts.
The technical solution adopted by the invention is as follows:
First, a Chinese text similarity calculation method from natural language processing is combined with the State Grid Corporation of China's classification standard for defects of primary power transmission and transformation equipment (hereinafter "the standard") to find the standard statement most similar to each actual defect description. The defect texts are grouped by defect grade, and the problems identified by the defect text quality detection method are used to correct poor-quality historical defect texts, which improves the quality of the historical defect texts. A text representation model from deep learning, the word-vector mapping (word2vec) model, is then combined with the scores that the quality detection method assigns to a defect text on different indicators to produce specific correction suggestions for each newly entered defect text, which assures the quality of newly entered defect texts.
Next, the defect texts before and after correction are compared and are classified by defect grade using existing text classification methods from machine learning and deep learning. The quality detection results and the classification accuracy before and after correction verify the effectiveness of the quality improvement method.
Beneficial effects of the invention: building on the results of quality detection for power grid equipment defect texts, the invention provides a quality improvement method for poor-quality defect texts that are incomplete, not specific enough, overly redundant, or whose defect grade does not match the defect description. The Chinese text similarity algorithm is improved: the word weighting scheme of the "text-word" matrix is modified, the dimensionality is reduced with a Latent Dirichlet Allocation model, and the JS distance is used to find, for each actual defect text, the corresponding standard sentence in the State Grid classification standard for defects of primary power transmission and transformation equipment, which is then used to correct the poor-quality defect text. For newly entered texts, correction suggestions are given when quality detection finds problems. Worked examples show that the corrected defect texts score markedly higher in quality detection and are classified more accurately by machine learning and convolutional neural network models, which demonstrates the effectiveness of the method. The invention provides a quality improvement method for real poor-quality power grid defect texts, standardizes defect text quality at the source, and ensures that defect texts supply more reliable and accurate data for defect text mining, which in turn improves mining results. It also serves as a model for improving the quality of other power grid equipment texts.
Brief description of the drawings
Fig. 1: Quality improvement process for historical defect texts;
Fig. 2: Average quality detection results for defect texts of different equipment types before and after correction.
Specific embodiments
The invention combines the State Grid Corporation of China's classification standard for defects of primary power transmission and transformation equipment with a Chinese text similarity calculation method and a word-vector mapping model from natural language processing. It provides a correction method for poor-quality historical defect texts, whose process is shown in Fig. 1, and a quality assurance method for newly entered defect texts. Quality improvement is then applied to the poor-quality texts among the actual defect texts, and its effectiveness is verified by the quality detection results and by the classification accuracy of text classification methods such as machine learning before and after correction. The specific steps are as follows:
Step 1. Using a Chinese text similarity calculation method from natural language processing, process the State Grid Corporation of China's classification standard for defects of primary power transmission and transformation equipment (hereinafter "the standard") and the actual historical defect texts to generate the "graded standard-topic" matrices and the "defect text-topic" matrix. Specifically:
(1) Segment the standard and the actual defect texts into words and remove stop words, then generate the "word-defect text" matrix and the "word-standard" matrix respectively. The row vectors of these matrices are the defect text vectors and the standard vectors, and different columns represent different words. Words in the matrices are weighted with "tf-idf" as follows: $a_{ij} = tf_{ij} \cdot idf_i$,
where $a_{ij}$ is the word weight; $tf_{ij}$ is the frequency with which word $i$ occurs in defect text $j$; $idf_i$ is the inverse frequency of the texts in which word $i$ appears; $ndoc$ is the total number of defect texts; and $gf_i$ is the frequency with which word $i$ occurs across all defect texts.
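The weighting in (1) can be sketched as follows. This is a minimal illustration rather than the patented computation: the expression for $idf_i$ is not reproduced in this text, so the sketch assumes the common form log(ndoc / df_i) + 1, where df_i (the number of texts containing word i) is a stand-in introduced here, and it takes tf as a relative frequency. The function name `tfidf_matrix` is hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a text-by-word tf-idf matrix.

    docs: list of token lists, already segmented with stop words removed.
    Returns (vocab, rows), where rows[j][i] is the weight of word i in text j.
    """
    vocab = sorted({w for doc in docs for w in doc})
    ndoc = len(docs)
    # Number of texts that contain each word (document frequency).
    df = Counter(w for doc in docs for w in set(doc))
    # ASSUMPTION: idf_i = log(ndoc / df_i) + 1. The patent's own idf
    # expression is not shown in this text and may differ.
    idf = {w: math.log(ndoc / df[w]) + 1.0 for w in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty texts
        # tf_ij taken as the relative frequency of word i in text j.
        rows.append([counts[w] / total * idf[w] for w in vocab])
    return vocab, rows
```

A word that appears in every text keeps a weight equal to its term frequency under this assumed idf, while rarer words are weighted up.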
(2) Use the Latent Dirichlet Allocation (LDA) model to reduce the dimensionality of the "word-standard" matrix obtained with the above word weighting, generating the "standard-topic" matrix Z. Then, according to the defect grades in the standard ("critical", "serious", "general"), generate three "graded standard-topic" matrices Z1, Z2, Z3. The column vectors of Z, Z1, Z2, Z3 are the standard vectors corresponding to the standard sentences. For the historical defect texts, the LDA model parameters determined when generating Z, Z1, Z2, Z3, together with the defect grade of each historical defect text, are used to reduce the dimensionality of the "word-defect text" matrix, producing the historical defect text vector q and the grade-specific defect text vector q1 (or q2, q3). The different historical defect text vectors make up the "defect text-topic" matrix.
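The split of Z into Z1, Z2, Z3 can be illustrated with a short sketch. It assumes that the LDA step has already produced one topic vector per standard sentence and that each sentence carries a grade label; the function name `split_by_grade` and the data layout (a list of vectors plus a parallel list of grades) are hypothetical choices made here, and the LDA fitting itself is not shown.

```python
from collections import defaultdict

# Grade order follows the text: Z1 = critical, Z2 = serious, Z3 = general.
GRADES = ("critical", "serious", "general")


def split_by_grade(topic_vectors, grades):
    """Group standard-sentence topic vectors by defect grade.

    topic_vectors: one topic-distribution vector per standard sentence.
    grades: the defect grade of each standard sentence, in the same order.
    Returns (Z1, Z2, Z3) as lists of vectors.
    """
    if len(topic_vectors) != len(grades):
        raise ValueError("each standard vector needs exactly one grade")
    groups = defaultdict(list)
    for vec, grade in zip(topic_vectors, grades):
        if grade not in GRADES:
            raise ValueError(f"unknown defect grade: {grade!r}")
        groups[grade].append(vec)
    return tuple(groups[g] for g in GRADES)
```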
Step 2. Taking into account the defect grade of each actual defect text, correct the poor-quality historical defect texts. Specifically:
(1) Use the JS distance to calculate the similarity between the defect text vector q and the standard vectors in matrix Z, and find the standard vector with the highest similarity, thereby judging how semantically close the actual defect text is to the standard. The JS-distance similarity $sim(q_1, z)$ takes values in (0, 1), where $q_1$ is the defect text vector and $z$ is the standard vector; $D_{KL}$ is the KL distance, which likewise expresses the difference between probability vectors; $z_j$ and $q_{1j}$ are the elements of the vectors $z$ and $q_1$;
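A minimal sketch of the similarity search in (1), using the textbook definitions of the KL and Jensen-Shannon divergences. The conversion to a similarity as 1 minus the base-2 JS divergence is an assumption made here so that identical vectors score 1 and disjoint vectors score 0; the patent's exact expression for sim is not reproduced in this text. The function names are hypothetical.

```python
import math


def kl(p, q):
    """KL divergence sum_j p_j * log2(p_j / q_j); terms with p_j = 0 contribute 0."""
    return sum(pj * math.log2(pj / qj) for pj, qj in zip(p, q) if pj > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors (base-2 log)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    # m_j > 0 wherever p_j > 0 or q_j > 0, so kl() never divides by zero here.
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_similarity(p, q):
    # ASSUMPTION: similarity = 1 - JS divergence. With base-2 logs the
    # divergence lies in [0, 1], so the similarity does too.
    return 1.0 - js_divergence(p, q)


def most_similar(q, standards):
    """Return (index, similarity) of the standard vector closest to q."""
    scores = [js_similarity(q, z) for z in standards]
    best = max(range(len(standards)), key=scores.__getitem__)
    return best, scores[best]
```

The returned similarity can then be compared against a threshold such as the 0.6 used in the next sub-step.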
(2) By checking whether the similarity between the actual defect text and the standard exceeds 0.6, whether the quality score s exceeds 70 points, and whether the defect grade of the actual defect text matches that of the most similar standard, correct any mismatch between defect grade and defect description in the historical defect texts;
(3) For defect texts whose defect grade is known to be correct, apply the same criteria to find, in the matrices Z1, Z2, Z3, the standard vector most similar to the grade-specific defect text vector q1 (or q2, q3), and use the standard text corresponding to that standard vector to correct the poor-quality historical defect text.
Step 3. Use a text representation model from deep learning, the word-vector mapping (word2vec) model, together with the scores of a defect text on different indicators obtained by the defect text quality detection method, to give specific correction suggestions for a newly entered defect text, as follows:
(1) First, from the scores of the newly entered defect text on the different detection indicators, determine whether the problem lies in its defect description or in its equipment hierarchy;
(2) Problems of incompleteness and redundancy can be corrected with the method of Step 2. For an inaccurate equipment hierarchy, use the word2vec model to generate word vectors for the words in the equipment hierarchy and, based on the words already present, use cosine similarity to find the word most likely to be missing, which serves as the correction suggestion for completing the equipment hierarchy.
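The hierarchy completion in (2) can be sketched as picking, among the candidate words of the missing level, the one with the highest average cosine similarity to the words already present. The sketch assumes the word vectors have already been trained (for example with a word2vec model) and are supplied as a plain dictionary; the function names and the dictionary layout are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suggest_missing_word(present_words, candidates, vectors):
    """Suggest the candidate word most similar, on average, to the present words.

    present_words: hierarchy words found in the newly entered text.
    candidates: words belonging to the missing hierarchy level.
    vectors: mapping from word to its pre-trained word vector (assumed given).
    """
    if not present_words or not candidates:
        raise ValueError("need at least one present word and one candidate")

    def mean_cosine(candidate):
        sims = [cosine(vectors[candidate], vectors[w]) for w in present_words]
        return sum(sims) / len(sims)

    return max(candidates, key=mean_cosine)
```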
Step 4. Apply quality improvement to the poor-quality texts among the actual power grid equipment defect texts, compare the defect texts before and after correction, and classify the defect texts by defect grade using existing text classification methods from machine learning and deep learning. The quality detection results and the classification accuracy before and after correction verify the effectiveness of the quality improvement method.
Application example
The quality improvement method proposed by the invention was applied to more than 25,000 actual defect texts for different types of equipment. Taking main transformers as an example, Table 1 gives historical defect texts before and after correction together with their quality detection results. The defect grade is unchanged by the correction and is therefore not listed twice. As Table 1 shows, before correction the equipment hierarchy of the two defect texts was inaccurate and the defect descriptions lacked the oil leakage rate and the degree of discoloration; both were filled in after correction.
Table 1: Historical defect texts before and after correction and their quality detection results
Fig. 2 gives the average quality detection results before and after correction for the defect texts of five classes of equipment. The average results for every equipment type improve after correction, to varying degrees.
Taking main transformers as an example, Table 2 gives the indicator scores and correction suggestions that the above method produces for newly entered texts. For the first text in the table, the suggestion is derived as follows. First, string matching identifies the equipment hierarchy terms in the text as "main transformer", "on-load switch", "tap changer", and "respirator", which belong to the equipment type, component, component type, and position levels of the equipment hierarchy respectively; the missing level is "equipment category". Since the accuracy score is greater than 0.6 and less than 1, word2vec is used to obtain the word vectors of the four words, and the word at the "equipment category" level with the largest average cosine with those four vectors is found to be "oil-immersed transformer", which is suggested as the missing hierarchy term. Since the completeness score is 0.8, the similarity calculation described above is used to obtain the defect degree part of the most similar standard sentence, "the discolored portion of the deliquesced silica gel exceeds 2/3 of the total", which gives staff a reference for completing the defect degree of the newly entered defect text.
Table 2: Scores of newly entered texts on different indicators and correction suggestions
The historical defect texts before and after correction were classified with machine learning and convolutional neural network (CNN) classification models. Taking main transformers as an example, the classification results are given in Tables 3 and 4. The serious error rate is defined as the percentage of all texts in which "general" is misclassified as "critical" or "critical" is misclassified as "general". Both the accuracy and the serious error rate show that the classification results for the corrected defect texts improve markedly, which demonstrates the effectiveness of the quality improvement method for poor-quality defect texts.
Table 3: Classification results of machine learning models for main transformer defect texts before and after correction
Table 4: Classification results of the CNN model for main transformer defect texts before and after correction

Claims (3)

1. A quality improvement method for poor-quality power grid equipment defect texts, characterized in that the method comprises the following steps:
Step 1. Using a Chinese text similarity calculation method from natural language processing, process the State Grid Corporation of China's classification standard for defects of primary power transmission and transformation equipment (hereinafter "the standard") and the actual historical defect texts to generate the "graded standard-topic" matrices and the "defect text-topic" matrix, specifically:
(1) segment the standard and the actual defect texts into words and remove stop words, then generate the "word-defect text" matrix and the "word-standard" matrix, wherein the row vectors of the matrices are the defect text vectors and the standard vectors, and different columns of the matrices represent different words;
(2) use the Latent Dirichlet Allocation (LDA) model to reduce the dimensionality of the "word-standard" matrix obtained with the above word weighting, generating the "standard-topic" matrix Z; then, according to the defect grades in the standard ("critical", "serious", "general"), generate three "graded standard-topic" matrices Z1, Z2, Z3; the column vectors of Z, Z1, Z2, Z3 are the standard vectors corresponding to the standard sentences; for the historical defect texts, use the LDA model parameters determined when generating Z, Z1, Z2, Z3, together with the defect grade of each historical defect text, to reduce the dimensionality of the "word-defect text" matrix, generating the historical defect text vector q and the grade-specific defect text vector q1 (or q2, q3); the different historical defect text vectors make up the "defect text-topic" matrix;
Step 2. Classify the defect texts by defect grade and, using the result of Step 1, correct the poor-quality historical defect texts, specifically:
(1) use the JS distance to calculate the similarity between the defect text vector q and the standard vectors in matrix Z, and find the standard vector with the highest similarity, thereby judging how semantically close the actual defect text is to the standard;
(2) by checking whether the similarity between the actual defect text and the standard exceeds 0.6, whether the quality score s exceeds 70 points, and whether the defect grade of the actual defect text matches that of the most similar standard, correct any mismatch between defect grade and defect description in the historical defect texts;
(3) for defect texts whose defect grade is known to be correct, apply the same criteria to find, in the matrices Z1, Z2, Z3, the standard vector most similar to the grade-specific defect text vector q1 (or q2, q3), and use the standard text corresponding to that standard vector to correct the poor-quality historical defect text;
Step 3. Use a text representation model from deep learning, the word-vector mapping model, together with the scores of a defect text on different indicators obtained by the defect text quality evaluation method, to give specific correction suggestions for a newly entered defect text, as follows:
(1) first, from the scores of the newly entered defect text on the different evaluation indicators, determine whether the problem lies in its defect description or in its equipment hierarchy;
(2) correct problems of incompleteness and redundancy according to Step 2; for an inaccurate equipment hierarchy, use the word-vector mapping model to generate word vectors for the words in the equipment hierarchy and, based on the words already present, use cosine similarity to find the word most likely to be missing, which serves as the correction suggestion for completing the equipment hierarchy;
Step 4. Apply quality improvement to the poor-quality texts among the actual power grid equipment defect texts, compare the defect texts before and after correction, and classify the defect texts by defect grade using existing text classification methods from machine learning and deep learning, the quality detection results and the classification accuracy before and after correction verifying the effectiveness of the quality improvement method.
2. The quality improvement method for poor-quality power grid equipment defect texts according to claim 1, characterized in that the words in the "word-defect text" matrix and the "word-standard" matrix are weighted with "tf-idf" as follows: $a_{ij} = tf_{ij} \cdot idf_i$, where $a_{ij}$ is the word weight; $tf_{ij}$ is the frequency with which word $i$ occurs in defect text $j$; $idf_i$ is the inverse frequency of the texts in which word $i$ appears; $ndoc$ is the total number of defect texts; and $gf_i$ is the frequency with which word $i$ occurs across all defect texts.
3. The quality improvement method for poor-quality power grid equipment defect texts according to claim 1, characterized in that the JS-distance similarity $sim(q_1, z)$ takes values in (0, 1), where $q_1$ is the defect text vector and $z$ is the standard vector; $D_{KL}$ is the KL distance, which likewise expresses the difference between probability vectors; $z_j$ and $q_{1j}$ are the elements of the vectors $z$ and $q_1$.
CN201810597110.9A 2018-06-11 2018-06-11 Quality improvement method for poor-quality power grid equipment defect text Active CN108874984B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810597110.9A CN108874984B (en) 2018-06-11 2018-06-11 Quality improvement method for poor-quality power grid equipment defect text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810597110.9A CN108874984B (en) 2018-06-11 2018-06-11 Quality improvement method for poor-quality power grid equipment defect text

Publications (2)

Publication Number Publication Date
CN108874984A true CN108874984A (en) 2018-11-23
CN108874984B CN108874984B (en) 2021-01-01

Family

ID=64337750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810597110.9A Active CN108874984B (en) 2018-06-11 2018-06-11 Quality improvement method for poor-quality power grid equipment defect text

Country Status (1)

Country Link
CN (1) CN108874984B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321425A (en) * 2019-07-11 2019-10-11 云南电网有限责任公司电力科学研究院 A kind of judgment method and device of grounding grid defect type
CN111048167A (en) * 2019-10-31 2020-04-21 中电药明数据科技(成都)有限公司 Hierarchical case structuring method and system
CN111199148A (en) * 2019-12-26 2020-05-26 东软集团股份有限公司 Text similarity determination method and device, storage medium and electronic equipment
CN113610112A (en) * 2021-07-09 2021-11-05 中国商用飞机有限责任公司上海飞机设计研究院 Auxiliary decision-making method for airplane assembly quality defects
CN114416988A (en) * 2022-01-17 2022-04-29 国网福建省电力有限公司 Defect automatic rating and disposal suggestion pushing method based on natural language processing

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105303296A (en) * 2015-09-29 2016-02-03 国网浙江省电力公司电力科学研究院 Electric power equipment full-life state evaluation method
CN105955960A (en) * 2016-05-06 2016-09-21 浙江大学 Semantic frame-based power grid defect text mining method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105303296A (en) * 2015-09-29 2016-02-03 国网浙江省电力公司电力科学研究院 Electric power equipment full-life state evaluation method
CN105955960A (en) * 2016-05-06 2016-09-21 浙江大学 Semantic frame-based power grid defect text mining method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曹靖 等: "基于语义框架的电网缺陷文本挖掘技术及其应用", 《电网技术》 *
秦璇: "电力统计数据的质量评估及其异常检测方法", 《万方数据知识服务平台》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321425A (en) * 2019-07-11 2019-10-11 云南电网有限责任公司电力科学研究院 A kind of judgment method and device of grounding grid defect type
CN110321425B (en) * 2019-07-11 2023-07-21 云南电网有限责任公司电力科学研究院 Method and device for judging defect type of power grid
CN111048167A (en) * 2019-10-31 2020-04-21 中电药明数据科技(成都)有限公司 Hierarchical case structuring method and system
CN111048167B (en) * 2019-10-31 2023-08-18 中电药明数据科技(成都)有限公司 Hierarchical case structuring method and system
CN111199148A (en) * 2019-12-26 2020-05-26 东软集团股份有限公司 Text similarity determination method and device, storage medium and electronic equipment
CN111199148B (en) * 2019-12-26 2023-01-20 东软集团股份有限公司 Text similarity determination method and device, storage medium and electronic equipment
CN113610112A (en) * 2021-07-09 2021-11-05 中国商用飞机有限责任公司上海飞机设计研究院 Auxiliary decision-making method for airplane assembly quality defects
CN113610112B (en) * 2021-07-09 2024-04-16 中国商用飞机有限责任公司上海飞机设计研究院 Auxiliary decision-making method for aircraft assembly quality defects
CN114416988A (en) * 2022-01-17 2022-04-29 国网福建省电力有限公司 Defect automatic rating and disposal suggestion pushing method based on natural language processing

Also Published As

Publication number Publication date
CN108874984B (en) 2021-01-01

Similar Documents

Publication Publication Date Title
CN108874984A (en) A kind of increased quality method to second-rate grid equipment defect text
Pavlick et al. Simple PPDB: A paraphrase database for simplification
Kingston et al. Formative assessment: A meta‐analysis and a call for research
CN102682130B (en) Text sentiment classification method and system
CN104820629A (en) Intelligent system and method for emergently processing public sentiment emergency
CN101587155A (en) Oil soaked transformer fault diagnosis method
CN107132310A (en) Transformer equipment health status method of discrimination based on gauss hybrid models
CN105117398A (en) Software development problem automatic answering method based on crowdsourcing
CN104123678A (en) Electricity relay protection status overhaul method based on status grade evaluation model
CN104268134A (en) Subjective and objective classifier building method and system
CN110133410A (en) Diagnosis Method of Transformer Faults and system based on Fuzzy C-Means Cluster Algorithm
CN105632488A (en) Voice evaluation method and device
CN109918720A (en) Diagnosis Method of Transformer Faults based on krill group's Support Vector Machines Optimized
CN106779455A (en) The methods of risk assessment and system of a kind of translation project
Quijada et al. Hmc at semeval-2016 task 11: Identifying complex words using depth-limited decision trees
Lazaridou et al. A multitask objective to inject lexical contrast into distributional semantics
CN105117847A (en) Method for evaluating transformer failure importance
CN110309309B (en) Method and system for evaluating quality of manual labeling data
CN112037590A (en) Block chain online learning sharing system
CN109283293B (en) Power transformer fault diagnosis method based on coefficient of variation and TOPSIS method
CN109284504A (en) It grinds to call the score using the security of deep learning model and analyses method and device
CN105741184A (en) Transformer state evaluation method and apparatus
CN111369140A (en) Teaching evaluation system and method
Jagadamba Online subjective answer verifying system using artificial intelligence
Ke et al. Autoscoring essays based on complex networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant