CN108874984A - Quality improvement method for poor-quality power grid equipment defect text - Google Patents
Quality improvement method for poor-quality power grid equipment defect text
- Publication number
- CN108874984A (application number CN201810597110.9A)
- Authority
- CN
- China
- Prior art keywords
- text
- defect
- standard
- vector
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
Abstract
The invention proposes a quality improvement method for poor-quality power grid equipment defect texts. First, poor-quality texts among the historical defect texts are corrected using the Latent Dirichlet Allocation (LDA) model from Chinese text similarity calculation, together with State Grid Corporation of China's power transmission and transformation primary equipment defect classification standard. Then, for newly entered texts, a text quality detection method flags quality problems and a word vector mapping method supplies correction suggestions, ensuring the quality of newly entered defect texts. Finally, the defect texts before and after correction are compared on worked examples and classified by defect grade using machine learning and deep learning classification methods, verifying the effectiveness of the quality improvement method. The invention standardizes defect text quality at the source and ensures that defect texts provide more reliable and accurate data for defect text mining.
Description
Technical field
The invention belongs to the field of power systems and specifically concerns a quality improvement method for poor-quality power grid equipment defect texts.
Background art
As smart grid construction deepens, every link of the power system produces massive multi-source heterogeneous data, among which unstructured data such as text, audio and images is growing fastest. Texts describing grid equipment defects contain the information most closely tied to equipment and grid safety and receive close attention from technical and management staff; for example, defects are classified and counted from multiple perspectives in order to understand defect patterns or equipment quality. Manual classification and statistics of defect texts involve a heavy workload and low efficiency, and the results depend on subjective human experience, so improving the efficiency of defect text mining is a problem to be solved.
Natural language processing technology is now increasingly mature, and mining Chinese text with machine learning or deep learning methods is feasible. Actual grid equipment defect texts, however, often contain irregularities arising from various causes, such as incomplete or ambiguous descriptions. If texts with such quality problems are mined as valid texts, the mining results will be biased. A method for improving the quality of poor-quality texts is therefore needed, so that grid equipment defect text mining can work on quality-assured texts.
Compared with research on mining structured grid data, research on mining unstructured text data is still relatively scarce. Abroad, scholars have studied historical grid fault texts by data mining and compiled statistics on the defects contained in them, but the research objects are fault tickets with relatively strong regularity. Domestic mining of grid texts mostly targets the automatic generation of operation tickets, which are highly standardized. Grid equipment defect texts are semantically more complex and therefore harder to mine. Some studies have mined grid equipment defect texts for different purposes, but their common problem is that the mining results are strongly affected by defect text quality. No method for improving text quality has yet been published.
Summary of the invention
The technical problem to be solved by the invention is the deviation in grid text mining results caused by quality problems in grid equipment defect texts. To address it, the invention proposes a method for improving the quality of poor-quality grid equipment defect texts.
The technical solution adopted for solving the technical problem of the present invention is:
First, using a Chinese text similarity calculation method from natural language processing, together with State Grid Corporation of China's power transmission and transformation primary equipment defect classification standard (hereinafter "the standard"), the standard wording most similar to an actual defect is found in the standard. Defect texts are classified by defect grade and, combined with the problems identified by a defect text quality detection method, poor-quality historical defect texts are corrected, which improves the quality of historical defect texts. Using the word vector mapping (word2vec) model, a text representation model from deep learning, together with the scores of a defect text on different indicators obtained by the quality detection method, specific correction suggestions are given for each newly entered defect text, which ensures the quality of newly entered defect texts.
Then the defect texts before and after correction are compared, and different existing text classification methods from machine learning and deep learning are used to classify the defect texts by defect grade. The effectiveness of the quality improvement method is verified through the quality detection results and the classification accuracy before and after correction.
Beneficial effects of the invention: building on the results of grid equipment defect text quality detection, a quality improvement method is devised for poor-quality defect texts that are incomplete, unspecific, overly redundant, or whose defect grade does not match the defect description. The Chinese text similarity algorithm is improved: the word weighting of the "text-word" matrix is modified, and the Latent Dirichlet Allocation (LDA) model is used for dimensionality reduction. The JS distance is used to find, for each actual defect text, the corresponding standard sentence in State Grid's power transmission and transformation primary equipment defect classification standard, and poor-quality defect texts are corrected accordingly. Newly entered texts undergo quality detection, and correction suggestions are given if quality problems are found. Verification on examples shows that corrected defect texts score markedly higher in quality detection and are classified more accurately by machine learning and convolutional neural network models, which demonstrates the effectiveness of the method. The invention provides a quality improvement method for actual poor-quality grid equipment defect texts, standardizes defect text quality at the source, and ensures that defect texts supply more reliable and accurate data for defect text mining, thereby improving mining results. It also serves as a reference for improving the quality of other grid equipment texts.
Description of the drawings
Fig. 1: flow of quality improvement for historical defect texts;
Fig. 2: average quality detection results for defect texts of different equipment types before and after correction.
Detailed description of the embodiments
Using State Grid Corporation of China's power transmission and transformation primary equipment defect classification standard, a Chinese text similarity calculation method from natural language processing, and the word vector mapping model, the invention provides a correction method for poor-quality historical defect texts, whose flow is shown in Fig. 1, and a quality assurance method for newly entered defect texts. Poor-quality texts among actual defect texts are then improved, and the effectiveness of the improvement is verified through the quality detection results and the classification accuracy of machine learning and other text classification methods before and after correction. The specific steps are as follows:
Step 1. Using a Chinese text similarity calculation method from natural language processing, process State Grid Corporation of China's power transmission and transformation primary equipment defect classification standard (hereinafter "the standard") and the actual historical defect texts to generate the "graded standard-topic" matrices and the "defect text-topic" matrix. Specifically:
(1) Segment the standard and the actual defect texts into words and remove stop words as preprocessing, then generate the "word-defect text" matrix and the "word-standard" matrix respectively. The row vectors of the matrices are the defect text vectors and the standard vectors, and different columns represent different words. The words in the matrices are weighted by tf-idf: $a_{ij} = tf_{ij} \cdot idf_i$, where $a_{ij}$ is the word weight; $tf_{ij}$ is the frequency with which word $i$ occurs in defect text $j$; $idf_i$ is the inverse frequency of the texts in which word $i$ occurs; $N_{doc}$ is the total number of defect texts; and $gf_i$ is the frequency with which word $i$ occurs across all defect texts.
(2) Use the Latent Dirichlet Allocation (LDA) model to reduce the dimensionality of the "word-standard" matrix obtained with the above word weighting, generating the "standard-topic" matrix $Z$; then, according to the defect grades of the standard ("critical", "serious", "general"), generate three "graded standard-topic" matrices $Z_1$, $Z_2$, $Z_3$. The column vectors of $Z$, $Z_1$, $Z_2$, $Z_3$ are the standard vectors corresponding to the standard sentences. For historical defect texts, use the LDA model parameters determined when generating $Z$, $Z_1$, $Z_2$, $Z_3$, together with the defect grade of each historical defect text, to reduce the dimensionality of the "word-defect text" matrix, generating the historical defect text vector $q$ and the graded defect text vector $q_1$ (or $q_2$, $q_3$). The different historical defect text vectors form the "defect text-topic" matrix.
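The tf-idf weighting of item (1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function name `tfidf_matrix` and the toy English tokens are hypothetical, and the idf form `log(N_doc / gf_i)` and the reading of `gf_i` as "number of texts containing word i" are assumptions, since the text above does not reproduce the idf formula.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows): one tf-idf weighted row per tokenized text.

    docs is a list of non-empty token lists (already segmented, stop words removed).
    """
    vocab = sorted({w for doc in docs for w in doc})
    n_doc = len(docs)
    # gf[w]: number of texts containing word w (assumed reading of gf_i)
    gf = Counter(w for doc in docs for w in set(doc))
    # Assumed idf form log(N_doc / gf_i); the source formula is not reproduced
    idf = {w: math.log(n_doc / gf[w]) for w in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        # tf is the relative frequency of the word inside this text
        rows.append([counts[w] / len(doc) * idf[w] for w in vocab])
    return vocab, rows


docs = [["oil", "leak", "transformer"], ["oil", "level", "low"]]
vocab, rows = tfidf_matrix(docs)
```

With this weighting a word that appears in every text (here "oil") receives weight zero, so only discriminating words contribute to the later dimensionality reduction.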
Step 2. Taking into account the defect grade of each actual defect text, correct the poor-quality historical defect texts. Specifically:
(1) Use the JS distance to calculate the similarity between the defect text vector $q$ and the standard vectors in matrix $Z$, find the standard vector with the highest similarity, and thereby judge how semantically close the actual defect text is to the standard. In the JS distance expression, the similarity $sim(q_1, z)$ takes values in $(0, 1)$; $q_1$ is the defect text vector and $z$ is the standard vector; $D_{KL}$ is the KL distance, which likewise expresses the difference between probability vectors; $z_j$ and $q_{1j}$ are the elements of the vectors $z$ and $q_1$;
(2) Check whether the similarity between the actual defect text and the standard is greater than 0.6, whether the quality score $s$ is greater than 70 points, and whether the defect grade of the actual defect text agrees with that of the most similar standard; on this basis, correct any mismatch between defect grade and defect description in the historical defect texts;
(3) For defect texts whose defect grade is known to be correct, apply the same judgment indicators to find, in the matrices $Z_1$, $Z_2$, $Z_3$, the standard vector most similar to the graded defect text vector $q_1$ (or $q_2$, $q_3$), and use the standard text corresponding to that standard vector to correct the poor-quality historical defect text.
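The similarity search and the three checks of Step 2 can be illustrated as follows. This is a hedged sketch under stated assumptions: it uses the textbook Jensen-Shannon divergence with base-2 logarithms and takes the similarity as one minus that divergence, which may differ from the exact expression in the patent; the function names and the toy topic vectors are hypothetical. Only the thresholds 0.6 and 70 come from the text.

```python
import math


def kl(p, q):
    """Textbook KL divergence D_KL(p || q), base 2; zero entries of p contribute 0."""
    return sum(pj * math.log2(pj / qj) for pj, qj in zip(p, q) if pj > 0)


def js_similarity(q1, z):
    """Assumed similarity: 1 minus the Jensen-Shannon divergence of q1 and z."""
    m = [(a + b) / 2 for a, b in zip(q1, z)]
    return 1.0 - (0.5 * kl(q1, m) + 0.5 * kl(z, m))


def most_similar_standard(q1, standards):
    """Return (index, similarity) of the standard vector closest to q1."""
    sims = [js_similarity(q1, z) for z in standards]
    best = max(range(len(standards)), key=sims.__getitem__)
    return best, sims[best]


def check_indicators(similarity, quality_score, text_grade, standard_grade):
    """Evaluate the three indicators named in Step 2 (thresholds from the text)."""
    return {
        "similar_enough": similarity > 0.6,
        "quality_ok": quality_score > 70,
        "grade_matches": text_grade == standard_grade,
    }


standards = [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]]  # toy topic distributions
index, sim = most_similar_standard([0.7, 0.2, 0.1], standards)
flags = check_indicators(sim, 65, "serious", "general")
```

How the three indicators are combined into a correction decision is not spelled out above, so the sketch only reports them and leaves the decision to the caller.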
Step 3. Using the word vector mapping (word2vec) model, a text representation model from deep learning, together with the scores of the defect text on different indicators obtained by the defect text quality detection method, give specific correction suggestions for a newly entered defect text. The method is as follows:
(1) First, from the scores of the newly entered defect text on the different detection indicators, determine whether the problem lies in its defect description or in its equipment hierarchy;
(2) Problems of incompleteness and redundancy can be corrected by the method of Step 2. For an inaccurate equipment hierarchy, use the word2vec model to generate word vectors for the words at the different hierarchy levels, then, starting from the words already present, use cosine similarity to find the word most likely to be missing and offer it as the suggestion for completing the equipment hierarchy.
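The cosine-similarity lookup in item (2) can be sketched as below. The word vectors here are made-up two-dimensional toy values standing in for trained word2vec vectors, and the helper names `cosine` and `suggest_missing_word` are hypothetical; the sketch assumes the candidate is chosen by the highest mean cosine similarity to the words already present.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest_missing_word(present_vectors, candidate_vectors):
    """Return the candidate word with the highest mean cosine similarity
    to the vectors of the words already present in the text."""
    def mean_sim(vec):
        return sum(cosine(vec, p) for p in present_vectors) / len(present_vectors)
    return max(candidate_vectors, key=lambda w: mean_sim(candidate_vectors[w]))


# Toy vectors (assumed values, not trained embeddings)
present = [[1.0, 0.0], [0.8, 0.2]]
candidates = {
    "oil-immersed transformer": [0.9, 0.1],
    "circuit breaker": [0.0, 1.0],
}
suggestion = suggest_missing_word(present, candidates)
```

In practice the candidates would be restricted to words belonging to the missing hierarchy level, so that the suggestion fills exactly the gap found by the quality detection.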
Step 4. Improve the quality of the poor-quality texts among the actual grid equipment defect texts, compare the defect texts before and after correction, and classify the defect texts by defect grade using different existing text classification methods from machine learning and deep learning. The effectiveness of the quality improvement method is verified through the quality detection results and the classification accuracy before and after correction.
Application example
The proposed grid equipment defect text quality improvement method was applied to more than 25,000 actual defect texts from different types of equipment. Taking the main transformer as an example, the historical defect texts before and after correction and their quality detection results are given in Table 1; the defect grade is unchanged by the correction and is therefore not listed again. As Table 1 shows, before correction the two defect texts had an inaccurate equipment hierarchy and their defect descriptions lacked the oil leakage rate and the degree of discoloration; both were completed by the correction.
Table 1. Historical defect texts and their quality detection results before and after correction
The average quality detection results before and after correction for the defect texts of five classes of equipment are shown in Fig. 2. The average quality detection result improves to varying degrees after correction for every equipment type.
Taking the main transformer as an example, the indicator scores and correction suggestions for newly entered texts produced by the above quality improvement method are given in Table 2. Taking the first text in the table as an example, the suggestion is produced as follows. First, string matching identifies the equipment hierarchy descriptions in the text as "main transformer", "on-load switch", "tap changer" and "breather", which lie at the equipment type, component, component type and position levels of the hierarchy; the missing level is "equipment category". Since the accuracy score is greater than 0.6 and less than 1, word2vec is used to obtain the word vectors of the four words, and the word at the "equipment category" level with the largest average cosine of the angle to these four vectors is found to be "oil-immersed transformer", which is offered as the missing hierarchy description. Since the completeness score is 0.8, the above similarity calculation is used to obtain the defect-degree part of the most similar standard sentence, "more than 2/3 of the total silica gel has deliquesced and discolored", which gives staff a reference for completing the defect degree of the newly entered defect text.
Table 2. Scores of newly entered texts on different indicators and the corresponding correction suggestions
The historical defect texts before and after correction were classified with machine learning and convolutional neural network (CNN) classification models. Taking the main transformer as an example, the classification results are given in Tables 3 and 4. The serious error rate is defined as the percentage of all texts in which "general" is misclassified as "critical" or "critical" is misclassified as "general". Both the accuracy and the serious error rate show that the classification results improve markedly after correction, which demonstrates the effectiveness of the quality improvement method for poor-quality defect texts.
Table 3. Classification results of the machine learning models on main transformer defect texts before and after correction
Table 4. Classification results of the CNN model on main transformer defect texts before and after correction
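The two evaluation figures used in this comparison, accuracy and the serious error rate as defined above, can be computed as in the following sketch. The function name `classification_metrics` and the four-text sample are hypothetical and serve only to make the definition concrete; the values are returned as fractions rather than percentages.

```python
def classification_metrics(true_grades, predicted_grades):
    """Return (accuracy, serious_error_rate) as fractions of all texts.

    A serious error is "general" predicted as "critical" or
    "critical" predicted as "general", as defined in the text.
    """
    total = len(true_grades)
    correct = 0
    serious = 0
    for truth, pred in zip(true_grades, predicted_grades):
        if truth == pred:
            correct += 1
        elif {truth, pred} == {"general", "critical"}:
            serious += 1
    return correct / total, serious / total


# Made-up labels for illustration only
truth = ["general", "serious", "critical", "general"]
pred = ["critical", "serious", "critical", "serious"]
accuracy, serious_error_rate = classification_metrics(truth, pred)
```

Here two of four texts are classified correctly and one "general" text is predicted as "critical", giving an accuracy of 0.5 and a serious error rate of 0.25.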
Claims (3)
1. A quality improvement method for poor-quality power grid equipment defect texts, characterized in that the method comprises the following steps:
Step 1. Using a Chinese text similarity calculation method from natural language processing, process State Grid Corporation of China's power transmission and transformation primary equipment defect classification standard (hereinafter "the standard") and the actual historical defect texts to generate the "graded standard-topic" matrices and the "defect text-topic" matrix, specifically:
(1) segment the standard and the actual defect texts into words and remove stop words as preprocessing, then generate the "word-defect text" matrix and the "word-standard" matrix, wherein the row vectors of the matrices are the defect text vectors and the standard vectors, and different columns of the matrices represent different words;
(2) use the Latent Dirichlet Allocation (LDA) model to reduce the dimensionality of the "word-standard" matrix obtained with the above word weighting, generating the "standard-topic" matrix $Z$, then, according to the defect grades of the standard ("critical", "serious", "general"), generate three "graded standard-topic" matrices $Z_1$, $Z_2$, $Z_3$; the column vectors of $Z$, $Z_1$, $Z_2$, $Z_3$ are the standard vectors corresponding to the standard sentences; for historical defect texts, use the LDA model parameters determined when generating $Z$, $Z_1$, $Z_2$, $Z_3$, together with the defect grade of each historical defect text, to reduce the dimensionality of the "word-defect text" matrix, generating the historical defect text vector $q$ and the graded defect text vector $q_1$ (or $q_2$, $q_3$); the different historical defect text vectors form the "defect text-topic" matrix;
Step 2. Classify the defect texts by defect grade and, using the results of Step 1, correct the poor-quality historical defect texts, specifically:
(1) use the JS distance to calculate the similarity between the defect text vector $q$ and the standard vectors in matrix $Z$, find the standard vector with the highest similarity, and thereby judge how semantically close the actual defect text is to the standard;
(2) check whether the similarity between the actual defect text and the standard is greater than 0.6, whether the quality score $s$ is greater than 70 points, and whether the defect grade of the actual defect text agrees with that of the most similar standard, and on this basis correct any mismatch between defect grade and defect description in the historical defect texts;
(3) for defect texts whose defect grade is known to be correct, apply the same judgment indicators to find, in the matrices $Z_1$, $Z_2$, $Z_3$, the standard vector most similar to the graded defect text vector $q_1$ (or $q_2$, $q_3$), and use the standard text corresponding to that standard vector to correct the poor-quality historical defect text;
Step 3. Using the word vector mapping model, a text representation model from deep learning, together with the scores of the defect text on different indicators obtained by the defect text quality evaluation method, give specific correction suggestions for a newly entered defect text, specifically:
(1) first, from the scores of the newly entered defect text on the different evaluation indicators, determine whether the problem lies in its defect description or in its equipment hierarchy;
(2) correct problems of incompleteness and redundancy according to Step 2; for an inaccurate equipment hierarchy, use the word vector mapping model to generate word vectors for the words at the different hierarchy levels, then, starting from the words already present, use cosine similarity to find the word most likely to be missing and offer it as the suggestion for completing the equipment hierarchy;
Step 4. Improve the quality of the poor-quality texts among the actual grid equipment defect texts, compare the defect texts before and after correction, classify the defect texts by defect grade using different existing text classification methods from machine learning and deep learning, and verify the effectiveness of the quality improvement method through the quality detection results and the classification accuracy before and after correction.
2. The quality improvement method for poor-quality power grid equipment defect texts according to claim 1, characterized in that the words in the "word-defect text" matrix and the "word-standard" matrix are weighted by tf-idf: $a_{ij} = tf_{ij} \cdot idf_i$, where $a_{ij}$ is the word weight; $tf_{ij}$ is the frequency with which word $i$ occurs in defect text $j$; $idf_i$ is the inverse frequency of the texts in which word $i$ occurs; $N_{doc}$ is the total number of defect texts; and $gf_i$ is the frequency with which word $i$ occurs across all defect texts.
3. The quality improvement method for poor-quality power grid equipment defect texts according to claim 1, characterized in that, in the JS distance expression, the similarity $sim(q_1, z)$ takes values in $(0, 1)$; $q_1$ is the defect text vector and $z$ is the standard vector; $D_{KL}$ is the KL distance, which likewise expresses the difference between probability vectors; $z_j$ and $q_{1j}$ are the elements of the vectors $z$ and $q_1$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810597110.9A CN108874984B (en) | 2018-06-11 | 2018-06-11 | Quality improvement method for poor-quality power grid equipment defect text |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810597110.9A CN108874984B (en) | 2018-06-11 | 2018-06-11 | Quality improvement method for poor-quality power grid equipment defect text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108874984A true CN108874984A (en) | 2018-11-23 |
CN108874984B CN108874984B (en) | 2021-01-01 |
Family
ID=64337750
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810597110.9A Active CN108874984B (en) | 2018-06-11 | 2018-06-11 | Quality improvement method for poor-quality power grid equipment defect text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108874984B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105303296A (en) * | 2015-09-29 | 2016-02-03 | 国网浙江省电力公司电力科学研究院 | Electric power equipment full-life state evaluation method |
CN105955960A (en) * | 2016-05-06 | 2016-09-21 | 浙江大学 | Semantic frame-based power grid defect text mining method |
Non-Patent Citations (2)
Title |
---|
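Cao Jing et al.: "Semantic-framework-based power grid defect text mining technology and its application", Power System Technology * |
Qin Xuan: "Quality evaluation of electric power statistical data and its anomaly detection method", Wanfang Data Knowledge Service Platform * |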
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110321425A (en) * | 2019-07-11 | 2019-10-11 | 云南电网有限责任公司电力科学研究院 | A kind of judgment method and device of grounding grid defect type |
CN110321425B (en) * | 2019-07-11 | 2023-07-21 | 云南电网有限责任公司电力科学研究院 | Method and device for judging defect type of power grid |
CN111048167A (en) * | 2019-10-31 | 2020-04-21 | 中电药明数据科技(成都)有限公司 | Hierarchical case structuring method and system |
CN111048167B (en) * | 2019-10-31 | 2023-08-18 | 中电药明数据科技(成都)有限公司 | Hierarchical case structuring method and system |
CN111199148A (en) * | 2019-12-26 | 2020-05-26 | 东软集团股份有限公司 | Text similarity determination method and device, storage medium and electronic equipment |
CN111199148B (en) * | 2019-12-26 | 2023-01-20 | 东软集团股份有限公司 | Text similarity determination method and device, storage medium and electronic equipment |
CN113610112A (en) * | 2021-07-09 | 2021-11-05 | 中国商用飞机有限责任公司上海飞机设计研究院 | Auxiliary decision-making method for airplane assembly quality defects |
CN113610112B (en) * | 2021-07-09 | 2024-04-16 | 中国商用飞机有限责任公司上海飞机设计研究院 | Auxiliary decision-making method for aircraft assembly quality defects |
CN114416988A (en) * | 2022-01-17 | 2022-04-29 | 国网福建省电力有限公司 | Defect automatic rating and disposal suggestion pushing method based on natural language processing |
Also Published As
Publication number | Publication date |
---|---|
CN108874984B (en) | 2021-01-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||