CN114154561B - Electric power data management method based on natural language processing and random forest - Google Patents

Electric power data management method based on natural language processing and random forest

Info

Publication number
CN114154561B
CN114154561B · CN202111345415.9A · CN202111345415A
Authority
CN
China
Prior art keywords
data
random forest
model
feature
natural language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111345415.9A
Other languages
Chinese (zh)
Other versions
CN114154561A (en)
Inventor
刘伟
叶磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Central China Technology Development Of Electric Power Co ltd
State Grid Corp of China SGCC
Original Assignee
Hubei Central China Technology Development Of Electric Power Co ltd
State Grid Corp of China SGCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Central China Technology Development Of Electric Power Co ltd, State Grid Corp of China SGCC filed Critical Hubei Central China Technology Development Of Electric Power Co ltd
Priority to CN202111345415.9A priority Critical patent/CN114154561B/en
Publication of CN114154561A publication Critical patent/CN114154561A/en
Application granted granted Critical
Publication of CN114154561B publication Critical patent/CN114154561B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention provides a power data management method based on natural language processing and random forests, which comprises the following steps: a first step of data extraction to obtain a training set F; a second step of extracting feature data from the training set F and segmenting the model data to obtain a feature data set; a third step of removing stop words from the feature data set to form a data set; a fourth step of segmenting the data set into words and applying a word2vec transformation to form word vectors; a fifth step of classifying the word vectors with a random forest algorithm; a sixth step of constructing a random forest classification model; and a seventh step in which, after the random forest classification model is determined, data are classified during the use stage, the abnormal data in each class are returned to the user, and the normal data are recommended to the user as a reference for correction. The invention uses big data to classify and analyze data anomalies and provides the data producer with correction suggestions, which can reduce data problems at the source and provide a reference for correcting the data source.

Description

Electric power data management method based on natural language processing and random forest
Technical Field
The invention relates to the technical field of computer science, in particular to a power data management method based on natural language processing and random forests.
Background
Power data, especially power equipment archive data, is the foundation of power grid production work. At present, archive data for many kinds of production equipment are stored in the equipment (asset) operation and maintenance lean management system (PMS 2.0); the total data volume already exceeds 100 GB and involves more than 200 equipment types, for example transformers and bus bars.
The equipment archive data is maintained by front-line team personnel, and every link of power production is based on it. Only when the accuracy of the equipment archive data is ensured can all power-related processes and business be carried out accurately, providing firmer support for power operation, maintenance and analysis decision-making.
At present, power grid production equipment archive data suffers from problems such as incompleteness and inaccuracy, for example incomplete key parameters in equipment archives and filling errors in equipment ledger parameters. These problems, especially inaccurate data, are difficult to check by refining rules and then developing checking programs; the current practice is manual checking by operation and maintenance personnel, which is inefficient, difficult and ineffective.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a power data management method based on natural language processing and random forests, which can realize governance and automatic error checking of power equipment archive data.
A power data management method based on natural language processing and random forests comprises the following steps:
The first step: data extraction to obtain a training set: obtain the model data and rated capacity data of pole-mounted transformers, and take 70% of these data as the training set F;
The second step: extract feature data from the training set F and segment the model data to obtain a feature data set S = {s1, s2, s3, ..., sn};
The third step: remove stop words from the feature data set S to form a data set S', S' = {s1, s2, s3, ..., sm}, where m ≤ n;
The fourth step: perform word segmentation on the data set S' and then apply a word2vec transformation to form word vectors v(s'), where v(s') denotes the word vector of the data set S' after the word2vec transformation and k denotes the length of the word vector;
The fifth step: classify the word vectors v(s') with the random forest algorithm, with the rated capacity data L as the label column;
The sixth step: construct the random forest classification model: from the classification results obtained in the fifth step, compute the accuracy of the random forest classification model; if the accuracy does not reach the expected threshold, return to the fourth and fifth steps for parameter adjustment until the accuracy reaches the expected threshold;
The seventh step: after the random forest classification model is determined, classify the data during the use stage, return the abnormal data in each class to the user, and recommend the normal data to the user as a reference for correction.
Further, in the first step, data cleaning and filtering are performed after data extraction: first, rows whose transformer model field or rated capacity field is empty are filtered out; then rows whose transformer model field does not contain "-" are filtered out; finally, rows whose transformer model field contains neither "M" nor "m" are filtered out.
Further, in the third step, removing stop words from the feature data set S specifically comprises: replacing the "-" and "/" in the transformer model field with a space.
Further, in the fifth step, the process of classifying, with the random forest algorithm, the word vectors v(s') formed from the model data is as follows:
(1) Set the total number of decision trees in the random forest to B; a single decision tree b is generated as follows:
(a) Randomly select N samples, with replacement, from the word vectors v(s');
(b) Then recursively generate a random forest tree T_b;
(2) Output the set of random forest trees {T_b}, b = 1, 2, ..., B;
(3) Make a classification prediction for a new data point x (i.e., model data newly entered by the user): let C_b(x) denote the class predicted for the new data point x by the b-th tree; the random forest prediction is then C(x) = majority vote{C_b(x), b = 1, 2, ..., B}, as sketched in the toy example below.
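A toy illustration of the majority-vote aggregation in step (3), written as a minimal sketch under the assumption that each tree is represented by an object exposing a predict() method; it is not the patent's implementation.

```python
# Toy sketch of the majority vote in step (3); the `trees` objects and their
# predict() interface are assumptions for illustration only.
from collections import Counter

def rf_predict(trees, x):
    votes = [tree.predict(x) for tree in trees]     # C_b(x) for b = 1, ..., B
    return Counter(votes).most_common(1)[0][0]      # the class with the most votes wins
```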
Further, the specific steps of recursively generating the random forest tree T_b include:
i. randomly select k' feature dimensions from the k dimensions of the word vector, where k' ≤ k;
ii. from the k' candidate features, select the one feature that minimizes the uncertainty of the data set information and use it to split the data; this feature is also called the best split feature;
iii. split the node on the best split feature into two child nodes, and repeat until each node is sufficiently pure, finally forming a complete random forest tree T_b; if the decision tree formed by these split nodes reaches the set maximum depth, splitting stops regardless of whether the nodes are sufficiently pure.
Further, the methods for computing the minimum uncertainty of the data set information include: those based on information gain, those based on the information gain ratio, and those based on the Gini coefficient.
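For reference, a minimal sketch of these three impurity measures (information gain for ID3, information gain ratio for C4.5, Gini coefficient for CART), written from their standard textbook definitions rather than taken from the patent text:

```python
# Standard impurity measures used to pick the best split feature (a sketch,
# not the patent's own implementation).
import numpy as np

def entropy(labels: np.ndarray) -> float:
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels: np.ndarray) -> float:
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def information_gain(parent: np.ndarray, left: np.ndarray, right: np.ndarray) -> float:
    """Reduction in entropy achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    return entropy(parent) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

def gain_ratio(parent: np.ndarray, left: np.ndarray, right: np.ndarray) -> float:
    """Information gain normalised by the split's intrinsic information (C4.5 style)."""
    n = len(parent)
    split_info = entropy(np.array([0] * len(left) + [1] * len(right)))
    return information_gain(parent, left, right) / split_info if split_info > 0 else 0.0
```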
Further, the sixth step specifically comprises: manually verify the data within each class of the classification result obtained in the fifth step, pick out abnormal data and misjudged data, and check the misjudgment rate of each class to obtain the verification accuracy; average the accuracy over all classes to obtain the accuracy of the random forest classification model; judge whether the model accuracy reaches the expected threshold; if not, return to the fourth and fifth steps and re-adjust the word vector length k of the fourth step as well as, in the fifth step, the number of decision trees B, the method used to minimize the uncertainty of the data set information, and the maximum depth of the decision trees, until the accuracy reaches the expected threshold.
Further, the hyper-parameters of the random forest classification model are determined with a grid search, that is, every combination is tried by exhaustive traversal and the best-performing parameters are taken as the final result.
The invention uses natural language processing and random forest techniques to carry out data governance: it automatically diagnoses anomalies in large volumes of data and provides suggestions for correcting them, which reduces the strong dependence of data verification work on business staff, enables automatic processing of scattered data anomalies for which no rules can be extracted, and avoids the heavy workload of manual screening.
Drawings
FIG. 1 is a flow chart of the power data governance method based on natural language processing and random forests of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
To describe the present embodiment, sample data is introduced (see Table 1).
TABLE 1 sample data
As can be seen from Table 1, the sample data has three columns, two of which (the main transformer model and the rated capacity) carry physical and business meaning. According to the business rules, each specific transformer model corresponds to a unique rated capacity, but the model can be written in many different ways; for example, the models S9-M-50/10, S11-M-50/10, S9-50 and S9-50KVA all correspond to a rated capacity of 50. The value 50 is hidden somewhere in the model string, but its position is not fixed and there is no explicit rule; experienced business staff can usually judge from the model value what the corresponding rated capacity should be, but doing so is inefficient and laborious.
As shown in FIG. 1, the embodiment of the invention provides a power data management method based on natural language processing and random forests, which extracts features from the transformer model with the word2vec algorithm and then builds a model on these features with the random forest classification algorithm. The specific steps are as follows:
The first step: data extraction to obtain a training set: obtain the model data and rated capacity data of pole-mounted transformers, and take 70% of these data as the training set F. The data can be cleaned and filtered after extraction: first, rows whose transformer model field or rated capacity field is empty are filtered out; then rows whose transformer model field does not contain "-" are filtered out; finally, rows whose transformer model field contains neither "M" nor "m" are filtered out. The transformer model field is named xh and the rated capacity field is named edrl;
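As an illustration only, a minimal sketch of this extraction and cleaning step, under the assumption that the archive records are available as a pandas DataFrame with the field names used above (xh for the transformer model, edrl for the rated capacity); the helper name build_training_set is a placeholder:

```python
# Sketch of the first-step cleaning and 70% training split, assuming a pandas
# DataFrame with columns "xh" and "edrl".
import pandas as pd

def build_training_set(df: pd.DataFrame, train_frac: float = 0.7) -> pd.DataFrame:
    """Filter obviously unusable rows and sample 70% of them as the training set F."""
    df = df.dropna(subset=["xh", "edrl"])                 # drop rows with empty model / capacity
    df = df[df["xh"].str.contains("-", regex=False)]      # keep rows whose model contains "-"
    df = df[df["xh"].str.contains("m", case=False)]       # keep rows containing "M" or "m"
    return df.sample(frac=train_frac, random_state=42)    # 70% of the cleaned rows -> training set F
```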
The second step: extract feature data from the training set F and segment the model data to obtain a feature data set S = {s1, s2, s3, ..., sn};
The third step: remove stop words (e.g. "-" and "/") from the feature data set S to form a data set S', S' = {s1, s2, s3, ..., sm}, where m ≤ n. For example, the "-" and "/" in the transformer model field are replaced with a space, and the processed and transformed field is named xh1; the transformer model "S9-M-50/10" (xh) thus becomes "S9 M 50 10" (xh1) after processing;
The fourth step: perform word segmentation on the data set S' and then apply a word2vec transformation to form word vectors v(s'), where v(s') denotes the word vector of the data set S' after the word2vec transformation and k denotes the length of the word vector;
Specifically, a tokenizer is used to segment the content of the processed transformer model field (xh1), and the segmented array field is named xh2; for example, "S9 M 50 10" (xh1) becomes "[S9, M, 50, 10]" (xh2) after tokenization. The word2vec model is then trained on the field xh2; its output field is named rawFeatures and is a multidimensional feature vector. For example, with "[S9, M, 50, 10]" (xh2) as the input of the word2vec model, the output of the word2vec model is:
[-0.3870379527409871,0.883052121847868,0.16217718521753946,0.24961639444033304,0.09006961186726888,-0.3612159974873066](rawFeatures)。
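A minimal sketch of this tokenization and word2vec step, assuming the gensim library; the vector length k, the training parameters, and the pooling of token vectors into a single rawFeatures vector by averaging are illustrative assumptions, not details taken from the patent:

```python
# Sketch of the fourth step: tokenize xh1 into xh2, train word2vec, and derive
# a k-dimensional rawFeatures vector per transformer model (pooling by mean is
# an assumption for illustration).
from gensim.models import Word2Vec
import numpy as np

def to_tokens(xh1: str) -> list[str]:
    """'S9 M 50 10' (xh1) -> ['S9', 'M', '50', '10'] (xh2)."""
    return xh1.split()

def train_word2vec(xh2_corpus: list[list[str]], k: int = 6) -> Word2Vec:
    # window, min_count and sg are illustrative training parameters
    return Word2Vec(sentences=xh2_corpus, vector_size=k, window=3, min_count=1, sg=1)

def to_raw_features(model: Word2Vec, xh2: list[str]) -> np.ndarray:
    """Average the token vectors into one k-dimensional feature vector (rawFeatures)."""
    return np.mean([model.wv[t] for t in xh2 if t in model.wv], axis=0)
```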
The fifth step: classify the word vectors v(s') with the random forest algorithm, with the rated capacity data L as the label column;
In the fifth step, the process of classifying, with the random forest algorithm, the word vectors v(s') formed from the model data is as follows:
1. Set the total number of decision trees in the random forest to B; a single decision tree b is generated as follows:
(a) Randomly select N samples, with replacement, from the word vectors v(s');
(b) Then recursively generate a random forest tree T_b through the following three steps:
i. randomly select k' feature dimensions from the k dimensions of the word vector, where k' ≤ k;
ii. from the k' candidate features, select the one feature that minimizes the uncertainty of the data set information and use it to split the data; this feature is also called the best split feature. The three ways of computing the minimum uncertainty of the data set information are ID3 (based on information gain), C4.5 (based on the information gain ratio) and CART (based on the Gini coefficient).
iii. split the node on the best split feature into two child nodes, and repeat until each node is sufficiently pure, finally forming a complete random forest tree T_b; if the decision tree formed by these split nodes reaches the set maximum depth, splitting stops regardless of whether the nodes are sufficiently pure.
2. Output the set of random forest trees {T_b}, b = 1, 2, ..., B;
3. Make a classification prediction for a new data point x (i.e., model data newly entered by the user):
let C_b(x) denote the class predicted for the new data point x by the b-th tree; the random forest prediction is then C(x) = majority vote{C_b(x), b = 1, 2, ..., B}.
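As an illustration of this fifth step, a minimal sketch using scikit-learn's RandomForestClassifier; the hyper-parameter values shown (the number of trees B, the maximum depth, the Gini criterion) are placeholders for the values tuned in the sixth step, not values given in the patent:

```python
# Sketch of the fifth-step classifier on the word2vec features, assuming
# scikit-learn; hyper-parameters are illustrative placeholders.
from sklearn.ensemble import RandomForestClassifier

def fit_capacity_classifier(raw_features, edrl_labels, B: int = 100, max_depth: int = 10):
    """raw_features: word2vec vectors v(s'); edrl_labels: the rated-capacity label column L."""
    clf = RandomForestClassifier(
        n_estimators=B,          # total number of trees B
        criterion="gini",        # CART-style split; "entropy" corresponds to gain-based criteria
        max_features="sqrt",     # k' features drawn from the k dimensions at each split, k' <= k
        max_depth=max_depth,     # stop splitting once the set maximum depth is reached
        bootstrap=True,          # N samples drawn with replacement for each tree
        random_state=0,
    )
    clf.fit(raw_features, edrl_labels)
    return clf                   # clf.predict(x) returns the majority vote over the B trees
```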
The sixth step: construct the random forest classification model: manually verify the data within each class of the classification result obtained in the fifth step, pick out abnormal data and misjudged data, and check the misjudgment rate of each class to obtain the verification accuracy; average the accuracy over all classes to obtain the accuracy of the random forest classification model; judge whether the model accuracy reaches the expected threshold; if not, return to the fourth and fifth steps and re-adjust the word vector length k of the fourth step as well as, in the fifth step, the number of decision trees B, the method used to minimize the uncertainty of the data set information, and the maximum depth of the decision trees, until the accuracy reaches the expected threshold (for example, the expected accuracy threshold may be set to 95%). The whole process of determining the hyper-parameters of the random forest classification model uses a grid search, that is, every combination is tried by exhaustive traversal and the best-performing parameters are taken as the final result.
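A minimal sketch of such a grid search, assuming scikit-learn's GridSearchCV; the parameter grid values are illustrative, and in practice the word-vector length k would be tuned by re-running the word2vec step outside this grid:

```python
# Sketch of the sixth-step hyper-parameter search via exhaustive grid search.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 200],      # number of decision trees B
    "criterion": ["gini", "entropy"],    # method used to minimize data-set uncertainty
    "max_depth": [5, 10, 20],            # maximum depth of the decision trees
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="accuracy", cv=3)
# search.fit(raw_features, edrl_labels)
# if search.best_score_ < 0.95: adjust k or the grid and repeat, per the sixth step
```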
The seventh step: after the random forest classification model is determined, classify the data during the use stage, return the abnormal data in each class to the user, and recommend the normal data to the user as a reference for correction.
Further, during use, the label column of the model (namely the rated capacity data) is corrected through user feedback, which increases the probability of correct classification by the random forest.
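A minimal sketch of this use stage; the rule used here to flag a record as abnormal (its recorded capacity disagreeing with the model's prediction) and the helper names are assumptions for illustration, since the patent text does not spell out the flagging logic:

```python
# Sketch of the seventh step: predict a rated capacity per record, flag
# disagreements as suspected anomalies, and return recommendations.
import pandas as pd

def govern(df: pd.DataFrame, clf, featurize) -> pd.DataFrame:
    """df has columns xh (model) and edrl (recorded rated capacity)."""
    X = [featurize(xh) for xh in df["xh"]]         # word2vec features, as in the fourth step
    df = df.assign(recommended_edrl=clf.predict(X))
    df["suspected_anomaly"] = df["edrl"].astype(str) != df["recommended_edrl"].astype(str)
    return df                                      # anomalies go back to the user with a recommendation
```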
The rated capacity values recommended for the transformer models in part of the sample data are shown in Table 2:
TABLE 2
The invention uses natural language processing and random forest techniques to carry out data governance: it automatically diagnoses large volumes of data and provides suggestions for correcting them, which reduces the strong dependence of data verification work on business staff. For scattered data anomalies that follow no extractable rule, machine learning makes automatic processing possible and avoids the heavy workload of manual screening (checking every 100 records manually typically takes about 3 days of work, whereas the method of the invention can govern tens of thousands of records in a few minutes, with a governance accuracy above 95%).
The foregoing is merely illustrative embodiments of the present invention, and the present invention is not limited thereto, and any changes or substitutions that may be easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (8)

1. A power data management method based on natural language processing and random forests is characterized in that: the method comprises the following steps:
The first step: data extraction to obtain a training set: obtain the model data and rated capacity data of pole-mounted transformers, and take 70% of these data as the training set F;
The second step: extract feature data from the training set F and segment the model data to obtain a feature data set S = {s1, s2, s3, ..., sn};
The third step: remove stop words from the feature data set S to form a data set S', S' = {s1, s2, s3, ..., sm}, where m ≤ n;
The fourth step: perform word segmentation on the data set S' and then apply a word2vec transformation to form word vectors v(s'), where v(s') denotes the word vector of the data set S' after the word2vec transformation and k denotes the length of the word vector;
The fifth step: classify the word vectors v(s') with the random forest algorithm, with the rated capacity data L as the label column;
The sixth step: construct the random forest classification model: from the classification results obtained in the fifth step, compute the accuracy of the random forest classification model; if the accuracy does not reach the expected threshold, return to the fourth and fifth steps for parameter adjustment until the accuracy reaches the expected threshold;
The seventh step: after the random forest classification model is determined, classify the data during the use stage, return the abnormal data in each class to the user, and recommend the normal data to the user as a reference for correction.
2. A method of power data management based on natural language processing and random forests as claimed in claim 1, characterized by: in the first step, data cleaning and filtering are also performed after data extraction: first, rows whose transformer model field or rated capacity field is empty are filtered out; then rows whose transformer model field does not contain "-" are filtered out; finally, rows whose transformer model field contains neither "M" nor "m" are filtered out.
3. A method of power data management based on natural language processing and random forests as claimed in claim 1, characterized by: in the third step, removing stop words from the feature data set S specifically comprises: replacing the "-" and "/" in the transformer model field with a space.
4. A method of power data management based on natural language processing and random forests as claimed in claim 1, characterized by: in the fifth step, the process of classifying, with the random forest algorithm, the word vectors v(s') formed from the model data is as follows:
(1) Set the total number of decision trees in the random forest to B; a single decision tree b is generated as follows:
(a) Randomly select N samples, with replacement, from the word vectors v(s');
(b) Then recursively generate a random forest tree T_b;
(2) Output the set of random forest trees {T_b}, b = 1, 2, ..., B;
(3) Make a classification prediction for a new data point x (i.e., model data newly entered by the user): let C_b(x) denote the class predicted for the new data point x by the b-th tree; the random forest prediction is then C(x) = majority vote{C_b(x), b = 1, 2, ..., B}.
5. The method for managing power data based on natural language processing and random forests according to claim 4, characterized in that: the specific steps of recursively generating the random forest tree T_b include:
i. randomly select k' feature dimensions from the k dimensions of the word vector, where k' ≤ k;
ii. from the k' candidate features, select the one feature that minimizes the uncertainty of the data set information and use it to split the data; this feature is also called the best split feature;
iii. split the node on the best split feature into two child nodes, and repeat until each node is sufficiently pure, finally forming a complete random forest tree T_b; if the decision tree formed by these split nodes reaches the set maximum depth, splitting stops regardless of whether the nodes are sufficiently pure.
6. The method for managing power data based on natural language processing and random forests according to claim 5, characterized in that: the methods for computing the minimum uncertainty of the data set information include: those based on information gain, those based on the information gain ratio, and those based on the Gini coefficient.
7. A method of power data management based on natural language processing and random forests as claimed in claim 1, characterized by: the sixth step specifically comprises: manually verify the data within each class of the classification result obtained in the fifth step, pick out abnormal data and misjudged data, and check the misjudgment rate of each class to obtain the verification accuracy; average the accuracy over all classes to obtain the accuracy of the random forest classification model; judge whether the model accuracy reaches the expected threshold; if not, return to the fourth and fifth steps and re-adjust the word vector length k of the fourth step as well as, in the fifth step, the number of decision trees B, the method used to minimize the uncertainty of the data set information, and the maximum depth of the decision trees, until the accuracy reaches the expected threshold.
8. The method for managing power data based on natural language processing and random forests according to claim 7, characterized in that: the hyper-parameters of the random forest classification model are determined with a grid search, that is, every combination is tried by exhaustive traversal and the best-performing parameters are taken as the final result.
CN202111345415.9A 2021-11-15 2021-11-15 Electric power data management method based on natural language processing and random forest Active CN114154561B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111345415.9A CN114154561B (en) 2021-11-15 2021-11-15 Electric power data management method based on natural language processing and random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111345415.9A CN114154561B (en) 2021-11-15 2021-11-15 Electric power data management method based on natural language processing and random forest

Publications (2)

Publication Number Publication Date
CN114154561A (en) 2022-03-08
CN114154561B (en) 2024-02-27

Family

ID=80460062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111345415.9A Active CN114154561B (en) 2021-11-15 2021-11-15 Electric power data management method based on natural language processing and random forest

Country Status (1)

Country Link
CN (1) CN114154561B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066553A (en) * 2017-03-24 2017-08-18 北京工业大学 A kind of short text classification method based on convolutional neural networks and random forest
CN108537281A (en) * 2018-04-13 2018-09-14 贵州电网有限责任公司 A kind of power consumer feature recognition sorting technique based on random forest
CN109472293A (en) * 2018-10-12 2019-03-15 国家电网有限公司 A kind of grid equipment file data error correction method based on machine learning
CN110059183A (en) * 2019-03-22 2019-07-26 重庆邮电大学 A kind of automobile industry User Perspective sensibility classification method based on big data
AU2020100709A4 (en) * 2020-05-05 2020-06-11 Bao, Yuhang Mr A method of prediction model based on random forest algorithm
WO2020119403A1 (en) * 2018-12-13 2020-06-18 平安医疗健康管理股份有限公司 Hospitalization data abnormity detection method, apparatus and device, and readable storage medium
WO2021022970A1 (en) * 2019-08-05 2021-02-11 青岛理工大学 Multi-layer random forest-based part recognition method and system
CN112364928A (en) * 2020-11-18 2021-02-12 浙江工业大学 Random forest classification method in transformer substation fault data diagnosis
CN112417863A (en) * 2020-11-27 2021-02-26 中国科学院电子学研究所苏州研究院 Chinese text classification method based on pre-training word vector model and random forest algorithm

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066553A (en) * 2017-03-24 2017-08-18 北京工业大学 A kind of short text classification method based on convolutional neural networks and random forest
CN108537281A (en) * 2018-04-13 2018-09-14 贵州电网有限责任公司 A kind of power consumer feature recognition sorting technique based on random forest
CN109472293A (en) * 2018-10-12 2019-03-15 国家电网有限公司 A kind of grid equipment file data error correction method based on machine learning
WO2020119403A1 (en) * 2018-12-13 2020-06-18 平安医疗健康管理股份有限公司 Hospitalization data abnormity detection method, apparatus and device, and readable storage medium
CN110059183A (en) * 2019-03-22 2019-07-26 重庆邮电大学 A kind of automobile industry User Perspective sensibility classification method based on big data
WO2021022970A1 (en) * 2019-08-05 2021-02-11 青岛理工大学 Multi-layer random forest-based part recognition method and system
AU2020100709A4 (en) * 2020-05-05 2020-06-11 Bao, Yuhang Mr A method of prediction model based on random forest algorithm
CN112364928A (en) * 2020-11-18 2021-02-12 浙江工业大学 Random forest classification method in transformer substation fault data diagnosis
CN112417863A (en) * 2020-11-27 2021-02-26 中国科学院电子学研究所苏州研究院 Chinese text classification method based on pre-training word vector model and random forest algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Prediction of power customer appeals based on deep neural networks (基于深度神经网络的电力客户诉求预判); 彭路, 朱君, 邹云峰; 计算机与现代化; 2020-05-15 (05); 26-32 *
Application research of the random forest algorithm for book subject classification (面向图书主题分类的随机森林算法的应用研究); 孙彦雄, 李业丽, 边玉宁; 计算机技术与发展; 2020-06-10 (06); 71-76 *

Also Published As

Publication number Publication date
CN114154561A (en) 2022-03-08

Similar Documents

Publication Publication Date Title
CN111782472B (en) System abnormality detection method, device, equipment and storage medium
Fan et al. Chaff from the wheat: Characterizing and determining valid bug reports
CN108549954B (en) Risk model training method, risk identification device, risk identification equipment and risk identification medium
WO2019238109A1 (en) Fault root cause analysis method and apparatus
Sethi et al. DLPaper2Code: Auto-generation of code from deep learning research papers
Kobayashi et al. Towards an NLP-based log template generation algorithm for system log analysis
Angeli et al. Stanford’s 2014 slot filling systems
CN109492106B (en) Automatic classification method for defect reasons by combining text codes
Zheng et al. A self-adaptive temporal-spatial self-training algorithm for semisupervised fault diagnosis of industrial processes
CN112364352A (en) Interpretable software vulnerability detection and recommendation method and system
CN112685324A (en) Method and system for generating test scheme
CN112926627A (en) Equipment defect time prediction method based on capacitive equipment defect data
CN113221960A (en) Construction method and collection method of high-quality vulnerability data collection model
CN104021180A (en) Combined software defect report classification method
CN112487146A (en) Legal case dispute focus acquisition method and device and computer equipment
CN114117029B (en) Solution recommendation method and system based on multi-level information enhancement
CN113590396A (en) Method and system for diagnosing defect of primary device, electronic device and storage medium
CN114154561B (en) Electric power data management method based on natural language processing and random forest
CN116245107B (en) Electric power audit text entity identification method, device, equipment and storage medium
CN117131449A (en) Data management-oriented anomaly identification method and system with propagation learning capability
CN110597796B (en) Big data real-time modeling method and system based on full life cycle
CN117370568A (en) Power grid main equipment knowledge graph completion method based on pre-training language model
CN115438190B (en) Power distribution network fault auxiliary decision knowledge extraction method and system
CN115470854A (en) Information system fault classification method and classification system
Kusa et al. Vombat: A tool for visualising evaluation measure behaviour in high-recall search tasks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant