CN113095511A

CN113095511A - Method and device for judging in-place operation of automatic master station

Info

Publication number: CN113095511A
Application number: CN202110415609.5A
Authority: CN
Inventors: 何祥针; 吴龙腾; 孟子杰; 邱丹骅; 李嘉铭; 赵瑞锋; 蔡新雷; 崔艳林; 何剑军; 黄伟杰; 郭文鑫; 王勇超; 林裕新; 刘超
Original assignee: Guangdong Power Grid Co Ltd; Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Current assignee: Guangdong Power Grid Co Ltd; Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority date: 2021-04-16
Filing date: 2021-04-16
Publication date: 2021-07-09

Abstract

The invention discloses a method and a device for judging whether an operation is in place at an automation master station. The method comprises the following steps: selecting operation data of the equipment as a sample data set according to the scheduling instruction ticket; performing data preprocessing on the sample data set by using an ensemble learning algorithm, and screening out an optimal sample set; training on the optimal sample set by using a random forest algorithm to obtain a random forest model whose classification accuracy reaches a preset threshold; and judging, according to the random forest model, the degree to which the information in the scheduling instruction ticket matches the current running state of the equipment. In the established random forest model, the output operation instruction category is the mode (the most frequent value) of the categories output by the individual decision trees; the tree structures selected by the random forest model and the accuracy of the model can be inspected visually; and the hyper-parameters are tuned by grid search with cross validation. This improves the robustness of the random forest model and the accuracy of the in-place operation judgment.

Description

Method and device for judging in-place operation of automatic master station

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a method, a device, terminal equipment and a storage medium for judging whether an operation is in place at an automation master station.

Background

In the prior art, the running state of power distribution network equipment is generally judged by using neural network technology. A neural network converts the input variables into a series of numbers, so that once the network has finished its learning stage, the individual features become indistinguishable. If only prediction is considered, neural networks are the de facto algorithm of choice. In an industrial environment, however, there is a need for a model that can convey the meaning of a feature or variable to stakeholders, and these stakeholders may be anyone, not just people who understand deep learning or machine learning. The most serious problem of neural network technology is that it cannot explain its own reasoning process or the basis for its reasoning; it cannot put necessary queries to the user, and it cannot work when the data are insufficient. All features of the problem are turned into numbers and all reasoning into numerical calculation, so information is lost and the judgment of the running state of the power distribution network equipment is inaccurate.

Disclosure of Invention

The invention aims to provide a method for judging the in-place operation of an automatic master station, so as to solve the problems of easy data loss and low judgment precision caused by modeling the running state of power distribution network equipment by adopting a neural network technology.

In order to achieve the above object, the present invention provides a method for determining the operation in place at an automation master station, comprising:

selecting operation data of the equipment as a sample data set according to the scheduling instruction ticket;

performing data preprocessing on the sample data set by using an ensemble learning algorithm, and screening out an optimal sample set;

training the optimal sample set by using a random forest algorithm to obtain a random forest model with classification accuracy reaching a preset threshold;

and judging the information matching degree of the dispatching instruction ticket and the current running state of the equipment according to the random forest model.

Preferably, the training of the optimal sample by using a random forest algorithm to obtain a random forest model with a classification accuracy reaching a preset threshold specifically comprises:

training the optimal sample by using a strong classifier, determining nodes of a decision tree according to a voting method, and storing the decision tree until all branches of the decision tree have leaf nodes;

and judging whether the number of the decision trees meets the requirement or not according to the scheduling instruction ticket, if not, continuing training, and if so, generating a random forest model.

Preferably, the operational data includes current, voltage, power and operational status.

Preferably, the performing data preprocessing on the sample data set by using the ensemble learning algorithm includes:

randomly extracting a plurality of data sample sets from the sample data set;

dividing each data sampling set into a plurality of training sample sets, and respectively carrying out primary training on the training sample sets by using a weak learner to obtain an integrated classification result h (X) according to the following formula:

wherein m is the number of training sample sets, h_i is the i-th weak learner, and x_i is the i-th training sample set;

training weak learners corresponding to a plurality of data sampling sets into strong learners by using iterative computation to obtain a final classification result H (X), wherein the formula is as follows:

wherein h_i is the i-th weak learner, p_i is the weight of the i-th weak learner, M is the number of weak learners, and y_i is the i-th data sample set.
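As an illustration only, one plausible reading of the definitions above is that the strong learner H(X) combines the M weak learners by a weighted vote, each weak learner h_i contributing its weight p_i to the category it outputs. The sketch below assumes that reading; the function name `weighted_vote`, the threshold rules and the weights are hypothetical and are not taken from the original formulas.

```python
from collections import defaultdict
from typing import Callable, Dict, Sequence


def weighted_vote(learners: Sequence[Callable[[Dict[str, float]], str]],
                  weights: Sequence[float],
                  x: Dict[str, float]) -> str:
    """Return the category with the largest total weight over all weak learners."""
    if len(learners) != len(weights):
        raise ValueError("one weight is required per weak learner")
    totals: Dict[str, float] = defaultdict(float)
    for h_i, p_i in zip(learners, weights):
        totals[h_i(x)] += p_i  # learner i votes for its output with weight p_i
    return max(totals, key=totals.get)


# Hypothetical weak learners: single-threshold rules on one operating quantity each.
weak_learners = [
    lambda s: "closed" if s["current"] > 0.5 else "open",
    lambda s: "closed" if s["voltage"] > 100.0 else "open",
    lambda s: "closed" if s["power"] > 10.0 else "open",
]
sample = {"current": 0.1, "voltage": 110.0, "power": 12.0}

heavy_first = weighted_vote(weak_learners, [0.6, 0.25, 0.15], sample)
equal = weighted_vote(weak_learners, [1.0, 1.0, 1.0], sample)
```

With the first weight vector the single heavily weighted learner outvotes the other two; with equal weights the result reduces to a simple majority.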

Preferably, the performing data preprocessing on the sample data set by using the ensemble learning algorithm further includes:

and optimizing the strong learner by using a grid search and cross validation method to obtain an optimal sample set.

Preferably, the preset threshold is 90%.

The invention also provides a device for judging the in-place operation of the automatic master station, which is applied to the above method for judging the in-place operation of the automatic master station. The device comprises:

the data set acquisition module is used for selecting the operation data of the equipment as a sample data set according to the scheduling instruction ticket;

the data preprocessing module is used for preprocessing the sample data set by utilizing an ensemble learning algorithm to screen out an optimal sample set;

the random forest model building module is used for training the optimal sample set by using a random forest algorithm to obtain a random forest model with classification accuracy reaching a preset threshold;

and the operation in-place judging module is used for judging the information matching degree of the dispatching instruction ticket and the current running state of the equipment according to the random forest model.

Preferably, the random forest model building module further comprises:

the decision tree construction unit is used for training the optimal sample by using a strong classifier, determining nodes of the decision tree according to a voting method, and storing the decision tree until all branches of the decision tree have leaf nodes;

and the random forest model establishing unit is used for judging whether the number of the decision trees meets the requirement or not according to the scheduling instruction ticket, continuing training if the number of the decision trees does not meet the requirement, and generating a random forest model if the number of the decision trees meets the requirement.

The invention also provides a computer terminal device comprising one or more processors and a memory. The memory is coupled to the processor and stores one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of determining that an operation is in place at an automation master station as described in any of the embodiments above.

The present invention also provides a computer-readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing the method for determining that an operation is in place at an automation host station according to any of the above embodiments.

In the method and the device for judging the in-place operation of the automatic master station, a random forest model is established and the output operation instruction category is the mode (the most frequent value) of the categories output by the individual decision trees, so the output is more reliable than that of a single classifier. The tree structures selected by the random forest model and the accuracy of the model can be inspected visually, and the hyper-parameters are tuned by grid search with cross validation, which improves the robustness of the random forest model and the accuracy of the in-place operation judgment.

Drawings

In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a method for determining that an operation is in place at an automated host station according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a method for determining that an operation is in place at an automated host station according to an embodiment of the present invention;

FIG. 3 is a flow chart of an ensemble learning method provided by the present invention;

FIG. 4 is a schematic diagram of a process for constructing a random forest model according to the present invention;

FIG. 5 is a schematic structural diagram of a computer terminal device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be understood that the step numbers used herein are for convenience of description only and are not intended as limitations on the order in which the steps are performed.

It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to FIG. 1 and FIG. 2, an embodiment of the present invention provides a method for determining that an operation is in place at an automation master station, including:

S10, selecting operation data of the equipment as a sample data set according to the scheduling instruction ticket;

S20, performing data preprocessing on the sample data set by using an ensemble learning algorithm, and screening out an optimal sample set;

S30, training the optimal sample set by using a random forest algorithm to obtain a random forest model with classification accuracy reaching a preset threshold;

and S40, judging the information matching degree of the dispatching instruction ticket and the current running state of the equipment according to the random forest model.
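The text does not define how the information matching degree in step S40 is computed. As a minimal sketch, assume it is the fraction of ticket items whose expected state equals the state the model classifies from the current operation data. The function `matching_degree`, the device identifiers and the state labels below are all hypothetical.

```python
from typing import Dict


def matching_degree(ticket: Dict[str, str], predicted: Dict[str, str]) -> float:
    """Fraction of ticket items whose expected state equals the predicted state."""
    if not ticket:
        return 0.0
    matched = sum(1 for device, state in ticket.items()
                  if predicted.get(device) == state)
    return matched / len(ticket)


# Hypothetical ticket: device id -> state the instruction expects after the operation.
ticket = {"breaker_101": "open", "switch_202": "closed", "breaker_303": "open"}
# Hypothetical model output: device id -> state classified from current operation data.
predicted = {"breaker_101": "open", "switch_202": "closed", "breaker_303": "closed"}

degree = matching_degree(ticket, predicted)  # two of the three items match
```

Under this assumption, an operation could be treated as in place only when the degree reaches 1.0, or some lower limit chosen by the operator.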

In this embodiment, in a large amount of metering data generated by an acquisition layer of a device, device operation data is selected as a sample data set according to requirements.

A sample data set selected from the equipment operation data tends, for various reasons, to contain a large amount of interference data, which can distort the error analysis of the equipment operation data. Before the sample data set is analysed, inconsistent and inaccurate equipment operation data need to be filtered out, and noise data irrelevant to the equipment state evaluation need to be deleted. In this way, equipment operation data that are hard to interpret are converted into clean data that the user can readily interpret. For this reason, the ensemble learning method is used to preprocess the sample data set and screen out the optimal sample set.

The optimal sample set is trained on repeatedly by using the random forest algorithm until an optimal random forest model is obtained. Constructing the random forest model is a further development of the decision tree algorithm in which a plurality of different decision trees are generated. The branch nodes of a decision tree are determined by recursive branching: at each recursion, a subset of the remaining data features is again extracted at random and the sub-branch is determined from it. Once the nodes and branch nodes have been determined, a decision tree model is established, and in this way a plurality of different decision trees are built. Finally, whether the number of constructed decision trees meets the user's requirements is judged according to the scheduling instruction ticket. If not, training continues according to the above method, and the category of a new input sample is determined according to the voting principle (the minority yields to the majority). When the user's requirements are met, the random forest model is generated.
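The voting principle described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: each decision tree is stood in for by a hypothetical hand-written rule, and the category labels and thresholds are invented for the example.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

Sample = Dict[str, float]


def forest_predict(trees: Sequence[Callable[[Sample], str]], sample: Sample) -> str:
    """Return the category output by the largest number of trees (majority vote)."""
    votes = Counter(tree(sample) for tree in trees)
    label, _count = votes.most_common(1)[0]
    return label


# Hypothetical stand-ins for trained decision trees, one threshold rule each.
trees = [
    lambda s: "in_place" if s["current"] < 1.0 else "not_in_place",
    lambda s: "in_place" if s["voltage"] < 5.0 else "not_in_place",
    lambda s: "in_place" if s["power"] < 0.5 else "not_in_place",
]
sample = {"current": 0.2, "voltage": 0.0, "power": 3.0}

result = forest_predict(trees, sample)  # two trees vote "in_place", one votes against
```

Using an odd number of trees for a two-category problem avoids tied votes.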

After the random forest model is established, it can be used for analysis and calculation. When analysing big data from equipment operation, the weighted information gain ratio of the feature variables of the different decision trees in the sample data set needs to be calculated to determine the importance of each feature. Suppose the substation data sample set contains hundreds of thousands of different equipment operation records. The feature variables of each substation data sample set are sorted in descending order of an attribute such as priority or importance value, so that the Y original dimensions can be reduced to y dimensions (Y > y): the k feature variables with the largest importance values are selected first, and (y-k) different features are then randomly selected from the remaining Y-k features. Together these form y features, which reduces the high-dimensional equipment operation data from Y dimensions to y dimensions. This makes it easier for the user to identify and analyse the equipment operation data, and from a data analysis point of view the essential characteristics of the operation are captured more accurately.
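The dimension reduction described above can be sketched in a few lines, assuming the intended procedure is to keep the k features with the largest importance values and then draw further features at random from the rest until y features are selected in total. The function `select_features`, the feature names "temperature" and "humidity", and the importance values are hypothetical.

```python
import random
from typing import Dict, List


def select_features(importance: Dict[str, float], k: int, y: int,
                    seed: int = 0) -> List[str]:
    """Keep the k most important features, then add (y - k) random others."""
    if not 0 <= k <= y <= len(importance):
        raise ValueError("require 0 <= k <= y <= number of features")
    # Sort feature names by importance value, largest first.
    ranked = sorted(importance, key=importance.get, reverse=True)
    top = ranked[:k]
    # Draw the remaining features without replacement from the unselected ones.
    rng = random.Random(seed)
    rest = rng.sample(ranked[k:], y - k)
    return top + rest


# Hypothetical importance values for Y = 5 features.
importance = {"current": 0.40, "voltage": 0.30, "power": 0.20,
              "temperature": 0.06, "humidity": 0.04}

selected = select_features(importance, k=2, y=3)
```

The fixed seed only makes the example repeatable; in practice the random draw would differ from tree to tree.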

To verify the benefit of the random forest algorithm in multi-class data processing, a traditional binary classification method was selected for a comparison experiment with the random forest algorithm. The binary classification method classifies and scores data according to data type and similarity, and is trained on an input training set.

In the experiment, 5000 noun phrases of different categories were randomly imported from a word stock, of which 852 were ball phrases. In MATLAB, half of the phrases were randomly selected as the training set and the other half as the test set, and both methods were trained. After training, 10 groups of different parameter pairs were used, each group was repeated 5 times, and the average of the classification results was taken as the experimental result. The results are shown in Table 1:

TABLE 1 comparative results of the experiments

It can be seen that when the method is used for data classification, extraction and judgment, the extraction dimensionality is small, yet the category judgment accuracy and time are far better than those of the traditional binary classification algorithm, and the degree of fitting is low.

A comprehensive, quantitative index system is constructed by using the analytic hierarchy process, and a state evaluation model is established. A management system of 'integrated management and control, lean operation and quality service' based on life management and state evaluation is gradually established, which changes the asset management mode from a dispersed, extensive one to one driven by big data algorithms, improves the professional management level, service quality and operating benefit, and has good practical value.

The random forest algorithm is an ensemble algorithm, so its accuracy is better than that of most single algorithms, and it performs well on a test set. Because two sources of randomness are introduced (random samples and random features), the random forest is not prone to overfitting and has a certain resistance to noise, which gives it an advantage over other algorithms. Because it combines trees, the random forest can process nonlinear data and belongs to the nonlinear classification (fitting) models. It can process high-dimensional data (many features) without prior feature selection, and adapts well to different data sets: it can process both discrete and continuous data, and the data set does not need to be normalized. Training is fast, so the method can be applied to large-scale data sets. Missing values can be handled without additional processing (by treating them as a separate class). Thanks to the out-of-bag (OOB) data, an unbiased estimate of the true error can be obtained during model generation without reducing the amount of training data. During training, interactions among features can be detected and the importance of each feature obtained, which provides a useful reference. Because each tree can be generated independently and simultaneously, the method is easy to parallelize. The method is simple to implement, accurate and resistant to overfitting, and is suitable as a baseline model for nonlinear data.
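To make the out-of-bag idea concrete, the sketch below draws one bootstrap sample (sampling with replacement) for a single tree and lists the records that were never drawn; those records can be used to estimate that tree's error without a separate hold-out set. The function name `bootstrap_split` is hypothetical, and only sample indices are handled here, not real operation data.

```python
import random
from typing import List, Tuple


def bootstrap_split(n_samples: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw n_samples indices with replacement; return (in-bag, out-of-bag) indices."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Any index that was never drawn is out-of-bag for this tree.
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


in_bag, out_of_bag = bootstrap_split(100)
```

For large sample sets, roughly 1/e (about 37%) of the records are expected to be out-of-bag for any given tree, since each record is missed by one draw with probability (1 - 1/n).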

In one embodiment, the training of the optimal sample by using a random forest algorithm to obtain a random forest model with a classification accuracy reaching a preset threshold specifically comprises:

Referring to FIG. 3, in this embodiment, to construct the random forest model, the optimal sample set first needs to be trained on by using a strong classifier, which outputs the data item with the highest frequency. A voting method is set up: all data items are mapped in turn to a hash table; if the same item has already appeared during mapping, a vote is cast and its count is increased by one; and once mapping is complete, a hash table covering all the data is obtained. The branch nodes of a decision tree are determined by recursive branching: at each recursion, a subset of the remaining data features is again extracted at random and the sub-branch is determined from it. After the nodes and sub-nodes are determined in this way, a decision tree model is established. Each optimal sample set is then trained on by the same method, so that a plurality of different decision trees are established. As the number of decision trees increases step by step, the constructed decision trees can be stored. Finally, whether the number of constructed decision trees meets the user's requirements is judged according to the number of instruction categories contained in the operation instruction ticket. If not, training continues according to the above method, and the category of a new input sample is determined according to the voting principle (the minority yields to the majority). When the user's requirements are met, the random forest model is generated.
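The hash-table vote counting described above maps naturally onto a Python dictionary. The sketch below is an illustration under that assumption; the function name `most_frequent` and the state labels are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def most_frequent(values: Iterable[Hashable]) -> Tuple[Hashable, Dict[Hashable, int]]:
    """Count each value in a hash table and return (most frequent value, counts)."""
    counts: Dict[Hashable, int] = {}
    for value in values:
        # First sighting creates the entry; every repeat adds one vote.
        counts[value] = counts.get(value, 0) + 1
    if not counts:
        raise ValueError("at least one value is required")
    winner = max(counts, key=counts.get)
    return winner, counts


winner, counts = most_frequent(["open", "closed", "open", "open", "closed"])
```

A single pass over the data fills the table, so the cost grows linearly with the number of items.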

In one embodiment, the operational data includes current, voltage, power, and operational state.

In one embodiment, the performing data preprocessing on the sample data set by using an ensemble learning algorithm includes:

randomly extracting a plurality of data sample sets from the sample data set;

Referring to FIG. 4, in this embodiment, when machine learning training starts, several data sample sets are randomly extracted from the sample data set, which consists of original measurement data extracted from the equipment operation database. The data sample sets are then input, and preliminary learning and training are performed by using a weak learner algorithm. Denoting the classification result by h(x), the integrated classification result is:

By analogy, a plurality of weak classifiers are gradually established from the other data sample sets, and the weak classifiers are trained into a strong classifier through multiple iterative computations. The final classification result is expressed as H(X), and the formula is as follows:

In one embodiment, the performing data preprocessing on the sample data set by using an ensemble learning algorithm further includes:

In this embodiment, when the strong learner is trained, grid search with cross validation is used to optimize it. All data in the sample data set are divided into K parts; in each round, one part is used as the validation set and the remaining K-1 parts as the training set. Taking K = 10 as an example, first the 2nd to 10th parts are used as the training set and the 1st part as the validation set to obtain a first score; then the 2nd part is used as the validation set and the 1st and 3rd to 10th parts as the training set to obtain a second score; and so on. The result with the highest score is taken as the optimal sample set.
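A minimal sketch of K-fold cross validation combined with a grid search is given below. It assumes the common convention of averaging the K validation scores for each hyper-parameter combination and keeping the combination with the highest average; the text itself only says that the highest score is kept. The names `k_fold_indices`, `grid_search`, `toy_score` and the hyper-parameters `n_trees` and `max_depth` are hypothetical, and `toy_score` is a placeholder for training and scoring a real learner.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists; each part is the validation set once."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def grid_search(param_grid: Dict[str, Sequence[int]],
                score_fn: Callable[[Dict[str, int], List[int], List[int]], float],
                n_samples: int, k: int = 10) -> Tuple[Dict[str, int], float]:
    """Try every parameter combination and keep the best mean validation score."""
    best_params: Dict[str, int] = {}
    best_score = float("-inf")
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(score_fn(params, train, validation)
                     for train, validation in k_fold_indices(n_samples, k))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params: Dict[str, int], train: List[int], validation: List[int]) -> float:
    """Placeholder score that peaks at n_trees=50, max_depth=4 (not a real model)."""
    return 1.0 - abs(params["n_trees"] - 50) / 100 - abs(params["max_depth"] - 4) / 10


grid = {"n_trees": [10, 50, 100], "max_depth": [2, 4, 8]}
best_params, best_score = grid_search(grid, toy_score, n_samples=100, k=10)
folds = list(k_fold_indices(100, 10))
```

With 100 samples and K = 10, the first fold validates on samples 0 to 9 and trains on the other 90, mirroring the worked example in the text.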

In one embodiment, the predetermined threshold is 90%.

In this embodiment, if the classification accuracy reaches 90%, a random forest model is generated.
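The acceptance check can be sketched as below, assuming accuracy is measured as the fraction of correctly classified samples on a labelled evaluation set. The function names `accuracy` and `accept_model` and the labels are hypothetical.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that equal the true labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def accept_model(predictions: Sequence[str], labels: Sequence[str],
                 threshold: float = 0.90) -> bool:
    """True when classification accuracy reaches the preset threshold."""
    return accuracy(predictions, labels) >= threshold


labels = ["open"] * 10
nine_right = ["open"] * 9 + ["closed"]
eight_right = ["open"] * 8 + ["closed"] * 2

accepted = accept_model(nine_right, labels)    # accuracy 0.9 reaches the threshold
rejected = accept_model(eight_right, labels)   # accuracy 0.8 does not
```

When the check fails, training would continue as described in the preceding embodiments until the threshold is reached.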

In a certain embodiment, the random forest model building module further includes:

For the specific definition of the judging device for realizing operation in place at the automation master station, reference may be made to the definition of the method above, which is not repeated here. The modules of the judging device can be implemented wholly or partially in software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor of the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

Referring to FIG. 5, an embodiment of the invention provides a computer terminal device, which includes one or more processors and a memory. The memory is coupled to the processor and stores one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of determining that an operation is in place at an automation master station as in any of the embodiments described above.

The processor is used for controlling the overall operation of the computer terminal equipment so as to complete all or part of the steps of the judging method for realizing the operation in place at the automatic master station. The memory is used to store various types of data to support the operation at the computer terminal device, which data may include, for example, instructions for any application or method operating on the computer terminal device, as well as application-related data. The Memory may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk, or optical disk.

In an exemplary embodiment, the computer terminal device may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, and is configured to perform the above method for determining that an operation is in place at an automation master station and to achieve technical effects consistent with that method.

In another exemplary embodiment, a computer readable storage medium is also provided, which includes program instructions, which when executed by a processor, implement the steps of the method for determining that an operation is in place at an automation master in any of the above embodiments. For example, the computer readable storage medium may be the memory including program instructions executable by the processor of the computer terminal device to perform the method for determining that an operation is in place at the automation host station, and achieve the technical effects consistent with the method.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A method for judging the in-place operation of an automated master station is characterized by comprising the following steps:

2. The method for judging the in-place operation of the automated master station as claimed in claim 1, wherein the training of the optimal sample by using the random forest algorithm to obtain the random forest model with the classification accuracy reaching the preset threshold value specifically comprises:

3. The method of claim 1, wherein the operational data includes current, voltage, power, and operational status.

4. The method of claim 1, wherein the pre-processing the sample data set using an ensemble learning algorithm comprises:

randomly extracting a plurality of data sample sets from the sample data set;

5. The method of claim 4, wherein the pre-processing the sample data set using the ensemble learning algorithm further comprises:

6. The method of claim 1, wherein the predetermined threshold is 90%.

7. An apparatus for determining the in-place of an operation at an automated host station, comprising:

8. The apparatus for determining the in-place operation of an automated host station as claimed in claim 7, wherein the random forest model building module further comprises:

9. A computer terminal device, comprising:

one or more processors;

a memory coupled to the processor for storing one or more programs;

when executed by the one or more processors, cause the one or more processors to implement the method of determining that an operation is in place at an automated host station of any of claims 1 to 6.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of determining in place an operation at an automated host station according to any one of claims 1 to 6.