CN112750051B - Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment - Google Patents

Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment

Info

Publication number
CN112750051B
Authority
CN
China
Prior art keywords
random forest
training
sample data
time sequence
voltage sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011622001.1A
Other languages
Chinese (zh)
Other versions
CN112750051A (en
Inventor
蔡永智
唐捷
谭跃凯
招景明
林国营
阙华坤
危阜胜
李健
卢世祥
冯小峰
郭文翀
李慧
胡秀珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Measurement Center of Guangdong Power Grid Co Ltd
Original Assignee
Measurement Center of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Measurement Center of Guangdong Power Grid Co Ltd
Priority to CN202011622001.1A
Publication of CN112750051A
Application granted
Publication of CN112750051B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Abstract

The invention relates to a random forest algorithm-based transformer area phase sequence identification method, device and terminal equipment. Time sequence voltage sample data between each phase low-voltage outgoing line of a distribution transformer and each user electric meter within a certain time period in a target transformer area are acquired, a feature set describing the phase sequence attribution relationship of the user electric meters is established, and the time sequence voltage sample data are preprocessed. A training set and a test set are generated from the processed sample set; the training set and the feature set are trained with a random forest algorithm to obtain training decision trees, and m training decision trees are combined to build a random forest recognition model. The random forest recognition model is then used to predict the phase sequence relationship between the user electric meters in the test set and each phase of the distribution transformer. The method can accurately sort out the phase sequence attribution relationship of the user electric meters without installing any additional terminal equipment in the target transformer area, and is low in cost and high in engineering application value.

Description

Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment
Technical Field
The invention relates to the technical field of low-voltage distribution networks, in particular to a random forest algorithm-based phase sequence identification method and device for a transformer area and terminal equipment.
Background
Low-voltage distribution networks are still managed with the traditional low-voltage operation and maintenance mode. Because this mode lacks the support of the transformer area topology, it easily leads to untimely power failure notification, untimely repair and restoration, low-voltage problems that take a long time to resolve or remain unresolved, frequent abnormal changes in the transformer area, abnormal transformer area line loss and similar problems, which in turn cause dissatisfaction among power users. It is therefore very important to study techniques for identifying the 'transformer-line-phase-user' physical topology of the transformer area (distribution transformer, low-voltage outgoing line, phase, user electric meter). Signal injection methods, data label methods and data analysis methods have been studied for this purpose. The existing signal injection and data label methods both require additional terminal equipment, involve large investment and heavy operation and maintenance pressure, and are difficult to apply in transformer areas that use a micropower wireless scheme.
Disclosure of Invention
The embodiments of the invention provide a random forest algorithm-based transformer area phase sequence identification method, device and terminal equipment, aiming to solve the technical problems of the prior art, in which identifying the 'transformer-line-phase-user' physical topology of the transformer area with a signal injection method, a data label method or a data analysis method requires additional terminal equipment, large investment and heavy operation and maintenance effort.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
A random forest algorithm-based transformer area phase sequence identification method comprises the following steps:
S10. acquiring time sequence voltage sample data between each phase low-voltage outgoing line of a distribution transformer and each user electric meter within a certain time period in a target transformer area, and establishing a feature set of the phase sequence attribution relationship of the user electric meters;
S20. preprocessing the time sequence voltage sample data to obtain a sample set;
S30. selecting time sequence voltage sample data from the sample set to generate a training set and a test set;
S40. training the training set and the feature set to obtain training decision trees, and building a random forest recognition model from m training decision trees;
S50. performing phase sequence recognition on the test set with the random forest recognition model to obtain the phase sequence of the user electric meters in the target transformer area.
Preferably, in step S10, the acquired time sequence voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter within a certain time period in the target transformer area satisfy a first constraint condition, which includes: the time span of the time sequence voltage sample data is not less than 96 sampling points; the number of time sections of the time sequence voltage sample data is not less than the total number of user electric meters; the missing proportion of the time sequence voltage sample data is not more than 20%; and the three-phase unbalance degree of the target transformer area corresponding to the time sequence voltage sample data is greater than 0.02.
Preferably, the missing proportion of the time sequence voltage sample data is the ratio, expressed as a percentage, of the number of missing sampling points in the acquired time sequence voltage sample data to the time span of the time sequence voltage sample data.
Preferably, in step S20, preprocessing the time sequence voltage sample data includes: filling the missing data in time sequence voltage sample data whose missing proportion is not more than 5% by using a Lagrange interpolation algorithm;
and filling the missing data in time sequence voltage sample data whose missing proportion is greater than 5% by using a K nearest neighbor algorithm.
Preferably, in step S10, the feature set for establishing the phase sequence attribution relationship of the user electric meters is expressed as:
$F = \{p_A, p_B, p_C, std_A, std_B, std_C\}$
where $p_A$, $p_B$ and $p_C$ are, respectively, the correlation coefficients between the voltage sequence of the user electric meter and the three-phase time sequence voltages of the concentrator of the target transformer area, and $std_A$, $std_B$ and $std_C$ are, respectively, the three-phase standard deviations of the time sequence voltage sample data acquired within the time period.
Preferably, in step S40, building the random forest recognition model includes:
S41. randomly selecting one time sequence voltage sample from the sample set D, storing it in a training set, and then returning the selected sample to the sample set D;
S42. repeating step S41 m times to obtain a training set d consisting of m time sequence voltage samples, and selecting P features from the feature set to form a new feature set;
S43. selecting one training set from the m training sets and generating one training decision tree with the random forest, and repeating step S43 m times to obtain m training decision trees;
S44. building a random forest recognition model from the m training decision trees by using the random forest learning algorithm;
wherein the test set consists of the time sequence voltage sample data of the D - d samples of the sample set, that is, the samples of D that were not drawn into d.
Preferably, in step S43, selecting one training set from the m training sets and generating one training decision tree with the random forest includes:
dividing the training set into two sub-training sets, each non-leaf node of the random forest correspondingly having two branches;
for the sub-training set corresponding to each branch, selecting the corresponding feature from the new feature set according to a second constraint condition to generate a training decision tree;
wherein the random forest comprises non-leaf nodes and leaf nodes; the second constraint condition is that the Gini coefficient of the random forest is smaller than a set coefficient threshold, the number of time sequence voltage samples in the sub-training set of a node in the random forest is smaller than a set number threshold, or the depth of the random forest is greater than a set depth threshold.
The invention also provides a station area phase sequence recognition device based on the random forest algorithm, which comprises a data acquisition module, a data processing module, a sample classification module, a model establishing module and a recognition module;
the data acquisition module is used for acquiring time sequence voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in a certain time period in a target transformer area and establishing a characteristic set of a phase sequence affiliation relationship of the user electric meters;
the data processing module is used for preprocessing the time sequence voltage sample data to obtain a sample set;
the sample classification module is used for selecting time sequence voltage sample data from the sample set to generate a training set and a test set;
the model establishing module is used for training the training set and the feature set to obtain training decision trees, and establishing a random forest recognition model for m training decision trees;
and the identification module is used for carrying out phase sequence identification on the test set by adopting the random forest identification model to obtain the phase sequence of the user electric meters in the target transformer area.
The present invention also provides a computer-readable storage medium for storing computer instructions, which, when run on a computer, cause the computer to execute the above-mentioned random forest algorithm-based phase sequence identification method for a region.
The invention also provides terminal equipment, which comprises a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
and the processor is used for executing the station area phase sequence identification method based on the random forest algorithm according to the instructions in the program codes.
According to the above technical solutions, the embodiments of the invention have the following advantages. The random forest algorithm-based transformer area phase sequence identification method, device and terminal equipment acquire time sequence voltage sample data between each phase low-voltage outgoing line of a distribution transformer and each user electric meter within a certain time period in a target transformer area, establish a feature set of the phase sequence attribution relationship of the user electric meters, and preprocess the time sequence voltage sample data. A training set and a test set are generated from the processed sample set; the training set and the feature set are trained with a random forest algorithm to obtain training decision trees, and m training decision trees are combined to build a random forest recognition model. The random forest recognition model is then used to predict the phase sequence relationship between the user electric meters in the test set and each phase of the distribution transformer. The method can accurately sort out the phase sequence attribution relationship of the user electric meters without installing any additional terminal equipment in the target transformer area, is low in cost and high in engineering application value, and solves the technical problems that identifying the 'transformer-line-phase-user' physical topology of the transformer area with the existing signal injection method, data label method or data analysis method requires additional terminal equipment, large investment and heavy operation and maintenance effort.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flow chart of steps of a platform region phase sequence identification method based on a random forest algorithm according to an embodiment of the present invention.
Fig. 2 is a time sequence voltage distribution diagram of each time section of a user electric meter based on the random forest algorithm station area phase sequence identification method according to the embodiment of the invention.
Fig. 3 is a flow chart of steps of establishing a random forest recognition model by the platform region phase sequence recognition method based on the random forest algorithm according to the embodiment of the invention.
Fig. 4 is a frame diagram of the phase sequence identification device for a transformer area based on a random forest algorithm according to the embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiments of the application provide a random forest algorithm-based transformer area phase sequence identification method, device and terminal equipment, which solve the technical problems of the prior art, in which identifying the 'transformer-line-phase-user' physical topology of the transformer area with a signal injection method, a data label method or a data analysis method requires additional terminal equipment, large investment and heavy operation and maintenance effort.
Embodiment 1:
Fig. 1 is a flow chart of the steps of the random forest algorithm-based transformer area phase sequence identification method according to an embodiment of the present invention, and Fig. 2 is a time sequence voltage distribution diagram of each time section of the user electric meters for the same method.
As shown in Fig. 1 and Fig. 2, an embodiment of the present invention provides a random forest algorithm-based transformer area phase sequence identification method, including the following steps:
S10. acquiring time sequence voltage sample data between each phase low-voltage outgoing line of a distribution transformer and each user electric meter within a certain time period in a target transformer area, and establishing a feature set of the phase sequence attribution relationship of the user electric meters;
S20. preprocessing the time sequence voltage sample data to obtain a sample set;
S30. selecting time sequence voltage sample data from the sample set to generate a training set and a test set;
S40. training the training set and the feature set to obtain training decision trees, and building a random forest recognition model from the m training decision trees;
S50. performing phase sequence recognition on the test set with the random forest recognition model to obtain the phase sequence of the user electric meters in the target transformer area.
In step S10 of the embodiment of the present invention, time sequence voltage sample data between each phase low voltage outgoing line of the distribution transformer and each user electric meter in a certain time period in the target station area and a feature set for establishing a phase sequence affiliation relationship of the user electric meters are mainly obtained from the power system.
It should be noted that the acquired time sequence voltage sample data need to satisfy a first constraint condition, which includes: the time span of the time sequence voltage sample data is not less than 96 sampling points; the number of time sections of the time sequence voltage sample data is not less than the total number of user electric meters; the missing proportion of the time sequence voltage sample data is not more than 20%; and the three-phase unbalance degree of the target transformer area corresponding to the time sequence voltage sample data is greater than 0.02. In this embodiment, each sampling point corresponds to one voltage value, that is, the time sequence voltage sample data include the voltage data of at least 96 time points, and the number of time sections of the voltage sample data is not less than the total number of user electric meters in the target transformer area. The phase sequence attribution relationship of a user electric meter refers to which of the phases A, B and C the user electric meter belongs to.
In the embodiment of the invention, the missing proportion of the time sequence voltage sample data is the percentage of the ratio of the number of the missing sampling points of the acquired time sequence voltage sample data to the time span of the time sequence voltage sample data.
It should be noted that $(null / m) \times 100\% \le 20\%$, where null is the number of missing sampling points and m is the time span of the time sequence voltage sample data.
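The first constraint condition can be checked mechanically before any training is attempted. The following is a minimal sketch, not the patent's implementation: the function names, the use of `None` to mark a missing sampling point, and the assumption that the three-phase unbalance degree S has already been computed elsewhere are all illustrative.

```python
def missing_ratio(series):
    """Percentage of missing sampling points (None) in one voltage series."""
    return 100.0 * sum(v is None for v in series) / len(series)


def satisfies_first_constraint(meter_series, imbalance,
                               min_points=96, max_missing_pct=20.0,
                               min_imbalance=0.02):
    """Check the first constraint condition for one transformer area.

    meter_series: dict mapping a (hypothetical) meter id to its list of
                  voltages, with None marking a missing sampling point.
    imbalance:    three-phase unbalance degree S, assumed precomputed.
    """
    n_meters = len(meter_series)
    for series in meter_series.values():
        span = len(series)
        if span < min_points:        # time span >= 96 sampling points
            return False
        if span < n_meters:          # time sections >= number of meters
            return False
        if missing_ratio(series) > max_missing_pct:  # missing <= 20 %
            return False
    return imbalance > min_imbalance  # unbalance degree > 0.02
```

A series of 96 points with 6 missing values has a missing ratio of 6.25% and passes; one with 26 missing values (about 27%) is rejected.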
In the embodiment of the present invention, the feature set for establishing the phase sequence attribution relationship of the user electric meters is expressed as:
$F = \{p_A, p_B, p_C, std_A, std_B, std_C\}$
where $p_A$, $p_B$ and $p_C$ are, respectively, the correlation coefficients between the voltage sequence of the user electric meter and the three-phase time sequence voltages of the concentrator of the target transformer area, and $std_A$, $std_B$ and $std_C$ are, respectively, the three-phase standard deviations of the time sequence voltage sample data acquired within the time period.
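As an illustration of how the six features might be computed for one user electric meter, the sketch below uses the Pearson correlation coefficient and the population standard deviation. Both choices are assumptions: the text does not name the type of correlation coefficient, and it does not say exactly which series the standard deviations are taken over; here they are taken over the concentrator's phase voltage series, which is only one possible reading. The names `pearson`, `population_std` and `feature_vector` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def population_std(x):
    """Population standard deviation of a series."""
    m = sum(x) / len(x)
    return math.sqrt(sum((a - m) ** 2 for a in x) / len(x))


def feature_vector(meter_v, phase_v):
    """Return [p_A, p_B, p_C, std_A, std_B, std_C] for one meter.

    meter_v: voltage series of the user electric meter.
    phase_v: dict with keys 'A', 'B', 'C' holding the concentrator's
             phase voltage series (assumed complete and aligned in time).
    """
    corr = [pearson(meter_v, phase_v[p]) for p in "ABC"]
    std = [population_std(phase_v[p]) for p in "ABC"]  # assumed reading
    return corr + std
```

A meter whose voltage rises and falls together with phase A yields a value of $p_A$ close to 1, which is the signal the classifier can exploit.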
In the embodiment of the present invention, the three-phase unbalance degree of the target transformer area corresponding to the time sequence voltage sample data is:
$S = \mathrm{mean}\{s_{ac}, s_{bc}, s_{ab}\}$
[Formula image GDA0004053414120000061, not reproduced in this text]
In the formula, the quantities shown in images GDA0004053414120000062, GDA0004053414120000063 and GDA0004053414120000064 are, respectively, the time sequence voltages of phase A, phase B and phase C in the concentrator of the target transformer area.
In step S20 of the embodiment of the present invention, the time sequence voltage sample data are preprocessed to ensure the integrity of the data in the sample set, which helps guarantee the accuracy of the recognition results of the random forest recognition model.
In step S30 of the embodiment of the present invention, time sequence voltage sample data are selected from the sample set to generate a training set and a test set, providing the training data and test data needed to build a random forest recognition model capable of recognizing the phase sequence of the user electric meters in the target transformer area.
In step S40 of the embodiment of the present invention, a random forest algorithm is used, together with the training set and the feature set, to train and build the random forest recognition model.
It should be noted that, in machine learning, a random forest is a classifier comprising a plurality of decision trees, whose output class is the mode of the classes output by the individual trees. Leo Breiman and Adele Cutler developed the algorithm for inducing a random forest, and the term "Random Forests" derives from the random decision forests proposed by Tin Kam Ho of Bell Laboratories in 1995. The random forest algorithm builds a set of decision trees by combining Breiman's "bootstrap aggregating" (bagging) idea with Ho's "random subspace method".
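The mode-based output rule described above amounts to a majority vote over the trees. A minimal sketch, assuming each tree has already produced a phase label for the same user electric meter (the function name `majority_vote` is illustrative):

```python
from collections import Counter


def majority_vote(tree_outputs):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_outputs).most_common(1)[0][0]
```

For example, if five trees output A, B, A, C and A for one meter, the forest assigns the meter to phase A.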
In step S50 of the embodiment of the present invention, a trained random forest recognition model is used to recognize the phase-sequence relationship between the user electric meters and the distribution transformer in the time sequence voltage sample data of the test set, so as to obtain the phase sequence of the user electric meters in the target transformer area.
It should be noted that, for a target transformer area with 68 user electric meters, 288 time sequence voltage samples are obtained for each user electric meter in step S10, as shown in Fig. 2. Steps S20 to S40 then yield a random forest recognition model capable of recognizing the phase sequence of the user electric meters in the target transformer area, and the recognition results are shown in Table 1.
Table 1. Identification results of the phase sequence relationship between the user electric meters and the distribution transformer in the target transformer area
[Table images GDA0004053414120000071, GDA0004053414120000081 and GDA0004053414120000091, not reproduced in this text]
As Table 1 shows, the phase sequences of 64 user electric meters are identified correctly and those of 4 user electric meters are identified incorrectly. The random forest recognition model built by the random forest algorithm-based transformer area phase sequence identification method thus identifies the phase sequence relationship between the user electric meters and the phases of the distribution transformer in the target transformer area with an accuracy of 94.12%, which verifies the effectiveness and feasibility of the method.
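The reported accuracy follows directly from the counts of correctly and incorrectly identified meters:

```latex
\text{accuracy} = \frac{64}{64 + 4} \times 100\% = \frac{64}{68} \times 100\% \approx 94.12\%
```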
The random forest algorithm-based transformer area phase sequence identification method provided by the invention acquires time sequence voltage sample data between each phase low-voltage outgoing line of a distribution transformer and each user electric meter within a certain time period in a target transformer area, establishes a feature set of the phase sequence attribution relationship of the user electric meters, and preprocesses the time sequence voltage sample data. A training set and a test set are generated from the processed sample set; the training set and the feature set are trained with a random forest algorithm to obtain training decision trees, and m training decision trees are combined to build a random forest recognition model. The random forest recognition model is then used to predict the phase sequence relationship between the user electric meters in the test set and each phase of the distribution transformer. The method can accurately sort out the phase sequence attribution relationship of the user electric meters without installing any additional terminal equipment in the target transformer area, is low in cost and high in engineering application value, and solves the technical problems that identifying the 'transformer-line-phase-user' physical topology of the transformer area with the existing signal injection method, data label method or data analysis method requires additional terminal equipment, large investment and heavy operation and maintenance effort.
In an embodiment of the present invention, in step S20, preprocessing the time sequence voltage sample data includes: filling the missing data in time sequence voltage sample data whose missing proportion is not more than 5% by using a Lagrange interpolation algorithm; and filling the missing data in time sequence voltage sample data whose missing proportion is greater than 5% by using a K nearest neighbor algorithm.
It should be noted that the purpose of this step is to fill the missing data in the acquired time sequence voltage sample data so that the data are complete, which avoids inaccurate recognition results from the random forest recognition model caused by incomplete data.
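The two-branch filling rule can be sketched as follows. This is an illustrative reading, not the patent's implementation: the text does not specify the interpolation window, the value of K, or how neighbors are measured, so the sketch assumes a Lagrange polynomial through the four nearest known points, K = 2, and Euclidean distance to other meters' complete series over the commonly observed time points. All function names are hypothetical.

```python
import math


def lagrange_fill(series, window=4):
    """Fill None entries with a Lagrange polynomial through the
    `window` nearest known points (window size is an assumption)."""
    known = [(i, v) for i, v in enumerate(series) if v is not None]
    out = list(series)
    for t, v in enumerate(series):
        if v is not None:
            continue
        pts = sorted(known, key=lambda p: abs(p[0] - t))[:window]
        total = 0.0
        for j, (xj, yj) in enumerate(pts):
            weight = 1.0
            for k, (xk, _) in enumerate(pts):
                if k != j:
                    weight *= (t - xk) / (xj - xk)
            total += yj * weight
        out[t] = total
    return out


def knn_fill(target, others, k=2):
    """Fill None entries with the mean of the k most similar complete
    series, compared on the time points observed in `target`."""
    observed = [i for i, v in enumerate(target) if v is not None]

    def distance(other):
        return math.sqrt(sum((target[i] - other[i]) ** 2 for i in observed))

    nearest = sorted(others, key=distance)[:k]
    return [v if v is not None
            else sum(o[i] for o in nearest) / len(nearest)
            for i, v in enumerate(target)]


def fill_missing(target, others, threshold_pct=5.0):
    """Lagrange interpolation up to 5 % missing, K nearest neighbors above."""
    pct = 100.0 * sum(v is None for v in target) / len(target)
    if pct == 0:
        return list(target)
    if pct <= threshold_pct:
        return lagrange_fill(target)
    return knn_fill(target, others)
```

Interpolation suits sparse gaps because nearby points of the same series are reliable; with many gaps the same series offers too little support, so borrowing values from similar meters is the more robust choice.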
Fig. 3 is a flow chart of steps of establishing a random forest recognition model by the platform region phase sequence recognition method based on the random forest algorithm according to the embodiment of the invention.
As shown in Fig. 3, in an embodiment of the present invention, in step S40, building the random forest recognition model includes:
S41. randomly selecting one time sequence voltage sample from the sample set D, storing it in a training set, and then returning the selected sample to the sample set D;
S42. repeating step S41 m times to obtain a training set d consisting of m time sequence voltage samples, and selecting P features from the feature set to form a new feature set;
S43. selecting one training set from the m training sets and generating one training decision tree with the random forest, and repeating step S43 m times to obtain m training decision trees;
S44. building a random forest recognition model from the m training decision trees by using the random forest learning algorithm;
wherein the test set consists of the time sequence voltage sample data of the D - d samples of the sample set, that is, the samples of D that were not drawn into d.
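Steps S41 and S42 describe sampling with replacement, with the samples that are never drawn forming the test set. A minimal sketch under that reading (the names `bootstrap_split` and `select_features` and the `seed` parameter are illustrative additions):

```python
import random


def bootstrap_split(sample_set, m, seed=None):
    """Draw m samples with replacement (steps S41-S42); samples that are
    never drawn form the test set D - d."""
    rng = random.Random(seed)
    drawn_idx = [rng.randrange(len(sample_set)) for _ in range(m)]
    drawn = set(drawn_idx)
    train = [sample_set[i] for i in drawn_idx]
    test = [s for i, s in enumerate(sample_set) if i not in drawn]
    return train, test


def select_features(feature_names, p, seed=None):
    """Pick P distinct features from the feature set (step S42)."""
    return random.Random(seed).sample(feature_names, p)
```

Because each selected sample is returned to D before the next draw, the same sample can appear in d more than once, and some samples of D are typically never selected.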
In the embodiment of the present invention, in step S43, selecting one training set from the m training sets and generating one training decision tree with the random forest includes:
dividing the training set into two sub-training sets, each non-leaf node of the random forest correspondingly having two branches;
for the sub-training set corresponding to each branch, selecting the corresponding feature from the new feature set according to a second constraint condition to generate a training decision tree;
wherein the random forest comprises non-leaf nodes and leaf nodes; the second constraint condition is that the Gini coefficient of the random forest is smaller than a set coefficient threshold, the number of time sequence voltage samples in the sub-training set of a node in the random forest is smaller than a set number threshold, or the depth of the random forest is greater than a set depth threshold. The Gini coefficient of a random forest is defined as:
[Formula image GDA0004053414120000101, not reproduced in this text]
where $p_i$ is the probability that the sub-training set in the random forest node belongs to class i, and M is the number of classes in the random forest node.
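The formula image itself is not available in this text. The standard Gini impurity, which is consistent with the symbols $p_i$ and M as defined here, is given below as an assumed reconstruction rather than a transcription of the original figure:

```latex
\mathrm{Gini} = 1 - \sum_{i=1}^{M} p_i^{2}
```

Under this definition a node containing samples of only one class has a Gini coefficient of 0, and a node split evenly between two classes has a Gini coefficient of 0.5.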
It should be noted that, each time, one time sequence voltage sample is randomly selected from the sample set D and stored in the training set, and the sample is then put back into the original sample set D so that it may be selected again the next time. Repeating this m times gives a training set d containing m time sequence voltage samples, namely $d = \{d_1, d_2, \ldots, d_m\}$. The time sequence voltage samples remaining in the sample set D, namely (D - d), serve as the test set. P features are selected from the feature set to form a new feature set, one training set is extracted from $d = \{d_1, d_2, \ldots, d_m\}$ as the current training set, and the current training set is divided into 2 sub-training sets, so that each non-leaf node of a tree generated in the random forest has 2 branches. The non-leaf nodes represent features, and the leaf nodes are the identification values given by the tree model of the random forest. A feature is selected from the new feature set according to the second constraint condition, and the corresponding node of the tree in the random forest is divided into 2 branches according to that feature; a training decision tree is generated by repeating this recursively on each branch; and the m training decision trees obtained are combined to give the random forest recognition model. In this embodiment, the feature set contains 6 features, so P < 6.
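The recursive binary splitting described above can be sketched as a small decision-tree builder. The sketch rests on several assumptions not stated in the text: the standard Gini impurity is used, the split at each node is the feature and threshold that minimise the weighted Gini of the two branches, and the three parts of the second constraint condition act as stopping rules. The names `gini`, `best_split`, `build_tree` and `predict` and the default thresholds are illustrative.

```python
from collections import Counter


def gini(labels):
    """Standard Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) pair with the lowest weighted Gini."""
    best, best_score = None, float("inf")
    for f in feature_ids:
        for thr in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def build_tree(rows, labels, feature_ids, depth=0,
               gini_min=0.01, min_samples=2, max_depth=5):
    """Grow a binary tree; a leaf is a class label, an inner node is a
    tuple (feature, threshold, left_subtree, right_subtree)."""
    leaf = Counter(labels).most_common(1)[0][0]
    # Stopping rules modelled on the second constraint condition.
    if (gini(labels) < gini_min or len(labels) < min_samples
            or depth > max_depth):
        return leaf
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return leaf
    f, thr = split
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= thr]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > thr]
    args = (feature_ids, depth + 1, gini_min, min_samples, max_depth)
    return (f, thr,
            build_tree([r for r, _ in lo], [y for _, y in lo], *args),
            build_tree([r for r, _ in hi], [y for _, y in hi], *args))


def predict(tree, row):
    """Follow the branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, thr, lo, hi = tree
        tree = lo if row[f] <= thr else hi
    return tree
```

With rows holding the selected features of each meter and labels holding the phases, a forest is obtained by calling `build_tree` once per bootstrap training set and combining the resulting trees.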
In the embodiment of the invention, after the random forest identification model is obtained, the test set (D - d) is input, and the random forest identification model gives the identification value $\hat{y}$ of the test set by averaging the identification values of all regression trees:
$$\hat{y} = \frac{1}{m}\sum_{i=1}^{m} rf_i(x)$$
where m is the number of training decision trees, $rf_i$ is the i-th training decision tree, and x is an input sample of the test set.
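The averaging step can be illustrated with a short sketch. The three stand-in trees below are hypothetical lambdas that return a numeric phase code; they are not trained models and only show how the m tree outputs are combined.

```python
def forest_predict(trees, x):
    """Average the outputs of all trees for one input sample x."""
    return sum(tree(x) for tree in trees) / len(trees)

# Hypothetical stand-in trees: each maps a feature dict to a numeric phase code.
trees = [
    lambda x: 1.0 if x["p_A"] > 0.9 else 2.0,
    lambda x: 1.0 if x["std_A"] < 1.5 else 3.0,
    lambda x: 1.0,
]

sample = {"p_A": 0.95, "std_A": 1.2}
identification_value = forest_predict(trees, sample)
```

The text describes averaging regression-tree outputs; when the trees emit discrete class labels instead, a majority vote is a common alternative.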
Example two:
Fig. 4 is a block diagram of the transformer area phase sequence identification device based on the random forest algorithm according to the embodiment of the present invention.
As shown in fig. 4, an embodiment of the present invention further provides a transformer area phase sequence identification device based on a random forest algorithm, which includes a data acquisition module 10, a data processing module 20, a sample classification module 30, a model establishing module 40, and an identification module 50;
the data acquisition module 10 is configured to acquire time sequence voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in a certain time period in the target station area, and establish a feature set of a phase sequence affiliation relationship of the user electric meters;
the data processing module 20 is configured to perform preprocessing on the time-series voltage sample data to obtain a sample set;
the sample classification module 30 is configured to select time sequence voltage sample data from a sample set to generate a training set and a test set;
the model establishing module 40 is used for training the training set and the feature set to obtain training decision trees, and establishing a random forest recognition model for the m training decision trees;
and the identification module 50 is used for performing phase sequence identification on the test set by adopting a random forest identification model to obtain the phase sequence of the user electric meter in the target transformer area.
It should be noted that the modules in the second embodiment correspond to the steps in the first embodiment. Since those steps have been described in detail in the first embodiment, the modules are not described again here.
Example three:
the embodiment of the invention provides a computer-readable storage medium, which is used for storing computer instructions, and when the computer-readable storage medium runs on a computer, the computer is enabled to execute the station area phase sequence identification method based on the random forest algorithm.
Example four:
the embodiment of the invention provides terminal equipment, which comprises a processor and a memory;
a memory for storing the program code and transmitting the program code to the processor;
and the processor is used for executing the station area phase sequence identification method based on the random forest algorithm according to the instructions in the program codes.
It should be noted that the processor is configured to execute the steps in the above-mentioned embodiment of the phase sequence identification method based on the random forest algorithm according to the instructions in the program code. Alternatively, the processor, when executing the computer program, implements the functions of each module/unit in each system/apparatus embodiment described above.
Illustratively, a computer program may be partitioned into one or more modules/units, which are stored in a memory and executed by a processor to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of a computer program in a terminal device.
The terminal device may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that the terminal device is not limited and may include more or fewer components than those shown, or some components may be combined, or different components, e.g., the terminal device may also include input output devices, network access devices, buses, etc.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may be an internal storage unit of the terminal device, such as a hard disk or a memory of the terminal device. The memory may also be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal device. Further, the memory may also include both an internal storage unit of the terminal device and an external storage device. The memory is used for storing computer programs and other programs and data required by the terminal device. The memory may also be used to temporarily store data that has been output or is to be output.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A random forest algorithm-based phase sequence identification method for a transformer area is characterized by comprising the following steps:
s10, acquiring time sequence voltage sample data between each phase low-voltage outgoing line of a distribution transformer and each user electric meter in a certain time period in a target platform area, and establishing a feature set of a phase sequence affiliation relationship of the user electric meters;
s20, preprocessing the time sequence voltage sample data to obtain a sample set;
s30, selecting time sequence voltage sample data from the sample set to generate a training set and a test set;
s40, training the training set and the feature set to obtain training decision trees, and establishing random forest recognition models for m training decision trees;
s50, carrying out phase sequence identification on the test set by adopting the random forest identification model to obtain a phase sequence of the user electric meter in the target transformer area;
in step S40, the step of establishing a random forest recognition model includes:
s41, acquiring time sequence voltage sample data from the sample set D, storing the time sequence voltage sample data into a training set, and returning the acquired time sequence voltage sample data to the sample set;
s42, repeating the step S41 for m times to obtain a training set d consisting of m time sequence voltage sample data, and selecting P features from the feature set to form a new feature set;
s43, selecting one training set from m training sets to generate one training decision tree by using a random forest, and repeatedly executing the step S43 for m times to obtain m training decision trees;
s44, establishing a random forest recognition model for the m training decision trees by adopting a random forest learning algorithm;
the test set consists of time sequence voltage sample data which is left after the time sequence voltage sample data of the training set is removed from the sample set;
in step S43, selecting one of m training sets to generate a training decision tree using a random forest includes:
dividing one training set into two sub-training sets, wherein two branches are arranged corresponding to each non-leaf node of the random forest;
selecting corresponding features from the new feature set by adopting a second constraint condition for the sub-training set corresponding to each branch to generate a training decision tree;
wherein the random forest comprises non-leaf nodes and leaf nodes; the second constraint condition is that the Gini coefficient of the random forest is smaller than a set coefficient threshold value, the number of time sequence voltage sample data of the sub-training set of the nodes in the random forest is smaller than a set number threshold value or the depth of the random forest is larger than a set depth threshold value;
the expression for the Gini coefficient of the random forest is:
$$Gini = 1 - \sum_{i=1}^{M} p_i^2$$
in the formula, $p_i$ is the probability that the sub-training set in the random forest node belongs to class i, and M is the number of classes in the random forest node;
and the random forest identification model outputs an identification value for the phase sequence identification of the test set, wherein the identification value is:
$$\hat{y} = \frac{1}{m}\sum_{i=1}^{m} rf_i(x)$$
where m is the number of training decision trees, $rf_i$ is the i-th training decision tree, x is an input sample of the test set, and $\hat{y}$ is the identification value.
2. The method for identifying the phase sequence of the transformer area based on the random forest algorithm as claimed in claim 1, wherein in step S10, the time sequence voltage sample data between the low voltage outgoing lines of each phase of the distribution transformer and the user electric meters in a certain time period of the target transformer area is acquired to meet a first constraint condition, and the first constraint condition comprises: the time span of the time sequence voltage sample data is not less than 96 sampling points, the time section number of the time sequence voltage sample data is not less than the total number of the user electric meters, the missing proportion of the time sequence voltage sample data is not less than 20%, and the three-phase unbalance degree of a target platform area corresponding to the time sequence voltage sample data is more than 0.02.
3. The random forest algorithm-based phase sequence identification method for the transformer area, as recited in claim 2, wherein the missing proportion of the time sequence voltage sample data is a percentage of a ratio of the number of missing sampling points of the acquired time sequence voltage sample data to a time span of the time sequence voltage sample data.
4. The random forest algorithm-based phase sequence identification method for the transformer area, as claimed in claim 2, wherein in step S20, the preprocessing the time sequence voltage sample data comprises: filling missing data in the time sequence voltage sample data with the missing proportion not more than 5% by adopting a Lagrange interpolation algorithm;
and filling missing data in the time sequence voltage sample data with the missing proportion larger than 5% by adopting a K nearest neighbor algorithm.
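The preprocessing rule in claims 3 and 4 can be sketched as below, as an illustration only. Missing points are represented by `None`; the window size, the neighbour count `k`, and the assumption that the neighbouring meter series passed to `knn_fill` are complete are all hypothetical choices not stated in the claims.

```python
def missing_ratio(series):
    """Fraction of sampling points that are missing (None)."""
    return sum(v is None for v in series) / len(series)

def lagrange_fill(series, window=2):
    """Fill each gap by Lagrange interpolation through nearby known points."""
    known = [(i, v) for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for t, v in enumerate(series):
        if v is not None:
            continue
        left = [kv for kv in known if kv[0] < t][-window:]
        right = [kv for kv in known if kv[0] > t][:window]
        pts = left + right
        total = 0.0
        for j, (xj, yj) in enumerate(pts):
            term = yj
            for k, (xk, _) in enumerate(pts):
                if k != j:
                    term *= (t - xk) / (xj - xk)
            total += term
        filled[t] = total
    return filled

def knn_fill(series, others, k=2):
    """Fill gaps with the mean of the k most similar complete series."""
    def dist(other):
        # Euclidean distance over the points observed in `series`.
        return sum((a - b) ** 2 for a, b in zip(series, other)
                   if a is not None) ** 0.5
    nearest = sorted(others, key=dist)[:k]
    return [v if v is not None else sum(o[t] for o in nearest) / len(nearest)
            for t, v in enumerate(series)]

def fill_missing(series, others, threshold=0.05):
    """Lagrange interpolation up to 5% missing, K nearest neighbours above."""
    if missing_ratio(series) <= threshold:
        return lagrange_fill(series)
    return knn_fill(series, others)
```

For example, a 20-point series with one gap (5% missing) goes through `lagrange_fill`, while a series with half its points missing is filled from its nearest neighbouring meters.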
5. The method for identifying the phase sequence of the transformer area based on the random forest algorithm as claimed in claim 1, wherein in step S10, the expression of the feature set for establishing the phase sequence attribution relationship of the user electric meter is as follows:
$$F = \{p_A, p_B, p_C, std_A, std_B, std_C\}$$
in the formula, $p_A$, $p_B$ and $p_C$ are respectively the correlation coefficients between the voltage sequence of the user electric meter and the three-phase time sequence voltages of the concentrator of the target transformer area, and $std_A$, $std_B$ and $std_C$ are respectively the three-phase standard deviations of the time sequence voltage sample data acquired in a certain time period.
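One possible way to compute the feature set F of claim 5 for a single meter is sketched below. It assumes the Pearson correlation coefficient and the population standard deviation of each concentrator phase series; the claim names neither choice explicitly, so both are assumptions, as are the function names.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length voltage series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def build_features(meter_v, phase_v):
    """Build {p_A, p_B, p_C, std_A, std_B, std_C} for one user electric meter.

    meter_v: voltage series of the meter.
    phase_v: dict mapping "A", "B", "C" to the concentrator phase series.
    """
    feats = {}
    for ph in ("A", "B", "C"):
        feats[f"p_{ph}"] = pearson(meter_v, phase_v[ph])
        # Assumption: the standard deviation is taken over each phase series.
        feats[f"std_{ph}"] = pstdev(phase_v[ph])
    return feats

features = build_features(
    [220.0, 221.0, 222.0, 223.0],
    {"A": [230.0, 231.0, 232.0, 233.0],
     "B": [233.0, 232.0, 231.0, 230.0],
     "C": [230.0, 232.0, 230.0, 232.0]},
)
```

In this toy input the meter tracks phase A exactly, so `p_A` is close to 1 and `p_B` is close to -1.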
6. A phase sequence recognition device of a transformer area based on a random forest algorithm is characterized by comprising a data acquisition module, a data processing module, a sample classification module, a model establishing module and a recognition module;
the data acquisition module is used for acquiring time sequence voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in a certain time period in a target transformer area and establishing a feature set of a phase sequence affiliation relationship of the user electric meter;
the data processing module is used for preprocessing the time sequence voltage sample data to obtain a sample set;
the sample classification module is used for selecting time sequence voltage sample data from the sample set to generate a training set and a test set;
the model establishing module is used for training the training set and the feature set to obtain training decision trees, and establishing random forest recognition models for m training decision trees;
the identification module is used for carrying out phase sequence identification on the test set by adopting the random forest identification model to obtain the phase sequence of the user electric meter in the target transformer area;
the step of establishing the random forest identification model comprises the following steps:
s41, acquiring time sequence voltage sample data from the sample set D, storing the time sequence voltage sample data into a training set, and returning the acquired time sequence voltage sample data to the sample set;
s42, repeating the step S41 for m times to obtain a training set d consisting of m time sequence voltage sample data, and selecting P features from the feature set to form a new feature set;
s43, selecting one training set from m training sets to generate one training decision tree by using a random forest, and repeatedly executing the step S43 for m times to obtain m training decision trees;
s44, establishing a random forest recognition model for the m training decision trees by adopting a random forest learning algorithm;
the test set consists of time sequence voltage sample data left after the time sequence voltage sample data of the training set are removed from the sample set;
in step S43, selecting one of m training sets to generate a training decision tree using a random forest includes:
dividing one training set into two sub-training sets, wherein two branches are arranged corresponding to each non-leaf node of the random forest;
selecting corresponding features from the new feature set by adopting a second constraint condition for the sub-training set corresponding to each branch to generate a training decision tree;
wherein the random forest comprises non-leaf nodes and leaf nodes; the second constraint condition is that the Gini coefficient of the random forest is smaller than a set coefficient threshold value, the number of time sequence voltage sample data of the sub-training set of the nodes in the random forest is smaller than a set number threshold value or the depth of the random forest is larger than a set depth threshold value;
the expression for the Gini coefficient of the random forest is:
$$Gini = 1 - \sum_{i=1}^{M} p_i^2$$
in the formula, $p_i$ is the probability that the sub-training set in the random forest node belongs to class i, and M is the number of classes in the random forest node;
and the random forest identification model outputs an identification value for the phase sequence identification of the test set, wherein the identification value is:
$$\hat{y} = \frac{1}{m}\sum_{i=1}^{m} rf_i(x)$$
wherein m is the number of training decision trees, $rf_i$ is the i-th training decision tree, x is an input sample of the test set, and $\hat{y}$ is the identification value.
7. A computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to perform the random forest algorithm based stage phase sequence identification method of any one of claims 1 to 5.
8. A terminal device comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is used for executing the random forest algorithm-based phase sequence identification method of the region according to any one of claims 1 to 5 according to instructions in the program code.
CN202011622001.1A 2020-12-30 2020-12-30 Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment Active CN112750051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011622001.1A CN112750051B (en) 2020-12-30 2020-12-30 Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011622001.1A CN112750051B (en) 2020-12-30 2020-12-30 Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment

Publications (2)

Publication Number Publication Date
CN112750051A CN112750051A (en) 2021-05-04
CN112750051B true CN112750051B (en) 2023-04-07

Family

ID=75650339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011622001.1A Active CN112750051B (en) 2020-12-30 2020-12-30 Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment

Country Status (1)

Country Link
CN (1) CN112750051B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113093985B (en) * 2021-06-09 2021-09-10 中国南方电网有限责任公司超高压输电公司广州局 Sensor data link abnormity detection method and device and computer equipment
CN113449980B (en) * 2021-06-24 2023-02-24 广东电网有限责任公司 Low-voltage transformer area phase sequence identification method, system, terminal and storage medium
CN113625108B (en) * 2021-08-02 2022-11-01 四川轻化工大学 Flexible direct current power distribution network fault identification method
CN114548226A (en) * 2022-01-21 2022-05-27 国网江苏省电力有限公司常州供电分公司 Method and device for identifying station area user-variable relationship based on K-Means clustering algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105404944A (en) * 2015-12-11 2016-03-16 中国电力科学研究院 Big data analysis method for warning of heavy-load and overload of electric power system
CN108376982A (en) * 2017-11-24 2018-08-07 上海泰豪迈能能源科技有限公司 Load recognition methods and the device of phase sequence
CN109376944A (en) * 2018-11-13 2019-02-22 国网宁夏电力有限公司电力科学研究院 The construction method and device of intelligent electric meter prediction model
CN110930198A (en) * 2019-12-05 2020-03-27 佰聆数据股份有限公司 Electric energy substitution potential prediction method and system based on random forest, storage medium and computer equipment
CN111881415A (en) * 2020-07-31 2020-11-03 广东电网有限责任公司计量中心 Method and device for identifying phase sequence and line-subscriber relationship of distribution room

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018004580A1 (en) * 2016-06-30 2018-01-04 Intel Corporation Device-based anomaly detection using random forest models

Also Published As

Publication number Publication date
CN112750051A (en) 2021-05-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant