CN112750051A - Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment - Google Patents


Info

Publication number
CN112750051A
Authority
CN
China
Prior art keywords
random forest
sample data
training
voltage sample
time sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011622001.1A
Other languages
Chinese (zh)
Other versions
CN112750051B (en)
Inventor
蔡永智
唐捷
谭跃凯
招景明
林国营
阙华坤
危阜胜
李健
卢世祥
冯小峰
郭文翀
李慧
胡秀珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Measurement Center of Guangdong Power Grid Co Ltd
Original Assignee
Measurement Center of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Measurement Center of Guangdong Power Grid Co Ltd
Priority to CN202011622001.1A
Publication of CN112750051A
Application granted
Publication of CN112750051B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to a transformer area phase sequence identification method, device and terminal equipment based on a random forest algorithm. Time-series voltage sample data between each phase's low-voltage outgoing line of the distribution transformer and each user electric meter in a target transformer area over a certain time period are acquired, together with a feature set for establishing the phase sequence affiliation of the user electric meters, and the time-series voltage sample data are preprocessed. A training set and a test set are generated from the processed sample set; the training set and the feature set are trained with a random forest algorithm to obtain training decision trees, and m training decision trees are combined into a random forest recognition model. The random forest recognition model is then used to predict the phase relationship between the user electric meters in the set and each phase of the distribution transformer. The method can accurately sort out the phase sequence affiliation of user electric meters without installing additional terminal equipment in the target transformer area, at low cost and with high engineering application value.

Description

Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment
Technical Field
The invention relates to the technical field of low-voltage distribution networks, in particular to a random forest algorithm-based phase sequence identification method and device for a transformer area and terminal equipment.
Background
Low-voltage distribution networks are still managed with traditional low-voltage operation and maintenance practices. Lacking support from the topological relationships of the transformer area, these practices easily lead to problems such as untimely outage notification, untimely repair and restoration, long or unresolved handling times for low-voltage issues, frequent abnormal changes in the transformer area, and abnormal line loss in the transformer area, which in turn cause dissatisfaction among power users. It is therefore important to research technology for identifying the "transformer-line-phase-user" (distribution transformer, low-voltage outgoing line, phase, user electric meter) physical topology of a transformer area. To address this problem, signal injection methods, data label methods and data analysis methods have been studied. Existing signal injection and data label methods both require additional terminal equipment, involve large investment and a heavy operation and maintenance workload, and are difficult to apply in transformer areas that use micro-power wireless communication schemes.
Disclosure of Invention
The embodiments of the invention provide a random forest algorithm-based phase sequence identification method, device and terminal equipment for a transformer area, which aim to solve the technical problems in the prior art that identifying the "transformer-line-phase-user" physical topology of a transformer area with a signal injection method, a data label method or a data analysis method requires additional terminal equipment, large investment and a heavy operation and maintenance workload.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
A random forest algorithm-based phase sequence identification method for a transformer area comprises the following steps:
S10, acquiring time-series voltage sample data between each phase's low-voltage outgoing line of a distribution transformer and each user electric meter in a target transformer area over a certain time period, and establishing a feature set for the phase sequence affiliation of the user electric meters;
S20, preprocessing the time-series voltage sample data to obtain a sample set;
S30, selecting time-series voltage sample data from the sample set to generate a training set and a test set;
S40, training on the training set and the feature set to obtain training decision trees, and building a random forest recognition model from m training decision trees;
S50, performing phase sequence recognition on the test set with the random forest recognition model to obtain the phase sequence of the user electric meters in the target transformer area.
Preferably, in step S10, the acquired time-series voltage sample data between each phase's low-voltage outgoing line of the distribution transformer and each user electric meter in the target transformer area over a certain time period satisfy a first constraint condition, which includes: the time span of the time-series voltage sample data is not less than 96 sampling points; the number of time sections of the data is not less than the total number of user electric meters; the missing proportion of the data is not more than 20%; and the three-phase imbalance degree of the target transformer area corresponding to the data is greater than 0.02.
Preferably, the missing proportion of the time-series voltage sample data is the ratio, expressed as a percentage, of the number of missing sampling points to the time span of the data.
Preferably, in step S20, preprocessing the time-series voltage sample data includes: filling missing values with a Lagrange interpolation algorithm in time-series voltage sample data whose missing proportion is not more than 5%;
and filling missing values with a K-nearest-neighbor algorithm in time-series voltage sample data whose missing proportion is greater than 5%.
Preferably, in step S10, the feature set for establishing the phase sequence affiliation of the user electric meters is expressed as:
$$ F = \{p_A, p_B, p_C, \mathrm{std}_A, \mathrm{std}_B, \mathrm{std}_C\} $$
where $p_A$, $p_B$, $p_C$ are the correlation coefficients between the voltage series of the user electric meter and the A-, B- and C-phase time-series voltages at the concentrator of the target transformer area, and $\mathrm{std}_A$, $\mathrm{std}_B$, $\mathrm{std}_C$ are the three-phase standard deviations of the time-series voltage sample data over the acquisition period.
Preferably, in steps S30 and S40, building the random forest recognition model includes:
S41, drawing a time-series voltage sample from the sample set D, storing it in the training set, and putting the drawn sample back into the sample set;
S42, repeating step S41 m times to obtain a training set d consisting of m time-series voltage samples, and selecting P features from the feature set to form a new feature set;
S43, selecting one of the m training sets and generating one training decision tree from it using the random forest, and repeating step S43 m times to obtain m training decision trees;
S44, building a random forest recognition model from the m training decision trees with a random forest learning algorithm;
wherein the test set consists of the time-series voltage samples of the sample set that were not drawn into d, i.e. (D − d).
Preferably, in step S43, selecting one of the m training sets and generating one training decision tree using the random forest includes:
dividing the training set into two sub-training sets, with two branches at each non-leaf node of the tree in the random forest;
for the sub-training set of each branch, selecting a feature from the new feature set subject to a second constraint condition, to generate the training decision tree;
wherein the random forest comprises non-leaf nodes and leaf nodes, and the second constraint condition is met when the Gini coefficient of a node is smaller than a set coefficient threshold, when the number of time-series voltage samples in a node's sub-training set is smaller than a set number threshold, or when the tree depth is greater than a set depth threshold.
The invention also provides a random forest algorithm-based transformer area phase sequence identification device, which comprises a data acquisition module, a data processing module, a sample classification module, a model building module and an identification module;
the data acquisition module is configured to acquire time-series voltage sample data between each phase's low-voltage outgoing line of the distribution transformer and each user electric meter in a target transformer area over a certain time period, and to establish a feature set for the phase sequence affiliation of the user electric meters;
the data processing module is configured to preprocess the time-series voltage sample data to obtain a sample set;
the sample classification module is configured to select time-series voltage sample data from the sample set to generate a training set and a test set;
the model building module is configured to train on the training set and the feature set to obtain training decision trees, and to build a random forest recognition model from m training decision trees;
and the identification module is configured to perform phase sequence identification on the test set with the random forest recognition model to obtain the phase sequence of the user electric meters in the target transformer area.
The invention also provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute the above random forest algorithm-based transformer area phase sequence identification method.
The invention also provides terminal equipment comprising a processor and a memory;
the memory is configured to store program code and transmit the program code to the processor;
and the processor is configured to execute the random forest algorithm-based transformer area phase sequence identification method according to instructions in the program code.
Based on the above technical solutions, the embodiments of the invention have the following advantages. The random forest algorithm-based transformer area phase sequence identification method, device and terminal equipment acquire time-series voltage sample data between each phase's low-voltage outgoing line of the distribution transformer and each user electric meter in a target transformer area over a certain time period, establish a feature set for the phase sequence affiliation of the user electric meters, and preprocess the time-series voltage sample data; generate a training set and a test set from the processed sample set, train on the training set and the feature set with a random forest algorithm to obtain training decision trees, and combine m training decision trees into a random forest recognition model; and use the random forest recognition model to predict the phase relationship between the user electric meters in the set and each phase of the distribution transformer. The method can accurately sort out the phase sequence affiliation of user electric meters without installing additional terminal equipment in the target transformer area, at low cost and with high engineering application value, and thereby solves the technical problems that existing signal injection, data label or data analysis methods for identifying the "transformer-line-phase-user" physical topology of a transformer area require additional terminal equipment, large investment and a heavy operation and maintenance workload.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flow chart of the steps of the random forest algorithm-based transformer area phase sequence identification method according to an embodiment of the invention.
Fig. 2 is a distribution diagram of the time-series voltage of a user electric meter at each time section, for the random forest algorithm-based transformer area phase sequence identification method according to an embodiment of the invention.
Fig. 3 is a flow chart of the steps for building the random forest recognition model in the random forest algorithm-based transformer area phase sequence identification method according to an embodiment of the invention.
Fig. 4 is a block diagram of the random forest algorithm-based transformer area phase sequence identification device according to an embodiment of the invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiments of the application provide a random forest algorithm-based transformer area phase sequence identification method, device and terminal equipment, which solve the technical problems in the prior art that identifying the "transformer-line-phase-user" physical topology of a transformer area with a signal injection method, a data label method or a data analysis method requires additional terminal equipment, large investment and a heavy operation and maintenance workload.
The first embodiment is as follows:
Fig. 1 is a flow chart of the steps of the random forest algorithm-based transformer area phase sequence identification method according to an embodiment of the invention, and Fig. 2 is a distribution diagram of the time-series voltage of a user electric meter at each time section for that method.
As shown in Fig. 1 and Fig. 2, an embodiment of the invention provides a random forest algorithm-based transformer area phase sequence identification method, comprising the following steps:
S10, acquiring time-series voltage sample data between each phase's low-voltage outgoing line of a distribution transformer and each user electric meter in a target transformer area over a certain time period, and establishing a feature set for the phase sequence affiliation of the user electric meters;
S20, preprocessing the time-series voltage sample data to obtain a sample set;
S30, selecting time-series voltage sample data from the sample set to generate a training set and a test set;
S40, training on the training set and the feature set to obtain training decision trees, and building a random forest recognition model from the m training decision trees;
S50, identifying the phase sequence of the test set with the random forest identification model to obtain the phase sequence of the user electric meters in the target transformer area.
In step S10 of the embodiment, the time-series voltage sample data between each phase's low-voltage outgoing line of the distribution transformer and each user electric meter in the target transformer area over a certain time period, together with the feature set for establishing the phase sequence affiliation of the user electric meters, are mainly obtained from the power system.
It should be noted that the acquired time-series voltage sample data must satisfy a first constraint condition, which includes: the time span is not less than 96 sampling points; the number of time sections is not less than the total number of user electric meters; the missing proportion is not more than 20%; and the three-phase imbalance degree of the corresponding target transformer area is greater than 0.02. In this embodiment, each sampling point holds one voltage value, i.e. the time-series voltage sample data contain voltage values for at least 96 time points, and the number of time sections of the voltage sample data is not less than the total number of user electric meters in the target transformer area. The phase sequence affiliation of a user electric meter refers to which of phases A, B and C the meter belongs to.
In the embodiment of the invention, the missing proportion of the time-series voltage sample data is the ratio, expressed as a percentage, of the number of missing sampling points to the time span of the data.
It should be noted that
$$ \frac{\mathrm{null}}{m} \times 100\% \le 20\% $$
where null is the number of missing sampling points and m is the time span of the time-series voltage sample data in sampling points (here m does not denote the number of decision trees).
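The first constraint condition can be checked per transformer area with a short sketch like the following. It is a minimal illustration, assuming each meter's series is a Python list with `None` marking a missing sampling point, that all meters share the same time span, and that the number of time sections equals the number of sampling points; the function names are hypothetical, and the three-phase imbalance degree is passed in precomputed.

```python
from typing import Dict, List, Optional

# Hypothetical representation: one list per meter, None marks a missing sampling point.
Series = List[Optional[float]]


def missing_ratio(series: Series) -> float:
    """Missing proportion in percent: (null / m) * 100, with m the time span in points."""
    m = len(series)
    null = sum(1 for v in series if v is None)
    return 100.0 * null / m


def meets_first_constraint(meters: Dict[str, Series], imbalance: float) -> bool:
    """Return True if a transformer area's data satisfy the first constraint condition.

    Assumes every meter series has the same length and that the number of
    time sections equals that length (an interpretation, not from the source).
    """
    lengths = {len(s) for s in meters.values()}
    if len(lengths) != 1:          # no meters, or series of unequal length
        return False
    span = lengths.pop()
    if span < 96:                  # time span: at least 96 sampling points
        return False
    if span < len(meters):         # time sections >= total number of user meters
        return False
    if any(missing_ratio(s) > 20.0 for s in meters.values()):  # missing <= 20%
        return False
    return imbalance > 0.02        # three-phase imbalance degree > 0.02
```

Keeping the imbalance degree as an input separates this data-quality gate from however the imbalance itself is computed.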
In the embodiment of the invention, the feature set for establishing the phase sequence affiliation of the user electric meters is expressed as:
$$ F = \{p_A, p_B, p_C, \mathrm{std}_A, \mathrm{std}_B, \mathrm{std}_C\} $$
where $p_A$, $p_B$, $p_C$ are the correlation coefficients between the voltage series of the user electric meter and the A-, B- and C-phase time-series voltages at the concentrator of the target transformer area, and $\mathrm{std}_A$, $\mathrm{std}_B$, $\mathrm{std}_C$ are the three-phase standard deviations of the time-series voltage sample data over the acquisition period.
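The six features in F could be computed for one meter roughly as below. This is a sketch under stated assumptions: $p_A$, $p_B$, $p_C$ are taken to be Pearson correlation coefficients, and because the source does not say exactly which series $\mathrm{std}_A$, $\mathrm{std}_B$, $\mathrm{std}_C$ are taken over, the code assumes they are the population standard deviations of the difference between the meter voltage and each concentrator phase voltage. The function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def feature_vector(meter: Sequence[float],
                   phases: Dict[str, Sequence[float]]) -> List[float]:
    """Return [p_A, p_B, p_C, std_A, std_B, std_C] for one user meter.

    `phases` maps "A", "B", "C" to the concentrator's phase voltage series.
    std_X is ASSUMED to be the std of (meter voltage - phase X voltage).
    """
    corr = [pearson(meter, phases[p]) for p in "ABC"]
    stds = [statistics.pstdev([u - v for u, v in zip(meter, phases[p])])
            for p in "ABC"]
    return corr + stds
```

With this reading, a meter on phase A should show $p_A$ close to 1 and a small $\mathrm{std}_A$, which is the kind of separation the random forest can exploit.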
In the embodiment of the invention, the three-phase imbalance degree of the target transformer area corresponding to the time-series voltage sample data is:
$$ S = \operatorname{mean}\{s_{ac}, s_{bc}, s_{ab}\} $$
where the pairwise terms $s_{ac}$, $s_{bc}$ and $s_{ab}$ are given by an equation that appears only as an image in the source (not reproduced here) and are computed from the time-series voltages of phases A, B and C at the concentrator of the target transformer area.
In step S20 of the embodiment, the time-series voltage sample data are preprocessed mainly to ensure the completeness of the data in the sample set, which safeguards the accuracy of the identification results of the random forest identification model.
In step S30 of the embodiment, a training set and a test set are generated by selecting time-series voltage sample data from the sample set; they provide the training and test data for building a random forest identification model capable of identifying the phase sequence of the user electric meters in the target transformer area.
In step S40 of the embodiment, the random forest algorithm is used to train and build the random forest identification model from the training set and the feature set.
In machine learning, a random forest is a classifier that contains multiple decision trees, and its output class is the mode of the classes output by the individual trees. Leo Breiman and Adele Cutler developed the algorithm for inducing random forests, and the term "random forests" derives from the random decision forests proposed by Tin Kam Ho of Bell Labs in 1995. The random forest algorithm is an ensemble of decision trees that combines Breiman's "bootstrap aggregating" (bagging) idea with Ho's "random subspace method".
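The "mode of the classes output by the individual trees" rule can be sketched as a simple vote. Here each tree is modelled as any callable that maps a feature vector to a phase label; this interface is an assumption for illustration only, not the patent's implementation.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical tree interface: feature vector -> phase label ("A", "B" or "C").
Tree = Callable[[Sequence[float]], str]


def forest_vote(trees: List[Tree], x: Sequence[float]) -> str:
    """Forest output = mode of the per-tree class outputs (majority vote)."""
    votes = Counter(tree(x) for tree in trees)
    # most_common(1) returns the first-seen label among equal counts,
    # so ties go to the label that was voted for first.
    return votes.most_common(1)[0][0]
```

Any trained tree, such as a function returning "A" for a meter whose voltage tracks phase A, can be plugged in as a `Tree`.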
In step S50 of the embodiment, the trained random forest identification model identifies the phase relationship between the user electric meters and the distribution transformer in the time-series voltage sample data of the test set, yielding the phase sequence of the user electric meters in the target transformer area.
For example, when the target transformer area contains 68 user electric meters, 288 time-series voltage samples are obtained for each user electric meter in step S10, as shown in Fig. 2. Through steps S20 to S40, a random forest identification model capable of identifying the phase sequence of the user electric meters in the target transformer area is obtained; the identification results are shown in Table 1.
Table 1: Identification results of the phase sequence relationship between the user electric meters and the distribution transformer in the target transformer area
[Table 1 is available only as images in the source and is not reproduced here.]
As Table 1 shows, the phase sequences of 64 user electric meters are identified correctly and those of 4 are identified incorrectly. The random forest identification model built with the random forest algorithm-based transformer area phase sequence identification method therefore identifies the phase relationship between the user electric meters and the phases of the distribution transformer in the target transformer area with an accuracy of 94.12%, which verifies the effectiveness and feasibility of the method.
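The reported accuracy follows directly from the counts of correctly and incorrectly identified meters (64 correct out of 64 + 4 = 68):

$$ \text{accuracy} = \frac{64}{64 + 4} = \frac{64}{68} \approx 0.9412 = 94.12\% $$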
In summary, the random forest algorithm-based transformer area phase sequence identification method builds a random forest recognition model from the preprocessed time-series voltage sample data and the feature set, and uses it to identify which phase of the distribution transformer each user electric meter belongs to. It accurately sorts out the phase sequence affiliation of user electric meters without installing additional terminal equipment in the target transformer area, at low cost and with high engineering application value, and avoids the additional terminal equipment, large investment and heavy operation and maintenance workload of existing signal injection, data label or data analysis methods for identifying the "transformer-line-phase-user" physical topology of a transformer area.
In one embodiment of the invention, in step S20, preprocessing the time-series voltage sample data includes: filling missing values with a Lagrange interpolation algorithm in time-series voltage sample data whose missing proportion is not more than 5%, and filling missing values with a K-nearest-neighbor algorithm in time-series voltage sample data whose missing proportion is greater than 5%.
It should be noted that filling the missing values makes the acquired time-series voltage sample data complete, which prevents incomplete data from making the identification results of the random forest identification model inaccurate.
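The two-tier filling rule of step S20 might look like the following sketch. The details are assumptions, not taken from the source: Lagrange interpolation uses the 4 observed points nearest each gap, and the K-nearest-neighbor fill ranks the other meters of the same transformer area by root-mean-square distance over commonly observed points and averages the k = 3 nearest meters that have a value at the missing time point. All function names are hypothetical.

```python
import math
from typing import Dict, List, Optional

Series = List[Optional[float]]  # None marks a missing sampling point


def lagrange_fill(series: Series, n_nodes: int = 4) -> Series:
    """Fill each gap with the Lagrange polynomial through the n_nodes nearest observed points."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for t, v in enumerate(series):
        if v is not None:
            continue
        nodes = sorted(known, key=lambda i: abs(i - t))[:n_nodes]
        value = 0.0
        for j in nodes:
            basis = 1.0  # Lagrange basis polynomial L_j evaluated at t
            for k in nodes:
                if k != j:
                    basis *= (t - k) / (j - k)
            value += series[j] * basis
        out[t] = value
    return out


def knn_fill(target: Series, others: List[Series], k: int = 3) -> Series:
    """Fill gaps with the mean value of the k most similar meters at that time point."""
    def rms_distance(a: Series, b: Series) -> float:
        pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not pairs:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

    ranked = sorted(others, key=lambda o: rms_distance(target, o))
    out = list(target)
    for t, v in enumerate(target):
        if v is None:
            donors = [o[t] for o in ranked if o[t] is not None][:k]
            out[t] = sum(donors) / len(donors) if donors else None  # stays missing if no donor
    return out


def preprocess(meters: Dict[str, Series], threshold: float = 5.0) -> Dict[str, Series]:
    """Step S20: Lagrange fill if missing proportion <= 5%, otherwise KNN fill."""
    filled = {}
    for name, s in meters.items():
        ratio = 100.0 * sum(v is None for v in s) / len(s)
        if ratio <= threshold:
            filled[name] = lagrange_fill(s)
        else:
            others = [o for other, o in meters.items() if other != name]
            filled[name] = knn_fill(s, others)
    return filled
```

Local polynomial interpolation suits short gaps in a smooth voltage curve, whereas long gaps are better borrowed from electrically similar meters, which is one plausible rationale for the 5% split.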
Fig. 3 is a flow chart of the steps for building the random forest recognition model in the random forest algorithm-based transformer area phase sequence identification method according to an embodiment of the invention.
As shown in Fig. 3, in one embodiment of the invention, in steps S30 and S40, building the random forest recognition model includes:
S41, drawing a time-series voltage sample from the sample set D, storing it in the training set, and putting the drawn sample back into the sample set;
S42, repeating step S41 m times to obtain a training set d consisting of m time-series voltage samples, and selecting P features from the feature set to form a new feature set;
S43, selecting one of the m training sets and generating one training decision tree from it using the random forest, and repeating step S43 m times to obtain m training decision trees;
S44, building a random forest recognition model from the m training decision trees with a random forest learning algorithm;
wherein the test set consists of the time-series voltage samples of the sample set that were not drawn into d, i.e. (D − d).
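Steps S41 and S42 amount to bootstrap sampling with replacement plus a random feature subset. A minimal sketch follows, assuming samples are addressed by index so that (D − d) means the samples of D that were never drawn; the function names and the seeded `random.Random` are illustrative choices, not part of the source.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bootstrap_split(D: Sequence[T], m: int,
                    rng: random.Random) -> Tuple[List[T], List[T]]:
    """S41-S42: draw m samples from D with replacement; undrawn samples form D - d."""
    picked = [rng.randrange(len(D)) for _ in range(m)]   # each draw is put back
    d = [D[i] for i in picked]
    drawn = set(picked)
    test = [D[i] for i in range(len(D)) if i not in drawn]
    return d, test


def random_feature_subset(n_features: int, P: int, rng: random.Random) -> List[int]:
    """S42: choose P distinct feature indices (P < 6 for the six-feature set F)."""
    return sorted(rng.sample(range(n_features), P))
```

Because draws are made with replacement, some samples appear several times in d and others never do; the never-drawn ones are what the text calls (D − d).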
In the embodiment of the invention, in step S43, selecting one of the m training sets and generating one training decision tree using the random forest includes:
dividing the training set into two sub-training sets, with two branches at each non-leaf node of the tree in the random forest;
for the sub-training set of each branch, selecting a feature from the new feature set subject to a second constraint condition, to generate the training decision tree;
the random forest comprises non-leaf nodes and leaf nodes. The second constraint condition is met when the Gini coefficient of a node in the random forest is smaller than a set coefficient threshold, when the number of time-series voltage samples in the node's sub-training set is smaller than a set number threshold, or when the tree depth is greater than a set depth threshold. The Gini coefficient is defined as:
$$ \mathrm{Gini} = 1 - \sum_{i=1}^{M} p_i^2 $$
where $p_i$ is the probability that a sample of the sub-training set at a node of the random forest belongs to class i, and M is the number of classes at that node.
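Read as a stopping rule for splitting, the second constraint condition and the Gini coefficient can be sketched as follows. The threshold defaults are hypothetical placeholders, since the source does not give numeric values, and the function names are illustrative.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini = 1 - sum_i p_i^2, with p_i the share of class i at the node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def meets_second_constraint(labels: Sequence[str], depth: int,
                            gini_threshold: float = 0.05,
                            min_samples: int = 5,
                            max_depth: int = 10) -> bool:
    """True if the node should become a leaf (any one condition suffices).

    The threshold defaults are hypothetical placeholders, not values from the source.
    """
    return (gini(labels) < gini_threshold
            or len(labels) < min_samples
            or depth > max_depth)
```

A pure node (all samples on one phase) has Gini 0, while an even two-phase mix has Gini 0.5, so a small coefficient threshold stops splitting once a node is nearly pure.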
It should be noted that each time, one time-series voltage sample is randomly drawn from the sample set D and stored in the training set, and the sample is then put back into D, so that it may be drawn again next time. Repeating this m times yields a training set d containing m samples, that is, $d = \{d_1, d_2, \dots, d_m\}$. The samples remaining in D that were never drawn, (D − d), serve as the test set. P features are selected from the feature set to form a new feature set, and one training set is extracted from $d = \{d_1, d_2, \dots, d_m\}$ as the current training set. The current training set is divided into 2 sub-training sets, so that each non-leaf node of a tree generated in the random forest has 2 branches. Non-leaf nodes represent features, and leaf nodes hold the identification values given by a tree of the random forest. A feature is selected from the new feature set according to the second constraint condition, and the corresponding node of the tree is split into 2 branches on that feature; generation of the training decision tree is then repeated recursively on each branch. Combining the resulting m training decision trees yields the random forest identification model. In this embodiment, the feature set contains 6 features, so P < 6.
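A compact, dependency-free sketch of the tree-growing procedure is given below as one possible reading of the description. Assumptions: each sample is a (feature vector, phase label) pair; splits are axis-aligned thresholds chosen to minimize the weighted Gini coefficient of the two child nodes (a CART-style criterion that the source does not name explicitly); the P features are drawn once per tree, as the text describes, rather than at every node as in Breiman's original algorithm; each bootstrap sample has the size of D; and all thresholds and names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, Union

Sample = Tuple[List[float], str]  # (feature vector, phase label "A"/"B"/"C")
# A node is either a leaf label or (feature index, threshold, left child, right child).
Node = Union[str, Tuple[int, float, "Node", "Node"]]


def gini(labels: Sequence[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: Sequence[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


def grow_tree(data: List[Sample], features: List[int], depth: int = 0,
              gini_threshold: float = 0.05, min_samples: int = 2,
              max_depth: int = 8) -> Node:
    labels = [y for _, y in data]
    # Second constraint condition: stop and emit a leaf if any test holds.
    if gini(labels) < gini_threshold or len(data) < min_samples or depth > max_depth:
        return majority(labels)
    best = None
    for f in features:
        for thr in sorted({x[f] for x, _ in data}):
            left = [s for s in data if s[0][f] <= thr]
            right = [s for s in data if s[0][f] > thr]
            if not left or not right:
                continue
            # Binary split: weighted Gini coefficient of the two branches.
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(data)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:  # no selected feature separates this node
        return majority(labels)
    _, f, thr, left, right = best
    args = (gini_threshold, min_samples, max_depth)
    return (f, thr, grow_tree(left, features, depth + 1, *args),
            grow_tree(right, features, depth + 1, *args))


def predict(node: Node, x: Sequence[float]) -> str:
    while not isinstance(node, str):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def grow_forest(D: List[Sample], m: int, P: int, seed: int = 0) -> List[Node]:
    """Grow m trees, each on a bootstrap sample of D with its own P-feature subset."""
    rng = random.Random(seed)
    n_features = len(D[0][0])
    forest = []
    for _ in range(m):
        d = [rng.choice(D) for _ in range(len(D))]      # draw with replacement
        feats = rng.sample(range(n_features), P)        # new feature set, P < 6
        forest.append(grow_tree(d, feats))
    return forest
```

Trees are stored as nested tuples so that `predict` is a plain loop; in practice a library implementation would be preferable, but this keeps every step of the description visible.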
In the embodiment of the invention, after the random forest identification model is obtained, the test set (D - d) is input, and the random forest identification model gives the identification value ŷ of the test set by averaging the identification values of all the regression trees:
$$\hat{y} = \frac{1}{m}\sum_{i=1}^{m} rf_i$$
where m is the number of training decision trees and rf_i is the identification value output by the i-th training decision tree.
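A minimal sketch of the averaging step, assuming each trained tree is available as a callable that maps a feature vector to a numeric identification value. The three constant trees and the feature values below are hypothetical stand-ins, not results from the patent.

```python
from typing import Callable, Sequence

FeatureVector = Sequence[float]
Tree = Callable[[FeatureVector], float]


def forest_identify(trees: Sequence[Tree], x: FeatureVector) -> float:
    """Identification value of the forest: the mean of the m tree outputs rf_i."""
    if not trees:
        raise ValueError("the forest needs at least one tree")
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical stand-ins for m = 3 trained decision trees.
trees = [lambda x: 1.0, lambda x: 2.0, lambda x: 3.0]
x = [0.98, 0.12, 0.05, 1.1, 3.4, 3.6]  # made-up p_A..p_C, std_A..std_C values
print(forest_identify(trees, x))  # 2.0
```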
Example two:
Fig. 4 is a block diagram of the transformer-area phase sequence identification device based on a random forest algorithm according to an embodiment of the present invention.
As shown in Fig. 4, an embodiment of the present invention further provides a transformer-area phase sequence identification device based on a random forest algorithm, which comprises a data acquisition module 10, a data processing module 20, a sample classification module 30, a model establishing module 40 and an identification module 50;
the data acquisition module 10 is configured to acquire time sequence voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in the target transformer area over a given time period, and to establish a feature set of the phase sequence affiliation relationship of the user electric meters;
the data processing module 20 is configured to perform preprocessing on the time-series voltage sample data to obtain a sample set;
the sample classification module 30 is configured to select time sequence voltage sample data from a sample set to generate a training set and a test set;
the model establishing module 40 is used for training decision trees on the training set using the feature set, and establishing a random forest identification model from the m training decision trees;
and the identification module 50 is used for identifying the phase sequence of the test set by adopting a random forest identification model to obtain the phase sequence of the user electric meter in the target transformer area.
It should be noted that the modules of the second embodiment correspond to the steps of the first embodiment; since those steps have been described in detail in the first embodiment, the modules are not described again in this second embodiment.
Example three:
An embodiment of the invention provides a computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to execute the transformer-area phase sequence identification method based on the random forest algorithm.
Example four:
An embodiment of the invention provides a terminal device comprising a processor and a memory;
the memory is used for storing program code and transmitting the program code to the processor;
and the processor is used for executing the transformer-area phase sequence identification method based on the random forest algorithm according to instructions in the program code.
It should be noted that the processor is configured to execute the steps in the above-mentioned embodiment of the phase sequence identification method based on the random forest algorithm according to the instructions in the program code. Alternatively, the processor, when executing the computer program, implements the functions of each module/unit in each system/apparatus embodiment described above.
Illustratively, a computer program may be partitioned into one or more modules/units, which are stored in a memory and executed by a processor to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of a computer program in a terminal device.
The terminal device may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that this description does not limit the terminal device, which may include more or fewer components than those shown, combine certain components, or use different components; for example, the terminal device may also include input/output devices, network access devices, buses, etc.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may be an internal storage unit of the terminal device, such as a hard disk or a memory of the terminal device. The memory may also be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal device. Further, the memory may also include both an internal storage unit of the terminal device and an external storage device. The memory is used for storing computer programs and other programs and data required by the terminal device. The memory may also be used to temporarily store data that has been output or is to be output.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A random forest algorithm-based phase sequence identification method for a transformer area is characterized by comprising the following steps:
S10, acquiring time sequence voltage sample data between each phase low-voltage outgoing line of a distribution transformer and each user electric meter in a target transformer area over a given time period, and establishing a feature set of the phase sequence affiliation relationship of the user electric meters;
S20, preprocessing the time sequence voltage sample data to obtain a sample set;
S30, selecting time sequence voltage sample data from the sample set to generate a training set and a test set;
S40, training decision trees on the training set using the feature set, and establishing a random forest identification model from the m training decision trees;
S50, performing phase sequence identification on the test set by using the random forest identification model to obtain the phase sequence of the user electric meters in the target transformer area.
2. The random forest algorithm-based transformer-area phase sequence identification method according to claim 1, wherein in step S10, the time sequence voltage sample data acquired between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in the target transformer area over a given time period satisfies a first constraint condition, the first constraint condition comprising: the time span of the time sequence voltage sample data is not less than 96 sampling points; the number of time sections of the time sequence voltage sample data is not less than the total number of user electric meters; the missing proportion of the time sequence voltage sample data is not more than 20%; and the three-phase unbalance degree of the target transformer area corresponding to the time sequence voltage sample data is greater than 0.02.
3. The random forest algorithm-based transformer-area phase sequence identification method according to claim 2, wherein the missing proportion of the time sequence voltage sample data is the ratio of the number of missing sampling points in the acquired time sequence voltage sample data to the time span of the time sequence voltage sample data, expressed as a percentage.
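The first constraint condition and the missing proportion of claim 3 can be sketched as a data-quality check. This is an illustration under stated assumptions: `None` or NaN marks a missing sampling point, the 20% figure is treated as an upper bound on the missing proportion, and the number of time sections (`n_time_sections`) and the three-phase unbalance degree (`unbalance`) are hypothetical parameters passed in by the caller, because the claims do not define how they are computed.

```python
import math
from typing import Mapping, Optional, Sequence

Series = Sequence[Optional[float]]  # None or NaN marks a missing sampling point


def is_missing(v: Optional[float]) -> bool:
    return v is None or math.isnan(v)


def missing_ratio(series: Series) -> float:
    """Claim 3: missing sampling points divided by the time span, in percent."""
    if len(series) == 0:
        raise ValueError("empty series")
    return 100.0 * sum(is_missing(v) for v in series) / len(series)


def meets_first_constraint(
    meters: Mapping[str, Series],
    n_time_sections: int,
    unbalance: float,
    min_span: int = 96,
    max_missing_pct: float = 20.0,  # read as an upper bound (assumption)
    min_unbalance: float = 0.02,
) -> bool:
    """Claim 2: every series spans >= 96 points and is <= 20% missing, there are
    at least as many time sections as meters, and unbalance exceeds 0.02."""
    spans_ok = all(len(s) >= min_span for s in meters.values())
    sections_ok = n_time_sections >= len(meters)
    missing_ok = all(missing_ratio(s) <= max_missing_pct for s in meters.values())
    return spans_ok and sections_ok and missing_ok and unbalance > min_unbalance


series = [230.0] * 90 + [None] * 6  # 96 sampling points, 6 of them missing
print(missing_ratio(series))  # 6.25
print(meets_first_constraint({"m1": series, "m2": [231.0] * 96},
                             n_time_sections=96, unbalance=0.05))  # True
```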
4. The random forest algorithm-based transformer-area phase sequence identification method according to claim 2, wherein in step S20, preprocessing the time sequence voltage sample data comprises: filling missing data in time sequence voltage sample data whose missing proportion is not more than 5% by using a Lagrange interpolation algorithm;
and filling missing data in time sequence voltage sample data whose missing proportion is greater than 5% by using a K-nearest-neighbor algorithm.
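Claim 4 does not say which neighbouring points the Lagrange polynomial uses or which distance the K-nearest-neighbour step uses, so the sketch below fills those in with assumptions: interpolation through the `n_points` observed samples nearest in time, and KNN over meters (rows of the matrix) using the RMS voltage difference on jointly observed sampling points. All function names, parameter names and defaults are hypothetical.

```python
import numpy as np


def lagrange_fill(series, n_points: int = 4) -> np.ndarray:
    """Fill each NaN with the Lagrange polynomial through the n_points
    observed samples nearest to it in time."""
    y = np.asarray(series, dtype=float).copy()
    t_obs = np.flatnonzero(~np.isnan(y))
    if t_obs.size == 0:
        raise ValueError("no observed samples to interpolate from")
    for t in np.flatnonzero(np.isnan(y)):
        nearest = t_obs[np.argsort(np.abs(t_obs - t))[:n_points]]
        xs, ys = nearest.astype(float), y[nearest]
        value = 0.0
        for j in range(xs.size):
            others = np.delete(xs, j)
            # Lagrange basis polynomial L_j evaluated at t
            value += ys[j] * np.prod((t - others) / (xs[j] - others))
        y[t] = value
    return y


def knn_fill(matrix, k: int = 3) -> np.ndarray:
    """Rows are meters, columns are sampling points. A missing value of meter i
    at time t becomes the mean of the k meters nearest to i (RMS difference
    over jointly observed points) that are observed at t."""
    X = np.asarray(matrix, dtype=float)
    out = X.copy()
    obs = ~np.isnan(X)
    for i in np.flatnonzero((~obs).any(axis=1)):
        dist = np.full(X.shape[0], np.inf)
        for j in range(X.shape[0]):
            both = obs[i] & obs[j]
            if j != i and both.any():
                dist[j] = np.sqrt(np.mean((X[i, both] - X[j, both]) ** 2))
        for t in np.flatnonzero(~obs[i]):
            donors = [j for j in np.argsort(dist)
                      if np.isfinite(dist[j]) and obs[j, t]][:k]
            if donors:
                out[i, t] = X[donors, t].mean()
    return out


def preprocess(matrix, threshold_pct: float = 5.0, k: int = 3) -> np.ndarray:
    """Lagrange interpolation for meters missing <= threshold_pct percent of
    their points, KNN for meters missing more."""
    X = np.asarray(matrix, dtype=float)
    out = X.copy()
    pct = 100.0 * np.isnan(X).mean(axis=1)
    for i in np.flatnonzero((pct > 0) & (pct <= threshold_pct)):
        out[i] = lagrange_fill(X[i])
    heavy = pct > threshold_pct
    if heavy.any():
        # rows already filled by interpolation can act as KNN donors
        out[heavy] = knn_fill(out, k=k)[heavy]
    return out


base = np.linspace(220.0, 230.0, 20)
V = np.vstack([base + 0.05, base, base + 0.1])
V[0, ::2] = np.nan  # meter 0 is 50% missing, so the KNN branch is used
filled = preprocess(V, k=2)
print(filled[0].round(2))
```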
5. The method for identifying the phase sequence of the transformer area based on the random forest algorithm as claimed in claim 1, wherein in step S10, the expression of the feature set for establishing the phase sequence attribution relationship of the user electric meter is as follows:
$$F = \{p_A, p_B, p_C, std_A, std_B, std_C\}$$
where p_A, p_B and p_C are the correlation coefficients between the voltage sequence of the user electric meter and the A-, B- and C-phase time sequence voltages of the concentrator of the target transformer area, respectively, and std_A, std_B and std_C are the three-phase standard deviations of the time sequence voltage sample data acquired over the given time period.
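The feature set F can be computed per meter roughly as follows. The correlation coefficients use Pearson correlation via `numpy.corrcoef`; the claim does not say what std_A, std_B and std_C are taken over, so this sketch assumes each is the standard deviation of the difference between the meter series and the corresponding phase series. The synthetic phase curves and the 1.5 V drop are made-up illustration data.

```python
import numpy as np


def meter_features(meter_v, phase_v) -> np.ndarray:
    """Return [p_A, p_B, p_C, std_A, std_B, std_C] for one user meter.

    meter_v: shape (T,) voltage series of the meter.
    phase_v: shape (3, T) concentrator voltages for phases A, B, C.
    Assumption: std_X is the standard deviation of (meter - phase X).
    """
    meter_v = np.asarray(meter_v, dtype=float)
    phase_v = np.asarray(phase_v, dtype=float)
    if phase_v.shape != (3, meter_v.size):
        raise ValueError("phase_v must have shape (3, len(meter_v))")
    p = [np.corrcoef(meter_v, phase_v[k])[0, 1] for k in range(3)]
    std = [np.std(meter_v - phase_v[k]) for k in range(3)]
    return np.array(p + std)


t = np.arange(96)  # 96 sampling points, the minimum span in claim 2
phases = 230.0 + np.vstack([np.sin(t / 5), np.cos(t / 7), np.sin(t / 11 + 1)])
meter = phases[1] - 1.5  # hypothetical meter on phase B with a constant drop
F = meter_features(meter, phases)
print(F.round(3))
```

A meter on phase B correlates almost perfectly with the phase-B curve and has a near-zero spread of differences against it, which is the kind of separation the random forest is meant to learn.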
6. The random forest algorithm-based phase sequence identification method of the transformer area as claimed in claim 1, wherein in the steps S30 and S40, the step of establishing the random forest identification model comprises:
S41, drawing time sequence voltage sample data from the sample set D, storing it in a training set, and putting the drawn time sequence voltage sample data back into the sample set D;
S42, repeating step S41 m times to obtain a training set d consisting of m time sequence voltage samples, and selecting P features from the feature set to form a new feature set;
S43, selecting one training set from the m training sets and generating one training decision tree from it by using a random forest, and repeating step S43 m times to obtain m training decision trees;
S44, establishing a random forest identification model from the m training decision trees by using a random forest learning algorithm;
wherein the test set consists of the (D - d) time sequence voltage samples of the sample set, i.e. those never drawn into d.
7. The random forest algorithm-based transformer-area phase sequence identification method according to claim 6, wherein in step S43, selecting one of the m training sets to generate a training decision tree by using a random forest comprises:
dividing one training set into two sub-training sets, wherein each non-leaf node of the random forest correspondingly has two branches;
selecting corresponding features from the new feature set under a second constraint condition for the sub-training set corresponding to each branch, to generate a training decision tree;
wherein the random forest comprises non-leaf nodes and leaf nodes; the second constraint condition is that the Gini coefficient of a node in the random forest is smaller than a set coefficient threshold, the number of time sequence voltage samples in the sub-training set of the node is smaller than a set number threshold, or the depth of the tree is greater than a set depth threshold.
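The tree growth in claim 7 can be sketched as a CART-style binary tree. This is an illustration under assumptions, not the patented implementation: candidate thresholds are midpoints between sorted feature values, a split is chosen by the lowest weighted Gini of its two branches, leaves hold the majority label, and the three thresholds (`gini_thr`, `min_samples`, `max_depth`) are hypothetical defaults.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

import numpy as np


def gini(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X: np.ndarray, y: List[Hashable], features: Sequence[int],
              depth: int = 0, gini_thr: float = 0.05, min_samples: int = 5,
              max_depth: int = 8) -> Dict:
    """Grow one binary decision tree; leaves hold the majority phase label."""
    leaf = {"leaf": Counter(y).most_common(1)[0][0]}
    # Second constraint condition: stop as soon as any rule holds.
    if gini(y) < gini_thr or len(y) < min_samples or depth > max_depth:
        return leaf
    best = None  # (weighted Gini of the two branches, feature, threshold)
    for f in features:
        values = np.unique(X[:, f])
        for thr in (values[:-1] + values[1:]) / 2:
            mask = X[:, f] <= thr
            left = [lab for lab, m in zip(y, mask) if m]
            right = [lab for lab, m in zip(y, mask) if not m]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, float(thr))
    if best is None:  # no candidate feature separates these samples
        return leaf
    _, f, thr = best
    mask = X[:, f] <= thr
    y_left = [lab for lab, m in zip(y, mask) if m]
    y_right = [lab for lab, m in zip(y, mask) if not m]
    kwargs = dict(features=features, depth=depth + 1, gini_thr=gini_thr,
                  min_samples=min_samples, max_depth=max_depth)
    return {"feature": f, "threshold": thr,
            "left": grow_tree(X[mask], y_left, **kwargs),
            "right": grow_tree(X[~mask], y_right, **kwargs)}


def predict(node: Dict, x: Sequence[float]) -> Hashable:
    """Walk from the root to a leaf and return its label."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


x0 = np.linspace(0.0, 1.0, 20)           # an informative feature
X = np.column_stack([x0, np.zeros(20)])  # the second feature carries no information
y = ["A" if v > 0.5 else "B" for v in x0]
tree = grow_tree(X, y, features=[0, 1])
print(tree["feature"], predict(tree, [0.9, 0.0]), predict(tree, [0.1, 0.0]))  # 0 A B
```

In a full forest, `features` would be the random subset of P features and `X`, `y` one bootstrap training set; repeating this m times gives the m training decision trees.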
8. A phase sequence recognition device of a transformer area based on a random forest algorithm is characterized by comprising a data acquisition module, a data processing module, a sample classification module, a model establishing module and a recognition module;
the data acquisition module is used for acquiring time sequence voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in a certain time period in a target transformer area and establishing a characteristic set of a phase sequence affiliation relationship of the user electric meters;
the data processing module is used for preprocessing the time sequence voltage sample data to obtain a sample set;
the sample classification module is used for selecting time sequence voltage sample data from the sample set to generate a training set and a test set;
the model establishing module is used for training decision trees on the training set using the feature set, and establishing a random forest identification model from the m training decision trees;
and the identification module is used for carrying out phase sequence identification on the test set by adopting the random forest identification model to obtain the phase sequence of the user electric meters in the target transformer area.
9. A computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to perform the random forest algorithm-based transformer-area phase sequence identification method of any one of claims 1-7.
10. A terminal device comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the random forest algorithm-based transformer-area phase sequence identification method according to any one of claims 1 to 7 according to instructions in the program code.
CN202011622001.1A 2020-12-30 2020-12-30 Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment Active CN112750051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011622001.1A CN112750051B (en) 2020-12-30 2020-12-30 Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment

Publications (2)

Publication Number Publication Date
CN112750051A true CN112750051A (en) 2021-05-04
CN112750051B CN112750051B (en) 2023-04-07

Family

ID=75650339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011622001.1A Active CN112750051B (en) 2020-12-30 2020-12-30 Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment

Country Status (1)

Country Link
CN (1) CN112750051B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113093985A (en) * 2021-06-09 2021-07-09 中国南方电网有限责任公司超高压输电公司广州局 Sensor data link abnormity detection method and device and computer equipment
CN113449980A (en) * 2021-06-24 2021-09-28 广东电网有限责任公司 Low-voltage transformer area phase sequence identification method, system, terminal and storage medium
CN113625108A (en) * 2021-08-02 2021-11-09 四川轻化工大学 Flexible direct current power distribution network fault identification method
CN114548226A (en) * 2022-01-21 2022-05-27 国网江苏省电力有限公司常州供电分公司 Method and device for identifying station area user-variable relationship based on K-Means clustering algorithm

Citations (6)

Publication number Priority date Publication date Assignee Title
CN105404944A (en) * 2015-12-11 2016-03-16 中国电力科学研究院 Big data analysis method for warning of heavy-load and overload of electric power system
CN108376982A (en) * 2017-11-24 2018-08-07 上海泰豪迈能能源科技有限公司 Load recognition methods and the device of phase sequence
CN109376944A (en) * 2018-11-13 2019-02-22 国网宁夏电力有限公司电力科学研究院 The construction method and device of intelligent electric meter prediction model
US20190213446A1 (en) * 2016-06-30 2019-07-11 Intel Corporation Device-based anomaly detection using random forest models
CN110930198A (en) * 2019-12-05 2020-03-27 佰聆数据股份有限公司 Electric energy substitution potential prediction method and system based on random forest, storage medium and computer equipment
CN111881415A (en) * 2020-07-31 2020-11-03 广东电网有限责任公司计量中心 Method and device for identifying phase sequence and line-subscriber relationship of distribution room

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN113093985A (en) * 2021-06-09 2021-07-09 中国南方电网有限责任公司超高压输电公司广州局 Sensor data link abnormity detection method and device and computer equipment
CN113093985B (en) * 2021-06-09 2021-09-10 中国南方电网有限责任公司超高压输电公司广州局 Sensor data link abnormity detection method and device and computer equipment
CN113449980A (en) * 2021-06-24 2021-09-28 广东电网有限责任公司 Low-voltage transformer area phase sequence identification method, system, terminal and storage medium
CN113449980B (en) * 2021-06-24 2023-02-24 广东电网有限责任公司 Low-voltage transformer area phase sequence identification method, system, terminal and storage medium
CN113625108A (en) * 2021-08-02 2021-11-09 四川轻化工大学 Flexible direct current power distribution network fault identification method
CN113625108B (en) * 2021-08-02 2022-11-01 四川轻化工大学 Flexible direct current power distribution network fault identification method
CN114548226A (en) * 2022-01-21 2022-05-27 国网江苏省电力有限公司常州供电分公司 Method and device for identifying station area user-variable relationship based on K-Means clustering algorithm

Also Published As

Publication number Publication date
CN112750051B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant