CN112750051A - Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment - Google Patents
- Publication number
- CN112750051A (application CN202011622001.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Abstract
The invention relates to a random forest algorithm-based phase sequence identification method, device and terminal equipment for a transformer area. Time-series voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in a target transformer area over a given time period are obtained, together with a feature set for establishing the phase sequence attribution relationship of the user electric meters, and the time-series voltage sample data are preprocessed. A training set and a test set are generated from the processed sample set; the training set and the feature set are trained with a random forest algorithm to obtain training decision trees, and m training decision trees are combined to build a random forest recognition model. The random forest recognition model is then used to predict the phase relationship between the user electric meters in the test set and each phase of the distribution transformer. The method can accurately sort out the phase sequence attribution relationship of the user electric meters without installing additional terminal equipment in the target transformer area, and it is low in cost and of high engineering application value.
Description
Technical Field
The invention relates to the technical field of low-voltage distribution networks, in particular to a random forest algorithm-based phase sequence identification method and device for a transformer area and terminal equipment.
Background
Low-voltage distribution networks are still managed with the traditional low-voltage operation and maintenance mode. Because this mode lacks the support of the topological relationships of the transformer area, it easily leads to problems such as untimely power outage notification, untimely repair and restoration, long or unresolved low-voltage remediation, frequent abnormal changes in the transformer area, and abnormal line loss in the transformer area, which in turn cause dissatisfaction among power users. It is therefore important to research technology for identifying the "transformer-line-phase-user" (distribution transformer, low-voltage outgoing line, phase, user electric meter) physical topology of a transformer area. Signal injection methods, data label methods and data analysis methods have been studied for this purpose. Both the existing signal injection method and the data label method require additional terminal equipment, which means large investment and heavy operation and maintenance pressure, and both are difficult to apply in transformer areas that use a micro-power wireless communication scheme.
Disclosure of Invention
The embodiments of the invention provide a random forest algorithm-based phase sequence identification method, device and terminal equipment for a transformer area, aiming to solve the technical problems of the prior art that identifying the "transformer-line-phase-user" physical topology of a transformer area with a signal injection method, a data label method or a data analysis method requires additional terminal equipment, large investment and heavy operation and maintenance pressure.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
A random forest algorithm-based phase sequence identification method for a transformer area comprises the following steps:
S10, acquiring time-series voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in a target transformer area over a given time period, and establishing a feature set of the phase sequence attribution relationship of the user electric meters;
S20, preprocessing the time-series voltage sample data to obtain a sample set;
S30, selecting time-series voltage sample data from the sample set to generate a training set and a test set;
S40, training on the training set with the feature set to obtain training decision trees, and building a random forest recognition model from m training decision trees;
S50, performing phase sequence recognition on the test set with the random forest recognition model to obtain the phase sequence of the user electric meters in the target transformer area.
Preferably, in step S10, the time-series voltage sample data acquired between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in the target transformer area over the given time period satisfy a first constraint condition, which includes: the time span of the time-series voltage sample data is not less than 96 sampling points; the number of time sections of the time-series voltage sample data is not less than the total number of user electric meters; the missing proportion of the time-series voltage sample data is not more than 20%; and the three-phase unbalance degree of the target transformer area corresponding to the time-series voltage sample data is greater than 0.02.
Preferably, the missing proportion of the time-series voltage sample data is the number of missing sampling points in the acquired data divided by the time span of the data, expressed as a percentage.
Preferably, in step S20, preprocessing the time-series voltage sample data includes: filling the missing data with a Lagrange interpolation algorithm when the missing proportion of the time-series voltage sample data is not more than 5%;
and filling the missing data with a K-nearest neighbor algorithm when the missing proportion is greater than 5%.
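A minimal sketch of the Lagrange branch of this preprocessing, assuming each meter's series is a Python list in which `None` marks a missing sampling point. The window of `k` observed points on each side of a gap is a hypothetical parameter: the patent does not say which points feed the interpolation polynomial.

```python
from typing import List, Optional


def lagrange_value(xs: List[int], ys: List[float], x: int) -> float:
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_lagrange(series: List[Optional[float]], k: int = 3) -> List[float]:
    """Fill each missing point (None) from up to k observed points per side.

    Hypothetical choice: k = 3, and only originally observed points are used.
    Assumes at least one point in the series is observed.
    """
    observed = [t for t, v in enumerate(series) if v is not None]
    filled = list(series)
    for t, v in enumerate(series):
        if v is None:
            before = [i for i in observed if i < t][-k:]
            after = [i for i in observed if i > t][:k]
            xs = before + after
            filled[t] = lagrange_value(xs, [series[i] for i in xs], t)
    return filled
```

Keeping the window small bounds the polynomial degree, which avoids the oscillation that high-degree Lagrange polynomials show between nodes (Runge's phenomenon).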
Preferably, in step S10, the feature set for establishing the phase sequence attribution relationship of the user electric meter is expressed as:
$$F = \{p_A, p_B, p_C, std_A, std_B, std_C\}$$
where $p_A$, $p_B$, $p_C$ are the correlation coefficients between the voltage series of the user electric meter and the A-, B- and C-phase time-series voltages of the concentrator in the target transformer area, respectively, and $std_A$, $std_B$, $std_C$ are the three-phase standard deviations of the time-series voltage sample data over the given time period.
Preferably, in steps S30 and S40, building the random forest recognition model includes:
S41, drawing one time-series voltage sample from the sample set D, storing it in the training set, and putting the drawn sample back into the sample set D;
S42, repeating step S41 m times to obtain a training set d consisting of m time-series voltage samples, and selecting P features from the feature set to form a new feature set;
S43, selecting one of the m training sets and generating one training decision tree from it with the random forest, and repeating step S43 m times to obtain m training decision trees;
S44, building a random forest recognition model from the m training decision trees with a random forest learning algorithm;
wherein the test set consists of the time-series voltage samples in D - d, i.e. the samples of the sample set that were never drawn into d.
Preferably, in step S43, selecting one of the m training sets and generating one training decision tree with the random forest includes:
dividing the training set into two sub-training sets, so that each non-leaf node of the tree has two branches;
for the sub-training set of each branch, selecting a feature from the new feature set subject to a second constraint condition to generate the training decision tree;
wherein each tree of the random forest comprises non-leaf nodes and leaf nodes, and the second constraint condition is that the Gini coefficient of a node is smaller than a set coefficient threshold, that the number of time-series voltage samples in the sub-training set of a node is smaller than a set number threshold, or that the depth of the tree is greater than a set depth threshold.
The invention also provides a random forest algorithm-based phase sequence identification device for a transformer area, which comprises a data acquisition module, a data processing module, a sample classification module, a model building module and an identification module;
the data acquisition module is configured to acquire time-series voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in a target transformer area over a given time period, and to establish a feature set of the phase sequence attribution relationship of the user electric meters;
the data processing module is configured to preprocess the time-series voltage sample data to obtain a sample set;
the sample classification module is configured to select time-series voltage sample data from the sample set to generate a training set and a test set;
the model building module is configured to train on the training set with the feature set to obtain training decision trees, and to build a random forest recognition model from m training decision trees;
and the identification module is configured to perform phase sequence recognition on the test set with the random forest recognition model to obtain the phase sequence of the user electric meters in the target transformer area.
The present invention also provides a computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to execute the above random forest algorithm-based phase sequence identification method for a transformer area.
The invention also provides terminal equipment, which comprises a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
and the processor is configured to execute the above random forest algorithm-based phase sequence identification method for a transformer area according to the instructions in the program code.
According to the above technical solutions, the embodiments of the invention have the following advantages. The random forest algorithm-based phase sequence identification method, device and terminal equipment for a transformer area acquire time-series voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in a target transformer area over a given time period, establish a feature set of the phase sequence attribution relationship of the user electric meters, and preprocess the time-series voltage sample data; generate a training set and a test set from the processed sample set, train on the training set with the feature set using a random forest algorithm to obtain training decision trees, and build a random forest recognition model from m training decision trees; and use the random forest recognition model to predict the phase relationship between the user electric meters in the test set and each phase of the distribution transformer. The method can accurately sort out the phase sequence attribution relationship of the user electric meters without installing additional terminal equipment in the target transformer area, is low in cost and of high engineering application value, and thereby solves the technical problems of large investment and heavy operation and maintenance pressure caused by the additional terminal equipment needed when the existing signal injection, data label or data analysis methods are used to identify the "transformer-line-phase-user" physical topology of a transformer area.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flow chart illustrating steps of a phase sequence identification method for a transformer area based on a random forest algorithm according to an embodiment of the present invention.
Fig. 2 is a time-series voltage distribution diagram of each time section of the user electric meters in the random forest algorithm-based phase sequence identification method for a transformer area according to an embodiment of the present invention.
Fig. 3 is a flow chart of the steps of building the random forest recognition model in the random forest algorithm-based phase sequence identification method for a transformer area according to an embodiment of the present invention.
Fig. 4 is a block diagram of the random forest algorithm-based phase sequence identification device for a transformer area according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiments of the present application provide a random forest algorithm-based phase sequence identification method, device and terminal equipment for a transformer area, which solve the technical problems of the prior art that identifying the "transformer-line-phase-user" physical topology of a transformer area with a signal injection method, a data label method or a data analysis method requires additional terminal equipment, large investment and heavy operation and maintenance pressure.
The first embodiment is as follows:
Fig. 1 is a flow chart of the steps of the random forest algorithm-based phase sequence identification method for a transformer area according to an embodiment of the present invention, and Fig. 2 is a time-series voltage distribution diagram of each time section of the user electric meters according to the same method.
As shown in Fig. 1 and Fig. 2, an embodiment of the present invention provides a random forest algorithm-based phase sequence identification method for a transformer area, including the following steps:
S10, acquiring time-series voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in a target transformer area over a given time period, and establishing a feature set of the phase sequence attribution relationship of the user electric meters;
S20, preprocessing the time-series voltage sample data to obtain a sample set;
S30, selecting time-series voltage sample data from the sample set to generate a training set and a test set;
S40, training on the training set with the feature set to obtain training decision trees, and building a random forest recognition model from the m training decision trees;
S50, performing phase sequence recognition on the test set with the random forest recognition model to obtain the phase sequence of the user electric meters in the target transformer area.
In step S10 of the embodiment of the present invention, the time-series voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in the target transformer area over the given time period, and the feature set for establishing the phase sequence attribution relationship of the user electric meters, are mainly obtained from the power system.
It should be noted that the acquired time-series voltage sample data must satisfy a first constraint condition, which includes: the time span of the time-series voltage sample data is not less than 96 sampling points; the number of time sections of the time-series voltage sample data is not less than the total number of user electric meters; the missing proportion of the time-series voltage sample data is not more than 20%; and the three-phase unbalance degree of the target transformer area corresponding to the time-series voltage sample data is greater than 0.02. In this embodiment, each sampling point holds one voltage value, that is, the time-series voltage sample data contain the voltage values of at least 96 time points, and the number of time sections of the voltage sample data is not less than the total number of user electric meters in the target transformer area. The phase sequence attribution relationship of a user electric meter refers to which of the phases A, B and C the meter belongs to.
In the embodiment of the invention, the missing proportion of the time-series voltage sample data is the number of missing sampling points in the acquired data divided by the time span of the data, expressed as a percentage.
It should be noted that $(null/m) \times 100\% \le 20\%$, where $null$ is the number of missing sampling points and $m$ is the time span of the time-series voltage sample data.
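The first constraint condition can be checked mechanically before any training. The sketch below is a hedged illustration: it assumes all meter series in the transformer area share one time axis, treats the number of time sections as the number of sampling points (an interpretation the patent does not spell out), and takes the three-phase unbalance degree S as a precomputed input.

```python
from typing import List, Optional


def missing_ratio(series: List[Optional[float]]) -> float:
    """(null / m) * 100: missing sampling points over the time span, in %."""
    return 100.0 * sum(v is None for v in series) / len(series)


def meets_first_constraint(meter_series: List[List[Optional[float]]],
                           unbalance: float) -> bool:
    """Check the first constraint condition for one transformer area.

    Assumptions: all meter series share one time axis, and the number of
    time sections equals the number of sampling points.
    """
    n_meters = len(meter_series)
    span = len(meter_series[0])
    return (
        span >= 96                                   # >= 96 sampling points
        and span >= n_meters                         # time sections >= meters
        and all(missing_ratio(s) <= 20.0 for s in meter_series)  # <= 20% missing
        and unbalance > 0.02                         # three-phase unbalance S
    )
```

A series with 20 missing points out of 96 (about 20.8%) fails the check, while 19 out of 96 (about 19.8%) passes.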
In the embodiment of the present invention, the feature set for establishing the phase sequence attribution relationship of the user electric meter is expressed as:
$$F = \{p_A, p_B, p_C, std_A, std_B, std_C\}$$
where $p_A$, $p_B$, $p_C$ are the correlation coefficients between the voltage series of the user electric meter and the A-, B- and C-phase time-series voltages of the concentrator in the target transformer area, respectively, and $std_A$, $std_B$, $std_C$ are the three-phase standard deviations of the time-series voltage sample data over the given time period.
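The six features can be computed per meter with a few lines of standard-library Python. This is a sketch under one stated assumption: the text does not say exactly what $std_A$, $std_B$, $std_C$ are taken over, so here each is the standard deviation of the difference between the meter's series and that phase's concentrator series. The correlation coefficients are ordinary Pearson coefficients.

```python
import math
from statistics import pstdev
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def meter_features(meter: List[float], phases: Dict[str, List[float]]) -> List[float]:
    """Return [p_A, p_B, p_C, std_A, std_B, std_C] for one user meter.

    Assumption: std_X is the standard deviation of (meter - phase X voltage)
    over the period; the patent's wording leaves this open.
    """
    corr = [pearson(meter, phases[ph]) for ph in "ABC"]
    std = [pstdev([m - v for m, v in zip(meter, phases[ph])]) for ph in "ABC"]
    return corr + std
```

Stacking these six values for every meter gives the feature rows on which the decision trees are trained.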
In the embodiment of the present invention, the three-phase unbalance degree of the target transformer area corresponding to the time-series voltage sample data is:
$$S = \mathrm{mean}\{s_{ac}, s_{bc}, s_{ab}\}$$
where $s_{ac}$, $s_{bc}$ and $s_{ab}$ are computed from the time-series voltages of phases A, B and C at the concentrator of the target transformer area.
In step S20 of the embodiment of the present invention, the time-series voltage sample data are preprocessed mainly to ensure that the data in the sample set are complete, which safeguards the accuracy of the identification results of the random forest recognition model.
In step S30 of the embodiment of the present invention, time-series voltage sample data are selected from the sample set to generate a training set and a test set, which provide the training data and test data for building a random forest recognition model capable of identifying the phase sequence of the user electric meters in the target transformer area.
In step S40 of the embodiment of the present invention, a random forest algorithm is mainly used to train and establish a random forest recognition model in combination with a training set and a feature set.
In machine learning, a random forest is a classifier that contains multiple decision trees, and its output class is the mode of the classes output by the individual trees. Leo Breiman and Adele Cutler developed the algorithm for inducing random forests, and the term "random forests" derives from the random decision forests proposed by Tin Kam Ho of Bell Labs in 1995. The random forest algorithm builds an ensemble of decision trees that combines Breiman's "bootstrap aggregating" (bagging) idea with Ho's "random subspace method".
In step S50 of the embodiment of the present invention, the trained random forest recognition model is used to identify the phase relationship between the user electric meters in the test set's time-series voltage sample data and the distribution transformer, so as to obtain the phase sequence of the user electric meters in the target transformer area.
For example, for a target transformer area with 68 user electric meters, 288 time-series voltage samples of each user electric meter are obtained in step S10, as shown in Fig. 2. Steps S20 to S40 then yield a random forest recognition model capable of identifying the phase sequence of the user electric meters in the target transformer area, and the identification results are shown in Table 1.
Table 1 Identification results of the phase sequence relationship between the user electric meters and the distribution transformer in the target transformer area
As Table 1 shows, the phase sequences of 64 user electric meters are identified correctly and those of 4 are identified incorrectly, so the random forest recognition model built by this method identifies the phase sequence relationship between the user electric meters and each phase of the distribution transformer in the target transformer area with an accuracy of $64/68 \approx 94.12\%$. This verifies the effectiveness and feasibility of the random forest algorithm-based phase sequence identification method for a transformer area.
The invention provides a random forest algorithm-based phase sequence identification method for a transformer area, which acquires time-series voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in a target transformer area over a given time period, establishes a feature set of the phase sequence attribution relationship of the user electric meters, and preprocesses the time-series voltage sample data; generates a training set and a test set from the processed sample set, trains on the training set with the feature set using a random forest algorithm to obtain training decision trees, and builds a random forest recognition model from m training decision trees; and uses the random forest recognition model to predict the phase relationship between the user electric meters in the test set and each phase of the distribution transformer. The method can accurately sort out the phase sequence attribution relationship of the user electric meters without installing additional terminal equipment in the target transformer area, is low in cost and of high engineering application value, and solves the technical problems of large investment and heavy operation and maintenance pressure caused by the additional terminal equipment required when the existing signal injection, data label or data analysis methods are used to identify the "transformer-line-phase-user" physical topology of a transformer area.
In one embodiment of the present invention, in step S20, preprocessing the time-series voltage sample data includes: filling the missing data with a Lagrange interpolation algorithm when the missing proportion of the time-series voltage sample data is not more than 5%, and filling the missing data with a K-nearest neighbor algorithm when the missing proportion is greater than 5%.
It should be noted that filling the missing data makes the acquired time-series voltage sample data complete and prevents the inaccurate identification results of the random forest recognition model that incomplete data would cause.
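For series whose missing proportion exceeds 5%, one possible K-nearest-neighbor fill is sketched below. Assumptions not taken from the patent: the neighbors are other, complete meter series in the same transformer area; distance is Euclidean over the target's observed points; `k = 2` is illustrative; and each missing point takes the neighbors' mean value at that time.

```python
import math
from typing import List, Optional


def knn_fill(target: List[Optional[float]], others: List[List[float]],
             k: int = 2) -> List[float]:
    """Fill gaps in `target` with the mean of its k nearest complete meters.

    Assumptions: `others` are complete series on the same time axis, the
    distance is Euclidean over the target's observed points, and k = 2 is
    an illustrative choice.
    """
    observed = [t for t, v in enumerate(target) if v is not None]

    def distance(series: List[float]) -> float:
        return math.sqrt(sum((target[t] - series[t]) ** 2 for t in observed))

    nearest = sorted(others, key=distance)[:k]
    return [v if v is not None else sum(s[t] for s in nearest) / len(nearest)
            for t, v in enumerate(target)]
```

Meters on the same phase tend to track each other's voltage, which is why borrowing from the most similar meters is a reasonable fill when too many points are missing for interpolation.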
Fig. 3 is a flow chart of the steps of building the random forest recognition model in the random forest algorithm-based phase sequence identification method for a transformer area according to an embodiment of the present invention.
As shown in fig. 3, in one embodiment of the present invention, in steps S30 and S40, the step of establishing the random forest recognition model includes:
S41, drawing one time-series voltage sample from the sample set D, storing it in the training set, and putting the drawn sample back into the sample set D;
S42, repeating step S41 m times to obtain a training set d consisting of m time-series voltage samples, and selecting P features from the feature set to form a new feature set;
S43, selecting one of the m training sets and generating one training decision tree from it with the random forest, and repeating step S43 m times to obtain m training decision trees;
S44, building a random forest recognition model from the m training decision trees with a random forest learning algorithm;
wherein the test set consists of the time-series voltage samples in D - d, i.e. the samples of the sample set that were never drawn into d.
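Steps S41 and S42 amount to bootstrap sampling plus a random feature subset. The sketch below works on sample indices only and is a hedged illustration; `bootstrap_split` and `choose_features` are hypothetical helper names, and a seeded `random.Random` is used so runs are reproducible.

```python
import random
from typing import List, Tuple


def bootstrap_split(n_samples: int, m: int,
                    rng: random.Random) -> Tuple[List[int], List[int]]:
    """S41/S42: draw m sample indices from D with replacement.

    Returns (d, test): the bootstrap training set d, which may contain
    repeats, and the indices never drawn, which form the test set D - d.
    """
    d = [rng.randrange(n_samples) for _ in range(m)]  # draw, then put back
    drawn = set(d)
    test = [i for i in range(n_samples) if i not in drawn]
    return d, test


def choose_features(n_features: int, p: int, rng: random.Random) -> List[int]:
    """S42: pick P distinct feature indices to form the new feature set."""
    return sorted(rng.sample(range(n_features), p))


rng = random.Random(42)
d, test = bootstrap_split(n_samples=68, m=68, rng=rng)
features = choose_features(n_features=6, p=3, rng=rng)
```

With m equal to the size of D, a given sample is missed by all m draws with probability (1 - 1/m)^m, which approaches 1/e ≈ 0.37, so roughly a third of D typically ends up in the test set.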
In the embodiment of the present invention, in step S43, selecting one of the m training sets and generating one training decision tree with the random forest includes:
dividing the training set into two sub-training sets, so that each non-leaf node of the tree has two branches;
for the sub-training set of each branch, selecting a feature from the new feature set subject to a second constraint condition to generate the training decision tree;
the random forest comprises non-leaf nodes and leaf nodes; the second constraint condition is that the Gini coefficient of a node in the random forest is smaller than a set coefficient threshold, the number of time sequence voltage sample data in the sub-training set of a node is smaller than a set number threshold, or the depth of the tree is larger than a set depth threshold. The Gini coefficient of a random forest node is defined as:
$$\mathrm{Gini}(p) = \sum_{i=1}^{M} p_i\,(1 - p_i) = 1 - \sum_{i=1}^{M} p_i^{2}$$
where $p_i$ is the probability that a sample of the sub-training set at the node belongs to class $i$, and $M$ is the number of classes at the node.
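The Gini coefficient and the second constraint condition can be sketched as follows. The three threshold values are placeholders chosen only for illustration (the patent does not state them), and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: sum_i p_i (1 - p_i), computed as 1 - sum_i p_i^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def meets_second_constraint(labels, depth, gini_threshold=0.05,
                            number_threshold=5, depth_threshold=10):
    """True when the node should stop splitting and become a leaf."""
    return (gini(labels) < gini_threshold
            or len(labels) < number_threshold
            or depth > depth_threshold)
```

For example, a node holding two phase-A and two phase-B samples has Gini 1 - (0.5² + 0.5²) = 0.5, while a node holding only one phase has Gini 0 and therefore stops splitting.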
It should be noted that each time, one sample of time sequence voltage sample data is randomly selected from the sample set D and stored in the training set, and the sample is then put back into the original sample set D, so that it may be selected again in the next draw. Repeating this m times yields a training set d containing m samples of time sequence voltage sample data, namely $d = \{d_1, d_2, \dots, d_m\}$. The samples of the sample set D that remain unselected, (D - d), serve as the test set. P features are selected from the feature set to form a new feature set, and a training set is taken from $d = \{d_1, d_2, \dots, d_m\}$ as the current training set. The current training set is divided into 2 sub-training sets, so that each non-leaf node of a tree generated in the random forest has 2 branches. Non-leaf nodes represent features, and leaf nodes hold the identification values given by the tree model of the random forest. A feature is selected from the new feature set according to the second constraint condition, and the corresponding node of the tree is split into 2 branches according to that feature; repeating this recursively on each branch generates a training decision tree. Combining the resulting m training decision trees yields the random forest identification model. In this embodiment, the feature set contains 6 features, and P < 6.
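A minimal sketch of growing one training decision tree along these lines: P feature indices are drawn once per tree, each non-leaf node splits its samples into two branches on the feature/threshold pair with the lowest weighted Gini index, and a node becomes a leaf when the second constraint condition holds. The threshold values, the rule that a leaf stores the majority label as its identification value, and all function names are assumptions of this sketch, not details taken from the patent.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, features, depth=0,
               gini_threshold=1e-9, min_samples=2, max_depth=5):
    """Recursively grow a binary tree; leaves hold an identification value."""
    # Second constraint condition (placeholder thresholds): stop and emit a leaf.
    if gini(labels) < gini_threshold or len(labels) < min_samples or depth > max_depth:
        return majority(labels)
    best = None
    for f in features:
        # Skip the largest value so that the right branch is never empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every candidate feature is constant on this node
        return majority(labels)
    _, f, t, left, right = best

    def child(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx], features,
                          depth + 1, gini_threshold, min_samples, max_depth)

    return {"feature": f, "threshold": t, "left": child(left), "right": child(right)}


def grow_tree(rows, labels, p, seed=None):
    """Draw the per-tree subset of P feature indices, then grow the tree on it."""
    features = random.Random(seed).sample(range(len(rows[0])), p)
    return build_tree(rows, labels, features)


def predict(tree, row):
    """Walk from the root to a leaf and return its identification value."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree
```

Following the text above, the sketch fixes the P-feature subset once per tree. Library implementations such as scikit-learn's `RandomForestClassifier` instead redraw the candidate features at every split (its `max_features` parameter).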
In the embodiment of the invention, after the random forest identification model is obtained, the test set (D - d) is input, and the random forest identification model gives the identification value of the test set by averaging the identification values of all regression trees:
$$\hat{y} = \frac{1}{m}\sum_{i=1}^{m} rf_i(x)$$
where $m$ is the number of training decision trees, $rf_i(x)$ is the identification value given by the $i$-th training decision tree for a test sample $x$, and $\hat{y}$ is the identification value output by the random forest identification model.
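The averaging formula can be sketched as below, treating each trained tree as a callable $rf_i$ that maps a test sample to a numeric identification value. Averaging presumes numeric tree outputs; if the trees instead output phase labels such as A, B and C, the standard random forest classification rule is a majority vote, included here only for comparison. Both function names are hypothetical.

```python
from collections import Counter


def forest_average(trees, x):
    """Identification value of the forest: (1/m) * sum of rf_i(x) over the m trees."""
    values = [rf(x) for rf in trees]
    return sum(values) / len(values)


def forest_vote(trees, x):
    """Majority vote over the trees' outputs, the usual rule for class labels."""
    return Counter(rf(x) for rf in trees).most_common(1)[0][0]
```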
Example two:
Fig. 4 is a block diagram of the random forest algorithm-based phase sequence identification device for a transformer area according to an embodiment of the present invention.
As shown in Fig. 4, an embodiment of the present invention further provides a random forest algorithm-based phase sequence identification apparatus for a transformer area, which includes a data acquisition module 10, a data processing module 20, a sample classification module 30, a model building module 40, and an identification module 50;
the data acquisition module 10 is configured to acquire time sequence voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter over a certain time period in the target transformer area, and to establish a feature set of the phase sequence affiliation relationship of the user electric meters;
the data processing module 20 is configured to perform preprocessing on the time-series voltage sample data to obtain a sample set;
the sample classification module 30 is configured to select time sequence voltage sample data from a sample set to generate a training set and a test set;
the model establishing module 40 is used for performing training on the training set with the feature set to obtain training decision trees, and for establishing a random forest recognition model from the m training decision trees;
and the identification module 50 is used for identifying the phase sequence of the test set by adopting a random forest identification model to obtain the phase sequence of the user electric meter in the target transformer area.
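As a rough illustration of how the five modules could be wired together, the sketch below injects each module as a function and runs them in the order of steps S10 to S50. The class name, field names and signatures are hypothetical; the patent only specifies each module's responsibility.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class PhaseSequenceIdentifier:
    acquire: Callable[[], Tuple[Any, Any]]           # module 10: -> (raw data, feature set)
    preprocess: Callable[[Any], List[Any]]           # module 20: raw data -> sample set
    split: Callable[[List[Any]], Tuple[list, list]]  # module 30: -> (training set, test set)
    build_model: Callable[[list, Any], Any]          # module 40: (training set, feature set) -> model
    identify: Callable[[Any, list], Any]             # module 50: (model, test set) -> phases

    def run(self):
        """Run the modules in the order of steps S10 to S50."""
        raw, feature_set = self.acquire()
        train, test = self.split(self.preprocess(raw))
        model = self.build_model(train, feature_set)
        return self.identify(model, test)
```

Injecting the modules as functions keeps each stage replaceable, for example swapping the imputation method in module 20 without touching the model-building code.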
It should be noted that the modules in the second embodiment correspond to the steps in the first embodiment; since those steps have been described in detail in the first embodiment, the modules are not described again in this second embodiment.
Example three:
an embodiment of the invention provides a computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to execute the random forest algorithm-based phase sequence identification method for a transformer area.
Example four:
the embodiment of the invention provides terminal equipment, which comprises a processor and a memory;
a memory for storing the program code and transmitting the program code to the processor;
and the processor is used for executing the random forest algorithm-based phase sequence identification method for a transformer area according to the instructions in the program code.
It should be noted that the processor is configured to execute the steps in the above-mentioned embodiment of the phase sequence identification method based on the random forest algorithm according to the instructions in the program code. Alternatively, the processor, when executing the computer program, implements the functions of each module/unit in each system/apparatus embodiment described above.
Illustratively, a computer program may be partitioned into one or more modules/units, which are stored in a memory and executed by a processor to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of a computer program in a terminal device.
The terminal device may be a desktop computer, a notebook, a palm computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the terminal device is not limited to these components and may include more or fewer components than those shown, combine some components, or use different components; e.g., the terminal device may also include input/output devices, network access devices, buses, etc.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may be an internal storage unit of the terminal device, such as a hard disk or internal memory of the terminal device. The memory may also be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like provided on the terminal device. Further, the memory may include both an internal storage unit and an external storage device of the terminal device. The memory is used for storing computer programs and other programs and data required by the terminal device. The memory may also be used to temporarily store data that has been output or is to be output.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A random forest algorithm-based phase sequence identification method for a transformer area is characterized by comprising the following steps:
S10, acquiring time sequence voltage sample data between each phase low-voltage outgoing line of a distribution transformer and each user electric meter over a certain time period in a target transformer area, and establishing a feature set of the phase sequence affiliation relationship of the user electric meters;
S20, preprocessing the time sequence voltage sample data to obtain a sample set;
S30, selecting time sequence voltage sample data from the sample set to generate a training set and a test set;
S40, performing training on the training set with the feature set to obtain training decision trees, and establishing a random forest recognition model from m training decision trees;
S50, carrying out phase sequence recognition on the test set by adopting the random forest recognition model to obtain the phase sequence of the user electric meters in the target transformer area.
2. The random forest algorithm-based phase sequence identification method for a transformer area as claimed in claim 1, wherein in step S10, the acquired time sequence voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter in the target transformer area over a certain time period satisfies a first constraint condition, and the first constraint condition comprises: the time span of the time sequence voltage sample data is not less than 96 sampling points, the number of time sections of the time sequence voltage sample data is not less than the total number of user electric meters, the missing proportion of the time sequence voltage sample data is not more than 20%, and the three-phase unbalance degree of the target transformer area corresponding to the time sequence voltage sample data is greater than 0.02.
3. The random forest algorithm-based phase sequence identification method for a transformer area as recited in claim 2, wherein the missing proportion of the time sequence voltage sample data is the ratio of the number of missing sampling points in the acquired time sequence voltage sample data to the time span of the time sequence voltage sample data, expressed as a percentage.
4. The random forest algorithm-based phase sequence identification method for the transformer area, as claimed in claim 2, wherein in step S20, the preprocessing the time sequence voltage sample data comprises: filling missing data in the time sequence voltage sample data with the missing proportion not more than 5% by adopting a Lagrange interpolation algorithm;
and filling missing data in the time sequence voltage sample data with the missing proportion larger than 5% by adopting a K nearest neighbor algorithm.
5. The method for identifying the phase sequence of the transformer area based on the random forest algorithm as claimed in claim 1, wherein in step S10, the expression of the feature set for establishing the phase sequence attribution relationship of the user electric meter is as follows:
$$F = \{p_A, p_B, p_C, std_A, std_B, std_C\}$$
where $p_A$, $p_B$ and $p_C$ are respectively the correlation coefficients between the voltage series of the user electric meter and the A-, B- and C-phase time sequence voltages of the concentrator of the target transformer area, and $std_A$, $std_B$ and $std_C$ are respectively the three-phase standard deviations of the time sequence voltage sample data acquired over the time period.
6. The random forest algorithm-based phase sequence identification method of the transformer area as claimed in claim 1, wherein in the steps S30 and S40, the step of establishing the random forest identification model comprises:
S41, randomly drawing one piece of time sequence voltage sample data from the sample set D, storing it in a training set, and putting the drawn sample back into the sample set D;
S42, repeating step S41 m times to obtain a training set d consisting of m pieces of time sequence voltage sample data, and selecting P features from the feature set to form a new feature set;
S43, selecting one of the m training sets and generating one training decision tree from it with the random forest method, and repeating step S43 m times to obtain m training decision trees;
S44, combining the m training decision trees into a random forest recognition model by using a random forest learning algorithm;
wherein the test set consists of the time sequence voltage sample data of the sample set D that are not in d, i.e. (D - d).
7. The random forest algorithm-based phase sequence identification method for a transformer area as recited in claim 6, wherein in step S43, selecting one of the m training sets to generate a training decision tree by using a random forest comprises:
dividing one training set into two sub-training sets, so that each non-leaf node of the random forest has two branches;
for the sub-training set corresponding to each branch, selecting a feature from the new feature set subject to a second constraint condition to generate a training decision tree;
wherein the random forest comprises non-leaf nodes and leaf nodes; the second constraint condition is that the Gini coefficient of a node in the random forest is smaller than a set coefficient threshold, the number of time sequence voltage sample data in the sub-training set of a node is smaller than a set number threshold, or the depth of the tree is larger than a set depth threshold.
8. A phase sequence recognition device of a transformer area based on a random forest algorithm is characterized by comprising a data acquisition module, a data processing module, a sample classification module, a model establishing module and a recognition module;
the data acquisition module is used for acquiring time sequence voltage sample data between each phase low-voltage outgoing line of the distribution transformer and each user electric meter over a certain time period in a target transformer area, and for establishing a feature set of the phase sequence affiliation relationship of the user electric meters;
the data processing module is used for preprocessing the time sequence voltage sample data to obtain a sample set;
the sample classification module is used for selecting time sequence voltage sample data from the sample set to generate a training set and a test set;
the model establishing module is used for performing training on the training set with the feature set to obtain training decision trees, and for establishing a random forest recognition model from m training decision trees;
and the identification module is used for carrying out phase sequence identification on the test set by adopting the random forest identification model to obtain the phase sequence of the user electric meters in the target transformer area.
9. A computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to perform the random forest algorithm-based phase sequence identification method for a transformer area of any one of claims 1-7.
10. A terminal device comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the random forest algorithm-based phase sequence identification method for a transformer area according to any one of claims 1 to 7 according to the instructions in the program code.
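Claims 2 to 5 describe a data-quality check, gap filling and feature extraction. The following is a minimal pure-Python sketch under stated assumptions: missing points are represented as `None`; the three-phase unbalance degree is passed in precomputed because the patent does not define how it is calculated; the 20 % limit of claim 2 is read as an upper bound on missing data (a lower bound would not work as a quality filter); and the 4-point Lagrange window, the KNN scheme (averaging the k = 3 closest reference series from other meters) and the reading of $std_A$, $std_B$, $std_C$ as standard deviations of the meter-minus-phase voltage difference are assumptions of this sketch, not details from the patent. All function names are hypothetical.

```python
import math


def missing_ratio(series):
    """Claim 3: missing sampling points / time span, as a percentage."""
    return 100.0 * sum(v is None for v in series) / len(series)


def meets_first_constraint(series, n_sections, n_meters, unbalance):
    """Claim 2, with the missing-data limit read as an upper bound of 20 %."""
    return (len(series) >= 96 and n_sections >= n_meters
            and missing_ratio(series) <= 20.0 and unbalance > 0.02)


def lagrange_at(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_lagrange(series, window=4):
    """Fill each gap from the `window` nearest observed points (assumed window)."""
    known = [(t, v) for t, v in enumerate(series) if v is not None]
    out = list(series)
    for t, v in enumerate(series):
        if v is None:
            xs, ys = zip(*sorted(known, key=lambda kv: abs(kv[0] - t))[:window])
            out[t] = lagrange_at(xs, ys, t)
    return out


def fill_knn(series, references, k=3):
    """Fill gaps with the mean of the k reference series closest to `series`,
    measured only over the points that `series` actually observed."""
    seen = [t for t, v in enumerate(series) if v is not None]
    nearest = sorted(references,
                     key=lambda r: sum((series[t] - r[t]) ** 2 for t in seen))[:k]
    return [v if v is not None else sum(r[t] for r in nearest) / len(nearest)
            for t, v in enumerate(series)]


def preprocess(series, references):
    """Claim 4: Lagrange interpolation up to 5 % missing, k-nearest neighbours above."""
    if missing_ratio(series) <= 5.0:
        return fill_lagrange(series)
    return fill_knn(series, references)


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def pstd(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def feature_vector(meter, phase_a, phase_b, phase_c):
    """Claim 5: [p_A, p_B, p_C, std_A, std_B, std_C] for one meter.
    ASSUMPTION: std_X is the standard deviation of (meter - phase X)."""
    phases = (phase_a, phase_b, phase_c)
    return ([pearson(meter, ph) for ph in phases]
            + [pstd([m - p for m, p in zip(meter, ph)]) for ph in phases])
```

A meter fed from phase A should show a correlation close to 1 against the concentrator's A-phase series and a small spread in its difference from that series, which is what makes these six values usable as inputs to the random forest.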
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011622001.1A CN112750051B (en) | 2020-12-30 | 2020-12-30 | Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112750051A true CN112750051A (en) | 2021-05-04 |
CN112750051B CN112750051B (en) | 2023-04-07 |
Family
ID=75650339
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011622001.1A Active CN112750051B (en) | 2020-12-30 | 2020-12-30 | Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112750051B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113093985A (en) * | 2021-06-09 | 2021-07-09 | 中国南方电网有限责任公司超高压输电公司广州局 | Sensor data link abnormity detection method and device and computer equipment |
CN113449980A (en) * | 2021-06-24 | 2021-09-28 | 广东电网有限责任公司 | Low-voltage transformer area phase sequence identification method, system, terminal and storage medium |
CN113625108A (en) * | 2021-08-02 | 2021-11-09 | 四川轻化工大学 | Flexible direct current power distribution network fault identification method |
CN114548226A (en) * | 2022-01-21 | 2022-05-27 | 国网江苏省电力有限公司常州供电分公司 | Method and device for identifying station area user-variable relationship based on K-Means clustering algorithm |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105404944A (en) * | 2015-12-11 | 2016-03-16 | 中国电力科学研究院 | Big data analysis method for warning of heavy-load and overload of electric power system |
CN108376982A (en) * | 2017-11-24 | 2018-08-07 | 上海泰豪迈能能源科技有限公司 | Load recognition methods and the device of phase sequence |
CN109376944A (en) * | 2018-11-13 | 2019-02-22 | 国网宁夏电力有限公司电力科学研究院 | The construction method and device of intelligent electric meter prediction model |
US20190213446A1 (en) * | 2016-06-30 | 2019-07-11 | Intel Corporation | Device-based anomaly detection using random forest models |
CN110930198A (en) * | 2019-12-05 | 2020-03-27 | 佰聆数据股份有限公司 | Electric energy substitution potential prediction method and system based on random forest, storage medium and computer equipment |
CN111881415A (en) * | 2020-07-31 | 2020-11-03 | 广东电网有限责任公司计量中心 | Method and device for identifying phase sequence and line-subscriber relationship of distribution room |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113093985A (en) * | 2021-06-09 | 2021-07-09 | 中国南方电网有限责任公司超高压输电公司广州局 | Sensor data link abnormity detection method and device and computer equipment |
CN113093985B (en) * | 2021-06-09 | 2021-09-10 | 中国南方电网有限责任公司超高压输电公司广州局 | Sensor data link abnormity detection method and device and computer equipment |
CN113449980A (en) * | 2021-06-24 | 2021-09-28 | 广东电网有限责任公司 | Low-voltage transformer area phase sequence identification method, system, terminal and storage medium |
CN113449980B (en) * | 2021-06-24 | 2023-02-24 | 广东电网有限责任公司 | Low-voltage transformer area phase sequence identification method, system, terminal and storage medium |
CN113625108A (en) * | 2021-08-02 | 2021-11-09 | 四川轻化工大学 | Flexible direct current power distribution network fault identification method |
CN113625108B (en) * | 2021-08-02 | 2022-11-01 | 四川轻化工大学 | Flexible direct current power distribution network fault identification method |
CN114548226A (en) * | 2022-01-21 | 2022-05-27 | 国网江苏省电力有限公司常州供电分公司 | Method and device for identifying station area user-variable relationship based on K-Means clustering algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN112750051B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112750051B (en) | Random forest algorithm-based phase sequence identification method and device for transformer area and terminal equipment | |
CN108376982A (en) | Load recognition methods and the device of phase sequence | |
Cheng et al. | Enhanced state estimation and bad data identification in active power distribution networks using photovoltaic power forecasting | |
CN105786702A (en) | Computer software analysis system | |
CN110601173B (en) | Distribution network topology identification method and device based on edge calculation | |
CN110019519A (en) | Data processing method, device, storage medium and electronic device | |
CN111654392A (en) | Low-voltage distribution network topology identification method and system based on mutual information | |
CN111461164A (en) | Sample data set capacity expansion method and model training method | |
CN110110213A (en) | Excavate method, apparatus, computer readable storage medium and the terminal device of user's occupation | |
CN112767190B (en) | Method and device for identifying phase sequence of transformer area based on multilayer stacked neural network | |
CN111147306B (en) | Fault analysis method and device of Internet of things equipment and Internet of things platform | |
CN115759365A (en) | Photovoltaic power generation power prediction method and related equipment | |
CN115795329A (en) | Power utilization abnormal behavior analysis method and device based on big data grid | |
CN112508254B (en) | Method for determining investment prediction data of transformer substation engineering project | |
CN112100860A (en) | MMC (Modular multilevel converter) model establishing method and electromagnetic transient simulation method for multi-terminal direct-current power transmission system | |
CN117236022A (en) | Training method and application method of residual life prediction model of transformer and electronic equipment | |
CN115329814B (en) | Low-voltage user link identification method and device based on image signal processing | |
CN116595395A (en) | Inverter output current prediction method and system based on deep learning | |
CN106599865A (en) | Disconnecting link state recognition device and method | |
CN105447251B (en) | A kind of verification method based on transaction types excitation | |
CN113627655B (en) | Method and device for simulating and predicting pre-disaster fault scene of power distribution network | |
CN115545085A (en) | Weak fault current fault type identification method, device, equipment and medium | |
CN114971053A (en) | Training method and device for online prediction model of network line loss rate of low-voltage transformer area | |
CN113569904B (en) | Bus wiring type identification method, system, storage medium and computing device | |
EP3748555A1 (en) | Method and system for low sampling rate electrical load disaggregation |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |