CN111080105A

CN111080105A - Transformer area user-to-user relationship identification method and system based on voltage time sequence data

Info

Publication number: CN111080105A
Application number: CN201911239321.6A
Authority: CN
Inventors: 王剑; 胡伟; 王云龙; 刘越; 吴双; 李刚; 孟妍; 郎斌; 赵志阳; 陈源; 付博
Original assignee: Tsinghua University; State Grid Corp of China SGCC; State Grid Liaoning Electric Power Co Ltd
Current assignee: Tsinghua University; State Grid Corp of China SGCC; State Grid Liaoning Electric Power Co Ltd
Priority date: 2019-12-06
Filing date: 2019-12-06
Publication date: 2020-04-28

Abstract

The invention discloses a method and a system for identifying the user-transformer relationship in a transformer area based on voltage time-series data. The method comprises the following steps: collecting and processing voltage time-series data from the transformer side and each user side of a distribution transformer area as a plurality of observation variables; performing independent component analysis and feature extraction on the plurality of observation variables using the FastICA technique to obtain a set of mutually independent random variables and a mixing matrix that estimate the plurality of observation variables; performing cluster analysis on the feature-extracted data, namely the mixing matrix, using the K-means clustering method to obtain a clustering result; and determining the correspondence of users within the transformer area according to the clustering result.

Description

Transformer area user-to-user relationship identification method and system based on voltage time sequence data

Technical Field

The invention relates to the technical field of distribution network management in power systems, and in particular to a method and system for identifying the user-transformer relationship in a transformer area based on a combination of the FastICA algorithm and K-means clustering.

Background

With the rapid development of power grids, the number of power consumers continues to increase, and low-voltage distribution networks are becoming larger and structurally more complex. To facilitate management, power companies manage low-voltage distribution network users by transformer area. Identifying the user-transformer relationship in each transformer area is the basis for refined marketing and for reducing energy consumption and line loss, and it is also a prerequisite for electricity-theft detection. To ensure accurate line-loss calculation, the power department needs to check users' transformer-area information frequently. In low-voltage transformer areas, the wiring in some old neighborhoods is complex, and because transformer-area records are incomplete or not updated in time, the user data of a transformer area is often inaccurate or even missing. In addition, the affiliation between a user's incoming line and the concentrator may be recorded inaccurately, and the recorded user-transformer relationship may no longer match the actual condition after user wiring changes or after lines are reconfigured to balance the load. Effective identification of the user-transformer relationship without a power outage is therefore particularly important.

Existing methods for identifying the user-transformer relationship fall mainly into manual identification and dedicated transformer-area identification equipment. Manual identification relies on power personnel visiting residents on site to determine which transformer area each user belongs to; as the number of electricity users grows, it is time-consuming, labor-intensive, and inefficient. Dedicated transformer-area identification equipment mainly uses the pulse current method, in which a pulse current signal is sent at the transformer end and received at the identification terminal to complete identification. However, this method cannot communicate bidirectionally and usually requires carrier communication as an auxiliary channel. In addition, a pulse current signal cannot generate an alternating magnetic field when it reaches a transformer and therefore cannot pass through it, so the signal can only propagate within the same phase line of the same transformer area.

In recent years, with the large-scale installation and popularization of smart meters on the user side, the power grid can obtain massive operating data such as user voltage and current. In view of the problems of the existing approaches, a method is needed that determines the user-transformer relationship without installing additional devices and without manual inspection.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, an object of the present invention is to provide a transformer-area user-transformer relationship identification method based on voltage time-series data, which can identify the user-transformer relationship intelligently using only voltage time-series data while ensuring accuracy.

Another object of the present invention is to provide a transformer-area user-transformer relationship identification system based on voltage time-series data.

In order to achieve the above object, an embodiment of the present invention provides a transformer-area user-transformer relationship identification method based on voltage time-series data, including the following steps: collecting and processing time-series voltage data from the transformer side and the user side of a distribution transformer area to obtain a plurality of observation variables; performing dimensionality reduction on the plurality of observation variables using the FastICA algorithm to obtain the corresponding independent components and the corresponding mixing matrix of the plurality of observation variables; and performing cluster analysis on the mixing matrix using the K-means clustering method to obtain a clustering result, and determining the correspondence of users within the transformer area according to the clustering result.

According to the method of the embodiment of the present invention, the FastICA algorithm and the K-means method are applied in sequence for data dimensionality reduction and clustering to identify the user-transformer relationship. No additional equipment or devices need to be installed and no manual inspection is required; user voltage time-series data only needs to be collected at a given sampling rate, which saves manpower and material resources. Moreover, FastICA independent component analysis converts the voltage time-series data into static features, which reduces the amount of computation that correlation calculations on the voltage time-series data would otherwise require and improves efficiency.

In addition, the transformer-area user-transformer relationship identification method based on voltage time-series data according to the above embodiment of the present invention may further have the following additional technical features:

Further, in an embodiment of the present invention, before the dimensionality reduction of the FastICA algorithm is performed, data preprocessing is performed on the plurality of observation variables, wherein the data preprocessing includes centering (mean removal) and a whitening transformation.

Further, in one embodiment of the present invention, the centering process is: calculating the average value of each of the plurality of observation variables; and subtracting that average value from each sampling point of the corresponding observation variable to obtain the centered observation variable.

Further, in one embodiment of the present invention, the whitening transformation removes the correlation between the observation variables to simplify the extraction of the independent components.

Further, in an embodiment of the present invention, performing cluster analysis on the mixing matrix using K-means clustering includes: selecting k samples from the mixing matrix as initial cluster centers; classifying each sample into the class of the cluster center closest to it; recalculating the cluster centers; and iteratively performing the classification and recalculation until the objective function converges or a maximum number of iterations is reached.

In order to achieve the above object, another embodiment of the present invention provides a transformer-area user-transformer relationship identification system based on voltage time-series data, including: an acquisition module for collecting and processing time-series voltage data from the transformer side and the user side of a distribution transformer area to obtain a plurality of observation variables; a dimension reduction module for performing dimensionality reduction on the plurality of observation variables using the FastICA algorithm to obtain the corresponding independent components and the corresponding mixing matrix of the plurality of observation variables; and a clustering module for performing cluster analysis on the mixing matrix using the K-means clustering method to obtain a clustering result and determining the correspondence of users within the transformer area according to the clustering result.

According to the system of the embodiment of the present invention, the FastICA algorithm and the K-means method are applied in sequence for data dimensionality reduction and clustering to identify the user-transformer relationship. No additional equipment or devices need to be installed and no manual inspection is required; user voltage time-series data only needs to be collected at a given sampling rate, which saves manpower and material resources. Moreover, FastICA independent component analysis converts the voltage time-series data into static features, which reduces the amount of computation that correlation calculations on the voltage time-series data would otherwise require and improves efficiency.

In addition, the transformer-area user-transformer relationship identification system based on voltage time-series data according to the above embodiment of the present invention may further have the following additional technical features:

Further, in an embodiment of the present invention, before the dimensionality reduction is performed in the dimension reduction module, data preprocessing is performed on the plurality of observation variables, wherein the data preprocessing includes centering (mean removal) and a whitening transformation.

Further, in an embodiment of the present invention, the clustering module, which performs cluster analysis on the mixing matrix using the K-means clustering method, includes: a selecting unit for selecting k samples from the mixing matrix as initial cluster centers; a classification unit for classifying each sample into the class of the cluster center closest to it; a recalculation unit for recalculating the cluster centers; and an iteration unit for iteratively performing the classification and recalculation until the objective function converges or the maximum number of iterations is reached.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flowchart of a transformer-area user-transformer relationship identification method based on voltage time-series data according to an embodiment of the present invention;

FIG. 2 is a detailed flowchart of user-transformer relationship identification according to an embodiment of the present invention;

FIG. 3 is a flowchart of feature extraction from the voltage time-series observation variables according to one embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a transformer-area user-transformer relationship identification system based on voltage time-series data according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of the clustering module in the transformer-area user-transformer relationship identification system based on voltage time-series data according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.

The method and system for identifying the transformer-area user-transformer relationship based on voltage time-series data according to the embodiments of the present invention are described below with reference to the accompanying drawings.

FIG. 1 is a flowchart of a transformer-area user-transformer relationship identification method based on voltage time-series data according to an embodiment of the present invention.

As shown in FIG. 1, the transformer-area user-transformer relationship identification method based on voltage time-series data includes the following steps:

In step S101, time-series voltage data from the transformer side and the user side of the distribution transformer area are collected and processed to obtain a plurality of observation variables.

Specifically, the time-series voltage data of the transformer and of the users in the transformer area are obtained from an acquisition system and are then sorted and cleaned to obtain a plurality of observation variables.

In step S102, the FastICA algorithm is used to perform dimensionality reduction on the plurality of observation variables to obtain the corresponding independent components and the corresponding mixing matrix of the plurality of observation variables.

It should be noted that, before the dimensionality reduction of the FastICA algorithm is performed, data preprocessing is performed on the plurality of observation variables, wherein the data preprocessing includes centering (mean removal) and a whitening transformation.

The centering process comprises the following steps: calculating the average value of each of the plurality of observation variables; and subtracting that average value from each sampling point of the corresponding observation variable to obtain the centered observation variable.

The specific treatment process is as follows:

For m observation variables, each with n sampling points, the average value of each observation variable is first calculated using formula (1), and this average value is then subtracted from each sampling point of that observation variable to obtain the centered observation variable, as shown in formula (2):

$$\bar{x}_i = \frac{1}{n}\sum_{t=1}^{n} x_i(t), \quad i = 1, 2, \dots, m \tag{1}$$

$$x_i(t) \leftarrow x_i(t) - \bar{x}_i \tag{2}$$

The whitening transformation removes the correlation between the observation variables to simplify the extraction of the independent components.

For example, as shown in FIG. 2, a whitening transformation is applied to the observation variables $x_1(t), x_2(t), \dots, x_m(t)$, namely:

$$z(t) = B\,x(t) \tag{3}$$

where $B = \Gamma^{-1/2}E^{T}$ is the whitening matrix; $E$ is the matrix whose columns are the unit-norm orthogonal eigenvectors of the covariance matrix $E\{x(t)x(t)^{T}\}$, in which $E\{\cdot\}$ denotes expectation; $\Gamma = \mathrm{diag}(\gamma_1, \gamma_2, \dots, \gamma_m)$ is the diagonal matrix whose diagonal elements are the eigenvalues of $E\{x(t)x(t)^{T}\}$; and $\Gamma^{-1/2}$ is the diagonal matrix obtained by taking the square root and then the reciprocal of each diagonal element of $\Gamma$.
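The centering and whitening steps can be illustrated with a minimal NumPy sketch. It assumes the observation variables are stored as an array of shape (m, n), estimates the covariance matrix from the samples, and uses synthetic data; the function name `center_and_whiten` and the test signal are hypothetical and not part of the described method.

```python
import numpy as np

def center_and_whiten(x):
    """Center and whiten observations.

    x: array of shape (m, n) holding m observation variables with n sampling points.
    Returns (z, B, mean) such that z = B @ (x - mean) has identity sample covariance.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=1, keepdims=True)   # per-variable average, formula (1)
    xc = x - mean                          # centered observation variables, formula (2)
    cov = xc @ xc.T / xc.shape[1]          # sample estimate of E{x x^T}
    gamma, E = np.linalg.eigh(cov)         # eigenvalues and orthonormal eigenvectors (columns of E)
    B = np.diag(gamma ** -0.5) @ E.T       # whitening matrix Gamma^{-1/2} E^T
    z = B @ xc                             # whitened variables, formula (3)
    return z, B, mean

# Synthetic stand-in for voltage series: 4 variables, 500 sampling points.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(4, 500))
x = 220.0 + rng.normal(size=(4, 4)) @ sources
z, B, mean = center_and_whiten(x)
```

After whitening, `z @ z.T / n` is numerically the identity matrix, which is the decorrelation property the whitening transformation is meant to provide. The sketch assumes the sample covariance has full rank; rank-deficient data would need small eigenvalues to be discarded first.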

It should be noted that the objective of the FastICA algorithm is to find a separation matrix $W_z$ such that $s(t) = W_z z(t)$ has maximum non-Gaussianity, where $W_z$ is the separation matrix corresponding to the whitened observation variables. The separation matrix $W$ corresponding to the original observation variables and the separation matrix $W_z$ corresponding to the whitened variables are related by:

$$W = W_z B \tag{4}$$

After $W_z$ is obtained with the FastICA algorithm, $W$ is obtained according to formula (4), and the inverse matrix or generalized inverse matrix of $W$ finally gives the mixing matrix $A$ of the observation variables.
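As a small illustration of this last step, the sketch below forms $W$ from $W_z$ and $B$ and recovers $A$ with the Moore-Penrose pseudo-inverse, which coincides with the ordinary inverse when $W$ is square and non-singular. The helper name `mixing_matrix` and the random 3x3 matrices are hypothetical and only exercise the algebra.

```python
import numpy as np

def mixing_matrix(W_z, B):
    """Return (W, A), where W = W_z @ B as in formula (4) and A is the (pseudo-)inverse of W."""
    W = W_z @ B
    A = np.linalg.pinv(W)   # generalized inverse; equals inv(W) for a square, non-singular W
    return W, A

# Hypothetical inputs: an orthogonal W_z and a well-conditioned whitening matrix B.
rng = np.random.default_rng(1)
W_z = np.linalg.qr(rng.normal(size=(3, 3)))[0]
B = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
W, A = mixing_matrix(W_z, B)
```

Using the pseudo-inverse keeps the same code path valid if fewer independent components than observation variables are extracted, in which case $W$ is not square.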

Further, the specific steps of the dimensionality reduction using the FastICA algorithm are as follows:

(1) applying the whitening transformation to the observation variables $x(t)$ to obtain $z(t)$ with zero mean and unit variance;

(2) selecting a random initial weight vector $W_{z_p}$, where the subscript $p$ denotes the $p$-th iteration, and setting $p = 1$;

(3) adjusting the weight vector $W_{z_p}$ according to formula (5), where $E(\cdot)$ is the mean operation and $g(\cdot)$ is a nonlinear function;

(4) orthogonalizing the weight vector $W_{z_p}$;

(5) normalizing the weight vector $W_{z_p}$:

$$W_{z_p} = W_{z_p} / \lVert W_{z_p} \rVert \tag{7}$$

(6) if $W_{z_p}$ has not converged, returning to step (3) to continue iterating;

(7) setting $p = p + 1$; if $p \le m$, returning to step (2).
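Steps (1) to (7) can be sketched in NumPy as follows. The update rule and the orthogonalization used here are assumptions: the sketch takes the commonly used one-unit fixed-point update $w \leftarrow E\{z\,g(w^{T}z)\} - E\{g'(w^{T}z)\}\,w$ with $g = \tanh$, and Gram-Schmidt deflation against the previously found vectors. The function name `fastica_deflation`, its parameters, and the synthetic Laplace sources are hypothetical.

```python
import numpy as np

def fastica_deflation(z, max_iter=200, tol=1e-6, seed=0):
    """Estimate the separation matrix W_z (rows are weight vectors) for whitened z of shape (m, n)."""
    m, _ = z.shape
    rng = np.random.default_rng(seed)
    W_z = np.zeros((m, m))
    for p in range(m):                                        # steps (2) and (7): one vector per component
        w = rng.normal(size=m)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = w @ z                                         # projection w^T z(t)
            g = np.tanh(y)                                    # assumed nonlinearity g
            g_prime = 1.0 - g ** 2                            # its derivative
            w_new = (z * g).mean(axis=1) - g_prime.mean() * w # step (3): fixed-point update
            w_new -= W_z[:p].T @ (W_z[:p] @ w_new)            # step (4): remove earlier directions
            w_new /= np.linalg.norm(w_new)                    # step (5): normalization, formula (7)
            converged = abs(abs(w_new @ w) - 1.0) < tol       # step (6): direction unchanged up to sign
            w = w_new
            if converged:
                break
        W_z[p] = w
    return W_z

# Synthetic example: mix three non-Gaussian sources, whiten, then separate.
rng = np.random.default_rng(0)
s = rng.laplace(size=(3, 2000))
x = rng.normal(size=(3, 3)) @ s
x -= x.mean(axis=1, keepdims=True)
gamma, E = np.linalg.eigh(x @ x.T / x.shape[1])
z = np.diag(gamma ** -0.5) @ E.T @ x                          # whitening, formula (3)
W_z = fastica_deflation(z)
```

Because every new vector is orthogonalized against the earlier ones and then normalized, the rows of `W_z` are orthonormal by construction, whatever the convergence behavior on a given data set.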

ICA can essentially be regarded as representing each original observation variable as a linear combination of a set of independent random variables. Written out, the expansion is:

$$x_i(t) = a_{i1}s_1(t) + a_{i2}s_2(t) + \dots + a_{im}s_m(t), \quad i = 1, 2, \dots, m$$

Every observation variable can thus be expressed linearly in terms of the random variables. If this set of random variables is regarded as a basis of a high-dimensional space, the row vector $a_i$ of the mixing matrix $A$ gives the coordinates of one observation variable in that space, so that time-series data are represented by static data. As shown in FIG. 3, from the viewpoint of time-series feature extraction, the basis $s(t)$ can be taken as the features of the time-series data, and the row vector $a_i$ then represents the corresponding feature values.

In step S103, cluster analysis is performed on the mixing matrix using the K-means clustering method to obtain a clustering result, and the correspondence of users within the transformer area is determined according to the clustering result.

From step S102, each row vector $a_i$ of the mixing matrix represents the corresponding observation variable. Static-data clustering can therefore replace time-series clustering, and K-means clustering is applied to the obtained mixing matrix.

It can be understood from the relationship between a basis of a linear space and coordinates that the similarity between the row vectors of the mixing matrix $A$ reflects the similarity between the observation variables $x(t)$. Clustering of the observation variables can therefore be achieved by clustering the row vectors of $A$. The dimension of the mixing matrix $A$ is independent of time $t$, which greatly reduces the dimensionality of the time-series data and the complexity of the cluster analysis.

In user-transformer relationship identification, the number of classes into which the user electricity data must be clustered can be determined in advance, which suits the K-means algorithm. The embodiment of the present invention therefore uses an ICA-based K-means algorithm to cluster the user electricity time-series data. The objective function of K-means is as follows:

where $d_{ij}$ is the Euclidean distance from a sample point to its cluster center, $k$ is the number of clusters, $n_i$ is the number of samples in class $i$, and $c_i$ is the cluster center of the class-$i$ samples.
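For reference, a conventional form of the K-means objective that is consistent with the symbols defined above can be written as follows. Squaring the distance and the sample notation $a_j^{(i)}$ (the $j$-th mixing-matrix row assigned to class $i$) are assumptions rather than details taken from the source.

```latex
J = \sum_{i=1}^{k} \sum_{j=1}^{n_i} d_{ij}^{2},
\qquad
d_{ij} = \bigl\lVert a_j^{(i)} - c_i \bigr\rVert_2
```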

Further, in an embodiment of the present invention, performing cluster analysis on the mixing matrix by using K-means clustering includes:

selecting k samples from the mixing matrix as initial cluster centers;

classifying each sample into the class of the cluster center closest to it;

recalculating the clustering center;

the classification and recalculation processes are iteratively performed until the objective function converges or a maximum number of iterations is reached.

That is, the clustering process of K-means is:

(1) selecting k samples as initial clustering centers;

(2) classifying each sample into a class where a cluster center closest to the sample is located;

(3) recalculating the clustering center;

(4) repeating steps (2) and (3) until the objective function converges or the maximum number of iterations is reached, which yields the K-means clustering result. The transformer-area and phase information of each class is then determined from the members of that class whose transformer area and phase are already known, and this information is assigned to the users in the class, thereby identifying the user-transformer relationship.
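The clustering steps and the final labeling can be sketched as below. This is a minimal NumPy illustration, not the source's implementation: the function names `kmeans` and `label_clusters`, the optional `init_idx` argument, the toy mixing-matrix rows, and the tag strings are all hypothetical. In the example the rows with known transformer area and phase are used as the initial centers, which is one possible design choice and makes the run deterministic.

```python
import numpy as np

def kmeans(rows, k, init_idx=None, max_iter=100, tol=1e-9, seed=0):
    """Cluster the rows of a matrix (one row per observation variable) into k classes."""
    rows = np.asarray(rows, dtype=float)
    if init_idx is None:                                      # step (1): pick k samples as centers
        init_idx = np.random.default_rng(seed).choice(len(rows), size=k, replace=False)
    centers = rows[np.asarray(init_idx)].copy()
    prev = np.inf
    for _ in range(max_iter):
        dist = np.linalg.norm(rows[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)                          # step (2): nearest center
        objective = (dist[np.arange(len(rows)), labels] ** 2).sum()
        for i in range(k):                                    # step (3): recompute centers
            if np.any(labels == i):
                centers[i] = rows[labels == i].mean(axis=0)
        if prev - objective < tol:                            # step (4): objective has converged
            break
        prev = objective
    return labels, centers

def label_clusters(labels, known):
    """Give every row the tag of the known member that shares its cluster.

    known: dict mapping row index -> tag for rows whose transformer area and phase are known.
    """
    cluster_tag = {int(labels[idx]): tag for idx, tag in known.items()}
    return [cluster_tag.get(int(c)) for c in labels]

# Toy mixing-matrix rows: rows 0 and 3 stand for series with known area and phase.
A = np.array([[1.0, 0.1], [0.9, 0.0], [1.1, 0.2],
              [0.0, 1.0], [0.1, 0.9], [0.2, 1.1]])
known = {0: "area-1/phase-A", 3: "area-1/phase-B"}
labels, centers = kmeans(A, k=2, init_idx=list(known))
tags = label_clusters(labels, known)
```

Rows that end up in a cluster without any known member receive `None`, which flags them for manual follow-up instead of silently assigning a transformer area.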

According to the transformer-area user-transformer relationship identification method based on voltage time-series data provided by the embodiment of the present invention, the collected user voltage time-series data are first processed and analyzed with FastICA independent component analysis to extract static features of the time-series data. A K-means clustering method for static data is then applied to these static features (namely the mixing matrix), so that the voltage time-series data are automatically assigned to the corresponding classes and the user-transformer relationship is identified. The method requires no additional equipment or devices and no manual inspection; user voltage time-series data only needs to be collected at a given sampling rate, which saves manpower and material resources. FastICA independent component analysis converts the voltage time-series data into static features, which reduces the amount of computation that correlation calculations on the voltage time-series data would otherwise require and improves efficiency.

Next, a transformer-area user-transformer relationship identification system based on voltage time-series data according to an embodiment of the present invention will be described with reference to the drawings.

FIG. 4 is a schematic structural diagram of a transformer-area user-transformer relationship identification system based on voltage time-series data according to an embodiment of the present invention.

As shown in FIG. 4, the transformer-area user-transformer relationship identification system 10 based on voltage time-series data includes: an acquisition module 100, a dimension reduction module 200, and a clustering module 300.

The acquisition module 100 is configured to collect and process time-series voltage data from the transformer side and the user side of a distribution transformer area to obtain a plurality of observation variables. The dimension reduction module 200 is configured to perform dimensionality reduction on the plurality of observation variables using the FastICA algorithm to obtain the corresponding independent components and the corresponding mixing matrix of the observation variables. The clustering module 300 is configured to perform cluster analysis on the mixing matrix using the K-means clustering method to obtain a clustering result, and to determine the correspondence of users within the transformer area according to the clustering result.

Further, in an embodiment of the present invention, before the dimensionality reduction is performed in the dimension reduction module, data preprocessing is performed on the plurality of observation variables, wherein the data preprocessing includes centering (mean removal) and a whitening transformation.

Further, in one embodiment of the present invention, the centering process is: calculating the average value of each of the plurality of observation variables; and subtracting that average value from each sampling point of the corresponding observation variable to obtain the centered observation variable.

Further, as shown in fig. 5, in an embodiment of the present invention, in the clustering module 300, performing cluster analysis on the mixing matrix by using a K-means clustering method includes:

a selecting unit 301, configured to select k samples from the mixing matrix as initial cluster centers;

a classifying unit 302, configured to classify each sample into the class of the cluster center closest to it;

a recalculation unit 303 for recalculating the clustering center;

an iteration unit 304 for iteratively performing the classification and recalculation processes until the objective function converges or a maximum number of iterations is reached.

According to the transformer-area user-transformer relationship identification system based on voltage time-series data of the embodiment of the present invention, the FastICA algorithm and the K-means method are applied in sequence for data dimensionality reduction and clustering to identify the user-transformer relationship. No additional equipment or devices need to be installed and no manual inspection is required; user voltage time-series data only needs to be collected at a given sampling rate, which saves manpower and material resources. Moreover, FastICA independent component analysis converts the voltage time-series data into static features, which reduces the amount of computation that correlation calculations on the voltage time-series data would otherwise require and improves efficiency.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A transformer-area user-transformer relationship identification method based on voltage time-series data, characterized by comprising the following steps:

collecting and processing time-series voltage data from a transformer side and a user side of a distribution transformer area to obtain a plurality of observation variables;

performing dimensionality reduction on the plurality of observation variables using a FastICA algorithm to obtain corresponding independent components and a corresponding mixing matrix of the plurality of observation variables; and

performing cluster analysis on the mixing matrix using a K-means clustering method to obtain a clustering result, and determining the correspondence of users within the transformer area according to the clustering result.

2. The transformer-area user-transformer relationship identification method based on voltage time-series data according to claim 1, wherein before the dimensionality reduction of the FastICA algorithm is performed, data preprocessing is performed on the plurality of observation variables, wherein the data preprocessing includes centering (mean removal) and a whitening transformation.

3. The transformer-area user-transformer relationship identification method based on voltage time-series data according to claim 2, wherein the centering process is as follows:

calculating the average value of each of the plurality of observation variables;

and subtracting that average value from each sampling point of the corresponding observation variable to obtain the centered observation variable.

4. The transformer-area user-transformer relationship identification method based on voltage time-series data according to claim 2, wherein the whitening transformation removes the correlation between the observation variables so as to simplify the extraction of the independent components.

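5. The transformer-area user-transformer relationship identification method based on voltage time-series data according to claim 1, wherein performing cluster analysis on the mixing matrix using K-means clustering comprises:

selecting k samples from the mixing matrix as initial cluster centers;

classifying each sample into the class of the cluster center closest to it;

recalculating the cluster centers;

iteratively performing the classification and recalculation until the objective function converges or a maximum number of iterations is reached.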

6. A transformer-area user-transformer relationship identification system based on voltage time-series data, characterized by comprising:

an acquisition module for collecting and processing time-series voltage data from a transformer side and a user side of a distribution transformer area to obtain a plurality of observation variables;

a dimension reduction module for performing dimensionality reduction on the plurality of observation variables using a FastICA algorithm to obtain corresponding independent components and a corresponding mixing matrix of the plurality of observation variables; and

a clustering module for performing cluster analysis on the mixing matrix using a K-means clustering method to obtain a clustering result and determining the correspondence of users within the transformer area according to the clustering result.

7. The system of claim 6, wherein data preprocessing is performed on the plurality of observation variables before the dimensionality reduction is performed in the dimension reduction module, wherein the data preprocessing comprises centering (mean removal) and a whitening transformation.

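8. The system of claim 7, wherein the centering process comprises:

calculating the average value of each of the plurality of observation variables;

and subtracting that average value from each sampling point of the corresponding observation variable to obtain the centered observation variable.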

9. The transformer-area user-transformer relationship identification system based on voltage time-series data according to claim 7, wherein the whitening transformation removes the correlation between the observation variables to simplify the extraction of the independent components.

10. The transformer-area user-transformer relationship identification system based on voltage time-series data according to claim 6, wherein the clustering module, which performs cluster analysis on the mixing matrix using the K-means clustering method, comprises:

a selecting unit for selecting k samples from the mixing matrix as initial cluster centers;

a classification unit for classifying each sample into the class of the cluster center closest to it;

a recalculation unit for recalculating the cluster centers;

and an iteration unit for iteratively performing the classification and recalculation until the objective function converges or the maximum number of iterations is reached.