CN115545422A

CN115545422A - Platform area user variation relation identification method based on improved decision mechanism

Info

Publication number: CN115545422A
Application number: CN202211130992.0A
Authority: CN
Inventors: 刘奕玹
Original assignee: State Grid Hunan Electric Power Co Ltd Yueyang Power Supply Branch; State Grid Corp of China SGCC; State Grid Hunan Electric Power Co Ltd
Current assignee: State Grid Hunan Electric Power Co Ltd Yueyang Power Supply Branch; State Grid Corp of China SGCC; State Grid Hunan Electric Power Co Ltd
Priority date: 2022-09-16
Filing date: 2022-09-16
Publication date: 2022-12-30

Abstract

The invention discloses a method for identifying user-transformer relationships in station areas based on an improved decision mechanism, comprising the following steps: acquiring voltage time-series data of all users in each station area, and performing data dimension reduction and feature extraction to obtain a training set; training an SVDD model with the training set to obtain hyperspheres in one-to-one correspondence with the station areas; acquiring voltage time-series data of a target user, and performing data dimension reduction and feature extraction to obtain a measured set; and calculating the positional relationship between each user sample in the measured set and each hypersphere, wherein a user sample x that lies in no hypersphere does not belong to any station area, a user sample x that lies in exactly one hypersphere belongs to the station area corresponding to that hypersphere, and a user sample x that lies in at least two hyperspheres is assigned to the station area to which the largest number of its k nearest target user samples belong. The invention effectively addresses the problem that user-transformer relationships cannot be accurately identified when the power supply ranges of several station areas overlap and users are wrongly recorded under another station area.

Description

Platform area user variation relation identification method based on improved decision mechanism

Technical Field

The invention relates to the field of power distribution network management, and in particular to a method for identifying user-transformer relationships in station areas based on an improved decision mechanism.

Background

A correct user-transformer relationship is an important cornerstone for station-area services in the distribution network, such as line loss calculation and line renovation. However, existing distribution systems have far fewer measuring devices than transmission networks, and distribution networks are frequently reconfigured and extended, so the recorded assignment of users to station areas may be inconsistent or wrong.

The user-transformer relationship identification methods in common use at present are mainly the switch-off power verification method and the power line carrier communication method. The switch-off method seriously disrupts residents' daily life and is difficult to implement on a large scale, while the power line carrier method places high demands on equipment and is susceptible to interference, so neither method can be widely applied in practice.

With the large-scale deployment of smart meters, real-time power data of the transformers and user nodes in each station area can be collected continuously through the power consumption information acquisition system, so the user-transformer relationship can be identified accurately by mining useful information from this mass of data. Scholars have previously proposed identification methods based on measurement data, but these address only the user-transformer relationship within a single station area and do not deal with the difficulty of distinguishing which transformer an uncertain user between adjacent station areas belongs to.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: in view of the technical problems in the prior art, the invention provides a method for identifying user-transformer relationships in station areas based on an improved decision mechanism. It effectively addresses the problem that, in existing low-voltage distribution networks, the power supply ranges of several low-voltage station areas overlap and some users may be wrongly recorded under another station area, so that the user-transformer relationship cannot be accurately identified. The method can also provide a reference for on-site inspection, making the work of inspection staff more targeted and more efficient.

In order to solve the technical problems, the technical scheme provided by the invention is as follows:

a method for identifying user-transformer relationships in station areas based on an improved decision mechanism, comprising the following steps:

acquiring voltage time sequence data of all users in each distribution area, constructing a voltage time sequence matrix of each distribution area, and performing data dimension reduction and feature extraction on each voltage time sequence matrix to obtain a training set of each distribution area;

constructing an SVDD model, and training the SVDD model by using the training set to obtain a hypersphere corresponding to each station area;

acquiring voltage time sequence data of a target user, constructing a voltage time sequence matrix of the target user, and performing data dimension reduction and feature extraction on the voltage time sequence matrix to obtain an actual measurement set;

calculating the positional relationship between each user sample in the measured set and each hypersphere: if the user sample x lies in exactly one hypersphere, it belongs to the station area corresponding to that hypersphere; if the user sample x lies in at least two hyperspheres or outside all hyperspheres, the k target user samples nearest to x are obtained, and the station area to which the largest number of these k target user samples belong is taken as the station area of the user sample x.

Further, the step of performing data dimension reduction and feature extraction specifically includes:

normalizing each element in the voltage time sequence to obtain a low-dimensional data sample set;

and mapping the low-dimensional data sample set to a high-dimensional space by using a Gaussian kernel function Φ(X) to construct a feature matrix, and then extracting the principal component feature vectors of the feature matrix to generate a new feature data matrix.

Further, the expression of the normalization process is as follows:

$$\tilde{x}_i^j = \frac{x_i^j - \bar{x}_i}{\sigma_i}, \qquad \bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_i^j$$

in the above formula, $x_i^j$ is the measured value of the ith node user in the voltage time-series matrix at the jth moment, $\bar{x}_i$ and $\sigma_i$ are the average value and the standard deviation of all measurement point data of the ith node user in the voltage time-series matrix, and n is the number of moments in the preset time period.

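Further, the principal component feature vector expression is as follows:

$$t_k = \langle V_k, \Phi(X) \rangle = \sum_{i=1}^{N} \alpha_i^k \,\langle \Phi(X_i), \Phi(X) \rangle, \qquad k = 1, 2, \ldots, p$$

in the above formula, p is the total number of kernel principal components, Φ(X) is the Gaussian kernel function, $X_i$ is the ith sample, $\alpha_i^k$ is the Lagrange multiplier of the ith sample for the kth principal component, and N is the total number of users in the station area.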

Further, the SVDD model expression is as follows:

$$\min_{R,\,a,\,\xi}\; R^2 + C\sum_{i=1}^{N}\xi_i \qquad \text{s.t.}\quad \|x_i - a\|^2 \le R^2 + \xi_i,\quad \xi_i \ge 0,\quad i = 1, 2, \ldots, N$$

in the above formula, $\xi_i$ is a slack variable, C is a weight parameter, N is the number of samples, $x_i$ is a user sample in the training set, and a and R are the center and radius of the hypersphere, respectively.

Further, the method further comprises a model transformation step after the SVDD model is constructed, which specifically comprises:

introducing a kernel function and converting the original problem of the SVDD model into a dual problem, the expression of which is as follows:

$$\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i K(x_i, x_i) - \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i \alpha_j K(x_i, x_j)$$

the constraint conditions are as follows:

$$\sum_{i=1}^{n}\alpha_i = 1, \qquad 0 \le \alpha_i \le C$$

in the above formula, $\alpha_i$ and $\alpha_j$ are Lagrange multipliers, $x_i$ and $x_j$ are two different samples in the input data set X, $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$ is the kernel function, and n is the total number of samples in the input data set X.

Further, the positional relationship between each user sample in the measured set and each hypersphere is calculated as follows: the absolute distance between each user sample in the measured set and the center of each hypersphere is calculated; if the absolute distance r between the current user sample and the center of the current hypersphere is smaller than the radius R of the current hypersphere, the current user sample is inside the current hypersphere; otherwise, it is outside the current hypersphere.

Further, the expression for calculating the absolute distance between each user sample in the measured set and the center of each hypersphere is as follows:

$$r^2 = 1 - 2\sum_{i}\alpha_i K(x_i, z) + \sum_{i,j}\alpha_i \alpha_j K(x_i, x_j)$$

in the above formula, $\alpha_i$ and $\alpha_j$ are Lagrange multipliers, $x_i$ and $x_j$ are two different samples in the input data set X, $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$ is the kernel function, and z is a user sample in the measured set.

Further, the step of calculating k user samples closest to the user sample x in the target user samples comprises the following steps:

when the user sample x is in at least two hyper-spheres, extracting user samples of all training sets from each hyper-sphere where the user sample x is located to serve as target user samples, and when the user sample x is out of all hyper-spheres, extracting the training set user samples in all hyper-spheres to serve as target user samples;

calculating the distance between the user sample x and the extracted user sample of each training set;

and sorting the distances in ascending order and taking the k extracted training-set user samples closest to the user sample x.

Further, the distance between the user sample x and each extracted training-set user sample is calculated as follows:

$$d = \sqrt{\sum_{i=1}^{h}(p_i - q_i)^2}$$

in the above formula, $p_i$ and $q_i$ are the ith dimension of the user sample x and of the training-set user sample, respectively, and h is the number of dimensions of the user sample x and of the training-set user sample.

Compared with the prior art, the invention has the advantages that:

the SVDD algorithm is applied to the identification of user-transformer relationships. A station-area SVDD model is built from user voltage data, and the positional relationship between each hypersphere of the SVDD model and each user sample in the measured set is calculated to make a first judgment of the station area to which a user belongs. Considering that, in actual distribution station areas, there are users near the boundary of adjacent station areas whose user-transformer relationship is difficult to determine, a KNN algorithm is introduced: the k user samples nearest to a user sample lying in the intersection of SVDD hyperspheres are found, and a second judgment is made according to the station areas to which these k samples belong. The two-stage judgment determines the supply relationship between every power customer and the station-area supply transformer, and resolves the difficulty of assigning users of adjacent station areas while retaining the high classification accuracy and short training time of SVDD.

Drawings

FIG. 1 is a schematic diagram of the SVDD algorithm.

FIG. 2 is a schematic diagram of the principle of the KNN algorithm.

FIG. 3 is a schematic diagram of the steps of the embodiment of the present invention.

FIG. 4 is a diagram illustrating a relationship between a user sample and a position of a hyper-sphere.

FIG. 5 is a detailed flow chart of an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the drawings and the specific preferred embodiments, without thereby limiting the scope of protection of the invention.

Before describing particular embodiments of the present invention in detail, it is necessary to state in advance the following regarding related concepts and preconditions in particular embodiments:

SVDD algorithm: by training on target samples, the SVDD algorithm forms a minimum-volume hypersphere that encloses all or most of the target samples. The hypersphere has two key parameters, the center a and the radius R, as shown in FIG. 1. According to their position relative to the hypersphere, the samples are divided into three types, namely retention vectors, boundary support vectors and error support vectors, as shown in FIG. 1; all points on and inside the hypersphere are target samples, and all outlying points outside the hypersphere are non-target samples.

KNN algorithm: the basic idea of the KNN algorithm is to measure the difference between a sample to be classified and its neighboring samples, usually by Euclidean distance, and then to use a voting decision mechanism that assigns the sample to the category of the majority of its neighbors. As shown in FIG. 2, points closer to the center of the circle have a smaller Euclidean distance to the sample to be classified; since most of the nearest neighbors of the sample belong to station area T2, the sample is determined to belong to station area T2.

This embodiment provides a method for identifying user-transformer relationships in station areas based on an improved decision mechanism. First, a classification model is built with the SVDD (support vector data description) algorithm, and a first identification of the user-transformer relationship is performed on the users to be classified. In an actual distribution station area, some users near the boundary of adjacent station areas have a user-transformer relationship that is difficult to determine, and the first identification places them in the intersection of SVDD hyperspheres. For these users, a second identification is performed with the KNN (k-nearest neighbor) algorithm, which resolves the difficulty of assigning users of adjacent station areas.

According to the above concept, as shown in fig. 3 and 4, the method comprises the steps of:

step 1: training set generation: acquiring voltage time-series data of all users in the n station areas to be identified, constructing a voltage time-series matrix for each station area, and performing data dimension reduction and feature extraction on each voltage time-series matrix to obtain a training set for each station area;

step 2: classification model generation and training: constructing an SVDD model, and training the SVDD model with the training sets to obtain a hypersphere corresponding to each station area;

step 3: measured set generation: acquiring voltage time-series data of a target user, constructing a voltage time-series matrix of the target user, and performing data dimension reduction and feature extraction on the voltage time-series matrix to obtain a measured set;

step 4: user-transformer relationship identification: calculating the positional relationship between each user sample in the measured set and each hypersphere, with the result shown in FIG. 5. If the user sample x lies in exactly one hypersphere, it belongs to the station area corresponding to that hypersphere. If the user sample x lies in at least two hyperspheres or in no hypersphere, a KNN algorithm is introduced to obtain the k target user samples nearest to x, and the station area to which the largest number of these k samples belong is taken as the station area of x. When x lies in at least two hyperspheres, the target user samples are the training-set user samples in the hyperspheres that contain x; when x lies outside all hyperspheres, the target user samples are the training-set user samples in all hyperspheres.

In step 1 of this embodiment, the voltage time-series data are the daily voltage data of all users in a station area on the same date, acquired from smart meters and smart measurement terminals. The voltage time-series matrix of each station area is denoted $U_l$, with $U_l \in \mathbb{R}^{N \times M}$, where N is the number of samples, that is, the total number of users in the station area, M is the number of voltage data points collected by the smart meter in a preset time period, and $l = 1, 2, \ldots, n$, where n is the total number of station areas to be identified. Taking one station area to be identified as an example, the voltage time-series matrix can be represented as follows:

$$U = \begin{bmatrix} u_{1,1} & u_{1,2} & \cdots & u_{1,d} \\ u_{2,1} & u_{2,2} & \cdots & u_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ u_{N,1} & u_{N,2} & \cdots & u_{N,d} \end{bmatrix} \tag{1}$$

in the above formula, $u_{i,j}$ is the voltage measurement of the ith user at the jth moment in the voltage time-series matrix, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, d$, N is the total number of users in the station area, and d is the number of voltage data points collected by the meter per day; typically the voltage data collection interval is 15 min, so d = 96.

Correspondingly, in step 3 of this embodiment, the voltage time-series data are likewise daily voltage data of the same date acquired from smart meters and smart measurement terminals. There may be several target users, so the voltage time-series matrix of the target users is $X \in \mathbb{R}^{N' \times M}$, where N' is the number of target users and M is the number of voltage data points collected by the smart meter in a preset time period. The voltage time-series matrix X of the target users can be represented as:

$$X = \begin{bmatrix} x_1^1 & x_1^2 & \cdots & x_1^d \\ x_2^1 & x_2^2 & \cdots & x_2^d \\ \vdots & \vdots & \ddots & \vdots \\ x_{N'}^1 & x_{N'}^2 & \cdots & x_{N'}^d \end{bmatrix} \tag{2}$$

in the above formula, $x_i^j$ is the voltage measurement of the ith user at the jth moment in the voltage time-series matrix, $i = 1, 2, \ldots, N'$, $j = 1, 2, \ldots, d$, N' is the total number of target users, and d is the number of voltage data points collected by the meter per day.

It can be seen that the voltage time series matrix of each station and the voltage time series matrix of the target user are both composed of user samples (i.e., rows in the matrix), and the dimensions of the user samples are the same.

In step 1 and step 3 of this embodiment, performing data dimension reduction and feature extraction is implemented by using a kernel principal component analysis method, which specifically includes the following steps:

data dimension reduction: in this embodiment, z-score normalization is used, with the following expression:

$$\tilde{x}_i^j = \frac{x_i^j - \bar{x}_i}{\sigma_i}, \qquad \bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_i^j \tag{3}$$

in the above formula, $x_i^j$ is the measured value of the ith user in the voltage time-series matrix at the jth moment, $\bar{x}_i$ and $\sigma_i$ are the average value and the standard deviation of all measurement point data of the ith user in the voltage time-series matrix, and n is the number of moments in the preset time period;

feature extraction: mapping a low-dimensional data sample set to a high-dimensional space by using a Gaussian kernel function phi (X) to construct a feature matrix, then extracting principal component feature vectors of the feature matrix to generate a new feature data matrix, and specifically, the method comprises the following steps:

first, the low-dimensional data sample set is mapped to a high-dimensional space F by the Gaussian kernel function Φ(X) to construct a feature space, and the covariance matrix of the feature space F is calculated as follows:

$$C^F = \frac{1}{N}\sum_{j=1}^{N}\Phi(X_j)\,\Phi(X_j)^{\mathsf T} \tag{4}$$

in the above formula, $X_j$, $j = 1, 2, \ldots, N$, is the jth sample in the low-dimensional data sample set, and N is the number of samples in the low-dimensional data sample set, that is, the number of user samples;

then, eigenvalue decomposition is performed on the covariance matrix of the feature space F to obtain:

$$\lambda V = C^F V \tag{5}$$

in the above formula, λ is an eigenvalue of the covariance matrix $C^F$, and V is the corresponding eigenvector of $C^F$;

further, the eigenvector V can be expressed linearly in terms of the mapped samples $\Phi(X_j)$ as follows:

$$V = \sum_{j=1}^{N}\alpha_j\,\Phi(X_j) \tag{6}$$

then a kernel function matrix $K \in \mathbb{R}^{N \times N}$ is introduced, with $[K]_{ij} = K_{ij} = \langle \Phi(X_i), \Phi(X_j) \rangle$, where $X_i$ and $X_j$ are two different samples in the input space and N is the number of samples in the low-dimensional data sample set; substituting formula (6) and the kernel function matrix into formula (5) and rearranging gives:

$$N\lambda\,\alpha = K\alpha, \qquad \alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)^{\mathsf T} \tag{7}$$

finally, the principal component feature vectors of the feature matrix are extracted, and a new feature data matrix is generated as the training set or the measured set:

$$t_k = \langle V_k, \Phi(X) \rangle = \sum_{i=1}^{N} \alpha_i^k \,\langle \Phi(X_i), \Phi(X) \rangle, \qquad k = 1, 2, \ldots, p \tag{8}$$

in the above formula, p is the total number of kernel principal components, Φ(X) is the Gaussian kernel function, $X_i$ is the ith sample, $\alpha_i^k$ is the Lagrange multiplier of the ith sample for the kth principal component, and N is the number of samples in the low-dimensional data sample set.
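As a rough illustration of the preprocessing described above, the following sketch applies z-score normalization to each user's voltage row and then builds the Gaussian kernel matrix K from which the kernel principal components would be extracted. It is a minimal sketch under stated assumptions, not the implementation of this embodiment: the function names, the kernel width `gamma`, the toy voltage values and the use of the population standard deviation (dividing by n) are all assumptions, and the eigendecomposition step is omitted.

```python
import math

def zscore_rows(matrix):
    """Normalize each user's voltage series (one row) to zero mean and unit spread."""
    normalized = []
    for row in matrix:
        n = len(row)
        mean = sum(row) / n
        # Population standard deviation (divide by n). This is an assumption:
        # the text does not say whether n or n - 1 is used.
        std = math.sqrt(sum((v - mean) ** 2 for v in row) / n)
        if std == 0:
            normalized.append([0.0] * n)  # flat series: avoid division by zero
        else:
            normalized.append([(v - mean) / std for v in row])
    return normalized

def gaussian_kernel(a, b, gamma=0.1):
    """K(a, b) = exp(-gamma * ||a - b||^2); gamma is a hypothetical width parameter."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-gamma * sq_dist)

def kernel_matrix(samples, gamma=0.1):
    """N x N matrix with entries K(X_i, X_j)."""
    return [[gaussian_kernel(a, b, gamma) for b in samples] for a in samples]

# Toy data: three users, four readings each (made-up values).
voltages = [
    [220.0, 221.0, 219.0, 220.0],
    [230.0, 232.0, 228.0, 230.0],
    [225.0, 224.0, 226.0, 225.0],
]
Z = zscore_rows(voltages)
K = kernel_matrix(Z)
```

In this toy data the first two users have the same voltage shape at different levels, so after normalization their rows coincide and their kernel value is close to 1, while the third user, whose fluctuations are mirrored, gets a much smaller kernel value.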

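In step 2 of this embodiment, the SVDD model expression is as follows:

$$\min_{R,\,a,\,\xi}\; R^2 + C\sum_{i=1}^{N}\xi_i \qquad \text{s.t.}\quad \|x_i - a\|^2 \le R^2 + \xi_i,\quad \xi_i \ge 0,\quad i = 1, 2, \ldots, N \tag{9}$$

in the above formula, $\xi_i$ is a slack variable, C is a weight parameter, N is the number of samples, $x_i$ is a user sample in the training set, and a and R are the center and radius of the hypersphere, respectively.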

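To suit the feature data matrix of the training set or the measured set, this embodiment further includes a model transformation step after the SVDD model is constructed, which specifically includes introducing a kernel function and converting the original problem of the SVDD model into a dual problem:

$$\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i K(x_i, x_i) - \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i \alpha_j K(x_i, x_j) \tag{10}$$

the constraint conditions are as follows:

$$\sum_{i=1}^{n}\alpha_i = 1, \qquad 0 \le \alpha_i \le C \tag{11}$$

in the above formula, $\alpha_i$ and $\alpha_j$ are Lagrange multipliers, $x_i$ and $x_j$ are two different samples in the input data set X, $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$ is the kernel function, and n is the total number of samples in the input data set X.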

Therefore, by taking the feature data matrix of a training set as input and solving the optimization problem given by equations (10) and (11), the center and radius of the hypersphere of the station area corresponding to that training set can be obtained. The optimization problem can be solved by methods commonly used by those skilled in the art; this solution does not involve any improvement of the solving process, which is not the focus of this solution and is not described further here.
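Since the solver is left open, the following sketch shows one simple possibility, offered purely as an assumption: a Frank-Wolfe iteration on the dual problem in place of a general quadratic programming solver. It assumes a Gaussian kernel, so that K(x, x) = 1, and C ≥ 1, so that the upper bound on the multipliers never binds and every training sample ends up inside or on the hypersphere. The names `train_svdd`, `gamma` and `iterations` are hypothetical.

```python
import math

def gaussian_kernel(a, b, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2); gamma is a hypothetical width parameter."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def train_svdd(samples, gamma=0.5, iterations=2000):
    """Approximate the SVDD dual for one station area with Frank-Wolfe steps.

    Assumes C >= 1, so only the constraints sum(alpha) = 1 and alpha >= 0 matter.
    Returns the multipliers and the squared radius of the hypersphere.
    """
    n = len(samples)
    K = [[gaussian_kernel(a, b, gamma) for b in samples] for a in samples]
    alpha = [1.0 / n] * n  # feasible starting point: sums to 1
    for t in range(iterations):
        # (K alpha)_i is smallest for the sample farthest from the current center.
        k_alpha = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        far = min(range(n), key=lambda i: k_alpha[i])
        step = 2.0 / (t + 2.0)
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[far] += step  # shift weight toward the farthest sample
    k_alpha = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    const = sum(alpha[i] * k_alpha[i] for i in range(n))  # sum_ij a_i a_j K_ij
    # Squared distance of each training sample to the center, using K(x, x) = 1.
    dist_sq = [1.0 - 2.0 * k_alpha[i] + const for i in range(n)]
    # Under the hard-margin assumption the sphere must enclose every sample.
    return alpha, max(dist_sq)

# Toy one-dimensional training set (made-up values).
alpha, radius_sq = train_svdd([[0.0], [1.0], [2.0]])
```

In the toy run the weight concentrates on the two outer samples, which act as boundary support vectors, while the middle sample keeps a zero multiplier because it lies strictly inside the sphere.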

In step 4 of this embodiment, the positional relationship between each user sample in the measured set and each hypersphere is calculated as follows: the absolute distance between each user sample in the measured set and the center of each hypersphere is calculated; if the absolute distance r between the current user sample and the center of the current hypersphere is smaller than the radius R of the current hypersphere, the current user sample is inside the current hypersphere; otherwise, it is outside the current hypersphere. The absolute distance between each user sample in the measured set and the center of each hypersphere is calculated according to the following expression:

$$r^2 = 1 - 2\sum_{i}\alpha_i K(x_i, z) + \sum_{i,j}\alpha_i \alpha_j K(x_i, x_j) \tag{12}$$

in the above formula, $\alpha_i$ and $\alpha_j$ are Lagrange multipliers, $x_i$ and $x_j$ are two different samples in the input data set X (that is, the training set described above), $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$ is the kernel function, and z is a user sample in the measured set.
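The leading 1 in the distance expression follows from the Gaussian kernel: with the center written as $a = \sum_i \alpha_i \Phi(x_i)$, the general form is $\|\Phi(z) - a\|^2 = K(z, z) - 2\sum_i \alpha_i K(x_i, z) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$, and $K(z, z) = e^{0} = 1$. A minimal sketch of the inside/outside test is given below; the function names, the kernel width `gamma` and the toy multipliers are assumptions for illustration only.

```python
import math

def gaussian_kernel(a, b, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2); gamma is a hypothetical width parameter."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def distance_sq_to_center(z, train, alpha, gamma=0.5):
    """Squared feature-space distance r^2 between sample z and the hypersphere center."""
    cross = sum(a_i * gaussian_kernel(x_i, z, gamma)
                for a_i, x_i in zip(alpha, train))
    const = sum(a_i * a_j * gaussian_kernel(x_i, x_j, gamma)
                for a_i, x_i in zip(alpha, train)
                for a_j, x_j in zip(alpha, train))
    return 1.0 - 2.0 * cross + const  # K(z, z) = 1 for the Gaussian kernel

def is_inside(z, train, alpha, radius_sq, gamma=0.5):
    """True when r < R; comparing squares is equivalent because both are non-negative."""
    return distance_sq_to_center(z, train, alpha, gamma) < radius_sq

# Toy station area with two boundary support vectors of equal weight (made-up values).
train = [[0.0], [2.0]]
alpha = [0.5, 0.5]
radius_sq = distance_sq_to_center(train[0], train, alpha)  # R^2 from a boundary sample

near = is_inside([1.0], train, alpha, radius_sq)  # between the two support vectors
far = is_inside([5.0], train, alpha, radius_sq)   # well away from both
```

Running this test once per station area yields, for each measured sample, the set of hyperspheres that contain it, which is the input to the two cases discussed next.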

As shown in fig. 5, there are two cases in the identification process of step 4 in this embodiment:

if the sample to be recognized is in one hyper-sphere and only in one hyper-sphere, the judgment can be directly made, and the recognition is finished;

if the sample to be identified is outside all the hyper-spheres or within 2 or more hyper-spheres, a secondary identification is required.
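The two cases above can be sketched as a small dispatch function. This is an illustrative sketch only: the name `first_stage_decision`, the dictionary input and the `"direct"`/`"secondary"` labels are assumptions, not part of the described method.

```python
def first_stage_decision(inside_flags):
    """Decide how a sample is handled after the hypersphere test.

    inside_flags maps a station-area id to True when the sample lies inside
    that area's hypersphere. Returns (mode, areas), where areas lists the
    station areas whose training samples are relevant to the decision.
    """
    hits = [area for area, inside in inside_flags.items() if inside]
    if len(hits) == 1:
        return "direct", hits  # exactly one hypersphere: assign immediately
    if len(hits) >= 2:
        return "secondary", hits  # overlap: KNN over the overlapping areas only
    return "secondary", list(inside_flags)  # outside all: KNN over every area

print(first_stage_decision({"T1": True, "T2": False}))
print(first_stage_decision({"T1": True, "T2": True, "T3": False}))
print(first_stage_decision({"T1": False, "T2": False}))
```

Returning the candidate station areas together with the mode keeps the choice of KNN candidates, which differs between the overlap case and the outside-all case, in one place.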

When performing the secondary identification, the obtaining k target user samples nearest to the user sample x includes the following steps:

step 4.1: extracting training-set user samples as target user samples: when the user sample x lies in at least two hyperspheres, all training-set user samples of each hypersphere that contains x are extracted; when the user sample x lies outside all hyperspheres, the training-set user samples of all hyperspheres are extracted;

step 4.2: calculating the distance between the user sample x and each extracted training-set user sample, with the following expression:

$$d = \sqrt{\sum_{i=1}^{h}(p_i - q_i)^2} \tag{13}$$

in the above formula, $p_i$ and $q_i$ are the ith dimension of the user sample x and of the training-set user sample, respectively, and h is the number of dimensions of the user sample x and of the training-set user sample;

step 4.3: sorting the distances in ascending order and taking the k extracted training-set user samples closest to the user sample x.

After the k target user samples nearest to the user sample x are obtained, a voting decision mechanism assigns x to the category held by the majority of these samples. Taking adjacent station areas T1 and T2 as an example, as shown in FIG. 2, points closer to the center of the circle have a smaller Euclidean distance to the user sample x; since most of the nearest neighbors of x belong to station area T2, x is determined to belong to station area T2.
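The secondary identification can be sketched as a plain k-nearest-neighbor vote over the extracted training-set samples. The function names, the value k = 3, the tie-breaking behavior and the toy feature vectors below are assumptions made for illustration.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two feature vectors of equal dimension h."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_station_area(x, candidates, k=3):
    """Assign x to the station area holding the most of its k nearest candidates.

    candidates is a list of (feature_vector, station_area) pairs taken from the
    training sets selected in step 4.1. On a tied vote, the area whose sample
    appears first among the nearest neighbors wins (an assumption; the text
    does not specify tie-breaking).
    """
    nearest = sorted(candidates, key=lambda item: euclidean(x, item[0]))[:k]
    votes = Counter(area for _, area in nearest)
    return votes.most_common(1)[0][0]

# Toy candidates from two adjacent station areas (made-up values).
candidates = [
    ([0.0, 0.0], "T1"),
    ([0.1, 0.0], "T2"),
    ([0.2, 0.1], "T2"),
    ([3.0, 3.0], "T1"),
]
result = knn_station_area([0.15, 0.05], candidates, k=3)
```

Here two of the three nearest neighbors belong to T2, so the sample is assigned to T2, mirroring the T1/T2 example above.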

In summary, this embodiment makes a preliminary determination of the user-transformer relationship by comparing the position of the sample to be identified with the SVDD hyperspheres and, because an actual distribution network contains user samples that lie in the intersection of hyperspheres or outside every description boundary, performs further verification with KNN. The method retains the high classification accuracy and short training time of the SVDD classifier and effectively resolves the difficulty of assigning users of adjacent station areas.

The foregoing describes only preferred embodiments of the invention and does not limit the invention in any form. Although the invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Therefore, any simple modification, equivalent change or variation made to the above embodiments according to the technical essence of the invention, without departing from the content of the technical solution of the invention, shall fall within the protection scope of the technical solution of the invention.

Claims

1. A transformer area user change relation identification method based on an improved decision mechanism is characterized by comprising the following steps:

acquiring voltage time sequence data of all users in each transformer area, constructing a voltage time sequence matrix of each transformer area, and performing data dimension reduction and feature extraction on each voltage time sequence matrix to obtain a training set of each transformer area;

respectively calculating the position relation between each user sample and each hyper-sphere in the actual measurement set, if the user sample x is only in one hyper-sphere, the user sample x belongs to the platform area corresponding to the hyper-sphere, if the user sample x is in at least two hyper-spheres or outside all hyper-spheres, k target user samples which are most adjacent to the user sample x are obtained, and the platform area to which the most target user samples in the k target user samples belong is the platform area to which the user sample x belongs.

2. The method for identifying station area diversity relations based on an improved decision-making mechanism as claimed in claim 1, wherein the step of performing data dimension reduction and feature extraction specifically comprises:

and mapping the low-dimensional data sample set to a high-dimensional space by using a Gaussian kernel function phi (X) to construct a characteristic matrix, and then extracting principal element characteristic vectors of the characteristic matrix to generate a new characteristic data matrix.

3. The improved decision mechanism-based station area diversity relation identification method according to claim 2, wherein the expression of the normalization process is as follows:

$$\tilde{x}_i^j = \frac{x_i^j - \bar{x}_i}{\sigma_i}, \qquad \bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_i^j$$

in the above formula, $x_i^j$ is the measured value of the ith node user in the voltage time-series matrix at the jth moment, $\bar{x}_i$ and $\sigma_i$ are the average value and the standard deviation of all measurement point data of the ith node user in the voltage time-series matrix, and n is the number of moments in the preset time period.

4. The method for identifying station area subscriber relationship based on improved decision-making mechanism as claimed in claim 2, wherein the expression of the principal component feature vector is as follows:

$$t_k = \langle V_k, \Phi(X) \rangle = \sum_{i=1}^{N} \alpha_i^k \,\langle \Phi(X_i), \Phi(X) \rangle, \qquad k = 1, 2, \ldots, p$$

in the above formula, p is the total number of kernel principal components, Φ(X) is the Gaussian kernel function, $X_i$ is the ith sample, $\alpha_i^k$ is the Lagrange multiplier of the ith sample for the kth principal component, and N is the total number of users in the station area.

5. The improved decision mechanism-based station area diversity relation identification method according to claim 1, wherein the SVDD model expression is as follows:

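6. The method for identifying station area diversity relations based on an improved decision-making mechanism as claimed in claim 5, wherein a model transformation step is further included after the SVDD model is constructed, which specifically includes:

introducing a kernel function and converting the original problem of the SVDD model into a dual problem, the expression of which is as follows:

$$\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i K(x_i, x_i) - \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i \alpha_j K(x_i, x_j)$$

the constraint conditions are as follows:

$$\sum_{i=1}^{n}\alpha_i = 1, \qquad 0 \le \alpha_i \le C$$

in the above formula, $\alpha_i$ and $\alpha_j$ are Lagrange multipliers, $x_i$ and $x_j$ are two different samples in the input data set X, $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$ is the kernel function, and n is the total number of samples in the input data set X.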

7. The method for identifying the transformer substation area diversity relationship based on the improved decision-making mechanism as claimed in claim 1, wherein the positional relationship between each user sample in the measured set and each hypersphere is calculated as follows: the absolute distance between each user sample in the measured set and the center of each hypersphere is calculated; if the absolute distance r between the current user sample and the center of the current hypersphere is smaller than the radius R of the current hypersphere, the current user sample is inside the current hypersphere; otherwise, it is outside the current hypersphere.

8. The method for identifying station area diversity relationship based on improved decision mechanism as claimed in claim 7, wherein the expression for calculating the absolute distance between each user sample in the measured set and the center of each hypersphere is as follows:

$$r^2 = 1 - 2\sum_{i}\alpha_i^{*} K(x_i, z) + \sum_{i,j}\alpha_i^{*} \alpha_j^{*} K(x_i, x_j)$$

9. The improved decision mechanism-based station area diversity relation identification method according to claim 1, wherein the step of obtaining k target user samples nearest to the user sample x comprises the steps of:

when the user sample x is in at least two hyper-spheres, extracting user samples of all training sets from each hyper-sphere in which the user sample x is positioned as target user samples, and when the user sample x is outside all hyper-spheres, extracting the training set user samples in all hyper-spheres as the target user samples;

10. The improved decision mechanism-based station area diversity relation recognition method according to claim 9, wherein the distance expression between the user sample x and the extracted user sample of each training set is calculated as follows: