CN117452063A - Semi-supervised electricity stealing time positioning method - Google Patents


Info

Publication number
CN117452063A
Authority
CN
China
Prior art keywords
electricity
daily load
data
curve
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311388066.8A
Other languages
Chinese (zh)
Inventor
陈静
王铭海
赵睿
江灏
缪希仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202311388066.8A
Publication of CN117452063A

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01RMEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R22/00Arrangements for measuring time integral of electric power or current, e.g. electricity meters
    • G01R22/06Arrangements for measuring time integral of electric power or current, e.g. electricity meters by electronic methods
    • G01R22/061Details of electronic electricity meters
    • G01R22/066Arrangements for avoiding or indicating fraudulent use
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Power Engineering (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a semi-supervised electricity stealing time positioning method comprising the following steps. Step 1: processing the historical electricity consumption curve data of a user. Step 2: analyzing the electricity stealing principle, establishing electricity stealing mathematical models, and constructing simulated electricity stealing samples. Step 3: establishing a Transformer model as the reconstruction model of the daily load curve and constructing residual curves. Step 4: taking the reconstruction residual curves of the normal daily load curves in the training set as the input of the OCSVM model, mapping them from the input space to a high-dimensional feature space, taking the origin of the high-dimensional feature space as the negative class, and determining the hyperplane farthest from the origin, thereby constructing the optimal classification hyperplane. With this technical scheme, multiple electricity stealing modes can be identified effectively, manual judgment of the electricity stealing time can be assisted, and misjudgments and missed judgments are reduced.

Description

Semi-supervised electricity stealing time positioning method
Technical Field
The invention relates to the technical field of electricity stealing time positioning, in particular to a semi-supervised electricity stealing time positioning method.
Background
Losses in electric power systems are generally divided into technical losses and non-technical losses, and electricity stealing by power users is a major cause of non-technical losses in the grid. Electricity stealing users reduce their electricity bills by tampering with meter readings, which seriously harms the economic interests of power enterprises and power departments. Current electricity stealing detection methods can only identify electricity stealing users; they do not locate the specific time at which the detected stealing occurred.
Efficient recovery of the stolen electricity is the ultimate goal of electricity stealing detection. At present, most provincial and municipal power supply bureaus estimate the stolen quantity according to the power supply business rules, that is, by multiplying the capacity indicated by the calibrated current value of the billing meter by the actual stealing time. The electricity stealing time is therefore an important component of the estimate and a precondition for estimating the stolen quantity accurately. Manually judged stealing times are subjective to some degree, suffer from misjudgment and missed judgment, and are hard to apply to massive numbers of power users.
Accurate positioning of the electricity stealing time is also a precondition for power enterprises and departments to recover the economic losses caused by electricity stealing. In current practice, the judgment is made mainly by hand from sudden drops in electricity consumption. However, power users follow different consumption patterns, their consumption curves are non-stationary time series with many abrupt change points, and new electricity stealing methods keep emerging, so the stealing time is often under-counted or missed. Existing research on electricity stealing time positioning faces three problems. First, a user's normal consumption records are easy to obtain but labels for the specific stealing time are not, so the time cannot be located with a supervised classification algorithm. Second, with prediction algorithms, long-term prediction produces excessive errors, while rolling prediction introduces electricity stealing data into the input and causes misjudgment. Third, clustering algorithms ignore the user's intrinsic consumption pattern and easily misjudge a normal drop in consumption as electricity stealing.
Disclosure of Invention
Therefore, the invention aims to provide a semi-supervised electricity stealing time positioning method that can effectively identify multiple electricity stealing modes, assist manual judgment of the electricity stealing time, and reduce misjudgments and missed judgments.
To achieve the above purpose, the invention adopts the following technical scheme: a semi-supervised electricity stealing time positioning method comprising the following steps:
step 1: processing the historical electricity consumption curve data of a user;
step 2: analyzing the electricity stealing principle, establishing multiple electricity stealing mathematical models that match actual electricity stealing conditions, and constructing electricity stealing data on randomly selected dates in the test set, so that the ratio of normal data to abnormal data in the test set is 1:1;
step 3: establishing a Transformer model as the reconstruction model of the daily load curve and constructing residual curves; the architecture of the Transformer model is divided into 3 modules: an input module, an encoding-decoding module and an output module;
step 4: taking the reconstruction residual curves of the normal daily load curves in the training set as the input of the OCSVM model, mapping them from the input space to a high-dimensional feature space, taking the origin of the high-dimensional feature space as the negative class, and determining the hyperplane farthest from the origin, so that the reconstruction residual curves of the normal daily load curves lie on the side of the hyperplane away from the origin, thereby constructing the optimal classification hyperplane.
In a preferred embodiment, step 1 specifically includes: first, cutting the user's electricity consumption data into daily load curves and deleting daily load curves in which more than 20% of the values are missing; then removing abnormal values with the three-sigma rule according to formula (1) and linearly interpolating null values according to formula (2); then normalizing according to formula (3), using the maximum and minimum values of the training set, which preserves the day-to-day magnitude differences in the data and prevents data leakage; finally, dividing the data set into a training set and a test set at a ratio of 7:3;
wherein x represents a daily load curve, $x_i$ represents the load value at each point, avg(x) represents the mean of the daily load curve, and std(x) represents the standard deviation of the daily load curve;
wherein a null value $x_i$ is denoted $x_i \in \mathrm{NaN}$;
$$x'_i = \frac{x_i - \min(X)}{\max(X) - \min(X)} \tag{3}$$
wherein $x_i$ is the raw hourly electricity consumption data of the user; $x'_i$ is the normalized hourly electricity consumption data; X is the training set; max(X) is the maximum value in the training set; min(X) is the minimum value in the training set.
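Formulas (1) and (2) are not reproduced in this text. A minimal sketch of forms consistent with the definitions above is given below. It assumes that the three-sigma rule caps values above avg(x) + 3·std(x) and that an isolated null value is replaced by the mean of its two neighbours; both are assumptions, not confirmed wording of the patent.

```latex
% Assumed form of formula (1): three-sigma capping of abnormal values
f(x_i) =
\begin{cases}
\mathrm{avg}(x) + 3\,\mathrm{std}(x), & x_i > \mathrm{avg}(x) + 3\,\mathrm{std}(x) \\
x_i, & \text{otherwise}
\end{cases}

% Assumed form of formula (2): linear interpolation of an isolated null value
f(x_i) =
\begin{cases}
\dfrac{x_{i-1} + x_{i+1}}{2}, & x_i \in \mathrm{NaN},\ x_{i-1}, x_{i+1} \notin \mathrm{NaN} \\
x_i, & \text{otherwise}
\end{cases}
```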
In a preferred embodiment, step 2 specifically includes: the electricity fee paid by a power user is:
$$S = \sum_{t} p_t r_t \tag{4}$$
wherein S is the electricity fee of the user, $p_t$ is the electric energy consumed by the user in time period t, and $r_t$ is the electricity price at time t;
to reduce the electricity fee S, the user tampers with the smart meter through illegal operations, thereby affecting $p_t$ and $r_t$; based on existing electricity stealing methods and the literature, the following 6 electricity stealing modes can be simulated:
(1) Reducing the daily load curve at time t by a fixed proportion α:
$$h_1(x_t) = \alpha x_t,\quad \alpha \in (0.2, 0.8) \tag{5}$$
(2) Clipping the daily load curve at time t with a random threshold γ, so that $x_t$ is fixed at γ whenever it exceeds γ:
$$h_2(x_t) = \begin{cases} x_t, & x_t \le \gamma \\ \gamma, & x_t > \gamma \end{cases} \tag{6}$$
(3) Setting the electricity consumption within a random time period $(t_1, t_2)$ to zero:
$$h_3(x_t) = \begin{cases} 0, & t_1 < t < t_2 \\ x_t, & \text{otherwise} \end{cases} \tag{7}$$
(4) Subtracting a random threshold γ from the daily load curve at time t and keeping the result non-negative:
$$h_4(x_t) = \max\{x_t - \gamma, 0\} \tag{8}$$
(5) Because time-of-use peak-valley electricity pricing is in effect, reversing the load curve places the user's consumption peak in low-price periods and thereby reduces the electricity fee:
$$h_5(x_t) = x_{96-t} \tag{9}$$
(6) Replacing the daily load curve with its mean value, which reduces the consumption recorded in peak-price periods and thereby reduces the electricity fee:
$$h_6(x_t) = \mathrm{mean}(X) \tag{10}$$
In a preferred embodiment, step 3 specifically includes: the daily load curve data input to the model are converted by an embedding layer (Embedding) into vectors of the same length but higher dimension; unlike LSTM and RNN, the Transformer must add a positional encoding to the input embedding so that the model can exploit the time-series characteristics of the daily load curve; the input embedding and the positional encoding have the same dimension and are summed, which helps the encoding-decoding module extract and reconstruct the user's normal electricity consumption features; the positional encoding formula is as follows:
where pos is the position of the daily load curve in the time series; i represents the sequence position in the daily load curve; and the dimension of the output vector generated by the embedding layer is $d_{model} = 256$;
the encoding module and the decoding module consist of stacks of N identical encoders and N identical decoders, respectively; each encoder has two sub-layers: the first is a multi-head self-attention mechanism and the second is a fully connected feedforward neural network; a residual connection is used around each sub-layer, followed by layer normalization; each decoder has three sub-layers: in addition to the two sub-layers found in the encoder, the decoder inserts a further sub-layer that applies masked multi-head attention to the output embedding; the masking, together with shifting the output embedding, prevents information leakage by ensuring that position i can depend only on the known outputs at positions smaller than i; the multi-head attention mechanism lets the model attend to different parts of the input sequence X simultaneously and gives it global awareness of the time series, which improves the expressive power of the model and allows it to better handle local and global information in the load data sequence;
multi-head attention mechanism: the multi-head attention mechanism consists of self-attention layers, a concatenation layer and a linear transformation layer; each self-attention layer comprises a dot-product module and a Softmax function module; by integrating several self-attention networks with different parameters, the multi-head attention mechanism mines the dependencies in the load data from different perspectives and, compared with the conventional attention mechanism, represents the data features more accurately; the user's daily load sequence $X = (x_1, x_2, \ldots, x_i)$ is input to the multiple self-attention layers, and the attention value matrix of each layer is calculated; for the i-th self-attention layer, three different linear projection matrices $W_i^Q$, $W_i^K$ and $W_i^V$ are initialized, the input load sample X is mapped to the query $Q_i$, key $K_i$ and value $V_i$ matrices, and the result of each layer is then calculated; the calculation is shown in formula (13):
$$Q_i = X W_i^Q,\quad K_i = X W_i^K,\quad V_i = X W_i^V,\quad H_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d}}\right) V_i \tag{13}$$
wherein $H_i$ is the output of the i-th self-attention layer; d is the dimension of the three linear projection matrices; the softmax() function limits the range of the matrix values;
the results of the self-attention layers are then concatenated; finally, a linear transformation is applied to the concatenated result to output the attention value matrix H, as shown in formula (14):
$$H = \mathrm{Concat}(H_1, H_2, \ldots, H_i)\, W_o \tag{14}$$
wherein Concat() is the concatenation function and $W_o$ is a weight matrix;
the output module maps the output of the encoding-decoding module to a suitable dimension through a linear layer (Linear) and applies a softmax function to generate the corresponding probability distribution, and the data with the highest probability are taken as the reconstructed load curve;
when the reconstruction error of the Transformer model trained on a single power user's load curves becomes stable, the daily load curves of the training set and the test set are each reconstructed, and the difference between each reconstructed curve and the original daily load curve gives the corresponding reconstruction residual curve.
In a preferred embodiment, step 4 specifically includes: assume the training set is $\{x_i\}_{i=1}^{n}$ and $\Phi(\cdot)$ is a mapping function that maps samples into a high-dimensional space; solving the problem is equivalent to:
$$\min_{\omega,\,\xi,\,\rho}\ \frac{1}{2}\lVert\omega\rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \quad \text{s.t.}\quad \omega\cdot\Phi(x_i) \ge \rho - \xi_i,\ \ \xi_i \ge 0 \tag{15}$$
wherein ω is the weight vector; ρ is the bias; $\xi_i$ is a slack variable introduced to prevent overfitting; ν is the penalty hyperparameter; and n is the number of training set samples;
by solving the above problem, the optimal hyperplane parameters are obtained, denoted $\omega^*$ and $\rho^*$; the optimal hyperplane is $\omega^* \cdot \Phi(x) - \rho^* = 0$; for a test sample z, the decision function f(z) is expressed as:
$$f(z) = \mathrm{sgn}\left(\omega^* \cdot \Phi(z) - \rho^*\right) \tag{16}$$
if f(z) is greater than 0, z is judged to be a normal sample, and if f(z) is less than 0, z is judged to be an abnormal sample; the residual curves in the test set are input to the trained OCSVM model to obtain the classification result for each corresponding date, thereby determining whether the user stole electricity on that day.
Compared with the prior art, the invention has the following beneficial effects:
By using the Transformer-OCSVM algorithm, the method can train the electricity stealing time positioning model with only positive samples in the training set, which makes it a semi-supervised algorithm; the third prior-art method adds simulated electricity stealing data to the training data, so its model training requires both positive and negative samples, which makes it a supervised algorithm.
Training data without electricity stealing time labels better matches the needs of practical anti-stealing work. Because electric energy is transient in nature, it is easy to determine whether a user is stealing electricity but hard to determine the period in which the stealing occurred. In practice, specific stealing time labels for electricity stealing users are lacking.
In the method, each electricity stealing user is modeled and analyzed individually, so that the individual electricity consumption pattern of each user is captured and the stealing dates can then be located; the third prior-art method builds one model for multiple users and captures their common consumption pattern; because each user has a unique consumption pattern owing to differences in local policy, time-of-use pricing, industry and so on, the third prior-art method adapts poorly to this situation and easily misjudges normal electricity use as electricity stealing.
In the method, the daily load curve of each user is reconstructed, so the user's daily load data are analyzed at a finer granularity and misjudgments and missed judgments of the stealing time are effectively reduced; the third prior-art method reconstructs a long-term load curve and focuses on the overall trend of the curve rather than on anomalies in each part, so it easily misses electricity stealing periods.
The method converts the electricity stealing time positioning problem into the problem of classifying electricity stealing dates; because the time is determined by classifying daily load curves, the model can be evaluated comprehensively with a confusion matrix; the third prior-art method still judges individual timestamps and has to construct its own indices to evaluate the algorithm.
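The confusion-matrix evaluation mentioned above can be sketched as follows. This is a minimal illustration only: the function name `confusion_metrics`, the label convention (1 for a stealing day, 0 for a normal day) and the choice of accuracy, precision, recall and F1 as reported indices are assumptions, since the text does not name specific metrics.

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived scores for per-day labels.

    Assumed convention (not from the patent): 1 = stealing day, 0 = normal day.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # stealing days caught
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # normal days kept normal
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # misjudged normal days
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # missed stealing days
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / y_true.size
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With the 1:1 ratio of normal to abnormal days in the test set, accuracy is not dominated by either class, and the false-positive and false-negative counts correspond directly to misjudged and missed days.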
Drawings
FIG. 1 is a flow chart of a semi-supervised electricity theft time positioning method according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart of a method for processing power consumer data according to a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram of the Transformer according to a preferred embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application; as used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Referring to FIGS. 1-3, the semi-supervised electricity stealing time positioning method can determine when a user stole electricity by learning only the user's historical normal electricity load data. The flow chart of the proposed technical scheme is shown in FIG. 1. First, the user's historical electricity consumption data are processed. Second, the electricity stealing principle is analyzed and multiple electricity stealing mathematical models are established; simulated electricity stealing samples are constructed only in the test set, and only normal samples are used in the training set. Then, a Transformer model extracts the intrinsic regularities of the user's normal electricity consumption, and residual curves are constructed from the original data and the reconstructed data. Finally, an OCSVM model classifies the residual curves to determine the electricity stealing dates.
Step 1: the historical electricity usage profile data of the user is processed, as shown in particular in fig. 2.
First, the user's electricity consumption data are cut into daily load curves, and daily load curves in which more than 20% of the values are missing are deleted. Abnormal values are then removed with the three-sigma rule according to formula (1), and null values are linearly interpolated according to formula (2). Normalization follows formula (3) and uses the maximum and minimum values of the training set, which preserves the day-to-day magnitude differences in the data and prevents data leakage. Finally, the data set is divided into a training set and a test set at a ratio of 7:3.
Wherein x represents a daily load curve, $x_i$ represents the load value at each point, avg(x) represents the mean of the daily load curve, and std(x) represents the standard deviation of the daily load curve.
Wherein a null value $x_i$ is denoted $x_i \in \mathrm{NaN}$.
$$x'_i = \frac{x_i - \min(X)}{\max(X) - \min(X)} \tag{3}$$
Wherein $x_i$ is the raw hourly electricity consumption data of the user; $x'_i$ is the normalized hourly electricity consumption data; X is the training set; max(X) is the maximum value in the training set; min(X) is the minimum value in the training set.
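The preprocessing of step 1 can be sketched in NumPy as below. This is a hedged illustration, not the patented implementation: the function names, the 96 points per day (inferred from the index in formula (9)), the upper-side capping at avg + 3·std, the use of `np.interp` for interpolation, and the chronological 7:3 split are all assumptions.

```python
import numpy as np

def clip_three_sigma(day):
    """Cap points above avg + 3*std of the day (assumed form of formula (1))."""
    limit = np.nanmean(day) + 3 * np.nanstd(day)
    return np.where(day > limit, limit, day)  # NaN entries stay NaN

def interpolate_nans(day):
    """Fill null values by linear interpolation between known neighbours."""
    idx = np.arange(day.size)
    known = ~np.isnan(day)
    return np.interp(idx, idx[known], day[known])

def preprocess(readings, points_per_day=96, max_missing=0.2, train_ratio=0.7):
    """Return (train, test) arrays of normalized daily load curves."""
    days = readings.reshape(-1, points_per_day)
    # Drop days in which more than 20% of the values are missing.
    keep = np.isnan(days).mean(axis=1) <= max_missing
    days = np.array([interpolate_nans(clip_three_sigma(d)) for d in days[keep]])
    # Chronological 7:3 split (an assumption; the text only gives the ratio).
    n_train = int(train_ratio * len(days))
    train, test = days[:n_train], days[n_train:]
    # Min-max scaling with training-set statistics only, as in formula (3).
    lo, hi = train.min(), train.max()
    return (train - lo) / (hi - lo), (test - lo) / (hi - lo)
```

Scaling the test set with the training minimum and maximum means test values can fall slightly outside [0, 1]; that is the price of keeping test information out of the training statistics.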
Step 2: the electricity stealing principle is analyzed, multiple electricity stealing mathematical models that match actual electricity stealing conditions are established, and electricity stealing data are constructed on randomly selected dates in the test set, so that the ratio of normal data to abnormal data in the test set is 1:1. The electricity fee paid by a power user is:
$$S = \sum_{t} p_t r_t \tag{4}$$
wherein S is the electricity fee of the user, $p_t$ is the electric energy consumed by the user in time period t, and $r_t$ is the electricity price at time t.
To reduce the electricity fee S, the user tampers with the smart meter through illegal operations, thereby affecting $p_t$ and $r_t$. Based on existing electricity stealing methods and the literature, the following 6 electricity stealing modes can be simulated:
1) Reducing the daily load curve at time t by a fixed proportion α:
$$h_1(x_t) = \alpha x_t,\quad \alpha \in (0.2, 0.8) \tag{5}$$
2) Clipping the daily load curve at time t with a random threshold γ, so that $x_t$ is fixed at γ whenever it exceeds γ:
$$h_2(x_t) = \begin{cases} x_t, & x_t \le \gamma \\ \gamma, & x_t > \gamma \end{cases} \tag{6}$$
3) Setting the electricity consumption within a random time period $(t_1, t_2)$ to zero:
$$h_3(x_t) = \begin{cases} 0, & t_1 < t < t_2 \\ x_t, & \text{otherwise} \end{cases} \tag{7}$$
4) Subtracting a random threshold γ from the daily load curve at time t and keeping the result non-negative:
$$h_4(x_t) = \max\{x_t - \gamma, 0\} \tag{8}$$
5) Because time-of-use peak-valley electricity pricing is in effect, reversing the load curve places the user's consumption peak in low-price periods and thereby reduces the electricity fee:
$$h_5(x_t) = x_{96-t} \tag{9}$$
6) Replacing the daily load curve with its mean value, which reduces the consumption recorded in peak-price periods and thereby reduces the electricity fee:
$$h_6(x_t) = \mathrm{mean}(X) \tag{10}$$
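The six simulated electricity stealing modes can be sketched as one NumPy function. This is an illustrative sketch under stated assumptions: the function name `simulate_stealing` is hypothetical, the random threshold γ is drawn uniformly between the day's minimum and maximum (the text only calls it random), the zeroed period treats its endpoints as a half-open slice, and mode 5 is implemented as a plain reversal of the 96-point curve.

```python
import numpy as np

def simulate_stealing(x, mode, rng):
    """Apply one of the six tampering modes to a daily load curve x."""
    x = np.asarray(x, dtype=float)
    if mode == 1:  # formula (5): scale by a fixed proportion alpha
        return rng.uniform(0.2, 0.8) * x
    if mode == 2:  # formula (6): clip values above a random threshold gamma
        gamma = rng.uniform(x.min(), x.max())  # sampling range is an assumption
        return np.minimum(x, gamma)
    if mode == 3:  # formula (7): zero a random time period
        t1, t2 = np.sort(rng.choice(x.size, size=2, replace=False))
        y = x.copy()
        y[t1:t2] = 0.0  # half-open slice used for simplicity
        return y
    if mode == 4:  # formula (8): subtract gamma, keep non-negative
        gamma = rng.uniform(x.min(), x.max())  # sampling range is an assumption
        return np.maximum(x - gamma, 0.0)
    if mode == 5:  # formula (9): reverse the curve in time
        return x[::-1].copy()
    if mode == 6:  # formula (10): replace every point with the daily mean
        return np.full_like(x, x.mean())
    raise ValueError("mode must be an integer from 1 to 6")
```

Modes 5 and 6 leave the day's total consumption unchanged and only move it between price periods, which is why methods that look solely for drops in consumption tend to miss them.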
Step 3: a Transformer model is established as the reconstruction model of the daily load curve, and residual curves are constructed. The Transformer is a sequence-to-sequence model architecture proposed in 2017. It abandons the traditional convolutional and recurrent neural networks and relies entirely on attention mechanisms to draw global dependencies between input and output, automatically capturing the relationships between different positions of the input sequence. It has the advantages of simple structure, high model interpretability and high parallel speed. The overall architecture of the Transformer model, shown in FIG. 3, can be divided into 3 modules: an input module, an encoding-decoding module and an output module.
The daily load curve data input to the model are converted by an embedding layer (Embedding) into vectors of the same length but higher dimension. Unlike LSTM and RNN, the Transformer must add a positional encoding to the input embedding so that the model can exploit the time-series characteristics of the daily load curve. The input embedding and the positional encoding have the same dimension and are summed, which helps the encoding-decoding module extract and reconstruct the user's normal electricity consumption features. The positional encoding formula is as follows:
where pos is the position of the daily load curve in the time series; i represents the sequence position in the daily load curve; and the dimension of the output vector generated by the embedding layer is $d_{model} = 256$.
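Formulas (11) and (12) are not reproduced in this text. The sketch below assumes the standard sinusoidal positional encoding of the original Transformer, in which `pos` indexes the point within the input sequence and `i` indexes pairs of embedding dimensions; that reading, and the function name `positional_encoding`, are assumptions rather than confirmed details of the patent.

```python
import numpy as np

def positional_encoding(seq_len, d_model=256):
    """Sinusoidal positional encoding of shape (seq_len, d_model).

    Assumed standard form:
      PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
      PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    pos = np.arange(seq_len)[:, None]        # sequence positions, column vector
    i = np.arange(d_model // 2)[None, :]     # index of each sin/cos pair
    angle = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)              # even dimensions
    pe[:, 1::2] = np.cos(angle)              # odd dimensions
    return pe
```

Because the encoding has the same dimension as the embedding, it can be added element-wise to the embedded daily load curve, for example `embedded + positional_encoding(96)` for a 96-point day.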
The encoding module and the decoding module consist of stacks of N identical encoders and N identical decoders, respectively. Each encoder has two sub-layers: the first is a multi-head self-attention mechanism and the second is a fully connected feedforward neural network. A residual connection is used around each sub-layer, followed by layer normalization. Each decoder has three sub-layers: in addition to the two sub-layers found in the encoder, the decoder inserts a further sub-layer that applies masked multi-head attention to the output embedding. The masking, together with shifting the output embedding, prevents information leakage by ensuring that position i can depend only on the known outputs at positions smaller than i. The multi-head attention mechanism lets the model attend to different parts of the input sequence X simultaneously and gives it global awareness of the time series, which improves the expressive power of the model and allows it to better handle local and global information in the load data sequence.
Multi-head attention mechanism: the multi-head attention mechanism consists of self-attention layers, a concatenation layer and a linear transformation layer. Each self-attention layer comprises a dot-product module and a Softmax function module. By integrating several self-attention networks with different parameters, the multi-head attention mechanism mines the dependencies in the load data from different perspectives and, compared with the conventional attention mechanism, represents the data features more accurately. The user's daily load sequence $X = (x_1, x_2, \ldots, x_i)$ is input to the multiple self-attention layers, and the attention value matrix of each layer is calculated. For the i-th self-attention layer, three different linear projection matrices $W_i^Q$, $W_i^K$ and $W_i^V$ are initialized, the input load sample X is mapped to the query $Q_i$, key $K_i$ and value $V_i$ matrices, and the result of each layer is then calculated. The calculation is shown in formula (13).
$$Q_i = X W_i^Q,\quad K_i = X W_i^K,\quad V_i = X W_i^V,\quad H_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d}}\right) V_i \tag{13}$$
Wherein $H_i$ is the output of the i-th self-attention layer; d is the dimension of the three linear projection matrices. The softmax() function limits the range of the matrix values.
The results of the self-attention layers are then concatenated. Finally, a linear transformation is applied to the concatenated result to output the attention value matrix H, as shown in formula (14):
$$H = \mathrm{Concat}(H_1, H_2, \ldots, H_i)\, W_o \tag{14}$$
wherein Concat() is the concatenation function and $W_o$ is a weight matrix.
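The multi-head attention computation described above (per-head projection, scaled dot-product with softmax, concatenation, final linear map) can be sketched in NumPy. This is a minimal forward-pass illustration under assumptions: the scaling by the square root of d follows the standard scaled dot-product form, the weights are passed in as plain arrays, and the names `softmax` and `multi_head_attention` are chosen here for illustration.

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o):
    """Forward pass of multi-head self-attention.

    X:             (seq_len, d_model) embedded daily load sequence
    W_q, W_k, W_v: (heads, d_model, d) per-head projection matrices
    W_o:           (heads * d, d_model) output weight matrix
    """
    heads = []
    for Wq, Wk, Wv in zip(W_q, W_k, W_v):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv          # map X to query, key, value
        d = Q.shape[-1]
        scores = softmax(Q @ K.T / np.sqrt(d))    # attention weights per position
        heads.append(scores @ V)                  # output H_i of this head
    return np.concatenate(heads, axis=-1) @ W_o   # Concat(H_1, ..., H_h) W_o
```

Each row of `scores` sums to 1, so every output position is a weighted average of the value vectors at all positions of the day, which is what gives the model its global view of the sequence.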
The output module maps the output of the encoding-decoding module to a suitable dimension through a linear layer (Linear) and applies a softmax function to generate the corresponding probability distribution; the data with the highest probability are taken as the reconstructed load curve.
When the reconstruction error of the Transformer model trained on a single power user's load curves becomes stable, the daily load curves of the training set and the test set are each reconstructed, and the difference between each reconstructed curve and the original daily load curve gives the corresponding reconstruction residual curve.
Step 4: and taking a reconstructed residual curve of a normal daily load curve in the training set as input of an OCSVM (one-class support vectormachine, OCSVM) model, mapping the reconstructed residual curve from an input space to a high-dimensional characteristic space, taking an origin of the high-dimensional characteristic space as a sample negative class, and determining a hyperplane farthest from the origin, so that the reconstructed residual curves of the normal daily load curve are opposite to the origin, thereby constructing an optimal classified hyperplane.
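Assume the training set is $\{x_i\}_{i=1}^{n}$ and $\Phi(\cdot)$ is a mapping function that maps samples into a high-dimensional space; the problem to be solved is equivalent to:
$$\min_{\omega,\,\xi,\,\rho}\ \frac{1}{2}\lVert\omega\rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \qquad \text{s.t.}\quad \omega\cdot\Phi(x_i) \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\ldots,n$$
where $\omega$ is the weight vector; $\rho$ is the bias; $\xi_i$ is a slack variable introduced to prevent overfitting; $\nu$ is the penalty hyperparameter; and $n$ is the number of training samples.
Solving the above problem yields the optimal hyperplane parameters, denoted $\omega^{*}$ and $\rho^{*}$. The optimal hyperplane is $\omega^{*}\cdot\Phi(x) - \rho^{*} = 0$. For a test sample $z$, the decision function $f(z)$ is expressed as:
$$f(z) = \operatorname{sgn}\!\left(\omega^{*}\cdot\Phi(z) - \rho^{*}\right)$$
If $f(z)$ is greater than 0, $z$ is judged to be a normal sample; if $f(z)$ is less than 0, $z$ is judged to be an abnormal sample.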
The residual curves of the test set are input into the trained OCSVM model to obtain the classification result for each date, which determines whether the user stole electricity on that day.
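The residual-plus-OCSVM stage can be illustrated with scikit-learn's `OneClassSVM`. The sketch below is only an assumption-laden stand-in: the load curves are synthetic, the "reconstruction" is simulated as the normal curve plus small noise instead of coming from a trained Transformer, and the RBF kernel, `nu=0.05` and `gamma="scale"` are illustrative choices that the text does not specify.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def residual_curves(original, reconstructed):
    """Reconstruction residual: original daily load curve minus its reconstruction."""
    return original - reconstructed

rng = np.random.default_rng(0)
points = 96  # points per daily load curve

# Training set: normal days only. The reconstruction is simulated (assumption).
train_orig = rng.uniform(0.5, 1.5, size=(200, points))
train_recon = train_orig + rng.normal(0.0, 0.02, size=train_orig.shape)
train_res = residual_curves(train_orig, train_recon)

# Test set: 5 normal days followed by 5 tampered days (recorded load scaled by 0.4).
true_load = rng.uniform(0.5, 1.5, size=(10, points))
recorded = true_load.copy()
recorded[5:] *= 0.4
# Assumed behaviour: the model still reconstructs the user's normal pattern.
test_recon = true_load + rng.normal(0.0, 0.02, size=true_load.shape)
test_res = residual_curves(recorded, test_recon)

model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(train_res)

scores = model.decision_function(test_res)  # positive inside the learned region
labels = model.predict(test_res)            # +1 = normal day, -1 = suspected theft day
for day, (s, y) in enumerate(zip(scores, labels)):
    print(day, "normal" if y == 1 else "suspected theft", round(float(s), 4))
```

Because only normal days are needed for fitting, the classifier can be trained without labeled theft samples; dates whose residual curves fall outside the learned region are the ones flagged.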

Claims (5)

1. A semi-supervised electricity theft time locating method, comprising the steps of:
step 1: processing the user's historical electricity consumption curve data;
step 2: analyzing the principles of electricity theft, establishing several mathematical models of electricity theft that reflect actual theft conditions, and constructing electricity theft data on randomly selected dates in the test set, such that the ratio of normal data to abnormal data in the test set is 1:1;
step 3: establishing a Transformer model as the reconstruction model of the daily load curve, and constructing residual curves; the architecture of the Transformer model is divided into 3 modules: an input module, an encoding-decoding module and an output module;
step 4: taking the reconstruction residual curves of the normal daily load curves in the training set as the input of an OCSVM model, mapping the residual curves from the input space to a high-dimensional feature space, treating the origin of the feature space as the negative class, and determining the hyperplane farthest from the origin such that the reconstruction residual curves of the normal daily load curves all lie on the side of the hyperplane opposite the origin, thereby constructing the optimal classification hyperplane.
2. The semi-supervised electricity theft time localization method of claim 1, wherein step 1 is specifically: first, the user's electricity consumption data are cut into daily load curves, and daily load curves with more than 20% missing values are deleted; outliers are then removed using the three-sigma rule according to formula (1), and null values are filled by linear interpolation according to formula (2); normalization is performed according to formula (3), where using the maximum and minimum values of the training set preserves the day-to-day magnitude differences in the data and prevents data leakage; finally, the data set is divided into a training set and a test set in a ratio of 7:3;
where $x$ represents a daily load curve, $x_i$ represents the load value at each point, avg($x$) represents the mean of the daily load curve, and std($x$) represents the standard deviation of the load curve;
where a null value $x_i$ is denoted $x_i \in \mathrm{NaN}$;
$$x'_i = \frac{x_i - \min(X)}{\max(X) - \min(X)} \qquad (3)$$
where $x_i$ is the user's raw hourly electricity consumption data; $x'_i$ is the normalized hourly electricity consumption data; $X$ is the training set; max($X$) is the maximum value in the training set; and min($X$) is the minimum value in the training set.
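The preprocessing chain of claim 2 can be sketched as follows. Several details are assumptions made for illustration: outliers above avg + 3·std are clipped to that threshold, interior nulls are filled with `numpy.interp` (which holds the nearest known value at the curve's ends), the 7:3 split is taken in chronological order, and the data are synthetic curves with 96 points per day.

```python
import numpy as np

def clean_daily_curve(x, max_missing=0.2):
    """Clean one daily load curve; return None if more than 20% is missing."""
    x = np.asarray(x, dtype=float).copy()
    if np.isnan(x).mean() > max_missing:
        return None
    upper = np.nanmean(x) + 3 * np.nanstd(x)      # three-sigma threshold (assumed form)
    x = np.minimum(x, upper)                      # NaN entries stay NaN here
    idx = np.arange(x.size)
    known = ~np.isnan(x)
    return np.interp(idx, idx[known], x[known])   # linear interpolation of nulls

def min_max_normalize(curves, train_min, train_max):
    """Scale with training-set extrema only, so no test information leaks in."""
    return (curves - train_min) / (train_max - train_min)

rng = np.random.default_rng(0)
days = rng.uniform(0.5, 1.5, size=(10, 96))       # synthetic daily load curves
days[0, 10] = np.nan                              # isolated null value
days[1, 20] = 50.0                                # spike to be clipped
days[2, :30] = np.nan                             # over 20% missing, so dropped

cleaned = [c for c in (clean_daily_curve(d) for d in days) if c is not None]
data = np.stack(cleaned)

split = int(0.7 * len(data))                      # 7:3 split, chronological (assumed)
train, test = data[:split], data[split:]
lo, hi = train.min(), train.max()
train_n = min_max_normalize(train, lo, hi)
test_n = min_max_normalize(test, lo, hi)
print(data.shape, train_n.shape, test_n.shape)
```

Normalizing the test set with the training set's minimum and maximum means test values may fall slightly outside [0, 1], which is expected and is the price of avoiding data leakage.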
3. The semi-supervised electricity theft time localization method according to claim 1, wherein step 2 is specifically: the electricity charge paid by a power user is:
$$S = \sum_{t} p_t\, r_t \qquad (4)$$
where $S$ is the user's electricity charge, $p_t$ is the electrical energy consumed by the user in time period $t$, and $r_t$ is the electricity price at time $t$;
to reduce the electricity charge $S$, a user tampers with the smart meter through illegal operations, thereby affecting $p_t$ and $r_t$; based on existing electricity theft methods and the literature, the following 6 electricity theft modes can be simulated:
(1) Reducing the daily load curve at time $t$ by a fixed proportion $\alpha$:
$$h_1(x_t) = \alpha x_t,\quad \alpha \in (0.2, 0.8) \qquad (5)$$
(2) Capping the daily load curve at time $t$ at a random threshold $\gamma$, so that when $x_t$ is larger than $\gamma$ its value is fixed at $\gamma$:
$$h_2(x_t) = \begin{cases} x_t, & x_t \le \gamma \\ \gamma, & x_t > \gamma \end{cases} \qquad (6)$$
(3) Setting the electricity consumption within a random time period $(t_1, t_2)$ to zero:
$$h_3(x_t) = \begin{cases} 0, & t \in (t_1, t_2) \\ x_t, & \text{otherwise} \end{cases} \qquad (7)$$
(4) Reducing the daily load curve at time $t$ by a random threshold $\gamma$ and taking the non-negative part:
$$h_4(x_t) = \max\{x_t - \gamma,\ 0\} \qquad (8)$$
(5) Because time-of-use peak-valley electricity pricing is in effect, reversing the load curve places the user's consumption peak at times of low electricity price, thereby reducing the electricity charge:
$$h_5(x_t) = x_{96-t} \qquad (9)$$
(6) Taking the mean of the daily load curve, so that consumption at the usage peak is shifted to times of low electricity price, thereby reducing the electricity charge:
$$h_6(x_t) = \mathrm{mean}(X) \qquad (10)$$
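The six theft modes of formulas (5) to (10) translate directly into array operations. In this sketch the function name `simulate_theft`, the sampling ranges for $\gamma$, $t_1$ and $t_2$, and the synthetic load curve are assumptions; the claim only states that these quantities are random. The reversal in mode 5 is written as a plain flip of the 96-point curve, which sidesteps the boundary index in formula (9).

```python
import numpy as np

def simulate_theft(x, mode, rng):
    """Apply one of the six tampering patterns h1..h6 to a daily load curve x."""
    x = np.asarray(x, dtype=float)
    if mode == 1:                                   # (5) scale by a fixed proportion
        return rng.uniform(0.2, 0.8) * x
    if mode == 2:                                   # (6) cap at a random threshold
        gamma = rng.uniform(x.min(), x.max())       # assumed sampling range
        return np.minimum(x, gamma)
    if mode == 3:                                   # (7) zero a random time period
        t1, t2 = np.sort(rng.choice(x.size, size=2, replace=False))
        y = x.copy()
        y[t1:t2] = 0.0                              # half-open slice [t1, t2) (assumed)
        return y
    if mode == 4:                                   # (8) subtract threshold, clip at 0
        gamma = rng.uniform(0.0, x.max())           # assumed sampling range
        return np.maximum(x - gamma, 0.0)
    if mode == 5:                                   # (9) reverse the curve in time
        return x[::-1].copy()
    if mode == 6:                                   # (10) replace by the daily mean
        return np.full_like(x, x.mean())
    raise ValueError(f"unknown theft mode: {mode}")

rng = np.random.default_rng(0)
normal_day = rng.uniform(0.5, 1.5, size=96)         # synthetic 96-point load curve
tampered = {m: simulate_theft(normal_day, m, rng) for m in range(1, 7)}
for m, curve in tampered.items():
    print(m, round(float(curve.sum()), 3))
```

Modes 1 to 4 lower the recorded energy, whereas modes 5 and 6 keep the daily total unchanged and only move consumption between price periods, so a detector cannot rely on the daily total alone.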
4. The semi-supervised electricity theft time localization method according to claim 1, wherein step 3 is specifically: the daily load curve data input into the model are converted by an embedding layer (Embedding) into vectors of the same length and higher dimension; unlike LSTM and RNN, the Transformer needs a position encoding added to the input embedding vectors so that the model can exploit the time-series characteristics of the daily load curve; the input embedding and the position encoding have the same dimension and are summed, which helps the model's encoding-decoding module extract and reconstruct the user's normal electricity consumption characteristics; the encoding formula is as follows:
where pos is the position of the daily load curve in the time series; $i$ represents the sequence position within the daily load curve; and the dimension of the output vectors generated by the embedding layer is $d_{model} = 256$;
the encoding module and the decoding module consist of stacks of N identical encoders and decoders; each encoder has two sub-layers: the first is a multi-head self-attention mechanism and the second is a fully connected feed-forward neural network; a residual connection is used around each sub-layer, followed by layer normalization; each decoder has three sub-layers: in addition to the two sub-layers found in the encoder, the decoder inserts a further sub-layer that applies a masked multi-head attention mechanism to the output embedding; the masking, together with the shifting of the output embedding, prevents information leakage, so that position $i$ can depend only on the known outputs at positions smaller than $i$; the multi-head attention mechanism enables the model to attend to different parts of the input sequence $X$ simultaneously and gives it global awareness of the time series, which improves the model's expressive power and allows it to better process local and global information in the photovoltaic power data sequence;
multi-head attention mechanism: the multi-head attention mechanism consists of self-attention layers, a concatenation layer and a linear transformation layer; each self-attention layer comprises a dot-product module and a Softmax function module; by integrating several self-attention networks with different parameters, the multi-head attention mechanism mines the dependencies in the load data from different angles and can represent the data features more accurately than a conventional attention mechanism; the user's daily load sequence $X = (x_1, x_2, \ldots, x_i)$ is input into the self-attention layers, and an attention value matrix is calculated for each layer; for the $i$-th self-attention layer, three different linear projection matrices $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ are initialized, the input load sample $X$ is mapped to the query $Q_i$, key $K_i$ and value $V_i$ matrices, and the result of each layer is then calculated; the calculation process is shown in formula (13):
$$Q_i = X W_i^{Q},\quad K_i = X W_i^{K},\quad V_i = X W_i^{V},\quad H_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d}}\right) V_i \qquad (13)$$
where $H_i$ is the output of the $i$-th self-attention layer; $d$ is the dimension of the three linear projection matrices; and the softmax() function limits the range of the matrix values;
the results of the self-attention layers are then concatenated; finally, a linear transformation is applied to the concatenated result to output the attention value matrix $H$, as shown in formula (14):
$$H = \mathrm{Concat}(H_1, H_2, \ldots, H_i)\, W_o \qquad (14)$$
where Concat() is the concatenation function and $W_o$ is a weight matrix;
in the output module, the output of the encoding-decoding module is mapped to the appropriate dimension by a linear layer (Linear) and passed through a softmax function to generate the corresponding probability distribution, and the data with the highest probability are taken as the reconstructed load curve;
once the reconstruction error of the Transformer model trained on a single power user's load curves becomes stable, the daily load curves of the training set and the test set are each reconstructed, and the difference between each reconstructed curve and the original daily load curve is taken to obtain the corresponding reconstruction residual curve.
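The position encoding step of claim 4 can be sketched as follows, assuming the standard sinusoidal encoding commonly used with Transformer models, in which even embedding dimensions take sin(pos / 10000^(2i/d_model)) and odd dimensions take the matching cosine. The function name `positional_encoding` and the random stand-in for the embedded curve are hypothetical; the 96 positions and the dimension of 256 follow the values given in this document.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding of shape (seq_len, d_model); d_model must be even."""
    pos = np.arange(seq_len)[:, None]               # positions 0 .. seq_len-1
    two_i = np.arange(0, d_model, 2)[None, :]       # even dimension indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                     # even dimensions
    pe[:, 1::2] = np.cos(angle)                     # odd dimensions
    return pe

rng = np.random.default_rng(0)
embedded = rng.normal(size=(96, 256))               # stand-in for the embedding layer output
pe = positional_encoding(96, 256)
model_input = embedded + pe                         # same dimension, so the two are summed
print(model_input.shape)  # (96, 256)
```

Because the encoding has the same shape as the embedding, the sum gives every point of the daily load curve a position-dependent signature without changing the model dimension.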
5. The semi-supervised electricity theft time localization method according to claim 1, wherein step 4 is specifically: assume the training set is $\{x_i\}_{i=1}^{n}$ and $\Phi(\cdot)$ is a mapping function that maps samples into a high-dimensional space; the problem to be solved is equivalent to:
$$\min_{\omega,\,\xi,\,\rho}\ \frac{1}{2}\lVert\omega\rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \qquad \text{s.t.}\quad \omega\cdot\Phi(x_i) \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\ldots,n$$
where $\omega$ is the weight vector; $\rho$ is the bias; $\xi_i$ is a slack variable introduced to prevent overfitting; $\nu$ is the penalty hyperparameter; and $n$ is the number of training samples;
solving the above problem yields the optimal hyperplane parameters, denoted $\omega^{*}$ and $\rho^{*}$; the optimal hyperplane is $\omega^{*}\cdot\Phi(x) - \rho^{*} = 0$; for a test sample $z$, the decision function $f(z)$ is expressed as:
$$f(z) = \operatorname{sgn}\!\left(\omega^{*}\cdot\Phi(z) - \rho^{*}\right)$$
if $f(z)$ is greater than 0, $z$ is judged to be a normal sample, and if $f(z)$ is less than 0, $z$ is judged to be an abnormal sample; the residual curves of the test set are input into the trained OCSVM model to obtain the classification result for each date, which determines whether the user stole electricity on that day.
CN202311388066.8A 2023-10-25 2023-10-25 Semi-supervised electricity stealing time positioning method Pending CN117452063A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311388066.8A CN117452063A (en) 2023-10-25 2023-10-25 Semi-supervised electricity stealing time positioning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311388066.8A CN117452063A (en) 2023-10-25 2023-10-25 Semi-supervised electricity stealing time positioning method

Publications (1)

Publication Number Publication Date
CN117452063A true CN117452063A (en) 2024-01-26

Family

ID=89595905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311388066.8A Pending CN117452063A (en) 2023-10-25 2023-10-25 Semi-supervised electricity stealing time positioning method

Country Status (1)

Country Link
CN (1) CN117452063A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118194055A (en) * 2024-05-14 2024-06-14 国网江西省电力有限公司信息通信分公司 Charging pile power curve matching method
CN118194141A (en) * 2024-05-17 2024-06-14 国网安徽省电力有限公司营销服务中心 Power consumption behavior discriminating method and system

Similar Documents

Publication Publication Date Title
Buzau et al. Hybrid deep neural networks for detection of non-technical losses in electricity smart meters
Zheng et al. Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids
Wu et al. An integrated ensemble learning model for imbalanced fault diagnostics and prognostics
Huang et al. Electricity theft detection based on stacked sparse denoising autoencoder
CN117452063A (en) Semi-supervised electricity stealing time positioning method
CN112098714A (en) ResNet-LSTM-based electricity stealing detection method and system
CN116679211B (en) Lithium battery health state prediction method
CN112308124B (en) Intelligent electricity larceny prevention method for electricity consumption information acquisition system
CN110675020A (en) High-price low-access user identification method based on big data
Zhao et al. Systemic financial risk prediction using least squares support vector machines
CN115329839A (en) Electricity stealing user identification and electricity stealing amount prediction method based on convolution self-encoder and improved regression algorithm
CN115618248A (en) Load abnormity identification method and device for public building, storage medium and equipment
CN116881639A (en) Electricity larceny data synthesis method based on generation countermeasure network
Wang et al. Evsense: A robust and scalable approach to non-intrusive ev charging detection
CN118152857A (en) Power consumption abnormality detection method and device and computer readable storage medium
CN117131022B (en) Heterogeneous data migration method of electric power information system
CN116500335B (en) Smart power grid electricity larceny detection method and system based on one-dimensional features and two-dimensional features
CN116776209A (en) Method, system, equipment and medium for identifying operation state of gateway metering device
CN117331017A (en) Method and system for studying and judging misconnection of three-phase four-wire electric energy meter
Malinowski et al. Using smart meters to learn water customer behavior
CN116543198A (en) Smart electric meter fault classification method based on multi-granularity neighbor graphs
CN115508765A (en) Online self-diagnosis method and system for voltage transformer acquisition device
Emadaleslami et al. A Machine Learning Approach to Detect Energy Fraud in Smart Distribution Network
CN114595952A (en) Electricity stealing behavior detection method based on attention network improved convolutional neural network
CN112256735B (en) Power consumption monitoring method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination