Disclosure of Invention
The invention provides a method for identifying electricity-stealing users based on a support vector machine. It overcomes the shortcomings of the prior art and effectively addresses the large identification error and low efficiency of traditional methods for identifying electricity-stealing users.
The technical scheme of the invention is realized by the following measures: a method for identifying electricity-stealing users based on a support vector machine, comprising the following steps:
First step: Using historical data from the electricity consumption information acquisition system, obtain the customer's monthly electricity consumption $W_{ij}$, the line loss rate $\eta$ of the station area where the customer is located, the meter cover opening event $O_n$, the programming event $P_n$, the month-over-month growth rate of monthly consumption $\alpha_{ij}$, and the year-over-year growth rate of monthly consumption $\beta_{ij}$;
Second step: Calculate the average value, standard deviation and extreme value of the month-over-month growth rate $\alpha_{ij}$ and of the year-over-year growth rate $\beta_{ij}$, giving 9 index variables in total, denoted $[x_1, \ldots, x_9]$. The 9 index variables are the average value, standard deviation and extreme value of $\alpha_{ij}$; the average value, standard deviation and extreme value of $\beta_{ij}$; the line loss rate $\eta$ of the station area where the customer is located; the meter cover opening event $O_n$; and the programming event $P_n$;
Third step: Create the training sample set $[a_i, y_i]$, where $i = 1, \ldots, n$, $a_i = [x_1, \ldots, x_9]$, $a_i \in R^9$, $y_i = 1$ denotes a normal customer and $y_i = -1$ denotes an electricity-stealing customer;
Fourth step: Normalize the training sample set $[a_i, y_i]$, then train a support vector machine model with a Gaussian kernel function on the normalized training samples to obtain the classification function $f(x)$;
Fifth step: Use the classification function $f(x)$ to classify samples whose classification result is unknown: if $f(x) = 1$, the sample is a normal user; if $f(x) = -1$, the sample is an electricity-stealing user.
The following are further optimizations and/or improvements to the above-described inventive solution:
In the fourth step, the training sample set $[a_i, y_i]$ is normalized as follows:
(1) Calculate the mean vector $\mu = [\mu_1, \ldots, \mu_9]$ and the standard deviation vector $\sigma = [\sigma_1, \ldots, \sigma_9]$ of the training sample set $[a_i, y_i]$;
(2) Normalize the training samples by formula (1):
$$\hat{a}_{ij} = \frac{a_{ij} - \mu_j}{\sigma_j} \tag{1}$$
where $i = 1, \ldots, n$ and $j = 1, \ldots, 9$;
(3) Obtain the row vector of each normalized training sample, $\hat{a}_i = [\hat{a}_{i1}, \ldots, \hat{a}_{i9}]$.
In the fourth step, the classification function is obtained as follows:
(1) Calculate the support vectors $s_i$ ($i \in I$), the weight coefficients $\zeta_i$ ($i \in I$) and the constant term $b$ from the Lagrangian dual function;
(2) Establish the optimal decision function from the calculation results to obtain the classification function, shown in formula (2):
$$f(x) = \mathrm{sgn}\left(\sum_{i \in I} \zeta_i K(s_i, x) + b\right) \tag{2}$$
where $x$ is the normalized sample to be classified and $K(s_i, x)$ is the Gaussian kernel function.
In the first step, the month-over-month growth rate $\alpha_{ij}$ and the year-over-year growth rate $\beta_{ij}$ of monthly consumption are generated from the customer's monthly electricity consumption $W_{ij}$, as shown in formulas (3) and (4) respectively:
$$\alpha_{ij} = \frac{W_{ij} - W_{i(j-1)}}{W_{i(j-1)}} \tag{3}$$
$$\beta_{ij} = \frac{W_{ij} - W_{(i-1)j}}{W_{(i-1)j}} \tag{4}$$
where $i$ denotes the year and $j$ denotes the month.
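As an illustration of how the two growth rates could be computed from monthly consumption, the sketch below takes the relative change against the previous month (month-over-month) and against the same month of the previous year (year-over-year). The function name `growth_rates`, the dictionary layout, the January-to-December wrap-around and the skipping of missing or zero baselines are assumptions made for this example and are not specified in the text.

```python
def growth_rates(W):
    """Compute growth rates from monthly consumption.

    W maps (year, month) -> monthly consumption (hypothetical layout).
    Returns two dicts keyed by (year, month): alpha (month-over-month)
    and beta (year-over-year).
    """
    alpha, beta = {}, {}
    for (i, j), w in W.items():
        # Assumption: January is compared with December of the previous year.
        prev_month = (i, j - 1) if j > 1 else (i - 1, 12)
        if prev_month in W and W[prev_month] != 0:
            alpha[(i, j)] = (w - W[prev_month]) / W[prev_month]
        # Same month of the previous year.
        prev_year = (i - 1, j)
        if prev_year in W and W[prev_year] != 0:
            beta[(i, j)] = (w - W[prev_year]) / W[prev_year]
    return alpha, beta


# Made-up consumption values, for illustration only.
W = {(2022, 12): 100.0, (2023, 1): 110.0, (2023, 2): 99.0, (2024, 1): 121.0}
alpha, beta = growth_rates(W)
```

Months without a usable baseline are simply left out of the result here; a real deployment would need its own rule for such gaps.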
The invention first calculates and analyzes historical data in the existing electricity consumption information acquisition system to provide basic data for subsequent sample training. A training sample set comprising at least 400 normal electricity consumption records and at least 400 electricity-stealing records is then established, and 9 quantitative features are built for each training sample from the basic data. The training samples are then used to obtain a classification function: the given training sample set serves as the input space of a support vector machine model with a Gaussian kernel function, a real-valued function $g(x)$ is sought in that space, and the classification function $f(x) = \mathrm{sgn}(g(x))$ is obtained. Samples with unknown classification results are then classified by this function. The invention can therefore train a classification function based on support vector machine technology from the basic data of historical electricity consumption information and establish an accurate classification standard, so that unclassified customers are judged accurately, the number of on-site checks by electricity inspectors is reduced, electricity theft is targeted, the losses of power supply enterprises are reduced, and manpower, financial and material costs are saved.
Detailed Description
The present invention is not limited by the following examples, and specific embodiments can be determined according to the technical scheme and practical situations of the present invention.
The invention is further described below with reference to examples and figures:
Example 1: As shown in Figure 1, the method for identifying electricity-stealing users based on a support vector machine comprises the following steps:
First step: Using historical data from the electricity consumption information acquisition system, obtain the customer's monthly electricity consumption $W_{ij}$, the line loss rate $\eta$ of the station area where the customer is located, the meter cover opening event $O_n$, the programming event $P_n$, the month-over-month growth rate of monthly consumption $\alpha_{ij}$, and the year-over-year growth rate of monthly consumption $\beta_{ij}$;
Second step: Calculate the average value, standard deviation and extreme value of the month-over-month growth rate $\alpha_{ij}$ and of the year-over-year growth rate $\beta_{ij}$, giving 9 index variables in total, denoted $[x_1, \ldots, x_9]$. The 9 index variables are the average value, standard deviation and extreme value of $\alpha_{ij}$; the average value, standard deviation and extreme value of $\beta_{ij}$; the line loss rate $\eta$ of the station area where the customer is located; the meter cover opening event $O_n$; and the programming event $P_n$;
Third step: Create the training sample set $[a_i, y_i]$, where $i = 1, \ldots, n$, $a_i = [x_1, \ldots, x_9]$, $a_i \in R^9$, $y_i = 1$ denotes a normal customer and $y_i = -1$ denotes an electricity-stealing customer;
Fourth step: Normalize the training sample set $[a_i, y_i]$, then train a support vector machine model with a Gaussian kernel function on the normalized training samples to obtain the classification function $f(x)$;
Fifth step: Use the classification function $f(x)$ to classify samples whose classification result is unknown: if $f(x) = 1$, the sample is a normal user; if $f(x) = -1$, the sample is an electricity-stealing user.
In the first step, historical data (historical electricity consumption data) in the existing electricity consumption information acquisition system provide the data basis for subsequent learning and training; at least 3 years of historical electricity consumption data must be collected. In the second step, the average value, standard deviation and extreme value of the month-over-month growth rate $\alpha_{ij}$ and of the year-over-year growth rate $\beta_{ij}$ are calculated, where the extreme value is the maximum absolute value of $\alpha_{ij}$ and of $\beta_{ij}$ respectively. In the third step, the training sample set $[a_i, y_i]$ is established from samples that have already been classified. The training sample set must include at least 400 normal electricity consumption records and at least 400 electricity-stealing records. If there are 400 electricity-stealing records, they must include 80 records of theft by changing the wiring of the electric energy meter, 100 records of theft by damaging the internal circuits or components of the electric energy meter, 30 records of theft by damaging the internal program of the electric energy meter, 50 records of theft by interfering with the metering module using an external signal, and 140 records of theft by bypassing the metering module.
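The construction of the 9 index variables for one customer can be sketched as follows. This is only an illustrative sketch: the function name `index_variables` is hypothetical, the population standard deviation is an assumed choice (the text does not say which estimator is used), and the two meter events are represented as event counts, which is also an assumption.

```python
import statistics


def index_variables(alpha, beta, eta, cover_open_events, programming_events):
    """Build the 9-element feature vector [x1, ..., x9] for one customer.

    alpha, beta: lists of month-over-month and year-over-year growth rates.
    eta: line loss rate of the customer's station area.
    cover_open_events, programming_events: assumed to be event counts.
    """
    def summarize(values):
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)          # assumption: population std
        extreme = max(abs(v) for v in values)    # maximum absolute value
        return [mean, std, extreme]

    return (summarize(alpha) + summarize(beta)
            + [eta, cover_open_events, programming_events])


# Made-up inputs, for illustration only.
x = index_variables([0.1, -0.3, 0.2], [0.0, 0.4], 0.05, 2, 0)
```

Each labeled customer then contributes one pair of such a vector and its label (1 for normal, -1 for electricity-stealing) to the training sample set.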
The invention first calculates and analyzes historical data in the existing electricity consumption information acquisition system to provide basic data for subsequent sample training. A training sample set comprising at least 400 normal electricity consumption records and at least 400 electricity-stealing records is then established, and 9 quantitative features are built for each training sample from the basic data. The training samples are then used to obtain a classification function: the given training sample set serves as the input space of a support vector machine model with a Gaussian kernel function, a real-valued function $g(x)$ is sought in that space, and the classification function $f(x) = \mathrm{sgn}(g(x))$ is obtained. Samples with unknown classification results are then classified by this function. The invention can therefore train a classification function based on support vector machine technology from the basic data of historical electricity consumption information and establish an accurate classification standard, so that unclassified customers are judged accurately, the number of on-site checks by electricity inspectors is reduced, electricity theft is targeted, the losses of power supply enterprises are reduced, and manpower, financial and material costs are saved.
The following are further optimizations and/or improvements to the above-described inventive solution:
As shown in Figures 1 and 2, in the fourth step the training sample set $[a_i, y_i]$ is normalized as follows:
(1) Calculate the mean vector $\mu = [\mu_1, \ldots, \mu_9]$ and the standard deviation vector $\sigma = [\sigma_1, \ldots, \sigma_9]$ of the training sample set $[a_i, y_i]$;
(2) Normalize the training samples by formula (1):
$$\hat{a}_{ij} = \frac{a_{ij} - \mu_j}{\sigma_j} \tag{1}$$
where $i = 1, \ldots, n$ and $j = 1, \ldots, 9$;
(3) Obtain the row vector of each normalized training sample, $\hat{a}_i = [\hat{a}_{i1}, \ldots, \hat{a}_{i9}]$.
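The normalization steps above amount to a per-feature standardization (subtract the feature mean, divide by the feature standard deviation). A minimal sketch is given below, using 3 features instead of 9 for brevity. The function names are hypothetical, the population standard deviation is an assumed choice, and mapping a constant feature (zero standard deviation) to 0 is an assumption added to avoid division by zero.

```python
import statistics


def fit_normalizer(samples):
    """Return the per-feature mean vector mu and standard deviation vector sigma."""
    columns = list(zip(*samples))
    mu = [statistics.fmean(col) for col in columns]
    sigma = [statistics.pstdev(col) for col in columns]  # assumption: population std
    return mu, sigma


def normalize(sample, mu, sigma):
    """Standardize one sample feature by feature.

    Assumption: a feature with zero standard deviation is mapped to 0.0.
    """
    return [(a - m) / s if s > 0 else 0.0
            for a, m, s in zip(sample, mu, sigma)]


# Two made-up samples with 3 features each, for illustration only.
samples = [[1.0, 10.0, 5.0], [3.0, 30.0, 5.0]]
mu, sigma = fit_normalizer(samples)
normalized = [normalize(a, mu, sigma) for a in samples]
```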
The training sample set comprises at least 400 normal electricity consumption records and at least 400 electricity-stealing records. If there are 400 electricity-stealing records, they must include 80 records of theft by changing the wiring of the electric energy meter, 100 records of theft by damaging and altering the internal circuits or components of the electric energy meter, 30 records of theft by damaging the internal program of the electric energy meter, 50 records of theft by interfering with the metering module using an external signal, and 140 records of theft by bypassing the metering module.
As shown in Figures 1 and 2, in the fourth step the classification function is obtained as follows:
(1) Calculate the support vectors $s_i$ ($i \in I$), the weight coefficients $\zeta_i$ ($i \in I$) and the constant term $b$ from the Lagrangian dual function;
(2) Establish the optimal decision function from the calculation results to obtain the classification function, shown in formula (2):
$$f(x) = \mathrm{sgn}\left(\sum_{i \in I} \zeta_i K(s_i, x) + b\right) \tag{2}$$
where $x$ is the normalized sample to be classified and $K(s_i, x)$ is the Gaussian kernel function.
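To make the role of the support vectors, weight coefficients and constant term concrete, the sketch below evaluates a kernel-weighted sum over the support vectors and takes its sign. It assumes the common Gaussian kernel form $\exp(-\gamma \lVert u - v \rVert^2)$ with a width parameter `gamma`; the text does not state how the kernel is parameterized, so `gamma`, the function names, the toy support vectors and the convention that a decision value of exactly 0 maps to 1 are all assumptions.

```python
import math


def gaussian_kernel(u, v, gamma):
    """Gaussian (RBF) kernel; gamma is an assumed width parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, zeta, b, gamma):
    """Real-valued g(x): kernel-weighted sum over support vectors plus b."""
    return sum(z * gaussian_kernel(s, x, gamma)
               for s, z in zip(support_vectors, zeta)) + b


def classify(x, support_vectors, zeta, b, gamma):
    """f(x) = sgn(g(x)); assumption: g(x) == 0 is treated as 1."""
    return 1 if decision_value(x, support_vectors, zeta, b, gamma) >= 0 else -1


# Toy two-feature model with made-up parameters, for illustration only.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
zeta = [1.0, -1.0]
label_a = classify([0.9, 1.1], support_vectors, zeta, b=0.0, gamma=0.5)
label_b = classify([-1.2, -0.8], support_vectors, zeta, b=0.0, gamma=0.5)
```

In this toy model a sample near the first support vector is labeled 1 (normal) and a sample near the second is labeled -1 (electricity-stealing).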
The support vectors $s_i$ ($i \in I$), the weight coefficients $\zeta_i$ ($i \in I$) and the constant term $b$ are calculated from the Lagrangian dual function as follows:
(1) Substitute the training samples in the training sample set $[a_i, y_i]$ into the Lagrangian dual function to obtain an optimization problem, and solve it for the optimal solution $\alpha = [\alpha_1, \ldots, \alpha_{800}]$;
(2) From $\zeta_i = \alpha_i y_i$, obtain the weight coefficients $\zeta = [\zeta_1, \ldots, \zeta_{800}]$; the support vectors $s_i$ ($i \in I$) are the training samples whose components $\alpha_i$ are non-zero;
(3) Select a non-zero component $\alpha^*$ of $\alpha$; its corresponding classification result is $y^*$ and its corresponding training sample is $a^*$;
(4) Obtain the constant term $b$ from $b = y^* - \sum_{i \in I} \zeta_i K(s_i, a^*)$.
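For reference, a commonly used form of the dual optimization problem for a kernel support vector machine is written out below. This is the standard soft-margin formulation, given as an assumption: the penalty parameter $C$ and the use of normalized samples $\hat{a}_i$ inside the kernel are not stated in the text, and the original formula may differ in detail.

```latex
% Standard soft-margin SVM dual (assumed form; C is a hypothetical penalty parameter)
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, K(\hat{a}_i, \hat{a}_j)
\quad \text{s.t.} \quad
\sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C .

% Weight coefficients and constant term recovered from the optimal alpha,
% using a component alpha^* with 0 < alpha^* < C:
\zeta_i = \alpha_i y_i, \qquad
b = y^{*} - \sum_{i \in I} \zeta_i \, K(s_i, a^{*}) .
```

Under this formulation, only the samples with non-zero $\alpha_i$ contribute to the decision function, which is why they are called support vectors.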
The process of classifying a sample with an unknown classification result using formula (2) is as follows:
(1) Substitute the sample to be classified, $x$, into formula (2) and calculate $f(x)$;
(2) Determine the result as follows: if $f(x) = 1$, the sample is a normal user; if $f(x) = -1$, the sample is an electricity-stealing user.
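A sketch of the judgment for one unclassified customer is given below. It assumes that the unknown sample is normalized with the mean and standard deviation vectors computed from the training set before being passed to the trained classifier; the text does not spell this out. The function `judge_user` and the stand-in classifier are hypothetical and only illustrate the mapping from the classifier output to the two outcomes.

```python
def judge_user(raw_sample, mu, sigma, classifier):
    """Normalize a raw feature vector and map the classifier output to a verdict.

    classifier: any callable returning 1 (normal) or -1 (electricity-stealing);
    here it stands in for the trained classification function.
    """
    # Assumption: reuse the training-set mu and sigma; zero sigma maps to 0.0.
    x = [(a - m) / s if s > 0 else 0.0
         for a, m, s in zip(raw_sample, mu, sigma)]
    return "normal user" if classifier(x) == 1 else "electricity-stealing user"


def stand_in_classifier(x):
    """Hypothetical stand-in for the trained classification function."""
    return 1 if x[0] + x[1] >= 0 else -1


# Made-up training statistics and samples, for illustration only.
mu, sigma = [2.0, 20.0], [1.0, 10.0]
verdict_a = judge_user([3.0, 25.0], mu, sigma, stand_in_classifier)
verdict_b = judge_user([0.0, 5.0], mu, sigma, stand_in_classifier)
```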
As shown in Figures 1 and 2, in the first step the month-over-month growth rate $\alpha_{ij}$ and the year-over-year growth rate $\beta_{ij}$ of monthly consumption are generated from the customer's monthly electricity consumption $W_{ij}$, as shown in formulas (3) and (4) respectively:
$$\alpha_{ij} = \frac{W_{ij} - W_{i(j-1)}}{W_{i(j-1)}} \tag{3}$$
$$\beta_{ij} = \frac{W_{ij} - W_{(i-1)j}}{W_{(i-1)j}} \tag{4}$$
where $i$ denotes the year and $j$ denotes the month.
The above technical features constitute the preferred embodiment of the invention, which has strong adaptability and the best implementation effect. Non-essential technical features may be added or removed according to actual needs to suit different situations.