CN115730962A - Big data-based electric power marketing inspection analysis system and method - Google Patents

Big data-based electric power marketing inspection analysis system and method

Publication number
CN115730962A
Authority
CN
China
Prior art keywords
data
inspection
power
analysis
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211504926.5A
Other languages
Chinese (zh)
Inventor
刘亦驰
吴方权
汤成佳
李雄
胡骏涵
杨松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Power Grid Co Ltd
Original Assignee
Guizhou Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Power Grid Co Ltd
Priority to CN202211504926.5A
Publication of CN115730962A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a big data-based electric power marketing inspection analysis system and method. The system comprises a cross-system data division module, an inspection special-topic comprehensive management module, a special-topic anomaly inspection module, a precise anomaly inspection model module, and an inspection work and summary analysis module. The cross-system data division module divides electric power marketing data. The inspection special-topic comprehensive management module starts from the data of the marketing inspection anomaly library and combines metering automation data with marketing historical data to build a special-topic library of full data retrieval and indexes. The special-topic anomaly inspection module inspects the quality of special-topic data. The precise anomaly inspection model module constructs anomaly data indexes and selects different algorithms for model construction and training. By carrying out electric power inspection work effectively, the system and method help standardize marketing behavior, tap potential and increase efficiency, improve the execution of marketing policy, and reduce marketing errors.

Description

Big data-based electric power marketing inspection analysis system and method
Technical Field
The invention relates to a big data-based electric power marketing inspection analysis system and method, and belongs to the technical field of electric power marketing inspection analysis.
Background
With rapid social and economic development, the volume of electricity marketing business grows daily and business operations become more frequent. Marketing and management risks such as staff working errors, external breach-of-contract electricity use, and electricity theft are increasingly prominent, and the situation facing electricity marketing work is increasingly complex and severe. Traditional electric power marketing inspection relies mainly on manual work, with data analysis as an auxiliary means. Inspectors must go on site for both information collection and subsequent information processing, and the data analysis techniques in use are relatively backward, which limits the efficiency of data analysis. Moreover, many marketing inspection problems are difficult to discover in time through data analysis; staff must rely on their own professional knowledge and experience to analyze on-site conditions, find problems, and urge the responsible parties to rectify them. As a result, technical bottlenecks such as a low degree of intelligence in risk prediction and untimely automatic early warning remain, and management and control suffer from low efficiency and poor effect.
Therefore, marketing inspection management needs to be further strengthened: taking closed-loop management and control of marketing business risks as the lever and automatic system statistics and analysis as the support, the inspection mechanism should be improved, intensive inspection management established, data inspection enhanced, and inspection efficiency raised.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a big data-based electric power marketing inspection analysis system and method that solve the technical problems in the prior art.
The technical scheme adopted by the invention is as follows: a big data-based electric power marketing inspection analysis system comprises a cross-system data division module, an inspection special-topic comprehensive management module, a special-topic anomaly inspection module, a precise anomaly inspection model module, and an inspection work and summary analysis module. The cross-system data division module divides electric power marketing data. The inspection special-topic comprehensive management module starts from the data of the marketing inspection anomaly library and combines metering automation data with marketing historical data to build a special-topic library of full data retrieval and indexes. The special-topic anomaly inspection module inspects the quality of special-topic data. The precise anomaly inspection model module constructs anomaly data indexes and selects different algorithms for model construction and training. The inspection work and summary analysis module analyzes and compiles statistics on the quality management of inspection data.
An analysis method of a big data-based electric power marketing inspection analysis system comprises the following steps:
step 1, data extraction: the data come from a metering automation system and a marketing business application system, and the data extraction model uses a message queue and a stream processing method;
step 2, data preprocessing: the data extracted in step 1 are preprocessed by data screening, data cleaning and data conversion;
step 3, selecting characteristic indexes;
step 4, establishing an inspection model: the scores of a logistic regression algorithm and a support vector machine are fused by a voting method to obtain the inspection model;
step 5, obtaining the model output result through training.
The data screening method in step 2 is as follows: abnormal values in the input time-series data are detected using a wavelet multi-scale analysis method, and, when electricity consumption curve data from the metering automation system are analyzed, days on which more than 80% of the data points are missing are screened out.
The data cleaning method in the step 2 comprises the following steps:
user information cleaning: read the user file information from the marketing business application system and exclude records whose important business fields are empty or filled in irregularly;
meter reading cleaning: meter readings must be continuous over consecutive time points; exclude anomalies such as sudden increases, sudden decreases and decimal point shifts;
load curve data cleaning: check the power curve against the daily electricity consumption, check the voltage and current curves against the power curve, and exclude logic errors;
electricity consumption detail data cleaning: filter out records in which the basic data of voltage, current, power or power factor are NULL or empty.
Data cleaning is carried out in different ways according to the user type, using the following four rules:
1) Filter out records whose operating capacity or comprehensive multiplying power is 0, NULL, empty or negative;
2) Filter out records whose user daily electricity consumption is NULL, empty or negative;
3) Filter out records with a power factor greater than 1;
4) Filter out records whose voltage, current or power is NULL or empty.
The data conversion method in step 2 is as follows: non-numerical data are converted to numerical form, and raw data with different dimensions are made dimensionless and normalized.
The selected characteristic indexes include: a voltage abnormality index, an electric quantity trend decline index, a power and current correlation index, a metering reverse polarity index, a power factor correlation index and a current imbalance correlation index;
(1) The voltage abnormality index takes the proportion of abnormal points within a period as a characteristic index of the model; the quantization formula is:
$$R = \frac{K}{Q}$$
where R is the proportion of abnormal points, K is the number of abnormal points, and Q is the number of valid data points;
(2) The quantitative formula of the electric quantity trend decline index is as follows:
Figure BDA0003967838590000041
where $k_1$ is the downward-trend index of the current day, $f_i$ is the electricity consumption of the current day, $f_l$ is the electricity consumption of the days before and after, $\alpha_i$ is the weight, and $d$ is the number of days before and after;
characteristic quantities are extracted by a data mining method, and the downward trend of electricity consumption is analyzed from the daily electricity curve using a downward-trend method;
(3) The power and current correlation index uses a linear regression function:
P = f(|Ia| + |Ib| + |Ic|)
where P is the instantaneous active power, Ia, Ib and Ic are the three phase currents, and f is the mapping function between the three-phase current and the instantaneous active power, whose coefficients are regression coefficients obtained by the least squares method;
(4) Metering reverse polarity index
The proportion of abnormal points within a period is used as a characteristic index of the model; the quantization formula is:
$$R = \frac{K}{Q}$$
where R is the proportion of abnormal points, K is the number of abnormal points, and Q is the number of valid data points;
(5) The power factor correlation index quantitatively analyzes the power factor at three levels: curve-point power factor, daily frozen power factor and monthly power factor. For users metered in three-phase three-wire or three-phase four-wire mode, monthly power factor curve data and current curve data are analyzed. The analysis covers: the daily and monthly power factor fluctuation rate; the daily power factor fluctuation rate; the correlation between the power factor curve and the current; and elimination of low-current interference. The specific analysis and quantization process is as follows:
1) The power factor fluctuation rate represents the degree of dispersion of the power factor, and the coefficient of variation is used to describe its distribution. In probability theory and statistics, the coefficient of variation is a normalized measure of the dispersion of a probability distribution, defined as the ratio of the standard deviation to the mean:
$$c_v = \frac{\sigma}{\mu}$$
where $c_v$ is the fluctuation rate, $\sigma$ is the standard deviation of the samples, $\mu$ is the mean of the samples $x_1, x_2, \ldots, x_N$, $x_i$ is the power factor value at the i-th point, and N is the number of data points, typically 96;
2) Power factor and current correlation analysis:
$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}[X]\,\operatorname{Var}[Y]}}$$
where Cov(X, Y) is the covariance of X and Y, Var[X] is the variance of X, Var[Y] is the variance of Y, X is the current fluctuation rate, and Y is the power factor fluctuation rate;
the covariance is
$$\operatorname{Cov}(X, Y) = E\left[(X - \bar{X})(Y - \bar{Y})\right]$$
where $\bar{X}$ is the mean value of the current and $\bar{Y}$ is the mean value of the power factor;
(6) The current imbalance correlation index: for dedicated-transformer users metered in three-phase three-wire or three-phase four-wire mode, monthly current curve data and load rate curve data are analyzed jointly. The analysis covers: elimination of data disturbance under low load; the relationship between the split-phase current balance and the load rate; and the relationship between the periods above a certain load level and the split-phase current balance rate. The specific analysis and quantization process is as follows:
1) The current imbalance quantization formula is:
$$X = \frac{\max(I_n - I_p)}{I_p}$$
where $I_n$ is the split-phase current, $I_p$ is the three-phase average current, and X is the three-phase imbalance rate;
2) The load factor quantization formula is:
$$Y = \frac{S}{S_e}$$
where S is the active power (kW) at a given point, $S_e$ is the operating capacity (kW), and Y is the load factor;
3) The correlation coefficient between the three-phase current imbalance and the load level is quantified as:
$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}[X]\,\operatorname{Var}[Y]}}$$
where Cov(X, Y) is the covariance of X and Y, Var[X] is the variance of X, Var[Y] is the variance of Y, X is the current imbalance rate, and Y is the load factor, with
$$\operatorname{Cov}(X, Y) = E\left[(X - \bar{X})(Y - \bar{Y})\right]$$
where $\bar{X}$ is the mean current imbalance rate and $\bar{Y}$ is the mean load factor.
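The current imbalance and load factor formulas above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the tuple layout `(ia, ib, ic, s)` and the `min_load` cut-off of 0.3 used to discard low-load points are all hypothetical.

```python
def current_imbalance(ia, ib, ic):
    """Three-phase imbalance rate X = max(In - Ip) / Ip."""
    phases = (ia, ib, ic)
    ip = sum(phases) / 3.0  # three-phase average current
    if ip == 0:
        raise ValueError("average current is zero")
    return max(i - ip for i in phases) / ip


def load_factor(s, se):
    """Load factor Y = S / Se (active power over operating capacity)."""
    if se <= 0:
        raise ValueError("operating capacity must be positive")
    return s / se


def imbalance_load_pairs(points, se, min_load=0.3):
    """Return (X, Y) pairs for points whose load factor is at least min_load.

    points is an iterable of (ia, ib, ic, s) tuples; min_load is an
    assumed threshold for removing low-load disturbance.
    """
    pairs = []
    for ia, ib, ic, s in points:
        y = load_factor(s, se)
        if y < min_load:
            continue  # skip low-load points before computing the imbalance
        pairs.append((current_imbalance(ia, ib, ic), y))
    return pairs
```

The resulting (X, Y) pairs are the inputs to the correlation coefficient in item 3).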
The inspection model is as follows:
$$P = \lambda_1 f_{logic} + \lambda_2 f_{svm}$$
where $\lambda_i$, $i \in \{1, 2\}$, are the algorithm weights, $f_{logic}$ denotes the logistic regression algorithm, and $f_{svm}$ denotes the support vector machine algorithm.
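A minimal sketch of the weighted fusion, assuming both models already output a score between 0 and 1 for each user. The default weights of 0.5, the warning threshold of 0.5 and the function names are illustrative assumptions; the source does not state how the weights or the threshold are chosen.

```python
def fuse_scores(f_logic, f_svm, lam1=0.5, lam2=0.5):
    """Weighted fusion P = lam1 * f_logic + lam2 * f_svm of two model scores."""
    if lam1 < 0 or lam2 < 0:
        raise ValueError("weights must be non-negative")
    return lam1 * f_logic + lam2 * f_svm


def warning_list(scores_by_user, lam1=0.5, lam2=0.5, threshold=0.5):
    """Return the sorted user ids whose fused score reaches the threshold.

    scores_by_user maps a user id to a (f_logic, f_svm) pair.
    """
    return sorted(
        user
        for user, (f_logic, f_svm) in scores_by_user.items()
        if fuse_scores(f_logic, f_svm, lam1, lam2) >= threshold
    )
```

With weights that sum to 1, the fused value stays in the same range as the input scores, which keeps a single threshold meaningful.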
The output result comprises a model early warning list and a generated analysis report on suspected abnormal users, whose main contents are as follows:
1) The user's account number, account name, electricity consumption information and basic metering asset information;
2) An anomaly report comprising: the anomaly coefficient, a general description of the anomaly, and a description of the operating characteristics of electric load, voltage, current, power factor and phase angle;
3) Evidence data: displays of the various curves that support the analysis summary, provided for each of the different models.
The invention has the following beneficial effects: compared with the prior art, the system and method, by carrying out electric power inspection work effectively, help standardize marketing behavior, tap potential and increase efficiency, improve the execution of marketing policy, and reduce marketing errors.
Drawings
FIG. 1 is an application architecture diagram of the present invention;
FIG. 2 is a diagram of an inspection early warning model architecture;
FIG. 3 is a schematic diagram of an audit early warning model;
FIG. 4 is a flow chart of a weight scoring training process for different feature selection algorithms;
FIG. 5 shows the cross-validation results for different model parameters.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
Example 1: as shown in figs. 1-3, a big data-based electric power marketing inspection analysis system comprises a cross-system data division module, an inspection special-topic comprehensive management module, a special-topic anomaly inspection module, a precise anomaly inspection model module, and an inspection work and summary analysis module. The cross-system data division module divides electric power marketing data. The inspection special-topic comprehensive management module starts from the data of the marketing inspection anomaly library and combines metering automation data with marketing historical data to build a special-topic library of full data retrieval and indexes. The special-topic anomaly inspection module inspects the quality of special-topic data. The precise anomaly inspection model module constructs anomaly data indexes and selects different algorithms for model construction and training. The inspection work and summary analysis module analyzes and compiles statistics on the quality management of inspection data.
The big data-based electric power marketing inspection analysis system builds on the massive data of the existing marketing system, the metering automation system and other business systems. It performs multi-dimensional analysis and deep mining to construct intelligent marketing inspection analysis, including business modules such as cross-system data comparison, inspection special-topic comprehensive management, special-topic anomaly inspection, and inspection work analysis and summary. A precise anomaly inspection model is built through data exploration, index definition, model construction, training and evaluation, which effectively improves the accuracy of identifying marketing inspection anomalies and raises working efficiency. Starting from the data of the marketing inspection anomaly library, metering automation data and marketing historical data are combined to build a special-topic library of full data retrieval and indexes. Some rules beyond those of the marketing management system are added, the correlations relevant to inspection are analyzed according to business requirements, and inspection topics and inspection indexes are customized, providing guidance and strong decision support for inspection work.
Cross-system data comparison: during data verification, the importance level of the data is further subdivided. Using factors such as customer service, electricity price and electricity charge as grading conditions, electricity marketing data are divided into core data, important data and general data. Grading the data allows the huge volume of marketing data to be managed by level, saves management resources, and makes the routine data quality management mechanism more scientific.
Comprehensive management of inspection topics: starting from the data of the marketing inspection anomaly library, metering automation data and marketing historical data are combined to build a special-topic library of full data retrieval and indexes. Some rules beyond those of the marketing management system are added, the correlations relevant to inspection are analyzed according to business requirements, and inspection topics and indexes are customized.
Special-topic anomaly inspection: as marketing business continues to expand and improve, special-topic data inspection becomes necessary. Special-topic data quality inspection uses the inspection-and-monitoring management mode and must emphasize the following links: the source of the special-topic inspection content, the inspection frequency, the time limit set for each special-topic inspection, and the evaluation of the special-topic inspection.
Precise anomaly inspection model: typical inspection cases are studied in depth and the anomaly library data are consolidated; the data are then explored and analyzed, data indexes are constructed, and different algorithms are selected for model construction, training, testing and evaluation. The model is optimized and rebuilt in actual business use and the evaluation indexes are adjusted, so that the intelligent diagnosis model as a whole becomes more intelligent and its analysis results more accurate.
Inspection work and summary analysis: inspection topics for the execution of marketing-anomaly business are customized, inspection scripts are defined, and data quality control is analyzed and summarized statistically. The inspection function of the marketing information system is improved: on the basis of full-coverage routine inspection of marketing business, the development and application of artificial intelligence functions such as intelligent judgment, intelligent checking, intelligent classification and intelligent preprocessing are strengthened. An information channel linking marketing and auditing is opened, so that the verification and rectification of doubtful audit findings are tracked, supervised, analyzed and summarized in the marketing information system, and information is shared between the business system and the audit platform.
Example 2: an analysis method of a power marketing inspection analysis system based on big data comprises the following steps: the user-defined inspection rule is used for finding the existing abnormal data information, deeply mining the abnormal data information, establishing a model for analyzing the abnormal information condition, updating and establishing a case base, a rule base and an abnormal base, thereby comprehensively tracking the user who needs to recover the economic loss by inspection, and fully playing the roles of 'inspection promotion' and 'inspection management'. The method relates to the parts of data extraction, data preprocessing, model construction, model output and the like in the analysis process, and introduces the implementation process by taking electricity abnormity inspection as an example.
1. Data extraction
The data come from the metering automation system and the marketing business application system and consist mainly of massive multi-source data such as user file information, electricity metering load data, and marketing electricity quantity and electricity charge information. To facilitate data processing and improve access efficiency, the data extraction model uses message queue and stream processing technology to convert massive, heterogeneous, low-quality basic data that cannot be analyzed directly into structured data that meets the set requirements.
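A minimal sketch of the extraction idea, assuming an in-process `queue.Queue` stands in for the message queue and a generator stands in for the stream processor. The raw line format `user_id|timestamp|voltage` and the field names are hypothetical; a production system would read from a real message broker and source system.

```python
import queue

SENTINEL = None  # marks the end of the stream in this sketch


def parse_raw(line):
    """Convert one raw 'user_id|timestamp|voltage' line into a structured dict.

    Returns None when the line does not have exactly three fields
    or the voltage is not numeric.
    """
    parts = line.strip().split("|")
    if len(parts) != 3:
        return None
    user_id, ts, voltage = parts
    try:
        return {"user_id": user_id, "ts": ts, "voltage": float(voltage)}
    except ValueError:
        return None


def stream_records(q):
    """Yield structured records from the queue until the sentinel arrives."""
    while True:
        raw = q.get()
        if raw is SENTINEL:
            break
        record = parse_raw(raw)
        if record is not None:
            yield record


def extract(raw_lines):
    """Push raw lines through the queue and collect the structured output."""
    q = queue.Queue()
    for line in raw_lines:
        q.put(line)
    q.put(SENTINEL)
    return list(stream_records(q))
```

Records are parsed one at a time as they leave the queue, so malformed input is dropped without holding the whole raw data set in memory.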
2. Data pre-processing
The quality of the mass data used for modeling is critical to the accuracy of the inspection early warning model. The raw data extracted from the metering automation system and the marketing business application system contain a large amount of incomplete, inconsistent and abnormal data, which seriously reduces the efficiency of data mining and modeling and may even bias the mining results, so data preprocessing is essential.
(1) Data screening
Abnormal value analysis checks whether the data contain entry errors or implausible values. Ignoring abnormal values is dangerous: including them in calculation and analysis without removal adversely affects the results. Because the input to the inspection early warning model is time-series data, the method uses wavelet multi-scale analysis to detect abnormal values in the input time series.
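The source does not name a wavelet family or a decision threshold. The sketch below assumes a Haar wavelet and flags samples whose detail coefficient deviates from the median by more than `k` scaled median absolute deviations at any of the first `levels` scales; `levels`, `k` and the function names are illustrative choices.

```python
import math
import statistics


def haar_step(values):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail).

    A trailing unpaired sample is ignored at that level.
    """
    root2 = math.sqrt(2.0)
    idx = range(0, len(values) - 1, 2)
    approx = [(values[i] + values[i + 1]) / root2 for i in idx]
    detail = [(values[i] - values[i + 1]) / root2 for i in idx]
    return approx, detail


def wavelet_outliers(series, levels=2, k=5.0):
    """Return sorted indices of samples covered by an unusually large detail coefficient."""
    flagged = set()
    approx = list(series)
    for level in range(1, levels + 1):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        med = statistics.median(detail)
        mad = statistics.median([abs(d - med) for d in detail])
        threshold = k * 1.4826 * mad  # robust estimate of k standard deviations
        width = 2 ** level  # original samples covered by one coefficient
        for j, d in enumerate(detail):
            if abs(d - med) > threshold:
                start = j * width
                flagged.update(range(start, min(start + width, len(series))))
    return sorted(flagged)
```

Because each coefficient covers a block of samples, the function reports the block containing the anomaly rather than a single point; a finer localisation step would be needed to pinpoint the exact sample.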
User data with a large number of missing points lose authenticity and are unsuitable for the inspection early warning model, so when electricity consumption curve data from the metering automation system are analyzed, days on which more than 80% of the data points are missing are screened out.
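The missing-data rule can be sketched as below, assuming each day's curve is a list in which a missing point is represented by `None`. The function names and the dictionary layout keyed by day are hypothetical.

```python
def missing_ratio(day_points):
    """Fraction of points in one day's curve that are missing (None)."""
    if not day_points:
        return 1.0  # an empty day is treated as fully missing
    return sum(p is None for p in day_points) / len(day_points)


def screen_days(curves_by_day, max_missing=0.8):
    """Keep only the days whose missing ratio does not exceed max_missing."""
    return {
        day: points
        for day, points in curves_by_day.items()
        if missing_ratio(points) <= max_missing
    }
```

For a 96-point daily curve, a day is dropped once 77 or more points are missing, since 77/96 is just above 0.8.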
(2) Data cleansing
User information cleaning: read the user file information from the marketing business application system and exclude records whose important business fields are empty or filled in irregularly.
Meter reading cleaning: meter readings must be continuous over consecutive time points; exclude anomalies such as sudden increases, sudden decreases and decimal point shifts.
Load curve data cleaning: check the power curve against the daily electricity consumption, check the voltage and current curves against the power curve, and exclude logic errors.
Electricity consumption detail data cleaning: filter out records in which basic data such as voltage, current, power and power factor are NULL or empty.
When the user power consumption detail data is cleaned, users with different power consumption types need to extract different data indexes, and different modes are needed to clean the data according to the user types.
1) Filter out records whose operating capacity or comprehensive multiplying power is 0, NULL, empty or negative;
2) Filter out records whose user daily electricity consumption is NULL, empty or negative;
3) Filter out records with a power factor greater than 1;
4) Filter out records whose voltage, current or power is NULL or empty.
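The four filtering rules can be sketched as a single predicate. The record is assumed to be a dictionary with the hypothetical keys shown below, whose values are numbers, `None` or empty strings; a missing power factor is also rejected here, following the electricity consumption detail cleaning rule.

```python
def is_missing(value):
    """True for None and for empty or whitespace-only strings."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def keep_record(rec):
    """Return True when a record passes all four cleaning rules."""
    # Rule 1: operating capacity and multiplying power must be present and positive.
    for key in ("capacity", "ratio"):
        value = rec.get(key)
        if is_missing(value) or value <= 0:
            return False
    # Rule 2: daily electricity consumption must be present and non-negative.
    energy = rec.get("daily_energy")
    if is_missing(energy) or energy < 0:
        return False
    # Rule 3: the power factor must be present and cannot exceed 1.
    pf = rec.get("power_factor")
    if is_missing(pf) or pf > 1:
        return False
    # Rule 4: voltage, current and power must be present.
    return not any(is_missing(rec.get(k)) for k in ("voltage", "current", "power"))


def clean(records):
    """Return only the records that pass keep_record."""
    return [rec for rec in records if keep_record(rec)]
```

Keeping the rules in one predicate makes it straightforward to vary them by user type, for example by passing a different key list for each category.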
(3) Data conversion
The data used by the inspection early warning model include non-numerical data; to make numerical computation possible during modeling, the non-numerical data must be converted to numerical form.
The data used by the inspection early warning model also include quantities of different types; to facilitate machine learning modeling, raw data with different dimensions must be made dimensionless and normalized.
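The source does not say which encoding or normalization is used. The sketch below assumes integer label encoding for non-numerical fields and min-max scaling to the range 0 to 1 for numerical ones; both are common choices, and the function names are hypothetical.

```python
def encode_categories(values):
    """Map each distinct non-numerical value to an integer code.

    Codes are assigned in sorted order so the mapping is reproducible.
    Returns the encoded list and the value-to-code mapping.
    """
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, a wiring-mode column becomes a column of small integers, and voltage in volts and power in kilowatts both end up on the same 0 to 1 scale.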
3. Characteristic index
Data and features determine the upper limit of model prediction, and the algorithm only approaches that limit, so feature engineering plays a crucial role in model construction. At the initial stage of model construction, the relevant electricity consumption characteristics of electricity-stealing users are summarized through analysis, including basic, load, voltage, current, electricity consumption and comprehensive characteristics.
No.  Feature category                         Details
1    Basic characteristics                    Wiring mode, metering mode, operating capacity, etc.
2    Load characteristics                     Load abnormality rate, average load fluctuation rate, maximum load fluctuation rate, load imbalance rate, etc.
3    Voltage characteristics                  Voltage abnormality rate, voltage fluctuation rate, voltage imbalance rate, etc.
4    Current characteristics                  Current abnormality rate, current fluctuation rate, current imbalance rate, etc.
5    Electricity consumption characteristics  Meter running backward, electricity consumption trend
6    Comprehensive characteristics            Operating load rate, power factor correlation, alarm events, line loss correlation, etc.
To obtain a better model, meaningful features are selected as input for training the machine learning algorithm and model. Different feature selection algorithms, such as filter, wrapper and embedded methods, are tried on the summarized electricity consumption characteristics of electricity-stealing users; through repeated training and evaluation, the index features with which the model's accuracy and recall on the training data set both reach more than 80% are finally selected, as shown in fig. 4.
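The 80% acceptance criterion can be checked with a small helper. This sketch assumes binary labels where 1 marks a suspected electricity-stealing user, reads "accuracy" as the share of correct predictions, and treats exactly 80% as passing; the function names are hypothetical.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels, with 1 as the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    accuracy = correct / len(y_true)
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall


def meets_target(y_true, y_pred, target=0.8):
    """True when both accuracy and recall reach the target."""
    accuracy, recall = accuracy_and_recall(y_true, y_pred)
    return accuracy >= target and recall >= target
```

A candidate feature subset would be kept only if the model trained on it satisfies `meets_target` on the evaluation data.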
The feature indexes currently selected are: the voltage abnormality index, the electric quantity trend decline index, the power and current correlation index, the metering reverse polarity index, the power factor correlation index and the current imbalance correlation index.
(1) Voltage abnormality index
Users who steal electricity by the undervoltage method, either by connecting a resistor in series or by causing poor contact in the voltage loop, exhibit the data characteristics of split-phase voltage loss and open-phase voltage loss, respectively.
The more often the anomaly occurs, the more credible the event. The proportion of abnormal points within the period is therefore taken as a characteristic index of the model; the quantization formula is:
$$R = \frac{K}{Q}$$
where R is the proportion of abnormal points, K is the number of abnormal points, and Q is the number of valid data points.
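A minimal sketch of the ratio K/Q, assuming missing points are stored as `None` and therefore do not count as valid data. The undervoltage test at 70% of rated voltage is an illustrative assumption; the source does not state the criterion for an abnormal voltage point.

```python
def abnormal_point_ratio(points, is_abnormal):
    """Return K / Q: abnormal points over valid (non-missing) points."""
    valid = [p for p in points if p is not None]
    if not valid:
        return 0.0  # no valid data, so no evidence of an anomaly
    abnormal = sum(1 for p in valid if is_abnormal(p))
    return abnormal / len(valid)


def undervoltage_test(rated_voltage, fraction=0.7):
    """Build a predicate that marks a voltage below fraction * rated as abnormal."""
    limit = fraction * rated_voltage
    return lambda voltage: voltage < limit
```

Because the predicate is passed in, the same ratio can serve any index defined as a proportion of abnormal points, given a suitable test for that index.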
(2) Electric quantity trend decline index
The electric quantity trend decline index reflects the abnormal electricity consumption characteristic of a changed metering loop and serves as a characteristic index of the model. For users in some industries, data from the Spring Festival and other long holidays may cause misjudgment and must be excluded. The quantization formula is:
Figure BDA0003967838590000122
where $k_l$ is the downward-trend index of the current day, $f_i$ is the electricity consumption of the current day, $f_l$ is the electricity consumption of the days before and after, $\alpha_i$ is the weight, and $d$ is the number of days before and after.
Characteristic quantities are extracted by a data mining method, and the downward trend of electricity consumption is analyzed from the daily electricity curve using a downward-trend method.
(3) Power and current dependency indicator
Current and power are positively correlated, and the linear regression coefficients between current and power established under the same multiplying power are consistent. A linear regression function is used:
P = f(|Ia| + |Ib| + |Ic|)
where P is the instantaneous active power, Ia, Ib and Ic are the three phase currents, and f is the mapping function between the three-phase current and the instantaneous active power, whose coefficients are regression coefficients obtained by the least squares method. The data averages show that the average power differs greatly when the average currents are close. Likewise, the current-power relationship shows that the currents differ greatly at the same power, indicating that the metered power remains unchanged although the actual consumption has increased.
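The least squares fit can be sketched with the closed-form solution for simple linear regression, taking the sum of absolute phase currents as the single regressor. The function name and the choice to fit an intercept are assumptions; the source only states that the coefficients come from the least squares method.

```python
def fit_power_current(ia, ib, ic, p):
    """Fit P = slope * (|Ia| + |Ib| + |Ic|) + intercept by ordinary least squares.

    ia, ib, ic and p are equal-length sequences of samples.
    Returns (slope, intercept).
    """
    x = [abs(a) + abs(b) + abs(c) for a, b, c in zip(ia, ib, ic)]
    n = len(x)
    if n < 2 or n != len(p):
        raise ValueError("need at least two aligned samples")
    mean_x = sum(x) / n
    mean_p = sum(p) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("current does not vary, slope is undefined")
    sxp = sum((xi - mean_x) * (pi - mean_p) for xi, pi in zip(x, p))
    slope = sxp / sxx
    intercept = mean_p - slope * mean_x
    return slope, intercept
```

Comparing the slope fitted on a recent window with the slope fitted on a user's earlier data is one way to notice that the current-power relationship has shifted.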
(4) Metering reverse polarity index
For users who steal electricity by a phase-shifting method, such as altering the voltage or current loop wiring, comparing the load characteristics before and after the anomaly shows that the load exhibits 'power reverse polarity'.
The more often the anomaly occurs, the more credible the event. The proportion of abnormal points within the period is therefore taken as a characteristic index of the model; the quantization formula is:
$$R = \frac{K}{Q}$$
where R is the proportion of abnormal points, K is the number of abnormal points, and Q is the number of valid data points.
(5) Power factor correlation indicator
For production-type users, the ratio of active to reactive energy recorded by the electric energy metering device is relatively stable. The power factor is analyzed quantitatively at three levels: curve-point power factor, daily frozen power factor and monthly power factor.
For users metered in three-phase three-wire or three-phase four-wire mode, monthly power factor curve data and current curve data are analyzed, covering the following:
analyzing daily and monthly power factor fluctuation rate;
analyzing daily power factor fluctuation rate;
analyzing the correlation between the power factor curve and the current;
eliminating the interference of low current.
The analytical quantification procedure is as follows:
1) The power factor fluctuation rate represents the dispersion degree of the power factor, and the variation dispersion coefficient is used for describing the power distribution characteristics, and in probability theory and statistics, the variation coefficient is a normalized measure of the dispersion degree of the probability distribution, and is defined as the ratio of standard deviation to average value:
$$c_v=\frac{\sigma}{\mu},\qquad \sigma=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i-\mu)^2}$$
where $c_v$ is the fluctuation rate, $\sigma$ is the standard deviation, $\mu$ is the mean of the samples $X_1, X_2, \ldots, X_N$, $X_i$ is the power factor value at the i-th point, and N is the number of data points.
2) Power factor and current correlation analysis
$$\rho_{X,Y}=\frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}}$$
where Cov(X, Y) is the covariance of X and Y, Var[X] is the variance of X, Var[Y] is the variance of Y, X is the current fluctuation rate, and Y is the power factor fluctuation rate.
The covariance is
$$\mathrm{Cov}(X,Y)=E\left[(X-\bar{X})(Y-\bar{Y})\right]$$
where Cov(X, Y) represents the covariance, $\bar{X}$ represents the average value of the current, and $\bar{Y}$ represents the average value of the power factor.
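The two quantities of this index, the fluctuation rate and the correlation coefficient, might be computed along the following lines. This is a sketch under stated assumptions: the population standard deviation is used, the low-current threshold `min_current` is a hypothetical value, the function names are illustrative, and for brevity the correlation is applied to raw curve points, although the same function applies unchanged to series of fluctuation rates.

```python
import math
import statistics


def fluctuation_rate(values):
    """Coefficient of variation: population standard deviation over the mean."""
    mu = statistics.fmean(values)
    if mu == 0:
        raise ValueError("mean is zero; the fluctuation rate is undefined")
    return statistics.pstdev(values) / mu


def correlation(xs, ys):
    """Cov(X, Y) / sqrt(Var[X] * Var[Y]) for two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    n = len(xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def drop_low_current(currents, power_factors, min_current=0.5):
    """Discard points whose current is below min_current (hypothetical threshold)."""
    kept = [(i, pf) for i, pf in zip(currents, power_factors) if i >= min_current]
    return [i for i, _ in kept], [pf for _, pf in kept]


# Made-up curve points; the 0.1 A point is removed as low-current interference.
currents = [0.1, 1.0, 2.0, 3.0]
power_factors = [0.20, 0.90, 0.80, 0.70]
kept_i, kept_pf = drop_low_current(currents, power_factors)
pf_fluctuation = fluctuation_rate(kept_pf)
r = correlation(kept_i, kept_pf)
```

In this made-up data the power factor falls linearly as the current rises, so the coefficient comes out at the negative extreme of its range.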
(6) Current imbalance correlation index
Three-phase load balance is a precondition for the user's power quality; it is the basis for safe power supply and for saving energy and reducing losses and cost. However, methods such as phase shifting that act on a single phase can leave the three phases unbalanced. For special transformer users metered in three-phase three-wire and three-phase four-wire modes, the monthly current curve and load rate curve data are analyzed jointly, with the following content:
the relationship between the split-phase current balance degree and the load rate;
the relationship between the period above a certain load level and the split-phase current balance rate;
eliminating data disturbance under low load.
The analysis and quantification procedure is as follows:
1) The current imbalance degree characterizes the split-phase load at a given time point, and the quantization formula is as follows:
X=max(In-Ip)/Ip
where In is the split-phase current, Ip is the average value of the three-phase current, and X is the three-phase imbalance rate.
2) The load rate is the ratio of the user's operating power to the operating capacity, and the quantization formula is as follows:
Y=S/Se
where S is the active power (kW) at a given point, Se is the operating capacity (kW), and Y is the load rate.
3) For a given user, production is generally continuous and similar over time, so within a certain load level the three-phase current imbalance is correlated with the load rate. The quantization formula of the correlation coefficient is as follows:
$$\rho_{X,Y}=\frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}}$$
where Cov(X, Y) is the covariance of X and Y, Var[X] is the variance of X, Var[Y] is the variance of Y, X is the current imbalance rate, and Y is the load rate. The covariance is
$$\mathrm{Cov}(X,Y)=E\left[(X-\bar{X})(Y-\bar{Y})\right]$$
where Cov(X, Y) represents the covariance, $\bar{X}$ represents the average value of the current imbalance degree, and $\bar{Y}$ represents the average load rate.
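The imbalance rate and load rate series that feed this correlation could be prepared roughly as follows. The sketch follows the formulas X = max(In − Ip)/Ip and Y = S/Se as written; the low-load cut-off `min_load_rate`, the function names and the sample points are assumptions introduced for the example.

```python
def current_imbalance(ia, ib, ic):
    """X = max(In - Ip) / Ip, with Ip the mean of the three phase currents."""
    ip = (ia + ib + ic) / 3
    if ip == 0:
        raise ValueError("average current is zero; the imbalance is undefined")
    return max(i - ip for i in (ia, ib, ic)) / ip


def load_rate(s, se):
    """Y = S / Se: active power at a point divided by operating capacity."""
    if se <= 0:
        raise ValueError("operating capacity must be positive")
    return s / se


def imbalance_and_load_series(points, se, min_load_rate=0.1):
    """Build the X and Y series, skipping low-load points (hypothetical cut-off).

    points: iterable of (Ia, Ib, Ic, S) tuples, one per curve point.
    """
    xs, ys = [], []
    for ia, ib, ic, s in points:
        y = load_rate(s, se)
        if y < min_load_rate:
            continue  # low-load points are treated as disturbance and dropped
        xs.append(current_imbalance(ia, ib, ic))
        ys.append(y)
    return xs, ys


# Made-up points; the second one has a load rate of 0.01 and is dropped.
points = [(9.0, 10.0, 11.0, 50.0), (0.1, 0.1, 0.4, 1.0), (10.0, 10.0, 10.0, 80.0)]
xs, ys = imbalance_and_load_series(points, se=100.0)
```

The two returned series are the X and Y from which the correlation coefficient of step 3) is then computed.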
4. Construction of the inspection model
Historical abnormal user data are used as the data set of the model. The training set accounts for 70% and is used for model training. The validation set accounts for 30%; it takes part in the model training process, in which the results of different model parameters are cross-validated and the optimal hyperparameters of the model are selected. The test set accounts for 30%; it is used to evaluate the generalization ability of the model independently and never takes part in model training, as shown in fig. 5.
Different algorithms are tried and the resulting models are evaluated and verified; the algorithm best suited to the model is finally determined from the precision and recall of the model.
a) Logistic regression, SVM: the performance is stable, the precision and recall on the training set and the test set do not fluctuate greatly, and the generalization ability of the model is strong.
b) Decision tree, random forest: the performance on the training set is good, but the results on the test set fluctuate considerably; overfitting is present and the generalization ability of the model is mediocre.
c) K-nearest neighbors: the overall performance is the worst.
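A minimal sketch of one possible reading of this split, for illustration only. It assumes that the test set is held out first (30% of all samples) and that the remainder is divided 70/30 into training and validation sets; this reading, the fixed random seed and the function name are assumptions and are not stated in the text.

```python
import random


def split_dataset(samples, test_frac=0.3, val_frac=0.3, seed=0):
    """Shuffle, hold out a test set, then split the rest into train/validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    test = shuffled[:n_test]  # never used during training
    rest = shuffled[n_test:]
    n_val = round(len(rest) * val_frac)
    val = rest[:n_val]  # used for hyperparameter selection
    train = rest[n_val:]  # used for model fitting
    return train, val, test


# 100 placeholder sample ids stand in for historical abnormal-user records.
train, val, test = split_dataset(range(100))
```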
Figure BDA0003967838590000161
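For reference, the precision and recall used to compare the algorithms can be computed from labelled predictions as sketched below. The label convention (1 for an abnormal user, 0 for a normal user), the function name and the sample labels are assumptions of the example.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means abnormal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Comparing these two figures on the training set and on the test set is what exposes the overfitting noted for the tree-based models.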
To further improve the predictive ability and generalization ability of the model, the logistic regression and support vector machine algorithms, which perform relatively stably, are fused by a voting method:
$$P=\lambda_1 f_{logic}+\lambda_2 f_{svm}$$
where $\lambda_i$ is the algorithm weight, $i \in \{1, 2\}$, $f_{logic}$ is the logistic regression algorithm, and $f_{svm}$ is the support vector machine algorithm.
Figure BDA0003967838590000162
Evaluation of the fused model shows a slight improvement in both precision and recall. Fusion also avoids, as far as possible, the overfitting or low prediction accuracy to which a single model is prone.
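The weighted vote P = λ1·f_logic + λ2·f_svm might be sketched as follows. The sketch assumes that each algorithm outputs an abnormality score between 0 and 1, that the weights are non-negative and sum to 1, and that a user is flagged when the fused score reaches a threshold; the equal default weights, the threshold and all names are hypothetical.

```python
def fuse_scores(p_logic, p_svm, w_logic=0.5, w_svm=0.5):
    """Weighted vote: w_logic * p_logic + w_svm * p_svm."""
    if w_logic < 0 or w_svm < 0 or abs(w_logic + w_svm - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return w_logic * p_logic + w_svm * p_svm


def flag_abnormal(users, threshold=0.5, w_logic=0.5, w_svm=0.5):
    """Return the ids of users whose fused score reaches the threshold.

    users: iterable of (user_id, logistic_score, svm_score) tuples.
    """
    flagged = []
    for user_id, p_logic, p_svm in users:
        if fuse_scores(p_logic, p_svm, w_logic, w_svm) >= threshold:
            flagged.append(user_id)
    return flagged


# Made-up scores: U1 fuses to 0.70, U2 to 0.35 and U3 to 0.55.
users = [("U1", 0.8, 0.6), ("U2", 0.3, 0.4), ("U3", 0.9, 0.2)]
flagged = flag_abnormal(users)
```

In practice the weights would be chosen on the validation set rather than fixed in advance.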
5. Model output
A model early warning list is output and a suspected abnormal user analysis report is generated, with the following main contents:
1) Basic user information (account number, account name, electricity consumption information, meter assets);
2) An abnormality report, comprising: the abnormality coefficient and a general description of the abnormality, described in terms of electrical characteristics (including the operating characteristics of load, voltage, current, power factor and phase angle);
3) Corroborative data: the various curves that, for each model, support the analysis summary and serve as evidence.
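One way to hold such a report in code is sketched below. All field names, the threshold and the sample records are hypothetical; they only mirror the three content groups listed above.

```python
from dataclasses import dataclass, field


@dataclass
class AbnormalUserReport:
    # 1) Basic user information
    account_number: str
    account_name: str
    electricity_info: str
    meter_asset: str
    # 2) Abnormality report
    abnormal_coefficient: float
    summary: str
    electrical_characteristics: dict = field(default_factory=dict)
    # 3) Corroborative data: names of the curves attached as evidence
    evidence_curves: list = field(default_factory=list)


def build_warning_list(reports, threshold=0.5):
    """Keep reports at or above the threshold, most suspicious first."""
    flagged = [r for r in reports if r.abnormal_coefficient >= threshold]
    return sorted(flagged, key=lambda r: r.abnormal_coefficient, reverse=True)


# Made-up reports for three users.
reports = [
    AbnormalUserReport("A001", "Plant A", "10 kV industrial", "M-1", 0.7, "power reversed polarity"),
    AbnormalUserReport("A002", "Shop B", "0.4 kV commercial", "M-2", 0.2, "no abnormality"),
    AbnormalUserReport("A003", "Plant C", "10 kV industrial", "M-3", 0.9, "current imbalance"),
]
warning_list = build_warning_list(reports)
```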
The above is only an embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope of the present invention is covered, and the scope of protection of the present invention shall therefore be subject to the scope of protection of the claims.

Claims (9)

1. A big data-based electric power marketing inspection analysis system, characterized in that: the system comprises a cross-system data dividing module, an inspection special topic comprehensive management module, a special topic abnormity inspection module, a precise abnormity inspection model module, and an inspection work and summary analysis module, wherein the cross-system data dividing module is used for dividing electric power marketing data; the inspection special topic comprehensive management module is used for constructing a special topic library for full data retrieval and indexing by combining metering automation data and marketing historical data on the basis of the data of the marketing inspection abnormity library; the special topic abnormity inspection module is used for inspecting the quality of special data; the precise abnormity inspection model module is used for constructing abnormal data indexes and selecting different algorithms for model construction and training; and the inspection work and summary analysis module is used for analyzing and compiling statistics on the quality control of the inspection data.
2. An analysis method of the big data-based electric power marketing inspection analysis system according to claim 1, wherein the method comprises the following steps:
step 1, data extraction: the data come from a metering automation system and a marketing business application system, and are extracted by means of a message queue and a stream processing method;
step 2, data preprocessing: preprocessing the data extracted in step 1 by data screening, data cleaning and data conversion;
step 3, selecting characteristic indexes;
step 4, establishing an inspection model: the logistic regression algorithm and the support vector machine algorithm are fused by a voting method to obtain the inspection model;
step 5, obtaining a model output result through training.
3. The analysis method of the big data-based electric power marketing inspection analysis system according to claim 2, wherein the data screening method in step 2 comprises: performing abnormal value detection on the input time series data using a wavelet multi-scale analysis method, and, when analyzing the electricity consumption curve data of the metering automation system, screening out data for which more than 80% of the points in a day are missing.
4. The analysis method of the big data-based electric power marketing inspection analysis system according to claim 3, wherein the data cleaning method in step 2 comprises:
user information cleaning: reading the user file information from the marketing business application system, and excluding records in which important business fields are empty or irregularly filled in;
meter reading cleaning: the meter readings over consecutive times should be continuous, and abnormal factors such as sudden increases, sudden decreases and decimal point shifts are eliminated;
load curve data cleaning: checking the power curve against the daily electric quantity, and the voltage and current curves against the power curve, to eliminate logic errors;
electricity consumption detail data cleaning: filtering records with NULL or empty values in the basic data of voltage, current, power and power factor.
5. The analysis method of the big data-based electric power marketing inspection analysis system according to claim 4, wherein data cleaning is carried out in different ways according to user type, including the following four kinds:
1) filtering records in which the operating capacity or the comprehensive multiplier is 0, NULL, empty or negative;
2) filtering records in which the user's daily electric quantity is NULL, empty or negative;
3) filtering records with a power factor greater than 1;
4) filtering records in which the voltage, current or power is NULL or empty.
6. The analysis method of the big data-based electric power marketing inspection analysis system according to claim 5, wherein the data conversion method in step 2 comprises: converting non-numerical data into numerical form, and applying dimensionless processing and normalization to raw data of different dimensions.
7. The analysis method of the big data-based electric power marketing inspection analysis system according to claim 6, wherein the selected characteristic indexes comprise: a voltage abnormality index, an electric quantity trend decline index, a power and current correlation index, a metering reverse polarity index, a power factor correlation index and a current imbalance correlation index;
(1) the voltage abnormality index takes the proportion of abnormal points occurring in a period as a characteristic index of the model, and the quantization formula is as follows:
$$\text{abnormal-point proportion}=\frac{K}{Q}$$
in the formula, K is the number of abnormal occurrence points, and Q is the number of effective data points;
(2) The quantitative formula of the electric quantity trend decline index is as follows:
Figure FDA0003967838580000032
in the formula, $k_l$ is the downward trend index of the current day, $f_i$ is the electric quantity of the current day, $f_l$ is the electric quantity of the several days before and after, $\alpha_i$ is the weight, and d is the number of days before and after;
the characteristic quantity is extracted by a data mining method, and the downward trend of the power consumption is analyzed using the daily electric quantity curve and a trend-decline method;
(3) the power and current correlation index adopts a linear regression function:
P=f(|Ia|+|Ib|+|Ic|)
where P is the instantaneous active power, Ia, Ib and Ic are the three phase currents, and f is the mapping function from the three-phase currents to the instantaneous active power, whose coefficients are regression coefficients obtained by the least squares method;
(4) the metering reverse polarity index takes the proportion of abnormal points occurring in the period as a characteristic index of the model, and the quantization formula is as follows:
$$\text{abnormal-point proportion}=\frac{K}{Q}$$
in the formula, K is the number of abnormal occurrence points, and Q is the number of effective data points;
(5) the power factor correlation index analyzes the power factor quantitatively using the curve-point power factor, the daily frozen power factor and the monthly power factor; for users metered in three-phase three-wire and three-phase four-wire modes, the monthly power factor curve and current curve data are analyzed, and the analysis content comprises: analyzing the daily and monthly power factor fluctuation rate; analyzing the daily power factor fluctuation rate; analyzing the correlation between the power factor curve and the current; and eliminating the interference of low current; the specific analysis and quantification process is as follows:
1) the power factor fluctuation rate represents the degree of dispersion of the power factor, and the coefficient of variation is used to describe its distribution characteristics; in probability theory and statistics, the coefficient of variation is a normalized measure of the dispersion of a probability distribution and is defined as the ratio of the standard deviation to the mean:
$$c_v=\frac{\sigma}{\mu},\qquad \sigma=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i-\mu)^2}$$
where $c_v$ is the fluctuation rate, $\sigma$ is the standard deviation, $\mu$ is the mean of the samples $X_1, X_2, \ldots, X_N$, $X_i$ is the power factor value at the i-th point, and N is the number of data points;
2) power factor and current correlation analysis:
$$\rho_{X,Y}=\frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}}$$
where Cov(X, Y) is the covariance of X and Y, Var[X] is the variance of X, Var[Y] is the variance of Y, X is the current fluctuation rate, and Y is the power factor fluctuation rate;
wherein the covariance is
$$\mathrm{Cov}(X,Y)=E\left[(X-\bar{X})(Y-\bar{Y})\right]$$
where Cov(X, Y) represents the covariance, $\bar{X}$ represents the average value of the current, and $\bar{Y}$ represents the average value of the power factor;
(6) the current imbalance correlation index: for special transformer users metered in three-phase three-wire and three-phase four-wire modes, the monthly current curve and load rate curve data are analyzed jointly, and the analysis content comprises: the relationship between the split-phase current balance degree and the load rate; the relationship between the period above a certain load level and the split-phase current balance rate; and eliminating data disturbance under low load; the specific analysis and quantification process is as follows:
1) the current imbalance quantization formula is:
X=max(In-Ip)/Ip
where In is the split-phase current, Ip is the average value of the three-phase current, and X is the three-phase imbalance rate;
2) The load factor quantization formula is:
Y=S/Se
in the formula, S is active power (kW) at a certain point; se is operating capacity (kW); y is the load factor;
3) within a certain load level the three-phase current imbalance is correlated with the load rate, and the quantization formula of the correlation coefficient is as follows:
$$\rho_{X,Y}=\frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}}$$
where Cov(X, Y) is the covariance of X and Y, Var[X] is the variance of X, Var[Y] is the variance of Y, X is the current imbalance rate, and Y is the load rate, wherein the covariance is
$$\mathrm{Cov}(X,Y)=E\left[(X-\bar{X})(Y-\bar{Y})\right]$$
where Cov(X, Y) represents the covariance, $\bar{X}$ represents the average value of the current imbalance degree, and $\bar{Y}$ represents the average load rate.
8. The analysis method of the big data-based electric power marketing inspection analysis system according to claim 7, wherein the inspection model is:
$$P=\lambda_1 f_{logic}+\lambda_2 f_{svm}$$
where $\lambda_i$ is the algorithm weight, $i \in \{1, 2\}$, $f_{logic}$ represents the logistic regression algorithm, and $f_{svm}$ represents the support vector machine algorithm.
9. The analysis method of the big data-based electric power marketing inspection analysis system according to claim 7, wherein the output result comprises a model early warning list and a suspected abnormal user analysis report, with the following main contents:
1) basic user information: account number, account name, electricity consumption information and meter assets;
2) an abnormality report, comprising: the abnormality coefficient, a general description of the abnormality, and a description of the operating characteristics of load, voltage, current, power factor and phase angle;
3) corroborative data: the various curves that, for each model, support the analysis summary and serve as evidence.
CN202211504926.5A 2022-11-28 2022-11-28 Big data-based electric power marketing inspection analysis system and method Pending CN115730962A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211504926.5A CN115730962A (en) 2022-11-28 2022-11-28 Big data-based electric power marketing inspection analysis system and method

Publications (1)

Publication Number Publication Date
CN115730962A true CN115730962A (en) 2023-03-03

Family

ID=85298839

Country Status (1)

Country Link
CN (1) CN115730962A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117291649A (en) * 2023-11-27 2023-12-26 云南电网有限责任公司信息中心 Intensive marketing data processing method and system
CN117291649B (en) * 2023-11-27 2024-02-23 云南电网有限责任公司信息中心 Intensive marketing data processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination