CN110675020A - High-price low-access user identification method based on big data - Google Patents

Publication number
CN110675020A
Authority
CN
China
Prior art keywords
price
electricity
model
low
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910764680.7A
Other languages
Chinese (zh)
Inventor
段志田
陈莹
邹禹平
贾嘉
董兵
高伟
臧依璨
高嘉伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC and State Grid Tianjin Electric Power Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention discloses a high-price low-access user identification method based on big data. Based on business regulation requirements, the progress of key work, and problems found by internal and external inspections, business feature items are sorted out to form the basic structure of a model. Based on the extracted model feature variables, high-price low-access feature analysis is carried out, covering mismatch between user industry characteristics and electricity-use characteristics, electricity price execution that does not conform to policy, and inconsistency between archive records and electricity charge calculation. A three-layer perceptron neural network is established based on a neural network quantile regression model, and classification of electricity price execution standards is achieved using cross-validation, the AIC (Akaike information criterion), the BIC (Bayesian information criterion), and similar methods. In addition, machine learning is carried out through a big data supervised learning model to find latent association patterns between the observable features of high-price low-access judgment objects and high-price low-access problems. By continuously optimizing the monitoring indicators and rules, problem objects are mined more comprehensively and suspected problem objects are identified more accurately.

Description

High-price low-access user identification method based on big data
Technical Field
The invention belongs to the technical field, and particularly relates to a high-price low-access user identification method based on big data.
Background
The electricity price is an important factor in electricity sales revenue and is key to the operating benefit of power enterprises. In routine electricity inspection and business handling, the electricity-use category of some users is found to differ from their industry category (for example agriculture, industry, or manufacturing), which causes a direct loss to the company's operations. A big data model of high-price low-access users is therefore established by means such as semi-supervised machine learning, and the company's high-price low-access users are ultimately identified through model training. On-site screening is then completed in combination with on-site checking work such as electricity inspection, recovering economic losses for the company.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide a high-price low-access user identification method based on big data. The method analyzes a large amount of data on high-price low-access users discovered historically, uses big data analysis tools to extract the characteristics of such users, constructs a big data model for identifying high-price low-access users, refines the model through model training, and thereby identifies the high-price low-access users that exist within the company.
The invention is realized by the following technical scheme:
A high-price low-access user identification method based on big data is carried out according to the following steps:
Step one, extracting model feature variables
Based on business regulation requirements, the progress of key work, and problems found by internal and external inspections, business feature items are sorted out to form the basic structure of the model;
Step two, high-price low-access feature analysis
1) Mismatch between user industry characteristics and electricity-use characteristics
Mismatch between user industry characteristics and electricity-use characteristics is analyzed mainly by mining feature data such as the following:
① the electricity price is agricultural drainage and irrigation, yet electricity is consumed in every month;
② the monthly consumption of a residential-price user is excessively large;
③ the electricity-use characteristics do not match those of a school;
④ the electricity price of a residential lighting user is classified as non-residential;
⑤ a temporary-use customer is billed at the large-industry electricity price;
⑥ a temporary-use customer has exceeded its term;
⑦ a temporary-use customer with pre-collected sales is billed at the large-industry, agricultural, or general-industry electricity price;
⑧ an electric-heating-price user has unusually small winter consumption;
⑨ a retail-industry user has excessively large valley-period consumption;
⑩ the industry is manufacturing, but the load rate is below a certain value;
⑪ the charging electricity price is applied, but consumption is relatively high;
⑫ the contract capacity is smaller than 110kv and the account name contains hotel, restaurant, cold storage, shopping mall, or supermarket, but the time-of-use electricity price is not applied;
⑬ the account name contains "committee", but the committee electricity price is not applied;
⑭ the account name contains "school" or "kindergarten", but the corresponding electricity price is not applied;
⑮ the account name contains "drainage and irrigation", but the drainage and irrigation electricity price is not applied.
2) Electricity price execution does not conform to policy
Cases in which the execution of the electricity sales price does not conform to policy are analyzed mainly against the following policies, from which rules are mined:
① for users with transformer capacity below 100 kVA, the power factor standard should be "not assessed";
② for agricultural and wholesale users with transformer capacity above 100 kVA, the power factor standard should be 0.8;
③ for industrial and non-industrial users with transformer capacity above 100 kVA and below 160 kVA, the power factor standard should be 0.85;
④ for industrial users with transformer capacity above 160 kVA, the power factor standard should be 0.9;
⑤ the power factor standard is 0.9, but the transformer capacity is below 160 kVA;
⑥ the contract capacity is below 315 kVA, so the large-industry electricity price cannot be applied;
⑦ the agricultural electricity price is applied and time-of-use pricing is not applied;
⑧ for demand-billed users, the approved demand is less than 40% of the total capacity;
⑨ users whose capacity reduction has expired, and newly installed or capacity-increased users, who apply for capacity reduction or suspension within two years are charged 50% of the basic electricity charge.
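Several of the policy rules above lend themselves to direct rule checks. The following is a minimal sketch, not the patented implementation: the field names (`capacity_kva`, `category`, `pf_standard`, `tariff`, `contract_capacity_kva`) and the flag names are hypothetical, and the treatment of the boundaries (100 kVA and 160 kVA counted in the upper tier) is an assumption, since the text only says "above" and "below".

```python
from typing import List, Optional, Union

NOT_ASSESSED = "not assessed"


def expected_pf_standard(capacity_kva: float, category: str) -> Optional[Union[float, str]]:
    """Power factor standard implied by rules 1-4, or None if no listed rule applies."""
    if capacity_kva < 100:
        return NOT_ASSESSED                      # rule 1
    if category in ("agricultural", "wholesale"):
        return 0.8                               # rule 2
    if category in ("industrial", "non_industrial") and capacity_kva < 160:
        return 0.85                              # rule 3
    if category == "industrial":
        return 0.9                               # rule 4 (capacity is >= 160 here)
    return None                                  # case not covered by the listed rules


def policy_flags(user: dict) -> List[str]:
    """Return the names of the policy checks that this user record fails."""
    flags = []
    expected = expected_pf_standard(user["capacity_kva"], user["category"])
    if expected is not None and user["pf_standard"] != expected:
        flags.append("pf_standard_mismatch")
    # Rule 5: a 0.9 standard combined with a transformer below 160 kVA is suspicious.
    if user["pf_standard"] == 0.9 and user["capacity_kva"] < 160:
        flags.append("pf_0.9_with_capacity_below_160")
    # Rule 6: the large-industry price needs a contract capacity of at least 315 kVA.
    if user["tariff"] == "large_industrial" and user["contract_capacity_kva"] < 315:
        flags.append("large_industrial_below_315")
    return flags
```

Flags produced this way could serve either as screening results in their own right or as input features for the model described in step three.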
3) Archive records are inconsistent with electricity charge calculation
Inconsistency between archive records and electricity charge calculation is analyzed mainly by mining cases in which the metering mode is high-voltage supply with low-voltage metering, but the losses are not included in the charge.
Step three, model algorithm design
Collected electricity-use data are extracted and stored, and data preprocessing is carried out through collected-data governance. This includes data format governance, data integrity checks, marketing data format checks, archive problem governance, governance of missing voltage and current data, governance of duplicate data, governance of erroneous voltage and current data, analysis of meter replacement behavior, and analysis of abnormal events on user electricity meters.
The big data model is built with the deep learning framework TensorFlow, and training is accelerated using GPU devices. The model is validated with K-fold cross-validation, and the default electricity-use identification model is evaluated using indicators such as accuracy, recall, and AUC. After deployment, the model is evaluated through offline and online analysis at the same time. In the offline part, the AUC is computed on a labeled test set; the larger the AUC, the more accurate the default electricity-use identification model. In online operation, incoming user data are scored in near real time, and suspected users whose predicted probability exceeds the threshold selected during modeling are provided to business personnel to assist verification of default electricity use. The precision of the model is then calculated from the verification results to further evaluate its accuracy in actual operation.
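The offline AUC and the online precision described above can be illustrated in a few lines. This is a sketch in plain Python rather than the TensorFlow pipeline named in the text; the function names are hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation, which is one common way to obtain it.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Precision and recall when users scoring above the threshold are treated as suspects."""
    predicted = [s > threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if p and y == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if p and y == 0)
    fn = sum(1 for y, p in zip(labels, predicted) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In the online setting, `labels` would come from the on-site verification results for the suspected users, so only precision is directly observable there; recall requires labels for users that were not flagged.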
Users recorded in historical electricity inspection records as having default electricity use, such as the various kinds of high-price low-access, are labeled y = 1; all other users serve as negative samples with y = 0. Separate high-price low-access analysis models are established for public-transformer low-voltage users and for dedicated-transformer users. Models such as the gradient boosting decision tree, the LSTM neural network time-series model, and the SVM are compared and selected, and the association between X and y is established through model training. The structure of the LSTM model is shown in FIG. 2.
At each time step, the input to the neural network model comprises the multidimensional feature X at the current time, and the n state nodes $S_t = \langle S_1, S_2, S_3, \ldots, S_n \rangle$ at the current time t are obtained through the transformation of each hidden layer. On the basis of the DNN network, a long short-term memory (LSTM) time-series model is incorporated, and the output at time t is a function of the current state $S_t$ and the previous state $S_{t-1}$: $O_t = f(S_t + W \times S_{t-1})$. The LSTM can model both long-term and short-term dependencies in customer data, with iterative training and prediction carried out over time. The model finally outputs the probability of default electricity use, $P_i = 1/(1 + e^{-O_t})$.
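The two output formulas can be traced numerically with a small sketch. The text does not specify the function f or the shape of W; the code below assumes that W is an n×n matrix and that f is a weighted sum with hypothetical output weights `v` and bias `b`, purely for illustration.

```python
import math


def default_probability(s_t, s_prev, w, v, b=0.0):
    """P = 1 / (1 + exp(-O_t)) with O_t = f(S_t + W x S_{t-1}).

    Assumptions (not from the source): w is an n x n matrix given as a list of
    rows, and f is a weighted sum with output weights v and bias b.
    """
    # combined[i] = S_t[i] + sum_j W[i][j] * S_{t-1}[j]
    combined = [s + sum(w_ij * p for w_ij, p in zip(row, s_prev))
                for s, row in zip(s_t, w)]
    o_t = sum(v_i * c for v_i, c in zip(v, combined)) + b
    # Numerically stable logistic function.
    if o_t >= 0:
        return 1.0 / (1.0 + math.exp(-o_t))
    e = math.exp(o_t)
    return e / (1.0 + e)
```

With all states at zero the output is 0.5, and larger values of O_t push the probability of default electricity use towards 1, which is the behavior the logistic form in the text implies.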
The default electricity-use identification model feeds the results of manual verification back into model training, forming a closed data-optimization loop that continuously improves model performance. Because sample data differ between regions, the modeling method is optimized separately for the user characteristics of each specific region. Users of different consumption scales behave markedly differently, so modeling results differ across consumption levels. For users with different consumption levels, electricity-use behavior is analyzed from the daily frozen readings of the electric energy meters, the user archives, the meter specifications, and similar information; the influence of consumption level on high-price low-access identification is assessed, and the identification model is continuously refined so that users with different characteristics are identified more accurately. Distribution transformer areas that serve different numbers of users have different line loss characteristics, so the model needs further optimization and adaptation. Areas with many users also differ from region to region in supply radius and user load characteristics and need to be adapted to their specific conditions. The model is adapted continuously according to on-site verification results, steadily improving the accuracy of its calculations.
Step four, machine learning
Problem case data are accumulated over the long term, and verified results are periodically fed into a rule optimization model as empirical information. Machine learning is carried out through a big data supervised learning model to find latent association patterns between the observable features of high-price low-access judgment objects and high-price low-access problems. The monitoring indicators and rules are continuously optimized, so that problem objects are mined more comprehensively and suspected problem objects are identified more accurately. The specific steps are as follows:
1) When high-price low-access monitoring is triggered, the feature information of every high-price low-access judgment object in the current period (including all monitoring indicator information and the basic attribute information of the object) and the judgment result information for suspected problem objects (whether, after verification, the suspected object really has a problem) are recorded in full, in combination with the high-price low-access judgment indicators and rule system. Through long-term accumulation, a massive body of feature information and judgment result data is formed and is input into the rule optimization model as training samples for machine learning.
2) Before a high-price low-access judgment cycle is triggered, machine learning training is carried out for the current judgment topic on the basis of historically accumulated judgment experience data. Using the object feature information and judgment result information in historical case data, supervised-learning big data techniques are applied to find the observable features that indicate whether an object has a problem and to determine the association pattern between those features and high-price low-access problems. Based on the model's objective analysis of historical cases, guidance is provided for optimizing the high-price low-access judgment indicators and rule system.
3) Factors such as customer electricity-use archives, industry classification, electrical equipment, load characteristics, and average electricity sales price are analyzed comprehensively, and expert experience in high-price low-access judgment is drawn upon. A high-price low-access judgment data model for electricity sales price execution is established using big data technology to analyze the execution of category electricity prices, time-of-use electricity prices, funds and surcharges, basic electricity charges, and power-factor adjustment charges, providing a basis for standardized management of electricity sales price execution.
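The accumulation of verified cases described in steps 1) and 2) can be pictured as a simple store that turns each on-site verification into a labeled training sample. This is only a sketch: the class name `CaseStore` and its methods are hypothetical, and a real system would persist the data and retrain the rule optimization model periodically.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class CaseStore:
    """Accumulates (feature vector, verified result) pairs for later retraining."""
    features: List[List[float]] = field(default_factory=list)
    labels: List[int] = field(default_factory=list)

    def record(self, feature_vector: Sequence[float], verified_problem: bool) -> None:
        # A verified problem becomes a positive sample (y = 1), otherwise y = 0.
        self.features.append(list(feature_vector))
        self.labels.append(1 if verified_problem else 0)

    def training_set(self) -> Tuple[List[List[float]], List[int]]:
        return self.features, self.labels

    def hit_rate(self) -> float:
        """Share of verified suspects that really had a problem."""
        return sum(self.labels) / len(self.labels) if self.labels else 0.0
```

Tracking the hit rate over successive judgment cycles gives one simple indication of whether the optimized indicators and rules are improving the accuracy of the suspected problem objects.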
Step five, high-price low-access user identification
High-price low-access users are identified on the basis of the data model; the analysis results are combined with on-site checking work such as electricity inspection to complete on-site screening and to assist in handling the related anomalies.
The invention has the advantages and beneficial effects that:
In the high-price low-access user identification method based on big data disclosed by the invention, business feature items are sorted out to form the basic structure of the model, based on business regulation requirements, the progress of key work, and problems found by internal and external inspections. Based on the extracted model feature variables, high-price low-access feature analysis is carried out, covering mismatch between user industry characteristics and electricity-use characteristics, electricity price execution that does not conform to policy, and inconsistency between archive records and electricity charge calculation. A three-layer perceptron neural network is established based on a neural network quantile regression model, and classification of electricity price execution standards is achieved using cross-validation, the AIC (Akaike information criterion), the BIC (Bayesian information criterion), and similar methods. In addition, machine learning is carried out through a big data supervised learning model to find latent association patterns between the observable features of high-price low-access judgment objects and high-price low-access problems. By continuously optimizing the monitoring indicators and rules, problem objects are mined more comprehensively and suspected problem objects are identified more accurately. The invention identifies high-price low-access users on the basis of a big data model, completes on-site screening by combining the analysis results with on-site checking work such as electricity inspection, assists in handling anomalies, and improves the company's production and operation level.
Drawings
Fig. 1 is an architecture diagram of the high-price low-access user identification method based on big data.
FIG. 2 is a diagram of the LSTM model structure in the embodiment.
For a person skilled in the art, other relevant figures can be obtained from the above figures without inventive effort.
Detailed Description
In order to make the technical solution of the present invention better understood, the technical solution of the present invention is further described below with reference to specific examples.
Examples
(1) Model feature variable extraction
As shown in FIG. 1, based on business regulation requirements, the progress of key work, and problems found by internal and external inspections, business feature items are sorted out to form the basic structure of the model. The high-price low-access execution features are considered mainly from three aspects: mismatch between user industry characteristics and electricity-use characteristics, electricity price execution that does not conform to policy, and inconsistency between archive records and electricity charge calculation, as listed in Table 1 below.
TABLE 1 High-price low-access execution feature category list
[Table 1 appears as images in the source (Figure BDA0002171551450000051, Figure BDA0002171551450000061); its contents are not reproduced here.]
(2) High-price low-access feature analysis
1) Mismatch between user industry characteristics and electricity-use characteristics
Mismatch between user industry characteristics and electricity-use characteristics is analyzed mainly by mining feature data such as the following:
① the electricity price is agricultural drainage and irrigation, yet electricity is consumed in every month;
② the monthly consumption of a residential-price user is excessively large;
③ the electricity-use characteristics do not match those of a school;
④ the electricity price of a residential lighting user is classified as non-residential;
⑤ a temporary-use customer is billed at the large-industry electricity price;
⑥ a temporary-use customer has exceeded its term;
⑦ a temporary-use customer with pre-collected sales is billed at the large-industry, agricultural, or general-industry electricity price;
⑧ an electric-heating-price user has unusually small winter consumption;
⑨ a retail-industry user has excessively large valley-period consumption;
⑩ the industry is manufacturing, but the load rate is below a certain value;
⑪ the charging electricity price is applied, but consumption is relatively high;
⑫ the contract capacity is smaller than 110kv and the account name contains hotel, restaurant, cold storage, shopping mall, or supermarket, but the time-of-use electricity price is not applied;
⑬ the account name contains "committee", but the committee electricity price is not applied;
⑭ the account name contains "school" or "kindergarten", but the corresponding electricity price is not applied;
⑮ the account name contains "drainage and irrigation", but the drainage and irrigation electricity price is not applied.
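The account-name features in this list amount to keyword checks against the tariff actually applied. The sketch below is illustrative only: the record fields (`name`, `tariff`, `time_of_use`), the tariff codes, and the flag names are hypothetical, the keywords are the English renderings used in this text (real account names would be matched in their original language), the capacity condition attached to the hotel and supermarket rule is omitted, and treating "school" as the tariff expected for school and kindergarten names is an assumption.

```python
# Each rule: (keywords in the account name, mismatch test on the record, flag name).
KEYWORD_RULES = [
    (("hotel", "restaurant", "cold storage", "shopping mall", "supermarket"),
     lambda u: not u["time_of_use"],
     "commercial_name_without_time_of_use"),
    (("committee",),
     lambda u: u["tariff"] != "committee",
     "committee_name_without_committee_tariff"),
    (("school", "kindergarten"),
     lambda u: u["tariff"] != "school",          # assumed expected tariff
     "school_name_without_school_tariff"),
    (("drainage and irrigation",),
     lambda u: u["tariff"] != "drainage_irrigation",
     "irrigation_name_without_irrigation_tariff"),
]


def name_mismatch_flags(user: dict) -> list:
    """Return flags for account names whose keywords contradict the applied tariff."""
    name = user["name"].lower()
    return [flag for keywords, mismatch, flag in KEYWORD_RULES
            if any(k in name for k in keywords) and mismatch(user)]
```

Each flag can be used as a binary model feature, so that the learning step decides how much weight a name and tariff mismatch deserves alongside the consumption-based features.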
2) Electricity price execution does not conform to policy
Cases in which the execution of the electricity sales price does not conform to policy are analyzed mainly against the following policies, from which rules are mined:
① for users with transformer capacity below 100 kVA, the power factor standard should be "not assessed";
② for agricultural and wholesale users with transformer capacity above 100 kVA, the power factor standard should be 0.8;
③ for industrial and non-industrial users with transformer capacity above 100 kVA and below 160 kVA, the power factor standard should be 0.85;
④ for industrial users with transformer capacity above 160 kVA, the power factor standard should be 0.9;
⑤ the power factor standard is 0.9, but the transformer capacity is below 160 kVA;
⑥ the contract capacity is below 315 kVA, so the large-industry electricity price cannot be applied;
⑦ the agricultural electricity price is applied and time-of-use pricing is not applied;
⑧ for demand-billed users, the approved demand is less than 40% of the total capacity;
⑨ users whose capacity reduction has expired, and newly installed or capacity-increased users, who apply for capacity reduction or suspension within two years are charged 50% of the basic electricity charge.
3) Archive records are inconsistent with electricity charge calculation
Inconsistency between archive records and electricity charge calculation is analyzed mainly by mining cases in which the metering mode is high-voltage supply with low-voltage metering, but the losses are not included in the charge.
(3) Model algorithm design
Collected electricity-use data are extracted and stored, and data preprocessing is carried out through collected-data governance. This includes data format governance, data integrity checks, marketing data format checks, archive problem governance, governance of missing voltage and current data, governance of duplicate data, governance of erroneous voltage and current data, analysis of meter replacement behavior, and analysis of abnormal events on user electricity meters.
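Three of the governance steps named above (duplicate data, missing voltage and current data, erroneous voltage and current data) can be sketched as a single pass over the readings. The record layout (`meter_id`, `timestamp`, `voltage`, `current`) is hypothetical, and treating negative values as erroneous is an assumption made only for this example.

```python
def preprocess_readings(readings):
    """Split meter readings into clean records and rejected (record, reason) pairs.

    Repeated (meter_id, timestamp) records are dropped, keeping the first one.
    """
    seen = set()
    clean, rejected = [], []
    for r in readings:
        key = (r["meter_id"], r["timestamp"])
        if key in seen:
            continue                              # duplicate data
        seen.add(key)
        voltage, current = r.get("voltage"), r.get("current")
        if voltage is None or current is None:
            rejected.append((r, "missing"))       # missing voltage/current data
        elif voltage < 0 or current < 0:
            rejected.append((r, "error"))         # assumed rule for erroneous data
        else:
            clean.append(r)
    return clean, rejected
```

Keeping the rejected records together with a reason, rather than silently discarding them, leaves room for the archive and abnormal-event analyses that the text lists alongside these checks.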
The big data model is built with the deep learning framework TensorFlow, and training is accelerated using GPU devices. The model is validated with K-fold cross-validation, and the default electricity-use identification model is evaluated using indicators such as accuracy, recall, and AUC. After deployment, the model is evaluated through offline and online analysis at the same time. In the offline part, the AUC is computed on a labeled test set; the larger the AUC, the more accurate the default electricity-use identification model. In online operation, incoming user data are scored in near real time, and suspected users whose predicted probability exceeds the threshold selected during modeling are provided to business personnel to assist verification of default electricity use. The precision of the model is then calculated from the verification results to further evaluate its accuracy in actual operation.
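K-fold cross-validation, as mentioned above, partitions the labeled samples into K folds and holds each fold out once for testing. The sketch below shows only the index splitting in plain Python; the function name and the fixed `seed` are illustrative choices, and the actual model training on each split is left out.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k near-equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

A model would be trained on `train_idx` and scored on `test_idx` for every split, and the per-fold indicators averaged to estimate how well it generalizes.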
Users recorded in historical electricity inspection records as having default electricity use, such as the various kinds of high-price low-access, are labeled y = 1; all other users serve as negative samples with y = 0. Separate high-price low-access analysis models are established for public-transformer low-voltage users and for dedicated-transformer users. Models such as the gradient boosting decision tree, the LSTM neural network time-series model, and the SVM are compared and selected, and the association between X and y is established through model training. The structure of the LSTM model is shown in FIG. 2.
At each time step, the input to the neural network model comprises the multidimensional feature X at the current time, and the n state nodes $S_t = \langle S_1, S_2, S_3, \ldots, S_n \rangle$ at the current time t are obtained through the transformation of each hidden layer. On the basis of the DNN network, a long short-term memory (LSTM) time-series model is incorporated, and the output at time t is a function of the current state $S_t$ and the previous state $S_{t-1}$: $O_t = f(S_t + W \times S_{t-1})$. The LSTM can model both long-term and short-term dependencies in customer data, with iterative training and prediction carried out over time. The model finally outputs the probability of default electricity use, $P_i = 1/(1 + e^{-O_t})$.
The default electricity-use identification model feeds the results of manual verification back into model training, forming a closed data-optimization loop that continuously improves model performance. Because sample data differ between regions, the modeling method is optimized separately for the user characteristics of each specific region. Users of different consumption scales behave markedly differently, so modeling results differ across consumption levels. For users with different consumption levels, electricity-use behavior is analyzed from the daily frozen readings of the electric energy meters, the user archives, the meter specifications, and similar information; the influence of consumption level on high-price low-access identification is assessed, and the identification model is continuously refined so that users with different characteristics are identified more accurately. Distribution transformer areas that serve different numbers of users have different line loss characteristics, so the model needs further optimization and adaptation. Areas with many users also differ from region to region in supply radius and user load characteristics and need to be adapted to their specific conditions. The model is adapted continuously according to on-site verification results, steadily improving the accuracy of its calculations.
(4) Machine learning
Problem case data that have been found are accumulated over the long term, and results confirmed by checking are periodically input as experience information into a rule optimization model. Machine learning is realized through a big data supervised learning model, which uncovers potential association patterns between the observable characteristics of high-price low-access judgment objects and high-price low-access problems. Monitoring indexes and rules are continuously optimized, improving both the comprehensiveness of problem object mining and the accuracy with which suspected problem objects are identified. The specific steps are as follows:
1) When high-price low-access monitoring is triggered, the characteristic information of all high-price low-access judgment objects in the current period (including all monitoring index information and basic attribute information) and the judgment result information for suspected problem objects (whether, after verification, the suspected object really has a problem) are comprehensively recorded in combination with the high-price low-access judgment indexes and rule system. Through long-term accumulation, a large volume of characteristic information and judgment result data is formed and input into the rule optimization model as training sample data for machine learning training.
2) Before a high-price low-access judgment period is triggered, machine learning training is carried out for the current judgment subject based on historically accumulated judgment experience data. Using the object characteristic information and judgment result information in the historical case data, supervised learning techniques for big data are applied to find the observable characteristics that indicate whether an object has a problem, and to determine the association pattern between those characteristics and high-price low-access problems. Based on the model's objective analysis of historical cases, guidance is provided for optimizing the high-price low-access judgment indexes and rule system.
3) Factors such as customer electricity files, industry classification, electrical equipment, load characteristics and average electricity sale price are comprehensively analyzed, and expert experience in high-price low-access judgment is drawn upon. A high-price low-access judgment data model for electricity sale price execution is established using big data technology; it analyzes the execution of classified electricity prices, time-of-use electricity prices, funds and surcharges, basic electricity charges and power-factor-adjusted electricity charges, providing a basis for standardized management of electricity sale price execution.
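The idea in steps 1) and 2), accumulating verified cases and then looking for characteristics associated with confirmed problems, can be illustrated with a simple frequency count. This is a deliberately simplified stand-in for the supervised learning model, not the method of the invention; the function name, the binary feature names and the sample cases are hypothetical.

```python
from collections import defaultdict

def feature_confirmation_rates(cases):
    """For each binary feature, return the share of flagged cases confirmed on site."""
    flagged = defaultdict(int)
    confirmed = defaultdict(int)
    for features, is_problem in cases:
        for name, present in features.items():
            if present:
                flagged[name] += 1
                confirmed[name] += int(is_problem)
    return {name: confirmed[name] / flagged[name] for name in flagged}

# Hypothetical accumulated records: (object characteristics, verified result).
cases = [
    ({"irrigation_tariff_all_months": True, "low_winter_heating_use": False}, True),
    ({"irrigation_tariff_all_months": True, "low_winter_heating_use": True}, True),
    ({"irrigation_tariff_all_months": False, "low_winter_heating_use": True}, False),
    ({"irrigation_tariff_all_months": True, "low_winter_heating_use": False}, False),
]

rates = feature_confirmation_rates(cases)
# Features with higher confirmation rates are stronger candidates for monitoring rules.
ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```

A ranking of this kind is one way the accumulated cases could inform which indexes and rules to tighten or relax; a trained classifier would replace the raw counts in practice.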
(5) High-price low-access user identification
High-price low-access users are identified based on the data model. The analysis results are combined with field checking work such as electricity inspection to complete on-site confirmation and to assist in handling the related abnormal problems.
The invention has been described in an illustrative manner, and it is to be understood that any simple variations, modifications or other equivalent changes which can be made by one skilled in the art without departing from the spirit of the invention fall within the scope of the invention.

Claims (5)

1. A high-price low-access user identification method based on big data is characterized by comprising the following steps:
step one, extracting model characteristic variables
Sorting out service characteristic items, based on service regulation requirements, the progress of key work and problems found by internal and external inspection, to form the basic structure of the model;
step two, high-price low-access characteristic analysis
The high-price low-access features include: 1) the user industry characteristics do not match the electricity utilization characteristics; 2) the electricity price execution does not accord with policy; 3) the file is inconsistent with the execution of the electricity charge calculation;
step three, model algorithm design
(1) extracting and storing the collected electricity consumption data, and preprocessing it through data governance;
(2) building the big data model with the deep learning framework TensorFlow, and accelerating training with GPU devices;
(3) verifying the model effect by K-fold cross-validation;
(4) after the model is deployed, simultaneously performing model evaluation through off-line analysis and on-line analysis;
(5) users recorded in the historical electricity inspection records with any type of default electricity use, such as high-price low-access, are labeled as positive samples y = 1, and all other users are taken as negative samples y = 0; high-price low-access analysis models are established separately for public low-voltage users and dedicated-transformer users; a gradient boosting decision tree, an LSTM neural network time-series model and an SVM model are compared and one is selected; and the association between X and y is established through model training; the structure of the LSTM model is as follows: at each time step, the input of the neural network model comprises the multidimensional feature X at the current time, and the n state nodes at the current time t, $S_t = \langle S_1, S_2, S_3, \dots, S_n \rangle$, are obtained through the transformation of the hidden layers; on the basis of the DNN network, a time-series long short-term memory model is combined, and the output at time t is a function $O_t = f(S_t + W \times S_{t-1})$ of the current state $S_t$ and the previous state $S_{t-1}$; the LSTM can simultaneously model the long-term and short-term dependencies in customer data, with training and prediction carried out iteratively over time; and the model finally outputs the probability of default electricity use $P_i = 1/(1 + e^{-O_t})$;
(6) the default electricity identification model feeds back the result of the manual verification to the training process of the model to form a data optimization closed loop and continuously optimize the effect of the model;
step four, machine learning
Accumulating problem case data that have been found over the long term, periodically inputting results confirmed by checking into a rule optimization model as experience information, realizing machine learning through a big data supervised learning model, finding potential association patterns between the observable characteristics of high-price low-access judgment objects and high-price low-access problems, continuously optimizing monitoring indexes and rules, and improving both the comprehensiveness of problem object mining and the accuracy with which suspected problem objects are identified;
step five, high-price low-access user identification
High-price low-access users are identified based on the data model; the analysis results are combined with field checking work such as electricity inspection to complete on-site confirmation and to assist in handling the related abnormal problems.
2. A big data based high-priced low-connected user recognition method as claimed in claim 1, characterized in that: in step two, the feature data of the user industry feature not matched with the electricity utilization feature comprises: the electricity price is agricultural drainage and irrigation, electricity quantity exists in each month, the monthly electricity consumption of resident electricity price users is overlarge, the electricity consumption characteristics of schools are inconsistent, the electricity price industry of resident lighting users is classified into non-residents, the temporary electricity consumption users execute large industrial electricity price, the temporary electricity consumption users exceed the period, the temporary electricity consumption users sell pre-harvest, the electricity price industry of general industrial and commercial is classified into resident life, the electricity consumption of electricity heating electricity price users is small in winter, the electricity quantity in the valley section of the retail industry is overlarge, the industry is the manufacturing industry, the load rate is lower than a certain value, and electricity is consumed at the charging electricity price, but the power consumption is higher, the contract capacity is less than 110kv, and the time-of-use electricity price of the hotel, the restaurant, the hotel, the refrigerator, the mall and the supermarket should not be executed, the house name comprises the electricity price of the "committee of living" and the electricity price of committee of living, the house name comprises the electricity price of school, "kindergarten" and the electricity price of drainage and irrigation are not executed.
3. The big-data-based high-price low-access user identification method as claimed in claim 1, characterized in that the feature data indicating that the electricity price execution does not accord with policy comprise:
① for users with transformer capacity below 100 kVA, the power factor standard should be "not assessed";
② for agricultural and wholesale users with transformer capacity above 100 kVA, the power factor standard should be 0.8;
③ for industrial and non-industrial users with transformer capacity above 100 kVA and below 160 kVA, the power factor standard should be 0.85;
④ for industrial users with transformer capacity above 160 kVA, the power factor standard should be 0.9;
⑤ the power factor standard is 0.9 but the transformer capacity is below 160 kVA;
⑥ the contract capacity is less than 315 kVA, so the large industrial electricity price cannot be executed;
⑦ the agricultural electricity price is executed and time-of-use pricing is not executed;
⑧ for demand users, the checked demand is less than 40% of the sum of the capacities;
⑨ users whose capacity reduction has expired, and newly installed or capacity-increased users, who apply for reduction or suspension within two years are charged 50% of the basic electricity charge.
4. The big-data-based high-price low-access user identification method as claimed in claim 1, wherein the feature data indicating that the file is inconsistent with the execution of the electricity charge calculation are as follows: the metering mode is high-voltage supply with low-voltage metering, but the losses are not included in the electricity charge.
5. The big-data-based high-price low-connection user identification method as claimed in claim 1, wherein the implementation steps of the machine learning in the fourth step are as follows:
1) When high-price low-access monitoring is triggered, the characteristic information of all high-price low-access judgment objects in the current period and the judgment result information for suspected problem objects are comprehensively recorded in combination with the high-price low-access judgment indexes and rule system; through long-term accumulation, a large volume of characteristic information and judgment result data is formed and input into the rule optimization model as training sample data for machine learning training.
2) Before a high-price low-access judgment period is triggered, machine learning training is carried out for the current judgment subject based on historically accumulated judgment experience data; using the object characteristic information and judgment result information in the historical case data, supervised learning techniques for big data are applied to find the observable characteristics that indicate whether an object has a problem, and to determine the association pattern between those characteristics and high-price low-access problems. Based on the model's objective analysis of historical cases, guidance is provided for optimizing the high-price low-access judgment indexes and rule system.
3) Factors such as customer electricity files, industry classification, electrical equipment, load characteristics and average electricity sale price are comprehensively analyzed, and expert experience in high-price low-access judgment is drawn upon; a high-price low-access judgment data model for electricity sale price execution is established using big data technology, which analyzes the execution of classified electricity prices, time-of-use electricity prices, funds and surcharges, basic electricity charges and power-factor-adjusted electricity charges, providing a basis for standardized management of electricity sale price execution.
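The power factor rules listed in claim 3 lend themselves to a small rule check, sketched below. The sketch rests on assumptions that the source does not settle: the category labels are hypothetical strings, "above" is treated as inclusive at the 100 kVA and 160 kVA boundaries, and cases the rules do not cover (for example non-industrial users at 160 kVA or more) return `None` rather than a guessed value.

```python
def expected_power_factor_standard(category, capacity_kva):
    """Return the expected standard, "not assessed", or None if the rules are silent."""
    if capacity_kva < 100:
        return "not assessed"
    if category in ("agricultural", "wholesale"):
        return 0.8
    if category in ("industrial", "non-industrial") and capacity_kva < 160:
        return 0.85
    if category == "industrial":
        return 0.9
    return None  # not covered by the listed rules

def is_suspect(category, capacity_kva, recorded_standard):
    """Flag a user whose recorded standard differs from the expected one."""
    expected = expected_power_factor_standard(category, capacity_kva)
    return expected is not None and expected != recorded_standard

# Hypothetical user: industrial, 120 kVA, recorded standard 0.9.
flag = is_suspect("industrial", 120, 0.9)
```

The hypothetical user in the last line corresponds to the listed condition of a 0.9 standard combined with a transformer capacity below 160 kVA, so it is flagged for checking.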
CN201910764680.7A 2019-08-19 2019-08-19 High-price low-access user identification method based on big data Pending CN110675020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910764680.7A CN110675020A (en) 2019-08-19 2019-08-19 High-price low-access user identification method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910764680.7A CN110675020A (en) 2019-08-19 2019-08-19 High-price low-access user identification method based on big data

Publications (1)

Publication Number Publication Date
CN110675020A true CN110675020A (en) 2020-01-10

Family

ID=69075499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910764680.7A Pending CN110675020A (en) 2019-08-19 2019-08-19 High-price low-access user identification method based on big data

Country Status (1)

Country Link
CN (1) CN110675020A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275576A (en) * 2020-01-19 2020-06-12 烟台海颐软件股份有限公司 Identification method and identification system for abnormal electricity price execution user
CN111539843A (en) * 2020-04-17 2020-08-14 国网新疆电力有限公司电力科学研究院 Data-driven intelligent early warning method for preventing electricity stealing
CN113392910A (en) * 2021-06-17 2021-09-14 国网江西省电力有限公司供电服务管理中心 Multi-dimensional intelligent analysis experience algorithm and classification algorithm for judging default electricity utilization and electricity stealing
CN115241980A (en) * 2022-09-19 2022-10-25 国网江西省电力有限公司电力科学研究院 System and method for checking power supply radius of distribution network area based on unmanned aerial vehicle front end identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200110
