CN111242391B - Machine learning model training method and system for power load identification - Google Patents

Machine learning model training method and system for power load identification

Info

Publication number
CN111242391B
Authority
CN
China
Prior art keywords
data
power
training
model
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010152521.4A
Other languages
Chinese (zh)
Other versions
CN111242391A (en)
Inventor
李波
张林山
罗永睦
周年荣
曹敏
王浩
轩辕哲
朱全聪
利佳
邹京希
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Yunnan Power Grid Co Ltd filed Critical Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority to CN202010152521.4A priority Critical patent/CN111242391B/en
Publication of CN111242391A publication Critical patent/CN111242391A/en
Application granted granted Critical
Publication of CN111242391B publication Critical patent/CN111242391B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The application discloses a machine learning model training method and system for power load identification. Based on measured electrical parameter data, the basic electrical parameter data is converted to a unified format and fed into a neural network model for training and continuous optimization. The model parameters are adjusted on a verification data set to select the best model, and the model's performance is evaluated on a test data set, so that the model can then be used in a power load identification system. From waveform sampling data such as voltage, current and active power, the trained model identifies which electrical equipment is in use. The method therefore requires no manual parameter tuning or feature extraction: it learns autonomously and automatically obtains the feature parameters needed to identify the power load, which broadens the applicability of the model and improves identification accuracy.

Description

Machine learning model training method and system for power load identification
Technical Field
The application relates to the technical field of power load monitoring, in particular to a machine learning model training method and system for power load identification.
Background
Power load characteristics describe how the active and reactive power that a load draws from the power system's supply vary with the voltage at the load terminal and with the system frequency. They are an important component of the power system, and identifying electrical equipment through its power load characteristics plays an important role in the development of smart grid technology.
The most common power load identification methods are intrusive and non-intrusive. The intrusive method obtains the measurement data of each load directly, but it has the disadvantages of high installation cost, a complex installation process and relatively difficult maintenance. The non-intrusive method only needs monitoring equipment installed at the main power inlet to disaggregate, monitor and identify each load in the whole system. Specifically, the non-intrusive method is based on extracting and identifying the load signature features of electrical appliances. Load signature features reflect information unique to a piece of equipment in operation, such as its voltage, active power waveform and starting current. These signatures recur whenever the device operates, and on this basis the equipment can be identified.
Designing and extracting load signature features is the main difficulty of the whole method. Feature designs typically use relatively simple steady-state and transient features of current, voltage, active power and reactive power, and combinations of them. More complex methods consider both the time-domain and the frequency-domain characteristics of the signals: they extract useful signal features with mathematical tools such as the Fourier transform, the wavelet transform and harmonic analysis, build a matching database, and match load data acquired in real time against it to detect electricity consumption. However, manually designed signal features require manual parameter tuning and are limited in complexity and dimensionality, so they often cannot fully capture the inherent characteristics of the signals or improve the discrimination between different signals. Traditional matching algorithms such as time-domain waveform matching, feature point matching and spectrum analysis also have low matching accuracy, so the accuracy of electrical load identification is low and the practical results are unsatisfactory.
Disclosure of Invention
The application provides a machine learning model training method and system for power load identification, which solve the technical problem that power load identification accuracy is low because the manually designed signal features of existing methods require manual parameter adjustment.
In order to solve the technical problems, the embodiment of the application discloses the following technical scheme:
in a first aspect, the present application provides a machine learning model training method for power load identification, the method comprising:
acquiring a historical electrical parameter data set of each electrical appliance;
cleaning the historical electrical parameter data sets of the electrical appliances;
dividing the cleaned historical electrical parameter data set of the single electric appliance into a training set, a verification set and a test set according to the proportion;
performing balancing treatment on the training set to obtain a new training set;
training parameters of a machine learning model of the target electric appliance by using the balanced new training set, and obtaining an optimized model according to the training set and the testing set;
and collecting current data of the power load of the user, inputting the current data into an optimization model of each target electric appliance, separating out the working state of the electric appliance, and outputting a class result of the power load.
Optionally, the performing balancing processing on the training set to obtain a new training set includes:
dividing a training set, a verification set and a test set from the cleaned user data fragment set according to a certain proportion;
dividing the data segments into two classes according to whether or not they contain an appliance on state;
presetting a target proportion p% for the data segments containing an on state, and oversampling those segments by randomly copying them multiple times until their proportion rises to p%;
the augmented and recombined data set is used as a new training set.
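The balancing step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `oversample_on_segments`, the `has_on_state` predicate and the fixed random seed are hypothetical, and segments are represented as plain lists of power samples.

```python
import math
import random

def oversample_on_segments(segments, has_on_state, target_pct, seed=0):
    """Randomly duplicate 'on' segments until they make up target_pct % of the set."""
    if not 0 < target_pct < 100:
        raise ValueError("target_pct must be strictly between 0 and 100")
    on = [s for s in segments if has_on_state(s)]
    off = [s for s in segments if not has_on_state(s)]
    if not on:
        raise ValueError("no segment contains an on state; nothing to copy")
    p = target_pct / 100.0
    # Smallest number of 'on' segments such that on / (on + off) >= p.
    needed_on = math.ceil(p * len(off) / (1.0 - p))
    rng = random.Random(seed)
    # Draw the extra copies at random from the existing 'on' segments.
    extra = [rng.choice(on) for _ in range(max(0, needed_on - len(on)))]
    balanced = on + extra + off
    rng.shuffle(balanced)
    return balanced

# Hypothetical data: 2 segments with an on state, 8 without.
segments = [[0, 0, 5]] * 2 + [[0, 0, 0]] * 8
balanced = oversample_on_segments(segments, lambda s: max(s) > 0, target_pct=50)
```

Only the training set is resampled in this sketch; the verification and test sets keep their original class proportions so that evaluation reflects real usage.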
Optionally, the acquiring the historical electrical parameter data set of each electrical appliance includes:
the data format of each original public data set is unified to form a public data set, and the current public data sets are integrated and summarized to obtain a first data set;
installing an electric energy meter at a general incoming line end of a user to obtain electric parameters of the total and single electric loads in one or more spaces to obtain a second data set;
and summarizing the first data set and the second data set to obtain historical electrical parameter data sets of all the electric appliances.
Optionally, the unifying the data formats of the original public data sets to form the public data set includes:
extracting information in each original public data set, wherein the information comprises a time stamp, voltage, current, active power and reactive power;
generating a row of data by taking the time stamps as marks, wherein the data comprises voltage, current, active power and reactive power;
using a metadata description file to describe the circuits and power loads represented by each original public data set.
Optionally, said cleaning said historical electrical parameter data set of each appliance includes: unification of data formats, downsampling to a specified frequency, voltage normalization.
Optionally, the unifying the data formats includes:
converting the active power into a value in the range [0, 1] according to a normalization formula [formula image not reproduced in this text], wherein:
S[i] represents the sampled value, i.e. the instantaneous active power; C is the type of electrical load; sa is the sample data; and s_c is the active power of electrical load c.
Optionally, the downsampling to a specified frequency includes:
if the sampling rate is lower than 1Hz, recording according to the original sampling rate;
if the sampling rate is higher than 1Hz, the sampling rate is downsampled to 1Hz;
wherein downsampling the sampling rate to 1Hz comprises:
discarding all other sampling values within 1 second by using the values of the sampling points at intervals of 1 second;
calculating an average value of original sampling points in adjacent 1 second as a 1 second boundary data value;
the median value of the original sampling points within 1 second is calculated as the 1 second boundary data value.
Optionally, the voltage normalization includes:
normalizing the voltages to the same fluctuation range according to
$$Power_{normalised} = \left(\frac{Voltage_{nominal}}{Voltage_{observed}}\right)^{2} \times Power$$
wherein:
$Power_{normalised}$ represents the normalized power value, $Power$ represents the measured power value, $Voltage_{nominal}$ represents the nominal voltage value, and $Voltage_{observed}$ represents the measured voltage value.
Optionally, the training parameters of the machine learning model of the target electric appliance by using the new balanced training set, and obtaining the optimization model according to the training set and the testing set, includes:
training parameters of a machine learning model by using a new training set of target electric appliances;
evaluating the models obtained at different training stages on the verification set, and taking the best-performing one as the corresponding model of the target electric appliance;
and testing the performance of the corresponding model of the target electric appliance on the test set until the performance is optimal, to obtain the optimized model of the target electric appliance.
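The selection procedure above can be sketched as picking the best checkpoint on the verification set and then scoring it once on the test set. This is a hedged illustration: the names `select_best_checkpoint` and `accuracy`, the use of accuracy as the metric, and the toy threshold classifiers standing in for trained models are all assumptions, since the source does not name a metric or a model type here.

```python
def accuracy(predict, dataset):
    """Fraction of (input, label) pairs that the model classifies correctly."""
    correct = sum(1 for x, y in dataset if predict(x) == y)
    return correct / len(dataset)

def select_best_checkpoint(checkpoints, score_fn, validation_set):
    """Return (checkpoint, score) for the checkpoint scoring highest on validation data."""
    scored = ((ckpt, score_fn(ckpt, validation_set)) for ckpt in checkpoints)
    return max(scored, key=lambda pair: pair[1])

# Toy stand-ins for models saved at three training stages: each predicts
# "on" (1) when the power reading exceeds its threshold.
checkpoints = [lambda x, t=t: int(x > t) for t in (10, 50, 200)]
validation_set = [(5, 0), (30, 0), (80, 1), (150, 1)]
test_set = [(20, 0), (100, 1)]

best_model, best_val_score = select_best_checkpoint(checkpoints, accuracy, validation_set)
test_score = accuracy(best_model, test_set)
```

Keeping the test set out of the selection loop is the design point: the test score is then an unbiased estimate of how the chosen model performs on unseen data.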
In a second aspect, based on the above-mentioned machine learning model training method for power load identification, the present application further provides a machine learning model training system for power load identification, where the system includes:
the data acquisition module is used for acquiring historical electrical parameter data sets of all the electrical appliances;
the data cleaning module is used for cleaning the historical electrical parameter data sets of all the electrical appliances;
the data dividing module is used for dividing the cleaned historical electrical parameter data set of the single electric appliance into a training set, a verification set and a test set according to the proportion;
the data balancing module is used for carrying out balancing treatment on the training set to obtain a new training set;
the model optimization module is used for training parameters of a machine learning model of the target electric appliance by using the new balanced training set, and obtaining an optimization model according to the training set and the testing set;
the power load type output module is used for collecting current data of the power load of the user, inputting the current data into an optimization model of each target electric appliance, separating out the working state of the electric appliance and outputting a type result of the power load.
Compared with the prior art, the application has the beneficial effects that:
As can be seen from the above technical solution, the machine learning model training method and system for power load identification are based on measured electrical parameter data, including current, voltage, active power and reactive power. The basic electrical parameter data is converted to a unified format and fed into the machine learning model for training and continuous optimization. The model parameters are adjusted on the verification set to select the best model, and the model's performance is evaluated on the test set, so that the model can then be used in a power load identification system.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a machine learning model training method for power load identification according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a training system for a machine learning model for power load identification according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a total electric meter and branch electric meters for area monitoring according to an embodiment of the present application.
Detailed Description
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
Power load signature features reflect information unique to the power consumption state of a piece of electrical equipment in operation, such as its voltage, active power waveform and current. These signatures recur during operation of the equipment, so each piece of equipment can be identified.
The application provides a machine learning model training method for power load identification. Referring to FIG. 1, which is a flow chart of the machine learning model training method for power load identification, the method comprises the following steps:
S110: A historical electrical parameter data set of each appliance is obtained.
The method samples and cleans basic electrical parameter data (current, voltage, active power and reactive power; see Table 1) and combines this with tuning of the model parameters (including but not limited to weights, kernel functions, dimensions, learning rate and regularization) to train the machine learning model. The trained model is stored in a specific, unified file format so that it can be further used in a power load identification system. During training, the characteristic parameters of the power load do not need to be designed manually, which broadens the applicability of the model, improves load identification accuracy and makes more efficient use of the basic electrical parameter data.
Table 1: Basic electrical parameter data
The acquiring the historical electrical parameter data set of each electrical appliance comprises the following steps:
the data format of each original public data set is unified to form a public data set, and the current public data sets are integrated and summarized to obtain a first data set;
installing an electric energy meter at a general incoming line end of a user to obtain electric parameters of the total and single electric loads in one or more spaces to obtain a second data set;
and summarizing the first data set and the second data set to obtain historical electrical parameter data sets of all the electric appliances.
The original public data sets used in this embodiment include the Reference Energy Disaggregation Data Set (REDD); the Building-Level fUlly-labeled dataset for Electricity Disaggregation (BLUED), which contains data from a single household; the Smart* data set, which contains household aggregate power data and high-frequency and low-frequency data from three households; and the Pecan Street data set, which contains aggregate and sub-metered power data from ten households. Because these public data sets come from different universities and institutions, each uses its own data format for reading and identification, so the data formats of the original public data sets need to be unified. The data formats are unified as follows:
extracting the following information from each data set: time stamps, voltage, current, apparent power, reactive power and similar data;
using the time stamp as the key, with each new time stamp generating one row of data comprising voltage, current, apparent power and reactive power;
setting the time stamp interval to 1 second, and downsampling any original data set whose sampling interval is shorter than 1 second to obtain a time stamp sequence with a 1-second interval;
using metadata description files to describe the circuits and appliances represented by each data set, together with their total/branch hierarchy and connection relationships.
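The row-per-time-stamp layout described above can be sketched as follows. This is an illustrative assumption about the data layout only: the function name `unify_rows`, the field names in `FIELDS`, and the input shape (one list of `(timestamp, value)` pairs per quantity) are hypothetical and not taken from the source.

```python
FIELDS = ("voltage", "current", "apparent_power", "reactive_power")

def unify_rows(channels):
    """Merge per-quantity (timestamp, value) readings into one row per time stamp."""
    rows = {}
    for name in FIELDS:
        for ts, value in channels.get(name, []):
            # Each new time stamp creates a row; quantities not yet seen stay None.
            row = rows.setdefault(int(ts), dict.fromkeys(FIELDS))
            row[name] = float(value)
    # Emit rows in time order, with the time stamp as the key column.
    return [{"timestamp": ts, **rows[ts]} for ts in sorted(rows)]

# Hypothetical readings from one source data set, given out of order.
channels = {
    "voltage": [(2, 230.0), (1, 229.5)],
    "current": [(1, 0.5)],
}
unified = unify_rows(channels)
```

Quantities that a source data set does not provide are left as `None` in this sketch, which keeps the gap visible for the later cleaning step instead of silently filling it.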
The real-time monitoring task is performed on an area within a given space, and both the overall power load of the space and its individual electrical devices are monitored. Taking a house as an example, the television, kettle, microwave oven, refrigerator, air conditioner, washing machine and so on are individual electrical devices. Refer to FIG. 3, which is a schematic diagram of the total electric meter and branch electric meters for area monitoring provided by this embodiment of the application.
The transient and steady-state signals of the total load obtained by measurement are subject to several issues. First, measuring devices are inconsistent: different devices report different values for the same electrical equipment. Second, sensors can lose data while compressing and transmitting the raw data. Because data acquisition and transmission can bias or lose data, the data must be processed to improve the noise resistance of the load identification method. Third, the influence of the data sampling period on load identification is studied, and the balance point between data sampling overhead and system modeling complexity is discussed.
The collected data set is a record of basic electrical parameter data on energy consumption. Several sets of monitoring instruments monitor the various appliances in a room or space over a given time range, using both a low-frequency and a high-frequency acquisition mode. The low-frequency mode samples at 1 Hz, and the high-frequency mode can reach 10 kHz to 100 kHz. Low-frequency signals are mainly used to extract steady-state load characteristics, while high-frequency signals capture transient load characteristics and high-frequency harmonic characteristics. In general, high-frequency signals contain more load characteristics, which benefits model training and accuracy, but they also place higher demands on data acquisition, transmission, compression and processing, and increase system complexity. Throughout the work, system complexity and accuracy need to be traded off according to actual conditions.
In addition, a single-phase multifunctional electric energy meter is used for monitoring. The meter monitors all circuit breakers on the control panel over an RS485 remote link; data is polled once per minute and sent promptly to the real-time data acquisition server. The single-phase multifunctional meter can be used to cross-compare current, voltage and power. Its advantages are: DIN 35 mm rail mounting, which makes it easy to install and remove; a communication rate of up to 9600 bps, giving a high transmission rate; six switching-value input and output channels, which meet the requirements for input and output of measurement data; and a DSP chip that can be reconfigured and developed for actual needs, which meets the requirements of experimental environments.
In the system, the first step is to extract and sample the power load characteristics, which is also the key point of the system. Long-term monitoring of the total voltage, current and power in a space, together with the voltage, current and power of each branch, yields a data set of real power load characteristics. Using the detailed information in this data set, the actual power (P) can be compared with the current (I). Voltage loss arises from the voltage difference between the head and tail ends of a line (which depends on the cross-sectional area and size of the wire), and the voltage drops when the fundamental voltage reaches its extreme value, so a capacitor can be used to correct the power factor. The power factor cos(θ) is the ratio of the actual power (P) to the apparent power (S) in the circuit. The power formulas are:
S = I·V;
P = S·cos(θ),
where θ is the phase angle between the voltage (V) and the current (I). The power factor is 1 when the voltage and current are in phase, and it falls toward zero as the current leads or lags the voltage by up to 90°.
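As a worked illustration of the two power formulas, the sketch below computes apparent and actual power from RMS voltage, RMS current and phase angle. The function names and the example values (230 V, 2 A) are illustrative assumptions.

```python
import math

def apparent_power(voltage_rms, current_rms):
    """S = I * V, in volt-amperes."""
    return voltage_rms * current_rms

def actual_power(voltage_rms, current_rms, phase_deg):
    """P = S * cos(theta), in watts, with theta given in degrees."""
    power_factor = math.cos(math.radians(phase_deg))
    return apparent_power(voltage_rms, current_rms) * power_factor

s = apparent_power(230.0, 2.0)           # 460 VA
p_in_phase = actual_power(230.0, 2.0, 0.0)    # power factor 1
p_lag_60 = actual_power(230.0, 2.0, 60.0)     # power factor about 0.5
p_lag_90 = actual_power(230.0, 2.0, 90.0)     # power factor about 0
```

With the current in phase, all 460 VA is delivered as actual power; at a 60° lag only about half is; at 90° essentially none is.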
We found that the actual power readings fluctuate far more than the current readings (by up to a factor of 10). This is because the meter derives actual power from two sensor readings that can fluctuate independently: the current is measured on the same wire as the load, while the voltage is measured at the circuit breaker power panel. A significant voltage drop can appear between the top and bottom of the circuit breaker power panel. This means that if the voltage is measured at a single point, the reading becomes less accurate the further that point is from the transformer, which in turn makes the calculated power less accurate. In addition, the resistance (R) of the load circuit, which depends on factors such as wire size, also contributes to voltage drop. The current is not affected by these problems. We therefore conclude that current readings determine the load state from historical data more reliably and with higher precision.
S120: the historical electrical parameter data set of each appliance is cleaned.
After the power load data has been obtained, it is cleaned. Power load data cleaning is a core technology of the system and method, and mainly comprises automatic screening and cleaning of abnormal data, noise identification and separation, downsampling, discard rate, data normalization, compensation and handling of missing data, top-k selection, and elimination of erroneous data.
The system creates a CSV data file, and the data set is cleaned by deleting incomplete data, for example records where the data of some instruments is incomplete or missing because their time stamps differ. Once the data set has been created and the CSV file imported, the data resides in an in-memory data structure that can be used throughout training. Several preprocessing tasks are needed to resolve the inconsistent data formats of the different data sets.
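Deleting incomplete records while importing the CSV file could look like the following minimal sketch. The column names in `REQUIRED` and the function name `load_complete_rows` are hypothetical, since the source does not specify the CSV layout.

```python
import csv
import io

REQUIRED = ("timestamp", "voltage", "current", "active_power", "reactive_power")

def load_complete_rows(csv_text):
    """Parse CSV text and keep only rows where every required field is a number."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        try:
            rows.append({key: float(record[key]) for key in REQUIRED})
        except (KeyError, TypeError, ValueError):
            # Missing column, short row (value is None) or empty/non-numeric cell.
            continue
    return rows

# Hypothetical file: row 2 has an empty cell and row 3 is missing its last field.
csv_text = (
    "timestamp,voltage,current,active_power,reactive_power\n"
    "1,230,0.5,100,10\n"
    "2,231,,101,11\n"
    "3,229,0.6,102\n"
    "4,230,0.4,99,9\n"
)
clean_rows = load_complete_rows(csv_text)
```

The resulting list of dictionaries plays the role of the in-memory data structure that is reused throughout training.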
The DSP single-phase multifunctional electric energy meter collects the overall basic electrical parameter data (voltage, current, active power, reactive power and so on), and the samples of each parameter are arranged in time order. Taking active power as an example, $p_i$ denotes the active power value detected by the electric energy meter at the i-th sampling instant. The raw data is arranged in time order into an active power sequence $P_{active} = \{p_1, p_2, \ldots, p_i, \ldots\}$, which forms the output of the electrical load characteristic data sampling module and also serves as the input of the electrical load data cleaning module.
In the data cleaning module, the electrical time series is downsampled to a specified frequency f according to the needs of the application. Assuming that f corresponds to an interval of 5 sampling points in the original data, the downsampled active power sequence is $P'_{active} = \{p_1, p_6, p_{11}, \ldots, p_{5i+1}, \ldots\} = \{q_1, q_2, \ldots, q_i, \ldots\}$. Other preprocessing operations such as normalization are then applied to this lower-frequency time series to produce continuous time series data that the model can use directly for training. The electrical load characteristic identification model receives several electrical parameters as input at the same time, but the input dimension for each parameter is limited to a fixed length n, so the whole time series of a parameter cannot be input at once. A sliding window of length n is therefore moved along the sequence with step length l, cutting out data subsequences of fixed length n, such as the active power subsequence $P_i = \{q_{in+1}, q_{in+2}, \ldots, q_{in+n}\}$, which are used as the input of the power load characteristic identification module.
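The stride-based downsampling and fixed-length windowing described above can be sketched as follows. The function names and the toy sequence are illustrative assumptions; windows shorter than n at the end of the sequence are simply dropped here, a choice the source does not specify.

```python
def downsample_stride(samples, period):
    """Keep every period-th sample, starting with the first (p_1, p_6, p_11, ...)."""
    return samples[::period]

def sliding_windows(sequence, n, step):
    """Cut fixed-length windows of n samples, advancing by step samples each time."""
    return [sequence[start:start + n]
            for start in range(0, len(sequence) - n + 1, step)]

# Toy raw sequence where the value of p_i is simply i, for i = 1..21.
p_active = list(range(1, 22))
q = downsample_stride(p_active, 5)                         # p_1, p_6, p_11, p_16, p_21

q_long = list(range(1, 11))                                # q_1..q_10
non_overlapping = sliding_windows(q_long, n=4, step=4)     # step l equal to n
overlapping = sliding_windows(q_long, n=4, step=2)         # step l smaller than n
```

With step equal to n the windows match the non-overlapping subsequences $P_i$ in the text; a smaller step yields overlapping windows and therefore more training samples from the same recording.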
In the load characteristic identification module, each electrical appliance has its own deep neural network model, and a data subsequence can be input to the models of the different appliances simultaneously. The n sampled values of each electrical parameter in the same time period correspond to n nodes of the deep neural network input layer. The forward propagation of the network outputs the working state sequence of the m-th target appliance within the data subsequence, $S_{m,i} = \{s_{m,in+1}, s_{m,in+2}, \ldots, s_{m,in+n}\}$. The length of the state sequence is n here but is not limited to n; it is determined by the model architecture. The types of all appliances operating within the sequence can then be determined from the working state of each appliance.
The method specifically comprises the following steps:
(1) Unified format
Because the formats of the original data sets are not uniform, the features of each data set need to be extracted for evaluation. To prevent the large differences in power consumption between appliances from interfering with the judgment, the data is cleaned and normalized, that is, converted to values in [0, 1]. In the normalization formula [formula image not reproduced in this text], S[i] represents the sampled value, i.e. the instantaneous active power; C is the type of electrical load; sa is the sample data; and s_c is the active power of electrical load c.
(2) Downsampling
The sampling rates of the device monitors in the data sets range from 0.008 Hz to 16 kHz, so the system downsamples the data sets to a specified frequency using an aggregate function such as the mean, mode or median.
If the sampling rate is lower than 1 Hz, the data are recorded at the original sampling rate.
If the sampling rate is higher than 1 Hz, the data are downsampled to 1 Hz. The specific downsampling methods are:
1. taking the value of the sampling point at each 1-second interval and discarding all other sampled values within that second;
2. calculating the average of the original sampling points within each adjacent 1 second as the data value at the 1-second boundary;
3. calculating the median of the original sampling points within each adjacent 1 second as the data value at the 1-second boundary.
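The three downsampling methods listed above can be sketched as follows for data sampled faster than 1 Hz. This is a minimal illustration under stated assumptions: timestamps are non-negative seconds, samples are sorted by time, and the function name `downsample_to_1hz` and the method labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List, Tuple


def downsample_to_1hz(samples: List[Tuple[float, float]], method: str = "mean") -> List[Tuple[int, float]]:
    """Reduce (timestamp_seconds, value) samples to one value per whole second.

    method "first":  keep the first sample of each second and discard the rest
    method "mean":   average of the samples within each second
    method "median": median of the samples within each second
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for t, v in samples:
        buckets[int(t)].append(v)  # group samples by the second they fall in

    reducers = {"first": lambda vs: vs[0], "mean": mean, "median": median}
    reduce_values = reducers[method]
    return [(sec, reduce_values(vs)) for sec, vs in sorted(buckets.items())]


# Hypothetical readings sampled several times per second.
raw = [(0.0, 1.0), (0.25, 2.0), (0.5, 6.0), (1.0, 10.0), (1.5, 20.0)]
by_first = downsample_to_1hz(raw, "first")
by_mean = downsample_to_1hz(raw, "mean")
by_median = downsample_to_1hz(raw, "median")
```

The median is less sensitive to short spikes than the mean, whereas keeping a single sample is the cheapest option but discards the most information.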
(3) Voltage normalization
The acquired voltages fluctuate: for example, one data set shows voltages varying from 180 to 250 V, while another shows voltages varying within the range of 118 to 123 V. The system must take the effects of these voltage fluctuations into account, because they can significantly affect power consumption.
The voltages are normalized to the same fluctuation range according to a formula in which:
$Power_{normalised}$ represents the normalized power value, $Power$ represents the measured power value, $Voltage_{nominal}$ represents the nominal voltage value, and $Voltage_{observed}$ represents the measured voltage value.
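The normalization formula is not reproduced in this text. A form widely used in the load-monitoring literature scales the measured power by the squared ratio of nominal to observed voltage, which is exact for a purely resistive load. The sketch below assumes that form; it is not confirmed by the text above, and the function name `normalize_power` is hypothetical.

```python
def normalize_power(power: float, voltage_observed: float, voltage_nominal: float) -> float:
    """Rescale measured power to what it would be at the nominal voltage.

    Assumption: Power_normalised = (Voltage_nominal / Voltage_observed)^2 * Power,
    i.e. the load behaves like a fixed resistance. The application's own
    formula is not shown in this text.
    """
    if voltage_observed <= 0:
        raise ValueError("observed voltage must be positive")
    return (voltage_nominal / voltage_observed) ** 2 * power


# Hypothetical reading: 1000 W measured at 250 V on a nominally 230 V supply.
p_norm = normalize_power(1000.0, voltage_observed=250.0, voltage_nominal=230.0)
```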
(4) Top-k
In general, the identification system targets the top k devices by energy consumption (k is an adjustable parameter) rather than all devices, for three reasons. First, the top k power-consuming devices already provide most of the reference information about the overall power consumption. Second, these devices have the most prominent features, and the remaining devices can be regarded as contributing only noise. Third, modeling and identifying the devices that account for a larger share of power consumption can greatly improve the reliability of the data.
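Selecting the top k devices can be sketched as a simple ranking by total energy. The function name `top_k_appliances`, the appliance names and the energy figures below are hypothetical and serve only as an example.

```python
from typing import Dict, List


def top_k_appliances(energy_by_appliance: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k appliances with the highest total energy consumption."""
    ranked = sorted(energy_by_appliance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical total energy per appliance in kWh over the recording period.
energy = {"fridge": 30.0, "kettle": 12.0, "lamp": 3.0, "air_conditioner": 55.0}
targets = top_k_appliances(energy, k=2)
```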
During data cleaning, the system also handles other common problems of the data sets, such as device sensors that do not report readings, small data losses, loss of mains data, and outliers to be removed, for example observed voltages exceeding twice the rated voltage.
(5) Detecting gaps
Many existing algorithms assume that each data acquisition device communicates continuously. In practice, however, a data acquisition device may be disconnected or fail. Given a preset threshold, a gap is considered to exist in a continuous power data sample when the disconnection or failure lasts longer than that threshold. For example, the difference between the timestamps of adjacent samples is calculated, and a gap is considered to exist in the data set if the difference exceeds a given parameter, e.g. 10 seconds. Data sequences containing gaps cannot be used directly for the system's training set and test set; all training and test data sequences must be selected from data sequences without gaps in the middle.
(6) Normal run time
The normal run time (uptime) is the total time recorded by the sensor: the last timestamp minus the first timestamp, minus the total duration of all gaps.
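Gap detection and the run-time calculation can be sketched together. The 10-second threshold follows the example given in the text; the function names `find_gaps` and `uptime` are hypothetical, and timestamps are assumed to be sorted and expressed in seconds.

```python
from typing import List, Sequence, Tuple


def find_gaps(timestamps: Sequence[float], max_gap: float = 10.0) -> List[Tuple[float, float]]:
    """Return (start, end) pairs of adjacent timestamps that are more than max_gap seconds apart."""
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > max_gap]


def uptime(timestamps: Sequence[float], max_gap: float = 10.0) -> float:
    """Last timestamp minus first timestamp, minus the duration of all gaps."""
    if not timestamps:
        return 0.0
    total = timestamps[-1] - timestamps[0]
    gap_time = sum(end - start for start, end in find_gaps(timestamps, max_gap))
    return total - gap_time


# Hypothetical recording with one outage between t = 2 s and t = 30 s.
ts = [0.0, 1.0, 2.0, 30.0, 31.0, 32.0]
gaps = find_gaps(ts)
run_time = uptime(ts)
```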
S130: The cleaned historical electrical parameter data set of each single appliance is divided into a training set, a validation set and a test set in proportion.
In the present application, the samples are divided into three independent parts: a training set (train set), a validation set and a test set. The training set is used to estimate the model, the validation set is used to determine the parameters of the network structure or to control the complexity of the model, and the test set is used to check the performance of the finally selected optimal model. The training set accounts for 50% of the total samples, and the validation set and test set each account for 25%; all three are randomly drawn from the collected power load samples.
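A random 50% / 25% / 25% split can be sketched as follows. The function name `split_dataset` and the fixed random seed are assumptions made for reproducibility of the example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Randomly split samples into training (50%), validation (25%) and test (25%) sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so that the split is repeatable
    n_train = len(shuffled) // 2
    n_val = len(shuffled) // 4
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # the remainder goes to the test set
    return train, val, test


# Hypothetical sample identifiers standing in for power load samples.
train, val, test = split_dataset(list(range(100)))
```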
The training set is used to fit the model; the model is built using the data collected and cleaned from actual electrical loads.
The data set obtained after power load acquisition and cleaning is input into the model for continuous learning, and the optimal parameters of the model are found through fitting.
A model's performance on the training data only reflects that data and does not necessarily carry over to other data of the same class. Therefore, before modeling, the data are divided into two parts, training data and validation data (in a ratio of approximately 7:3 or 6:4). Several models can also be trained at the same time, and the validation data can be input into the different models for comparison to obtain the optimal model.
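Comparing several trained models on the validation data can be sketched as follows. The mean absolute error is used here as an assumed comparison metric, and the two candidate models are hypothetical stand-ins written as plain functions rather than trained networks.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], int]


def mean_abs_error(model: Model, data: List[Tuple[float, int]]) -> float:
    """Average absolute difference between predicted and actual output."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)


def select_best_model(candidates: Dict[str, Model], validation: List[Tuple[float, int]]) -> str:
    """Return the name of the candidate with the lowest error on the validation data."""
    return min(candidates, key=lambda name: mean_abs_error(candidates[name], validation))


# Hypothetical candidates mapping an active power reading (W) to an on/off state.
candidates: Dict[str, Model] = {
    "always_off": lambda power: 0,
    "threshold": lambda power: 1 if power > 50 else 0,
}
# Hypothetical validation pairs of (active power, true state).
validation = [(10.0, 0), (80.0, 1), (120.0, 1), (5.0, 0)]
best = select_best_model(candidates, validation)
```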
The biggest difference between the test set and the training and validation sets is that the training and validation data are used during model training, whereas the test set serves only as a reference for performance testing after the model has been finally trained.
The model's performance on the test set is examined, and the quality of the model is judged by the error (generally the difference between the predicted output and the actual output). In this patent, the models are trained using support vector machines, decision trees, principal component analysis and long short-term memory (LSTM) network models, while attention is paid to the change in the model's loss value and to the convergence of the model.
S140: Balancing is performed on the training set to obtain a new training set.
Because appliance usage (the switch state) is unbalanced over time, training a model directly on data with the original unbalanced distribution biases the model, making it more prone to make the same decision as the majority category or state in the data set. The application adopts an innovative training data set balancing method that significantly improves the performance of machine learning models on power load identification. Together with the innovative data acquisition, cleaning and other methods, it forms a complete, commercially realizable model training system and method for power load identification, realizes the training of machine learning models for power load identification, automatically extracts power load characteristics from a large amount of training data using machine learning algorithms, and lays a foundation for further realizing a power load identification system.
Balancing the training set to obtain a new training set comprises the following steps:
dividing the cleaned set of user data segments into a training set, a validation set and a test set in a certain proportion;
dividing the data segments into two types according to whether or not they contain an appliance on state;
presetting a target proportion p% for the data segments containing an on state, and oversampling all such segments by randomly copying them multiple times until their proportion increases to p%;
using the augmented and recombined data set as the new training set.
The application provides a data set balancing method whose core concept is to augment and recombine the training set. The load data segments containing an appliance on state are oversampled, that is, all data segments in the training set that contain an appliance on state are copied multiple times and merged into the original training set so that the segments containing an on state account for p% of the total. The result is used as the new training set, from which data segments are continuously drawn and input into the model, and the model parameters are iteratively optimized by an optimization algorithm. The imbalance in the distribution of power load data sets has a non-negligible influence on almost all non-lazy machine learning models, so the balancing method is applicable to a wide range of machine learning models and significantly improves identification accuracy.
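The balancing step can be sketched as random oversampling of the segments that contain an on state. This is a minimal illustration under stated assumptions: each segment is represented as a list of 0/1 state labels, and the names `balance_by_oversampling` and `has_on_state` are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Segment = List[int]


def balance_by_oversampling(
    segments: Sequence[Segment],
    contains_on: Callable[[Segment], bool],
    target_pct: float,
    seed: int = 0,
) -> List[Segment]:
    """Randomly copy on-state segments until they make up at least target_pct percent of the set."""
    if not 0 < target_pct < 100:
        raise ValueError("target_pct must be strictly between 0 and 100")
    on_segments = [s for s in segments if contains_on(s)]
    balanced = list(segments)
    if not on_segments:
        return balanced  # nothing to copy
    rng = random.Random(seed)
    n_on = len(on_segments)
    while n_on / len(balanced) < target_pct / 100.0:
        balanced.append(rng.choice(on_segments))  # duplicate a random on-state segment
        n_on += 1
    rng.shuffle(balanced)  # mix the copies into the original training set
    return balanced


def has_on_state(segment: Segment) -> bool:
    """A segment counts as 'on' if any of its state labels is 1."""
    return any(state == 1 for state in segment)


# Hypothetical training set: 2 segments with an on state, 8 without.
segments = [[0, 1, 1, 0], [1, 1, 0, 0]] + [[0, 0, 0, 0] for _ in range(8)]
balanced = balance_by_oversampling(segments, has_on_state, target_pct=50)
```

Only the training set is balanced in this way; the validation and test sets keep their original distribution so that the measured performance reflects real usage.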
S150: The parameters of the machine learning model of the target appliance are trained using the balanced new training set, and an optimized model is obtained according to the training set and the test set.
S160: Current data of the user's power load are collected and input into the optimized model of each target appliance, the working states of the appliances are separated out, and the class result of the power load is output.
In the load characteristic identification module, each electrical appliance has its own independent deep neural network model, and the data subsequences can be input simultaneously into the deep neural network models of the different appliances. The n sampled values of each electrical parameter in the same time period correspond to n nodes of the input layer of the deep neural network, and the forward propagation of the network outputs the working state sequence of the m-th target appliance in the data subsequence, $S_{m,i} = \{s_{m,in+1}, s_{m,in+2}, \ldots, s_{m,in+n}\}$. The state sequence length is n in this example but is not limited to n; it is determined by the model architecture. Based on the working state of each appliance, all the appliance types present in the sequence can be determined.
The basic electrical parameter data (including current, voltage, active power and reactive power) processed by the aforementioned power load acquisition and cleaning modules are input into the power load identification module as input data, with the objective of inferring the device class of each electrical device from the electrical data.
The data and the identification method are specifically as follows:
selecting the basic electrical parameter data (including current, voltage, active power and reactive power) of the total circuit;
selecting 1 to 5 minutes of such data, which at a sampling rate of 1 Hz, for example, gives 60 to 300 sampling points;
processing the data into a plurality of sequences of 16 data points each, and excluding sequences that contain gaps;
loading a previously trained machine learning model program for power load identification, which can be implemented with any existing machine learning or neural network programming framework, such as TensorFlow or Keras;
inputting the data sequences into the machine learning model, whereupon the identification module outputs the current device class (expressed as an integer, such as 1 = air conditioner, 2 = kettle, 3 = washing machine, 4 = electric lamp, etc.) corresponding to each sampling time point.
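The identification steps above can be sketched without any particular framework. In this illustration the trained model is replaced by a hypothetical stand-in function `toy_model`; a real system would load a trained network instead. The function names, the 1500 W threshold and the sample values are assumptions.

```python
from typing import Callable, List, Sequence

Sequence16 = List[float]


def make_sequences(timestamps: Sequence[float], values: Sequence[float],
                   length: int = 16, max_gap: float = 10.0) -> List[Sequence16]:
    """Cut the data into consecutive sequences of `length` points and drop any sequence containing a gap."""
    sequences = []
    for start in range(0, len(values) - length + 1, length):
        ts = timestamps[start:start + length]
        if all(b - a <= max_gap for a, b in zip(ts, ts[1:])):
            sequences.append(list(values[start:start + length]))
    return sequences


def identify(sequences: List[Sequence16], model: Callable[[Sequence16], List[int]]) -> List[List[int]]:
    """Apply the model to each sequence; it returns one integer device class per sampling point."""
    return [model(seq) for seq in sequences]


def toy_model(seq: Sequence16) -> List[int]:
    """Hypothetical stand-in for a trained model: 2 = kettle above 1500 W, otherwise 4 = electric lamp."""
    return [2 if power > 1500.0 else 4 for power in seq]


# Hypothetical 1 Hz recording of 48 points with an outage between t = 23 s and t = 100 s.
timestamps = [float(t) for t in range(24)] + [100.0 + t for t in range(24)]
values = [2000.0] * 16 + [40.0] * 32
sequences = make_sequences(timestamps, values)   # the middle sequence spans the outage and is dropped
labels = identify(sequences, toy_model)
```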
Based on the inventive concept of the method, this embodiment also provides a machine learning model training system for power load identification. As shown in Fig. 2, which is a schematic structural diagram of the machine learning model training system for power load identification according to the embodiment of the application, the system comprises:
the data acquisition module is used for acquiring historical electrical parameter data sets of all the electrical appliances;
the data cleaning module is used for cleaning the historical electrical parameter data sets of all the electrical appliances;
the data dividing module is used for dividing the cleaned historical electrical parameter data set of the single electric appliance into a training set, a verification set and a test set according to the proportion;
the data balancing module is used for carrying out balancing treatment on the training set to obtain a new training set;
the model optimization module is used for training parameters of a machine learning model of the target electric appliance by using the new balanced training set, and obtaining an optimization model according to the training set and the testing set;
the power load type output module is used for collecting current data of the power load of the user, inputting the current data into an optimization model of each target electric appliance, separating out the working state of the electric appliance and outputting a type result of the power load.
According to the above technical scheme, the machine learning model training method and system for power load identification are based on actually measured electrical parameter data, including current, voltage, active power and reactive power. The basic electrical parameter data are converted to a unified format and input into the machine learning model for training and continuous optimization; the parameters of the model are continuously adjusted on the validation set to select the optimal model; and the performance of the model is evaluated on the test set to achieve the best effect, so that the model can then be used in a power load identification system.
Because the foregoing embodiments are described with reference to and in combination with one another, different embodiments share common parts; for identical or similar parts between the embodiments in this specification, refer to one another. They are not described in detail again here.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure of the application herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
The embodiments of the present application described above do not limit the scope of the present application.

Claims (8)

1. A machine learning model training method for power load identification, the method comprising:
the data format of each original public data set is unified to form a public data set, and the current public data sets are integrated and summarized to obtain a first data set;
installing an electric energy meter at a general incoming line end of a user to obtain electric parameters of the total and single electric loads in one or more spaces to obtain a second data set;
summarizing according to the first data set and the second data set to obtain historical electrical parameter data sets of all the electric appliances;
cleaning the historical electrical parameter data sets of the electrical appliances;
dividing the cleaned historical electrical parameter data set of the single electric appliance into a training set, a verification set and a test set according to the proportion;
performing balancing treatment on the training set to obtain a new training set;
training parameters of a machine learning model of the target electric appliance by using the balanced new training set, and obtaining an optimized model according to the training set and the testing set;
collecting current data of a user power load, inputting the current data into an optimization model of each target electric appliance, separating out the working state of the electric appliance, and outputting a class result of the power load;
the unifying the data formats of the original public data sets to form the public data set includes:
extracting information in each original public data set, wherein the information comprises a time stamp, voltage, current, active power and reactive power;
generating a row of data by taking the time stamps as marks, wherein the data comprises voltage, current, active power and reactive power;
the metadata description file is used to describe the circuit, the power load represented by each original common data set.
2. The machine learning model training method for power load identification of claim 1, wherein said balancing the training set to obtain a new training set comprises:
dividing a training set, a verification set and a test set from the cleaned user data fragment set according to a certain proportion;
dividing the data segments into two types according to the standard of whether the data segments contain the electric appliance on state or not;
presetting a target proportion p% for the data segments containing an on state, and oversampling all such segments by randomly copying them multiple times until their proportion increases to p%;
the augmented and recombined data set is used as a new training set.
3. The machine learning model training method for electrical load identification of claim 1, wherein said cleaning said historical electrical parameter data set of individual appliances comprises: unification of data formats, downsampling to a specified frequency, voltage normalization.
4. A machine learning model training method for electrical load identification as claimed in claim 3, wherein the unification of the data formats comprises:
converting the active power into values in [0, 1] according to a formula, wherein:
S[i] represents the sampled value, i.e. the instantaneous active power, C is the type of electrical load, sa is the sample data, and s_c is the active power of the electrical load c.
5. A machine learning model training method for electrical load identification as claimed in claim 3, wherein said downsampling to a specified frequency comprises:
if the sampling rate is lower than 1Hz, recording according to the original sampling rate;
if the sampling rate is higher than 1Hz, the sampling rate is downsampled to 1Hz;
wherein downsampling the sampling rate to 1Hz comprises:
discarding all other sampling values within 1 second by using the values of the sampling points at intervals of 1 second;
calculating an average value of original sampling points in adjacent 1 second as a 1 second boundary data value;
the median value of the original sampling points within 1 second is calculated as the 1 second boundary data value.
6. A machine learning model training method for electrical load identification as claimed in claim 3, wherein the voltage normalization comprises:
normalizing the voltages to the same fluctuation range according to a formula, wherein:
$Power_{normalised}$ represents the normalized power value, $Power$ represents the measured power value, $Voltage_{nominal}$ represents the nominal voltage value, and $Voltage_{observed}$ represents the measured voltage value.
7. The machine learning model training method for power load identification of claim 1, wherein training parameters of the machine learning model of the target appliance using the balanced new training set, and obtaining the optimization model according to the training set and the test set, comprises:
training parameters of a machine learning model by using a new training set of target electric appliances;
verifying and testing different models obtained in different training stages on a verification set until the effect is best as a corresponding model of the target electric appliance;
and performing performance test on the corresponding model of the target electric appliance by using the test set until the performance is optimal to obtain an optimized model of the target electric appliance.
8. A machine learning model training system for power load identification, the system comprising:
the data acquisition module is used for unifying the data formats of the original public data sets to form a public data set, and integrating and summarizing the current public data set to obtain a first data set; installing an electric energy meter at a general incoming line end of a user to obtain electric parameters of the total and single electric loads in one or more spaces to obtain a second data set; summarizing according to the first data set and the second data set to obtain historical electrical parameter data sets of all the electric appliances;
the data cleaning module is used for cleaning the historical electrical parameter data sets of all the electrical appliances;
the data dividing module is used for dividing the cleaned historical electrical parameter data set of the single electric appliance into a training set, a verification set and a test set according to the proportion;
the data balancing module is used for carrying out balancing treatment on the training set to obtain a new training set;
the model optimization module is used for training parameters of a machine learning model of the target electric appliance by using the new balanced training set, and obtaining an optimization model according to the training set and the testing set;
the power load class output module is used for collecting current data of the power load of the user, inputting the current data into an optimization model of each target electric appliance, separating out the working state of the electric appliance and outputting a class result of the power load;
the unifying the data formats of the original public data sets to form the public data set includes:
extracting information in each original public data set, wherein the information comprises a time stamp, voltage, current, active power and reactive power;
generating a row of data by taking the time stamps as marks, wherein the data comprises voltage, current, active power and reactive power;
the metadata description file is used to describe the circuit, the power load represented by each original common data set.
CN202010152521.4A 2020-03-06 2020-03-06 Machine learning model training method and system for power load identification Active CN111242391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010152521.4A CN111242391B (en) 2020-03-06 2020-03-06 Machine learning model training method and system for power load identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010152521.4A CN111242391B (en) 2020-03-06 2020-03-06 Machine learning model training method and system for power load identification

Publications (2)

Publication Number Publication Date
CN111242391A CN111242391A (en) 2020-06-05
CN111242391B true CN111242391B (en) 2023-10-31

Family

ID=70873388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010152521.4A Active CN111242391B (en) 2020-03-06 2020-03-06 Machine learning model training method and system for power load identification

Country Status (1)

Country Link
CN (1) CN111242391B (en)

Also Published As

Publication number Publication date
CN111242391A (en) 2020-06-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant