CN116304295A - User energy consumption portrait analysis method based on multivariate data driving - Google Patents

Publication number
CN116304295A
Authority
CN
China
Prior art keywords
load curve
user
algorithm
index
indexes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211630066.XA
Other languages
Chinese (zh)
Inventor
窦真兰
张春雁
孙沛
王永利
滕越
周含芷
袁博
王一诺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Electric Power University
State Grid Shanghai Electric Power Co Ltd
Original Assignee
North China Electric Power University
State Grid Shanghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Electric Power University, State Grid Shanghai Electric Power Co Ltd filed Critical North China Electric Power University
Priority to CN202211630066.XA priority Critical patent/CN116304295A/en
Publication of CN116304295A publication Critical patent/CN116304295A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Primary Health Care (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A user energy consumption portrait analysis method based on multivariate data driving comprises the following steps: reducing the dimension of the load curve and extracting features by using the time sequence symbol aggregation approximation (SAX) algorithm; converting the optimization of the SAX representation of the load curve into a multi-objective optimization problem solved with a simulated annealing particle swarm algorithm; carrying out cluster analysis on the load curves with an improved AP (affinity propagation) clustering algorithm according to the user energy consumption characteristic indexes; and analyzing the energy consumption behaviors of the various user classes according to the clustering result. Based on the current energy consumption state of the user side, a suitable portrait information acquisition algorithm and the improved AP clustering algorithm are used to mine effective information from energy consumption data and to construct a set of user energy consumption behavior portraits, which is applied to the analysis of users' multi-energy consumption behavior so that their energy consumption characteristics can be understood.

Description

User energy consumption portrait analysis method based on multivariate data driving
Technical Field
The invention relates to an analysis method, in particular to a user energy consumption portrait analysis method based on multivariate data driving and its application.
Background
User-side resources are generally utilized in three modes: peak clipping, valley filling and accurate real-time load control. These can defer investment in the power system, maintain the source-grid-load balance, promote the consumption of new energy and guard against the risk of environmental accidents. Smart energy, with big data technology at its core, can better grasp user demand, distribute energy reasonably, guarantee users' daily production and living needs, pay more attention to user experience, and realize complementary advantages among individual users. It builds a new model that turns energy data into public social value, adjusts energy supply and demand reasonably, and supports industrial upgrading and the improvement of people's livelihood.
With respect to user-side demand response, many experts in China and abroad have carried out research and made important contributions to the user-side demand optimization problem, identifying issues such as users' weak willingness to participate, the poor economics of user-side projects and immature business models. With respect to methods for analyzing user behavior features, common data extraction methods include principal component analysis (PCA) transformation and the k-means algorithm. PCA can handle large volumes of data, preserve the key information of the original data, reduce dimensionality and improve clustering quality; k-means is simple, clusters well and scales easily. Current research focuses mainly on analyzing customers' comprehensive energy consumption data, whereas data analysis models of end users' comprehensive energy consumption behavior from the perspective of integrated capability are still at an exploratory stage. The present invention is intended to address this problem on the user side of current integrated energy systems and to fill the gap in research in this direction.
Disclosure of Invention
To overcome the defects of the prior art, the invention discloses a user energy consumption portrait analysis method based on multivariate data driving, with the following technical scheme:
a user energy consumption portrait analysis method based on multivariate data driving, characterized in that the method comprises the following steps:
step 1: reducing the dimension of the load curve and extracting features by using the time sequence symbol aggregation approximation (SAX) algorithm;
step 2: optimizing the extracted features by using a simulated annealing particle swarm algorithm;
step 3: carrying out cluster analysis on the load curves with an improved AP clustering algorithm according to the user energy consumption characteristic indexes;
step 4: analyzing the energy consumption behaviors of the various user classes according to the clustering result.
The invention also discloses a nonvolatile storage medium, which is characterized in that the nonvolatile storage medium comprises a stored program, wherein the program controls equipment where the nonvolatile storage medium is located to execute the method when running.
The invention also discloses an electronic device, characterized by comprising a processor and a memory; the memory stores computer readable instructions, and the processor is configured to execute the computer readable instructions, wherein the computer readable instructions, when executed, perform the method described above.
The invention also discloses a user energy consumption portrait analysis device based on multivariate data driving, characterized in that the device comprises the following modules:
a dimension reduction and feature extraction module, used for reducing the dimension of the load curve and extracting features by using the time sequence symbol aggregation approximation (SAX) algorithm;
a simulated annealing particle swarm algorithm module, used for converting the optimization of the SAX representation of the load curve into a multi-objective optimization problem based on a simulated annealing particle swarm algorithm;
a cluster analysis module, used for carrying out cluster analysis on the load curves with the improved AP clustering algorithm according to the user energy consumption characteristic indexes;
and a user energy consumption analysis module, used for analyzing the energy consumption behaviors of the various user classes according to the clustering result.
Advantageous effects
Based on the current energy consumption state of the user side, a suitable portrait information acquisition algorithm and the improved AP clustering algorithm are used to mine effective information from energy consumption data. This information is applied to the analysis of users' multi-energy consumption behavior, so that users' energy consumption characteristics can be understood and a set of user energy consumption behavior portraits can be constructed.
Drawings
FIG. 1 is a flow chart of an improved AP clustering algorithm of the present invention.
FIG. 2 is a graph of a user dataset cluster center of the present invention.
Detailed Description
Example 1
The invention discloses a user energy consumption portrait analysis method based on multivariate data driving, which comprises the following steps:
(1) Time sequence symbol aggregation approximation method based on particle swarm optimization
(1.1) Principle of the time sequence symbol aggregation approximation algorithm
Time sequence symbol aggregation approximation (SAX) represents a continuous time sequence symbolically by converting it into a character string, and it reduces the dimension of high-dimensional sequences effectively. The specific steps are as follows:
Step one: convert the n-dimensional time sequence into a w-dimensional vector. Using piecewise aggregate approximation (PAA), the original load curve $X = [x_1, x_2, \ldots, x_n]$ is approximated by w segments $\bar{X} = [\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_w]$, where the i-th element $\bar{x}_i$ is calculated as:
$$\bar{x}_i = \frac{w}{n} \sum_{j=\frac{n}{w}(i-1)+1}^{\frac{n}{w} i} x_j \tag{1}$$
The n-dimensional original time sequence vector is divided into w segments and thereby reduced to w dimensions; $x_j$ is the j-th element of the original load curve column vector; $\bar{x}_i$ is the mean of the i-th segment; and $n/w$ is the compression ratio.
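The segment averaging of step one can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function name `paa` and the sample curve are hypothetical, and the sketch assumes that n is an exact multiple of w.

```python
import numpy as np

def paa(x, w):
    """Piecewise aggregate approximation: the mean of each of w equal segments."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n % w != 0:
        raise ValueError("this sketch assumes n is divisible by w")
    # Each row of the reshaped array is one segment of length n / w.
    return x.reshape(w, n // w).mean(axis=1)

# Hypothetical 8-point load curve reduced to 4 segments (compression ratio 2).
curve = np.array([1.0, 3.0, 2.0, 4.0, 10.0, 12.0, 5.0, 7.0])
segment_means = paa(curve, 4)   # segment means: 2, 3, 11, 6
```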
Step two: the sequence data obtained through the Piecewise Aggregated Approximation (PAA) is symbolized to achieve each time series normalization, which is then converted into a Piecewise Aggregated Approximation (PAA) representation.
Figure SMS_6
Wherein,,
Figure SMS_7
is a subcolumn element of length n; alpha j Is the i-th element in the alphabet; beta j-1 、β j The j-1 and j probability values in the Gaussian distribution breakpoint list are respectively corresponding.
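The symbol mapping of step two can be illustrated as follows. This is a sketch under stated assumptions, not the patent's code: the helper names `gaussian_breakpoints` and `sax_symbols` are hypothetical, the breakpoints are taken as the quantiles that split a standard normal distribution into l equiprobable regions, the series is z-normalized before segment averaging, and n is assumed divisible by w.

```python
import numpy as np
from statistics import NormalDist

def gaussian_breakpoints(l):
    """The l - 1 cut points that split N(0, 1) into l equiprobable regions."""
    return np.array([NormalDist().inv_cdf(k / l) for k in range(1, l)])

def sax_symbols(x, w, l):
    """Return the SAX word of series x using w segments and l characters."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()                     # assumes x is not constant
    means = z.reshape(w, x.size // w).mean(axis=1)   # PAA segment means
    # side="right" gives index j such that beta_{j-1} <= mean < beta_j.
    idx = np.searchsorted(gaussian_breakpoints(l), means, side="right")
    return "".join(chr(ord("a") + k) for k in idx)

curve = np.array([1.0, 1.0, 2.0, 2.0, 9.0, 9.0, 4.0, 4.0])
word = sax_symbols(curve, w=4, l=3)
print(word)  # prints "aacb"
```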
Step three: after the time sequence is dimension reduced, the problem of missing report easily occurs in the characteristic space inquiry. The following definition theory is applied to ensure no report missing, n-dimensional time sequences C and Q are converted into w-dimensional vectors in SAX, PAA expression is obtained, and a dimension reduction formula is substituted into Euclidean distance to obtain a distance measurement formula of the PAA:
Figure SMS_8
wherein,,
Figure SMS_9
q, C time series after dimension reduction, < >>
Figure SMS_10
Respectively->
Figure SMS_11
Figure SMS_12
Is the i-th element of (c). Further converting the data into a symbolic representation, defining a MINDIST function that returns the minimum distance between the original time sequences of the two words as:
Figure SMS_13
step four: there is an optimization direction, namely improving the lower bound compactness (Tightness of Lower Bound, TLB), expressed herein as:
Figure SMS_14
d (Q, C) represents the euclidean distance of the time series Q and C. Obviously, the TLB takes a value between 0 and 1, the closer the value is to 1, the closer the lower bound distance is to the true distance measure, i.e., the smaller the error.
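To make steps three and four concrete, the sketch below computes MINDIST and the TLB for two series. It is illustrative only: the function names are hypothetical, the symbol distance uses the usual SAX lookup rule (zero for equal or adjacent symbols, otherwise the gap between the enclosing breakpoints), both series are z-normalized, and n is assumed divisible by w.

```python
import numpy as np
from statistics import NormalDist

def breakpoints(l):
    return np.array([NormalDist().inv_cdf(k / l) for k in range(1, l)])

def to_sax(z, w, l):
    """Symbol indices (0 .. l-1) of an already z-normalized series."""
    means = z.reshape(w, z.size // w).mean(axis=1)
    return np.searchsorted(breakpoints(l), means, side="right")

def mindist(s1, s2, n, l):
    """Lower bound on the Euclidean distance between the original series."""
    b = breakpoints(l)
    lo, hi = np.minimum(s1, s2), np.maximum(s1, s2)
    # Index clamping only protects the branch that np.where discards.
    gap = b[np.maximum(hi - 1, 0)] - b[np.minimum(lo, l - 2)]
    cell = np.where(hi - lo <= 1, 0.0, gap)
    return np.sqrt(n / len(s1)) * np.sqrt(np.sum(cell ** 2))

def tlb(q, c, w, l):
    """Tightness of lower bound: MINDIST divided by the true distance."""
    zq = (q - q.mean()) / q.std()
    zc = (c - c.mean()) / c.std()
    return mindist(to_sax(zq, w, l), to_sax(zc, w, l), q.size, l) / np.linalg.norm(zq - zc)

# Two hypothetical 48-point curves.
t = np.linspace(0, 2 * np.pi, 48, endpoint=False)
tightness = tlb(np.sin(t), np.cos(t), w=6, l=6)
```

Because MINDIST never exceeds the Euclidean distance of the normalized series, the ratio stays in the interval from 0 to 1, which is what makes it usable as an optimization target.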
(1.2) simulated annealing particle swarm algorithm
Particle swarm optimization is a population-based algorithm with global optimization capability. It searches for the optimum iteratively: the system is initialized with a set of random solutions, and the particles (candidate solutions) move through the solution space in search of the best one. However, particle swarm optimization can become trapped at local extreme points, and it suffers from slow convergence and poor precision in the later stage of evolution. To address these problems of the traditional particle swarm calculation, a particle swarm algorithm based on simulated annealing is adopted. It retains the global search capability and simplicity of the traditional particle swarm algorithm while effectively avoiding problems such as the swarm falling into local extreme points.
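One common way to combine the two ideas is to let each particle accept its proposed move according to the Metropolis rule of simulated annealing while the temperature is gradually lowered. The sketch below shows that variant on a toy objective. It is an assumption about how the hybrid might look, not the patent's algorithm: the function `sa_pso`, all parameter values and the `sphere` test function are hypothetical.

```python
import numpy as np

def sa_pso(f, lower, upper, n_particles=20, n_iter=100, seed=0,
           inertia=0.7, c1=1.5, c2=1.5, t0=1.0, cooling=0.95):
    """Minimize f over a box with PSO moves filtered by a Metropolis test."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    cost = np.array([f(p) for p in pos])
    pbest, pbest_cost = pos.copy(), cost.copy()
    g = int(np.argmin(cost))
    best, best_cost = pos[g].copy(), cost[g]
    temp = t0
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (best - pos)
        cand = np.clip(pos + vel, lower, upper)
        cand_cost = np.array([f(p) for p in cand])
        # Metropolis rule: always accept improvements, and accept worse
        # moves with a probability that shrinks as the temperature falls.
        delta = cand_cost - cost
        prob = np.exp(-np.maximum(delta, 0.0) / temp)
        accept = (delta <= 0) | (rng.random(n_particles) < prob)
        pos[accept], cost[accept] = cand[accept], cand_cost[accept]
        better = cost < pbest_cost
        pbest[better], pbest_cost[better] = pos[better], cost[better]
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < best_cost:
            best, best_cost = pbest[g].copy(), pbest_cost[g]
        temp *= cooling
    return best, best_cost

def sphere(p):
    """Toy objective with its minimum at the origin."""
    return float(np.sum(p ** 2))

best, best_cost = sa_pso(sphere, [-5, -5], [5, 5])
```

Accepting occasional worse moves early on is what lets particles leave a local extreme point; as the temperature falls the search behaves more and more like ordinary particle swarm optimization.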
Based on a simulated annealing particle swarm algorithm, the optimization problem of the load curve time sequence symbol aggregation approximation (SAX) expression is converted into a multi-objective optimization problem, and the objective function is as follows:
Figure SMS_15
wherein:
Figure SMS_16
Figure SMS_17
Figure SMS_18
$2 \le l \le l_m$ (10)
$2 \le w \le w_m$ (11)
where A is the accuracy, representing how well the segmented load curve characterizes the original load curve; E is the information quantity, measured by information entropy: the smaller the information entropy, the greater the accuracy when the existing signal is used for prediction and the greater the information quantity contained; and R is the reduction rate, reflecting the degree to which the original load curve is compressed.
Figure SMS_19
is the correlation coefficient between the PAA-approximated values of the load curve
Figure SMS_20
and the original load curve $X_i$. Because the dimensions differ,
Figure SMS_21
is spline-interpolated to form a sequence with the same dimension as $X_i$ before the correlation coefficient is calculated. $p_i$ is the occurrence probability of character i in $X_i$; $l_m$ is the maximum number of characters and $w_m$ is the set maximum number of segments, with $l_m = w_m$ taken here; μ is a weight coefficient for the two parameters, with μ = 0.5 here.
The effect of the algorithm is evaluated through the three indexes A, R and E; the optimal load curve expression is obtained when the combined effect is best.
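Two of the three indexes can be illustrated directly from their verbal definitions. The sketch below is only an approximation of the idea: the exact formulas (6) to (9) are not reproduced in this text, so the function names are hypothetical, the PAA values are expanded back to full length by simple repetition instead of the spline interpolation the text describes, and the entropy is the standard Shannon entropy of the character frequencies.

```python
import numpy as np
from collections import Counter

def accuracy(x, w):
    """Correlation between a curve and its PAA approximation (requires w >= 2)."""
    x = np.asarray(x, dtype=float)
    means = x.reshape(w, x.size // w).mean(axis=1)
    # Step reconstruction; a spline could be substituted here.
    approx = np.repeat(means, x.size // w)
    return float(np.corrcoef(x, approx)[0, 1])

def symbol_entropy(word):
    """Shannon entropy (in bits) of the character frequencies of a SAX word."""
    counts = np.array(list(Counter(word).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

a_value = accuracy(np.arange(1, 9), w=4)   # about 0.976
e_value = symbol_entropy("aabb")           # 1.0 bit
```

More segments raise the accuracy but lower the compression, which is why the text treats the choice of w and l as a multi-objective problem.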
(2) AP clustering algorithm based on the optimized time sequence symbol aggregation approximation algorithm and user energy consumption characteristic indexes
(2.1) Description of energy consumption characteristic indexes
When processing user energy consumption data, an appropriate feature extraction technique ensures effective results and reduces the amount of calculation. In data mining, features with clear physical meaning help power enterprises study and process the relevant data, so that early warning, abnormal data analysis, demand-side management and the like can be realized by analyzing energy consumption data. In addition, key data features acquired from the demand side are combined with the discrete and time-domain features obtained through time sequence symbol aggregation approximation, and the dimension of the load curve is reduced, so that the underlying meaning of the load curve can be analyzed more efficiently and intuitively and the curve can be evaluated more completely.
User energy consumption characteristic indexes reflect the internal regularities of the load curve and allow useful information to be extracted from the high-dimensional load curve quickly and efficiently. Three typical categories of characteristic indexes are introduced: energy consumption load level, energy consumption stability and energy consumption interaction capacity. Specific indexes, including daily average load, daily load rate, peak-time energy consumption rate and valley electricity coefficient, are selected as feature vectors for clustering the load curves. These indexes serve as the main data feature vector and, together with the discrete features from the optimized SAX, comprehensively reflect the time-domain and state characteristics of the load curve and form the basis for clustering. The selected indexes are shown in table 1.
Table 1 comprehensive energy system user energy performance index
Figure SMS_22
Figure SMS_23
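As a rough illustration of how such indexes can be computed from a daily load curve, the sketch below derives four of the named indexes from 24 hourly values. The definitions are assumptions, since table 1 is not reproduced in this text: daily load rate is taken as mean over maximum load, and the peak and valley indexes as the share of daily energy used in hypothetical peak (08:00 to 22:00) and valley (00:00 to 06:00) windows.

```python
import numpy as np

def energy_indexes(load, peak_hours=range(8, 22), valley_hours=range(0, 6)):
    """Compute illustrative characteristic indexes from 24 hourly load values."""
    load = np.asarray(load, dtype=float)
    total = load.sum()
    return {
        "daily_average_load": load.mean(),
        "daily_load_rate": load.mean() / load.max(),
        "peak_energy_rate": load[list(peak_hours)].sum() / total,
        "valley_coefficient": load[list(valley_hours)].sum() / total,
    }

# A perfectly flat hypothetical curve: load rate 1, peak share 14/24.
flat = energy_indexes(np.ones(24))
```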
(2.2) CRITIC weighting method
To avoid subjectivity in setting the user energy consumption characteristic indexes, the CRITIC weighting method is adopted to evaluate the contribution of each characteristic index to the clustering result and to determine the index weights objectively. The basic idea is to measure the objective weight of an index from both its contrast intensity and its conflict with other indexes. Contrast intensity draws on the idea of the standard deviation and characterizes the variability of an index: the larger the standard deviation, the more information the index contains. Conflict reflects the correlation among different indexes: the larger the correlation coefficient between two indexes, the stronger their correlation and the lower their conflict.
The specific steps of the CRITIC weighting method are as follows:
1) Index normalization. Let there be m evaluation objects and n evaluation indexes. Because different indexes act on the final evaluation result in different directions, forward or reverse normalization is applied to each index accordingly.
The forward index is normalized as shown in formula (12):
Figure SMS_24
The reverse index is normalized as shown in formula (13):
Figure SMS_25
where i = 1, 2, ..., m; j = 1, 2, ..., n; $a_{ij}$ is the actual value of the j-th index of the i-th user; and $b_{ij}$ is the value of the j-th index of the i-th user after normalization.
2) Calculate the correlation coefficients of the evaluation index matrix. The correlation coefficient describes the conflict between indexes: the stronger the positive correlation between two indexes, the smaller their conflict and the lower the weight. The correlation coefficient is calculated as shown in formula (14):
Figure SMS_26
where i = 1, 2, ..., n; j = 1, 2, ..., n; and $r_{ij}$ is the correlation coefficient between the i-th index and the j-th index.
3) Calculate the weights. The contrast intensity and the conflict of each evaluation index are calculated from the correlation coefficient matrix obtained above, as shown in formula (15):
Figure SMS_27
where j = 1, 2, ..., n; $\sigma_j$ is the standard deviation of the j-th index;
Figure SMS_28
is the contrast intensity of the j-th index; and
Figure SMS_29
is a quantitative measure of the conflict between the j-th index and the other indexes. Based on the contrast intensity and conflict of an index, the amount of information it contains is calculated as shown in formula (16):
Figure SMS_30
where the larger $G_j$ is, the more information the j-th index contains and the larger its weight.
The final objective weight $W_j$ of the j-th index is:
Figure SMS_31
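The three CRITIC steps can be sketched as below. The formulas used are the commonly cited textbook form of CRITIC and are an assumption here, since formulas (12) to (17) are not reproduced in this text: min-max normalization (reversed for indexes where smaller is better), Pearson correlation, conflict as the sum of (1 - r), information as standard deviation times conflict, and weights as normalized information. The function name, the `benefit` flag and the sample matrix are hypothetical.

```python
import numpy as np

def critic_weights(a, benefit):
    """Objective index weights from an m-by-n matrix of index values.

    benefit[j] is True for a forward index (larger is better) and
    False for a reverse index. Assumes no index column is constant.
    """
    a = np.asarray(a, dtype=float)
    lo, hi = a.min(axis=0), a.max(axis=0)
    # Step 1: forward or reverse min-max normalization per column.
    b = np.where(benefit, (a - lo) / (hi - lo), (hi - a) / (hi - lo))
    # Step 2: correlation coefficients between the normalized indexes.
    r = np.corrcoef(b, rowvar=False)
    # Step 3: contrast intensity, conflict, information and weights.
    sigma = b.std(axis=0, ddof=1)
    conflict = (1.0 - r).sum(axis=0)
    info = sigma * conflict
    return info / info.sum()

# Four hypothetical users, three indexes; the second index is a reverse index.
a = np.array([[1, 10, 5],
              [2, 8, 3],
              [3, 6, 6],
              [4, 4, 1]])
weights = critic_weights(a, benefit=np.array([True, False, True]))
```

In this example the first two indexes carry the same information after normalization, so they receive equal weights, which shows how correlation lowers the weight an index would otherwise get.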
(2.3) improving the AP clustering Algorithm
The AP (affinity propagation) clustering algorithm has advantages such as not requiring the number of clusters to be specified in advance and yielding clustering results with small squared error, but its complexity is high, and processing multidimensional data often requires long calculation times. The method therefore speeds up the calculation of the AP similarity matrix by using the discrete state quantities of the load curve and the energy consumption characteristic indexes to reduce the dimension of the load curve, and adjusts the bias parameter to improve clustering efficiency.
1) Improving similarity matrix
$s(i,j) = -[\alpha d_{dij} + (1-\alpha) d_{tij}], \quad i \ne j$ (18)
Figure SMS_32
where s(i, j) is an element of the improved similarity matrix; $d_{dij}$ and $d_{tij}$ are the distances between load curve i and load curve j in terms of the discrete state feature d obtained by SAX and the energy consumption characteristic index t respectively, both measured by the Euclidean distance; and α is the feature weight coefficient.
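Formula (18) can be sketched as a weighted sum of two pairwise Euclidean distance matrices. The function name and sample features below are hypothetical, and the diagonal is left at zero because formula (18) only covers i ≠ j; the diagonal is set separately through the bias parameter.

```python
import numpy as np

def improved_similarity(d_feat, t_feat, alpha=0.5):
    """Off-diagonal similarities s(i, j) = -[alpha * d_d + (1 - alpha) * d_t]."""
    d_feat = np.asarray(d_feat, dtype=float)   # SAX discrete state features
    t_feat = np.asarray(t_feat, dtype=float)   # characteristic index features
    d_d = np.linalg.norm(d_feat[:, None, :] - d_feat[None, :, :], axis=2)
    d_t = np.linalg.norm(t_feat[:, None, :] - t_feat[None, :, :], axis=2)
    return -(alpha * d_d + (1 - alpha) * d_t)

# Three hypothetical load curves with one-dimensional features of each kind.
s = improved_similarity([[0.0], [1.0], [3.0]], [[0.0], [2.0], [2.0]], alpha=0.5)
```

Because both feature sets are low-dimensional, each pairwise distance is cheap to evaluate, which is the source of the speed-up over computing similarities on the raw load curves.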
2) Improving the bias parameter
The element s(i, i) on the main diagonal of the similarity matrix is the bias parameter, and its value affects the number of clusters obtained. Selecting a reasonable bias parameter value with a clustering evaluation index effectively reduces the number of iterations of the algorithm and improves clustering precision.
The DB index, which evaluates clustering quality, is stable and varies within a small range over repeated iterations of the AP clustering algorithm. The DB index is therefore used as the criterion for selecting the bias parameter and for convergence of the AP clustering algorithm, as shown in formula (20).
$s(i,i) = p_m + \delta DB_{min}$ (20)
where $p_m$ is the initial value, the median of all elements off the main diagonal; $DB_{min}$ is the minimum DB value obtained so far by the algorithm; δ is the search threshold, with δ > 0 for a forward search and δ < 0 for a backward search. As shown in formula (21), the smaller the DB index, the lower the similarity between classes and the better the clustering effect.
$$DB = \frac{1}{n} \sum_{i=1}^{n} \max_{j \ne i} \frac{W_i + W_j}{C_{ij}} \tag{21}$$
where n is the number of clusters; $W_i$ and $W_j$ are the average distances from the data points of class i and class j to their respective cluster centers; and $C_{ij}$ is the distance between cluster centers i and j.
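The DB index and the bias parameter update can be sketched as follows. This is an illustration under assumptions: the DB index is computed in its usual Davies-Bouldin form from the symbol definitions above, the function names and sample data are hypothetical, and the iterative search over δ is not shown.

```python
import numpy as np

def davies_bouldin(x, labels, centers):
    """Mean over clusters of the worst (W_i + W_j) / C_ij ratio."""
    x, centers = np.asarray(x, float), np.asarray(centers, float)
    labels = np.asarray(labels)
    n = len(centers)
    # W_k: average distance from the points of cluster k to its center.
    w = np.array([np.linalg.norm(x[labels == k] - centers[k], axis=1).mean()
                  for k in range(n)])
    # C_ij: distance between cluster centers i and j.
    c = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    ratio = np.full((n, n), -np.inf)
    mask = ~np.eye(n, dtype=bool)
    ratio[mask] = (w[:, None] + w[None, :])[mask] / c[mask]
    return ratio.max(axis=1).mean()

def preference(s, delta, db_min):
    """Bias parameter: median off-diagonal similarity plus delta * DB_min."""
    off_diagonal = s[~np.eye(len(s), dtype=bool)]
    return np.median(off_diagonal) + delta * db_min

# Two well separated hypothetical clusters of two points each.
points = np.array([[0, 0], [0, 2], [10, 0], [10, 2]])
db = davies_bouldin(points, [0, 0, 1, 1], [[0, 1], [10, 1]])   # 0.2
sim = np.array([[0.0, -1.0, -3.0], [-1.0, 0.0, -2.0], [-3.0, -2.0, 0.0]])
p = preference(sim, delta=0.5, db_min=db)                      # -1.9
```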
The flow of the improved AP clustering algorithm is shown in fig. 1.
(3) Calculation case analysis
This section uses user data from a comprehensive energy system park, from which 2000 load curves are randomly selected; the initial energy consumption characteristic index weights are set equal. Solving with the particle swarm algorithm based on simulated annealing gives an optimal number of segments w = 3 and an optimal number of characters l = 6. The optimized AP clustering algorithm yields 4 classes of final cluster centers, as shown in figure 2:
As can be seen from FIG. 2, the load curves differ considerably and the energy consumption of the typical user classes varies markedly; each cluster center represents the energy consumption of one class of users. Class A users have a higher energy consumption level in the morning and evening with an obvious drop at noon, and may be office workers. The energy consumption level of class B and class C users rises after 7:00 and falls after 20:00, which accords with the daily routine of most residents. Class B users have an average energy consumption level that is slightly higher in the morning and evening, showing continuous energy use without obvious peak-valley characteristics. The daytime energy consumption level of class C users is higher than that of class B users, with peaks at midday and in the evening, which is a bimodal load. Class D users have a low energy consumption level throughout the day, due largely to equipment losses and possibly to residents who are not consuming electricity, such as vacant-property customers and business travelers. The extracted load characteristics allow user energy consumption behavior to be analyzed in depth.
Class D users have too low an energy consumption level and are therefore not analyzed further. The energy consumption levels of class A, B and C users are evaluated separately. Class A users have a large daily peak-valley difference, need peak clipping and valley filling, and are a potential group for demand response. Class B and class C users have a higher daily load rate and can serve as representatives of resident demand response; higher peak-hour electricity prices can be set for them to guide peak clipping and valley filling and promote the optimal allocation of power resources. In addition, class B users have greater peak regulation capability and more stable daily energy consumption, and can be scheduled together with class D users to fill load valleys.
The characteristic index of the clustering center is shown in table 2, and the corresponding initial weight and the improved final weight of A are shown in table 3. To simplify the analysis, the cluster center is used as a representative load on the load curve. As shown in table 3, the daily average load weight was highest, and it was mainly considered in the analysis.
TABLE 2 clustering center characteristic index
Figure SMS_34
Table 3 initial weights and update results
Figure SMS_35
Meanwhile, based on the discrete state features of each representative load, the CRITIC weighting method can be used to analyze the energy consumption characteristics and, combined with qualitative analysis of the energy consumption characteristic indexes, to further analyze users' demand response potential. According to formula (16) of the CRITIC weighting method, the more information an index contains, the larger its weight. Conflict reflects the correlation among indexes and is expressed through the correlation coefficient: the more strongly an index correlates with the others, the smaller its conflict with them and the more it repeats the same evaluation content, which weakens its evaluation strength to some extent and reduces the weight assigned to it. It can therefore be considered that users with a large amount of information are suited to price-type demand response, and users with a small amount of information are suited to incentive-type demand response. Assuming the correlation coefficient is unchanged, the larger the conflict, that is, the standard deviation, the more information is contained. The index conflict calculated for each user class is shown in table 4.
Class B users contain more information and have an average overall energy consumption level; they are suitable as price-type demand response customers, for whom flexible electricity prices can be set to guide changes in energy consumption behavior. Class A and class C users have higher energy consumption and less information; they can serve as incentive-type demand response customers, reducing their power demand when the system requires it or supply is tight, taking the satisfaction of different users into account.
Table 4. Index conflict for each user class
User class    Index conflict
A             59.91
B             89.81
C             46.16
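The CRITIC calculation discussed above can be illustrated with a minimal Python sketch. It assumes the commonly used form of the method, in which the information amount of an index is its standard deviation multiplied by the sum of (1 - r) over its correlation coefficients r with the other indexes; this is assumed, not confirmed, to correspond to formula (16). The function name `critic_weights` and the demo matrix are hypothetical and are not data from this application.

```python
import numpy as np

def critic_weights(scores):
    """Return CRITIC weights for an (m objects x n indexes) matrix.

    Assumption: every index is benefit-type (larger is better) and
    no index is constant across the evaluated objects.
    """
    x = np.asarray(scores, dtype=float)
    # Forward min-max normalization of each index (column).
    span = x.max(axis=0) - x.min(axis=0)
    z = (x - x.min(axis=0)) / span
    # Contrast intensity: sample standard deviation of each normalized index.
    sigma = z.std(axis=0, ddof=1)
    # Conflict: sum of (1 - r_ij) over all indexes, r being Pearson correlation.
    corr = np.corrcoef(z, rowvar=False)
    conflict = (1.0 - corr).sum(axis=0)
    # Information amount of each index, then weights normalized to sum to 1.
    info = sigma * conflict
    return info / info.sum()

# Illustrative values only: 4 users x 3 energy use indexes.
demo = np.array([
    [0.62, 0.81, 0.35],
    [0.48, 0.64, 0.52],
    [0.91, 0.77, 0.18],
    [0.55, 0.70, 0.44],
])
weights = critic_weights(demo)
print(weights)
```

Cost-type indexes would need reverse normalization before the same steps are applied; that branch is omitted here to keep the sketch short.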
Example two
Based on the same inventive concept, the present application further provides a non-volatile storage medium. The non-volatile storage medium includes a stored program, and when the program runs it controls the device in which the non-volatile storage medium is located to execute the method of the first embodiment.
Example three
Based on the same inventive concept, the present application also provides an electronic device comprising a processor and a memory. The memory stores computer readable instructions, the processor is configured to execute them, and the computer readable instructions, when executed, perform the method of the first embodiment.
Example four
Based on the same inventive concept, the present application also provides a user energy consumption portrait analysis device based on multivariate data driving, comprising the following modules:
a dimension reduction and feature extraction module, used to reduce the dimension of the load curve and extract features by using the time series symbolic aggregate approximation (SAX) algorithm;
a simulated annealing particle swarm algorithm module, used to convert the optimization of the SAX representation of the load curve into a multi-objective optimization problem based on a simulated annealing particle swarm algorithm;
a cluster analysis module, used to perform cluster analysis on the load curves with an improved affinity propagation (AP) clustering algorithm according to the user energy consumption characteristic indexes;
a user energy consumption analysis module, used to analyze the energy use behavior of each class of users according to the clustering result.
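The dimension reduction performed by the first module can be sketched in Python as follows. This is a minimal illustration of textbook SAX (z-normalization, piecewise aggregate approximation, then mapping segment means to characters through equiprobable Gaussian breakpoints), not necessarily the exact procedure of this application. The function names, the default alphabet "abcd", and the requirement that n be divisible by w are assumptions made for the sketch.

```python
from statistics import NormalDist

import numpy as np

def paa(series, w):
    """Piecewise aggregate approximation: n points -> w segment means."""
    x = np.asarray(series, dtype=float)
    if x.size % w != 0:
        raise ValueError("this sketch requires n to be divisible by w")
    return x.reshape(w, -1).mean(axis=1)

def sax(series, w, alphabet="abcd"):
    """Return the SAX word of a non-constant series using w segments."""
    x = np.asarray(series, dtype=float)
    z = (x - x.mean()) / x.std()  # z-normalize the series
    means = paa(z, w)
    size = len(alphabet)
    # Breakpoints that cut N(0, 1) into `size` equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / size) for k in range(1, size)]
    # Region j satisfies breakpoints[j-1] <= mean < breakpoints[j].
    idx = np.searchsorted(breakpoints, means, side="right")
    return "".join(alphabet[i] for i in idx)

def paa_distance(q, c, w):
    """PAA distance sqrt(n/w) * ||PAA(q) - PAA(c)||, a lower bound on ||q - c||."""
    n = len(q)
    return np.sqrt(n / w) * np.linalg.norm(paa(q, w) - paa(c, w))

# Illustrative 8-point series, not data from this application.
q = np.array([2.0, 2.5, 3.0, 5.0, 6.0, 6.5, 4.0, 3.0])
c = np.array([1.0, 1.5, 4.0, 4.5, 7.0, 6.0, 5.0, 2.0])
word = sax(q, w=4)
dist = paa_distance(q, c, w=4)
print(word, dist)
```

Because the PAA distance never exceeds the Euclidean distance between the original series, a search in the reduced space cannot discard a true match, which is the no-missed-report property the method relies on.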
In conclusion, the algorithm not only clusters load curves efficiently and accurately but also extracts their important features, which benefits the analysis of user behavior. An improved AP clustering algorithm based on SAX discrete state features and weighted energy use characteristic indexes is proposed, and the objective weights of the energy use characteristic indexes are determined with the CRITIC weighting method. The calculation example demonstrates that the extracted features both preserve clustering accuracy and help analyze user energy consumption behavior. The method can be applied to demand response and similar scenarios.
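The two ingredients of the improved AP clustering, the weighted similarity matrix of formula (18) and the DB-guided bias parameter of formula (20), can be sketched in Python as below. The DB index is written in its standard Davies-Bouldin form, which is assumed to be what formula (21) denotes. All function names and the demo points are hypothetical, and the affinity propagation iteration itself is not shown.

```python
import numpy as np

def similarity_matrix(d_feat, t_feat, alpha=0.5):
    """s(i,j) = -[alpha*d_dij + (1-alpha)*d_tij] for i != j (Euclidean distances)."""
    d = np.asarray(d_feat, dtype=float)
    t = np.asarray(t_feat, dtype=float)
    d_dist = np.linalg.norm(d[:, None, :] - d[None, :, :], axis=2)
    t_dist = np.linalg.norm(t[:, None, :] - t[None, :, :], axis=2)
    return -(alpha * d_dist + (1.0 - alpha) * t_dist)

def set_preference(s, delta, db_min):
    """Fill the diagonal with s(i,i) = p_m + delta * DB_min."""
    s = s.copy()
    off_diag = s[~np.eye(len(s), dtype=bool)]
    # p_m: median of all off-diagonal similarities.
    np.fill_diagonal(s, np.median(off_diag) + delta * db_min)
    return s

def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index; requires at least two clusters."""
    x = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centers = np.array([x[labels == k].mean(axis=0) for k in ids])
    # W_i: mean distance of the members of class i to their own center.
    w = np.array([np.linalg.norm(x[labels == k] - c, axis=1).mean()
                  for k, c in zip(ids, centers)])
    # C_ij: distance between cluster centers i and j.
    c_dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(c_dist, np.inf)  # exclude j == i from the maximum
    ratio = (w[:, None] + w[None, :]) / c_dist
    return ratio.max(axis=1).mean()

# Illustrative points: two well-separated pairs.
pts = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
db = davies_bouldin(pts, [0, 0, 1, 1])
s = set_preference(similarity_matrix(pts, pts, alpha=0.5), delta=-1.0, db_min=db)
print(db)
print(s)
```

A matrix built this way could be handed to any affinity propagation implementation that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation` with `affinity="precomputed"`. Repeating the run while updating the diagonal from the best DB value found so far would mimic the search that formula (20) describes.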
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, that the above embodiments and descriptions merely illustrate the principles of the present invention, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A user energy consumption portrait analysis method based on multivariate data driving, characterized in that the method comprises the following steps:
step 1: reducing the dimension of the load curve and extracting features by using the time series symbolic aggregate approximation (SAX) algorithm;
step 2: converting the optimization of the SAX representation of the load curve into a multi-objective optimization problem based on a simulated annealing particle swarm algorithm;
step 3: performing cluster analysis on the load curves with an improved AP clustering algorithm according to the user energy consumption characteristic indexes;
step 4: analyzing the energy use behavior of each class of users according to the clustering result.
2. The user energy consumption portrait analysis method based on multivariate data driving according to claim 1, characterized in that step 1 further comprises the following steps:
step one: converting the n-dimensional time series into a w-dimensional vector: using piecewise aggregate approximation (PAA), the original load curve $X = [x_1, x_2, \ldots, x_n]$ is approximated by w segments $\bar{X} = [\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_w]$, where the i-th element $\bar{x}_i$ is calculated as:
$$\bar{x}_i = \frac{w}{n} \sum_{j=\frac{n}{w}(i-1)+1}^{\frac{n}{w} i} x_j$$
that is, the n-dimensional original time series vector is divided into w segments and thereby reduced to w dimensions; $x_j$ is an element of the original load curve column vector; $\bar{x}_i$ is the mean of the i-th segment; $n/w$ is the compression ratio;
step two: symbolizing the sequence data obtained through PAA: each time series is first normalized, the normalized time series is converted into its PAA representation, and each segment mean is then mapped to a character:
$$\hat{x}_i = \alpha_j \quad \text{if } \beta_{j-1} \le \bar{x}_i < \beta_j$$
where $\hat{x}_i$ is the corresponding element of the symbol sequence; $\alpha_j$ is the j-th element of the alphabet; $\beta_{j-1}$ and $\beta_j$ are the (j-1)-th and j-th breakpoints in the breakpoint list of the Gaussian distribution;
step three: after dimension reduction of the time series, queries in the feature space are prone to missed reports (false dismissals); the following definition is applied to guarantee that none occur: in SAX, the n-dimensional time series C and Q are converted into w-dimensional vectors to obtain their PAA representations, and substituting the dimension reduction formula into the Euclidean distance gives the PAA distance measure:
$$D(\bar{Q}, \bar{C}) = \sqrt{\frac{n}{w}} \sqrt{\sum_{i=1}^{w} (\bar{q}_i - \bar{c}_i)^2}$$
where $\bar{Q}$ and $\bar{C}$ are the dimension-reduced time series of Q and C, respectively, and $\bar{q}_i$ and $\bar{c}_i$ are the i-th elements of $\bar{Q}$ and $\bar{C}$, respectively.
3. The user energy consumption portrait analysis method based on multivariate data driving according to claim 1, characterized in that step 2 further comprises the following:
the objective function is as follows:
Figure FDA0004005414380000028
wherein:
Figure FDA0004005414380000029
Figure FDA00040054143800000210
Figure FDA00040054143800000211
$2 \le l \le l_m$ (10)
$2 \le w \le w_m$ (11)
where A is the accuracy, which indicates how well the segmented load curve characterizes the original load curve; E is the amount of information, measured by information entropy: the smaller the information entropy, the greater the accuracy when the existing signal is used for prediction and the greater the amount of information contained; R is the reduction rate, which reflects the degree of compression of the original load curve; [symbol, Figure FDA0004005414380000031] is the correlation coefficient between the PAA-approximated values [symbol, Figure FDA0004005414380000032] of the load curve and the original load curve $X_i$; because the dimensions differ, [symbol, Figure FDA0004005414380000033] is first spline-interpolated into a sequence of the same dimension as $X_i$ and the correlation coefficient is then calculated; $p_i$ is the probability of occurrence of character i in $X_i$; $l_m$ is the maximum number of characters and $w_m$ is the set maximum number of segments, with $l_m = w_m = 10$ here; $\mu$ is the weight coefficient of the two parameters.
4. The user energy consumption portrait analysis method based on multivariate data driving according to claim 1, characterized in that step 3 further comprises the following: three types of typical energy use characteristic indexes are introduced, namely energy use load level, energy use stability and energy use interaction capability; specific indexes including daily average load, daily load rate, peak-period energy consumption rate and valley electricity coefficient are selected as feature vectors for clustering the load curves; these indexes serve as the main data feature vectors and, together with the SAX-optimized discrete features, comprehensively reflect the time-domain and state characteristics of the load curves as the basis for clustering them.
5. The user energy consumption portrait analysis method based on multivariate data driving according to claim 4, characterized in that: the contribution of each characteristic index to the clustering result is evaluated with the CRITIC weighting method, and the weights of the energy consumption characteristic indexes are determined objectively, wherein the objective weight of an index is measured jointly by the contrast intensity of the evaluation index and the conflict between indexes; the contrast intensity characterizes the variation of the evaluation index, that is, the larger the mean square deviation (standard deviation), the larger the amount of information the index contains; the conflict expresses the correlation among different indexes: the larger the correlation coefficient of two indexes, the stronger their correlation and the lower the corresponding conflict.
6. The user energy consumption portrait analysis method based on multivariate data driving according to claim 5, characterized in that the CRITIC weighting method comprises the following specific steps:
1) index normalization: given m evaluation objects and n evaluation indexes, the indexes are normalized with a forward or reverse normalization method, because different indexes influence the final evaluation result in different directions;
2) calculating the correlation coefficients of the evaluation index matrix: the correlation coefficient describes the conflict among the indexes; if two indexes are strongly positively correlated, their conflict is smaller and the weight is lower;
3) calculating the weights: the contrast intensity and the conflict of each evaluation index are calculated from the obtained correlation coefficient matrix.
7. The user energy consumption portrait analysis method based on multivariate data driving according to claim 1, characterized in that the improved AP clustering algorithm further comprises the following:
1) the discrete state quantities and the energy consumption characteristic indexes of the load curve are selected to reduce the dimension of the load curve, which speeds up calculation of the AP clustering similarity matrix, and the bias parameter is adjusted to improve clustering efficiency:
$s(i,j) = -[\alpha d_{dij} + (1-\alpha) d_{tij}], \quad i \ne j$ (18)
Figure FDA0004005414380000041
where $d_{dij}$ and $d_{tij}$ are the distances between load curve i and load curve j in terms of the discrete state feature d obtained from the SAX calculation and the energy use characteristic index t, respectively, both expressed as Euclidean distances; $\alpha$ is the feature weight coefficient;
2) improvement of the bias parameter: the element $s(i,i)$ on the main diagonal of the similarity matrix is the bias parameter, and its value is related to the number of clusters in the result;
the clustering effect evaluation (DB) index is used as the criterion for selecting the bias parameter and for convergence of the AP clustering algorithm, as shown in the formula:
$s(i,i) = p_m + \delta DB_{\min}$ (20)
where $p_m$ is the median of all values off the main diagonal and serves as the initial value; $DB_{\min}$ is the minimum DB value obtained so far by the algorithm; $\delta$ is the search threshold, with $\delta > 0$ to search forward and $\delta < 0$ otherwise; the DB index is calculated as shown in formula (21):
$$DB = \frac{1}{n} \sum_{i=1}^{n} \max_{j \ne i} \frac{W_i + W_j}{C_{ij}} \quad (21)$$
where n is the number of clusters; $W_i$ is the average distance from the data points in class i to its cluster center; $C_{ij}$ is the distance between cluster centers i and j.
8. A non-volatile storage medium, characterized in that the non-volatile storage medium comprises a stored program, wherein the program, when run, controls a device in which the non-volatile storage medium is located to perform the method of any one of claims 1 to 7.
9. An electronic device comprising a processor and a memory, wherein the memory stores computer readable instructions, the processor is configured to execute the computer readable instructions, and the computer readable instructions, when executed, perform the method of any one of claims 1 to 7.
10. A user energy consumption portrait analysis device based on multivariate data driving, characterized in that the device comprises the following modules:
a dimension reduction and feature extraction module, used to reduce the dimension of the load curve and extract features by using the time series symbolic aggregate approximation (SAX) algorithm;
a simulated annealing particle swarm algorithm module, used to convert the optimization of the SAX representation of the load curve into a multi-objective optimization problem based on a simulated annealing particle swarm algorithm;
a cluster analysis module, used to perform cluster analysis on the load curves with an improved AP clustering algorithm according to the user energy consumption characteristic indexes;
a user energy consumption analysis module, used to analyze the energy use behavior of each class of users according to the clustering result.
CN202211630066.XA 2022-12-19 2022-12-19 User energy consumption portrait analysis method based on multivariate data driving Pending CN116304295A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211630066.XA CN116304295A (en) 2022-12-19 2022-12-19 User energy consumption portrait analysis method based on multivariate data driving

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211630066.XA CN116304295A (en) 2022-12-19 2022-12-19 User energy consumption portrait analysis method based on multivariate data driving

Publications (1)

Publication Number Publication Date
CN116304295A true CN116304295A (en) 2023-06-23

Family

ID=86834805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211630066.XA Pending CN116304295A (en) 2022-12-19 2022-12-19 User energy consumption portrait analysis method based on multivariate data driving

Country Status (1)

Country Link
CN (1) CN116304295A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117076990A (en) * 2023-10-13 2023-11-17 国网浙江省电力有限公司 Load curve identification method, device and medium based on curve dimension reduction and clustering
CN117076990B (en) * 2023-10-13 2024-02-27 国网浙江省电力有限公司 Load curve identification method, device and medium based on curve dimension reduction and clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination