CN113344073A - Daily load curve clustering method and system based on fusion evolution algorithm - Google Patents

Daily load curve clustering method and system based on fusion evolution algorithm

Info

Publication number
CN113344073A
Authority
CN
China
Prior art keywords
individual
data
population
load curve
individuals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110613240.9A
Other languages
Chinese (zh)
Inventor
覃日升
李胜男
况华
姜訸
段锐敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Yunnan Power Grid Co Ltd filed Critical Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority to CN202110613240.9A priority Critical patent/CN113344073A/en
Publication of CN113344073A publication Critical patent/CN113344073A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application belongs to the technical field of power system analysis and control, and provides a daily load curve clustering method and system based on a fusion evolutionary algorithm, wherein the daily load curve clustering method based on the fusion evolutionary algorithm comprises the following steps: acquiring and preprocessing original daily load curve loads of a plurality of users, and randomly generating S genetic algorithm individuals to form an initialization population, wherein the individuals are formed by clustering center codes; updating the cluster center code of each individual according to a fuzzy C-means algorithm, obtaining a membership matrix of the data object to the cluster center, selecting a fitness function, and calculating an individual fitness value; circularly repeating the genetic operation of individuals in the population and the cluster center code updating operation of the individuals until the current annealing temperature is less than the annealing termination temperature; and selecting the cluster center code value of the individual with the maximum fitness value in the population to determine the cluster center as the final cluster center. The daily load curve clustering method based on the fusion evolutionary algorithm effectively improves the accuracy of the daily load curve clustering method.

Description

Daily load curve clustering method and system based on fusion evolution algorithm
Technical Field
The application belongs to the technical field of power system analysis and control, and particularly relates to a daily load curve clustering method and system based on a fusion evolutionary algorithm.
Background
With the continued construction of smart grids, data acquisition equipment can collect the electricity consumption of a large number of users. Different types of users, such as residential, commercial, industrial, and agricultural users, differ greatly in their consumption patterns, and patterns may differ even among users of the same type. Using effective data mining techniques to finely partition massive, heterogeneous user load curve data in a big-data setting makes it possible to uncover the relationships among different load types and their corresponding electricity consumption behavior and characteristics, which provides guidance for load forecasting, power grid planning, and demand-side response.
Traditional daily load curve clustering methods fall mainly into direct clustering based on the original load data and indirect clustering based on dimension reduction. Direct clustering generally normalizes the load values at each sampling time point of the daily load curve and then clusters them with algorithms such as K-means, fuzzy C-means, or self-organizing maps. The fuzzy C-means algorithm is a partition-based fuzzy clustering method that classifies samples objectively by describing their degrees of membership in the different classes. The algorithm is simple and searches quickly, but its clustering result depends heavily on the initial cluster centers and easily converges to a local extremum, that is, a locally optimal solution, which biases the daily load curve classification result.
To overcome these shortcomings, the fuzzy C-means algorithm can be combined with a genetic algorithm. For example, a fuzzy C-means operator can replace the crossover operator of the genetic algorithm to form a hybrid genetic clustering algorithm; alternatively, the cluster centers can be encoded as floating-point numbers, with floating-point crossover and mutation operators designed to improve search efficiency.
However, when the number of samples, the sample dimension, and the number of classes are large, these algorithms often converge prematurely to a local optimum. Once the algorithm has converged prematurely, a small mutation probability alone is rarely enough to escape the local extremum. Moreover, an evolutionary algorithm may degenerate during evolution, which leads to an excessive number of iterations and low clustering accuracy.
Disclosure of Invention
The application provides a daily load curve clustering method and system based on a fusion evolutionary algorithm, with the aim of clustering daily load curves more accurately.
The first aspect of the application provides a daily load curve clustering method based on a fusion evolutionary algorithm, and the daily load curve clustering method based on the fusion evolutionary algorithm comprises the following steps:
step 1: acquiring original daily load curve loads of a plurality of users, preprocessing the original daily load curve loads to obtain a load data set, wherein the load data set consists of a plurality of data objects, and one data object represents the load of one original daily load curve;
step 2: initializing the current annealing temperature and the annealing termination temperature of a simulated annealing algorithm, initializing a genetic algorithm individual based on the number C of cluster pre-classified by a load data set, and randomly generating S individuals to form an initialized population, wherein the individuals are formed by C cluster center codes;
step 3: updating the cluster center code of each individual according to a fuzzy C-means algorithm, obtaining a membership matrix of the data object relative to the cluster center, selecting a fitness function, and calculating an individual fitness value;
step 4: carrying out genetic operation on individuals in the population according to a genetic algorithm, updating the cluster center codes of the individuals in the population according to the current annealing temperature and the individual fitness value, and then carrying out cooling operation on the current annealing temperature to obtain the updated current annealing temperature;
step 5: repeating the step 4 until the updated current annealing temperature is less than the annealing termination temperature;
step 6: selecting the individual with the maximum fitness value in the population as an optimal individual, and determining the clustering center code value of the optimal individual as the final C clustering centers.
Optionally, the step of obtaining the original daily load curve loads of the multiple users, preprocessing the original daily load curve loads, and obtaining the load data set specifically includes:
searching missing and abnormal data in the load of each original daily load curve, wherein the abnormal data comprises data with sudden drop, sudden increase or negative value, and if the load abnormal data of the original daily load curve reaches 10% of the acquisition amount, removing the original daily load curve to obtain first spare load data;
supplementing and correcting missing and abnormal data in the first spare load data to obtain second spare load data;
and performing normalization processing on the second spare load data by adopting a linear function normalization method to obtain a load data set.
Optionally, the missing data and abnormal data in the first spare load data are supplemented and corrected by a barycentric Lagrange interpolation method, which defines the Lagrange interpolation basis functions in terms of barycentric weights.
Optionally, the individuals are coded by C cluster centers and binary coding is adopted.
Optionally, the cluster center code of each individual is updated according to the fuzzy C-means algorithm, and the membership function adopted to obtain the membership matrix of the data object relative to the cluster center is:
$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{ij}} \right)^{\frac{2}{r-1}}}$$
in the formula, $u_{ik}$ is the membership degree of the ith data object belonging to the kth class, c is the number of clustering centers, $d_{ik}$ is the distance from the ith data object to the kth class, and r is the fuzzy index;
the fitness function is selected, and the fitness function adopted for calculating the individual fitness value is as follows:
$$f_i = \mathrm{ranking}(J_r);$$
in the formula, $f_i$ represents the fitness value of the ith individual in the population, ranking() is a ranking-based fitness assignment function, and $J_r$ is:
$$J_r(U, V) = \sum_{i=1}^{n-x} \sum_{k=1}^{c} u_{ik}^{r} d_{ik}^{2}$$
wherein U is the membership matrix, V is the clustering center matrix, c is the number of clustering centers, n-x is the number of data objects, $u_{ik}$ is the membership degree of the ith data object belonging to the kth class, $d_{ik}$ is the distance from the ith data object to the kth class, and r is the fuzzy index.
Optionally, the performing genetic operation on individuals in the population according to a genetic algorithm, updating the cluster center codes of the individuals in the population according to the current annealing temperature and the individual fitness value, and then performing a cooling operation on the current annealing temperature to obtain an updated current annealing temperature specifically includes:
step 601: selecting, crossing and mutating the individuals in the population to generate new individuals;
step 602: calculating a fitness value of a new individual, if the fitness value of the new individual is greater than or equal to the fitness value of the individual in the population, updating the cluster center code value of the individual in the population by using the cluster center code value of the new individual, and if the fitness value of the new individual is less than the fitness value of the individual in the population, updating the cluster center code value of the individual in the population by using the cluster center code value of the new individual according to a preset probability, wherein the preset probability is as follows:
$$P = \exp\left( \frac{f_i' - f_i}{T} \right)$$
in the formula, $f_i'$ is the fitness value of the new individual, $f_i$ is the fitness value of the individual in the population, and T is the current annealing temperature;
step 603: repeating the steps 601 to 602 until the cycle number is larger than the set maximum cycle number;
step 604: updating the current annealing temperature according to a cooling formula, wherein the cooling formula is as follows:
$$T_{i+1} = p \times T_i$$
in the formula, $T_{i+1}$ is the updated current annealing temperature value, $T_i$ is the current annealing temperature value, and p is the cooling coefficient.
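The geometric cooling rule can be illustrated with a short Python sketch. It only lists the temperatures at which the inner genetic loop would run, continuing until the updated temperature drops below the termination temperature. The function name `cooling_schedule` and the value p = 0.5 are illustrative assumptions; the text here does not state a value for the cooling coefficient.

```python
def cooling_schedule(t_start, t_end, p):
    """List the annealing temperatures visited under T_{i+1} = p * T_i.

    The loop stops once the updated temperature is below t_end,
    mirroring the termination condition of step 5.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("cooling coefficient p must lie in (0, 1)")
    temperatures = []
    t = t_start
    while t >= t_end:
        temperatures.append(t)
        t = p * t  # cooling operation
    return temperatures


# Illustrative values only: start at 100, stop below 1, assumed p = 0.5
print(cooling_schedule(100.0, 1.0, 0.5))
```

A cooling coefficient closer to 1 gives more temperature levels and therefore more genetic iterations before termination.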
Optionally, in the step of performing selection, crossover and mutation genetic operations on individuals in the population, the selection operator adopts stochastic universal sampling (random traversal sampling), the crossover operator adopts a multipoint crossover operator, and the mutation operator adopts a basic bit mutation operator.
The second aspect of the present application provides a daily load curve clustering system based on a fusion evolutionary algorithm, where the daily load curve clustering system based on the fusion evolutionary algorithm is used to execute the daily load curve clustering method based on the fusion evolutionary algorithm provided by the first aspect of the present application, and the daily load curve clustering system based on the fusion evolutionary algorithm includes:
the data acquisition module is used for acquiring the original daily load curve loads of a plurality of users;
the data preprocessing module is used for preprocessing the original daily load curve load to obtain a load data set;
the initialization module is used for initializing the current annealing temperature and the annealing termination temperature of the simulated annealing algorithm, initializing the genetic algorithm individuals and randomly generating S individuals to form an initialization population, wherein the individuals are formed by cluster center codes;
the fuzzy C mean module is used for updating the cluster center code of each individual, obtaining a membership matrix of the data object relative to the cluster center, selecting a fitness function and calculating an individual fitness value;
the genetic annealing module is used for carrying out genetic operation on individuals in the population, updating the cluster center codes of the individuals in the population according to the individual fitness value, then carrying out cooling operation on the current annealing temperature to obtain the updated current annealing temperature, and judging whether the updated current annealing temperature is less than the annealing termination temperature or not;
and the screening module is used for selecting the individual with the maximum fitness value in the population as the optimal individual, and the clustering center code value of the optimal individual is determined as the final C clustering centers.
Optionally, the data preprocessing module specifically includes:
the data cleaning unit is used for searching missing data and abnormal data in the load of each original daily load curve, the abnormal data comprise data with sudden drop, sudden increase or negative values, and if the load abnormal data of the original daily load curve reach 10% of the collection amount, the original daily load curve is removed to obtain first spare load data;
the data interpolation unit is used for supplementing and correcting missing data and abnormal data in the first spare load data by adopting a gravity center Lagrange interpolation method to obtain second spare load data;
and the data normalization unit is used for performing normalization processing on the second spare load data by adopting a linear function normalization method to obtain a load data set.
Optionally, the genetic annealing module specifically comprises:
the genetic operation unit is used for carrying out selection, crossing and variant genetic operation on individuals in the population to generate new individuals;
a fitness screening unit, configured to calculate a fitness value of a new individual, update a cluster center code value of the individual in the population with the cluster center code value of the new individual if the fitness value of the new individual is greater than or equal to the fitness value of the individual in the population, and update the cluster center code value of the individual in the population with the cluster center code value of the new individual according to a preset probability if the fitness value of the new individual is less than the fitness value of the individual in the population, where the preset probability is:
$$P = \exp\left( \frac{f_i' - f_i}{T} \right)$$
in the formula, $f_i'$ is the fitness value of the new individual, $f_i$ is the fitness value of the individual in the population, and T is the current annealing temperature;
the cycle judging module is used for judging whether the number of cycles is greater than the set maximum number of cycles;
the annealing unit is used for updating the current annealing temperature according to a cooling formula, wherein the cooling formula is as follows:
$$T_{i+1} = p \times T_i$$
in the formula, $T_{i+1}$ is the updated current annealing temperature value, $T_i$ is the current annealing temperature value, and p is the cooling coefficient.
The application provides a daily load curve clustering method and system based on a fusion evolutionary algorithm, wherein the daily load curve clustering system based on the fusion evolutionary algorithm is used for executing the steps of the daily load curve clustering method based on the fusion evolutionary algorithm: acquiring original daily load curve loads of a plurality of users and preprocessing them to obtain a load data set; initializing the current annealing temperature and the annealing termination temperature of a simulated annealing algorithm, initializing genetic algorithm individuals, and randomly generating S individuals to form an initialization population, wherein the individuals are formed by cluster center codes; updating the cluster center code of each individual according to a fuzzy C-means algorithm, obtaining a membership matrix of the data object relative to the cluster center, selecting a fitness function, and calculating an individual fitness value; repeating the genetic operation on individuals in the population and the cluster center code updating operation until the updated current annealing temperature is less than the annealing termination temperature; and selecting the individual with the maximum fitness value in the population as the optimal individual, and determining the cluster center code value of the optimal individual as the final C cluster centers.
According to the daily load curve clustering method based on the fusion evolutionary algorithm, the fuzzy C-means algorithm, the genetic algorithm and the simulated annealing algorithm are combined to update the clustering center, so that the phenomenon that the daily load curve clustering method falls into local optimization is effectively avoided, and the accuracy of the daily load curve clustering method is improved.
Drawings
In order to more clearly explain the technical solution of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a daily load curve clustering method based on a fusion evolutionary algorithm provided in an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a daily load curve clustering system based on a fusion evolutionary algorithm provided in the embodiment of the present application.
Fig. 3 is a comparison diagram of the data before and after filling by the barycentric Lagrange interpolation method according to the embodiment of the present application.
Fig. 4 is a daily load curve of different industries after normalization by the embodiment of the application.
Fig. 5 is a daily load curve clustering result according to the embodiment of the present application.
Fig. 6 shows the daily load curve classification result according to the embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments.
As shown in fig. 1, a schematic flow chart of a daily load curve clustering method based on a fusion evolutionary algorithm provided in the embodiment of the present application is shown, where the daily load curve clustering method based on the fusion evolutionary algorithm includes steps 1 to 6.
Step 1, acquiring original daily load curve loads of a plurality of users, preprocessing the original daily load curve loads, and acquiring a load data set.
Many clustering algorithms are sensitive to abnormal and missing data. Abnormal data in the load data may degrade the clustering result and produce wrong classifications, so the load data must be preprocessed. Missing and abnormal load data arise for several reasons: first, damage to or malfunction of the measuring device may cause data loss; second, normal grid activities such as line maintenance or security inspection may cause missing load data; third, transmitting the load data from the measuring device to the analysis end may introduce anomalies such as outliers, noise, and deviations. Methods for handling abnormal and missing load data include empirical correction, threshold discrimination, curve replacement, and the like.
In the embodiment of the application, 412 original daily load curves are selected, each with 96 load sampling points at a sampling interval of 15 minutes. After the load data are acquired, they are preprocessed according to steps S101 to S103.
Step S101, missing and abnormal data in the load of each original daily load curve are searched for, where abnormal data comprise sudden drops, sudden increases, or negative values. If the abnormal data of an original daily load curve reach 10% of its number of sampling points, the curve is considered invalid and is removed, which yields the first spare load data. For example, if n original daily load curves are obtained and x of them are invalid, n-x valid curves remain and form an (n-x) × m matrix as the first spare load data, where m is the number of sampling points per curve.
Among the 412 original daily load curves of the embodiment of the application, 12 curves have missing or abnormal data at 10 sampling points; these 12 curves are removed before the next operation.
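The removal rule of step S101 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: missing samples are assumed to be stored as NaN and are counted together with abnormal samples, and a "sudden" drop or increase is flagged with a hypothetical median-absolute-deviation test controlled by `jump_ratio`, since the text does not say how sudden changes are detected.

```python
import numpy as np


def drop_invalid_curves(curves, max_bad_fraction=0.10, jump_ratio=5.0):
    """Remove curves whose share of bad samples reaches max_bad_fraction.

    curves: (n, m) array, one daily load curve per row, NaN = missing.
    Returns the kept curves and a boolean mask of their bad samples.
    The outlier test (jump_ratio times the MAD) is an assumption.
    """
    curves = np.asarray(curves, dtype=float)
    missing = np.isnan(curves)
    negative = np.where(missing, 0.0, curves) < 0
    median = np.nanmedian(curves, axis=1, keepdims=True)
    mad = np.nanmedian(np.abs(curves - median), axis=1, keepdims=True)
    deviation = np.where(missing, 0.0, np.abs(curves - median))
    sudden = deviation > jump_ratio * (mad + 1e-12)
    bad = missing | negative | sudden
    # "reaches 10%" is read as greater than or equal to the threshold
    keep = bad.mean(axis=1) < max_bad_fraction
    return curves[keep], bad[keep]
```

With 96 samples per curve, 10 bad samples give a fraction of about 10.4%, so such a curve is removed, while a curve with 5 bad samples is kept and its flagged points are left for the interpolation step.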
Step S102, missing data and abnormal data in the first spare load data are supplemented and corrected by the barycentric Lagrange interpolation method to obtain second spare load data.
The Lagrange interpolation method has a neat and compact formula and is convenient in theoretical analysis, but whenever interpolation points are added or removed the corresponding basis functions must be recalculated, which is cumbersome. The embodiment of the application therefore adopts the barycentric Lagrange interpolation method; fig. 3 compares the data before and after filling by this method. The barycentric Lagrange interpolation method does not need to calculate the basis functions during interpolation, which greatly reduces the amount of calculation. Given k+1 nodes $(x_0, y_0), (x_1, y_1), \ldots, (x_k, y_k)$, the barycentric weight is defined as:
$$w_j = \frac{1}{\prod_{i=0, i \neq j}^{k} (x_j - x_i)}$$
the Lagrange basis function can be defined as:
$$\ell_j(x) = \ell(x) \frac{w_j}{x - x_j}$$
wherein $\ell(x) = (x - x_0) \cdots (x - x_k)$;
The barycentric Lagrange interpolation formula is:
$$L(x) = \frac{\sum_{j=0}^{k} \frac{w_j}{x - x_j} y_j}{\sum_{j=0}^{k} \frac{w_j}{x - x_j}}$$
in the formula, $x_j$ is the independent variable and $y_j$ is the dependent variable.
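A minimal Python sketch of barycentric Lagrange interpolation is given below, assuming the weights are the reciprocals of the products of node differences and the interpolant is the weighted-ratio form. The function names are illustrative. Which neighboring samples serve as nodes when filling a gap is not specified in the text; using a few valid points around the gap is an assumption, since high-degree interpolation over many equally spaced nodes can oscillate.

```python
import numpy as np


def barycentric_weights(x_nodes):
    """w_j = 1 / prod_{i != j} (x_j - x_i)."""
    x_nodes = np.asarray(x_nodes, dtype=float)
    diff = x_nodes[:, None] - x_nodes[None, :]
    np.fill_diagonal(diff, 1.0)  # skip the i == j factor
    return 1.0 / diff.prod(axis=1)


def barycentric_interpolate(x_nodes, y_nodes, x):
    """Evaluate the interpolating polynomial at a scalar x."""
    x_nodes = np.asarray(x_nodes, dtype=float)
    y_nodes = np.asarray(y_nodes, dtype=float)
    hit = np.isclose(x, x_nodes)
    if hit.any():
        # x coincides with a node: return its value, avoid division by zero
        return float(y_nodes[hit][0])
    terms = barycentric_weights(x_nodes) / (x - x_nodes)
    return float(np.dot(terms, y_nodes) / terms.sum())


# Fill a missing sample at index 2 from its four valid neighbours
valid_idx = [0, 1, 3, 4]
valid_load = [0.0, 1.0, 9.0, 16.0]  # illustrative values following y = x**2
print(barycentric_interpolate(valid_idx, valid_load, 2))
```

The weights depend only on the node positions, so they can be computed once and reused for every missing point that shares the same nodes.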
Step S103, carrying out normalization processing on the second spare load data to obtain a load data set, wherein the load data set is composed of a plurality of data objects, and one data object represents a daily load curve.
Daily load curves differ in scale because of differences in user attributes; normalizing the data removes the influence of scale and makes the analysis more accurate. Fig. 4 shows the daily load curves of different industries after normalization according to the embodiment of the present application. A linear function normalization method is adopted, with the following formula:
$$X'_i = \frac{X_i - X_{min}}{X_{max} - X_{min}}$$
in the formula, $X'_i$ is the normalized load data, $X_i$ is the load data before normalization, $X_{min}$ is the minimum load data before normalization, and $X_{max}$ is the maximum load data before normalization.
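Linear function (min-max) normalization can be sketched in a few lines of Python. The sketch assumes that the minimum and maximum are taken per daily load curve, which the text does not state explicitly, and it maps a constant curve to zeros to avoid division by zero.

```python
import numpy as np


def min_max_normalize(curve):
    """Scale one daily load curve to the range [0, 1]."""
    curve = np.asarray(curve, dtype=float)
    lo, hi = curve.min(), curve.max()
    if hi == lo:
        # Constant curve: the formula is undefined, return zeros (assumption)
        return np.zeros_like(curve)
    return (curve - lo) / (hi - lo)


print(min_max_normalize([2.0, 4.0, 6.0]))
```

After this step, curves from large and small users share the same range, so the clustering compares curve shapes instead of load magnitudes.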
Step 2, initializing the current annealing temperature and the annealing termination temperature of the simulated annealing algorithm, initializing the genetic algorithm individuals based on the pre-classified cluster number C of the load data set, and randomly generating S individuals to form an initialization population.
The simulated annealing algorithm is a greedy-style algorithm that nevertheless accepts worse solutions with a preset probability. Combining the simulated annealing algorithm with the genetic algorithm helps the search escape locally optimal solutions and find the globally optimal solution, which avoids the premature convergence of the genetic algorithm early in evolution and its stagnation late in evolution. In the present embodiment, the current annealing temperature is initialized to 100, and the annealing termination temperature is set to 1.
There are various methods for determining the pre-classified cluster number of the load data set, such as the gap statistic method, the elbow criterion, and validity function indexes. The user samples selected in the embodiment of the application come from industry, commerce, agriculture, and education, so the cluster number C is initially set to 4. Four 96-dimensional data objects are randomly generated as the 4 initial cluster centers of the load data set and are binary coded; these 4 randomly generated data objects constitute one individual of the genetic algorithm, and the random generation is repeated S times to form the initialization population of S individuals.
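One possible binary encoding of an individual is sketched below in Python. The text states only that the C cluster centers are binary coded; the number of bits per coordinate (`bits`), the mapping of each bit group to a value in [0, 1], and the function names are assumptions made for illustration.

```python
import numpy as np


def init_population(pop_size, n_clusters=4, n_dims=96, bits=10, seed=0):
    """Randomly generate pop_size binary-coded individuals.

    Each individual concatenates n_clusters * n_dims coordinates,
    each stored as `bits` binary genes (an assumed resolution).
    """
    rng = np.random.default_rng(seed)
    length = n_clusters * n_dims * bits
    return rng.integers(0, 2, size=(pop_size, length), dtype=np.uint8)


def decode(individual, n_clusters=4, n_dims=96, bits=10):
    """Decode one individual into an (n_clusters, n_dims) center matrix.

    Values lie in [0, 1], matching min-max normalized load data.
    """
    genes = individual.reshape(n_clusters, n_dims, bits)
    place_values = 2 ** np.arange(bits - 1, -1, -1)
    return (genes * place_values).sum(axis=2) / (2 ** bits - 1)


population = init_population(pop_size=20)
centers = decode(population[0])
print(population.shape, centers.shape)
```

Decoding to [0, 1] keeps every candidate center inside the range of the normalized load curves, so no repair step is needed after crossover or mutation.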
Step 3, updating the cluster center code of each individual according to a fuzzy C-means algorithm, obtaining a membership matrix of the data object relative to the cluster center, selecting a fitness function, and calculating the fitness value of the individual.
Fuzzy C-means clustering combines fuzzy theory with clustering theory: it divides samples into several fuzzy groups according to their internal regularities, measures the similarity between samples with a distance function, and obtains the optimal clustering result by mathematical programming. Step 3 specifically comprises steps S301 to S303.
Step S301, calculating the membership of the data object relative to each clustering center according to the initial clustering center and the membership function of the individual to obtain the membership matrix of each individual, and updating the clustering center code of the individual according to the membership matrix and the clustering center updating formula, wherein the membership function is as follows:
$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{ij}} \right)^{\frac{2}{r-1}}}$$
in the formula, $u_{ik}$ is the membership degree of the ith data object belonging to the kth class, c is the number of clustering centers, $d_{ik}$ is the Euclidean distance from the ith data object to the kth clustering center, and r is the fuzzy index; the membership degree needs to satisfy the following formula:
$$\sum_{k=1}^{c} u_{ik} = 1, \quad i = 1, \ldots, n-x$$
the cluster center updating formula is as follows:
$$V_k = \frac{\sum_{i=1}^{n-x} u_{ik}^{r} z_i}{\sum_{i=1}^{n-x} u_{ik}^{r}}$$
in the formula, $V_k$ is the kth cluster center, $z_i$ is the ith data object, $u_{ik}$ is the membership degree of the ith data object belonging to the kth class, and n-x is the number of data objects.
And step S302, obtaining a membership matrix of the data object relative to the updated clustering center according to the membership function.
After the cluster centers are updated, the membership degrees of the data objects relative to the cluster centers change, so they are recalculated to obtain the membership matrix corresponding to the updated cluster centers.
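Steps S301 and S302 can be sketched in Python as one fuzzy C-means pass: compute memberships, update the centers, then recompute memberships for the updated centers. The sketch assumes the standard fuzzy C-means membership and center formulas with Euclidean distance; the fuzzy index r = 2 and the small constant `eps` that guards against zero distances are illustrative choices.

```python
import numpy as np


def membership(data, centers, r=2.0, eps=1e-12):
    """Membership u[i, k] of data object i in cluster k."""
    # d[i, k]: Euclidean distance from object i to center k
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2) + eps
    # ratio[i, k, j] = (d_ik / d_ij) ** (2 / (r - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (r - 1.0))
    return 1.0 / ratio.sum(axis=2)


def update_centers(data, u, r=2.0):
    """V_k = sum_i u_ik^r z_i / sum_i u_ik^r."""
    weights = u ** r  # shape (n_objects, n_clusters)
    return (weights.T @ data) / weights.sum(axis=0)[:, None]


data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[1.0, 1.0], [9.0, 9.0]])
u = membership(data, centers)        # S301: memberships for current centers
centers = update_centers(data, u)    # S301: center update
u = membership(data, centers)        # S302: memberships for updated centers
print(np.round(centers, 2))
```

Each row of the membership matrix sums to 1, which is the constraint the membership degrees are required to satisfy.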
Step S303, selecting a fitness function, and calculating an individual fitness value, wherein the fitness function is as follows:
fi=ranking(Jr);
in the formula (f)iRepresenting the fitness value of the ith individual in the population, ranking () is a ranking-based distribution function, JrComprises the following steps:
Figure BDA0003096873050000074
in the formula, $U$ is the membership matrix, $V$ is the cluster center matrix, $c$ is the number of cluster centers, $n-x$ is the number of data objects, $u_{ik}$ is the membership degree of the ith data object belonging to the kth class, $d_{ik}$ is the distance from the ith data object to the kth class, and $r$ is the fuzzy index.
$J_r$ is calculated according to the updated cluster centers and the corresponding membership matrix, and the $J_r$ values are ranked to obtain the fitness values.
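A rough sketch of the fitness calculation in step S303 follows. It assumes the conventional fuzzy C-means objective $J_r = \sum_i \sum_k u_{ik}^r d_{ik}^2$ and a linear ranking scheme in which the individual with the smallest $J_r$ receives the largest fitness. Both the exact form of $J_r$ and the ranking rule (including the `pressure` parameter) are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def objective(data, centers, u, r=2.0):
    """J_r: sum over objects i and clusters k of u[i][k]**r times squared distance."""
    return sum(u[i][k] ** r * math.dist(z, v) ** 2
               for i, z in enumerate(data)
               for k, v in enumerate(centers))

def rank_fitness(j_values, pressure=2.0):
    """Linear ranking: the smallest J_r receives the largest fitness value."""
    n = len(j_values)
    if n == 1:
        return [1.0]
    # Order individuals from worst (largest J_r) to best (smallest J_r).
    order = sorted(range(n), key=lambda i: j_values[i], reverse=True)
    fitness = [0.0] * n
    for pos, i in enumerate(order):
        fitness[i] = 2.0 - pressure + 2.0 * (pressure - 1.0) * pos / (n - 1)
    return fitness
```

Ranking decouples the fitness from the raw scale of $J_r$, so one individual with a very small objective value does not dominate selection.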
Step 4: performing genetic operations on the individuals in the population according to the genetic algorithm, updating the cluster center codes of the individuals in the population according to the current annealing temperature and the individual fitness values, and then cooling the current annealing temperature to obtain the updated current annealing temperature.
Step 401: performing selection, crossover and mutation on the individuals in the population to generate new individuals.
Selection, crossover and mutation are performed on the S individuals in the population to generate new individuals in one-to-one correspondence with the individuals in the population. The selection operator is stochastic universal sampling, the crossover operator is a multi-point crossover operator with a crossover probability of 0.7, and the mutation operator is a basic bit mutation operator with a mutation probability of 0.01.
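The three genetic operators of step 401 might look like the sketch below, which operates on individuals stored as lists of 0/1 bits. The crossover probability 0.7 and the mutation probability 0.01 come from the embodiment; the number of crossover points (`points=2`), the function names and the bit-list representation are assumptions for illustration.

```python
import random

def sus_select(population, fitness, rng):
    """Stochastic universal sampling: equally spaced pointers over the fitness wheel."""
    n = len(population)
    total = sum(fitness)
    if total <= 0:
        return [rng.choice(population) for _ in range(n)]
    step = total / n
    start = rng.uniform(0, step)
    chosen, cumulative, i = [], fitness[0], 0
    for m in range(n):
        pointer = start + m * step
        # Advance to the individual whose fitness segment contains the pointer.
        while cumulative <= pointer and i < n - 1:
            i += 1
            cumulative += fitness[i]
        chosen.append(population[i])
    return chosen

def multipoint_crossover(a, b, rng, points=2, p_cross=0.7):
    """Swap alternating segments of two parents between random cut points."""
    if rng.random() >= p_cross or len(a) < 2:
        return a[:], b[:]
    cuts = sorted(rng.sample(range(1, len(a)), min(points, len(a) - 1)))
    child_a, child_b, swap, prev = [], [], False, 0
    for cut in cuts + [len(a)]:
        src_a, src_b = (b, a) if swap else (a, b)
        child_a += src_a[prev:cut]
        child_b += src_b[prev:cut]
        swap, prev = not swap, cut
    return child_a, child_b

def bit_mutation(bits, rng, p_mut=0.01):
    """Flip each bit independently with probability p_mut."""
    return [1 - g if rng.random() < p_mut else g for g in bits]
```

Passing an explicit `random.Random` instance keeps a run reproducible when a fixed seed is used.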
Step 402: calculating a fitness value of a new individual, if the fitness value of the new individual is greater than or equal to the fitness value of the individual in the population, updating the cluster center code value of the individual in the population by using the cluster center code value of the new individual, and if the fitness value of the new individual is less than the fitness value of the individual in the population, updating the cluster center code value of the individual in the population by using the cluster center code value of the new individual according to a preset probability, wherein the preset probability is as follows:
Figure BDA0003096873050000081
in the formula, $f_i'$ is the fitness value of the new individual, $f_i$ is the fitness value of the individual in the population, and $T$ is the current annealing temperature.
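The acceptance rule of step 402 can be illustrated with the sketch below. It assumes the usual Metropolis form $P = \exp((f_i' - f_i)/T)$ for a new individual with lower fitness; that exact expression is an assumption here, chosen because it depends only on $f_i'$, $f_i$ and $T$. The function name `accept` is hypothetical.

```python
import math
import random

def accept(f_new, f_old, temperature, rng):
    """Decide whether the new individual replaces the old one."""
    if f_new >= f_old:
        return True  # an equal or better individual always replaces the old one
    # A worse individual is accepted with a probability that shrinks as T falls.
    return rng.random() < math.exp((f_new - f_old) / temperature)
```

At a high temperature almost every worse individual is accepted, which keeps the search broad; as the temperature falls the rule behaves more and more like strict elitism.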
Step 403: repeating steps 401 to 402 until the number of cycles exceeds the set maximum number of cycles.
Step 404: updating the current annealing temperature according to a cooling formula, wherein the cooling formula is as follows:
$T_{i+1} = p \times T_i$
in the formula, $T_{i+1}$ is the updated current annealing temperature value, $T_i$ is the current annealing temperature value, and $p$ is the cooling coefficient.
In the present embodiment, the cooling coefficient is 0.8.
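To see how many times step 4 runs under geometric cooling, the short sketch below lists the temperatures at which the genetic loop would execute. The cooling coefficient 0.8 is from the embodiment; the initial temperature 100 and termination temperature 1 used in the test values are hypothetical, as is the function name.

```python
def cooling_schedule(t_start, t_end, p=0.8):
    """Temperatures visited before the updated value drops below t_end."""
    temps = []
    t = t_start
    while t >= t_end:
        temps.append(t)
        t *= p  # T_{i+1} = p * T_i
    return temps
```

With these hypothetical settings, `len(cooling_schedule(100.0, 1.0))` gives the number of passes through step 4, so a coefficient closer to 1 lengthens the search while a smaller one shortens it.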
Step 5: repeating step 4 until the updated current annealing temperature is less than the annealing termination temperature.
Step 6: selecting the individual with the maximum fitness value in the population as the optimal individual.
The cluster center code values of the optimal individual are decoded to obtain the final 4 cluster centers, and the data objects are classified according to Euclidean distance. Fig. 5 shows the daily load curve clustering result of the embodiment of the present application, and Fig. 6 shows the daily load curve classification result of the embodiment of the present application. User category I shows a double-peak pattern; such users are mostly in the education industry, where load begins in the early morning, is high in the morning and afternoon, and drops slightly at noon during the midday break. User category II is mostly agricultural; most agricultural units run in the daytime, with irregular and short operating periods, for example irrigation and livestock raising. User category III is mostly commercial; load starts at 9 a.m. and continues until 10 p.m., which matches commercial operating hours. User category IV shows a flat-peak pattern; most users in this category are industries that run various large machines, whose load is high and which need to operate all day to ensure returns, so they remain in a high-load flat-peak state.
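The final classification by Euclidean distance amounts to assigning each data object to its nearest decoded cluster center. A minimal sketch, with the hypothetical function name `assign_clusters` and data objects represented as lists of normalized load values:

```python
import math

def assign_clusters(data, centers):
    """Return, for each data object, the index of the nearest cluster center."""
    return [min(range(len(centers)), key=lambda k: math.dist(z, centers[k]))
            for z in data]
```

If an object is equally close to several centers, `min` returns the lowest index, which is an arbitrary tie-breaking choice in this sketch.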
A second aspect of the embodiments of the present application provides a daily load curve clustering system based on a fusion evolutionary algorithm, which is used to execute the daily load curve clustering method based on the fusion evolutionary algorithm provided by the first aspect of the embodiments of the present application. For details not disclosed in the clustering system of the second aspect, refer to the clustering method provided by the first aspect.
Fig. 2 is a schematic structural diagram of the daily load curve clustering system based on the fusion evolutionary algorithm provided in the embodiment of the present application. The system comprises a data acquisition module, a data preprocessing module, an initialization module, a fuzzy C-means module, a genetic annealing module and a screening module.
The data acquisition module is configured to acquire the original daily load curve loads of a plurality of users.
The data preprocessing module is configured to preprocess the original daily load curve loads to obtain a load data set.
The initialization module is configured to initialize the current annealing temperature and the annealing termination temperature of the simulated annealing algorithm, initialize the genetic algorithm individuals, and randomly generate S individuals to form an initial population, wherein each individual consists of cluster center codes.
The fuzzy C-means module is configured to update the cluster center code of each individual according to the fuzzy C-means algorithm, obtain the membership matrix of the data objects relative to the cluster centers, select a fitness function, and calculate the individual fitness values.
The genetic annealing module is configured to perform genetic operations on the individuals in the population, update the cluster center codes of the individuals in the population according to the individual fitness values, then cool the current annealing temperature to obtain the updated current annealing temperature, and judge whether the updated current annealing temperature is less than the annealing termination temperature.
The screening module is configured to select the individual with the maximum fitness value in the population as the optimal individual, the cluster center code values of the optimal individual being determined as the final C cluster centers.
Further, the data preprocessing module specifically includes:
The data cleaning unit is configured to search for missing data and abnormal data in the load of each original daily load curve, the abnormal data comprising data with a sudden drop, a sudden increase or a negative value; if the abnormal data of an original daily load curve reaches 10% of the acquisition amount, that original daily load curve is removed, so as to obtain first spare load data.
The data interpolation unit is configured to supplement and correct the missing data and abnormal data in the first spare load data by a barycentric Lagrange interpolation method to obtain second spare load data.
The data normalization unit is configured to normalize the second spare load data by a linear function (min-max) normalization method to obtain the load data set.
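The three preprocessing units can be sketched as follows. This is an illustration under several assumptions: missing or abnormal samples are already marked as `None`, only missing and negative values are counted by the cleaning check (detecting sudden drops or increases would need a threshold that is not given here), interpolation uses up to `window` valid neighbours on each side instead of the whole curve, and every curve has at least one valid sample. All function names are hypothetical.

```python
import math

def is_usable(curve, limit=0.10):
    """Keep a curve only if missing or negative samples stay below the 10% limit."""
    bad = sum(1 for v in curve if v is None or v < 0)
    return bad < limit * len(curve)

def barycentric_interpolate(xs, ys, x):
    """Evaluate the Lagrange interpolant through (xs, ys) at x in barycentric form."""
    weights = [1.0 / math.prod(xj - xk for k, xk in enumerate(xs) if k != j)
               for j, xj in enumerate(xs)]
    num = den = 0.0
    for xj, yj, wj in zip(xs, ys, weights):
        if x == xj:
            return yj  # x coincides with a node, so return its value directly
        term = wj / (x - xj)
        num += term * yj
        den += term
    return num / den

def fill_gaps(curve, window=2):
    """Replace None entries using up to `window` valid samples on each side."""
    valid = [i for i, v in enumerate(curve) if v is not None]
    filled = list(curve)
    for i, v in enumerate(curve):
        if v is None:
            nodes = [j for j in valid if j < i][-window:] + \
                    [j for j in valid if j > i][:window]
            filled[i] = barycentric_interpolate(nodes, [curve[j] for j in nodes], i)
    return filled

def min_max_normalize(curve):
    """Linear function normalization of a curve to the range [0, 1]."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0 for _ in curve]
    return [(v - lo) / (hi - lo) for v in curve]
```

Restricting the interpolation to a few neighbouring samples is a design choice of this sketch: a single high-degree polynomial through every sample of a daily curve tends to oscillate near the ends.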
Further, the genetic annealing module specifically comprises:
The genetic operation unit is configured to perform selection, crossover and mutation on the individuals in the population to generate new individuals.
The fitness screening unit is configured to calculate the fitness value of a new individual, update the cluster center code value of the individual in the population with the cluster center code value of the new individual if the fitness value of the new individual is greater than or equal to the fitness value of the individual in the population, and update the cluster center code value of the individual in the population with the cluster center code value of the new individual according to a preset probability if the fitness value of the new individual is less than the fitness value of the individual in the population, where the preset probability is:
Figure BDA0003096873050000091
in the formula, $f_i'$ is the fitness value of the new individual, $f_i$ is the fitness value of the individual in the population, and $T$ is the current annealing temperature.
The cycle judging module is configured to judge whether the number of cycles is greater than the set maximum number of cycles.
The annealing unit is configured to update the current annealing temperature according to a cooling formula, wherein the cooling formula is as follows:
$T_{i+1} = p \times T_i$
in the formula, $T_{i+1}$ is the updated current annealing temperature value, $T_i$ is the current annealing temperature value, and $p$ is the cooling coefficient.
The application provides a daily load curve clustering method and system based on a fusion evolutionary algorithm, wherein the daily load curve clustering system is used for executing the steps of the daily load curve clustering method. The method comprises: acquiring the original daily load curve loads of a plurality of users and preprocessing them to obtain a load data set; initializing the current annealing temperature and the annealing termination temperature of a simulated annealing algorithm, initializing the genetic algorithm individuals, and randomly generating S individuals to form an initial population, wherein each individual consists of cluster center codes; updating the cluster center code of each individual according to a fuzzy C-means algorithm, obtaining the membership matrix of the data objects relative to the cluster centers, selecting a fitness function, and calculating the individual fitness values; repeating the genetic operations on the individuals in the population and the updating of their cluster center codes until the updated current annealing temperature is less than the annealing termination temperature; and selecting the individual with the maximum fitness value in the population as the optimal individual, and determining the cluster center code values of the optimal individual as the final C cluster centers.
In the daily load curve clustering method based on the fusion evolutionary algorithm, the fuzzy C-means algorithm, the genetic algorithm and the simulated annealing algorithm are combined to update the cluster centers, so that falling into a local optimum is effectively avoided and the accuracy of the daily load curve clustering method is improved.
The present application has been described in detail with reference to specific embodiments and illustrative examples, but the description is not intended to limit the application. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the presently disclosed embodiments and implementations thereof without departing from the spirit and scope of the present disclosure, and these fall within the scope of the present disclosure. The protection scope of this application is subject to the appended claims.

Claims (10)

1. A daily load curve clustering method based on a fusion evolutionary algorithm is characterized by comprising the following steps:
step 1: acquiring load data of original daily load curves of a plurality of users, preprocessing the load data to obtain a load data set, wherein the load data set consists of a plurality of data objects, and one data object represents the load of one original daily load curve;
step 2: initializing the current annealing temperature and the annealing termination temperature of a simulated annealing algorithm, initializing the genetic algorithm individuals based on the preset number C of cluster categories of the load data set, and randomly generating S individuals to form an initialized population, wherein each individual is formed by C cluster center codes;
step 3: updating the cluster center code of each individual according to a fuzzy C-means algorithm, obtaining a membership matrix of the data object relative to the cluster center, selecting a fitness function, and calculating an individual fitness value;
step 4: carrying out genetic operation on individuals in the population according to a genetic algorithm, updating the cluster center codes of the individuals in the population according to the current annealing temperature and the individual fitness value, and then carrying out cooling operation on the current annealing temperature to obtain the updated current annealing temperature;
step 5: repeating the step 4 until the updated current annealing temperature is less than the annealing termination temperature;
step 6: and selecting the individual with the maximum fitness value in the population as an optimal individual, and determining the clustering center code value of the optimal individual as the final C clustering centers.
2. The daily load curve clustering method based on the fusion evolutionary algorithm as claimed in claim 1, wherein the step of obtaining the original daily load curve loads of a plurality of users, preprocessing the original daily load curve loads to obtain a load data set specifically comprises:
searching missing and abnormal data in the load data of each original daily load curve, wherein the abnormal data comprises data with sudden drop, sudden increase or negative value, and if the load abnormal data of the original daily load curve reaches 10% of the acquisition amount, removing the original daily load curve to obtain first spare load data;
supplementing and correcting missing and abnormal data in the first spare load data to obtain second spare load data;
and performing normalization processing on the second spare load data by adopting a linear function normalization method to obtain a load data set.
3. The daily load curve clustering method based on the fusion evolutionary algorithm as claimed in claim 2, wherein the missing and abnormal data in the first spare load data are supplemented and corrected by a barycentric Lagrange interpolation method, and the barycentric Lagrange interpolation method defines the Lagrange interpolation basis functions according to barycentric weights.
4. The daily load curve clustering method based on the fusion evolutionary algorithm as claimed in claim 1, wherein the C cluster centers of each individual are encoded by binary coding.
5. The daily load curve clustering method based on the fusion evolutionary algorithm as claimed in claim 1, wherein the cluster center code of each individual is updated according to the fuzzy C-means algorithm, and the membership function adopted to obtain the membership matrix of the data object relative to the cluster center is:
Figure FDA0003096873040000011
in the formula, $u_{ik}$ is the membership degree of the ith data object belonging to the kth class, $c$ is the number of cluster centers, $d_{ik}$ is the distance from the ith data object to the kth class, and $r$ is the fuzzy index;
the fitness function is selected, and the fitness function adopted for calculating the individual fitness value is as follows:
$f_i = \mathrm{ranking}(J_r)$;
in the formula, $f_i$ is the fitness value of the ith individual in the population, ranking() is a ranking-based fitness assignment function, and $J_r$ is:
Figure FDA0003096873040000021
wherein $U$ is the membership matrix, $V$ is the cluster center matrix, $c$ is the number of cluster centers, $n-x$ is the number of data objects, $u_{ik}$ is the membership degree of the ith data object belonging to the kth class, $d_{ik}$ is the distance from the ith data object to the kth class, and $r$ is the fuzzy index.
6. The daily load curve clustering method based on the fusion evolutionary algorithm as claimed in claim 1, wherein the steps of performing genetic operation on individuals in a population according to the genetic algorithm, updating cluster center codes of the individuals in the population according to the current annealing temperature and the individual fitness value, and performing a cooling operation on the current annealing temperature to obtain the updated current annealing temperature specifically comprise:
step 601: selecting, crossing and mutating the individuals in the population to generate new individuals;
step 602: calculating a fitness value of a new individual, if the fitness value of the new individual is greater than or equal to the fitness value of the individual in the population, updating the cluster center code value of the individual in the population by using the cluster center code value of the new individual, and if the fitness value of the new individual is less than the fitness value of the individual in the population, updating the cluster center code value of the individual in the population by using the cluster center code value of the new individual according to a preset probability, wherein the preset probability is as follows:
Figure FDA0003096873040000022
in the formula, $f_i'$ is the fitness value of the new individual, $f_i$ is the fitness value of the individual in the population, and $T$ is the current annealing temperature;
step 603: repeating the steps 601 to 602 until the cycle number is larger than the set maximum cycle number;
step 604: updating the current annealing temperature according to a cooling formula to obtain the updated current annealing temperature, wherein the cooling formula is as follows:
$T_{i+1} = p \times T_i$
in the formula, $T_{i+1}$ is the updated current annealing temperature value, $T_i$ is the current annealing temperature value, and $p$ is the cooling coefficient.
7. The daily load curve clustering method based on the fusion evolutionary algorithm as claimed in claim 6, wherein in the step of selecting, crossing and mutating the individuals in the population, the selection operator adopts stochastic universal sampling, the crossover operator adopts a multi-point crossover operator, and the mutation operator adopts a basic bit mutation operator.
8. A daily load curve clustering system based on a fusion evolutionary algorithm, wherein the daily load curve clustering system based on the fusion evolutionary algorithm is used for executing the daily load curve clustering method based on the fusion evolutionary algorithm in any one of claims 1 to 7, and comprises:
the data acquisition module is used for acquiring the original daily load curve loads of a plurality of users;
the data preprocessing module is used for preprocessing the original daily load curve load to obtain a load data set;
the initialization module is used for initializing the current annealing temperature and the annealing termination temperature of the simulated annealing algorithm, initializing the genetic algorithm individuals and randomly generating S individuals to form an initialization population, wherein the individuals are formed by cluster center codes;
the fuzzy C mean value module is used for updating the cluster center code of each individual according to a fuzzy C mean value algorithm, obtaining a membership matrix of the data object relative to the cluster center, selecting a fitness function and calculating an individual fitness value;
the genetic annealing module is used for carrying out genetic operation on individuals in the population, updating the cluster center codes of the individuals in the population according to the individual fitness value, then carrying out cooling operation on the current annealing temperature to obtain the updated current annealing temperature, and judging whether the current annealing temperature is less than the annealing termination temperature or not;
and the screening module is used for selecting the individual with the maximum fitness value in the population as the optimal individual, and the clustering center code value of the optimal individual is determined as the final C clustering centers.
9. The daily load curve clustering system based on the fusion evolutionary algorithm as claimed in claim 8, wherein the data preprocessing module specifically comprises:
the data cleaning unit is used for searching missing data and abnormal data in the load data of each original daily load curve, the abnormal data comprise data with sudden drop, sudden increase or negative values, and if the load abnormal data of the original daily load curve reach 10% of the collection amount, the original daily load curve is removed to obtain first spare load data;
the data interpolation unit is used for supplementing and correcting missing data and abnormal data in the first spare load data by adopting a barycentric Lagrange interpolation method to obtain second spare load data;
and the data normalization unit is used for performing normalization processing on the second spare load data by adopting a linear function normalization method to obtain a load data set.
10. The daily load curve clustering system based on the fusion evolutionary algorithm as claimed in claim 8, wherein the genetic annealing module specifically comprises:
the genetic operation unit is used for carrying out selection, crossover and mutation genetic operations on individuals in the population to generate new individuals;
a fitness screening unit, configured to calculate a fitness value of a new individual, update a cluster center code value of the individual in the population with the cluster center code value of the new individual if the fitness value of the new individual is greater than or equal to the fitness value of the individual in the population, and update the cluster center code value of the individual in the population with the cluster center code value of the new individual according to a preset probability if the fitness value of the new individual is less than the fitness value of the individual in the population, where the preset probability is:
Figure FDA0003096873040000031
in the formula, $f_i'$ is the fitness value of the new individual, $f_i$ is the fitness value of the individual in the population, and $T$ is the current annealing temperature;
the cycle judging module is used for judging whether the number of cycles is greater than the set maximum number of cycles;
the annealing unit is used for updating the current annealing temperature according to a cooling formula, wherein the cooling formula is as follows:
$T_{i+1} = p \times T_i$
in the formula, $T_{i+1}$ is the updated current annealing temperature value, $T_i$ is the current annealing temperature value, and $p$ is the cooling coefficient.
CN202110613240.9A 2021-06-02 2021-06-02 Daily load curve clustering method and system based on fusion evolution algorithm Pending CN113344073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110613240.9A CN113344073A (en) 2021-06-02 2021-06-02 Daily load curve clustering method and system based on fusion evolution algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110613240.9A CN113344073A (en) 2021-06-02 2021-06-02 Daily load curve clustering method and system based on fusion evolution algorithm

Publications (1)

Publication Number Publication Date
CN113344073A true CN113344073A (en) 2021-09-03

Family

ID=77474612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110613240.9A Pending CN113344073A (en) 2021-06-02 2021-06-02 Daily load curve clustering method and system based on fusion evolution algorithm

Country Status (1)

Country Link
CN (1) CN113344073A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037006A (en) * 2021-11-01 2022-02-11 北方工业大学 Typical daily load curve generation method for power system
CN114037006B (en) * 2021-11-01 2024-03-15 北方工业大学 Method for generating typical daily load curve of power system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210903