CN115438727A - Time series Gaussian segmentation method based on an improved elephant herding algorithm

Info

Publication number: CN115438727A
Application number: CN202211048686.2A
Authority: CN
Inventors: 牛晓东; 石振锋; 肖红彬; 崔鲲
Original assignee: Beijing Thinking Shichuang Technology Co ltd
Current assignee: Beijing Thinking Shichuang Technology Co ltd
Priority date: 2022-08-29
Filing date: 2022-08-29
Publication date: 2022-12-06

Abstract

A time series Gaussian segmentation method based on an improved elephant herding algorithm, relating to the field of time series. To address the problem that solution methods for the Gaussian segmentation model of time series remain largely unexplored in the prior art, the invention provides the following technical scheme: a time series Gaussian segmentation method based on an improved elephant herding algorithm, comprising the following steps: initializing a time series as an initial population; acquiring the fitness function values and positions of the variables in all time series in the initial population; iterating; acquiring the positions of the variables in the time series whose fitness values are closest to a preset optimal fitness value; executing the segment update operator on the current population; executing the segment separation operator on the current population; performing population separation on the current population; updating the position of the optimal elder in the current population; and segmenting the time series according to a log-likelihood segmentation criterion. The method is suitable for the segmentation of time series.

Description

Time series Gaussian segmentation method based on an improved elephant herding algorithm

Technical Field

The invention relates to the field of time series, and in particular to a time series segmentation method.

Background

Because of their inherent ordering and correlation, time series attract the interest of a large number of researchers. Research on time series covers segmentation, similarity measurement, clustering, data mining, prediction, compression, and so on. Time series segmentation refers to dividing time series data into non-overlapping subsequences such that the data within each subsequence share a certain property. Segmentation reveals the different characteristics of a time series and at the same time yields a more compact representation of it. Segmentation is therefore a basic operation whose results support tasks such as indexing, prediction, clustering, classification, rule discovery, and compression, which makes research on time series segmentation of considerable practical value.

Existing research on time series segmentation includes several classes of approaches. Model-based methods use, for example, linear models, hidden Markov models, autoregressive moving average models, and dynamic factor models. These methods segment the time series under the assumption that it satisfies certain conditions, so that the resulting segments have characteristics of the model.

Optimization-based methods segment the series using, for example, ant colony, particle swarm, or genetic algorithms. Such methods generally use the optimization algorithm as a means of finding the segmentation that optimizes an objective function. Liu Huibin, summarizing the characteristics of traffic data, proposed a window-based ant colony sequence segmentation algorithm and introduced a divide-and-conquer idea that greatly improves time efficiency. Durán-Rosal et al. proposed a time series segmentation algorithm based on particle swarm optimization that reduces the number of points in the time series by minimizing the approximation error of each linear interpolation, and verified its applicability by comparison with other state-of-the-art algorithms. Pwint et al. formulated speech segmentation as an optimization problem and used a genetic algorithm to detect the boundaries of speech segments, achieving automatic segmentation of speech signals in noisy environments.

Clustering-based methods use, for example, fuzzy clustering, K-means clustering, and Gath-Geva clustering, and aim to obtain segments that are internally more similar. Song et al. designed, for tunnel boring machine data, a fuzzy C-means clustering objective function that considers both the functional relationship between attributes and the distance in data space, and introduced the sequential relationship of the data to obtain the optimal segmentation. Tseng et al. found the segmentation points with a genetic algorithm, clustered the subsequences with K-means, and transformed subsequences of different lengths into subsequences of equal length by discrete wavelet transform. Wu Dahua et al. considered only the intra-class distance and thereby improved the segmentation of the time series. Wang Ni addressed two shortcomings, namely that most time series segmentation algorithms estimate the parameters and the number of segments separately and that Gath-Geva clustering is sensitive to initial values, and the improved method gives better results.

The above methods are relatively mature. The method proposed by Hallac et al., by contrast, draws on recent advances in optimization and time series segmentation. It assumes that the time series variables at different time points are mutually independent and Gaussian distributed, and maximizes the likelihood of the time series segments with a greedy algorithm to obtain a segmentation. However, the greedy algorithm may not reach the optimal solution. To address this defect, Lim et al. proposed an improved genetic algorithm for segmenting time series and verified experimentally on four data sets that it finds the optimal segmentation more reliably. This method is still inefficient when the data set is large. In addition, the solution of the Gaussian segmentation model for time series remains relatively unexplored.

Disclosure of Invention

To address the problem that solution methods for the Gaussian segmentation model of time series remain largely unexplored in the prior art, the invention provides the following technical scheme:

A time series Gaussian segmentation method based on an improved elephant herding algorithm comprises the following steps:

Step 1: initializing a time series as an initial population;

Step 2: acquiring the fitness function values of the variables in all time series in the initial population, the position of the segmentation point of the segment in each time series, and the position of the variable with the maximum fitness function value in the population;

Step 3: performing iterative operations on the population;

Step 4: acquiring the positions of a preset number of variables in the time series whose fitness values are closest to the preset optimal fitness value;

Step 5: executing the segment update operator on the current population, and updating the position of the segmentation point in the segment in each time series;

Step 6: executing the segment separation operator on the current population, and updating the positions of the variables in the segments in each time series;

Step 7: performing population separation on the current population, and updating the fitness values of the variables in the segments in each time series;

Step 8: updating the position of the optimal elder in the current population according to the positions acquired in step 4;

Step 9: judging whether the current number of iterations has reached the preset maximum number of iterations; if so, outputting the current optimal position and the optimal fitness function value; if not, repeating steps 4 to 9;

Step 10: segmenting the time series according to a log-likelihood segmentation criterion.

Further, a preferred embodiment is provided, wherein step 5 specifically comprises the following steps:

Step 5.1: calculating the update position of each dimension of variable j in segment i of the time series;

Step 5.2: judging whether the position of variable j in segment i of the time series coincides with the position of the segmentation point of segment i; if it does, updating the position of the segmentation point of segment i according to the current position of variable j;

Step 5.3: performing steps 5.1 and 5.2 until the position update has been completed for all variables in all segments of the time series.

Further, a preferred embodiment is provided, wherein in step 5.1 the update position of each dimension of variable j in segment i of the time series is calculated by the formula:

wherein $x_{i,j}^{new}$ represents the updated position of the j-th variable in segment i; $x_{i,j}$ represents the position of the j-th variable in segment i; $x^{best}$ represents the position of the variable with the best fitness function value among all variables; one term expresses the influence of the variable with the best fitness function value among all variables on the other variables, with a certain disturbance added; another term expresses the influence of the variable with the best fitness function value in segment i on the other variables, likewise with a certain disturbance added; and Levy(λ) represents the mutation mechanism;

in step 5.2, the position of the segmentation point of segment i is updated according to the current position of variable j by the formulas:

and

wherein $x_i^{new,best}$ represents the updated position of the variable with the best fitness function value in segment i; $n_i$ represents the number of variables in segment i; $x_i^{center}$ represents the center position, or mean position, of segment i; and $\beta \in [0,1]$ represents the degree to which $x_i^{new,best}$ is influenced by the center or mean position of segment i.

Further, a preferred embodiment is provided, wherein step 6 specifically comprises the following steps:

Step 6.1: calculating the fitness function values of the variables in segment i;

Step 6.2: updating the position of the variable with the worst fitness function value in segment i, and calculating the fitness function value of that variable after the position update;

Step 6.3: performing steps 6.1 and 6.2 for each segment until the position update has been completed for all segments of the time series.

Further, a preferred embodiment is provided, wherein in step 6.1 the fitness function value of the variables in segment i is calculated according to the formula:

which represents a simplified log-likelihood function, wherein K represents the number of segmentation points, k represents the index of a time series segment, $S_k$ represents the k-th time series segment, $\Sigma_k$ represents the k-th covariance, λ represents the regularization coefficient, and $\mathrm{Tr}(\Sigma_k^{-1})$ represents the trace of the inverse of the k-th covariance;

in step 6.2, the position of the variable with the worst fitness function value in segment i is updated by the formula:

wherein $x_i^{new,worst}$ represents the updated position of the variable with the worst fitness function value in segment i, $x_{min}$ represents the minimum admissible position of a variable, $x_{max}$ represents the maximum admissible position of a variable, and T represents the length of the time series;

the fitness function value of the variable after the position update is calculated by the formula:

Further, a preferred embodiment is provided, wherein step 7 specifically comprises the following steps:

Step 7.1: collecting the positions of a preset number of variables in the time series whose fitness values are closest to the preset lowest fitness value, and recording them as the positions of the variables to be separated;

Step 7.2: updating the positions of the variables to be separated according to the positions of the other variables, and calculating their fitness function values.

Further, a preferred embodiment is provided, wherein in step 1 the time series is initialized as follows:

Step 1.1: initializing the number of time series segments, the number of variables in each segment, and the influence parameters;

Step 1.2: randomly initializing the positions of all variables in the time series, wherein each variable has K dimensions, and the dimension values are mutually distinct and arranged in ascending order.
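The random initialization of step 1.2 can be illustrated with a minimal Python sketch. It assumes that each variable is a list of K candidate segmentation points, that admissible positions are the integers 2 to T (as stated later for the adjusted separation operator), and that the population is organized as a list of segments. The function name `init_population` and its parameters are hypothetical and are not taken from the patent.

```python
import random

def init_population(n_clans, n_per_clan, K, T, seed=None):
    """Return a nested list: population[i][j] is variable j of segment (clan) i.

    Each variable holds K distinct integer positions drawn from 2..T,
    sorted in ascending order. Requires K <= T - 1.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n_clans):
        # sample() draws without replacement, so the K values are distinct
        clan = [sorted(rng.sample(range(2, T + 1), K)) for _ in range(n_per_clan)]
        population.append(clan)
    return population

population = init_population(n_clans=3, n_per_clan=5, K=4, T=100, seed=0)
```

Sampling without replacement and then sorting enforces both stated constraints (distinct values, ascending order) in one step.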

Further, a preferred embodiment is provided, wherein in step 1.2 the positions of all variables in the time series are randomly initialized by the formula:

wherein $x_i^{new,worst}$ represents the updated position of the variable with the worst fitness function value in segment i, $x_{min}$ represents the minimum admissible position of a variable, $x_{max}$ represents the maximum admissible position of a variable, and T represents the length of the time series.

Based on the same inventive concept, the invention further provides a computer storage medium for storing a computer program, wherein when the computer program stored in the storage medium is read by a processor of a computer, the computer executes the time series Gaussian segmentation method based on the improved elephant herding algorithm.

Based on the same inventive concept, the invention also provides a computer comprising a processor and a storage medium, wherein the storage medium contains a computer program, and when the computer program stored in the storage medium is read by the processor of the computer, the computer executes the time series Gaussian segmentation method based on the improved elephant herding algorithm.

The invention has the advantages that:

The time series Gaussian segmentation method based on the improved elephant herding algorithm considers the influence of both the local optimum and the global optimum, and makes full use of the property of Levy flight that long and short steps alternate and cover the search space evenly. It can therefore guide each elephant toward the optimal direction while also widening the search range, so that the algorithm escapes local extrema easily and converges faster.

With the time series Gaussian segmentation method based on the improved elephant herding algorithm provided by the invention, the result may not match the optimum reached by a genetic algorithm when the number of segments is large, but this does not affect the selection of the optimal number of segments: regardless of whether the EHO algorithm and the improved EHO algorithm reach the optimal segmentation when the number of segments is large, the optimal number of segments can still be identified, so the time series data can be segmented well.

Suitable for application in the segmentation of time series.

Drawings

FIG. 1 is a schematic flow chart of steps 1 to 9 of the time series Gaussian segmentation method based on the improved elephant herding algorithm according to the first embodiment;

FIG. 2 is a graph showing how the fitness values of the algorithms mentioned in the eleventh embodiment vary with iteration in the experiment on the first-dimension Gesture data, wherein GA is the genetic algorithm, EHO is the elephant herding algorithm, and improved EHO is the improved elephant herding algorithm;

FIG. 3 is a result of segmenting Gesture first dimension data according to the eleventh embodiment;

FIG. 4 is a graph showing the variation of the log-likelihood value with the number of divisions according to the eleventh embodiment;

FIG. 5 is a graph showing the results of the evaluation criteria for the number of segments under the three methods GA, EHO, and improved EHO according to the eleventh embodiment, wherein the upper graph shows the results of the 3 evaluation criteria under the GA method, the middle graph shows the results of the 3 evaluation criteria under the EHO method, and the lower graph shows the results of the 3 evaluation criteria under the improved EHO algorithm;

FIG. 6 is a time-series segmentation diagram of the first dimension data of Gesture corresponding to BIC according to the eleventh embodiment;

FIG. 7 shows the result of dividing la data in PSCADA according to the eleventh embodiment;

FIG. 8 is a graph showing the experimental fitness values of several algorithms mentioned in the eleventh embodiment for la data in PSCADA; wherein the upper picture is a global change picture, and the lower picture is a left partial enlarged picture;

FIG. 9 is a diagram showing a variation of a log likelihood value with a division number according to an eleventh embodiment;

FIG. 10 is a graph showing the results of the evaluation criteria for the number of segments under the three methods GA, EHO, and improved EHO according to the eleventh embodiment, wherein the upper graph shows the results of the 3 evaluation criteria under the GA method, the middle graph shows the results of the 3 evaluation criteria under the EHO method, and the lower graph shows the results of the 3 evaluation criteria under the improved EHO method.

Detailed Description

In order to make the advantages and benefits of the technical solutions provided by the present invention more clear, the technical solutions of the present invention will be further described in detail with reference to the accompanying drawings, specifically:

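First embodiment. This embodiment is described with reference to FIG. 1 and provides a time series Gaussian segmentation method based on an improved elephant herding algorithm, the method comprising:

Step 1: initializing a time series as an initial population;

Step 2: acquiring the fitness function values of the variables in all time series in the initial population, the position of the segmentation point of the segment in each time series, and the position of the variable with the maximum fitness function value in the population;

Step 3: performing iterative operations on the population;

Step 4: acquiring the positions of a preset number of variables in the time series whose fitness values are closest to the preset optimal fitness value;

Step 5: executing the segment update operator on the current population, and updating the position of the segmentation point in the segment in each time series;

Step 6: executing the segment separation operator on the current population, and updating the positions of the variables in the segments in each time series;

Step 7: performing population separation on the current population, and updating the fitness values of the variables in the segments in each time series;

Step 8: updating the position of the optimal elder in the current population according to the positions acquired in step 4;

Step 9: judging whether the current number of iterations has reached the preset maximum number of iterations; if so, outputting the current optimal position and the optimal fitness function value; if not, repeating steps 4 to 9;

Step 10: segmenting the time series according to a log-likelihood segmentation criterion.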

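Second embodiment. This embodiment further defines the time series Gaussian segmentation method based on the improved elephant herding algorithm provided in the first embodiment, wherein step 5 specifically comprises the following steps:

Step 5.1: calculating the update position of each dimension of variable j in segment i of the time series;

Step 5.2: judging whether the position of variable j in segment i of the time series coincides with the position of the segmentation point of segment i; if it does, updating the position of the segmentation point of segment i according to the current position of variable j;

Step 5.3: performing steps 5.1 and 5.2 until the position update has been completed for all variables in all segments of the time series.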

Third embodiment. This embodiment further defines the time series Gaussian segmentation method based on the improved elephant herding algorithm provided in the second embodiment, wherein in step 5.1 the update position of each dimension of variable j in segment i of the time series is calculated by the formula:

wherein $x_{i,j}^{new}$ represents the updated position of the j-th variable in segment i, and $x_{i,j}$ represents the position of the j-th variable in segment i;

in step 5.2, the position of the segmentation point of segment i is updated according to the current position of variable j by the formulas:

and

wherein $x_i^{new,best}$ represents the updated position of the variable with the best fitness function value in segment i; $n_i$ represents the number of variables in segment i; $x_i^{center}$ represents the center position, or mean position, of segment i; and $\beta \in [0,1]$ represents the degree to which $x_i^{new,best}$ is influenced by the center or mean position of segment i.

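Fourth embodiment. This embodiment further defines the time series Gaussian segmentation method based on the improved elephant herding algorithm provided in the first embodiment, wherein step 6 specifically comprises the following steps:

Step 6.1: calculating the fitness function values of the variables in segment i;

Step 6.2: updating the position of the variable with the worst fitness function value in segment i, and calculating the fitness function value of that variable after the position update;

Step 6.3: performing steps 6.1 and 6.2 for each segment until the position update has been completed for all segments of the time series.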

Fifth embodiment. This embodiment further defines the time series Gaussian segmentation method based on the improved elephant herding algorithm provided in the fourth embodiment, wherein in step 6.1 the fitness function value of the variables in segment i is calculated according to the formula:

which represents a simplified log-likelihood function, wherein K represents the number of segmentation points, k represents the index of a time series segment, $S_k$ represents the k-th time series segment, $\Sigma_k$ represents the k-th covariance, λ represents the regularization coefficient, and $\mathrm{Tr}(\Sigma_k^{-1})$ represents the trace of the inverse of the k-th covariance.

Sixth embodiment. This embodiment further defines the time series Gaussian segmentation method based on the improved elephant herding algorithm provided in the first embodiment, wherein step 7 specifically comprises the following steps:

Step 7.1: collecting the positions of a preset number of variables in the time series whose fitness values are closest to the preset lowest fitness value, and recording them as the positions of the variables to be separated;

Step 7.2: updating the positions of the variables to be separated according to the positions of the other variables, and calculating their fitness function values.

Seventh embodiment. This embodiment further defines the time series Gaussian segmentation method based on the improved elephant herding algorithm provided in the first embodiment, wherein in step 1 the time series is initialized as follows:

Step 1.1: initializing the number of time series segments, the number of variables in each segment, and the influence parameters;

Step 1.2: randomly initializing the positions of all variables in the time series, wherein each variable has K dimensions, and the dimension values are mutually distinct and arranged in ascending order.

Eighth embodiment. This embodiment further defines the time series Gaussian segmentation method based on the improved elephant herding algorithm provided in the seventh embodiment, wherein in step 1.2 the positions of all variables in the time series are randomly initialized by the formula:

wherein $x_i^{new,worst}$ represents the updated position of the variable with the worst fitness function value in segment i, $x_{min}$ represents the minimum admissible position of a variable, $x_{max}$ represents the maximum admissible position of a variable, and T represents the length of the time series.

Ninth embodiment. This embodiment provides a computer storage medium storing a computer program, wherein when the computer program stored in the storage medium is read by a processor of a computer, the computer executes the time series Gaussian segmentation method based on the improved elephant herding algorithm provided in any one of the first to eighth embodiments.

Tenth embodiment. This embodiment provides a computer comprising a processor and a storage medium, wherein the storage medium contains a computer program, and when the computer program stored in the storage medium is read by the processor of the computer, the computer executes the time series Gaussian segmentation method based on the improved elephant herding algorithm provided in any one of the first to eighth embodiments.

Eleventh embodiment. This embodiment is described with reference to FIGS. 2 to 10. It provides a specific example of the time series Gaussian segmentation method based on the improved elephant herding algorithm provided in the first embodiment, and also serves to explain the first to eighth embodiments, specifically:

The clans mentioned in this embodiment are the segments in the time series; the elephants are the variables in the time series; the clan leader is the variable with the highest fitness function value within a time series segment; and the elder is the variable with the highest fitness function value in the whole time series.

The elephant herding optimization algorithm is as follows:

The elephant herding optimization (EHO) algorithm is a swarm optimization algorithm that was proposed in 2016 and has received attention from many researchers. As a metaheuristic, it can solve global unconstrained optimization problems for most problem classes, with fast local search and without sacrificing global search capability. In addition, it has few parameters and a transparent internal structure, which makes it well suited to engineering applications. The algorithm consists mainly of two parts: a clan update operator and a clan separation operator.

(1) Clan update operator

The behavior and lifestyle of each elephant in a clan are influenced by the clan leader, as follows:

$$x_{i,j}^{new} = x_{i,j} + \alpha\,\bigl(x_i^{best} - x_{i,j}\bigr)\, r \tag{2-1}$$

where i denotes the clan numbered i; j denotes the elephant numbered j; $x_{i,j}$ is the original, not yet updated, position of the j-th elephant in clan i; $x_{i,j}^{new}$ is the updated position of the j-th elephant in clan i; $x_i^{best}$ is the position of the elephant with the best fitness value in clan i, that is, the position of the clan leader of clan i; $\alpha \in [0,1]$ is an influence parameter representing the degree to which the elephants in clan i are influenced by the clan leader $x_i^{best}$; and $r$ follows a uniform distribution on [0,1] and represents other disturbances acting on the elephants in clan i.

The elephants in a clan are influenced by the clan leader, and correspondingly the clan leader is influenced by the elephants in the clan. Since the elephants live together within the clan, the position of the clan leader is updated using the mean of the positions of all elephants, as shown below:

$$x_i^{new,best} = \beta\, x_i^{center} \tag{2-2}$$

$$x_i^{center} = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{i,j} \tag{2-3}$$

where $n_i$ is the number of elephants in clan i, including the clan leader; $x_i^{center}$ is the center position, or mean position, of clan i; $x_i^{new,best}$ is the updated position of the best elephant in clan i, that is, the new position of the clan leader of clan i; and $\beta \in [0,1]$ represents the degree to which the new position $x_i^{new,best}$ of the clan leader is influenced by the center position of clan i and, like α, is an influence parameter.

The positions $x_{i,j}$, $x_{i,j}^{new}$, $x_i^{best}$, $x_i^{new,best}$, and $x_i^{center}$ above may be one-dimensional or multi-dimensional, and each dimension is calculated in the same way. The elephant herding optimization algorithm is therefore suitable for both one-dimensional and multi-dimensional optimization.
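The clan update operator described above can be sketched in Python as follows. This is a minimal illustration of the standard (unimproved) EHO update on continuous positions, under the assumption that a higher fitness value is better; the function name `clan_update` and its signature are hypothetical.

```python
import random

def clan_update(clan, fitness, alpha=0.5, beta=0.1, rng=random):
    """Apply one standard EHO clan update.

    clan    : list of positions, each a list of floats
    fitness : callable mapping a position to a number (higher is better)
    Ordinary elephants move toward the clan leader; the leader moves to
    beta times the clan center.
    """
    n, dim = len(clan), len(clan[0])
    best_idx = max(range(n), key=lambda j: fitness(clan[j]))
    best = clan[best_idx]
    # mean position of all elephants in the clan, leader included
    center = [sum(x[d] for x in clan) / n for d in range(dim)]
    new_clan = []
    for j, x in enumerate(clan):
        if j == best_idx:
            new_clan.append([beta * c for c in center])
        else:
            new_clan.append(
                [x[d] + alpha * (best[d] - x[d]) * rng.random() for d in range(dim)]
            )
    return new_clan
```

Each dimension is updated independently, which matches the remark that the one-dimensional and multi-dimensional cases are calculated in the same way.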

(2) Clan separation operator

In each clan, a certain number of male elephants leave the clan. For the purpose of optimization, the worst elephant of each clan leaves the clan, which introduces new candidate solutions from the whole solution range:

$$x_i^{new,worst} = x_{min} + \bigl(x_{max} - x_{min} + 1\bigr)\cdot rand \tag{2-4}$$

where $x_{min}$ is the minimum value of an elephant position; $x_{max}$ is the maximum value of an elephant position; $x_i^{worst}$ is the position of the elephant with the worst fitness value in clan i, that is, the position of the worst elephant; $x_i^{new,worst}$ is the updated position of the worst elephant in clan i; and rand denotes a randomly generated number in [0,1], which keeps $x_i^{new,worst}$ within the position bounds.
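A minimal Python sketch of the clan separation operator follows. It assumes the commonly published EHO form, in which the worst elephant is redrawn as x_min + (x_max - x_min + 1) * rand, and it additionally clips the result to x_max; the clipping, the function name `separate_worst`, and the convention that higher fitness is better are assumptions made for this illustration.

```python
import random

def separate_worst(clan, fitness, x_min, x_max, rng=random):
    """Replace the worst elephant of a clan with a random position.

    Returns the new clan and the index of the replaced elephant.
    """
    worst_idx = min(range(len(clan)), key=lambda j: fitness(clan[j]))
    dim = len(clan[worst_idx])
    new_clan = [list(x) for x in clan]  # copy so the input is not modified
    new_clan[worst_idx] = [
        # assumed form of the separation formula, clipped to the upper bound
        min(x_max, x_min + (x_max - x_min + 1) * rng.random())
        for _ in range(dim)
    ]
    return new_clan, worst_idx
```

Only one elephant per clan is replaced, so the operator perturbs the population lightly while still sampling the whole admissible range.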

Algorithm limitation

The clan update operator of the elephant herding optimization algorithm determines the search direction and aims at a finer local search, while the clan separation operator leans toward global search, so that every solution in the whole domain has a chance of becoming the optimal solution, which increases the diversity of the population.

Although the elephant herding optimization algorithm is simple in structure and cheap to run, it easily falls into a local optimum. Once trapped in a local extremum, it may need many iterations to escape, or may not escape at all. This is largely because the local optimization process of the clan update operator lacks more reasonable guidance and a mutation mechanism.

Time series Gaussian segmentation based on the improved elephant herding optimization algorithm

The time series Gaussian segmentation model is a segmented Gaussian model. Solving this model means solving equation (2-5):

which represents a simplified log-likelihood function, wherein K represents the number of segmentation points, k represents the index of a time series segment, $S_k$ represents the k-th time series segment, $\Sigma_k$ represents the k-th covariance, and λ represents the regularization coefficient.

This is essentially an optimization problem, so it can be solved with an optimization algorithm.
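To make the optimization target concrete, the following Python sketch evaluates one plausible fitness function for a candidate set of segmentation points. It is not the patent's equation (2-5), which is not reproduced here. It assumes a covariance-regularized Gaussian log-likelihood: each segment is fitted with its empirical mean, its covariance estimate is the empirical covariance plus (λ / n_k) times the identity, and λ times the trace of the inverse covariance is subtracted as a penalty. The names `segment_score` and `fitness` are hypothetical.

```python
import numpy as np

def segment_score(X, lam):
    """Regularized Gaussian log-likelihood of one segment X (n rows, m columns)."""
    n, m = X.shape
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n              # empirical covariance
    Sigma = S + (lam / n) * np.eye(m)          # assumed regularized estimate
    _, logdet = np.linalg.slogdet(Sigma)
    Sigma_inv = np.linalg.inv(Sigma)
    loglik = -0.5 * n * (m * np.log(2 * np.pi) + logdet + np.trace(Sigma_inv @ S))
    return loglik - lam * np.trace(Sigma_inv)  # trace penalty on the inverse

def fitness(X, breakpoints, lam=1.0):
    """Sum of segment scores; higher is better.

    breakpoints: sorted distinct integers in 2..T (1-based). Segments are
    left-closed and right-open, so segment k covers [b_{k-1}, b_k).
    """
    T = X.shape[0]
    bounds = [1] + list(breakpoints) + [T + 1]
    return sum(
        segment_score(X[a - 1:b - 1], lam)
        for a, b in zip(bounds[:-1], bounds[1:])
    )
```

With λ > 0 the covariance estimate stays invertible even for a segment containing a single point, which is why a breakpoint at T is admissible in this sketch.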

The elephant herding optimization algorithm is often applied in engineering projects because it has few parameters, a simple structure, and is easy to operate. However, if it falls into a local optimum or needs too many iterations, the resulting engineering losses may be excessive and may have serious consequences, so improving the elephant herding optimization algorithm is necessary.

Moreover, for a given time series $X = \{x_1, \ldots, x_T\}$, where $x_t \in \mathbb{R}^d$ and $\mathbb{R}^d$ denotes the set of all d-dimensional real vectors, the segmentation points of the time series are integers between 1 and T. The original elephant herding optimization algorithm therefore cannot be applied directly to solving the time series Gaussian segmentation, and the algorithm itself has certain defects. For these reasons, this embodiment provides time series Gaussian segmentation based on an improved elephant herding optimization algorithm.

Algorithm improvement

The improved elephant herding optimization algorithm adapted to time series Gaussian segmentation has three main points of improvement or adjustment, as described below.

(1) Improvements in clan update operators.

Because the clan update operator in the elephant herding optimization algorithm considers only the influence of the clan leader on the elephants within the clan and neglects the influence of the best elephant in the whole population on each individual, its search capability leaves room for improvement. The clan update operator is therefore improved as follows:

wherein $x^{best}$ denotes the position of the elephant with the best fitness function value among all elephants, called the elder; one term represents the influence of the clan leader of clan i on the individual elephant, with a certain disturbance added; another term represents the influence of the elder on the individual elephant, likewise with a certain disturbance added. Because an individual elephant is influenced mainly by the clan leader and the elder, the influence parameters are α and 1-α respectively. To account for other sudden factors, a Levy(λ) mutation mechanism is added so that the algorithm can escape local extrema more easily. round denotes rounding the value in parentheses to an integer, which satisfies the requirement that time series segmentation points be integers.

This improvement strategy considers the influence of both the local optimum and the global optimum, and makes full use of the property of Levy flight that long and short steps alternate and cover the search space evenly. It can guide each elephant toward the optimal direction while also widening the search range, so that the algorithm escapes local extrema easily and converges faster.
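The improved clan update can be illustrated with the following Python sketch. The exact formula of the patent is not reproduced here, so the sketch assumes an additive form: the current position, plus an α-weighted random pull toward the clan leader, plus a (1-α)-weighted random pull toward the elder, plus a Levy step, all rounded to an integer. The Levy step uses Mantegna's algorithm, which is a common way of sampling such steps and is an assumption of this sketch, as are the clipping to 2..T, the final sort, and the names `levy_step` and `improved_update`.

```python
import math
import random

def levy_step(lam=1.5, rng=random):
    """Draw one Levy-distributed step with Mantegna's algorithm."""
    sigma_u = (
        math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
        / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))
    ) ** (1 / lam)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0) or 1e-12   # guard against an exact zero
    return u / abs(v) ** (1 / lam)

def improved_update(x, leader, elder, T, alpha=0.5, lam=1.5, rng=random):
    """Update one elephant (a list of integer segmentation points).

    leader: best position in the elephant's own clan
    elder : best position in the whole population
    """
    new = []
    for d in range(len(x)):
        value = (
            x[d]
            + alpha * rng.random() * (leader[d] - x[d])
            + (1 - alpha) * rng.random() * (elder[d] - x[d])
            + levy_step(lam, rng)
        )
        new.append(min(max(round(value), 2), T))  # integer position in 2..T
    return sorted(new)  # keep the points in ascending order
```

Rounding can make two segmentation points coincide; this sketch does not repair such duplicates, and a full implementation would need a rule for that case.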

(2) Adjustment of the clan separation operator.

To suit optimization over integer positions, equation (2-4) is adjusted to the following form.

where, because each segment of the segmented Gaussian model is a left-closed, right-open interval, an internal segmentation point of a time series of length T can take the value T; the updated position given by formula (2-7) is therefore an integer chosen from 2, ..., T.
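A minimal sketch of an integer-valued separation step is given below. Formula (2-7) is not reproduced in this text, so the rounding form used here (x_min plus a uniform random fraction of the range, rounded to the nearest integer, with x_min = 2 and x_max = T) is an assumption, and `separate_worst` is a hypothetical name.

```python
import numpy as np

def separate_worst(T, n_points, rng=None):
    """Re-draw all breakpoints of the worst elephant as integers in 2..T."""
    rng = np.random.default_rng() if rng is None else rng
    x_min, x_max = 2, T
    # Rounded continuous draw; the two end values are slightly less likely than the others.
    pos = np.round(x_min + (x_max - x_min) * rng.random(n_points)).astype(int)
    return np.sort(pos)
```

If an exactly uniform draw over 2..T is preferred, `rng.integers(2, T + 1, n_points)` is a drop-in alternative.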

(3) Population separation.

In addition, to make the algorithm approach the optimal solution faster, the separation operation is not limited to the individual with the worst fitness function value in each clan. After the clan update operator and the clan separation operator have been applied, the whole elephant population is sorted, a certain number of individuals with the worst fitness function values are selected for separation, and their positions are replaced by the positions of the excellent individuals retained from the previous iteration.
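The population separation step can be sketched as follows, assuming that a larger fitness function value is better (the fitness is a log-likelihood to be maximized) and that the retained individuals are passed in as an array. The function name `population_separation` and its arguments are hypothetical.

```python
import numpy as np

def population_separation(population, fitness, elite, n_replace):
    """Replace the n_replace worst individuals with elite positions kept from the last iteration.

    population: (N, K) integer array of breakpoint sets
    fitness:    (N,) array, larger is better (assumption)
    elite:      (M, K) array of positions retained from the previous iteration
    """
    population = np.array(population, copy=True)
    elite = np.asarray(elite)
    n_replace = min(n_replace, len(elite), len(population))
    worst = np.argsort(fitness)[:n_replace]  # indices of the lowest fitness values
    population[worst] = elite[:n_replace]
    return population
```

For example, with four individuals whose fitness values are 5.0, 1.0, 3.0 and 0.5 and `n_replace=2`, the second and fourth individuals are overwritten.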

Basic flow

The basic steps of time series Gaussian segmentation based on the improved image group optimization algorithm are as follows.

Table 2-1 Basic flow of the improved image group algorithm part of the time series Gaussian segmentation method based on the improved image group algorithm
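Every step of the flow relies on a fitness function value for a candidate set of segmentation points. The sketch below shows one way such a value could be computed, under stated assumptions: each segment is modeled as a Gaussian with its own mean and covariance, the covariance is shrunk by (reg / n) times the identity, and the fitness is the Gaussian log-likelihood (constant terms dropped) minus reg times the trace of the inverse covariance. The exact fitness formula of this method is not reproduced in this text; `gaussian_fitness` and `reg` are hypothetical names.

```python
import numpy as np

def gaussian_fitness(series, breakpoints, reg=1e-3):
    """Penalized Gaussian log-likelihood of a segmentation (larger is better).

    breakpoints are 1-indexed positions in 2..T; a breakpoint b starts a new
    segment at sample b, so segments are left-closed and right-open.
    """
    X = np.asarray(series, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    T, d = X.shape
    cuts = [0] + sorted(int(b) - 1 for b in breakpoints) + [T]
    total = 0.0
    for start, stop in zip(cuts[:-1], cuts[1:]):
        n = stop - start
        if n <= 0:
            return -np.inf  # duplicate or out-of-range breakpoints
        seg = X[start:stop]
        centered = seg - seg.mean(axis=0)
        emp_cov = centered.T @ centered / n
        cov = emp_cov + (reg / n) * np.eye(d)  # shrinkage keeps cov invertible
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        total += -0.5 * n * (logdet + np.trace(emp_cov @ inv)) - reg * np.trace(inv)
    return total
```

Returning negative infinity for degenerate breakpoint sets lets the optimizer discard them without special handling.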

FIG. 1 is a flow chart of the improved image group algorithm part of the time series Gaussian segmentation method based on the improved image group algorithm.

Selection of the number of divisions

For time series segmentation, fewer segments are not always better, because too few segments cannot separate the patterns in the data; nor are more segments always better, because too many segments create redundancy and waste time and space. A reasonable number of segments separates the patterns effectively without waste, so the choice of the number of segments is a problem worth exploring. This section proposes a criterion for selecting the number of time series segments, called the Log-likelihood Segmentation Criterion (LSC), given by the following formula:

In the above formula, K denotes the number of internal segmentation points of the time series.

The regularized log-likelihood function value grows as the number of segments increases. In practice, however, more segments are not always better, so this value alone cannot be used as the criterion for selecting K, although its influence does have to be considered when assessing K. Viewed in terms of rate of change, the rate of change of the regularized log-likelihood function value becomes smaller as K becomes larger, whereas the rate of change of the other quantity in equation (2-8) becomes larger as K becomes larger. Equation (2-8) therefore always has a maximum, beyond which a further increase in the number of segments no longer affects the log-likelihood value significantly, and the K obtained there can be taken as the optimal number of segments of the time series.

The reasoning above motivates equation (2-8) as the measure of the optimal number of segments. Experimental verification follows, in which the criterion is justified against actual data to further confirm the accuracy of equation (2-8).

Experimental simulation and result analysis

To verify the effectiveness of time series Gaussian segmentation based on the improved image group optimization algorithm, this embodiment carries out experiments on two real data sets. The first is the Gesture data set obtained from the UCI Machine Learning Repository; the second is PSCADA data from the project on which this work is based.

Gesture dataset validation

Experimental validation of improved algorithms

The Gesture data set contains hand gesture features extracted from seven videos. The raw data have 18 attributes and therefore form a multi-dimensional time series. The first attribute is selected, and the experiment is carried out on this one-dimensional time series with K = 4 internal segmentation points.

The comparison in Table 2-2 shows that the image group optimization algorithm, the genetic algorithm and the improved image group optimization algorithm proposed in this embodiment all reach the same optimal segmentation: the first-dimension attribute data of Gesture are best segmented at points 95, 158, 229 and 347. The reason for choosing K = 4 is given later. The optimal fitness function value at this segmentation is 599.510385826285.

In terms of the number of iterations needed to reach the optimum, the improved image group optimization algorithm proposed in this embodiment needs the fewest iterations and the original image group optimization algorithm the most. This shows that the proposed improvements clearly accelerate convergence, which reduces unnecessary cost in engineering applications. As FIG. 2 shows, the proposed method reaches the optimal value quickly, whereas the original image group optimization algorithm climbs and stalls at local extrema many times and reaches the optimal position only after many iterations. In addition, although the fitness curve of the genetic algorithm in FIG. 2 appears at first glance to be better than that of the proposed method, it stays almost flat for a stretch before iteration 100 and is in fact slightly inferior. The improved image group optimization algorithm proposed in this embodiment therefore performs as well as, or slightly better than, the genetic algorithm, which also indicates its good performance.

Table 2-2 Comparison of the results of several algorithms on the first dimension of the Gesture data

FIG. 2 shows the fitness value as a function of the iteration number for several algorithms on the first dimension of the Gesture data.

In terms of run time, measured over 1000 iterations plus initialization, the method proposed in this embodiment lies in the middle and the image group optimization algorithm is the fastest, but the three differ little. If the time needed to first reach the optimal value is estimated in proportion to the number of iterations, the improved image group optimization algorithm proposed in this embodiment takes the shortest time, which indicates that it is superior to the other two methods.
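One possible reading of the proportional estimate, stated here as an assumption because the text does not give the formula: if a run of N_total iterations takes a total time t_total and the optimal value is first reached at iteration n_opt, the time to reach the optimum is approximately

```latex
t_{\mathrm{opt}} \approx t_{\mathrm{total}} \cdot \frac{n_{\mathrm{opt}}}{N_{\mathrm{total}}},
\qquad N_{\mathrm{total}} = 1000 .
```

In this reading the initialization time is folded into t_total for simplicity, so a method with a longer total run time can still have the smallest t_opt if its n_opt is small enough.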

FIG. 3 shows the optimal segmentation of the first dimension of the Gesture data when K = 4. Each resulting segment exhibits a distinct pattern that matches the behaviour of the time series, which illustrates the effectiveness of Gaussian segmentation of the time series and also the reasonableness of the chosen number of segments.

Experimental validation of the number of splits

The selection criterion for the time series segmentation number K proposed in this embodiment is verified experimentally and compared with the curve of regularized log-likelihood function values, the AIC criterion and the BIC criterion. The experiments are run under the genetic algorithm, the image group optimization algorithm and the improved algorithm proposed in this embodiment, and the comparison shows that the proposed selection criterion is effective.
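For reference, the two conventional criteria used in this comparison can be computed from a log-likelihood value as sketched below, using the standard definitions AIC = 2p - 2 ln L and BIC = p ln(n) - 2 ln L. How the parameter count p is obtained for a segmentation is an assumption of this sketch: each of the K + 1 segments contributes a mean vector and a symmetric covariance matrix, and each of the K internal segmentation points counts as one parameter. The name `aic_bic` is hypothetical.

```python
import math

def aic_bic(log_likelihood, k_internal, n_samples, dim=1):
    """Return (AIC, BIC) for a Gaussian segmentation with k_internal breakpoints."""
    n_segments = k_internal + 1
    params_per_segment = dim + dim * (dim + 1) // 2  # mean + symmetric covariance
    p = n_segments * params_per_segment + k_internal  # assumed parameter count
    aic = 2 * p - 2 * log_likelihood
    bic = p * math.log(n_samples) - 2 * log_likelihood
    return aic, bic
```

Both criteria are minimized over K, in contrast to the LSC value, which is maximized.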

FIG. 4 shows the log-likelihood value as a function of the number of segments, where the green dots mark the maximum of the log-likelihood values obtained for each number of segmentation points. As the figure shows, the log-likelihood value keeps increasing as K increases, so it cannot serve as the basis for selecting the number of segments, while its rate of change keeps decreasing, which confirms the reasoning given when the selection criterion was proposed.

Furthermore, the three log-likelihood curves in FIG. 4 do not coincide completely. They coincide when the number of segments is small, a range that includes the optimal number of segments. As the number of segments grows, the curve of the image group optimization algorithm is the first to break away and deteriorate, followed by that of the improved method proposed in this embodiment; the genetic algorithm gives the best values of the three and is the most robust. For larger numbers of segments, the log-likelihood values obtained by the proposed method lie between those of the other two methods, so its robustness is improved relative to the original image group optimization algorithm. However, the total time for optimizing over 80 different numbers of segments is much lower for the image group optimization algorithm and the proposed method than for the genetic algorithm, particularly for larger numbers of segments. The proposed method therefore cannot match the genetic algorithm when the number of segments is large, but this does not affect the selection of the optimal number of segments, so the proposed method can be used for time series Gaussian segmentation.

FIG. 5 shows the results of three criteria for evaluating the number of segments (AIC, BIC and LSC) under the genetic algorithm (GA), the image group optimization (EHO) algorithm and the improved EHO algorithm. All three plots have two y-axes: AIC and BIC are read on the left y-axis and LSC on the right y-axis. The LSC curves under all three methods first rise and then fall, with a clear maximum (the solid point at the upper left of the LSC line) at 4 segments. At this point the regularized log-likelihood value and the number of segments diverge most clearly, so choosing this K best reflects the tension between the two. Moreover, even though the EHO algorithm and the improved EHO algorithm may not reach the optimal segmentation when the number of segmentation points is large, the optimal number of segmentation points can still be identified, and the time series data can be segmented well.

FIG. 6 shows the time series segmentation of the first dimension of the Gesture data selected by the BIC criterion.

The other two dotted lines in FIG. 5 show how the AIC and BIC values vary with the number of segments K. Under all three methods the AIC decreases monotonically and will probably continue to decrease. At 80 segments, however, each segment already contains only about 5 data points on average, which is very few; if the AIC optimum appears only at an even larger number of segments, each segment will contain fewer points still, so the AIC criterion is not suitable for determining the optimal number of segments. The optimal number of segments indicated by the BIC curve differs between methods, because the EHO and improved EHO methods cannot reach the optimum when the number of segments is large. Under the GA, the BIC criterion has its minimum at 38 segmentation points, and the corresponding segmentation is shown in FIG. 6. The segmentation is very fine: adjacent segments have similar features (for example near 100, near 150 and near 200), and some segments contain only 3 data points. Such an overly fine segmentation does not support pattern recognition and hinders later data processing, so the LSC criterion is the more suitable by comparison.

Taking the segmentation results and this comparison together, the LSC criterion proposed in this embodiment identifies the number of segments well, which further confirms that the earlier choice of K = 4 was sound.

Validation of PSCADA datasets

Experimental validation of improved algorithms

The PSCADA data set comes from a project. It is a multi-dimensional time series with dozens of attributes, most of which are values such as voltage, current and power. Unlike the Gesture data, the PSCADA data show a certain periodicity without being strictly periodic, which makes them meaningful for experimental verification. As before, a single attribute, the current la, is selected, and the experiment is carried out on this one-dimensional time series. Later experiments verify that the optimal number of internal segmentation points for this series is K = 5.

Table 2-3 Comparison of the results of several algorithms on the la data in PSCADA

FIG. 7 shows the result of dividing la data in PSCADA

As Table 2-3 and FIG. 7 show, the image group optimization algorithm, the genetic algorithm and the improved image group optimization algorithm proposed in this embodiment all reach the same optimal segmentation and the corresponding optimal fitness function value: the optimal segmentation points are 18, 35, 52, 69 and 86, and the optimal fitness function value is 386.109707579763. The figure shows that these internal segmentation points genuinely separate the data, that each resulting segment has its own pattern or shape, and that the periodicity of the PSCADA data is well captured. This also shows that Gaussian segmentation of time series works well both for data with a certain periodicity and for data without periodicity, so the approach is worth recommending.

The index "number of iterations to reach the optimum" counts the initialization of the optimal fitness function value. On this index the improved image group optimization algorithm proposed in this embodiment scores lowest, at only 13; more precisely, the optimal segmentation is reached after 12 iterations. The value for the original image group optimization algorithm is 626, far higher than for the proposed method, which shows that the proposed method still achieves a good time series segmentation for data with a certain periodicity. The widely used genetic algorithm also needs more iterations to reach the optimum than the proposed method, which illustrates the applicability of the proposed method from another angle.

On the "run time" index, the method proposed in this embodiment takes the longest, the genetic algorithm the shortest, and the image group optimization algorithm lies in between, so the proposed method has no advantage on this index. Its "time to reach the optimal value", however, is smaller than that of the other two methods, and much smaller than that of the image group optimization algorithm, because it needs few iterations to reach the optimal segmentation. If the total number of iterations is reduced accordingly, the proposed method can finish in less total time than the other two methods.

FIG. 8 shows the fitness value curves of several algorithms on the la data in PSCADA, that is, the regularized log-likelihood function curves obtained with the GA, the EHO algorithm and the improved EHO algorithm proposed in this embodiment when K = 5. The upper plot is the global view and the lower plot is an enlargement of iterations 0 to 200. The genetic algorithm slows down early but is still faster than the EHO algorithm, which becomes trapped in local extrema. The image group optimization algorithm falls into local extrema several times and finds the optimal value only after more than 600 iterations. The genetic algorithm also falls into a local extremum but finds the optimal value quickly, after 20 iterations. The improved image group optimization algorithm proposed in this embodiment spends little time and few iterations at local extrema and finds the optimal value quickly. The proposed method is therefore superior to the genetic algorithm and the image group optimization algorithm.

Experimental validation of the number of splits

The selection criterion for the time series segmentation number K proposed in this embodiment is verified on the la data of the PSCADA data set, which has a certain periodicity. The experiments are run under the genetic algorithm, the image group optimization algorithm and the improved algorithm proposed in this embodiment, and the comparison shows that the proposed selection criterion is effective.

FIG. 9 compares several algorithms in terms of the log-likelihood value as a function of the number of segments, where the green dots mark the maximum of the log-likelihood values obtained for each number of segmentation points. For data with a certain periodicity, the log-likelihood value first increases as K increases and begins to decrease gradually after reaching its maximum. The number of segmentation points at this maximum is also the optimal number of segments determined by formula (2-13), the AIC criterion and the BIC criterion. For periodic data the log-likelihood function value can therefore itself serve as a basis for selecting the number of segments, and the number of segments at its maximum confirms the effectiveness of the criterion proposed in this embodiment.

FIG. 9 is a graph of the log-likelihood value as a function of the number of segments. For every number of segments, the log-likelihood values obtained by the genetic algorithm are optimal and show no fluctuation, indicating strong robustness. The image group optimization algorithm does not reach the optimal segmentation for every number of segmentation points and fluctuates strongly, which indicates certain defects and poor robustness. The improved image group optimization algorithm reaches the optimal segmentation when the number of segments is small, where it is even slightly better than the genetic algorithm; when the number of segments is large it is slightly inferior to the genetic algorithm, but its fluctuation is small, indicating strong robustness. The segmentation performance of the method proposed in this embodiment is therefore better than that of the image group optimization algorithm and comparable to that of the genetic algorithm.

FIG. 10 shows the results of several model parameter selection criteria under the GA, the EHO algorithm and the improved EHO algorithm. All three plots have two y-axes: AIC and BIC are read on the left y-axis and LSC on the right y-axis. Under the GA and the improved EHO algorithm, the LSC value first rises and then falls as the number of segments changes, amplifying the trend of the log-likelihood value, and has a maximum (the solid point on the LSC line) at 5 segments. At this point the effect of a change in the regularized log-likelihood value is no longer comparable to the effect of a change in the number of segments, so this K reflects the tension between the log-likelihood value and the number of segments, and the time series data can be segmented well. The EHO algorithm is less robust, so all three evaluation values fluctuate strongly under it and provide no reliable basis for choosing the optimal number of segments.

The two dotted lines in FIG. 10 show how the AIC and BIC values vary with the number of segments K. Both first decrease monotonically and then increase monotonically. Under the improved EHO algorithm and the GA, the minima of both criteria correspond to an optimal number of segments of 5, which agrees with the optimal number determined by the LSC value. This supports the reasonableness of the LSC criterion proposed in this embodiment.

To summarize the above analysis, for data with periodicity the LSC criterion proposed in this embodiment identifies the number of segments well, which further confirms that the choice of K = 5 in this subsection was sound.

Conclusion

This embodiment proposes an improved image group optimization algorithm, applies it to Gaussian segmentation of time series, gives the basic flow and a selection criterion for the number of time series segments, and verifies them experimentally.

Time series Gaussian segmentation based on the improved image group optimization algorithm is compared experimentally with segmentation based on the genetic algorithm and on the image group optimization algorithm, using the Gesture data and the PSCADA data. The experiments show that the improved image group optimization algorithm proposed in this embodiment performs well when applied to Gaussian segmentation of time series and outperforms both the image group optimization algorithm and the genetic algorithm. The experiments also show that the LSC criterion for the number of time series segments proposed in this embodiment gives good results compared with the conventional AIC and BIC criteria, which indicates its applicability.

Although the time series Gaussian segmentation method based on the improved image group algorithm provided by the present invention has been described in further detail through several embodiments, these embodiments are not intended to limit the present invention. Any modification, combination of embodiments, improvement, equivalent replacement and the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. The time series Gaussian segmentation method based on the improved image group algorithm is characterized by comprising the following steps:

step 1: initializing a time series as an initial population;

step 3: performing iterative operations on the population;

step 9: judging whether the current number of iterations has reached a preset maximum number of iterations; if so, outputting the current optimal position value and optimal fitness function value; if not, repeating step 4 to step 9;

2. The time series Gaussian segmentation method based on the improved image group algorithm according to claim 1, wherein step 5 specifically comprises:

the method comprises the following steps:

3. The time series Gaussian segmentation method based on the improved image group algorithm according to claim 2, wherein in step 5.1, the updated position of each dimension of variable j in segment i of the time series is calculated by the formula:

wherein

represents the updated position of the j-th variable in segment i, x_{i,j} represents the position of the j-th variable in segment i,

represents the influence of the variable with the best fitness function value in segment i on the other variables, with a certain disturbance added, and Levy(λ) denotes the mutation mechanism;

and

represents the updated position of the variable with the best fitness function value in segment i, n_i represents the number of variables in segment i,

represents the center position (mean position) of segment i, and β ∈ [0, 1] is a factor acting on the updated position of the variable with the best fitness function value in segment i.

4. The time-series gaussian segmentation method based on the improved image group algorithm according to claim 1, wherein the step 6 specifically comprises:

the method comprises the following steps:

5. The method for time series gaussian segmentation based on improved image group algorithm according to claim 4, wherein in step 6.1, the method for calculating the fitness function value of the variable in the segment i is as follows: according to the formula:

wherein

denotes the trace of the inverse of the k-th covariance matrix;

6. The time series Gaussian segmentation method based on the improved image group algorithm according to claim 1, wherein step 7 specifically comprises:

the method comprises the following steps:

step 7.1: acquiring the positions of a preset number of variables in a time sequence with the fitness value closest to the preset lowest fitness value, and recording the positions as the positions of the variables to be separated;

7. The time series Gaussian segmentation method based on the improved image group algorithm according to claim 1, wherein the time series is initialized in step 1 specifically as follows:

8. The time series Gaussian segmentation method based on the improved image group algorithm according to claim 7, wherein in step 1.2, the positions of all variables in the time series are randomly initialized by the formula:

wherein

represents the updated position of the variable with the worst fitness function value in segment i, x_min represents the minimum of the admissible positions of a variable, x_max represents the maximum of the admissible positions of a variable, and T represents the length of the time series.

9. A computer storage medium storing a computer program, wherein, when the computer program stored in the storage medium is read by a processor of a computer, the computer executes the time series Gaussian segmentation method based on the improved image group algorithm according to any one of claims 1 to 8.

10. A computer comprising a processor and a storage medium storing a computer program, wherein, when the computer program stored in the storage medium is read by the processor, the computer executes the time series Gaussian segmentation method based on the improved image group algorithm according to any one of claims 1 to 8.