CN110554838B

CN110554838B - Thermal data prediction method based on joint optimization echo state network

Info

Publication number: CN110554838B
Application number: CN201910566123.4A
Authority: CN
Inventors: 罗旗舞; 王玥童; 阳春华; 桂卫华; 周灿
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2019-06-27
Filing date: 2019-06-27
Publication date: 2020-08-14
Anticipated expiration: 2039-06-27
Also published as: WO2020259543A1; CN110554838A

Abstract

The invention discloses a thermal data prediction method based on a joint optimization echo state network, which creatively uses quantum particle swarm optimization to calculate storage layer parameters of the echo state network to obtain optimal storage layer parameters, and combines L in the particle position updating process₂+ adaptive L_1/2Outputting a weight value and calculating a global optimal adaptation value in the regularized constrained echo state network calculation process, taking a particle position corresponding to the global optimal adaptation value as an optimal storage layer parameter when iteration is finished, and finally adopting L based on the optimal storage layer parameter₂+ adaptive L_1/2The method comprises the steps of calculating a final output weight value by a regularized constrained echo state network, predicting thermal data by using the final output weight value and a logic block address where the input historical thermal data is located, and predicting the data on the logic block address to be the thermal data.

Description

Thermal data prediction method based on joint optimization echo state network

Technical Field

The invention belongs to the technical field of chaotic time sequence prediction, and particularly relates to a thermal data prediction method based on a joint optimization echo state network.

Background

As a non-volatile storage technology, NAND flash memory is widely used in communication systems and consumer electronics products, and has higher access speed and power efficiency compared to hard disk drives. In NAND flash based consumer electronics, a large number of applications rely on NAND flash for data exchange, file storage and video storage. The NAND flash memory is mainly used for storing large-capacity data, and the NAND structure can provide extremely high unit density and can achieve high storage density, high writing-in speed and high erasing speed; therefore, NAND flash memories are widely used for large-capacity data storage, such as solid-state disks. The demand of the NAND flash memory will continue to increase in the future, and the NAND flash memory is mainly applied to the fields of cloud computing, the Internet of things, data centers and the like.

However, NAND flash memory faces at least two challenges, namely displaced updating and limited endurance, which limits its large scale application. NAND flash memory suffers from the drawback of not being able to override an operation, i.e. not being able to perform a new write operation on a page of flash memory before the page is erased. Improper updates will generate many invalid and dead pages, which can reduce efficiency and performance. Furthermore, NAND flash memory has a limited lifetime because a flash block can only withstand a limited number of erasures, and if the number of erasures of a block is greater than the maximum number of erasures of the block, it will not be usable. Garbage Collection (GC) and Wear Leveling (WL) are design ideas of allocating frequently written data (i.e., hot data) to blocks with a small number of erase times and allocating the least recently used data (i.e., cold data) to blocks with a large number of erase times, and have important effects on solving these two challenges, while the efficiency and performance of GC and WL depend on Hot Data Identification (HDI) to a large extent. The nature of HDI is to try to understand well the access behavior of hot data in order to intelligently assign different data to the appropriate blocks, but conventional HDI has two problems, one of which is that memory overhead is large. At present, most of hot data identification mechanisms adopt the idea of identifying hot data pages in a NAND flash memory, and the core principle of the mechanisms is a page counter, which records the read-write operation times of logical pages corresponding to the NAND flash memory pages within a certain time, if the read-write times are greater than a set threshold, the request page is judged to be a hot page, otherwise, the request page is a cold page. Another serious problem is that the recognition accuracy is not high, and the Bloom filter-based hot data recognition mechanism is widely applied to the cold and hot data recognition of SSD, but the Bloom filter has the inherent defect of false positive, that is, the data not belonging to the set is erroneously determined to be in the set. In addition, thermal data identification mode considerations such as load request size and load access mode are relatively single, and the locality characteristics of the load cannot be completely and comprehensively considered, so that the accuracy of thermal data identification is not high.

Disclosure of Invention

The invention aims to provide a thermal data prediction method based on a joint optimization echo state network, which creatively proposes to replace the traditional thermal data identification with thermal data prediction and constructs the joint optimization echo state network, thereby ensuring that the predicted thermal data has more real-time property and reliability.

The invention provides a thermal data prediction method based on a joint optimization echo state network, which comprises the following steps:

s1: initializing parameters required by a quantum particle swarm algorithm and position information of each particle;

the position information of the particles comprises initial positions and position ranges of the particles, and the position of each particle is represented by storage layer parameters in an echo state network;

s2: iterative optimization is carried out by utilizing a quantum particle group algorithm to determine the optimal storage layer parameters;

updating the positions of the particles by adopting a quantum particle swarm algorithm based on the position range of each particle, and utilizing L in each updating process₂+ adaptive L_1/2Outputting a weight value in the regularized and constrained echo state network calculation process to obtain a global optimal adaptation value, and taking a particle position corresponding to the global optimal adaptation value as an optimal storage layer parameter when iteration is finished;

s3: optimal storage layer parameter adoption L in network based on echo state₂+ adaptive L_1/2The regularized and constrained echo state network calculates the final output weight;

s4: and predicting the hot data by using the final output weight and the address of the logic block where the input historical hot data is located, wherein the prediction formula is as follows:

y＝x*W_out(1)

wherein y represents the obtained predicted logic block address, the data at the predicted logic block address is thermal data, x is the logic block address of the input historical thermal data, W_out(1)Indicating the output weight, the logical block address where the historical thermal data is located is used in the echo state network training process in step S2 and step S3.

The invention innovatively predicts the thermal data used by the thermal data identification module in the flash memory conversion layer while continuing to use the solid state disk structure frameworkInstead, the thermal data is predicted by using a joint optimization echo state network, the joint optimization comprises two parts, the first part uses a quantum particle swarm algorithm to iteratively optimize and determine the optimal storage layer parameters of the echo state network, and the second part uses L₂+ adaptive L_1/2The regularization constrained ESN obtains the output weight of high sparsity, and the invention iteratively optimizes the quantum particle swarm algorithm and L₂+ adaptive L_1/2The optimal storage layer parameters are obtained by combining regularization constraints, and the joint optimization echo state network for prediction has high instantaneity and reliability. The invention trains the echo state network by using the logic block address where the historical thermal data is located to obtain the final output weight value, and then predicts the final output weight value as the logic block address where the thermal data is located by using the final output weight value.

Further preferably, the iterative optimization for determining the optimal storage layer parameters in step S2 is performed as follows:

s21: the position of each particle is sequentially used as a storage layer parameter in an echo state network, and L is respectively adopted₂+ adaptive L_1/2The echo state network of regularization constraint calculates the output weight of the process corresponding to each particle;

the current position of each particle is sequentially used as a storage layer parameter in the echo state network and a process output weight is calculated;

s22: calculating the adaptive value of each particle by using the process output weight corresponding to each particle;

s23: selecting an individual optimal adaptive value, an individual optimal parameter, a global optimal adaptive value and a global optimal parameter of each particle according to the adaptive value of each particle based on a minimum adaptive value principle;

wherein, the position of the particle selected as the global optimum adaptive value is the global optimum parameter;

s24: updating the position of each particle in the position range of the particle, recalculating the adaptive value of each particle based on the updated position of each particle, and updating the individual optimal adaptive value, the individual optimal parameter, the global optimal adaptive value and the global optimal parameter of each particle based on the minimum adaptive value principle;

s25: judging whether the iteration times reach the maximum iteration times, if not, returning to the step S24 for the next iteration calculation; otherwise, the current global optimum parameter is taken as the optimum storage layer parameter.

Further preferably, the position of any particle j is updated according to the following formula:

wherein,

in the formula, P_j(t+1)、P_j(t) represents the positions of the particles j after and before the update, respectively,

and u_jAre all random numbers, sbest_j、sbest_iThe individual optimal parameters of the jth and ith particles are shown, the mbest is the average value of the current individual optimal parameters of all the particles, iter and iter_maxRespectively the current iteration number and the maximum iteration number, omega_max、ω_minThe maximum inertia factor and the minimum inertia factor are respectively, and N is the total number of particles.

Further preferably, the calculation formula of the adaptive value of any particle j is:

where Fitness denotes the adaptation value, λ, of the current particle j₁、λ₂Are all the coefficients of the regularization, and,

outputting the weight W for the process corresponding to the current particle j_out(2)(ii) a Y represents the rear section of the logic block address where the historical thermal data used for network training is located, and X represents the storage layer updated based on the front section of the logic block address where the historical thermal data used for network training is locatedThe state information of (a) is stored in the memory,

representing a process output weight corresponding to a current particle j

And the prediction result corresponding to the later section of the logic block address of the historical thermal data.

Further preferably, L is used₂+ adaptive L_1/2The process of calculating the final output weight or the process output weight by the regularized constrained echo state network is as follows:

u401: acquiring an input layer-storage layer weight matrix in an echo state network, wherein the internal connection of the storage layer is the weight matrix, and the front section of a logic block address where historical thermal data is located is used as an input variable U, and the rear section is used as an actual result Y;

wherein, the input layer-storage layer weight matrix and the storage layer internal connection weight matrix are related to the storage layer parameters in the echo state network;

u402: updating state information X of the storage layer based on the input variable U, wherein the state information X is composed of state node information X (t);

X(t)＝log sig(U(t)W_in+X(t-1)W_x)

wherein, U (T) represents the T-th data in the input variable U, X (T) and X (T-1) represent the T-th and T-1-th state node information respectively, the maximum value T of T is determined by the data length of the input variable U, W_in、W_xRespectively representing an input layer-storage layer weight matrix in the echo state network, wherein the storage layer is internally connected with the weight matrix, and logsig (phi) represents an activation function;

u403: based on L₂+ adaptive L_1/2Obtaining an output weight value under the minimum value of the loss function by the loss function under the regularization constraint;

in the formula, E represents a loss function, λ₁、λ₂Are all regularization coefficients. Wherein, if the process output weight in step S2 is calculated, the weight W is output_outEquals to the process output weight; if the final output weight is calculated in step S3, the weight W is output_outEqual to the final output weight.

Further preferably, the acquiring process in step U403 is: simplifying the loss function, and calculating an output weight by adopting a coordinate descent algorithm;

wherein the simplified loss function is represented as:

there are:

wherein I is an identity matrix;

solving matrix W'_outIs calculated for each element separately, W'_outThe value of the kth element of the mth row is as follows:

wherein,

of formula (II) to (III)'_k(t) denotes the kth element, X 'of the kth line of Y'_j(t) denotes the t-th element of the jth line of X'.

Represents matrix W'_outJ (j) th line k (k) th element of (2)>When m is greater than the total number of the carbon atoms,

is zero; l is the number of output layer nodes, and n is the number of storage layer nodes.

Finally using matrix W'_outAnd the output weight W_outCalculating the output weight W according to the relationship_out。

Further preferably, the method further includes performing adaptive optimization on the output weight obtained in step U403, where the optimization process is as follows:

converting the loss function, and calculating the weight W' by adopting a coordinate descent algorithm_outThen, the weight W ″, is utilized_outCalculating an optimized output weight;

the loss function after conversion is:

weight W_outAnd the output weight W_outThe relationship of (1) is:

wherein,

k is the number of output layer nodes.

Further preferably, the storage layer parameters in the echo state network comprise four key parameters of internal connection spectrum radius, storage layer scale, input layer proportion coefficient and storage layer sparsity.

Further preferably, the parameters required for initializing the quantum-behaved particle swarm algorithm in step S1 include a total number N of particles, and a maximum iteration number iter_maxAnd a maximum inertia factor omega_maxAnd minimum inertia factor omega_min。

Further preferably, when the particle position is updated, if the particle movement distance exceeds the position range corresponding to the particle, the particle position parameter is set to exceed a boundary value corresponding to the position range.

Advantageous effects

1. The invention innovatively provides that the traditional hot data identification is replaced by hot data prediction, the disclosed hot data prediction technology can predict the property of the next data in advance by one or even a few beats according to historical access behaviors, and actively allocates and stores the next data to the corresponding block (hot/cold data block) of the Solid State Disk (SSD). Meanwhile, the neural network method reserves more characteristic information for input quantity and classifies the thermal data more comprehensively.

2. The invention performs joint optimization, L, on the echo state network₂Regularization constraint achieves good generalization ability through a trade-off between model bias and prediction variance to be able to obtain weights for successive contractions, however it does not produce sparse solutions; adaptive L_1/2Regularization can generate very sparse solutions, but when there is a high correlation between the predictor variables, L_1/2Cannot well play a regulating role, and the invention comprehensively adopts L₂+ adaptive L_1/2The regularization training least square method can obtain the advantages of two regularizations, and further improves the prediction precision of the thermal data. In addition, echo state network storage layer parameters are optimized based on a QPSO algorithm, and the problem that the storage layer parameters cannot be determined when a model is built can be solved. Compared with the traditional PSO algorithm, the algorithm removes the speed information of particles based on the wave-particle duality, only retains the position information, can effectively reduce the complexity of calculation, and simultaneously obtains the storage layer parameters of the adaptive model, thereby further improving the prediction precision; furthermore, the invention provides L₂+ adaptive L_1/2The regularization and the QPSO algorithm are combined to obtain the optimal storage layer parameters, and the prediction precision is improved.

Drawings

FIG. 1 is a typical architecture of a NAND flash memory system;

FIG. 2 is a flow chart of a method for hot data prediction based on joint optimization echo state network according to an embodiment of the present invention;

FIG. 3 is a flow chart of the iterative optimization algorithm of the quantum-behaved particle swarm optimization of the present invention;

FIG. 4 shows the use of L in the present invention₂+ adaptive L_1/2And (3) a specific algorithm flow chart for computing the output weight value by the constrained echo state network.

FIG. 5 is a graph comparing the performance of four actual workloads according to one embodiment of the present invention.

Detailed Description

The present invention will be further described with reference to the following examples.

The invention provides a hot data prediction method based on a joint optimization echo state network, which is mainly applied to a NAND flash memory system, and as shown in FIG. 1, the hot data prediction method is a typical architecture of the NAND flash memory system, and comprises a module B101 (operation of a user), a module B102 (file system) and a module B103 (solid state disk). The actual operation of the user will affect the solid state disk through the file system. The solid state disk further comprises a Flash memory conversion layer, a Flash controller and a NAND Flash array, wherein the Flash memory conversion layer comprises an address allocation unit, a garbage recovery unit, a wear leveling unit and a hot data prediction unit, the invention innovatively provides that the hot data prediction unit replaces a traditional hot data identification unit, a traditional hot data identification method is used for passively analyzing user access behaviors, corresponding data are allocated and stored to corresponding blocks (hot/cold data blocks) of the Solid State Disk (SSD) through a Flash transport layer protocol (FTL), and the method has higher hot data missing detection or false alarm when the method is used for responding to a request with complex access behaviors. The hot data prediction technology disclosed by the invention can predict the property of the next data in advance by one or even several beats according to historical access behaviors, actively distributes and stores the data to a corresponding block (hot/cold data block) of a Solid State Disk (SSD), and is compatible with secondary verification of a traditional hot data identification scheme. Accordingly, the thermal data prediction method proposed by the present invention is essentially "predictive thermal data identification". The finally obtained address information of the prediction logic block is used for garbage recovery and wear leveling processing.

In the process, the wear leveling and the garbage collection have larger influence in the solid state disk. The traditional thermal data identification is to accurately and efficiently distinguish which data belongs to valid data. The invention provides a hot data prediction method based on a joint optimization echo state network, which replaces hot data identification with hot data prediction, has high precision prediction and specifically comprises the following steps:

s1: initializing parameters required by the quantum particle swarm algorithm and position information of each particle.

The position information of the particles comprises initial positions and position ranges of the particles, the position of each particle is represented by storage layer parameters in an Echo State Network (ESN), the storage layer parameters in the ESN comprise internal connection spectrum radius, storage layer scale, input layer proportion coefficient and storage layer sparsity, the dimension of each particle is initialized to 4 in the example, namely each particle is a matrix of 1 × 4 and represents 4 parameters of the ESN storage layer respectively. Determining the range of 4 parameters, defining the range of the parameters as the position range of all the particles, randomly assigning a value to each particle in the position range during initialization, considering that the particles continuously move towards the optimal direction in the specified range in the subsequent updating process, and updating the position information of the particles into a boundary value if the particles exceed the specified range in the movement process. Wherein each particle location represents a specific value of an ESN storage layer parameter.

The parameters required by the quantum particle swarm optimization comprise the total number N of particles and the maximum iteration number Iter_maxMaximum inertia factor omega_maxAnd minimum inertia factor omega_min(for subsequent updating of the particle position information).

updating the positions of the particles by adopting a quantum particle swarm algorithm based on the position range of each particle, and utilizing L in each updating process₂+ adaptive L_1/2And outputting the weight value in the regularized and constrained echo state network calculation process to obtain a global optimal adaptive value, and taking the particle position corresponding to the global optimal adaptive value as an optimal storage layer parameter when iteration is finished. The specific process comprises the following steps：

S21: taking the position of each particle as a storage layer parameter in an echo state network, and respectively adopting L₂+ adaptive L_1/2The echo state network of regularization constraint calculates the output weight of the process corresponding to each particle;

Based on the above logic, an embodiment of the present invention provides an example flowchart as shown in fig. 3, which includes the following steps:

u301: and (4) iteration initialization, setting the current iteration number iter to be 1, and setting the particle label j to be 1.

U302: setting the position of the jth particle as ESN storage layer parameter, and utilizing L₂+ adaptive L_1/2Least square calculation occurring in regularization constraint training to obtain process output weight W with higher sparsity_out(2). By L₂+ adaptive L_1/2Regularized constrained ESN calculation output weight W_outThe detailed steps are as followsAs shown in fig. 4, and described in more detail below.

U303: process output weight W based on jth particle_out(2)Calculating an adaptive value corresponding to the jth particle, wherein the calculation formula is as follows:

in the formula, λ₁、λ₂Are all the coefficients of the regularization, and,

outputting the weight W for the process corresponding to the current particle j_out(2)(ii) a Y represents the rear section of the logical block address where the historical thermal data used for network training is located, X represents the state information of the storage layer updated based on the front section of the logical block address where the historical thermal data used for network training is located,

representing a process output weight corresponding to a current particle j

D301: and judging whether all the particles finish the calculation of the adaptive values, if not, adding 1 to j, returning to the step U302 to calculate the adaptive value of the next particle, and if all the particles finish the calculation of the adaptive values, performing the step U304.

U304: and selecting the individual optimal adaptive value, the individual optimal parameter, the global optimal adaptive value and the global optimal parameter of each particle according to the adaptive value of each particle based on the minimum adaptive value principle. After all the particles calculate the adaptive values, comparing and judging, recording the adaptive value of each particle as an individual optimal adaptive value fsbest, and recording the position of each particle as an individual optimal parameter sbest; and recording the minimum particle adaptive value in all the particle adaptive values as a global optimal adaptive value fgbest, and the corresponding position of the minimum particle adaptive value is a global optimal parameter gbest. These parameters obtained will be used for subsequent iteration optimization.

U305: the iteration starts and the particle index j is reset to 1.

U306: and (3) calculating the mbest corresponding to the jth particle, wherein the calculation formula is as follows:

in the formula, sbest_iAnd (4) representing the individual optimal parameters of the ith particle, wherein mbest is the average value of the current individual optimal parameters of all the particles, namely, the average value of each dimension parameter of all the particles is respectively taken for updating the position information of the particles.

U307: and updating the position information of the jth particle of the particle, wherein the updating formula is as follows:

wherein, P_j(t+1)、P_j(t) represents the positions of the particles j after and before the update, respectively,

and u_jAre random numbers between (0,1), wherein β has the formula:

as can be seen from the calculation formula of beta, in the earlier stage of iteration, the parameter beta representing the step length of the movement of the particles is larger, and the particles can move to the optimal position more quickly; and the later period of the iteration beta is smaller, which means that the particle size is reduced near the optimal position, and each movement is more accurate to be close to the optimal position.

After updating the location information, the newly obtained storage layer parameters are reused with L₂+ adaptive L_1/2The ESN of the regularization constraint recalculates the adapted value, so D302, D303, U308, U309 are: updating the individual best and the global best according to the newly calculated adaptive value, if the newly calculated adaptive value is less than theUpdating the individual optimal adaptation value of the particle to a newly calculated adaptation value and updating the individual optimal parameter of the particle to be the parameter of the current particle; and if the newly calculated adaptive value is smaller than the global optimal adaptive value, updating the adaptive value to be the global optimal adaptive value, and updating the global optimal parameter to be the parameter of the particle.

D304: and judging whether all the particles are updated, if not, j +1, returning to U306, recalculating mbest by using the updated particle parameters, updating the position information of the next particle, and if all the particles are updated, performing D305.

D305: judging whether the iteration times reach the maximum iteration times, if not, adding 1 to iter, returning to U305, carrying out the next iteration, and if the maximum iteration times are reached, deriving the final global optimal parameter for the subsequent training of the joint optimization echo state network to predict the logic block address.

S3: optimal storage layer parameter adoption L in network based on echo state₂+ adaptive L_1/2And calculating the final output weight value by the regularized echo state network. The process of calculating the final output weight is shown in fig. 4, which will be described in detail below.

y＝x*W_out(1)

wherein y represents the obtained predicted logic block address, the data at the predicted logic block address is thermal data, x is the logic block address of the input historical thermal data, W_out(1)Representing the final output weights. Where y is the predicted access address, note that x and W_out(1)All can be multidimensional variables, and the obtained y is a one-dimensional variable, and the obtained data on the logical block address is classified as hot data for garbage recovery and wear leveling processing.

When calculating the output weight, a group of storage layer parameters, namely the radius of an internal connection spectrum, the scale of the storage layer, the proportional coefficient sum of an input layer are determinedStorage layer sparsity. As shown in fig. 4, L is used in the present invention₂+ adaptive L_1/2The process of computing the output weight value by the regularized constrained echo state network is as follows:

u401: and acquiring an input layer-storage layer weight matrix in the echo state network, wherein the storage layer is internally connected with the weight matrix, and the front section of the logic block address where the historical thermal data is located is used as an input variable U, and the rear section is used as an actual result Y.

Specifically, an Echo State Network (ESN) is a low-complexity and fast-convergence calculation scheme, and is suitable for time data classification and prediction tasks. The ESN network architecture includes three layers: an input layer, a storage layer and an output layer, wherein the input layer-storage layer weight is W_inThe internal connection weight of the storage layer is W_xThe storage layer-output layer weight is Wout. The number of nodes in the input layer, the storage layer and the output layer is initialized to be K, n and L, and the number n of the nodes in the storage layer is the storage layer scale determination in the storage layer parameters. And initializing the input layer-storage layer weight W_in∈R^n×KI.e. random assignment; initializing memory layer internal connection weight W_x∈R^n×nThat is, n × n × storage layer sparsity obtains non-zero number, and then the connection weight W is processed_xThe positions and sizes of the non-zero elements in (1) are randomly assigned, and other elements are all zero. When the sparsity of the storage layer is larger, the nonlinear approximation capability is stronger; then, an internal connection weight W is determined by utilizing the internal connection spectrum radius_xThe maximum eigenvalue of (2) ensures network stability only if the internal connection spectral radius is less than 1. Thus, the input layer-storage layer weight W is determined based on the storage layer parameters_in∈R^n×KInternal connection weight W of storage layer_x∈R^n×n. In this embodiment, L is also initialized_1/2And L₂Coefficient lambda₁＝5*10^-7,λ₂＝1*10^-5For regularization calculation, front 2/3 of the logic block address where the input historical thermal data is located is used for constructing an input variable U, and rear 1/3 of the logic block address where the input historical thermal data is located is used for constructing an actual result YAccording to the logical block address, in other feasible embodiments, the selected length may be other, and the present invention does not specifically limit the length, and the general idea is to predict the address of the next segment by using the address of the previous segment, and then compare the predicted address of the next segment with the actual address to adjust the network, which is the original characteristic of the echo state network, and the present invention does not specifically describe the length.

X(t)＝log sig(U(t)W_in+X(t-1)W_x)

wherein, U (t) represents the t-th data in the input variable U, X (t) and X (t-1) represent the t-th and t-1-th state node information respectively, the number of nodes is determined by the data length of the input variable U, W_in、W_xThe method respectively represents an input layer-storage layer weight matrix, a storage layer internal connection weight matrix and logsig (·) in the echo state network, and the method can enable a neural network to approach any nonlinear function at will, then the neural network can be applied to a nonlinear model, and when the activation function is used, an input quantity is directly multiplied by an input layer proportionality coefficient to be converted into a corresponding range of the activation function. Since the calculation is performed by inputting in sequence, t can be understood as a time of day.

in the formula, E represents a loss function, λ₁、λ₂Are all regularization coefficients.

In order to realize calculation, the loss function E is simplified, and then a coordinate descent algorithm is adopted to calculate an output weight;

wherein the simplified loss function is represented as:

there are:

wherein I is an identity matrix.

Solving matrix W'_outIs calculated for each element separately, W'_outValue of kth element of mth line (W'_out)_mkThe following were used:

wherein,

of formula (II) to (III)'_k(t) denotes the kth element, X 'of the kth line of Y'_j(t) denotes X' line jth element;

is zero.

Finally using matrix W'_outAnd the output weight W_outCalculating the output weight W according to the relationship_out. In this embodiment, the method further includes performing adaptive optimization on the output weight obtained in step U403, where the optimization is U404:

u404: converting the loss function, and calculating the weight W' by adopting a coordinate descent algorithm_outThen, the weight W ″, is utilized_outCalculating an optimized output weight;

the loss function after conversion is:

weight W_outAnd the output weight W_outThe relationship of (1) is:

wherein,

n is the number of nodes of the storage layer, and K is the number of nodes of the output layer.

In order to verify the reliability of the method, the method creatively replaces thermal data identification with thermal data prediction, and improves the accuracy rate of thermal data discrimination. We used four actual workloads for objective evaluation. Finaminial 1 is a write intensive trace file, MSR is a common workload for large enterprise servers, Distilled stands for typical usage patterns of personal computers, and finally MillSSD is collected from an industrial automated optical inspection instrument with Runcore RCS hardware configuration-V-T25 SSD (512GB, SATA2), Intel X27400 and 2G DDR 3. MillSSD is also a write-intensive trace file because it has the effect of a substantial image backup. The results of the performance comparison of this example are shown in FIG. 5. From the test results, it can be seen that: on WDAC basis, the HOESN thermal ratio curve almost overlaps WDAC in most cases. This major trend can be clearly seen at all four workloads, especially for more write-intensive MSRs and milssd. It is clear that at four workloads, our HOESN has the lowest FIR followed by DL-MBF _ s. Although MBF experiences a relatively high FIR, it is still a good HDI scheme for SSDs, where WDAC is proposed, which becomes the classical benchmark for the following studies. Notably, of the four workloads, the degree of improvement in HOESN was most impressive for MillSSD (from 4.08% to 2.23%). These preliminary tests also demonstrate our original idea that understanding the access behavior of hot data of NAND flash can be considered a time-series prediction, for which hoe, was proposed. The results show that our prediction method can get a good idea of the access behavior of the disk workload, which is the basic premise for reliable service for GC and WL.

It should be emphasized that the examples described herein are illustrative and not restrictive, and thus the invention is not to be limited to the examples described herein, but rather to other embodiments that may be devised by those skilled in the art based on the teachings herein, and that various modifications, alterations, and substitutions are possible without departing from the spirit and scope of the present invention.

Claims

1. A thermal data prediction method based on a joint optimization echo state network is characterized in that: the method comprises the following steps:

updating the positions of the particles by adopting a quantum particle swarm algorithm based on the position range of each particle, wherein L is utilized in each updating process₂+ adaptive L_1/2Outputting a weight value and calculating a global optimal adaptation value in the regularized and constrained echo state network calculation process, and taking a particle position corresponding to the global optimal adaptation value as an optimal storage layer parameter when iteration is finished;

y＝x*W_out(1)

where y represents the predicted logical block takenAddress, predicting the data at the logic block address as thermal data, x as the logic block address where the input historical thermal data is located, W_out(1)And representing the final output weight, wherein the logical block address where the historical thermal data is located is used for the echo state network training process in the step S2 and the step S3.

2. The method of claim 1, wherein: the iterative optimization for determining the optimal storage layer parameters in step S2 is performed as follows:

3. The method of claim 2, wherein: the position of any particle j is updated according to the following formula:

wherein,

and u_jAre all random numbers between (0,1), sbest_j、sbest_iThe individual optimal parameters of the jth and ith particles are shown, the mbest is the average value of the current individual optimal parameters of all the particles, iter and iter_maxRespectively the current iteration number and the maximum iteration number, omega_max、ω_minMaximum and minimum inertia factors, respectively, N is the total number of particles, gbest represents the global optimum parameter, and β represents the step size parameter of the particle movement.

4. The method of claim 2, wherein: the calculation formula of the adaptive value of any particle j is as follows:

outputting a weight value for the process corresponding to the current particle j; y represents the rear section of the logical block address where the historical thermal data used for network training is located, X represents the state information of the storage layer updated based on the front section of the logical block address where the historical thermal data used for network training is located,

representing a process output weight corresponding to a current particle j

5. The method of claim 1, wherein: by using L₂+ adaptive L_1/2The process of calculating the final output weight or the process output weight by the regularized constrained echo state network is as follows:

X(t)＝logsig(U(t)W_in+X(t-1)W_x)

in the formula, E represents a loss function, λ₁、λ₂Are all regularization coefficients, W_outRepresents the output weight, X W_outRepresentation is based on output weight W_outPredicting results corresponding to the rear section of the logic block address where the historical thermal data is located;

wherein, if the process output weight in step S2 is calculated, the weight W is output_outEquals to the process output weight; if the final output weight is calculated in step S3, the weight W is output_outEqual to the final output weight.

6. The method of claim 5, wherein: the process of step U403 is: simplifying the loss function, and calculating an output weight by adopting a coordinate descent algorithm;

wherein the simplified loss function is represented as:

there are:

wherein I is an identity matrix;

solving matrix W'_outIs to calculate separately for each element thereof, a matrix W'_outValue of kth element of mth line (W'_out)_mkThe following were used:

wherein,

of formula (II) to (III)'_k(t) denotes the kth element, X 'of the kth line of Y'_j(t)、X′_m(t) denotes the t-th element of the jth line of X 'and the t-th element of the mth line of X';

7. The method of claim 6, wherein: the method also comprises the step of carrying out self-adaptive optimization on the output weight value obtained in the step U403, wherein the optimization process is as follows:

converting the loss function, and calculating the weight W' by adopting a coordinate descent algorithm_outThen calculating the optimized output weight;

the loss function after conversion is:

weight W_outAnd the output weight W_outThe relationship of (1) is:

wherein,

k is the number of nodes of the input layer.

8. The method of claim 1, wherein: the storage layer parameters in the echo state network comprise four key parameters of an internal connection spectrum radius, a storage layer scale, an input layer proportion coefficient and a storage layer sparsity.

9. The method of claim 1, wherein: the parameters required for initializing the quantum-behaved particle swarm algorithm in step S1 include the total number of particles N and the maximum iteration number iter_maxAnd a maximum inertia factor omega_maxAnd minimum inertia factor omega_min。

10. The method of claim 1, wherein: when the particle position is updated, if the particle moving distance exceeds the position range corresponding to the particle, the particle position parameter is set to exceed the boundary value corresponding to the position range.