CN114297099A - Data cache optimization method and device, nonvolatile storage medium and electronic equipment - Google Patents

Data cache optimization method and device, nonvolatile storage medium and electronic equipment

Info

Publication number
CN114297099A
Authority
CN
China
Prior art keywords
data
hot spot
hotspot
caching
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111649851.5A
Other languages
Chinese (zh)
Inventor
张海玉
乌兰哈达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202111649851.5A priority Critical patent/CN114297099A/en
Publication of CN114297099A publication Critical patent/CN114297099A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data cache optimization method and apparatus, a non-volatile storage medium, and an electronic device. The method includes the following steps: acquiring target object request data; inputting the target object request data into a pre-trained neural network model to obtain predicted hotspot data; determining target hotspot data according to the predicted hotspot data and predetermined hotspot data features; and pre-caching the target hotspot data to optimize data caching. The method and device solve the technical problem of a low data reading speed caused by a low cache hit rate.

Description

Data cache optimization method and device, nonvolatile storage medium and electronic equipment
Technical Field
The present application relates to the field of data caching, and in particular, to a data caching optimization method and apparatus, a non-volatile storage medium, and an electronic device.
Background
With the development and popularization of Internet applications and big data, the volume of data accessed per unit time grows stage by stage, and reading and writing data becomes the biggest bottleneck: heavy access traffic increases the load on the database and worsens its response, ultimately delaying website display.
At present, memory-based object caching systems are deployed and applied in data centers on an ever larger scale; the memory caches in common use are chiefly the open-source Memcached (a distributed cache system) and Redis (Remote Dictionary Server). Both caches return data quickly and avoid the growth in response time caused by frequently querying the database. However, the traditional cache-based data loading approach has the following problems.
First, the cache hit rate is not high. A cache usually stores data as key-value pairs and assigns each entry a passive expiry time; when that time is reached, the data is invalidated automatically. Passive expiry clearly cannot satisfy high-frequency, short-life-cycle caching needs, and as the service and the request data change, the cache contents easily fall out of sync with demand, so a large share of reads still go to the database and the cache cannot play its true role.
Second, frequent data access degrades server performance. In a high-volume, high-concurrency system, disorderly use of the cache service rapidly exhausts cache resources: a large amount of low-hit-rate data accumulates and backlogs in the cache, the resource consumption of the cache server surges, and the input and output of the back-end database become too frequent. Heavy network input and output is also highly likely to trigger the cache "bottomless pit" problem (when the cache system performs poorly, adding nodes still brings no improvement): a single batch operation may involve multiple network operations, that is, batch operations keep consuming more time as the number of instances increases.
If a large number of requests penetrate the cache at the same moment, they all query the database; the CPU and memory load of the database then becomes excessive and may even bring the database down (the cache avalanche phenomenon).
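The passive-expiry (TTL) caching behavior described above can be sketched in a few lines. This is an illustrative toy cache, not part of the claimed method; the class, key names, and TTL values are invented for the example:

```python
import time

class TTLCache:
    """Toy key-value cache with passive expiry: an entry is only
    invalidated when a read finds it expired, as described above."""
    def __init__(self):
        self._store = {}  # key -> (value, expiry time)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # miss: caller falls back to the database
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # passive failure, triggered on read
            return None
        return value

cache = TTLCache()
cache.set("user:42", {"name": "alice"}, ttl_seconds=0.05)
hit = cache.get("user:42")   # still cached
time.sleep(0.1)
miss = cache.get("user:42")  # expired: would force a database read
```

A short TTL combined with fast-changing request patterns is exactly the situation where such passive expiry leaves the cache out of sync with demand.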
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the application provides a data cache optimization method and device, a nonvolatile storage medium and electronic equipment, and aims to at least solve the technical problem of low data reading speed caused by low cache hit rate.
According to an aspect of an embodiment of the present application, there is provided a data cache optimization method, including: acquiring target object request data; inputting the target object request data into a pre-trained neural network model to obtain predicted hot spot data; determining target hotspot data according to the predicted hotspot data and predetermined hotspot data characteristics; and pre-caching the target hot spot data to optimize data caching.
Optionally, the predetermined hotspot data characteristic is determined by: determining information gains of a hot spot data set and a non-hot spot data set; taking the characteristics corresponding to the data in the hot spot data set as a first characteristic set under the condition that the information gain of the hot spot data set is equal to a set first information gain threshold value; under the condition that the information gain of the non-hotspot data set is equal to a set second information gain threshold value, taking the characteristics corresponding to the data in the set as a second characteristic set; and removing the features in the first feature set, which are consistent with the features in the second feature set, to form the hot spot data features.
Optionally, the hotspot data set and the non-hotspot data set are determined by the following method, including: acquiring historical data and historical access data of a target object request; forming the hot spot data set by data with request times higher than a first set threshold in the request historical data; and determining the data with the access times lower than a second set threshold and the cache hit rate lower than a third set threshold in the historical access data as a non-hotspot data set.
Optionally, the determining an information gain of the hotspot data set includes: determining the information entropy of the hotspot data set; determining a sample entropy of the hotspot data set; and taking the difference between the information entropy of the hotspot data set and the sample entropy of the hotspot data set as the information gain of the hotspot data set.
Optionally, the pre-trained neural network model is obtained by training in the following manner: training with the request history data as sample data and the hotspot data as the sample labels of that data.
Optionally, before training the pre-trained neural network model, the method further comprises: converting the time stamps of the data in the hot spot data set and the non-hot spot data set into time stamps in a floating point number format.
Optionally, before the target hotspot data is pre-cached to optimize data caching, the method further includes: acquiring the quantity of non-hotspot data; and under the condition that the quantity of the non-hot spot data is smaller than a fourth set threshold value, pre-caching the target hot spot data so as to optimize data caching.
According to another aspect of the embodiments of the present application, there is also provided a data cache optimization apparatus, including: the acquisition module is used for acquiring target object request data; the prediction module is used for inputting the target object request data into a pre-trained neural network model to obtain predicted hot spot data; the determining module is used for determining target hot spot data according to the predicted hot spot data and the predetermined hot spot data characteristics; and the optimization module is used for pre-caching the target hot spot data so as to optimize the data cache.
According to another aspect of the embodiments of the present application, a non-volatile storage medium is further provided, where the non-volatile storage medium includes a stored program, and when the program runs, the device where the non-volatile storage medium is located is controlled to execute the data cache optimization method.
According to another aspect of the embodiments of the present application, there is also provided an electronic device including a memory and a processor. The processor is configured to run a program, wherein the program, when running, executes the data caching method.
In the embodiment of the application, the method comprises the steps of obtaining target object request data; inputting the target object request data into a pre-trained neural network model to obtain predicted hot spot data; determining target hotspot data according to the predicted hotspot data and predetermined hotspot data characteristics; and pre-caching the target hot spot data to optimize data caching. According to the method, characteristic analysis is conducted on original cache data and user request data while neural network prediction is conducted, preloading of the cache data is completed based on the characteristic analysis result and the user request hot data prediction, the purpose of improving the cache hit rate is achieved, the technical effect of preloading the hot data is achieved, and the technical problem that data reading speed is low due to the fact that the cache hit rate is low is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of an alternative data cache optimization method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an alternative data cache optimization apparatus according to an embodiment of the present application;
fig. 3 is a schematic diagram of another alternative data cache optimization apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present application, there is provided a method embodiment for data cache optimization, it should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as a set of computer executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
Fig. 1 is a data cache optimization method according to an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:
step S102, obtaining target object request data;
step S104, inputting the target object request data into a pre-trained neural network model to obtain predicted hot spot data;
step S106, determining target hotspot data according to the predicted hotspot data and predetermined hotspot data characteristics;
and step S108, pre-caching the target hot spot data to optimize data caching.
Through the steps, characteristic analysis can be performed on original cache data and user request data while prediction is performed through a neural network, preloading of the cache data is completed based on the characteristic analysis result and the prediction of the user request hot data, the purpose of improving the cache hit rate is achieved, the technical effect of preloading the hot data is achieved, and the technical problem that the data reading speed is low due to the fact that the cache hit rate is low is solved.
It should be noted that the target hotspot data is determined according to the predicted hotspot data and the predetermined hotspot data characteristic, and specifically, a data set corresponding to the hotspot data characteristic is found from the database and combined with the predicted hotspot data to form the target hotspot data.
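The combination step in the preceding note can be sketched as follows. The record layout, key names, and function name are assumptions made for illustration, not the patent's own data model:

```python
def select_target_hotspot_data(predicted, database_records, hotspot_features):
    """Union of model-predicted hot keys with database records whose
    feature sets contain every hotspot feature (hypothetical layout)."""
    feature_matched = {
        rec["key"] for rec in database_records
        if hotspot_features <= set(rec["features"])  # subset test
    }
    return set(predicted) | feature_matched

records = [
    {"key": "a", "features": ["news", "trending"]},
    {"key": "b", "features": ["archive"]},
]
# model predicted "c" as hot; "a" matches the hotspot feature set
target = select_target_hotspot_data({"c"}, records, {"trending"})
```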
In some embodiments of the present application, the predetermined hotspot data characteristic is determined by: determining information gains of a hot spot data set and a non-hot spot data set; taking the characteristics corresponding to the data in the hot spot data set as a first characteristic set under the condition that the information gain of the hot spot data set is equal to a set first information gain threshold value; under the condition that the information gain of the non-hotspot data set is equal to a set second information gain threshold value, taking the characteristics corresponding to the data in the set as a second characteristic set; and removing the features in the first feature set, which are consistent with the features in the second feature set, to form the hot spot data features.
Specifically, let the hotspot data set be S1 and the non-hotspot data set be S2. In the feature analysis, the information gain of each attribute is determined; the uniqueness of the attributes is ensured when S1 and S2 are selected, which improves classification accuracy. The larger the entropy of a set, the more uniform its sample distribution, and the smaller the corresponding information gain value. Each attribute value corresponds to an information gain value, and the attribute with the small gain value is selected as the classification node: when the information gains of S1 and S2 reach the set first and second information gain thresholds respectively, for example when the information gains of S1 and S2 are each minimal, the resulting decision tree classifies the data sets with distinct, unique features. At this point the first feature set L1 corresponding to S1 and the second feature set L2 corresponding to S2 are determined. Removing from L1 the features it shares with L2 yields the hotspot feature set L3, which can be expressed as L3 = L1 - (L1 ∩ L2).
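The set operation L3 = L1 - (L1 ∩ L2) is straightforward to express; a minimal sketch with hypothetical feature names:

```python
def hotspot_feature_set(l1, l2):
    """L3 = L1 - (L1 ∩ L2): drop from the hotspot feature set every
    feature it shares with the non-hotspot feature set."""
    l1, l2 = set(l1), set(l2)
    return l1 - (l1 & l2)

l3 = hotspot_feature_set({"f1", "f2", "f3"}, {"f2", "f4"})
```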
In some embodiments of the present application, determining a hotspot data set and a non-hotspot data set comprises: acquiring historical data and historical access data of a target object request; forming the hot spot data set by data with request times higher than a first set threshold in the request historical data; and determining the data with the access times lower than a second set threshold and the cache hit rate lower than a third set threshold in the historical access data as a non-hotspot data set.
It should be noted that the first set threshold may be a median of the number of requests, or may be other numbers, and may be adjusted according to requirements; the second set threshold may be a median number of access times; the third set threshold corresponds to that the cache hit rate can be the median of the cache hit rate; the historical access data is statistically derived from the cache.
For example: the data whose target object request count is above the median, together with the keywords the user requires, is extracted to form the hotspot data set, and the data whose cache hit rate and access count are both below the median forms the non-hotspot data set.
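The median-based split described above can be sketched as follows; the dictionary layouts are assumptions for illustration:

```python
from statistics import median

def split_datasets(request_counts, access_stats):
    """Split keys into hotspot / non-hotspot sets using medians as the
    thresholds mentioned above.
    request_counts: {key: number of user requests}
    access_stats:   {key: (access_count, cache_hit_rate)}"""
    req_median = median(request_counts.values())
    acc_median = median(c for c, _ in access_stats.values())
    hit_median = median(h for _, h in access_stats.values())
    hot = {k for k, n in request_counts.items() if n > req_median}
    cold = {k for k, (c, h) in access_stats.items()
            if c < acc_median and h < hit_median}
    return hot, cold

hot, cold = split_datasets(
    {"a": 10, "b": 2, "c": 7},
    {"a": (50, 0.9), "b": (3, 0.1), "c": (20, 0.5)},
)
```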
In some embodiments of the present application, the information gain of the hotspot data set may be determined by the following steps: determining the information entropy of the hotspot data set; determining the sample entropy of the hotspot data set; and taking the difference between the information entropy and the sample entropy as the information gain. Taking the hotspot data set S1 and the non-hotspot data set S2 as an example, the entropy of set S1 is

Info(S1) = -Σ_i p_i · log2(p_i)

where p_i is the proportion of elements of class i in set S1, i = 0, 1, .... Likewise, the entropy of set S2 is

Info(S2) = -Σ_g p_g · log2(p_g)

where p_g is the proportion of elements of class g in set S2, g = 0, 1, .... The sample entropy, for a partition of the set by an attribute, is

Info_A(S_a) = Σ_j (|S_j| / |S_a|) · Info(S_j)

where S_a is S1 when a = 1 and S2 when a = 2, and S_j is the j-th subset of S_a under the partition. The information gain is then the difference Info(S_a) - Info_A(S_a).
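The entropy and information-gain computation described above follows the standard ID3-style definitions; a minimal sketch, with the label and feature encodings assumed for illustration:

```python
import math

def entropy(labels):
    """Info(S) = -Σ p_i log2(p_i) over the class proportions of S."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(labels, feature_values):
    """Gain = Info(S) - Info_A(S): entropy minus the weighted entropy
    of the subsets obtained by partitioning S on attribute A."""
    n = len(labels)
    weighted = 0.0
    for v in set(feature_values):
        subset = [l for l, f in zip(labels, feature_values) if f == v]
        weighted += len(subset) / n * entropy(subset)
    return entropy(labels) - weighted

labels = ["hot", "hot", "cold", "cold"]
feature = ["x", "x", "y", "y"]  # perfectly separates the classes
gain = information_gain(labels, feature)
```

Because the feature separates the two classes perfectly here, the gain equals the full entropy of the label set.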
In some embodiments of the present application, the pre-trained neural network model is obtained by training in the following manner: training with the request history data as sample data and the hotspot data as the sample labels of that data.
Specifically, the number of training time windows and their sizes are determined first: take the windows Δt1, Δt2, ..., Δtm (m ≥ 1). At least one window is required, and the window size increases with the index m. An artificial neural network is constructed for each time window, with the following characteristics: it comprises an input layer, at least one hidden layer, and an output layer; each hidden layer contains at least one node; a non-linear activation function is used; and the loss function contains a regularization term whose value is related to the size of the time window. To prevent over-fitting in data prediction, a penalty term is added to the loss function to reduce the number of non-zero parameters; an L1 (first-norm) regularization term is used to sparsify the data:

loss_new(w, b) = loss(w, b) + λ · Σ_i |w_i|

where loss_new(w, b) is the loss function with the regularization term, loss(w, b) is the original loss function, and λ · Σ_i |w_i| is the regularization term, in which Σ_i |w_i| is the sum of the absolute values of the weights and λ is the adjustment parameter of the regularization term.

The number of user requests (the data access volume) per fixed time period is taken as the training set, with the ratio of adjacent time periods satisfying a constant α (α ≥ 1). For medium- and long-term prediction, the ratio of each time window to Δt1 is used to construct the regularization adjustment coefficient λ, where λ and r satisfy:

λ_i = r_i · (Δt_i / Δt_1)

where i is the index over the different time windows and r_i is a constant. A λ that follows this formula grows with the time window, which strengthens the suppression of noise and short-term violent fluctuation by the medium- and long-term windows during training, improves the prediction accuracy of the neural network model, and prevents over-fitting during training.

Training ends when loss_new(w, b) converges to its minimum value (its change approaches 0), or when the neural network reaches the maximum number of training iterations.
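The L1-regularized loss with a window-scaled coefficient can be sketched as follows. The scaling λ_i = r_i · Δt_i/Δt_1 is one plausible reading of the scheme above, and the constant r is an assumed value:

```python
def l1_regularized_loss(weights, raw_loss, window_ratio, r=0.01):
    """loss_new(w, b) = loss(w, b) + λ · Σ|w_i|, with λ scaled by the
    time-window ratio Δt_i / Δt_1 (hypothetical scaling, r assumed)."""
    lam = r * window_ratio  # larger time window -> stronger penalty
    return raw_loss + lam * sum(abs(w) for w in weights)

# a window 4x the base window penalizes the same weights more heavily
loss_short = l1_regularized_loss([0.5, -1.0, 0.0], raw_loss=2.0, window_ratio=1.0)
loss_long = l1_regularized_loss([0.5, -1.0, 0.0], raw_loss=2.0, window_ratio=4.0)
```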
When the request history data serves as the sample data and the hotspot data serves as its sample labels, request history data from multiple time periods of different durations is acquired, so that hotspot data over different time spans are obtained and training accuracy is improved.
User request trends over different time horizons are predicted through multiple time windows, and the windows are trained and used for prediction in parallel. Medium- and long-term prediction accuracy is improved by introducing a regularization term into the neural network, which prevents over-fitting of the data.
In some embodiments of the present application, before obtaining the information gain of each data in the hotspot data set and the non-hotspot data set, the method further includes: converting the time stamps of the data in the hot spot data set and the non-hot spot data set into time stamps in a floating point number format.
Specifically, the timestamps of the data are converted: time items in different formats are converted into timestamps in floating-point format (unit: second), computed as

t = (t' - t_start) / (t_end - t_start)

where t is the converted timestamp, t' is the timestamp corresponding to the time item, t_start is the start timestamp of the current data acquisition window, t_end is the end timestamp of the current data acquisition window, and Δt = t_end - t_start denotes the time window, i.e., the acquisition period of the user request data.
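Under the assumption that the conversion expresses each time item as a floating-point position within the acquisition window Δt = t_end - t_start (the original formula image is not reproduced here), a sketch:

```python
def to_relative_timestamp(t_item, t_start, t_end):
    """Convert a time item to a floating-point timestamp relative to
    the acquisition window [t_start, t_end] (assumed interpretation)."""
    dt = t_end - t_start  # Δt: the acquisition period of the window
    return (t_item - t_start) / dt

# an event 30 s into a 60 s acquisition window
ts = to_relative_timestamp(t_item=1_640_995_230.0,
                           t_start=1_640_995_200.0,
                           t_end=1_640_995_260.0)
```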
In some embodiments of the present application, before pre-caching the target hotspot data to optimize data caching, the method further comprises: acquiring the quantity of non-hotspot data; and under the condition that the quantity of the non-hot spot data is smaller than a fourth set threshold, pre-caching the target hot spot data so as to optimize data caching.
By adding a set threshold as the termination point of system operation, the manager stops optimizing the cache data when the quantity of non-hotspot data falls below this threshold, preventing continuous optimization of the cache data from lowering the hit rate.
In some embodiments of the present application, a data cache optimization apparatus is further provided, as shown in fig. 2, including:
a cache data processor: used to extract the data whose cache hit rate and access count fall below the median, forming a data set;
a request data processor: used to record user API (application programming interface) requests and parameters, merge and count request items of the same class, write the counts into the counter, take out the data whose request count in the counter exceeds the median, and extract the keywords the user requires, forming the user request data set;
a counter: the cache system is used for recording the access times and cache hit rate of each data in the cache data and recording the request times of requesting the data by a user;
a manager: used to periodically poll the cache and the counter and delete the non-hotspot data set from the cache to free space. The manager sets thresholds: when the access counts and cache hit rates of the data in the non-hotspot data set are both greater than the set access-count threshold and cache-hit-rate threshold, the system stops optimizing the cache data; cache data optimization runs only when the access counts and hit rates in the non-hotspot data set are both below those thresholds;
a feature analysis module: used to perform feature analysis on the hotspot data set and the non-hotspot data set to form a first feature set and a second feature set, and to remove the second feature set from the first, obtaining the hotspot feature set that serves as the final data-preloading feature set;
a neural prediction module: for predicting hot spot data through a neural network;
a hotspot loading module: used to load the data in the database that matches the hotspot feature set, together with the rising hotspot data predicted by the neural network, into the cache.
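The counter component above, which tracks per-key access counts and hit rates for the manager to poll, might look like this minimal sketch; the class and method names are invented:

```python
from collections import Counter

class AccessCounter:
    """Sketch of the counter: records per-key accesses and cache hits
    so the manager can compute hit rates when it polls."""
    def __init__(self):
        self.accesses = Counter()
        self.hits = Counter()

    def record(self, key, hit):
        self.accesses[key] += 1
        if hit:
            self.hits[key] += 1

    def hit_rate(self, key):
        total = self.accesses[key]
        return self.hits[key] / total if total else 0.0

counter = AccessCounter()
counter.record("k", hit=True)
counter.record("k", hit=False)
rate = counter.hit_rate("k")
```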
An embodiment of the present application further provides a data cache optimization apparatus, as shown in fig. 3, including: an obtaining module 30, configured to obtain target object request data; the prediction module 32 is configured to input the target object request data into a pre-trained neural network model to obtain predicted hot spot data; a determining module 34, configured to determine target hotspot data according to the predicted hotspot data and a predetermined hotspot data characteristic; and the optimization module 36 is configured to pre-cache the target hotspot data to optimize data caching.
The prediction module 32 includes: a model submodule; the model submodule is used for training by taking the request historical data as sample data and the hot data as a sample label of the sample data to obtain the pre-trained neural network model.
The determination module 34 includes: a hotspot data characteristic determining submodule and an information gain determining submodule; the hot spot data characteristic determining submodule is used for taking the characteristic corresponding to the data in the hot spot data set as a first characteristic set under the condition that the information gain of the hot spot data set is equal to a set first information gain threshold; under the condition that the information gain of the non-hotspot data set is equal to a set second information gain threshold value, taking the characteristics corresponding to the data in the set as a second characteristic set; and removing the features in the first feature set, which are consistent with the features in the second feature set, to form the hot spot data features.
The information gain determining submodule is used for determining the information entropy of the hotspot data set; determining a sample entropy of the hotspot data set; and taking the difference between the information entropy of the hotspot data set and the sample entropy of the hotspot data set as the information gain of the hotspot data set.
The information gain determining submodule comprises a timestamp conversion unit; the time stamp converting unit is used for converting the time stamps of the data in the hot spot data set and the non-hot spot data set into the time stamps in a floating point number format.
The hot spot data characteristic determining submodule comprises: the data determining unit is used for acquiring the historical data and the historical access data of the target object request; forming the hot spot data set by data with request times higher than a first set threshold in the request historical data; and determining the data with the access times lower than a second set threshold and the cache hit rate lower than a third set threshold in the historical access data as a non-hotspot data set.
The optimization module 36 includes: a judgment submodule; the judgment submodule is used for acquiring the quantity of the non-hotspot data; and under the condition that the quantity of the non-hot spot data is smaller than a fourth set threshold, pre-caching the target hot spot data so as to optimize data caching.
According to another aspect of the embodiments of the present application, a non-volatile storage medium is further provided, where the non-volatile storage medium includes a stored program, and when the program runs, the device where the non-volatile storage medium is located is controlled to execute the data cache optimization method.
The nonvolatile storage medium stores a program for: acquiring target object request data; inputting the target object request data into a pre-trained neural network model to obtain predicted hot spot data; determining target hotspot data according to the predicted hotspot data and predetermined hotspot data characteristics; and pre-caching the target hot spot data to optimize data caching.
According to another aspect of the embodiments of the present application, there is also provided an electronic device including a memory and a processor. The processor is configured to run a program, wherein the program, when running, executes the data caching method.
The electronic device is used for storing and executing the following programs: acquiring target object request data; inputting the target object request data into a pre-trained neural network model to obtain predicted hot spot data; determining target hotspot data according to the predicted hotspot data and predetermined hotspot data characteristics; and pre-caching the target hot spot data to optimize data caching.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present application that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (10)

1. A data cache optimization method is characterized by comprising the following steps:
acquiring target object request data;
inputting the target object request data into a pre-trained neural network model to obtain predicted hot spot data;
determining target hotspot data according to the predicted hotspot data and predetermined hotspot data characteristics;
and pre-caching the target hot spot data to optimize data caching.
2. The method of claim 1, wherein the predetermined hotspot data characteristic is determined by:
determining information gains of a hot spot data set and a non-hot spot data set;
taking the features corresponding to the data in the hot spot data set as a first feature set under the condition that the information gain of the hot spot data set is equal to a set first information gain threshold;
under the condition that the information gain of the non-hot spot data set is equal to a set second information gain threshold, taking the features corresponding to the data in the non-hot spot data set as a second feature set;
and removing the features in the first feature set, which are consistent with the features in the second feature set, to form the hot spot data features.
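The feature derivation in claim 2 can be sketched as follows. All names here are illustrative assumptions rather than the patent's own, and `>=` stands in for the claim's "equal to" comparison against each information gain threshold; the final set difference removes features shared with the non-hot spot data.

```python
def derive_hotspot_features(hot_features, non_hot_features,
                            hot_gain, non_hot_gain,
                            first_threshold, second_threshold):
    # First feature set: features of the hot spot data, kept only when
    # that set's information gain reaches the first threshold.
    first_set = set(hot_features) if hot_gain >= first_threshold else set()
    # Second feature set: likewise for the non-hot spot data.
    second_set = set(non_hot_features) if non_hot_gain >= second_threshold else set()
    # Remove features that also describe non-hot spot data.
    return first_set - second_set
```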
3. The method of claim 2, wherein the hot spot data set and the non-hot spot data set are determined by:
acquiring request historical data and historical access data of a target object;
forming the hot spot data set by data with request times higher than a first set threshold in the request historical data;
and determining the data with the access times lower than a second set threshold and the cache hit rate lower than a third set threshold in the historical access data as a non-hotspot data set.
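A minimal sketch of the set construction in claim 3, under assumed data shapes: `request_history` maps each data item to its request count, and `access_history` maps each item to a tuple of (access count, cache hit rate). The function and variable names are illustrative, not from the patent.

```python
def build_data_sets(request_history, access_history,
                    first_threshold, second_threshold, third_threshold):
    # Hot spot data: requested more often than the first set threshold.
    hot_set = {k for k, count in request_history.items()
               if count > first_threshold}
    # Non-hot spot data: rarely accessed AND rarely hit in cache.
    non_hot_set = {k for k, (count, hit_rate) in access_history.items()
                   if count < second_threshold and hit_rate < third_threshold}
    return hot_set, non_hot_set

hot, non_hot = build_data_sets(
    {"a": 120, "b": 3},
    {"b": (3, 0.1), "c": (90, 0.8)},
    first_threshold=100, second_threshold=10, third_threshold=0.2)
```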
4. The method of claim 2, wherein determining an information gain for a hotspot data set comprises:
determining the information entropy of the hotspot data set;
determining a sample entropy of the hotspot data set;
and taking the difference between the information entropy of the hotspot data set and the sample entropy of the hotspot data set as the information gain of the hotspot data set.
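Claim 4 defines the information gain as the difference between the set's information entropy and its sample entropy. A sketch of the arithmetic, using the standard Shannon entropy; how the patent computes the two entropies from its data is not specified, so the label-sequence form here is an assumption for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy, in bits, of a sequence of labels.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(set_entropy, sample_entropy):
    # Claim 4: gain = information entropy minus sample entropy.
    return set_entropy - sample_entropy
```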
5. The method of claim 3, wherein the pre-trained neural network model is trained by:
training with the request historical data as sample data and the hot spot data as sample labels of the sample data, to obtain the pre-trained neural network model.
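The labelling step implied by claim 5 can be sketched as follows: each item in the request history becomes a training sample, labelled 1 if it belongs to the hot spot data set and 0 otherwise. The names are assumptions for illustration, not the patented code, and the actual model fitting is omitted.

```python
def build_training_set(request_history, hot_spot_set):
    # Samples: items from the request history (sorted for determinism).
    samples = sorted(request_history)
    # Labels: membership in the hot spot data set, as a binary target.
    labels = [1 if sample in hot_spot_set else 0 for sample in samples]
    return samples, labels
```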
6. The method of claim 4, wherein prior to training the pre-trained neural network model, the method further comprises:
converting the time stamps of the data in the hot spot data set and the non-hot spot data set into time stamps in a floating point number format.
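The timestamp conversion of claim 6 amounts to turning a timestamp into a single floating-point number the network can consume. A sketch under the assumption that timestamps arrive as ISO-format strings and that naive timestamps are UTC; both assumptions are illustrative, not from the patent.

```python
from datetime import datetime, timezone

def timestamp_to_float(ts: str) -> float:
    # Parse an ISO-format timestamp and return it as a floating-point
    # Unix time. UTC is assumed when the string carries no timezone.
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()
```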
7. The method of claim 2, wherein prior to pre-caching the target hotspot data for optimization of data caching, the method further comprises:
acquiring the quantity of non-hotspot data;
and under the condition that the quantity of the non-hot spot data is smaller than a fourth set threshold value, pre-caching the target hot spot data so as to optimize data caching.
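The guard in claim 7 can be sketched as a simple precondition: pre-cache the target hot spot data only when the amount of non-hot spot data is below the fourth set threshold. All names here are illustrative assumptions.

```python
def maybe_precache(cache, target_hotspots, non_hot_count, fourth_threshold):
    # Skip pre-caching when too much non-hot spot data is present.
    if non_hot_count >= fourth_threshold:
        return False
    cache.update(target_hotspots)
    return True

cache = {}
done = maybe_precache(cache, {"k1": "v1"}, non_hot_count=2, fourth_threshold=5)
```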
8. A data cache optimization apparatus, comprising:
the acquisition module is used for acquiring target object request data;
the prediction module is used for inputting the target object request data into a pre-trained neural network model to obtain predicted hot spot data;
the determining module is used for determining target hot spot data according to the predicted hot spot data and the predetermined hot spot data characteristics;
and the optimization module is used for pre-caching the target hot spot data so as to optimize the data cache.
9. A non-volatile storage medium, comprising a stored program, wherein when the program runs, a device in which the non-volatile storage medium is located is controlled to execute the data cache optimization method according to any one of claims 1 to 7.
10. An electronic device, characterized by comprising a memory and a processor; the processor is configured to run a program, wherein the program, when running, executes the data cache optimization method according to any one of claims 1 to 7.
CN202111649851.5A 2021-12-29 2021-12-29 Data cache optimization method and device, nonvolatile storage medium and electronic equipment Pending CN114297099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111649851.5A CN114297099A (en) 2021-12-29 2021-12-29 Data cache optimization method and device, nonvolatile storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114297099A true CN114297099A (en) 2022-04-08

Family

ID=80974222

Country Status (1)

Country Link
CN (1) CN114297099A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115017435A (en) * 2022-06-28 2022-09-06 中国电信股份有限公司 Method and device for determining cache resources, nonvolatile storage medium and processor

Similar Documents

Publication Publication Date Title
WO2016161976A1 (en) Method and device for selecting data content to be pushed to terminals
US8909644B2 (en) Real-time adaptive binning
US9043660B2 (en) Data store capable of efficient storing of keys
CN110019794B (en) Text resource classification method and device, storage medium and electronic device
CN110968272B (en) Time sequence prediction-based method and system for optimizing storage performance of mass small files
CN108846021B (en) Mass small file storage method based on user access preference model
JP6642650B2 (en) Method for writing a plurality of small files of 2 MB or less to HDFS including a data merge module and an HBase cache module based on Hadoop
CN110717093B (en) Movie recommendation system and method based on Spark
CN110290199B (en) Content pushing method, device and equipment
US11966827B2 (en) Data management forecasting from distributed tracing
CN112580817A (en) Managing machine learning features
CN113609374A (en) Data processing method, device and equipment based on content push and storage medium
CN111491175B (en) Edge network caching method and device based on video content characteristics
CN114297099A (en) Data cache optimization method and device, nonvolatile storage medium and electronic equipment
US7895247B2 (en) Tracking space usage in a database
CN108647266A (en) A kind of isomeric data is quickly distributed storage, exchange method
CN116886619A (en) Load balancing method and device based on linear regression algorithm
WO2020037511A1 (en) Data storage and acquisition method and device
CN115964568A (en) Personalized recommendation method based on edge cache
US11775488B2 (en) Data access and recommendation system
CN108932288B (en) Hadoop-based mass small file caching method
CN103595747A (en) User-information recommending method and system
CN114116827A (en) Query system and method for user portrait data
US20230214676A1 (en) Prediction model training method, information prediction method and corresponding device
CN116662673A (en) User preference data analysis method based on data monitoring

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination