CN117971503B - Data caching method and system based on edge calculation - Google Patents

Data caching method and system based on edge calculation

Info

Publication number
CN117971503B
Authority
CN
China
Prior art keywords
data
user
cached
cache
edge
Prior art date
Legal status
Active
Application number
CN202410372649.XA
Other languages
Chinese (zh)
Other versions
CN117971503A (en)
Inventor
许磊
许洁
毛骜鹏
Current Assignee
Hangzhou Yuanshi Technology Co ltd
Original Assignee
Hangzhou Yuanshi Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Yuanshi Technology Co ltd filed Critical Hangzhou Yuanshi Technology Co ltd
Priority to CN202410372649.XA
Publication of CN117971503A
Application granted
Publication of CN117971503B
Active legal-status Current
Anticipated expiration legal-status

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data caching, in particular to a data caching method and system based on edge calculation, wherein the method comprises the following steps: obtaining cache data information of user edge equipment; acquiring the vocabulary in each piece of cache data; analyzing the occurrence frequency of the vocabulary in the cache data and the importance degree of the vocabulary, constructing a keyword characteristic value and acquiring keywords; constructing a user behavior retrieval depth from the degree to which the keywords of the cache data are accessed over a plurality of access times; acquiring a cold and hot attribute adjustment coefficient from the association characteristics of the data to be cached and the cached data; constructing a cold and hot coefficient; and selecting edge equipment for the data to be cached based on the cold and hot coefficients of different users, thereby completing data caching based on edge calculation. The invention aims to predict the data to be cached for a user according to the user's historical access patterns and behavior, so as to respond to user requests more quickly and reduce data access delay.

Description

Data caching method and system based on edge calculation
Technical Field
The invention relates to the technical field of data caching, in particular to a data caching method and system based on edge calculation.
Background
A data cache refers to a high-speed memory, for example in a hard disk, used to store temporarily unused data so that it can be read quickly later. Data caching based on edge calculation deploys the cache on edge equipment and stores data at a position closer to the user, reducing data transmission delay; data access can thus be accelerated and the user experience improved. However, if data that the user rarely uses is cached on the edge device, the user's read/write burden is invisibly increased, the content transmission speed is reduced and network congestion grows, which greatly degrades the user experience.
The traditional LFU algorithm implements a cache data elimination mechanism according to the access frequency of data, but it suits application scenarios in which the access frequency is relatively fixed; compared with the relatively fixed access patterns of enterprises and similar services, its cache hit rate for individual users is lower. The traditional LRU algorithm adds a temporal dimension to the eviction decision, but it requires recording an access timestamp for each data item, which increases the computational complexity.
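For reference, a minimal sketch of the two classical eviction policies mentioned above is given below, assuming a fixed-capacity in-memory cache; the class and method names are illustrative and are not part of the invention.

```python
from collections import OrderedDict, Counter

class LRUCache:
    """Least Recently Used: evict the entry whose last access is oldest."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry

class LFUCache:
    """Least Frequently Used: evict the entry with the lowest access count."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}
        self.freq = Counter()

    def get(self, key):
        if key not in self.items:
            return None
        self.freq[key] += 1
        return self.items[key]

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            victim, _ = min(self.freq.items(), key=lambda kv: kv[1])
            del self.items[victim], self.freq[victim]
        self.items[key] = value
        self.freq[key] += 1
```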
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a data caching method and system based on edge calculation, and the adopted technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a data caching method based on edge computation, where the method includes the following steps:
obtaining cache data information of user edge equipment in a historical time period, wherein the cache data information comprises a cache path vector, cache delay data and cache data;
Processing the cache data of the user edge equipment in the historical time period to obtain vocabulary in each cache data; encoding each vocabulary in the cache data to obtain word vectors of each vocabulary; clustering each vocabulary based on the word vector to obtain each cluster; acquiring key word characteristic values of each cluster according to the distribution of elements in each cluster and the importance degree of each vocabulary in the cache data; acquiring keywords of each cache data in a historical time period according to the keyword characteristic values of each cluster; according to the difference degree of the keywords of each cache data in the historical time period, combining the cache path vector of the user edge device and the corresponding path cache delay data to obtain the user behavior retrieval depth of the user in the historical time period; acquiring a cold and hot attribute adjustment coefficient of the data to be cached at the current moment according to the access frequency and the access time delay of the data to be cached at the current moment; acquiring the cold and hot coefficients of the current data to be cached according to the user behavior retrieval depth of the user in the historical time period and the cold and hot attribute adjustment coefficient of the data to be cached at the current moment;
And selecting the optimal edge equipment to finish the caching of the data based on the cold and hot coefficients of different users according to the data to be cached.
Preferably, the processing the cached data of the user edge device in the history period to obtain the vocabulary in each cached data includes:
For each cache data in the historical time period, adopting ASCII code conversion, and adopting a bidirectional maximum matching method to acquire the vocabulary in each cache data.
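For illustration, a minimal sketch of the bidirectional maximum matching method follows; the dictionary, maximum word length and tie-breaking rule are assumptions made for the sketch and are not specified by the patent. The forward and backward scans are compared and the segmentation with fewer words (or, on a tie, fewer single characters) is kept, which is the usual heuristic for this method.

```python
def max_match(text, vocab, max_len=4, reverse=False):
    """Greedy dictionary segmentation scanning forward or backward."""
    words, s = [], text[::-1] if reverse else text
    i = 0
    while i < len(s):
        for size in range(min(max_len, len(s) - i), 0, -1):
            chunk = s[i:i + size]
            cand = chunk[::-1] if reverse else chunk
            if size == 1 or cand in vocab:
                words.append(cand)
                i += size
                break
    return words[::-1] if reverse else words

def bidirectional_max_match(text, vocab):
    """Keep whichever scan yields fewer words; prefer fewer single characters on a tie."""
    fwd = max_match(text, vocab, reverse=False)
    bwd = max_match(text, vocab, reverse=True)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    singles = lambda ws: sum(len(w) == 1 for w in ws)
    return fwd if singles(fwd) <= singles(bwd) else bwd
```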
Preferably, the clustering of the vocabularies based on the word vectors is performed to obtain clustering clusters, specifically:
And taking the Hamming distance between word vectors corresponding to the words as a clustering distance, taking all words in each cache data as the input of a clustering algorithm, and outputting each clustering cluster.
Preferably, the obtaining the keyword feature value of each cluster according to the distribution of the elements in each cluster and the importance degree of each vocabulary in the cache data includes:
acquiring the lengths of the longest identical substrings of word vectors corresponding to all words in each cluster; acquiring TF-IDF values of each vocabulary, and calculating the sum value of all the TF-IDF values in each cluster; and taking the product of the sum value, the length and the vocabulary quantity of each cluster as the keyword characteristic value of each cluster.
Preferably, the obtaining the keyword of each cache data in the history period according to the keyword feature value of each cluster includes:
And for each cache data in the history time period, using the vocabulary with the maximum TF-IDF value in the cluster corresponding to the maximum keyword characteristic value as the keyword of each cache data.
Preferably, the step of obtaining the retrieval depth of the user behavior of the user in the historical time period according to the difference degree of the keywords of each cache data in the historical time period and combining the cache path vector of the user edge device and the corresponding path cache delay data specifically includes:
acquiring Euclidean distance between word vectors of the corresponding keywords of the cache data under each adjacent access times in the historical time period; obtaining dtw distances among cache path vectors of the cache data under each adjacent access times in a historical time period; calculating the product of the Euclidean distance and the dtw distances; calculating the reciprocal of the product; and taking the sum of the reciprocal values under all adjacent access times in the historical time period as the user behavior retrieval depth of the user in the historical time period.
Preferably, the obtaining the adjustment coefficient of the cold and hot attribute of the data to be cached at the current moment according to the access frequency and the access time delay of the data to be cached at the current moment includes:
for data to be cached at the current moment, acquiring the access frequency of a user to access the data to be cached in a historical time period; acquiring the time delay of accessing the data to be cached each time in the historical time period; calculating the sum of all the time delays in the historical time period; acquiring hit rate of user access data in a historical time period;
when the hit rate is smaller than or equal to a preset hit rate threshold, calculating the ratio of the access frequency to the hit rate, and taking the product of the ratio and the sum as a cold and hot attribute adjustment coefficient of data to be cached at the current moment;
And when the hit rate is larger than a preset hit rate threshold, calculating the product of the access frequency and the sum value as a cold and hot attribute adjustment coefficient of the data to be cached at the current moment.
Preferably, the obtaining the cold and hot coefficient of the current data to be cached is specifically a product of the user behavior retrieval depth of the user in the historical time period and the cold and hot attribute adjustment coefficient of the data to be cached at the current moment.
Preferably, the selecting the best edge device to complete the data caching according to the data to be cached based on the cold and hot coefficients of different users includes:
Calculating the absolute value of the difference between the cold and hot coefficients of each user and other users based on the same data to be cached, taking the average value of the absolute values of the difference between each user and all other users as the characteristic distance of each user, and taking the reciprocal of the characteristic distance of each user as the weight of each user; acquiring a mesh topology structure of edge equipment;
Taking the average value of the cold and hot coefficients of all users as a cold and hot coefficient threshold value; storing a set formed by edge equipment corresponding to a user with a cold and heat coefficient larger than a cold and heat coefficient threshold as a user edge set; storing a set formed by all edge devices as an edge set;
the cost value expression of each edge device in the edge set is:
$$cost_i = \sum_{j=1}^{n} w_j \cdot \min\left(d_{i,j}\right)$$
In the formula, $cost_i$ represents the cost value of the ith edge device in the edge set; $\min\left(d_{i,j}\right)$ represents the minimum distance in the mesh topology between the ith edge device in the edge set and the jth edge device in the user edge set; $w_j$ represents the weight corresponding to the jth edge device in the user edge set; $n$ represents the number of elements of the user edge set; $\min(\cdot)$ represents the minimization function;
and taking the edge device with the minimum cost value in the edge set as the best cache edge device for the data to be cached.
In a second aspect, an embodiment of the present invention further provides a data caching system based on edge calculation, including a memory, a processor, and a computer program stored in the memory and running on the processor, where the processor implements the steps of any one of the methods described above when executing the computer program.
The invention has at least the following beneficial effects:
The method and the device have the advantages that the key words of the cached data under different access times are analyzed, the types of the data accessed by the user in the historical time period are mined, the historical behaviors of the user are further analyzed conveniently, the preference of the accessed data of the user is learned in real time, and therefore the data caching of each user is personalized; constructing a user behavior retrieval depth for a user in a historical time period, and describing data mining for the user to deeply access the information of the same type in the time period; constructing a cold and hot attribute adjustment coefficient, and improving the cache hit rate of data to be cached, thereby reducing the cache time delay; constructing a cost value, and finishing the selection of the best cache edge equipment of the data to be cached; by the data caching based on edge calculation, the user request can be responded more quickly, the data access time delay is reduced, the burden of a central data center is lightened, and the expandability and the stability of the whole system are improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating a data caching method based on edge computation according to an embodiment of the present invention;
Fig. 2 is a flowchart for acquiring the cold and hot coefficients of data to be cached.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description refers to the specific implementation, structure, characteristics and effects of a data caching method and system based on edge calculation according to the present invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the data caching method and system based on edge calculation provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a flowchart illustrating a data caching method based on edge computation according to an embodiment of the invention is shown, the method includes the following steps:
Step S001: and collecting cache data information of the user edge equipment in a historical time period.
The cache path vector is a vector formed by the position information of the cache path of the cache data in the edge device. The cache information of the data read over a period of time by the edge device where each user is located is obtained from the cloud; in this embodiment, all user cache data within a window of length m before the current time are obtained, m is taken as 1 hour, and an implementer can set the value according to actual conditions. For convenience of description, the window of length m before the current time is recorded as the historical time period.
So far, the cache data information of the user edge equipment is obtained.
Step S002: analyzing historical cache data information of a user, firstly obtaining keywords of the cache data, analyzing the keywords of the cache data under the continuous access times, and constructing a user behavior retrieval depth; and combining the characteristics of the data to be cached of the user at the current moment, constructing a cold and hot attribute adjustment coefficient, and obtaining the cold and hot coefficient of the current data to be cached.
According to the access mode and the behavior of the user, the data possibly needed by the user is predicted and cached on the edge device in advance. Thus, the data transmission time can be reduced, and the hit rate can be improved.
Aiming at the data condition accessed in the user history time period, the data types accessed by the user in the history time period are mined by analyzing keywords for caching the data under different access times, so that the user history behavior is further analyzed conveniently, the user access data preference is learned in real time, and therefore data caching is performed for each user individually.
In this embodiment, for each cache data of a user in the historical time period, ASCII code conversion is performed on the cache data, and the converted text data is segmented into words by the bidirectional maximum matching method. Meanwhile, in order to analyze the key information in the data, some vocabulary with no practical meaning needs to be removed; in this embodiment, the Harbin Institute of Technology (HIT) stopword list is adopted to remove vocabulary that appears in all cache data but has no practical meaning, and One-hot encoding is adopted for the remaining vocabulary to obtain the word vector of each vocabulary. The bidirectional maximum matching method, the HIT stopword list and One-hot encoding are known techniques, and this embodiment will not describe them in detail.
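A minimal sketch of this preprocessing is given below, assuming the segmented words and a stopword list are already available; the variable names are illustrative.

```python
import numpy as np

def one_hot_vectors(tokens, stopwords):
    """Drop stopwords, then map each remaining word to a one-hot word vector."""
    kept = [t for t in tokens if t not in stopwords]
    vocab = sorted(set(kept))                      # fixed ordering of the vocabulary
    index = {w: k for k, w in enumerate(vocab)}
    vectors = {}
    for w in vocab:
        v = np.zeros(len(vocab), dtype=np.uint8)
        v[index[w]] = 1
        vectors[w] = v
    return kept, vectors

# Example: tokens obtained from the word-segmentation step
tokens = ["edge", "cache", "the", "cache", "latency"]
kept, vectors = one_hot_vectors(tokens, stopwords={"the"})
```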
Meanwhile, aiming at any cache data, a clustering algorithm is adopted, and the Hamming distance between word vectors corresponding to the words is used as a clustering distance, so that each cluster comprising each word is obtained. Combining TF-IDF values of various vocabularies in the cache data to obtain a keyword characteristic value of any cache data of a user in a historical time period, wherein the expression is as follows:
$$G_i = N_i \cdot L_i \cdot \sum_{j=1}^{N_i} T_{i,j}$$
In the formula, $G_i$ represents the keyword feature value of the ith cluster; $N_i$ represents the number of words in the ith cluster; $L_i$ represents the length of the longest identical substring among the word vectors corresponding to all the vocabulary in the ith cluster; $T_{i,j}$ represents the TF-IDF value of the jth vocabulary in the ith cluster. It should be noted that the calculation of the TF-IDF (term frequency-inverse document frequency) value is prior art and will not be described in detail in this embodiment.
When the number of words in a cluster is larger and the longest identical substring among the word vectors of the words in the cluster is longer, the words in the cluster are more similar; when the sum of TF-IDF values in the cache data is larger, the probability that the cluster contains high-frequency words is greater and the cluster better matches the characteristics of a keyword. Therefore, the keyword feature value of the cluster is larger, and the words in the cluster better conform to the keywords of the cache data content.
The cluster with the largest keyword feature value is selected, and the element with the largest TF-IDF value in this cluster is recorded as the keyword of the cache data.
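A minimal sketch of this keyword-extraction step is shown below under stated assumptions: hierarchical clustering with average linkage stands in for the unspecified clustering algorithm, the number of clusters is a free parameter, the input contains at least two distinct words, and the TF-IDF values are precomputed per cache data item; none of these choices are fixed by the patent.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def longest_common_substring_len(vectors):
    """Length of the longest identical substring shared by all vectors (treated as bit strings)."""
    strings = ["".join(map(str, v)) for v in vectors]
    base = strings[0]
    for size in range(len(base), 0, -1):
        for start in range(len(base) - size + 1):
            sub = base[start:start + size]
            if all(sub in s for s in strings[1:]):
                return size
    return 0

def keyword_of_data(word_vectors, tfidf, n_clusters=3):
    """word_vectors: {word: one-hot vector}; tfidf: {word: TF-IDF value in this cache data}."""
    words = list(word_vectors)
    mat = np.array([word_vectors[w] for w in words])
    # Hamming distance between word vectors is used as the clustering distance
    labels = fcluster(linkage(pdist(mat, metric="hamming"), method="average"),
                      t=min(n_clusters, len(words)), criterion="maxclust")
    best_g, best_cluster = float("-inf"), None
    for c in set(labels):
        members = [w for w, lab in zip(words, labels) if lab == c]
        n_i = len(members)
        l_i = longest_common_substring_len([word_vectors[w] for w in members])
        g_i = n_i * l_i * sum(tfidf[w] for w in members)   # keyword feature value G_i
        if g_i > best_g:
            best_g, best_cluster = g_i, members
    # keyword = word with the largest TF-IDF in the cluster with the largest G_i
    return max(best_cluster, key=lambda w: tfidf[w])
```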
In general, since the speed of data caching has a certain association with the data volume and the data content of the cached data, when the data volume is larger, the cache hit rate is higher, the average cache delay is reduced to a certain extent, so that the subsequent data access time is reduced. If the user performs deep access to the information of the same type in the historical time period, the cache hit rate is greatly increased, so that the cache time delay is reduced.
For the edge devices of each user, when the user needs to buffer certain data, the user can first check whether the buffer data exists in the own buffer memory, and if not, the data needs to be acquired from the buffer memories of other edge devices. When the edge device of the user accesses data from other edge devices, a cache path vector Hc corresponding to each cache data in each user history time period can be obtained, the cache path comprises a transmission path of the data obtained by the edge device of the user, meanwhile, a cache time delay Hs corresponding to each path is obtained, the cache path vector in the user history time period and the cache time delay of the corresponding path are analyzed, and a user behavior retrieval depth in the user history time period is obtained, wherein the expression is as follows:
$$S = \sum_{i=1}^{F-1} \frac{1}{d\left(w_i, w_{i+1}\right) \cdot dtw\left(Hc_i, Hc_{i+1}\right)}$$
In the formula, $S$ represents the user behavior retrieval depth in the historical time period; $F$ represents the access frequency of the user to all cached data in the historical time period; $d\left(w_i, w_{i+1}\right)$ represents the Euclidean distance between the word vectors of the keywords corresponding to the cache data accessed the ith and (i+1)th times in the historical time period; $Hc_i$ and $Hc_{i+1}$ respectively represent the cache path vectors of the cache data accessed the ith and (i+1)th times in the historical time period; $dtw(\cdot,\cdot)$ is the dtw distance function. The dtw distance is a known technology and will not be described in detail in this embodiment.
Meanwhile, when the retrieval paths used by the user to retrieve cache data between adjacent accesses in the historical time period are similar, the user's retrieval habit is deeper; that is, the cache data needed by the user in the historical time period are similar to each other, and this similarity between data can be mined to characterize the user's historical behavior. The larger $S$ is, the more the data accessed by the user in this time period need to be cached on other edge devices close to the user's edge device, so that the user can access the data conveniently.
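A minimal sketch of this computation is given below, assuming the keyword word vectors share a common dimension and the cache path vectors are sequences of numeric position values; the small epsilon guard against a zero product is an implementation assumption, not part of the patent.

```python
import numpy as np

def dtw_distance(a, b):
    """Plain dynamic-time-warping distance between two cache path vectors."""
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]

def retrieval_depth(keyword_vectors, path_vectors, eps=1e-9):
    """keyword_vectors[i], path_vectors[i] belong to the ith access in the historical period.

    S = sum over adjacent accesses of 1 / (euclidean(w_i, w_{i+1}) * dtw(Hc_i, Hc_{i+1})).
    """
    s = 0.0
    for i in range(len(keyword_vectors) - 1):
        d_e = np.linalg.norm(np.asarray(keyword_vectors[i]) - np.asarray(keyword_vectors[i + 1]))
        d_p = dtw_distance(path_vectors[i], path_vectors[i + 1])
        s += 1.0 / (d_e * d_p + eps)   # eps avoids division by zero for identical keywords/paths
    return s
```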
For the cached data in the historical time period, a certain relation exists between the cached data and the data to be cached at the current moment, if the data to be cached is frequently accessed in the historical time period, but the hit rate is lower, and the caching delay of the data to be cached is larger, the fact that the data to be cached is frequently accessed in the time period is indicated, but too much caching time is wasted in the process of accessing the data, so that the data caching efficiency is reduced, and the data transmission time is prolonged. Therefore, the cold and hot attribute adjustment coefficient of the data to be cached at the current moment of the user is calculated, and the expression is as follows:
$$R = \begin{cases} \dfrac{f}{p} \cdot \sum_{i} t_i, & p \le p_0 \\ f \cdot \sum_{i} t_i, & p > p_0 \end{cases}$$
In the formula, $R$ represents the cold and hot attribute adjustment coefficient of the data to be cached at the current moment; $f$ represents the access frequency of the user to the data to be cached in the historical time period; $p$ represents the hit rate of the user's data accesses in the historical time period; $t_i$ represents the time delay of the ith access to the data to be cached in the historical time period; $p_0$ is the hit rate threshold, set to 0.1 in this embodiment. It should be noted that the calculation of the hit rate is prior art and will not be repeated in this embodiment.
It should be noted that when the data to be cached is frequently requested in the historical time period but the hit rate of the user accessing it is smaller than the hit rate threshold, this indicates that the data may be data frequently accessed by the edge device in the historical time period and needs to be cached and retained, otherwise more time would be wasted on subsequent accesses. Therefore, the cold and hot attribute adjustment coefficient of the data to be cached is corrected by combining the cache delay and the access frequency of the data to be cached in the historical time period. The larger $R$ is, the more the data to be cached at the current moment needs to be cached on edge equipment close to the user's edge equipment, so as to facilitate the next data access.
The cold and hot attribute adjustment coefficient of the data accessed by the user and the user behavior retrieval depth in the historical time period are combined to construct the cold and hot coefficient of the data to be cached by the user, which is used to represent the caching necessity of the data for the user, and the expression is as follows:
$$W = S \cdot R$$
In the formula, $W$ represents the cold and hot coefficient of the data to be cached at the current moment, $S$ represents the user behavior retrieval depth of the user in the historical time period, and $R$ represents the cold and hot attribute adjustment coefficient of the data to be cached at the current moment. The flow of obtaining the cold and hot coefficient of the data to be cached is shown in Fig. 2.
It should be noted that when, in the historical time period, the user behavior retrieval depth of the user caching the data is larger and the cold and hot attribute adjustment coefficient of the data is larger, the data has been deeply accessed by the user in the historical time period and is hotter data, namely $W$ is larger.
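As a compact illustration of the two quantities just defined, the following sketch computes R and W from the per-access statistics; the guard against a zero hit rate is an implementation assumption, not part of the patent.

```python
def hot_cold_adjustment(access_freq, hit_rate, delays, hit_threshold=0.1):
    """Cold/hot attribute adjustment coefficient R of the data to be cached.

    R = (f / p) * sum(delays) when the hit rate p <= threshold,
    R =  f      * sum(delays) otherwise.
    """
    total_delay = sum(delays)
    if hit_rate <= hit_threshold:
        p = max(hit_rate, 1e-9)          # guard against division by zero (assumption)
        return (access_freq / p) * total_delay
    return access_freq * total_delay


def hot_cold_coefficient(retrieval_depth_s, adjustment_r):
    """Cold/hot coefficient W = S * R: caching necessity of the data for this user."""
    return retrieval_depth_s * adjustment_r
```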
So far, the cold and hot coefficients of the data to be cached are obtained.
Step S003: and analyzing the cold and hot coefficients of the same data to be cached based on different users, constructing a cost value, and finishing data caching based on edge calculation.
Considering that similar data accessed by users may exist between edge devices of users which are closer to each other, if two or more user edge devices have data access requirements on the same data, the data is cached on the edge devices which are closer to the two or more users, so that the multi-user access is facilitated.
And calculating the cold and hot coefficients of other users according to the cache data in the historical time period of the edge equipment of the other users and the data to be cached based on the same data to be cached. Taking the average value of the absolute values of the difference values of the cold and hot coefficients of each user and other users as a characteristic distance, and taking the inverse of the characteristic distance of each user as the weight of each user; and taking each edge device as each node, and acquiring the position information of the edge device and the mesh topological structure. Taking the average value of the cold and hot coefficients of all users as a cold and hot coefficient threshold value, screening users needing data to be cached, namely users with cold and hot coefficients larger than the cold and hot coefficient threshold value, storing a set formed by corresponding edge devices as a user edge set, and storing a set formed by all edge devices as an edge set; obtaining cost values of edge devices in the edge set according to a mesh topological structure between the edge set of the user and the edge devices in the edge set, wherein the cost values are expressed as follows:
$$cost_i = \sum_{j=1}^{n} w_j \cdot \min\left(d_{i,j}\right)$$
In the formula, $cost_i$ represents the cost value of the ith edge device in the edge set; $\min\left(d_{i,j}\right)$ represents the minimum distance in the mesh topology between the ith edge device in the edge set and the jth edge device in the user edge set; $w_j$ represents the weight corresponding to the jth edge device in the user edge set; $n$ represents the number of elements of the user edge set; $\min(\cdot)$ represents the minimization function.
And taking the edge device with the minimum cost value in the edge set as the optimal edge device of the data to be cached.
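A minimal sketch of this selection step follows, assuming the mesh topology is given as an unweighted adjacency map (so the minimum distance is a hop count), that each user is identified by its own edge device, and that at least two users share the data to be cached; these assumptions and all names below are illustrative rather than taken from the patent.

```python
from collections import deque
import math

def hop_distance(adj, src, dst):
    """Minimum hop count between two nodes of the mesh topology (breadth-first search)."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return math.inf

def best_cache_device(adj, edge_set, hot_cold_by_user):
    """hot_cold_by_user maps a user's edge device to its cold/hot coefficient W for this data."""
    users = list(hot_cold_by_user)
    # weight of each user: reciprocal of its mean |W_u - W_v| distance to all other users
    weights = {}
    for u in users:
        others = [abs(hot_cold_by_user[u] - hot_cold_by_user[v]) for v in users if v != u]
        feat = sum(others) / len(others)
        weights[u] = 1.0 / feat if feat > 0 else float("inf")  # degenerate case handled naively
    # user edge set: devices of users whose W exceeds the mean W over all users
    threshold = sum(hot_cold_by_user.values()) / len(users)
    user_edge_set = [u for u in users if hot_cold_by_user[u] > threshold]
    # cost_i = sum_j w_j * min-distance(i, j); pick the device with the smallest cost
    def cost(dev):
        return sum(weights[j] * hop_distance(adj, dev, j) for j in user_edge_set)
    return min(edge_set, key=cost)
```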
By the data caching method and the data caching system based on edge calculation, user requests can be responded more quickly, data access time delay is reduced, the burden of a central data center is lightened, and the expandability and the stability of the whole system are improved.
Based on the same inventive concept as the above method, the embodiment of the present invention further provides a data caching system based on edge calculation, which includes a memory, a processor, and a computer program stored in the memory and running on the processor, where the processor implements the steps of any one of the above methods based on edge calculation when executing the computer program.
In summary, the embodiment of the invention mainly analyzes the keywords of the cached data under different access times, and mines the data types accessed by the user in the historical time period, so that the historical behavior of the user is further analyzed, the preference of the accessed data of the user is learned in real time, and the data caching is performed for each user in a personalized way; constructing a user behavior retrieval depth for a user in a historical time period, and describing data mining for the user to deeply access the information of the same type in the time period; constructing a cold and hot attribute adjustment coefficient, and improving the cache hit rate of data to be cached, thereby reducing the cache time delay; constructing a cost value, and finishing the selection of the best cache edge equipment of the data to be cached; by the data caching based on edge calculation, the user request can be responded more quickly, the data access time delay is reduced, the burden of a central data center is lightened, and the expandability and the stability of the whole system are improved.
It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
The foregoing description of the preferred embodiments of the present invention is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A data caching method based on edge calculation, which is characterized by comprising the following steps:
obtaining cache data information of user edge equipment in a historical time period, wherein the cache data information comprises a cache path vector, cache delay data and cache data;
Processing the cache data of the user edge equipment in the historical time period to obtain vocabulary in each cache data; encoding each vocabulary in the cache data to obtain word vectors of each vocabulary; clustering each vocabulary based on the word vector to obtain each cluster; acquiring key word characteristic values of each cluster according to the distribution of elements in each cluster and the importance degree of each vocabulary in the cache data; acquiring keywords of each cache data in a historical time period according to the keyword characteristic values of each cluster; according to the difference degree of the keywords of each cache data in the historical time period, combining the cache path vector of the user edge device and the corresponding path cache delay data to obtain the user behavior retrieval depth of the user in the historical time period; acquiring a cold and hot attribute adjustment coefficient of the data to be cached at the current moment according to the access frequency and the access time delay of the data to be cached at the current moment; acquiring the cold and hot coefficients of the current data to be cached according to the user behavior retrieval depth of the user in the historical time period and the cold and hot attribute adjustment coefficient of the data to be cached at the current moment;
selecting optimal edge equipment to finish data caching based on the cold and heat coefficients of different users according to the data to be cached;
The retrieval depth of the user behavior of the user in the historical time period is obtained by combining the cache path vector of the user edge device and the corresponding path cache delay data according to the difference degree of the keywords of each cache data in the historical time period, and is specifically as follows:
Acquiring Euclidean distance between word vectors of the corresponding keywords of the cache data under each adjacent access times in the historical time period; obtaining dtw distances among cache path vectors of the cache data under each adjacent access times in a historical time period; calculating the product of the Euclidean distance and the dtw distances; calculating the reciprocal of the product; taking the sum of the reciprocal values of all adjacent access times in the historical time period as the user behavior retrieval depth of the user in the historical time period;
The obtaining the adjustment coefficient of the cold and hot attribute of the data to be cached at the current moment according to the access frequency and the access time delay of the data to be cached at the current moment comprises the following steps:
for data to be cached at the current moment, acquiring the access frequency of a user to access the data to be cached in a historical time period; acquiring the time delay of accessing the data to be cached each time in the historical time period; calculating the sum of all the time delays in the historical time period; acquiring hit rate of user access data in a historical time period;
when the hit rate is smaller than or equal to a preset hit rate threshold, calculating the ratio of the access frequency to the hit rate, and taking the product of the ratio and the sum as a cold and hot attribute adjustment coefficient of data to be cached at the current moment;
When the hit rate is larger than a preset hit rate threshold, calculating the product of the access frequency and the sum value as a cold and hot attribute adjustment coefficient of data to be cached at the current moment;
the obtained cold and hot coefficients of the current data to be cached are specifically products of the user behavior retrieval depth of the user in the historical time period and the cold and hot attribute adjustment coefficients of the data to be cached at the current moment;
the selecting the optimal edge equipment to finish the data caching based on the cold and heat coefficients of different users according to the data to be cached comprises the following steps:
Calculating the absolute value of the difference between the cold and hot coefficients of each user and other users based on the same data to be cached, taking the average value of the absolute values of the difference between each user and all other users as the characteristic distance of each user, and taking the reciprocal of the characteristic distance of each user as the weight of each user; acquiring the mesh topology structure of all edge devices;
Taking the average value of the cold and hot coefficients of all users as a cold and hot coefficient threshold value; storing a set formed by edge equipment corresponding to a user with a cold and heat coefficient larger than a cold and heat coefficient threshold as a user edge set; storing a set formed by all edge devices as an edge set;
the cost value expression of each edge device in the edge set is:
$$cost_i = \sum_{j=1}^{n} w_j \cdot \min\left(d_{i,j}\right)$$
In the formula, $cost_i$ represents the cost value of the ith edge device in the edge set; $\min\left(d_{i,j}\right)$ represents the minimum distance in the mesh topology between the ith edge device in the edge set and the jth edge device in the user edge set; $w_j$ represents the weight corresponding to the jth edge device in the user edge set; $n$ represents the number of elements of the user edge set; $\min(\cdot)$ represents the minimization function;
and taking the edge device with the minimum cost value in the edge set as the best cache edge device for the data to be cached.
2. The method for caching data based on edge calculation as claimed in claim 1, wherein the processing the cached data of the user edge device in the history period to obtain vocabulary in each cached data includes:
For each cache data in the historical time period, adopting ASCII code conversion, and adopting a bidirectional maximum matching method to acquire vocabulary in each cache data.
3. The data caching method based on edge calculation as claimed in claim 1, wherein the clustering of words based on word vectors is performed to obtain clusters, specifically:
And taking the Hamming distance between word vectors corresponding to the words as a clustering distance, taking all words in each cache data as the input of a clustering algorithm, and outputting each clustering cluster.
4. The method for caching data based on edge calculation as claimed in claim 1, wherein the obtaining the keyword feature value of each cluster according to the distribution of elements in each cluster and the importance degree of each vocabulary in the cached data includes:
acquiring the lengths of the longest identical substrings of word vectors corresponding to all words in each cluster; acquiring TF-IDF values of each vocabulary, and calculating the sum value of all the TF-IDF values in each cluster; and taking the product of the sum value, the length and the vocabulary quantity of each cluster as the keyword characteristic value of each cluster.
5. The method for caching data based on edge calculation as claimed in claim 4, wherein said obtaining keywords of each cached data in a history period according to the keyword feature values of each cluster includes:
And for each cache data in the history time period, using the vocabulary with the maximum TF-IDF value in the cluster corresponding to the maximum keyword characteristic value as the keyword of each cache data.
6. A data caching system based on edge computation, comprising a memory, a processor and a computer program stored in the memory and running on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-5 when executing the computer program.
CN202410372649.XA 2024-03-29 2024-03-29 Data caching method and system based on edge calculation Active CN117971503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410372649.XA CN117971503B (en) 2024-03-29 2024-03-29 Data caching method and system based on edge calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410372649.XA CN117971503B (en) 2024-03-29 2024-03-29 Data caching method and system based on edge calculation

Publications (2)

Publication Number Publication Date
CN117971503A CN117971503A (en) 2024-05-03
CN117971503B (en) 2024-06-11

Family

ID=90851750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410372649.XA Active CN117971503B (en) 2024-03-29 2024-03-29 Data caching method and system based on edge calculation

Country Status (1)

Country Link
CN (1) CN117971503B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549719A (en) * 2018-04-23 2018-09-18 西安交通大学 A kind of adaptive cache method based on cluster in mobile edge calculations network
EP3648436A1 (en) * 2018-10-29 2020-05-06 Commissariat à l'énergie atomique et aux énergies alternatives Method for clustering cache servers within a mobile edge computing network
CN111680161A (en) * 2020-07-07 2020-09-18 腾讯科技(深圳)有限公司 Text processing method and device and computer readable storage medium
CN112749010A (en) * 2020-12-31 2021-05-04 中南大学 Edge calculation task allocation method for fusion recommendation system
US11218561B1 (en) * 2021-03-09 2022-01-04 Wipro Limited Method and system for managing cache data in a network through edge nodes
CN114785856A (en) * 2022-03-21 2022-07-22 鹏城实验室 Edge calculation-based collaborative caching method, device, equipment and storage medium
CN115988575A (en) * 2022-12-01 2023-04-18 郑州师范学院 Mixed type edge data caching method
CN116320000A (en) * 2023-02-21 2023-06-23 青海大学 Collaborative caching method, collaborative caching device, electronic equipment and storage medium
CN116521904A (en) * 2023-06-29 2023-08-01 湖南大学 Ship manufacturing data cloud fusion method and system based on 5G edge calculation
JP2023118632A (en) * 2022-02-15 2023-08-25 Solution Creators株式会社 Method for preventing server deterioration and fire, and data center for preventing deterioration and fire
CN116842290A (en) * 2023-06-29 2023-10-03 中国平安财产保险股份有限公司 Data caching method, device, equipment and computer readable storage medium
CN116865842A (en) * 2023-09-05 2023-10-10 武汉能钠智能装备技术股份有限公司 Resource allocation system and method for communication multiple access edge computing server


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A prediction method for service QoS in mobile edge computing environments; Ren Lifang; Wang Wenjian; Journal of Chinese Computer Systems (小型微型计算机系统); 2020-05-29 (No. 06); 58-63 *

Also Published As

Publication number Publication date
CN117971503A (en) 2024-05-03

Similar Documents

Publication Publication Date Title
CN104253855B (en) Classification popularity buffer replacing method based on classifying content in a kind of content oriented central site network
US5305389A (en) Predictive cache system
CN108446340B (en) A kind of user's hot spot data access prediction technique towards mass small documents
CN108710639B (en) Ceph-based access optimization method for mass small files
CN110287010B (en) Cache data prefetching method oriented to Spark time window data analysis
CN111314862B (en) Caching method with recommendation under deep reinforcement learning in fog wireless access network
CN113364854B (en) Privacy protection dynamic edge cache design method based on distributed reinforcement learning in mobile edge computing network
CN110471939A (en) Data access method, device, computer equipment and storage medium
CN112667528A (en) Data prefetching method and related equipment
US20220374692A1 (en) Interleaving memory requests to accelerate memory accesses
CN111491331B (en) Network perception self-adaptive caching method based on transfer learning in fog computing network
CN110418367A (en) A kind of 5G forward pass mixture of networks edge cache low time delay method
CN115712583B (en) Method, device and medium for improving distributed cache cross-node access performance
CN113271631B (en) Novel content cache deployment scheme based on user request possibility and space-time characteristics
Chao Web cache intelligent replacement strategy combined with GDSF and SVM network re-accessed probability prediction
CN117971503B (en) Data caching method and system based on edge calculation
Einziger et al. Lightweight robust size aware cache management
Kazi et al. Web object prefetching: Approaches and a new algorithm
CN113127515A (en) Power grid-oriented regulation and control data caching method and device, computer equipment and storage medium
CN113435601A (en) Data prefetching method and device and storage device
EP3274844A1 (en) Hierarchical cost based caching for online media
CN113268458A (en) Caching method and system based on cost-sensitive classification algorithm
CN110363015A (en) A kind of construction method of the markov Prefetching Model based on user property classification
CN105530303B (en) A kind of network-caching linear re-placement method
CN110399979B (en) Click rate pre-estimation system and method based on field programmable gate array

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant