CN116828053B - Data caching method and device, electronic equipment and storage medium - Google Patents

Info

Publication number: CN116828053B (application number CN202311082814.XA)
Authority: CN (China)
Prior art keywords: data, node, target data, heat, value
Legal status: Active (granted)
Other versions: CN116828053A (in Chinese)
Inventors: 王浩洋, 徐政钧, 刘逸雄, 潘建东
Current and original assignee: China Securities Co Ltd

Classifications

    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management (subclass Y02D, climate change mitigation technologies in information and communication technologies [ICT])

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract

An embodiment of the invention provides a data caching method and device, an electronic device, and a storage medium. The method comprises the following steps: for each second node, acquiring the heat of target data at that second node, determining the transmission delay for the first node to transmit the target data to the second node and the minimum delay for transmitting the target data to the second node, and taking the difference between the transmission delay and the minimum delay as a transmission delay difference; determining a preset type value of the target data according to a database of the data field to which the target data belongs; calculating a sub-data cache value corresponding to the second node from the heat, the preset type value, and the transmission delay difference; summing the sub-data cache values corresponding to all second nodes to obtain the data cache value of the target data; and, according to the data cache value of the target data, caching the target data at the first node if doing so increases the stage score of the first node. The caching efficiency of knowledge data can thereby be improved.

Description

Data caching method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of multimedia technologies, and in particular, to a data caching method, a data caching device, an electronic device, and a storage medium.
Background
With the rapid development of 5G and mobile multimedia services, knowledge data has gradually developed multimodal forms of expression such as text, images, audio, and video. The resulting traffic grows exponentially, placing enormous pressure on knowledge service platforms in terms of network latency, throughput, and the like. How to improve the efficiency with which a knowledge service platform caches knowledge data, so as to cope with the ever-rising data scale and quality requirements of knowledge services, has become an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application aims to provide a data caching method, a data caching device, electronic equipment and a storage medium, so as to improve caching efficiency of knowledge data. The specific technical scheme is as follows:
the embodiment of the application provides a data caching method which is applied to a first node in a distributed storage system, wherein the distributed storage system also comprises a second node, and the method comprises the following steps:
for each second node, acquiring the heat of the target data at the second node;
determining, for each of the second nodes, a transmission delay for the first node to transmit the target data to the second node, and determining a minimum delay for transmitting the target data to the second node;
Determining, for each of the second nodes, a difference between the transmission delay and the minimum delay as a transmission delay difference;
determining a preset type value of the target data according to a database of the data field to which the target data belong; the preset type value is used for representing the importance degree of the data type of the target data in the data field, and all expert knowledge of the data field of the target data is stored in the database;
aiming at each second node, calculating a sub-data cache value corresponding to the second node according to the heat, the preset type value and the transmission delay difference value; wherein the sub-data buffer value is positively correlated with the heat, positively correlated with the preset type value, and negatively correlated with the transmission delay difference;
summing all the sub data cache values corresponding to the second nodes to obtain a data cache value of the target data;
determining whether a stage score of the first node is increased under the condition of caching the target data to the first node according to the data caching value of the target data, wherein the stage score is positively correlated with the data caching values of all data cached by the first node;
If yes, the target data is cached at the first node.
In one possible embodiment, the method further comprises:
calculating the ratio of the data volume of the target data to the cache capacity of the first node;
and for each second node, calculating a sub-data cache value corresponding to the second node according to the heat, the preset type value and the transmission delay difference value, including:
for each second node, calculating a sub-data cache value corresponding to the second node according to the heat, the preset type value, the transmission delay difference value and the ratio; the sub data buffer value is positively correlated with the heat, positively correlated with the preset type value, negatively correlated with the transmission delay difference value, and negatively correlated with the ratio.
In one possible embodiment, the data cache value of the target data is defined as:

V(n1, f) = Σ_{n2} h(n2, f) · t(f) · (1 − s(f)/C(n1)) · [D_max(n2, f) − D(n1, n2, f)] / [D_max(n2, f) − D_min(n2, f)]

wherein V(n1, f) represents the data cache value of the target data, n1 represents the first node, f represents the target data, n2 represents the second node, D_min(n2, f) represents the minimum delay of transmitting the target data to the second node, D_max(n2, f) represents the maximum delay of transmitting the target data to the second node, D(n1, n2, f) represents the transmission delay of the first node transmitting the target data to the second node, h(n2, f) represents the heat of the target data at the second node, s(f)/C(n1) represents the ratio of the data amount of the target data to the cache capacity of the first node, and t(f) represents the preset type value of the target data.
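Because the equation itself appears only as images in the original publication, the following Python sketch merely illustrates one cache-value computation consistent with the correlations stated in the claims (positively correlated with heat and preset type value, negatively correlated with the transmission delay difference and the size ratio); the function and variable names are assumptions, not part of the patent.

```python
def compute_cache_value(heats, delays, min_delay, max_delay, type_value, size_ratio):
    """Illustrative data cache value of one piece of target data at a first node.

    heats      -- heat of the target data at each second node
    delays     -- delay for the first node to transmit the data to each second node
    min_delay  -- minimum delay of transmitting the data to a second node
    max_delay  -- maximum delay of transmitting the data to a second node
    type_value -- preset type value (importance of the data type in its field)
    size_ratio -- data amount of the target data / cache capacity of the first node
    """
    value = 0.0
    for heat, delay in zip(heats, delays):
        # Delay term is 1 when delay == min_delay and 0 when delay == max_delay,
        # so the sub-data cache value falls as the transmission delay difference grows.
        delay_term = (max_delay - delay) / (max_delay - min_delay)
        # Sub-value: up with heat and type value, down with delay difference and ratio.
        value += heat * type_value * delay_term * (1.0 - size_ratio)
    return value
```

For instance, with a single second node, heat 1.0, a transmission delay equal to the minimum delay, and negligible data size, the value reduces to the preset type value.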
In a possible embodiment, if the foregoing is true, caching the target data to the first node includes:
If the stage score of the first node increases when the target data is cached at the first node, and the sum of the data amounts of all data cached by the first node is smaller than the cache capacity of the first node, the target data is cached at the first node.
In a possible embodiment, the acquiring the heat of the target data at the second node includes:
inputting the target data into a data heat prediction model to obtain the heat of the target data output by the data heat prediction model at the second node;
the data heat prediction model comprises an input layer, a plurality of self-adaptive time convolution networks, a full connection layer, a discarding module, a classifier and an output layer, wherein the self-adaptive time convolution networks comprise a time convolution network and a stopping unit.
In one possible embodiment, the target data includes a plurality of pieces of sub-data, and the acquiring the heat of the target data at the second node includes:
inputting at least one piece of sub data into the time convolution network to obtain the heat characteristic output by the time convolution network;
calculating a stop score by the stop unit according to the heat characteristic;
if the stopping score meets a preset stopping condition, inputting all obtained heat characteristics into a full-connection layer to obtain the heat of the target data output by the data heat prediction model at the second node;
and if the stop score does not meet the preset stop condition, returning to execute the step of inputting at least one piece of sub-data into the time convolution network based on the sub-data which is not input into the time convolution network to obtain the heat characteristic output by the time convolution network.
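The adaptive read-in procedure above (feed sub-data through the time convolution network until the stopping unit's score meets the preset stopping condition, then pass all accumulated heat characteristics onward) can be sketched as follows. The feature extractor, stop-score function, and fusion function here are stand-in stubs supplied by the caller; they are not the actual temporal convolutional network, stopping unit, or fully connected layer of the model.

```python
def predict_heat(sub_data_chunks, extract_features, stop_score, stop_threshold, fuse):
    """Adaptive early-exit loop: process sub-data until the stop unit is satisfied.

    extract_features -- stands in for one time convolution network pass
    stop_score       -- stands in for the stopping unit (scores features so far)
    stop_threshold   -- preset stopping condition
    fuse             -- stands in for the fully connected layer producing the heat
    """
    features = []
    for chunk in sub_data_chunks:
        features.append(extract_features(chunk))
        # Stopping unit: if enough evidence has accumulated, skip the remaining
        # sub-data and produce the heat from the features obtained so far.
        if stop_score(features) >= stop_threshold:
            break
    return fuse(features)
```

With stub functions (identity feature extraction, a score that simply counts features, a threshold of 2, and summation as fusion), the loop reads only the first two chunks and ignores the rest, which is the point of the adaptive design: popular data whose trend is clear early on costs less computation to score.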
The embodiment of the application also provides a data caching device, which is applied to a first node in a distributed storage system, wherein the distributed storage system also comprises a second node, and the device comprises:
the heat acquisition module is used for acquiring the heat of the target data at each second node;
A minimum delay determining module, configured to determine, for each of the second nodes, a transmission delay for the first node to transmit the target data to the second node, and determine a minimum delay for transmitting the target data to the second node;
a transmission delay difference determining module, configured to determine, for each of the second nodes, a difference between the transmission delay and the minimum delay as a transmission delay difference;
the preset type value determining module is used for determining the preset type value of the target data according to a database of the data field to which the target data belong; the preset type value is used for representing the importance degree of the data type of the target data in the data field, and all expert knowledge of the data field of the target data is stored in the database;
the sub data cache value calculation module is used for calculating a sub data cache value corresponding to each second node according to the heat degree, the preset type value and the transmission delay difference value; wherein the sub-data buffer value is positively correlated with the heat, positively correlated with the preset type value, and negatively correlated with the transmission delay difference;
The data cache value obtaining module is used for summing all the sub data cache values corresponding to the second nodes to obtain the data cache value of the target data;
the determining module is used for determining whether a stage score of the first node is increased under the condition of caching the target data to the first node according to the data caching value of the target data, wherein the stage score is positively correlated with the data caching values of all data cached by the first node;
and the caching module is used for caching the target data to the first node if yes.
In one possible embodiment, the apparatus further comprises:
the ratio calculating module is used for calculating the ratio of the data volume of the target data to the cache capacity of the first node;
the sub-data buffer value calculation module is specifically configured to calculate, for each second node, a sub-data buffer value corresponding to the second node according to the heat, the preset type value, the transmission delay difference value, and the ratio; the sub data buffer value is positively correlated with the heat, positively correlated with the preset type value, negatively correlated with the transmission delay difference value, and negatively correlated with the ratio.
In one possible embodiment, the data cache value of the target data is defined as:

V(n1, f) = Σ_{n2} h(n2, f) · t(f) · (1 − s(f)/C(n1)) · [D_max(n2, f) − D(n1, n2, f)] / [D_max(n2, f) − D_min(n2, f)]

wherein V(n1, f) represents the data cache value of the target data, n1 represents the first node, f represents the target data, n2 represents the second node, D_min(n2, f) represents the minimum delay of transmitting the target data to the second node, D_max(n2, f) represents the maximum delay of transmitting the target data to the second node, D(n1, n2, f) represents the transmission delay of the first node transmitting the target data to the second node, h(n2, f) represents the heat of the target data at the second node, s(f)/C(n1) represents the ratio of the data amount of the target data to the cache capacity of the first node, and t(f) represents the preset type value of the target data.
In a possible embodiment, the buffering module is specifically configured to buffer the target data to the first node if the stage score of the first node increases in the case where the target data is buffered to the first node, and the sum of the data amounts of all data buffered by the first node is smaller than the buffering capacity of the first node.
In a possible embodiment, the heat acquiring module is specifically configured to input the target data into a data heat prediction model, and obtain heat of the target data output by the data heat prediction model at the second node;
The data heat prediction model comprises an input layer, a plurality of self-adaptive time convolution networks, a full connection layer, a discarding module, a classifier and an output layer, wherein the self-adaptive time convolution networks comprise a time convolution network and a stopping unit.
In a possible embodiment, the heat acquiring module is specifically configured to input at least one piece of sub-data into the time convolution network to obtain a heat characteristic output by the time convolution network;
calculating a stop score by the stop unit according to the heat characteristic;
if the stopping score meets a preset stopping condition, inputting all obtained heat characteristics into a full-connection layer to obtain the heat of the target data output by the data heat prediction model at the second node;
and if the stop score does not meet the preset stop condition, returning to execute the step of inputting at least one piece of sub-data into the time convolution network based on the sub-data which is not input into the time convolution network to obtain the heat characteristic output by the time convolution network.
The embodiment of the application also provides an electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
A memory for storing a computer program;
and the processor is used for realizing any one of the data caching methods when executing the programs stored in the memory.
The embodiment of the application also provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the computer program realizes any one of the data caching methods when being executed by a processor.
The embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform any of the data caching methods described above.
The embodiment of the application has the beneficial effects that:
according to the data caching method, the device, the electronic equipment and the storage medium, the heat degree of target data in the second nodes can be acquired for each second node, the transmission delay of the first node for transmitting the target data to the second nodes is determined, the minimum delay of the first node for transmitting the target data to the second nodes is determined, the difference value between the transmission delay and the minimum delay is used as the transmission delay difference value, and the importance degree of the data type of the target data (namely the preset type value of the target data) is determined according to the database of the data field of the target data. And then, calculating to obtain a sub-data cache value corresponding to the second node according to the heat degree of the target data in the second node, the transmission delay difference value and the preset type value of the target data, summing all the sub-data cache values corresponding to the second node to obtain a data cache value of the target data, and caching the target data to the first node if the stage score of the first node is increased under the condition that the target data is cached to the first node.
It can be understood that the higher the heat of the target data at the second node, the more likely the user terminal is to request the target data from the second node. If the target data is cached at the first node, the second node can subsequently request the target data from the first node and obtain it directly. If it is not, then when the second node later requests the target data from the first node, the first node must in turn request the target data from other nodes, which increases the number of requests. Moreover, the smaller the transmission delay difference, the closer the delay of the first node transmitting the target data to the second node is to the minimum delay of that second node; that is, the transmission delay when the first node transmits the target data to the second node is small. If the first node caches the target data, the second node therefore receives the target data with a small transmission delay and a short request time when it requests the data from the first node. If the first node does not cache it, the first node must further request the target data from other nodes when the second node later requests it, which increases both the number of requests and the request time.
Therefore, the higher the heat of the target data at the second node, the smaller the transmission delay difference, and the more important the data type of the target data (i.e., the larger the preset type value), the more the target data deserves to be cached at the first node. Accordingly, the data cache value of the target data is positively correlated with the heat of the target data at the second node, positively correlated with the preset type value, and negatively correlated with the transmission delay difference; and since the data cache value is obtained by summing the sub-data cache values corresponding to the second nodes, each sub-data cache value is likewise positively correlated with the heat of the target data at its second node, positively correlated with the preset type value, and negatively correlated with the transmission delay difference. The data cache value of the target data thus fully reflects the heat of the target data at the second nodes, the transmission delay of the first node transmitting the target data to the second nodes, and the importance of the data type of the target data. Deciding whether to cache the target data at the first node according to this value therefore takes into account the influence of data heat, data transmission delay, and data type simultaneously, improving the caching efficiency of knowledge data.
Of course, it is not necessary for any one product or method of practicing the application to achieve all of the advantages set forth above at the same time.
Drawings
To more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. Apparently, the drawings in the following description are only some embodiments of the application; those skilled in the art may obtain other drawings from these without creative effort.
Fig. 1 is a schematic flow chart of a data caching method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a data heat prediction model according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a heat acquiring method according to an embodiment of the present application;
fig. 4 is a schematic diagram of an application of a data caching method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a structure of a data buffering device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
To describe the data caching method provided by the present application more clearly, one possible application scenario is described below by way of example. It will be understood that this is only one possible application scenario of the method; in other possible embodiments, the data caching method provided by the present application may also be applied to other application scenarios, which is not limited here.
With the deepening of economic and financial globalization, financial institutions in countries and regions around the world compete in the same markets, and improving the service level of financial knowledge platforms has become an objective requirement of enterprise development. As the core resource of a financial knowledge service platform, a financial knowledge base typically collects, organizes, and stores various kinds of knowledge data (financial, economic, industry-related, and so on) from around the world. With the rapid development of 5G and mobile multimedia services, financial knowledge data has gradually developed multimodal forms of expression such as text, images, audio, and video; the resulting traffic grows exponentially, placing enormous pressure on knowledge service platforms in terms of network latency, throughput, and the like.
Because the storage capacity of a single storage device is limited, the knowledge service platform typically stores financial knowledge data in a distributed storage system. The distributed storage system includes a plurality of storage nodes (hereinafter, nodes), each of which caches a portion of the financial knowledge data.
The process by which a user acquires financial knowledge data from the knowledge service platform is as follows. The user's terminal accesses one node in the distributed storage system (denoted node A) and requests the financial knowledge data the user needs (denoted the target financial knowledge data). Node A checks whether the target financial knowledge data is cached locally. If it is, node A returns it to the user terminal. If it is not, node A requests the target financial knowledge data from another node (denoted node B). Node B in turn checks its local cache according to node A's request: if it holds the target financial knowledge data, it returns the data to node A, which returns it to the user terminal; if it does not, node B requests the target financial knowledge data from yet other nodes, and so on.
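The lookup-and-forward process described above can be sketched as a walk over nodes. The single-successor routing (`next_node`) is a simplifying assumption for illustration; in a real distributed storage system a node may choose among several peers.

```python
def fetch(start_node, key, caches, next_node):
    """Return (data, hops): walk from start_node until a node's cache holds key.

    caches    -- mapping from node name to that node's local cache (a dict)
    next_node -- mapping from node name to the peer it forwards cache misses to
    """
    current, hops = start_node, 0
    while key not in caches[current]:
        # Cache miss: the node requests the data from another node.
        current = next_node[current]
        hops += 1
    return caches[current][key], hops
```

Each hop is one request to another node; fewer hops mean the user terminal obtains the data faster and the system carries less bandwidth pressure, which is exactly what caching popular data at node A is meant to achieve.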
It can be understood that the fewer the times the target financial knowledge data must be requested from other nodes (hereinafter, the number of requests), the more efficiently the user terminal can acquire the target financial knowledge data and the lower the bandwidth pressure in the distributed storage system. How to make node A cache financial knowledge data reasonably under the limited storage capacity of a node, so as to reduce the number of requests as much as possible, is therefore a technical problem to be solved urgently by those skilled in the art.
Most research on data caching assumes that the data requested by users are all of the same size, and does not consider differences in knowledge type, storage form, and the like among financial knowledge data. For example, in a real system the file size of a popular financial knowledge video far exceeds that of a popular knowledge picture or text, and different types of content (financial transactions, policy, announcements, entertainment, and so on) differ in influence. If a node caches financial knowledge data according to such related art, it is therefore difficult to effectively reduce the number of requests, so the efficiency with which the user terminal acquires the target financial knowledge data is low and the bandwidth pressure in the distributed storage system increases.
Based on this, an embodiment of the present application provides a data caching method, which is applied to a first node in a distributed storage system, where the distributed storage system further includes a second node, as shown in fig. 1, and the method includes:
s101, for each second node, acquiring the heat of the target data at the second node.
S102, determining, for each second node, the transmission delay of the first node transmitting the target data to the second node, and determining the minimum delay of transmitting the target data to the second node.
S103, determining a difference value between the transmission delay and the minimum delay as a transmission delay difference value for each second node.
S104, determining the preset type value of the target data according to the database of the data field to which the target data belong.
The preset type value is used for representing the importance degree of the data type of the target data in the data field, and all expert knowledge of the data field of the target data is stored in the database.
S105, aiming at each second node, calculating a sub-data cache value corresponding to the second node according to the heat, the preset type value and the transmission delay difference value.
The sub-data buffer value is positively correlated with the heat, positively correlated with the preset type value, and negatively correlated with the transmission delay difference.
And S106, summing the sub data cache values corresponding to all the second nodes to obtain the data cache value of the target data.
S107, determining whether the stage score of the first node is increased under the condition of caching the target data to the first node according to the data caching value of the target data.
Wherein the phase score is positively correlated with the data cache values of all data cached by the first node.
S108, if yes, caching the target data to the first node.
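Steps S101 to S108 can be sketched end to end as follows. The preset type value of S104 is taken as an input; the particular combination used for the sub-data cache value (heat times type value, attenuated as the transmission delay difference grows) and the admission rule are illustrative choices that respect the stated correlations, and all names are assumptions rather than the patent's own notation.

```python
def maybe_cache(target, first_node, second_nodes, get_heat, get_delay,
                get_min_delay, type_value, cached_values, capacity, sizes):
    """Decide whether the first node should cache `target` (cf. steps S101-S108)."""
    value = 0.0
    for node in second_nodes:
        heat = get_heat(target, node)                                             # S101
        diff = get_delay(first_node, node, target) - get_min_delay(node, target)  # S102-S103
        # S105: the sub-data cache value rises with heat and type value and
        # falls as the transmission delay difference grows.
        value += heat * type_value / (1.0 + diff)
    # The loop's sum is S106. S107: the stage score, being the sum of the cache
    # values of everything the node holds, rises iff the new data's value is
    # positive and the data still fits within the cache capacity.
    fits = sum(sizes[d] for d in cached_values) + sizes[target] <= capacity
    if value > 0 and fits:                                                        # S108
        cached_values[target] = value
        return True
    return False
```

In this sketch `cached_values` doubles as the first node's cache inventory, so repeated calls accumulate the stage score incrementally.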
With this embodiment, for each second node, the heat of the target data at that second node can be acquired; the transmission delay of the first node transmitting the target data to the second node and the minimum delay of transmitting the target data to the second node are determined, and their difference is taken as the transmission delay difference; and the importance of the data type to which the target data belongs (i.e., the preset type value of the target data) is determined according to the database of the data field to which the target data belongs. A sub-data cache value corresponding to the second node is then calculated from the heat of the target data at the second node, the transmission delay difference, and the preset type value; the sub-data cache values corresponding to all second nodes are summed to obtain the data cache value of the target data; and, if caching the target data at the first node would increase the stage score of the first node, the target data is cached at the first node.
It can be understood that the higher the heat of the target data at the second node, the more likely the user terminal is to request the target data from the second node. If the target data is cached at the first node, the second node can subsequently request the target data from the first node and obtain it directly. If it is not, then when the second node later requests the target data from the first node, the first node must in turn request the target data from other nodes, which increases the number of requests. Moreover, the smaller the transmission delay difference, the closer the delay of the first node transmitting the target data to the second node is to the minimum delay of that second node; that is, the transmission delay when the first node transmits the target data to the second node is small. If the first node caches the target data, the second node therefore receives the target data with a small transmission delay and a short request time when it requests the data from the first node. If the first node does not cache it, the first node must further request the target data from other nodes when the second node later requests it, which increases both the number of requests and the request time.
Therefore, the higher the heat of the target data at the second node, the smaller the transmission delay difference, and the more important the data type of the target data (i.e., the larger the preset type value), the more the target data needs to be cached at the first node. Accordingly, the data cache value of the target data is positively correlated with the heat of the target data at the second node, positively correlated with the preset type value, and negatively correlated with the transmission delay difference. And because the data cache value of the target data is obtained by summing the sub-data cache values corresponding to the second nodes, each sub-data cache value is likewise positively correlated with the heat of the target data at its second node, positively correlated with the preset type value, and negatively correlated with the transmission delay difference. The data cache value of the target data therefore fully reflects the heat of the target data at the second nodes, the transmission delay for the first node to transmit the target data to the second nodes, and the importance of the data type to which the target data belongs. Judging whether the target data should be cached at the first node according to its data cache value thus takes into account the influence of data heat, data transmission delay, and data type on data caching, improving the caching efficiency of the knowledge data.
The foregoing steps S101 to S108 will be described in detail below:
in S101, the first node refers to any node in the distributed storage system with the capability to store data, and the second node likewise refers to any such node. The target data may be knowledge data of various fields; by way of example, the target data may be knowledge data of the financial field (i.e., the aforementioned financial knowledge data), of the translation field, of the programming field, and so on. The target data may also take various forms, for example text information, image information, audio information, video information, and the like. The following takes target data that is knowledge data of the financial field as an example; knowledge data of other fields is handled similarly to knowledge data of the financial field and is not described repeatedly.
It can be understood that the higher the heat of the target data at the second node, the more likely a user terminal is to request the target data from the second node. If the target data is cached at the first node, the second node can subsequently obtain the target data directly from the first node; if the target data is not cached at the first node, then when the second node subsequently requests the target data from the first node, the first node must further request the target data from other nodes, increasing the number of requests. Thus, the higher the heat of the target data at the second node, the more the target data needs to be cached at the first node, so as to reduce the number of requests when the second node requests the target data. The method for acquiring the heat of the target data at the second node is described in detail below and is not described here.
In S102, for each second node, the transmission delay for the first node to transmit the target data to that second node is determined, i.e., the delay the second node would incur in receiving the target data if it requested the target data from the first node. The minimum delay for transmitting the target data to each second node is also determined, so that by comparing it with the first node's transmission delay to that second node, the magnitude of the first node's transmission delay can be assessed, and it can further be judged whether the target data should be cached at the first node.
In S103, the smaller the transmission delay difference, the closer the first node's transmission delay to the second node is to the minimum delay for transmitting the target data to that second node, i.e., the smaller the first node's transmission delay to the second node. Conversely, the larger the transmission delay difference, the further the first node's transmission delay is from that minimum delay, i.e., the larger the first node's transmission delay to the second node.
When the transmission delay difference is small, if the first node caches the target data, the delay the second node incurs in receiving the target data when requesting it from the first node is small and the request time short; if the first node does not cache the target data, the first node must further request the target data from other nodes when the second node later requests it, increasing both the number of requests and the request time. Hence, the smaller the transmission delay difference, the more the target data needs to be cached at the first node.
In S104, the preset type value of the target data indicates the importance of the data type to which the target data belongs within the data field to which the target data belongs; the larger the preset type value, the more important that data type and the more the target data needs to be cached at the first node. The preset type value may be any value; below, a value from 0 to 1 is used as an example. The data field to which the target data belongs may be the financial field, the translation field, the programming field, etc., and data in different data fields may be further divided into different data types. Taking the financial field as an example, the data type of the target data may be transaction knowledge data, announcement knowledge data, policy knowledge data, entertainment knowledge data, and the like.
As for how the preset type value is determined: in one possible embodiment, a database or a knowledge graph may be built from all expert knowledge data of a given field, such as the financial field, and the knowledge data it contains may be updated over time. The importance of the different data types in the database or knowledge graph, i.e., the preset type value, is then determined by an adaptive convolutional network, and the preset type value is updated as the database or knowledge graph is updated.
It will be appreciated that the importance of a data type is positively correlated with both the real-time nature of the data it contains and the importance of the data information it contains: the faster the data of a data type is updated, i.e., the higher its real-time nature, the more important the data type; and the more important the data information a data type contains, the more important the data type. Thus, in one possible implementation, a logistic regression function may be designed based on this experience, and the preset type value determined by that function.
For example, the preset type values may be: 1 for transaction knowledge data, 0.8 for announcement knowledge data, 0.6 for policy knowledge data, and 0.4 for entertainment knowledge data. When the data type of the target data is transaction knowledge data, the preset type value of the target data is 1; when the data type of the target data is entertainment knowledge data, the preset type value of the target data is 0.4.
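The example mapping above can be expressed as a simple lookup; the values follow the example in the text, while the names are illustrative (in practice the values would be produced and updated by the adaptive convolutional network over the domain database or knowledge graph):

```python
# Illustrative preset type values for the financial field, taken from the
# example above; names are assumptions, not from the patent.
PRESET_TYPE_VALUES = {
    "transaction": 1.0,
    "announcement": 0.8,
    "policy": 0.6,
    "entertainment": 0.4,
}

def preset_type_value(data_type: str) -> float:
    """Return the preset type value for a data type (0.0 if unknown)."""
    return PRESET_TYPE_VALUES.get(data_type, 0.0)
```
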
In S105, as described in the foregoing S101-S104: the higher the heat of the target data at the second node, the more the target data needs to be cached at the first node; the more important the data type of the target data, the larger the preset type value and the more the target data needs to be cached at the first node; and the smaller the transmission delay difference, the more the target data needs to be cached at the first node. The more the target data needs to be cached at the first node, the larger its data cache value. The data cache value of the target data is therefore negatively correlated with the transmission delay difference and positively correlated with both the heat of the target data at the second node and the preset type value. Because the data cache value of the target data is obtained by summing the sub-data cache values corresponding to the second nodes, each sub-data cache value is likewise negatively correlated with the transmission delay difference and positively correlated with the heat of the target data at its second node and the preset type value.
In S106, the sub-data cache values corresponding to all the second nodes are obtained according to the foregoing calculation method of S105, and these sub-data cache values are summed to obtain the data cache value of the target data.
In S107, the larger the data cache value of a piece of data, the more it needs to be cached. When the data cache values of all data cached by the first node are large, the first node's cache holds exactly the data that most needs caching, and the phase score of the first node is high. The phase score is therefore positively correlated with the data cache values of all data cached by the first node.
Whether the target data should be cached at the first node can therefore be determined by checking, based on the data cache value of the target data, whether the phase score of the first node would increase if the target data were cached at the first node.
In S108, if the phase score of the first node increases in the case where the target data is cached in the first node, the target data may be cached in the first node.
Specifically, in S107-S108, the process by which the first node decides whether to cache one piece of target data can be treated as a stage, at which the first node either caches or does not cache that data. For convenience of description, the kth target data is denoted d_k, and the stage at which the first node decides whether to cache d_k is the kth stage. Before determining whether d_k needs to be cached, the first node has already determined whether each of the first k-1 target data d_1, d_2, …, d_{k-1} is cached at the first node; the corresponding stages are stage 1, stage 2, …, stage k-1, respectively. Whether the first node should cache d_k at the kth stage can be judged by the following calculation formula (1) for the phase score of the first node:

dp[k][c] = max Σ_{i=1}^{k} v_i · x_{i,m} ……(1)

wherein dp[k][c] denotes the phase score of the first node at the kth stage, c denotes the cache remaining capacity of the first node, v_i denotes the data cache value of the ith target data d_i (when i = k, v_k denotes the data cache value of the kth target data d_k), m denotes the mth node, i.e., the first node, and x_{i,m} denotes whether data d_i needs to be cached at the first node at stage i so that the phase score of the first node is maximized: x_{i,m} takes any positive value when d_i needs to be cached at the first node at stage i, and takes the value 0 when it does not; when i = k, x_{k,m} denotes whether the target data d_k is cached at the first node at the kth stage. Hereinafter, x_{i,m} taking the value 1 or 0 is used as an example.
As can be seen from formula (1), x_{i,m} takes the value 1 when the first node caches data d_i at stage i, and takes the value 0 when it does not, so that formula (1) computes the sum of the data cache values of all data cached by the first node as the phase score of the first node.
Taking the kth stage as an example, the process of judging whether the first node caches data according to formula (1) is as follows. At the kth stage, when the first node caches the target data d_k, x_{k,m} takes the value 1, and formula (1) gives the phase score of the first node in the case that the target data is cached at the first node; when the first node does not cache d_k, x_{k,m} takes the value 0, and formula (1) gives the phase score of the first node in the case that the target data is not cached. Whichever of these two cases yields the larger phase score determines whether the target data is cached at the first node: if the phase score is larger with the target data cached, meaning that caching the target data increases the phase score of the first node, the target data is cached at the first node; if the phase score is smaller with the target data cached, meaning that caching the target data does not increase the phase score of the first node, the first node does not cache the target data.
In determining whether to cache the target data at the first node, it is necessary to consider not only the influence of the target data's data cache value on the phase score of the first node, but also whether the data amount of all data stored at the first node would exceed the first node's cache capacity.
Based on this, in one possible embodiment, the aforementioned step S108 includes: and if the stage score of the first node is increased under the condition that the target data is cached to the first node, and the sum of the data amounts of all the data cached by the first node is smaller than the cache capacity of the first node, caching the target data to the first node.
With this embodiment, when judging whether the target data can be cached at the first node, the influence of data heat and data type on data caching is considered, and at the same time it is considered whether the data amount of the target data exceeds the remaining cache capacity of the storage node. This avoids the situation where the target data is too large to be stored at the first node, further improving the caching efficiency of the knowledge data.
In this embodiment, formula (1) can be used to determine whether the first node caches the target data, but since it must also be considered whether the data amount of all data stored at the first node exceeds the first node's cache capacity, a constraint as shown in the following formula (2) must be added on the basis of formula (1). Whether the first node caches the target data is therefore determined jointly by formulas (1) and (2): the values x_{i,m} that maximize the right-hand side of formula (1) are determined subject to the constraint of formula (2); if the resulting x_{k,m} = 1, it is determined that the first node needs to cache the target data; otherwise, if x_{k,m} = 0, it is determined that the first node does not need to cache the target data.
Σ_{i=1}^{k} s_i · x_{i,m} ≤ C_m ……(2)

wherein s_i denotes the data amount of the ith target data d_i, x_{i,m} denotes whether data d_i needs to be cached at the first node at stage i and takes the value 1 or 0 (when i = k, x_{k,m} denotes whether the target data d_k is cached at the first node at the kth stage), c denotes the cache remaining capacity of the first node, and C_m denotes the cache capacity of the first node.
The process of judging whether the first node buffers data together by the formula (1) and the formula (2) will be described in detail as follows:
At stage 1, to judge whether data d_1 (i.e., the first target data) is to be cached at the first node, it suffices to compare the cache remaining capacity c of the first node with the data amount s_1 of d_1. Since the first node has not yet stored any data, its cache remaining capacity c equals its cache capacity, so this comparison is equivalent to comparing the cache capacity of the first node with s_1. If the cache capacity of the first node is greater than s_1, the first node can store d_1; the phase score of the first node at stage 1, dp[1][c], is then the data cache value of d_1, and the cache remaining capacity of the first node becomes c − s_1.
At the kth stage, it must be determined according to formula (1) whether the phase score of the first node increases if the target data is cached at the first node, and the remaining cache capacity c of the first node must be compared with the data amount s_k of the target data according to formula (2), so that it can be determined whether to cache the target data at the first node.
When the remaining cache capacity c of the first node is greater than the data amount of the target data: if it is judged according to formula (1) that the first node should cache the target data, the target data is cached at the first node, the cache remaining capacity of the first node becomes c − s_k, and the phase score of the first node at the kth stage is dp[k][c] = dp[k-1][c − s_k] + v_k; if it is judged according to formula (1) that the first node should not cache the target data, the first node does not cache it, the cache remaining capacity remains c, and the phase score of the first node at the kth stage is dp[k][c] = dp[k-1][c].
When the remaining cache capacity c of the first node is less than or equal to the data amount of the target data, the first node does not cache the target data; the cache remaining capacity remains c, and the phase score of the first node at the kth stage is dp[k][c] = dp[k-1][c].
From the above analysis, the foregoing formulas (1) and (2) can be summarized as the following formula (3):

dp[x][y] = max( dp[x-1][y], dp[x-1][y − s_x] + v_x ), if y > s_x; dp[x][y] = dp[x-1][y], otherwise ……(3)

The boundary condition of formula (3) is the following formula (4):

dp[0][y] = 0 ……(4)

wherein dp[x][y] denotes the phase score of the first node at the xth stage when the cache remaining capacity is y, m denotes the first node, v_x denotes the data cache value of the xth target data d_x, and dp[x-1][y] denotes the phase score of the first node at stage x-1 when the cache remaining capacity is y.
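Under the assumption that the data amounts and the cache capacity are integers, the recurrence of formulas (3) and (4) is the classic 0/1 knapsack dynamic program. The following sketch is illustrative only (function and variable names are assumptions, not from the patent):

```python
def plan_cache(values, sizes, capacity):
    """0/1 knapsack over k target data items for one first node.
    dp[x][y] is the best phase score using the first x items with
    remaining capacity y (formula (3)); dp[0][y] = 0 is formula (4)."""
    k = len(values)
    dp = [[0.0] * (capacity + 1) for _ in range(k + 1)]
    for x in range(1, k + 1):
        for y in range(capacity + 1):
            dp[x][y] = dp[x - 1][y]          # do not cache item x
            if sizes[x - 1] <= y:            # caching item x fits
                dp[x][y] = max(dp[x][y],
                               dp[x - 1][y - sizes[x - 1]] + values[x - 1])
    # Backtrack to recover which items end up cached (x_i = 1).
    cached, y = [], capacity
    for x in range(k, 0, -1):
        if dp[x][y] != dp[x - 1][y]:
            cached.append(x - 1)
            y -= sizes[x - 1]
    return dp[k][capacity], sorted(cached)
```

For example, with data cache values [6, 10, 12], data amounts [1, 2, 3], and capacity 5, the best phase score is 22, achieved by caching the second and third items.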
Since the distributed storage system includes a plurality of nodes that can store data, any node can be regarded as a first node. To determine a caching policy that is as close to optimal as possible, the caching policy needs to satisfy the following formulas (5) and (6):
max Σ_{m=1}^{M} Σ_{k=1}^{K} v_{k,m} · x_{k,m} ……(5)

Σ_{k=1}^{K} s_k · x_{k,m} ≤ C_m, m = 1, 2, …, M ……(6)

wherein v_{k,m} denotes the data cache value of the kth target data d_k at the mth node, m denotes the mth node, i.e., the first node, x_{k,m} denotes whether d_k needs to be cached at the first node and takes the value 1 or 0, C_m denotes the cache capacity of the first node, M denotes the total number of first nodes, and K denotes the total number of data.
In formulating the caching policy, formula (5) can be divided into a plurality of single stages according to the different first nodes m. For each single stage, i.e., for each first node, the maximum phase score of that first node is determined by formulas (1) and (2), and the data cached by the first node is obtained under the condition that the sum of the data amounts of all data cached by the first node is smaller than its cache capacity; the data cached by each first node, and hence the caching policy of each first node, is thus obtained. The caching policies of all first nodes together form the caching policy of the whole distributed storage system. For the whole system, the phase score is the sum of the phase scores of the individual first nodes; since each first node's caching policy maximizes that node's own phase score at each single stage, the phase score of the whole system is also maximized, so the combined per-node policies constitute the optimal caching policy of the whole distributed storage system.
It will be appreciated that in data caching, the factors affecting caching performance are not only data type, data transmission delay, and data heat, but may also include the size of the data.
Based on this, in one possible embodiment, the ratio of the data amount of the target data to the cache capacity of the first node may also be calculated. Calculating, for each second node, the sub-data cache value corresponding to that second node according to the heat, the preset type value, and the transmission delay difference then comprises: calculating, for each second node, the sub-data cache value corresponding to that second node according to the heat, the preset type value, the transmission delay difference, and the ratio, the sub-data cache value being positively correlated with the heat, positively correlated with the preset type value, negatively correlated with the transmission delay difference, and negatively correlated with the ratio.
Specifically, a large ratio of the data amount of the target data to the cache capacity of the first node indicates that the data amount of the target data is close to, or exceeds, the cache capacity of the first node; if such target data were cached at the first node, the remaining cache capacity might be unable to store other data, or the first node might be unable to store the target data at all. Therefore, when this ratio is large, caching the target data at the first node should be avoided as much as possible, i.e., the data cache value of the target data should be small, so the data cache value of the target data is negatively correlated with the ratio. And since the data cache value of the target data is obtained by summing the sub-data cache values corresponding to the second nodes, each sub-data cache value is likewise negatively correlated with the ratio.
With this embodiment, the calculation of the sub-data cache value considers the influence of data heat, data type, data transmission delay, and data size on data caching, so the data cache value of the target data also takes these four factors into account. The calculation of the data cache value of the target data is thus more comprehensive, the accuracy of judging whether to cache the target data at the first node according to the data cache value is improved, and the caching efficiency of the knowledge data is further improved.
In one possible embodiment, the data cache value of the kth target data may be calculated by the following formula:
v_{k,m} = Σ_n [ (d_max^{k,n} − d_{k,m,n}) / (d_max^{k,n} − d_min^{k,n}) ] · p_{k,n} · t_k / r_{k,m} ……(7)

wherein v_{k,m} denotes the data cache value of the target data, m denotes the mth node, i.e., the first node, k denotes the kth target data, n denotes the nth second node, d_min^{k,n} denotes the minimum delay for transmitting the kth target data to the nth second node, d_max^{k,n} denotes the maximum delay for transmitting the kth target data to the nth second node, d_{k,m,n} denotes the transmission delay for the first node to transmit the kth target data to the nth second node, p_{k,n} denotes the heat of the kth target data at the nth second node, r_{k,m} denotes the ratio of the data amount of the kth target data to the cache capacity of the first node, and t_k denotes the preset type value of the kth target data.
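The stated relationships can be sketched as follows. Normalizing the delay term to [0, 1] between the minimum and maximum delay is an assumption made for illustration, as are all names; the sketch satisfies the stated correlations (positive in heat and type value, negative in delay and size ratio):

```python
def sub_cache_value(heat, delay, d_min, d_max, type_value, size_ratio):
    """Sub-data cache value for one second node: positively correlated
    with heat and the preset type value, negatively correlated with the
    transmission delay difference and the size/capacity ratio."""
    # 1.0 when the first node achieves the minimum delay to this second
    # node, 0.0 when it is at the maximum delay.
    delay_term = (d_max - delay) / (d_max - d_min) if d_max > d_min else 1.0
    return delay_term * heat * type_value / size_ratio

def data_cache_value(per_node_params):
    """Sum the sub-data cache values over all second nodes (step S106)."""
    return sum(sub_cache_value(*p) for p in per_node_params)
```

For instance, a second node with heat 0.5, delay 2.0 in [1.0, 3.0], type value 1.0, and size ratio 0.1 contributes a sub-value of 2.5.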
With this embodiment, the calculation of the sub-data cache value considers the influence of data heat, data type, data size, and data transmission delay on data caching, so the data cache value of the target data likewise takes these four factors into account and measures the value of the data comprehensively. Relative to most edge collaborative caching research, this is advantageous in financial knowledge data scenarios: the cache hit rate is improved, the transmission delay is reduced, and the caching efficiency of the knowledge data is improved.
An exemplary method for acquiring the heat of the target data at the second node will be described below:
in one possible embodiment, the heat of the target data requested by users at the second node may be modeled and predicted based on a fixed rule such as Zipf's law, to obtain the heat of the target data at the second node. In another possible embodiment, the heat of the target data at the second node may be obtained based on a deep learning model such as the gated recurrent unit (Gate Recurrent Unit, GRU): specifically, for each second node, the target data is input into a GRU-based deep learning model, and the output of the model is taken as the heat of the target data at that second node.
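As an illustration of the fixed-rule approach, a Zipf-law model assigns request probability, and hence heat, in inverse proportion to a power of a data item's popularity rank; the exponent s and the names below are illustrative assumptions:

```python
def zipf_heat(rank, n_items, s=0.8):
    """Zipf-law request probability for the item at the given popularity
    rank (1-based) among n_items items: p(rank) proportional to 1/rank**s."""
    norm = sum(1.0 / r**s for r in range(1, n_items + 1))
    return (1.0 / rank**s) / norm
```

This is the fixed-rule baseline that, as noted below, struggles with frequently changing financial knowledge data, since the rank distribution is assumed static.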
For the method of acquiring the heat of the target data at the second node based on a fixed rule such as Zipf's law: because financial knowledge data changes frequently, heat prediction based on a fixed rule performs poorly and, in the application scenario of predicting the heat of financial knowledge data, cannot cope with the frequent changes. For the method based on a deep learning model such as the gated recurrent unit: the deep learning model in this method cannot be computed in parallel, trains inefficiently, and suffers from vanishing gradients, among other problems. Both methods for acquiring the heat of the target data at the second node therefore yield low data caching efficiency owing to the performance problems of the model.
Based on this, an embodiment of the present application provides a data heat prediction model. As shown in fig. 2, the data heat prediction model includes an input layer, a plurality of adaptive time convolutional networks (Ad-TCN), a fully connected layer (FC), a discard module (Dropout), a classifier (Softmax), and an output layer. Fig. 2 is a schematic diagram of only one possible structure of the data heat prediction model; for convenience of illustration, fig. 2 includes only two adaptive time convolutional networks, and in a specific application the number of adaptive time convolutional networks may be set according to actual requirements.
In this embodiment, the target data may be input into the data heat prediction model to obtain the heat of the target data at the second node output by the model. Specifically, preprocessed, feature-cascaded multi-modal target data is fed into the data heat prediction model through the input layer, where multi-modal target data refers to the various forms the target data may take: text information, image information, audio information, video information, and the like. The plurality of consecutive Ad-TCN modules fully extract the heat features of the target data; two fully connected layers (FC) integrate the feature information extracted by the adaptive time convolutional networks; the Dropout module suppresses model overfitting by randomly dropping some neuron nodes; and the Softmax classifier module completes the classification of the feature information and feeds the model output layer. Parametric rectified linear unit (Parametric Rectified Linear Unit, PReLU) and batch normalization (Batch Normalization, BN) modules are introduced between network layers to suppress model overfitting and accelerate the learning rate. Finally, the heat prediction result of the target data at the second node is obtained from the output of the output layer.
In fig. 2, the icon formed by stacking several solid-line squares is the dilated causal convolution, the rectangle taller than it is wide is the parametric rectified linear unit PReLU, the dashed box is the batch normalization BN module, the single solid-line square is a 1×1 convolution, and the rectangle wider than it is tall is the weighted sum.
The adaptive time convolution network in the data heat prediction model includes a time convolution network and a stopping unit.
The time convolutional network (Temporal Convolutional Network, TCN) is designed around dilated causal convolution and residual connections; it can solve the problem of omitted historical information in time-series modeling tasks and overcomes phenomena such as vanishing gradients in deep network training. However, once the filter size and dilation factor of a TCN are fixed, its receptive field grows with network depth, wasting computational resources and lengthening convergence time. The adaptive time convolutional network proposed in the present application therefore optimizes the time convolutional network with the ACT algorithm: a stop unit branch is introduced after the output of each time convolutional network, the stop unit scores of the branches are accumulated, and once the accumulated sum is judged to reach 1, the remaining time convolutional network operations are skipped and processing proceeds directly to the fully connected layer. Assuming the data heat prediction model includes n adaptive time convolutional networks, the nth, i.e., last, adaptive time convolutional network, as shown in fig. 2, does not include an accumulated-sum decision box, while the first n-1 adaptive time convolutional networks each include one.
The adaptive computation time (Adaptive Computation Time, ACT) algorithm controls the number of repeated operations by introducing a stopping unit (Stopping Unit) into the repeated operation outputs of the network at a given time; the stopping unit score (hereinafter, stop score) is determined by the current operation output and can be calculated according to the following formula (8):
$$h_n = f\left(W_h \cdot s_n + b_h\right) \qquad (8)$$

wherein $h_n$ denotes the stop score of the nth stop unit, $s_n$ denotes the operation output of the nth stop unit, $W_h$ denotes the weight of the stop unit, $b_h$ denotes the bias coefficient of the nth stop unit score, and $f$ denotes the activation function, which defaults to Sigmoid. The operation output of the stopping unit may refer to the feature vector/feature matrix of the target data.
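The stop score computation of formula (8), a weighted combination of the stopping unit's operation output passed through a Sigmoid, can be sketched as follows (the vector dot-product form is an assumption; the patent does not fix the shape of the operation output):

```python
import math

def stop_score(s_n, w_h, b_h):
    """Stop score h_n = sigmoid(w_h . s_n + b_h) for one stopping unit.

    s_n : operation output (feature vector) of the n-th stopping unit's branch
    w_h : weight vector of the stopping unit
    b_h : bias coefficient of the stop score
    """
    z = sum(w * s for w, s in zip(w_h, s_n)) + b_h
    return 1.0 / (1.0 + math.exp(-z))  # Sigmoid keeps the score in (0, 1)

print(round(stop_score([1.0, -2.0], [0.5, 0.25], 0.0), 4))  # 0.5
```

Because the Sigmoid bounds each score in (0, 1), at least two stopping units must fire before the accumulated sum can reach the threshold of 1.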
According to this embodiment, the ACT algorithm is introduced into the temporal convolutional network, yielding an adaptive temporal convolutional network (Ad-TCN) for data heat prediction. The Ad-TCN performs a level jump once enough feature information has been captured: all heat features obtained so far are fed directly into the fully connected layer, and no feature extraction by subsequent adaptive temporal convolutional networks is needed. This realizes adaptive control of the network depth, saves model computation time, and improves data caching efficiency.
The target data includes a plurality of sub-data. The method for obtaining the heat of the target data at the second node from the data heat prediction model may be as shown in fig. 3 and includes the following steps:
S301, inputting at least one piece of sub data into a time convolution network to obtain the heat characteristic output by the time convolution network.
Specifically, the content of the target data may differ between time periods, so the target data may be divided into a plurality of sub-data by time period.
S302, calculating a stop score by a stop unit according to the heat characteristic.
Specifically, the stop score of the stop unit may be calculated according to the foregoing formula (8).
And S303, if the stop score meets the preset stop condition, inputting all obtained heat characteristics into the full-connection layer to obtain the heat of the target data output by the data heat prediction model at the second node.
Specifically, the preset stopping condition may be that the sum of the stop scores of all stop units is greater than or equal to 1. When this condition holds, all obtained heat features are input to the fully connected layer to obtain the heat of the target data at the second node output by the data heat prediction model.
And S304, if the stop score does not meet the preset stop condition, returning to execute the step of inputting at least one piece of sub data into the time convolution network based on the sub data which is not input into the time convolution network to obtain the heat characteristic output by the time convolution network.
Specifically, when the sum of the stop scores of all stop units is smaller than 1, further heat feature extraction is required. To avoid extracting duplicate heat features, the next adaptive temporal convolutional network connected to the current stop unit receives only sub-data that has not yet been input into a temporal convolutional network, and outputs the resulting heat features.
According to the embodiment, the heat degree of the target data at the second node can be obtained through the data heat degree prediction model, and because the adaptive time convolution network performs level jump when enough characteristic information is captured, namely, when the stop score meets the preset stop condition, all obtained heat degree characteristics can be directly input into the full-connection layer, the characteristic extraction is not required to be performed through the subsequent adaptive time convolution network, the adaptive control of the network depth is realized, the calculation time of the model is saved, and the data caching efficiency is improved.
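The adaptive early-exit procedure of steps S301-S304 can be sketched as follows. This is a minimal illustration; the callables `extract_heat_features` and `predict_heat` are hypothetical stand-ins for the temporal convolutional networks and the fully connected layer, which the patent does not specify at code level:

```python
def predict_heat_adaptive(sub_data_list, extract_heat_features, predict_heat):
    """Early-exit heat prediction: feed sub-data through successive feature
    extractors, accumulate stop scores, and exit once the sum reaches 1."""
    features, score_sum = [], 0.0
    for sub_data in sub_data_list:                    # S301/S304: next unseen sub-data
        feats, stop = extract_heat_features(sub_data)  # S302: compute stop score
        features.append(feats)
        score_sum += stop
        if score_sum >= 1.0:                           # S303: stop condition met
            break
    return predict_heat(features)                      # fully connected layer on all features

# Demo with stand-in callables: each sub-data contributes a stop score of 0.6,
# so the loop exits after two of the three sub-data (0.6 + 0.6 >= 1).
extract = lambda sd: (sd, 0.6)
used = predict_heat_adaptive([10, 20, 30], extract, lambda feats: len(feats))
print(used)  # 2
```

The early exit is what saves computation: the third sub-data never reaches a temporal convolutional network.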
In practical application, the data caching method provided by the embodiment of the application may follow the architecture shown in fig. 4. As shown in fig. 4, the financial information (i.e., the target data) may take four forms: text, image, audio and video. The financial data in these four forms is preprocessed as multi-modal heat data to obtain text features, speech features, image features and video features. After feature cascading, the features are input to a feature extraction module comprising the adaptive temporal convolutional network model for feature extraction and heat prediction; that is, the heat of the target data at the second node is obtained through the data heat prediction model comprising the adaptive temporal convolutional network according to steps S301-S304. The financial information cache value (i.e., the data cache value of the target data) is calculated from the information heat, the information type, the data size and the transmission delay; specifically, it may be calculated according to formula (7). A cache cooperation strategy is then made based on a dynamic programming algorithm according to the financial information cache value, so as to obtain an optimal edge cooperative caching strategy. Specifically, the optimal caching strategy may be determined according to the relevant descriptions of the foregoing formulas (1)-(6).
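The dynamic-programming step of the cache cooperation strategy can be illustrated with a standard 0/1 knapsack sketch. The patent's formulas (1)-(6) are not reproduced in this section, so this only shows the generic idea of maximizing total cache value under a node's capacity limit:

```python
def select_cache(items, capacity):
    """0/1 knapsack: items = [(size, cache_value), ...]; returns the maximum
    total cache value achievable without exceeding the node's cache capacity."""
    best = [0.0] * (capacity + 1)                 # best[c] = max value with budget c
    for size, value in items:
        for c in range(capacity, size - 1, -1):   # reverse scan: each item used once
            best[c] = max(best[c], best[c - size] + value)
    return best[capacity]

# Three data items; caching the ones of size 4 and 2 (total 6) maximizes value.
print(select_cache([(3, 4.0), (4, 5.0), (2, 3.0)], 6))  # 8.0
```

Items whose cache value does not justify their size are left out, mirroring the decision of whether caching the target data increases the node's stage score.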
In another possible embodiment, the heat of the target data at the second node may be predicted based on a fixed rule such as Zipf's law, or based on a deep learning model such as a gated recurrent unit (GRU); the data caching policy may then be determined according to the related descriptions of the foregoing formulas (1)-(6) based on the predicted heat of the target data at the second node.
In yet another possible embodiment, the method described in the foregoing S301-S304 may be used to predict the heat of the target data at the second node with the data heat prediction model shown in fig. 2, and, based on the predicted heat, the data caching policy may be determined according to an algorithm such as Least Frequently Used (LFU), Least Recently Used (LRU), or a greedy algorithm.
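An LRU replacement policy of the kind mentioned above can be sketched with Python's `OrderedDict` (a minimal illustration of the standard algorithm, not the claimed caching method):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal Least-Recently-Used cache: evicts the entry untouched longest."""

    def __init__(self, capacity: int):
        self.capacity, self.data = capacity, OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)            # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)     # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")                                # "a" becomes most recent
cache.put("c", 3)                             # capacity exceeded: evicts "b"
print(sorted(cache.data))                     # ['a', 'c']
```

Unlike the cache-value approach of this application, LRU considers only recency; it ignores heat prediction, data type and transmission delay.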
The embodiment of the application also provides a data caching apparatus, which is applied to a first node in a distributed storage system, wherein the distributed storage system further comprises a second node. As shown in fig. 5, the apparatus comprises:
a heat acquiring module 501, configured to acquire, for each second node, a heat of the target data at the second node;
a minimum delay determining module 502, configured to determine, for each of the second nodes, a transmission delay of the first node for transmitting the target data to the second node, and determine a minimum delay of transmitting the target data to the second node;
A transmission delay difference determining module 503, configured to determine, for each of the second nodes, a difference between the transmission delay and the minimum delay as a transmission delay difference;
a preset type value determining module 504, configured to determine a preset type value of the target data according to a database of a data field to which the target data belongs; the preset type value is used for representing the importance degree of the data type of the target data in the data field, and all expert knowledge of the data field of the target data is stored in the database;
a sub-data buffer value calculating module 505, configured to calculate, for each of the second nodes, a sub-data buffer value corresponding to the second node according to the heat, the preset type value, and the transmission delay difference value; wherein the sub-data buffer value is positively correlated with the heat, positively correlated with the preset type value, and negatively correlated with the transmission delay difference;
a data buffer value module 506, configured to sum the sub data buffer values corresponding to all the second nodes to obtain a data buffer value of the target data;
a determining module 507, configured to determine, according to the data cache value of the target data, whether a phase score of the first node increases under the condition of caching the target data to the first node, where the phase score is positively correlated with the data cache values of all data cached by the first node;
And the caching module 508 is configured to cache the target data to the first node if yes.
In one possible embodiment, the apparatus further comprises:
the ratio calculating module is used for calculating the ratio of the data volume of the target data to the cache capacity of the first node;
the sub data cache value calculation module is specifically configured to calculate, for each second node, a sub data cache value corresponding to the second node according to the heat, the preset type value, the transmission delay difference value, and the ratio; the sub data buffer value is positively correlated with the heat, positively correlated with the preset type value, negatively correlated with the transmission delay difference value, and negatively correlated with the ratio.
In one possible embodiment, the data cache value of the target data is defined as:

$$V_i(f) = \sum_{j} p_{j,f} \cdot t_f \cdot \frac{D_{\max} - d_{i,j,f}}{D_{\max} - D_{\min}} \cdot \left(1 - r_{i,f}\right)$$

wherein $V_i(f)$ denotes the data cache value of the target data, $i$ denotes the first node, $f$ denotes the target data, $j$ denotes the second node, $D_{\min}$ denotes the minimum delay of transmitting the target data to the second node, $D_{\max}$ denotes the maximum delay of transmitting the target data to the second node, $d_{i,j,f}$ denotes the transmission delay of the first node transmitting the target data to the second node, $p_{j,f}$ denotes the heat of the target data at the second node, $r_{i,f}$ denotes the ratio of the data amount of the target data to the cache capacity of the first node, and $t_f$ denotes the preset type value of the target data.
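A sketch of the sub-data cache value, using only the relationships stated in the text (positively correlated with heat and preset type value, negatively correlated with the transmission delay difference and with the data-size ratio). The multiplicative combination is an assumption, not the patent's formula (7):

```python
def sub_cache_value(heat, type_value, delay, d_min, d_max, size_ratio):
    """Illustrative sub-data cache value for one second node.

    Rises with heat and preset type value; falls as the transmission delay
    exceeds the minimum delay and as the data fills more of the cache capacity.
    """
    delay_factor = (d_max - delay) / (d_max - d_min)   # 1 at d_min, 0 at d_max
    return heat * type_value * delay_factor * (1.0 - size_ratio)

def data_cache_value(per_node_terms):
    """Data cache value of the target data: sum over all second nodes."""
    return sum(sub_cache_value(*t) for t in per_node_terms)

v = data_cache_value([(0.8, 1.0, 20.0, 10.0, 50.0, 0.25),
                      (0.5, 1.0, 10.0, 10.0, 50.0, 0.25)])
print(v)  # 0.825
```

Data that is hot at nearby (low-delay) nodes and small relative to the cache thus scores highest, matching the stated correlations.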
In one possible embodiment, the caching module is specifically configured to cache the target data to the first node if the stage score of the first node increases in the case where the target data is cached to the first node, and the sum of the data amounts of all the data cached by the first node is smaller than the cache capacity of the first node.
In a possible embodiment, the heat acquiring module is specifically configured to input the target data into the data heat prediction model to obtain heat of the target data output by the data heat prediction model at the second node;
the data heat prediction model comprises an input layer, a plurality of self-adaptive time convolution networks, a full-connection layer, a discarding module, a classifier and an output layer, wherein the self-adaptive time convolution networks comprise a time convolution network and a stopping unit.
In a possible embodiment, the heat acquiring module is specifically configured to input at least one sub-data into the time convolution network to obtain a heat characteristic output by the time convolution network;
calculating a stop score by a stop unit according to the heat characteristic;
If the stopping score meets the preset stopping condition, inputting all obtained heat characteristics into the full-connection layer to obtain the heat of the target data output by the data heat prediction model at the second node;
and if the stop score does not meet the preset stop condition, returning to execute the step of inputting at least one piece of sub data into the time convolution network based on the sub data which is not input into the time convolution network to obtain the heat characteristic output by the time convolution network.
The embodiment of the invention also provides an electronic device, as shown in fig. 6, which comprises a processor 601, a communication interface 602, a memory 603 and a communication bus 604, wherein the processor 601, the communication interface 602 and the memory 603 complete communication with each other through the communication bus 604,
a memory 603 for storing a computer program;
the processor 601 is configured to execute the program stored in the memory 603, and implement the following steps:
for each second node, acquiring the heat of the target data at the second node;
determining, for each second node, a transmission delay for the first node to transmit the target data to the second node, and determining a minimum delay for transmitting the target data to the second node;
determining, for each second node, a difference between the transmission delay and the minimum delay as a transmission delay difference;
Determining a preset type value of the target data according to a database of the data field to which the target data belong; the preset type value is used for representing the importance degree of the data type of the target data in the data field, and all expert knowledge of the data field of the target data is stored in the database;
aiming at each second node, calculating a sub-data cache value corresponding to the second node according to the heat degree, a preset type value and a transmission delay difference value; wherein, the sub data buffer value is positively correlated with the heat, positively correlated with the preset type value, and negatively correlated with the transmission delay difference;
summing all the sub data cache values corresponding to the second nodes to obtain a data cache value of the target data;
determining whether a stage score of the first node is increased under the condition of caching the target data to the first node according to the data caching value of the target data, wherein the stage score is positively correlated with the data caching values of all data cached by the first node;
if yes, the target data is cached to the first node.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include Random Access Memory (RAM) or Non-Volatile Memory (NVM), such as at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor implements the steps of any of the above-mentioned data caching methods.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the data caching methods of the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between the entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus, electronic device, computer readable storage medium, and computer program product embodiments, the description is relatively simple, as relevant to the method embodiments being referred to in the section of the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (8)

1. A data caching method applied to a first node in a distributed storage system, wherein the distributed storage system further comprises a second node, the method comprising:
for each second node, acquiring the heat of the target data at the second node;
determining, for each of the second nodes, a transmission delay for the first node to transmit the target data to the second node, and determining a minimum delay for transmitting the target data to the second node;
determining, for each of the second nodes, a difference between the transmission delay and the minimum delay as a transmission delay difference;
determining a preset type value of the target data according to a database of the data field to which the target data belong; the preset type value is used for representing the importance degree of the data type of the target data in the data field, and all expert knowledge of the data field of the target data is stored in the database;
Aiming at each second node, calculating a sub-data cache value corresponding to the second node according to the heat, the preset type value and the transmission delay difference value; wherein the sub-data buffer value is positively correlated with the heat, positively correlated with the preset type value, and negatively correlated with the transmission delay difference;
summing all the sub data cache values corresponding to the second nodes to obtain a data cache value of the target data;
determining whether a stage score of the first node is increased under the condition of caching the target data to the first node according to the data caching value of the target data, wherein the stage score is positively correlated with the data caching values of all data cached by the first node;
if yes, caching the target data to the first node;
the method further comprises the steps of:
calculating the ratio of the data volume of the target data to the cache capacity of the first node;
and for each second node, calculating a sub-data cache value corresponding to the second node according to the heat, the preset type value and the transmission delay difference value, including:
For each second node, calculating a sub-data cache value corresponding to the second node according to the heat, the preset type value, the transmission delay difference value and the ratio; wherein the sub-data buffer value is positively correlated with the heat, positively correlated with the preset type value, negatively correlated with the transmission delay difference value, and negatively correlated with the ratio;
the data cache value of the target data is defined as:

$$V_i(f) = \sum_{j} p_{j,f} \cdot t_f \cdot \frac{D_{\max} - d_{i,j,f}}{D_{\max} - D_{\min}} \cdot \left(1 - r_{i,f}\right)$$

wherein $V_i(f)$ denotes the data cache value of said target data, $i$ denotes said first node, $f$ denotes said target data, $j$ denotes said second node, $D_{\min}$ denotes the minimum delay of transmitting said target data to said second node, $D_{\max}$ denotes the maximum delay of transmitting said target data to said second node, $d_{i,j,f}$ denotes the transmission delay of said first node transmitting said target data to said second node, $p_{j,f}$ denotes the heat of said target data at said second node, $r_{i,f}$ denotes the ratio of the data amount of said target data to the cache capacity of said first node, and $t_f$ denotes the preset type value of said target data.
2. The method of claim 1, wherein if so, caching the target data to the first node comprises:
And if the stage score of the first node is increased under the condition that the target data is cached to the first node, and the sum of the data amounts of all data cached by the first node is smaller than the cache capacity of the first node, caching the target data to the first node.
3. The method of claim 1, wherein the obtaining the heat of the target data at the second node comprises:
inputting the target data into a data heat prediction model to obtain the heat of the target data output by the data heat prediction model at the second node;
the data heat prediction model comprises an input layer, a plurality of self-adaptive time convolution networks, a full connection layer, a discarding module, a classifier and an output layer, wherein the self-adaptive time convolution networks comprise a time convolution network and a stopping unit.
4. A method according to claim 3, wherein the target data comprises a plurality of sub-data, and the obtaining the heat of the target data at the second node comprises:
inputting at least one piece of sub data into the time convolution network to obtain the heat characteristic output by the time convolution network;
Calculating a stop score by the stop unit according to the heat characteristic;
if the stopping score meets a preset stopping condition, inputting all obtained heat characteristics into a full-connection layer to obtain the heat of the target data output by the data heat prediction model at the second node;
and if the stop score does not meet the preset stop condition, returning to execute the step of inputting at least one piece of sub-data into the time convolution network based on the sub-data which is not input into the time convolution network to obtain the heat characteristic output by the time convolution network.
5. A data caching apparatus applied to a first node in a distributed storage system, the distributed storage system further including a second node, the apparatus comprising:
the heat acquisition module is used for acquiring the heat of the target data at each second node;
a minimum delay determining module, configured to determine, for each of the second nodes, a transmission delay for the first node to transmit the target data to the second node, and determine a minimum delay for transmitting the target data to the second node;
A transmission delay difference determining module, configured to determine, for each of the second nodes, a difference between the transmission delay and the minimum delay as a transmission delay difference;
the preset type value determining module is used for determining the preset type value of the target data according to a database of the data field to which the target data belong; the preset type value is used for representing the importance degree of the data type of the target data in the data field, and all expert knowledge of the data field of the target data is stored in the database;
the sub data cache value calculation module is used for calculating a sub data cache value corresponding to each second node according to the heat degree, the preset type value and the transmission delay difference value; wherein the sub-data buffer value is positively correlated with the heat, positively correlated with the preset type value, and negatively correlated with the transmission delay difference;
the data cache value obtaining module is used for summing all the sub data cache values corresponding to the second nodes to obtain the data cache value of the target data;
The determining module is used for determining whether a stage score of the first node is increased under the condition of caching the target data to the first node according to the data caching value of the target data, wherein the stage score is positively correlated with the data caching values of all data cached by the first node;
the caching module is used for caching the target data to the first node if yes;
the apparatus further comprises:
the ratio calculating module is used for calculating the ratio of the data volume of the target data to the cache capacity of the first node;
the sub-data buffer value calculation module is specifically configured to calculate, for each second node, a sub-data buffer value corresponding to the second node according to the heat, the preset type value, the transmission delay difference value, and the ratio; wherein the sub-data buffer value is positively correlated with the heat, positively correlated with the preset type value, negatively correlated with the transmission delay difference value, and negatively correlated with the ratio;
the data cache value of the target data is defined as:

$$V_i(f) = \sum_{j} p_{j,f} \cdot t_f \cdot \frac{D_{\max} - d_{i,j,f}}{D_{\max} - D_{\min}} \cdot \left(1 - r_{i,f}\right)$$

wherein $V_i(f)$ denotes the data cache value of said target data, $i$ denotes said first node, $f$ denotes said target data, $j$ denotes said second node, $D_{\min}$ denotes the minimum delay of transmitting said target data to said second node, $D_{\max}$ denotes the maximum delay of transmitting said target data to said second node, $d_{i,j,f}$ denotes the transmission delay of said first node transmitting said target data to said second node, $p_{j,f}$ denotes the heat of said target data at said second node, $r_{i,f}$ denotes the ratio of the data amount of said target data to the cache capacity of said first node, and $t_f$ denotes the preset type value of said target data.
6. The apparatus of claim 5, wherein the device comprises a plurality of sensors,
the caching module is specifically configured to cache the target data to the first node if a stage score of the first node increases under the condition that the target data is cached to the first node, and a sum of data amounts of all data cached by the first node is smaller than a cache capacity of the first node;
the heat acquiring module is specifically configured to input the target data into a data heat prediction model, and obtain heat of the target data output by the data heat prediction model at the second node;
the data heat prediction model comprises an input layer, a plurality of self-adaptive time convolution networks, a full-connection layer, a discarding module, a classifier and an output layer, wherein the self-adaptive time convolution networks comprise a time convolution network and a stopping unit;
The target data comprises a plurality of sub-data, and the heat acquisition module is specifically used for inputting at least one sub-data into the time convolution network to obtain heat characteristics output by the time convolution network;
calculating a stop score by the stop unit according to the heat characteristic;
if the stopping score meets a preset stopping condition, inputting all obtained heat characteristics into a full-connection layer to obtain the heat of the target data output by the data heat prediction model at the second node;
and if the stop score does not meet the preset stop condition, returning to execute the step of inputting at least one piece of sub-data into the time convolution network based on the sub-data which is not input into the time convolution network to obtain the heat characteristic output by the time convolution network.
7. The electronic equipment is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-4 when executing a program stored on a memory.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-4.
CN202311082814.XA 2023-08-28 2023-08-28 Data caching method and device, electronic equipment and storage medium Active CN116828053B (en)

Publications (2)

Publication Number Publication Date
CN116828053A CN116828053A (en) 2023-09-29
CN116828053B true CN116828053B (en) 2023-11-03

Family

ID=88120606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311082814.XA Active CN116828053B (en) 2023-08-28 2023-08-28 Data caching method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116828053B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098519A (en) * 2022-06-15 2022-09-23 深圳前海微众银行股份有限公司 Data storage method and device
CN116112563A (en) * 2023-02-09 2023-05-12 南京邮电大学 Dual-strategy self-adaptive cache replacement method based on popularity prediction
CN116321303A (en) * 2023-03-20 2023-06-23 北京航空航天大学 Data caching method, device, equipment and readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11467967B2 (en) * 2018-08-25 2022-10-11 Panzura, Llc Managing a distributed cache in a cloud-based distributed computing environment

Similar Documents

Publication Publication Date Title
CN113242568B (en) Task unloading and resource allocation method in uncertain network environment
US20170257452A1 (en) Systems and methods for data caching in a communications network
WO2021143883A1 (en) Adaptive search method and apparatus for neural network
CN112579194B (en) Block chain consensus task unloading method and device based on time delay and transaction throughput
CN112764936B (en) Edge calculation server information processing method and device based on deep reinforcement learning
CN114219097B (en) Federal learning training and predicting method and system based on heterogeneous resources
CN111294812B (en) Resource capacity-expansion planning method and system
CN111881358B (en) Object recommendation system, method and device, electronic equipment and storage medium
Gao 5G traffic prediction based on deep learning
CN112862060B (en) Content caching method based on deep learning
CN109218211B (en) Method, device and equipment for adjusting threshold value in control strategy of data stream
Qiu et al. OA-cache: Oracle approximation-based cache replacement at the network edge
CN111598457A (en) Method and device for determining quality of power wireless network
CN113271631B (en) Novel content cache deployment scheme based on user request possibility and space-time characteristics
CN116828053B (en) Data caching method and device, electronic equipment and storage medium
CN112597231A (en) Data processing method and device
CN116127400A (en) Sensitive data identification system, method and storage medium based on heterogeneous computation
CN115866687A (en) Service cooperative caching method in vehicle-mounted edge computing
CN114116528B (en) Memory access address prediction method and device, storage medium and electronic equipment
CN113837807B (en) Heat prediction method, heat prediction device, electronic equipment and readable storage medium
CN116266128A (en) Method and system for scheduling ecological platform resources
CN113297152B (en) Method and device for updating cache of edge server of power internet of things
CN115237555A (en) Method and system for scheduling edge computing tasks in industrial internet
Tao et al. Content popularity prediction in fog-rans: A bayesian learning approach
CN114416863A (en) Method, apparatus, and medium for performing model-based parallel distributed reasoning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant