CN112379849B - Parallel deep learning training data input method and system based on sequence predictability - Google Patents

Parallel deep learning training data input method and system based on sequence predictability

Info

Publication number
CN112379849B
CN112379849B (application CN202110062697.5A)
Authority
CN
China
Prior art keywords
data
node
training
size
data block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110062697.5A
Other languages
Chinese (zh)
Other versions
CN112379849A (en)
Inventor
何水兵
陈伟剑
杨斯凌
陈平
陈帅犇
曾令仿
任祖杰
杨弢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Lab
Original Assignee
Zhejiang University ZJU
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Lab filed Critical Zhejiang University ZJU
Priority to CN202110062697.5A
Publication of CN112379849A
Application granted
Publication of CN112379849B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G06F 3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656 Data buffering arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0673 Single storage device
    • G06F 3/0674 Disk device
    • G06F 3/0676 Magnetic disk device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention provides a parallel deep learning training data input method based on sequence predictability. The method exploits the fact that the data access sequence can be determined in advance when data is prefetched and cached: the size of the prefetch data block read from the underlying parallel file system is chosen by jointly considering cache hit rate and disk access performance, and data is then distributed and cached accordingly, which greatly improves the local hit rate of the first round of training in large-scale training. In subsequent rounds, data requests are merged and cache replacement is performed in advance according to the data to be used in the next round, reducing communication overhead across the whole distributed training process and speeding up data input at every node. The invention also provides a data input system based on this method, comprising a random sequence generation module, a data prefetching module, and a cache replacement module; it accelerates reading data from storage while still guaranteeing globally random data access.

Description

Parallel deep learning training data input method and system based on sequence predictability
Technical Field
The invention belongs to the field of artificial intelligence within computer science, and in particular relates to accelerating data input in large-scale distributed neural network training scenarios.
Background
To train deep neural networks with higher prediction accuracy and stronger generalization, ever larger amounts of training data are being used, so distributed storage of the training data has become a necessity. Much research focuses on the computation and communication processes of distributed training, making the computation and communication of large-scale distributed neural network training efficient; however, when the number of training nodes is very large, the data supply speed becomes a key factor limiting the whole training process.
When a traditional parallel file system cannot meet the required I/O speed, methods such as prefetching and caching data with a burst buffer or designing a dedicated file system have been proposed as remedies. However, existing prefetching and caching methods for accelerating data input usually consider only the random-access nature of the sample sequence and do not fully incorporate the predictability of the I/O access sequence into the corresponding I/O optimization strategies. As a result, the data prefetched from the underlying file system or placed into the cache does not match what the upper-layer training task actually needs, the local cache hit rate drops, data input remains inefficient, and the extremely high I/O bandwidth demanded by the training process cannot be effectively guaranteed.
Disclosure of Invention
To solve the problem of low data input efficiency in large-scale distributed deep neural network training, the invention provides an efficient data input method and system for large-scale deep learning based on the predictability of the sample sequence. In particular, when data is prefetched and cached, the fact that the data access sequence can be determined in advance is fully exploited, greatly improving the cache hit rate of the first round of training in large-scale training. In subsequent rounds, cache replacement is performed according to the data to be used in the next round, and data requests are merged, reducing communication overhead across the whole distributed training process and speeding up data input at every node.
The technical solution of the invention is as follows:
Step one: when the first round of training of the neural network starts, the same global random sequence is generated at each node, and each node takes out the training data numbers belonging to it;
Step two: determine the size of the prefetch data block, i.e. the number of samples it contains, used when data is prefetched from the underlying parallel file system, by jointly considering cache hit rate and disk access performance;
Step three: cache the data blocks into each node according to the prefetch data block size determined in step two, using the allocation that yields the highest cache hit rate;
Step four: in each round of training after the first, generate the random sequence of the next round in advance, and perform cache replacement on each node according to the random sequence to be accessed in the next round, the locally cached data, and the data used in the current round, caching the data to be used in the next round on the node ahead of time; repeat until training finishes.
Further, step two specifically comprises the following substeps:
(2.1) Denote the prefetch data block size by bi, where i is a natural number indicating the iteration; the initial prefetch data block size is b0 = N/M, where N is the total number of training samples and M is the number of parallel training nodes;
(2.2) The master node performs a simulated data allocation to each node according to the prefetch data block size and determines the system's cache hit rate hi for the current block size: hi = ni/N, where N is the total number of training samples and ni is the total number of samples hit across all nodes;
(2.3) After the local cache hit rate of each iteration is calculated, compare hi with h(i-1); if hi is larger than h(i-1), go to step (2.4); otherwise, take the prefetch data block size of the current iteration as the final prefetch data block size;
(2.4) If bi is less than bmin, the final prefetch data block size is b = bmin; otherwise, the prefetch data block size of the next iteration is half that of the previous iteration, and the procedure returns to step (2.2) to recalculate the cache hit rate; here bmin = s × t is the minimum prefetch data block size, s is the disk seek time, and t is the disk's sustained transfer rate.
Further, in step (2.2), the master node allocates data to each node according to the prefetch data block size, specifically:
The master node divides all training data into blocks of the prefetch data block size, obtaining n data blocks, and each node caches k = n/M blocks;
Starting from node 1, the master node traverses all data blocks, finds the k blocks containing the largest number of hit samples for node 1, and assigns them to node 1; it then traverses the remaining data blocks for node 2, finds the k blocks with the largest number of hit samples for node 2, assigns them to node 2, and so on; the data blocks assigned to the nodes are mutually exclusive;
After the final prefetch data block size is determined, the master node communicates the numbers of the k data blocks assigned to each node to the other nodes.
Further, in step four, performing cache replacement according to the random sequence to be accessed in the next round, each node's locally cached data, and the data used in the current round specifically includes: each node traverses the training data numbers assigned to it for the next round; if a training data number is not in the node's local cache, a remote request is initiated to exchange the data, and the training data is deleted on the remote node; if the training data number is already in the node's local cache, it is left unchanged.
Further, in each round of training, if a node would send multiple requests to the same node, the request merging module merges them into a single request before sending, avoiding repeatedly sending small requests to the same node.
Based on the method, the invention also provides a parallel deep learning training data input system based on sequence predictability, which comprises:
the random sequence generating module is used for generating the same global random sequence at each node;
the data prefetching module is used for determining the size of a prefetched data block and performing data block allocation and caching on each node according to the size of the prefetched data block;
and the cache replacement module is used for carrying out cache replacement in each node in advance according to the random sequence to be accessed in the next round, the locally cached data and the data used in the round, and caching the data to be used in the next round in the node in advance.
Further, the data prefetch module includes:
the prefetch granularity decision module is used for determining the size of a prefetch data block when data are prefetched from a bottom layer parallel file system by combining the cache hit rate and the disk access performance;
and the pre-fetching data block distribution module is used for distributing and caching data to each node according to the size of the pre-fetching data block.
Furthermore, the system also comprises a request merging module, used in each round of training to merge a node's multiple small data requests to the same destination node into one large request before sending.
Beneficial effects of the invention: during parallel training of a large-scale neural network, the speed of reading data from storage is increased while globally random data access is still guaranteed.
Drawings
FIG. 1 is a diagram illustrating a conventional data prefetching method;
FIG. 2 is a diagram illustrating a data prefetching method according to the present invention.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 shows a conventional data prefetching method. If the number of training nodes is M, each training node first divides all training data into M groups and caches its own group locally from the parallel file system. To simplify metadata management, data is not moved between training nodes during the subsequent training. As a result, the data prefetched from the underlying file system or placed in the cache does not match what the upper-layer training task needs, so the cache hit rate drops and data input is inefficient. In addition, across the many iterations within each round of training, when a node requests several pieces of data from the same remote node multiple times, the requests are not merged; multiple small data requests are issued directly.
On this basis, the invention designs an efficient data input system for large-scale deep learning training, comprising a random sequence generation module, a data prefetching module, and a cache replacement module; the data prefetching module comprises a prefetch granularity decision module and a prefetch data block allocation module. Fig. 2 is a schematic diagram of the efficient data input method for large-scale deep learning training based on sample sequence predictability provided by the invention. The invention makes full use of the fact that the data access sequence can be determined in advance: the prefetch data block size used when prefetching from the underlying parallel file system is chosen by jointly considering cache hit rate and disk access performance, and data is then cached accordingly, effectively improving the cache hit rate. As shown in Fig. 2, the efficient data input method of the invention is implemented as follows:
the method comprises the following steps: when the first round of training is started, the random sequence generation module generates the same global random sequence at each node (wherein each node uses the training round number as a random seed, so the generated random sequences are the same), and then respectively extracts the training data numbers belonging to the node according to a remainder mode.
Step two: the prefetch granularity decision module jointly considers cache hit rate and disk access performance and determines the prefetch data block size used when data is prefetched from the underlying parallel file system.
Specifically, a heuristic algorithm is employed to determine the prefetch data block size, mainly comprising the following steps:
(2.1) Let the prefetch data block size be bi, where i is a natural number indicating the iteration; the initial prefetch data block size is b0 = N/M, where N is the total number of training samples and M is the number of parallel training nodes.
(2.2) The prefetch data block allocation module allocates data to each node according to the prefetch data block size and determines the system's cache hit rate hi for the current block size: hi = ni/N, where N is the total number of training samples and ni is the total number of samples hit across all nodes.
Note that the prefetch data block size is not final at this point: the prefetch data block allocation module only performs a pre-allocation in order to obtain the cache hit rate hi corresponding to the current block size bi, and no data is actually cached.
(2.3) After the local cache hit rate of each iteration is calculated, compare hi with h(i-1). If hi is larger than h(i-1), go to step (2.4); otherwise, take the prefetch data block size of the current iteration as the final prefetch data block size.
(2.4) If bi < bmin, the final prefetch data block size is b = bmin; otherwise, the data block size of the next iteration is half that of the previous iteration, i.e. bi = b(i-1)/2, and the procedure returns to step (2.2) to recalculate the cache hit rate. Here bmin = s × t is the minimum prefetch data block size, s is the disk seek time, and t is the disk's sustained transfer rate.
Step (2.4) avoids degrading system performance through the large amount of random disk access that an overly small prefetch data block would cause; a sketch of the full heuristic follows.
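A compact sketch of the heuristic of steps (2.1) to (2.4) is given below, written in Python with illustrative names. `simulate_hit_rate` is a placeholder for the master node's simulated allocation of step (2.2); one possible form of that allocation is sketched after the allocation description further down.

```python
def decide_prefetch_block_size(total_samples, num_nodes, seek_time, transfer_rate,
                               simulate_hit_rate):
    """Shrink the prefetch block size while the simulated hit rate keeps improving.

    simulate_hit_rate(block_size) stands for the master node's pre-allocation of
    step (2.2) and returns hi = ni / N for that block size; no data is cached.
    """
    b = total_samples // num_nodes           # (2.1) b0 = N / M
    b_min = int(seek_time * transfer_rate)   # bmin = s * t (assumed in samples, like b)
    prev_hit = -1.0
    while True:
        hit = simulate_hit_rate(b)           # (2.2) hit rate for the current size
        if hit <= prev_hit:                  # (2.3) no improvement: keep this size
            return b
        if b < b_min:                        # (2.4) too small for efficient disk access
            return b_min
        prev_hit = hit
        b //= 2                              # (2.4) halve and re-simulate
```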
The master node distributes data to each node according to the prefetch data block size, specifically as follows:
The master node partitions all training data into blocks of the prefetch data block size, obtaining n data blocks, and each node caches k = n/M blocks.
Starting from node 1, the master node traverses all data blocks, finds the k blocks containing the largest number of hit samples for node 1, and assigns them to node 1; it then traverses the remaining data blocks for node 2, finds the k blocks with the largest number of hit samples for node 2, assigns them to node 2, and so on until every node has been assigned. Blocks already assigned to earlier nodes are not assigned again, so the data blocks finally assigned to the nodes are mutually exclusive.
Finally, after the prefetch data block size is determined, the master node communicates the numbers of the k data blocks assigned to each node to the other nodes. A sketch of this greedy allocation follows.
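The greedy, mutually exclusive allocation above could be sketched as follows (illustrative names, not the patent's). Because the routine also accumulates the number of samples each node would hit in its own blocks, the returned value corresponds to hi = ni/N from step (2.2), so the same function can serve as the simulated allocation assumed by the block-size heuristic.

```python
def allocate_blocks(num_samples, block_size, node_sequences):
    """Greedily assign mutually exclusive data blocks to nodes (master node side).

    node_sequences[r] holds the sample numbers node r will read in the first
    round (its share of the global random sequence). Blocks are scored by how
    many of a node's samples fall inside them; each node takes its k best
    still-unclaimed blocks.
    """
    num_nodes = len(node_sequences)
    num_blocks = (num_samples + block_size - 1) // block_size
    k = num_blocks // num_nodes               # blocks cached per node
    unclaimed = set(range(num_blocks))
    assignment, total_hits = {}, 0
    for rank, seq in enumerate(node_sequences):
        # Count how many of this node's samples land in each unclaimed block.
        hits = {blk: 0 for blk in unclaimed}
        for sample_id in seq:
            blk = sample_id // block_size     # block holding this sample
            if blk in hits:
                hits[blk] += 1
        # Take the k blocks with the most hits; they become exclusive to this node.
        best = sorted(unclaimed, key=lambda blk: hits[blk], reverse=True)[:k]
        assignment[rank] = best
        total_hits += sum(hits[blk] for blk in best)
        unclaimed -= set(best)
    hit_rate = total_hits / num_samples       # hi = ni / N from step (2.2)
    return assignment, hit_rate
```

In this sketch, once the final block size is fixed the master node would broadcast `assignment` (the block numbers held by each node) to the other nodes, mirroring the communication step above.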
Step three: according to the determined prefetch data block size, the prefetch data block allocation module selects the allocation with the highest hit rate and caches the data blocks into each computing node.
Step four: the random sequence generation module generates the random sequence of the next round in advance, and the cache replacement module performs cache replacement according to the random sequence to be accessed in the next round, the locally cached data, and the data used in the current round, caching the data to be used in the next round locally ahead of time; this repeats until training finishes.
As a preferred embodiment, in this step, performing cache replacement according to the random sequence to be accessed in the next round, each node's locally cached data, and the data used in the current round specifically includes:
Each node traverses the training data numbers assigned to it for the next round; if a training data number is not in the node's local cache, a remote request is initiated to exchange the data, and the training data is deleted on the remote node; if the training data number is already in the node's local cache, it is left unchanged. Because the global random sequences generated by all nodes are identical, every node knows which data was assigned to each node in the previous round, i.e. the cache contents of every node before the current round starts; in other words, every node holds a dynamically updated view of the global metadata distribution. Therefore, when a piece of training data is not present on a node, the node holding it can be located and a remote cache-replacement request can be issued, as sketched below.
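A minimal sketch of one node's side of this cache replacement follows; `owner_of` and `fetch_remote` are hypothetical helpers standing for, respectively, the locally computable lookup of which node currently holds a sample and the remote exchange (after which the remote side drops its copy).

```python
def prepare_next_round(next_ids, local_cache, owner_of, fetch_remote):
    """Pull in everything this node needs for the next round (step four).

    next_ids: training data numbers assigned to this node for the next round.
    local_cache: dict mapping sample number -> data cached on this node.
    owner_of(sample_id): node currently holding the sample, derivable locally
        because every node can replay the same global random sequences.
    fetch_remote(node, sample_id): placeholder for the exchange request; the
        remote node deletes its copy after transferring the data.
    """
    for sample_id in next_ids:
        if sample_id in local_cache:
            continue                                  # already local: leave it as is
        remote = owner_of(sample_id)                  # locate the current holder
        local_cache[sample_id] = fetch_remote(remote, sample_id)
```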
As a preferred embodiment, the efficient data input system further comprises a request merging module: during training, if a node would send multiple requests to the same node, the request merging module merges them and sends a single request, avoiding repeatedly sending small requests to the same node, reducing network transmission overhead, and improving training efficiency.
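Request merging can be as simple as grouping the outstanding sample numbers by the node that currently holds them, as in the sketch below (illustrative names; the actual transport of the batched request is left abstract). Combined with the previous sketch, a node would first collect its missing sample numbers, merge them per destination, and then issue one exchange per remote node.

```python
from collections import defaultdict

def merge_requests(missing_ids, owner_of):
    """Batch this node's remote fetches by destination (request merging module).

    Instead of issuing one small request per sample, all sample numbers held by
    the same remote node are combined into a single request for that node.
    """
    batched = defaultdict(list)
    for sample_id in missing_ids:
        batched[owner_of(sample_id)].append(sample_id)
    return dict(batched)          # one merged request per destination node
```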
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to those skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments exhaustively here, and obvious variations or modifications derived from the invention remain within its scope of protection.

Claims (7)

1. A parallel deep learning training data input method based on sequence predictability is characterized by comprising the following steps:
Step one: when the first round of training of the neural network starts, the same global random sequence is generated at each node, and each node obtains its respective training data numbers;
Step two: determine the size of the prefetch data block used when data is prefetched from the underlying parallel file system according to each node's training data numbers and by jointly considering cache hit rate and disk access performance; this specifically comprises the following substeps:
(2.1) Denote the prefetch data block size by bi, where i is a natural number indicating the iteration; the initial prefetch data block size is b0 = N/M, where N is the total number of training samples and M is the number of parallel training nodes;
(2.2) The master node performs a simulated data allocation to each node according to the prefetch data block size and determines the system's cache hit rate hi for the current block size: hi = ni/N, where N is the total number of training samples and ni is the total number of samples hit across all nodes;
(2.3) After the local cache hit rate of each iteration is calculated, compare hi with h(i-1); if hi is larger than h(i-1), go to step (2.4); otherwise, take the prefetch data block size of the current iteration as the final prefetch data block size;
(2.4) If bi is less than bmin, the final prefetch data block size is b = bmin; otherwise, the prefetch data block size of the next iteration is half that of the previous iteration, and the procedure returns to step (2.2) to recalculate the cache hit rate; here bmin = s × t is the minimum prefetch data block size, s is the disk seek time, and t is the disk's sustained transfer rate;
Step three: cache the data blocks into each node according to the prefetch data block size determined in step two, using the allocation that yields the highest cache hit rate;
Step four: in each round of training after the first, generate the random sequence of the next round in advance, and perform cache replacement on each node according to the random sequence to be accessed in the next round, the locally cached data, and the data used in the current round, caching the data to be used in the next round on the node ahead of time; repeat until training finishes.
2. The parallel deep learning training data input method based on sequence predictability according to claim 1, wherein in step (2.2), the master node performing a simulated data allocation to each node according to the prefetch data block size specifically comprises:
the master node divides all training data into blocks of the prefetch data block size, obtaining n data blocks, and each node caches k = n/M blocks;
starting from node 1, the master node traverses all data blocks, finds the k blocks containing the largest number of hit samples for node 1, and assigns them to node 1; it then traverses the remaining data blocks for node 2, finds the k blocks with the largest number of hit samples for node 2, assigns them to node 2, and so on; the data blocks assigned to the nodes are mutually exclusive;
finally, after the prefetch data block size is determined, the master node communicates the numbers of the k data blocks assigned to each node to the other nodes.
3. The parallel deep learning training data input method based on sequence predictability according to claim 1, wherein in step four, performing cache replacement according to the random sequence to be accessed in the next round, each node's locally cached data, and the data used in the current round specifically comprises:
each node traverses the training data numbers assigned to it for the next round; if a training data number is not in the node's local cache, a remote request is initiated to exchange the data, and the training data is deleted on the remote node; if the training data number is already in the node's local cache, it is left unchanged.
4. The parallel deep learning training data input method based on sequence predictability according to claim 1, wherein, in each round of training, a node's multiple requests to the same node are merged and transmitted as a single request.
5. A training data input system implementing the parallel deep learning training data input method based on sequence predictability of claim 1, comprising:
the random sequence generating module is used for generating the same global random sequence at each node;
the data prefetching module is used for determining the size of a prefetched data block and performing data block allocation and caching on each node according to the size of the prefetched data block;
and the cache replacement module is used for carrying out cache replacement in each node in advance according to the random sequence to be accessed in the next round, the locally cached data and the data used in the round, and caching the data to be used in the next round in the node in advance.
6. The training data input system of claim 5, wherein the data pre-fetch module comprises:
the prefetch granularity decision module is used for determining the size of a prefetch data block when data are prefetched from a bottom layer parallel file system by combining the cache hit rate and the disk access performance;
and the pre-fetching data block distribution module is used for distributing and caching data to each node according to the size of the pre-fetching data block.
7. The training data input system of claim 5, further comprising a request merging module for merging, in each round of training, a node's multiple small data requests to the same node into one large request for transmission.
CN202110062697.5A 2021-01-18 2021-01-18 Parallel deep learning training data input method and system based on sequence predictability Active CN112379849B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110062697.5A CN112379849B (en) 2021-01-18 2021-01-18 Parallel deep learning training data input method and system based on sequence predictability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110062697.5A CN112379849B (en) 2021-01-18 2021-01-18 Parallel deep learning training data input method and system based on sequence predictability

Publications (2)

Publication Number Publication Date
CN112379849A CN112379849A (en) 2021-02-19
CN112379849B (en) 2021-04-09

Family

ID=74582007

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110062697.5A Active CN112379849B (en) 2021-01-18 2021-01-18 Parallel deep learning training data input method and system based on sequence predictability

Country Status (1)

Country Link
CN (1) CN112379849B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115563499A (en) * 2021-12-02 2023-01-03 华为技术有限公司 Method, device and system for training model and computing node
CN114968588A (en) * 2022-06-07 2022-08-30 之江实验室 Data caching method and device for multi-concurrent deep learning training task
CN116303974B (en) * 2023-05-04 2023-08-01 之江实验室 Response method and device based on target generation type response language model
CN116501696B (en) * 2023-06-30 2023-09-01 之江实验室 Method and device suitable for distributed deep learning training prefetching cache management

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180314981A1 (en) * 2017-04-28 2018-11-01 Cisco Technology, Inc. Data sovereignty compliant machine learning
CN110018970A (en) * 2018-01-08 2019-07-16 腾讯科技(深圳)有限公司 Cache prefetching method, apparatus, equipment and computer readable storage medium
CN111126619A (en) * 2019-12-06 2020-05-08 苏州浪潮智能科技有限公司 Machine learning method and device

Also Published As

Publication number Publication date
CN112379849A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
CN112379849B (en) Parallel deep learning training data input method and system based on sequence predictability
CN108710639B (en) Ceph-based access optimization method for mass small files
CN103885728B (en) A kind of disk buffering system based on solid-state disk
US11928580B2 (en) Interleaving memory requests to accelerate memory accesses
CN104063330B (en) Data prefetching method and device
KR20130020050A (en) Apparatus and method for managing bucket range of locality sensitivie hash
CN106528451B (en) The cloud storage frame and construction method prefetched for the L2 cache of small documents
CN112667528A (en) Data prefetching method and related equipment
CN107368608A (en) The HDFS small documents buffer memory management methods of algorithm are replaced based on ARC
CN115712583B (en) Method, device and medium for improving distributed cache cross-node access performance
CN112486994A (en) Method for quickly reading data of key value storage based on log structure merging tree
Choi et al. Learning future reference patterns for efficient cache replacement decisions
CN107426315A (en) A kind of improved method of the distributed cache system Memcached based on BP neural network
CN112199304A (en) Data prefetching method and device
CN113821477A (en) Metadata caching method, system, equipment and medium
CN113064907A (en) Content updating method based on deep reinforcement learning
CN109168023B (en) Method for caching scalable video stream
CN105530303B (en) A kind of network-caching linear re-placement method
CN110381540A (en) The dynamic buffering update method of real-time response time-varying file popularity based on DNN
WO2017049488A1 (en) Cache management method and apparatus
CN107015865B (en) DRAM cache management method and system based on time locality
WO2022148306A1 (en) Data elimination method and apparatus, cache node, and cache system
CN114462590B (en) Importance-aware deep learning data cache management method and system
CN117076415A (en) Distributed deep learning caching method based on sample importance sampling
CN110362399B (en) Plant root system optimization method suitable for cloud storage copy layout

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant