CN110245094A - Deep-learning-based block-level cache prefetching optimization method and system - Google Patents
Deep-learning-based block-level cache prefetching optimization method and system Download PDF Info
- Publication number
- CN110245094A (application CN201910526384.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- memory
- prediction
- module
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
- G06F12/0868—Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention discloses a deep-learning-based block-level cache prefetching optimization method, comprising: obtaining byte-granularity I/O data from a test data set and converting it into block-granularity I/O data; judging whether the converted I/O data hits in the cache, and on a miss, performing sequential prediction on the converted I/O data to obtain multiple memory blocks; storing the converted I/O data in an in-memory IO queue and judging whether the queue is full; if it is, inputting all I/O data in the queue into a trained Seq2Seq model to obtain predicted I/O data, and deriving the corresponding memory blocks from the predicted I/O data. The present invention mines I/O correlations with deep learning, completes I/O-sequence prediction with an LSTM-based Seq2Seq model, and finally combines sequence prediction with sequential prediction to carry out cache prefetching and improve the cache hit rate.
Description
Technical field
The invention belongs to the technical field of computer storage, and more particularly relates to a deep-learning-based block-level cache prefetching optimization method and system.
Background technique
In the big-data era, the performance of storage systems is becoming ever more important. To further improve that performance, block-level accesses to local disks need to be prefetched.
Current block-level-access prefetching approaches mainly include fixed prefetching, sequential prefetching, hint-based prefetching, and data-mining-based prefetching. Fixed prefetching is the simplest to implement: it merely prefetches the n data items following the current access. Sequential prefetching improves on fixed prefetching by automatically adjusting the prefetch timing and the prefetch amount according to data-access characteristics. Hint-based prefetching exposes an interface to upper-layer applications and prefetches according to their instructions. Data-mining-based prefetching completes prefetching by mining historical access data; for example, the C-Miner method mines the correlations of data blocks by finding frequent subsequences and converts them into association rules to drive prefetching, while the Block2Vec method mines the correlations of memory blocks with deep learning and completes prefetching by computing the cosine distance between block vectors.
However, these block-level-access prefetching approaches all have non-negligible defects. Fixed and sequential prefetching exploit only the locality principle of data access, so both their prefetching efficiency and prefetching accuracy are relatively low. Hint-based prefetching requires providing upper-layer applications with an interface for conveying prefetch instructions, which many systems cannot support. Data-mining-based prefetching has rather limited ability to mine data-access characteristics, and its prefetching accuracy varies widely across access patterns.
Summary of the invention
In view of the above defects or improvement needs of the prior art, the present invention provides a deep-learning-based block-level cache prefetching optimization method. Its purpose is to mine I/O correlations using deep learning, complete I/O-sequence prediction with an LSTM-based Seq2Seq model, and finally combine sequence prediction with sequential prediction to carry out cache prefetching and improve the cache hit rate.
To achieve the above object, according to one aspect of the present invention, a deep-learning-based block-level cache prefetching optimization method is provided, comprising:
(1) obtaining byte-granularity I/O data from a test data set and converting it into block-granularity I/O data;
(2) judging whether the converted I/O data hits in the cache; if so, the process ends, otherwise proceeding to step (3);
(3) performing sequential prediction on the converted I/O data to obtain multiple memory blocks;
(4) storing the converted I/O data in an in-memory IO queue and judging whether the queue is full; if so, proceeding to step (5), otherwise returning to step (1);
(5) inputting all I/O data in the in-memory IO queue into a trained LSTM-based Seq2Seq model to obtain predicted I/O data, and deriving the corresponding memory blocks from the predicted I/O data;
(6) replacing memory blocks in the cache with the memory blocks obtained in steps (3) and (5), according to a cache replacement algorithm.
Preferably, the I/O data converted in step (1) is represented jointly by its logical block address LBA and its block count BlockNum.
Preferably, deriving the corresponding memory blocks from the predicted I/O data consists of first parsing the I/O data to obtain its logical block address LBA and its block count BlockNum, and then obtaining all memory blocks covered by the predicted I/O data.
Preferably, judging whether I/O data hits in the cache consists of checking whether all of its memory blocks hit in the cache; if they do, the I/O data hits in the cache, otherwise it does not.
Preferably, performing sequential prediction on I/O data consists of predicting the memory blocks about to be accessed from the logical address of the last memory block of the I/O data.
Preferably, the LSTM-based Seq2Seq model used in step (5) is trained through the following steps:
(5-1) obtaining a training data set and converting its byte-granularity I/O data into block-granularity I/O data;
(5-2) segmenting all I/O data obtained in step (5-1) to obtain multiple segments, and storing all segmented I/O data;
(5-3) processing the segmented I/O data with a word-vector generation algorithm to obtain the vector corresponding to each I/O datum;
(5-4) performing non-overlapping sequence sampling on the segmented I/O data to obtain multiple sequence samples;
(5-5) using the LSTM-based Seq2Seq model and the vectors obtained in step (5-3), repeatedly training on the sequence samples obtained in step (5-4) until a preset number of iterations is reached.
Preferably, step (5-3) specifically uses the continuous bag-of-words model of the Word2vec algorithm, and the vector dimension is 50.
Preferably, the cache replacement algorithm used in step (6) may be the LRU, FIFO, or ARC algorithm.
According to another aspect of the present invention, a deep-learning-based block-level cache prefetching optimization system is provided, comprising:
a first module for obtaining byte-granularity I/O data from a test data set and converting it into I/O data whose unit is the memory block;
a second module for judging whether the converted I/O data hits in the cache; if so, the process ends, otherwise control passes to the third module;
a third module for performing sequential prediction on the converted I/O data to obtain multiple memory blocks;
a fourth module for storing the converted I/O data in an in-memory IO queue and judging whether the queue is full; if so, control passes to the fifth module, otherwise it returns to the first module;
a fifth module for inputting all I/O data in the in-memory IO queue into a trained LSTM-based Seq2Seq model to obtain predicted I/O data, and deriving the corresponding memory blocks from the predicted I/O data;
a sixth module for replacing memory blocks in the cache with the memory blocks obtained by the third and fifth modules, according to a cache replacement algorithm.
In general, compared with the prior art, the above technical solutions conceived by the present invention achieve the following beneficial effects:
(1) The present invention solves the relatively low prefetching efficiency and accuracy of existing methods: because steps (3) to (6) combine sequential prediction with LSTM-based Seq2Seq prediction, they fully exploit both the locality principle of data access and the correlation of data, so both the efficiency and the accuracy of prefetching are improved.
(2) The present invention removes the need, present in existing methods, to provide upper-layer applications with an interface for conveying prefetch instructions: since the whole process of steps (1) to (6) is isolated from upper-layer applications and requires no instructions from them, no such interface is needed.
(3) The present invention solves the large variance in prediction accuracy across access patterns seen in existing methods: by detecting cache hits in steps (2) to (5) and combining sequential prediction with LSTM-based Seq2Seq prediction, it adapts to different access patterns and achieves good prefetching accuracy under all of them.
(4) The present invention predicts I/O sequences with an LSTM-based Seq2Seq model; compared with computing cosine distances between vectors, this both improves prediction efficiency and reduces memory overhead.
(5) By training I/O vectors with the IO2Vec technique, the present invention not only mines I/O correlations but also achieves I/O dimensionality reduction. Compared with plain one-hot vectors, the IO2Vec-trained I/O vectors greatly reduce the time and memory consumed during model training and improve the model's accuracy.
(6) In cache tests, the prefetching algorithm designed in the present invention improves the hit rate by 10%-30% on average over a cache with no prefetching, and by 3%-10% over a cache using only sequential prefetching.
Detailed description of the invention
Fig. 1 is a flowchart of the deep-learning-based block-level cache prefetching optimization method of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further elaborated below with reference to the accompanying drawing and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it. Moreover, the technical features involved in the embodiments described below may be combined with each other as long as they do not conflict.
As shown in Fig. 1, the present invention provides a deep-learning-based block-level cache prefetching optimization method comprising the following steps:
(1) obtaining byte-granularity input/output (IO) data from a test data set and converting it into I/O data whose unit is the block;
Specifically, the storage block size of the converted I/O data in this step is 4096 KB.
The converted I/O data is represented jointly by its logical block address (LBA) and its block count (BlockNum).
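Under the stated (LBA, BlockNum) representation, the conversion of step (1) can be sketched as follows. This is an illustrative assumption, not the patent's code: the byte-offset/length input format is assumed, and a 4096-byte block size is used here purely for demonstration (the embodiment itself states a storage block size of 4096 KB).

```python
BLOCK_SIZE = 4096  # illustrative block size in bytes


def to_block_io(byte_offset, byte_length, block_size=BLOCK_SIZE):
    """Convert a byte-granularity IO into its (LBA, BlockNum) representation."""
    lba = byte_offset // block_size                         # logical block address
    last_block = (byte_offset + byte_length - 1) // block_size
    block_num = last_block - lba + 1                        # number of blocks touched
    return lba, block_num
```

For example, a 5000-byte IO starting at byte 100 straddles two 4096-byte blocks, so it converts to LBA 0 with BlockNum 2.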
(2) judging whether the converted I/O data hits in the cache; if so, the process ends, otherwise proceeding to step (3);
Specifically, judging whether I/O data hits in the cache means checking whether all of its memory blocks hit in the cache; only if they all do is the I/O data considered a cache hit, otherwise it is a miss.
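The all-blocks-must-hit rule just described can be sketched as a one-line predicate (the set-like cache interface is an assumption):

```python
def io_hits_in_cache(lba, block_num, cache):
    """An IO hits only if every one of its memory blocks is in the cache."""
    return all(b in cache for b in range(lba, lba + block_num))
```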
(3) performing sequential prediction on the converted I/O data to obtain multiple memory blocks;
Specifically, sequential prediction predicts the memory blocks about to be accessed from the logical address of the last memory block of the I/O data (in the present invention 4 to 16 blocks are predicted, the exact number depending on the sequence detection result). For example, if the logical address of the last memory block of the current I/O data is 10, the predicted logical addresses are 11, 12, 13, and so on, each corresponding to a memory block.
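A minimal sketch of this sequential predictor, using a default depth of 4 (the patent allows 4 to 16, chosen from the sequence detection result):

```python
def sequential_predict(last_lba, depth=4):
    """Predict the next `depth` consecutive block addresses after the IO's last block."""
    return [last_lba + i for i in range(1, depth + 1)]
```

With the example above, a last block address of 10 yields predicted blocks 11, 12, 13, 14.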
(4) storing the converted I/O data in an in-memory IO queue and judging whether the queue is full; if so, proceeding to step (5), otherwise returning to step (1);
(5) inputting all I/O data in the in-memory IO queue into a trained sequence-to-sequence (Seq2Seq) model based on long short-term memory (LSTM) to obtain predicted I/O data, and deriving the corresponding memory blocks from the predicted I/O data;
Specifically, deriving the corresponding memory blocks from the predicted I/O data consists of first parsing the I/O data to obtain its logical block address (LBA) and its block count (BlockNum), and then obtaining all memory blocks covered by the predicted I/O data.
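Step (5) can be sketched as below, with a stub callable standing in for the trained LSTM-based Seq2Seq model; the assumption that the model emits (LBA, BlockNum) pairs follows the parsing rule just described, but the interface name and shape are illustrative.

```python
def blocks_of(lba, block_num):
    """Parse one predicted IO into the memory blocks it covers."""
    return list(range(lba, lba + block_num))


def predict_blocks(io_queue, model):
    """Feed the whole IO queue to the predictor and expand its output into blocks."""
    predicted_ios = model(io_queue)          # assumed output: [(LBA, BlockNum), ...]
    blocks = []
    for lba, n in predicted_ios:
        blocks.extend(blocks_of(lba, n))
    return blocks
```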
The LSTM-based Seq2Seq model used in this step is trained through the following steps:
(5-1) obtaining a training data set and converting its byte-granularity I/O data into block-granularity I/O data;
(5-2) segmenting all I/O data obtained in step (5-1) to obtain multiple segments, and storing all segmented I/O data;
Specifically, the segmentation gap in this step is 10 milliseconds.
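The 10 ms segmentation of step (5-2) could be sketched as follows; the trace layout of (timestamp, LBA, BlockNum) tuples with timestamps in seconds is an assumption for illustration.

```python
GAP = 0.010  # segmentation gap: 10 milliseconds


def segment_trace(ios, gap=GAP):
    """Split a time-sorted IO trace into segments wherever consecutive IOs
    are more than `gap` seconds apart."""
    segments, current = [], []
    for io in ios:
        if current and io[0] - current[-1][0] > gap:
            segments.append(current)             # close the current segment
            current = []
        current.append(io)
    if current:
        segments.append(current)
    return segments
```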
(5-3) processing the segmented I/O data with the word-vector generation algorithm word2vec to obtain the vector corresponding to each I/O datum;
In this step, the continuous bag-of-words (CBOW) model of the Word2vec algorithm is used, and the vector dimension is 50.
(5-4) performing non-overlapping sequence sampling on the segmented I/O data to obtain multiple sequence samples;
Specifically, the sequence length used during sampling is 3; non-overlapping sampling means that the I/O data of two consecutive samples do not intersect.
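Non-overlapping sampling with the stated sequence length of 3 can be sketched as a strided slice over each segment, so consecutive samples share no IOs:

```python
SEQ_LEN = 3  # sequence length used during sampling


def sample_sequences(segment, seq_len=SEQ_LEN):
    """Draw non-overlapping length-`seq_len` samples; a trailing remainder
    shorter than seq_len is dropped."""
    return [segment[i:i + seq_len]
            for i in range(0, len(segment) - seq_len + 1, seq_len)]
```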
(5-5) using the LSTM-based Seq2Seq model and the vectors obtained in step (5-3), repeatedly training on the sequence samples obtained in step (5-4) until a preset number of iterations is reached;
Specifically, the preset number of iterations in this step is 1 to 1000, preferably 500.
(6) replacing memory blocks in the cache with the memory blocks obtained in steps (3) and (5), according to a cache replacement algorithm.
Specifically, the cache replacement algorithm used in the present invention may be the least recently used (LRU) algorithm, the first-in-first-out (FIFO) algorithm, or the adaptive replacement cache (ARC) algorithm.
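As one example of step (6), a minimal LRU replacement policy (the patent equally allows FIFO or ARC) might be sketched as below; prefetched blocks are inserted as most-recently-used, evicting the least-recently-used block when the cache is full. The class shape is illustrative, not the patent's implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU block cache: membership test refreshes recency."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def __contains__(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)       # touch on hit
            return True
        return False

    def insert(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
            self.blocks[block] = True
```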
The present invention applies deep learning to cache prefetching optimization and experimentally confirms its validity, which is of innovative significance in this technical field.
As those skilled in the art will readily appreciate, the foregoing describes only preferred embodiments of the present invention and does not limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (9)
1. A deep-learning-based block-level cache prefetching optimization method, characterized by comprising the following steps:
(1) obtaining byte-granularity I/O data from a test data set and converting it into block-granularity I/O data;
(2) judging whether the converted I/O data hits in the cache; if so, the process ends, otherwise proceeding to step (3);
(3) performing sequential prediction on the converted I/O data to obtain multiple memory blocks;
(4) storing the converted I/O data in an in-memory IO queue and judging whether the queue is full; if so, proceeding to step (5), otherwise returning to step (1);
(5) inputting all I/O data in the in-memory IO queue into a trained LSTM-based Seq2Seq model to obtain predicted I/O data, and deriving the corresponding memory blocks from the predicted I/O data;
(6) replacing memory blocks in the cache with the memory blocks obtained in steps (3) and (5), according to a cache replacement algorithm.
2. The block-level cache prefetching optimization method according to claim 1, characterized in that the I/O data converted in step (1) is represented jointly by its logical block address LBA and its block count BlockNum.
3. The block-level cache prefetching optimization method according to claim 2, characterized in that deriving the corresponding memory blocks from the predicted I/O data consists of first parsing the I/O data to obtain its logical block address LBA and its block count BlockNum, and then obtaining all memory blocks covered by the predicted I/O data.
4. The block-level cache prefetching optimization method according to claim 1, characterized in that judging whether I/O data hits in the cache consists of checking whether all of its memory blocks hit in the cache; if they do, the I/O data hits in the cache, otherwise it does not.
5. The block-level cache prefetching optimization method according to claim 1, characterized in that performing sequential prediction on I/O data consists of predicting the memory blocks to be accessed from the logical address of the last memory block of the I/O data.
6. The block-level cache prefetching optimization method according to claim 1, characterized in that the LSTM-based Seq2Seq model used in step (5) is trained through the following steps:
(5-1) obtaining a training data set and converting its byte-granularity I/O data into block-granularity I/O data;
(5-2) segmenting all I/O data obtained in step (5-1) to obtain multiple segments, and storing all segmented I/O data;
(5-3) processing the segmented I/O data with a word-vector generation algorithm to obtain the vector corresponding to each I/O datum;
(5-4) performing non-overlapping sequence sampling on the segmented I/O data to obtain multiple sequence samples;
(5-5) using the LSTM-based Seq2Seq model and the vectors obtained in step (5-3), repeatedly training on the sequence samples obtained in step (5-4) until a preset number of iterations is reached.
7. The block-level cache prefetching optimization method according to claim 6, characterized in that step (5-3) specifically uses the continuous bag-of-words model of the word2vec algorithm, and the vector dimension is 50.
8. The block-level cache prefetching optimization method according to claim 1, characterized in that the cache replacement algorithm used in step (6) may be the LRU, FIFO, or ARC algorithm.
9. A deep-learning-based block-level cache prefetching optimization system, characterized by comprising:
a first module for obtaining byte-granularity I/O data from a test data set and converting it into I/O data whose unit is the memory block;
a second module for judging whether the converted I/O data hits in the cache; if so, the process ends, otherwise control passes to the third module;
a third module for performing sequential prediction on the converted I/O data to obtain multiple memory blocks;
a fourth module for storing the converted I/O data in an in-memory IO queue and judging whether the queue is full; if so, control passes to the fifth module, otherwise it returns to the first module;
a fifth module for inputting all I/O data in the in-memory IO queue into a trained LSTM-based Seq2Seq model to obtain predicted I/O data, and deriving the corresponding memory blocks from the predicted I/O data;
a sixth module for replacing memory blocks in the cache with the memory blocks obtained by the third and fifth modules, according to a cache replacement algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910526384.3A CN110245094B (en) | 2019-06-18 | 2019-06-18 | Block-level cache prefetching optimization method and system based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910526384.3A CN110245094B (en) | 2019-06-18 | 2019-06-18 | Block-level cache prefetching optimization method and system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110245094A true CN110245094A (en) | 2019-09-17 |
CN110245094B CN110245094B (en) | 2020-12-29 |
Family
ID=67887848
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910526384.3A Active CN110245094B (en) | 2019-06-18 | 2019-06-18 | Block-level cache prefetching optimization method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110245094B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114065947A (en) * | 2021-11-15 | 2022-02-18 | 深圳大学 | Data access speculation method and device, storage medium and electronic equipment |
TWI763168B (en) * | 2019-12-30 | 2022-05-01 | 大陸商上海商湯智能科技有限公司 | Data processing method and apparatus, computer device, storage medium |
CN116822657A (en) * | 2023-08-25 | 2023-09-29 | 之江实验室 | Method and device for accelerating model training, storage medium and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105786717A (en) * | 2016-03-22 | 2016-07-20 | 华中科技大学 | DRAM (dynamic random access memory)-NVM (non-volatile memory) hierarchical heterogeneous memory access method and system adopting software and hardware collaborative management |
CN106021128A (en) * | 2016-05-31 | 2016-10-12 | 东南大学—无锡集成电路技术研究所 | Data prefetcher based on correlation of strides and data and prefetching method of data prefetcher |
CN106681990A (en) * | 2015-11-05 | 2017-05-17 | 华中科技大学 | Method for reading caching data under mobile cloud storage environment |
US20180315158A1 (en) * | 2017-04-28 | 2018-11-01 | Intel Corporation | Programmable coarse grained and sparse matrix compute hardware with advanced scheduling |
CN109639760A (en) * | 2018-11-02 | 2019-04-16 | 西北工业大学 | It is a kind of based on deeply study D2D network in cache policy method |
US20190130101A1 (en) * | 2018-12-27 | 2019-05-02 | Li Chen | Methods and apparatus for detecting a side channel attack using hardware performance counters |
CN109788566A (en) * | 2019-01-18 | 2019-05-21 | 南京邮电大学 | Network resource allocation method based on depth enhancing study |
- 2019-06-18: CN application CN201910526384.3A filed; patent CN110245094B (en), status Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106681990A (en) * | 2015-11-05 | 2017-05-17 | 华中科技大学 | Method for reading caching data under mobile cloud storage environment |
CN105786717A (en) * | 2016-03-22 | 2016-07-20 | 华中科技大学 | DRAM (dynamic random access memory)-NVM (non-volatile memory) hierarchical heterogeneous memory access method and system adopting software and hardware collaborative management |
CN106021128A (en) * | 2016-05-31 | 2016-10-12 | 东南大学—无锡集成电路技术研究所 | Data prefetcher based on correlation of strides and data and prefetching method of data prefetcher |
US20180315158A1 (en) * | 2017-04-28 | 2018-11-01 | Intel Corporation | Programmable coarse grained and sparse matrix compute hardware with advanced scheduling |
CN109639760A (en) * | 2018-11-02 | 2019-04-16 | 西北工业大学 | It is a kind of based on deeply study D2D network in cache policy method |
US20190130101A1 (en) * | 2018-12-27 | 2019-05-02 | Li Chen | Methods and apparatus for detecting a side channel attack using hardware performance counters |
CN109788566A (en) * | 2019-01-18 | 2019-05-21 | 南京邮电大学 | Network resource allocation method based on depth enhancing study |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI763168B (en) * | 2019-12-30 | 2022-05-01 | 大陸商上海商湯智能科技有限公司 | Data processing method and apparatus, computer device, storage medium |
CN114065947A (en) * | 2021-11-15 | 2022-02-18 | 深圳大学 | Data access speculation method and device, storage medium and electronic equipment |
CN116822657A (en) * | 2023-08-25 | 2023-09-29 | 之江实验室 | Method and device for accelerating model training, storage medium and electronic equipment |
CN116822657B (en) * | 2023-08-25 | 2024-01-09 | 之江实验室 | Method and device for accelerating model training, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN110245094B (en) | 2020-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103488463B (en) | The renewal of branch history register is suppressed by loop-ending branch | |
CN110245094A (en) | Deep-learning-based block-level cache prefetching optimization method and system | |
US6611910B2 (en) | Method for processing branch operations | |
US8490065B2 (en) | Method and apparatus for software-assisted data cache and prefetch control | |
US7958317B2 (en) | Cache directed sequential prefetch | |
EP1150213B1 (en) | Data processing system and method | |
JP3618385B2 (en) | Method and system for buffering data | |
CN101002178A (en) | System, apparatus and method for issuing predictions from an inventory to access a memory | |
US20030079088A1 (en) | Prefetching mechanism for data caches | |
EP1271308A2 (en) | Apparatus and method for branch prediction based on history table | |
CN104657286B (en) | A kind of classification storage method and device | |
US20070150660A1 (en) | Inserting prefetch instructions based on hardware monitoring | |
US7047362B2 (en) | Cache system and method for controlling the cache system comprising direct-mapped cache and fully-associative buffer | |
CN111324556B (en) | Method and system for prefetching a predetermined number of data items into a cache | |
CN102707933B (en) | Method and apparatus for managing return stack | |
CN102163144A (en) | Hardware data pre-fetching method of embedded processor | |
CN104572026B (en) | Data processing method and device for being prefetched | |
US20210124586A1 (en) | Apparatus and method for handling incorrect branch direction predictions | |
CN107844380A (en) | A kind of multi-core buffer WCET analysis methods for supporting instruction prefetch | |
Feng et al. | Dynamic access distance driven cache replacement | |
CN117609110A (en) | Caching method, cache, electronic device and readable storage medium | |
US11561796B2 (en) | Linked miss-to-miss instruction prefetcher | |
CN108874690A (en) | The implementation method and processor of data pre-fetching | |
CN115934170A (en) | Prefetching method and device, prefetching training method and device, and storage medium | |
CN102662720A (en) | Optimization method of compiler of multi-issue embedded processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||