CN114637609B - Data acquisition system of GPU (graphic processing Unit) based on conflict detection - Google Patents


Info

Publication number
CN114637609B
CN114637609B (Application CN202210546502.9A)
Authority
CN
China
Prior art keywords
data acquisition
data
cache
requests
address information
Prior art date
Legal status
Active
Application number
CN202210546502.9A
Other languages
Chinese (zh)
Other versions
CN114637609A (en)
Inventor
Inventor not disclosed
Current Assignee
Muxi Integrated Circuit Shanghai Co ltd
Original Assignee
Muxi Integrated Circuit Shanghai Co ltd
Priority date
Filing date
Publication date
Application filed by Muxi Integrated Circuit Shanghai Co ltd
Priority to CN202210546502.9A
Publication of CN114637609A
Application granted
Publication of CN114637609B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F 12/0646 Configuration or reconfiguration
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F 5/06 Methods or arrangements for data conversion for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • G06F 5/065 Partitioned buffers, e.g. allowing multiple independent queues, bidirectional FIFO's
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a conflict-detection-based data acquisition system of a GPU (Graphics Processing Unit), comprising a conflict detection module and P caches {C_1, C_2, …, C_P} located in the GPU, where C_p denotes the p-th cache. The conflict detection module acquires M first data acquisition requests, fuses the first requests that share the same virtual address information through conflict detection to generate N second data acquisition requests, and outputs each second request directionally to its corresponding cache; the second-request queue L_p corresponding to C_p is {R_1^p, R_2^p, …, R_{S_p}^p}. All P caches are connected to the memory and process their corresponding second requests in parallel: C_p allocates S_p cycles, one cycle for each R_i^p, and processes each R_i^p in L_p one by one. The invention improves the data acquisition efficiency of the GPU.

Description

Data acquisition system of GPU (graphic processing Unit) based on conflict detection
Technical Field
The invention relates to the technical field of GPU data processing, in particular to a data acquisition system of a GPU based on conflict detection.
Background
When a GPU (Graphics Processing Unit) accesses data in memory, a scheduler typically issues a data acquisition request (request), and the data is fetched from memory based on that request. Because fetching data directly from external memory is slow and cannot match the scheduler's processing speed, a cache matched to the scheduler's speed is usually placed between the scheduler and the memory, and the data is obtained through the cache. To improve data acquisition efficiency, the scheduler generally issues multiple data access requests at once; however, since a cache can only read one cache line per cycle, when the scheduler issues multiple data acquisition requests that map to multiple cache lines, a conflict exists, multiple cycles are required to complete the access operation, and data acquisition efficiency is low.
Disclosure of Invention
The invention aims to provide a conflict-detection-based data acquisition system for a GPU that improves the GPU's data acquisition efficiency.
The invention provides a conflict-detection-based data acquisition system of a GPU (Graphics Processing Unit), comprising a conflict detection module and P caches {C_1, C_2, …, C_P} located in the GPU, where C_p denotes the p-th cache, p ranges from 1 to P, and P is a positive integer greater than or equal to 2;
the conflict detection module is configured to acquire M first data acquisition requests and, through conflict detection, fuse the first requests having the same virtual address information to generate N second data acquisition requests, where N is less than or equal to M, and to output each second request directionally to its corresponding cache; the second-request queue L_p corresponding to C_p is {R_1^p, R_2^p, …, R_{S_p}^p}, where S_p is the number of second data acquisition requests corresponding to C_p, R_i^p denotes the i-th second request in L_p, 0 ≤ S_p ≤ N, and i ranges from 1 to S_p;
the P caches are all connected to the memory and process their corresponding second requests in parallel; C_p allocates S_p cycles, one cycle for each R_i^p, and processes each R_i^p in L_p one by one.
Compared with the prior art, the invention has obvious advantages and beneficial effects. With the above technical scheme, the conflict-detection-based data acquisition system of the GPU achieves considerable technical progress and practicability, has wide industrial value, and at least offers the following advantage:
according to the method and the device, through the conflict detection of the M first data acquisition requests, the first data acquisition requests with the same virtual address information are fused, the number of the data acquisition requests is reduced, P parallel caches are set, N second data acquisition requests are processed in parallel, and the data acquisition efficiency of the GPU is improved.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical means of the present invention more clearly understood, the present invention may be implemented in accordance with the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more clearly understood, the following preferred embodiments are described in detail with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic diagram of a conflict-detection-based data acquisition system of a GPU according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a GPU data processing system based on a multi-input single-output FIFO structure according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means and effects adopted by the present invention to achieve its intended objects, a conflict-detection-based data acquisition system of a GPU and its effects are described in detail below with reference to the accompanying drawings and preferred embodiments.
Embodiment 1
An embodiment of the present invention provides a conflict-detection-based data acquisition system of a GPU, as shown in Fig. 1, comprising a conflict detection module and P caches {C_1, C_2, …, C_P} located in the GPU, where C_p denotes the p-th cache, p ranges from 1 to P, and P is a positive integer greater than or equal to 2.
The conflict detection module is configured to acquire M first data acquisition requests and, through conflict detection, fuse the first requests having the same virtual address information to generate N second data acquisition requests, where N is less than or equal to M, and to output each second request directionally to its corresponding cache. The second-request queue L_p corresponding to C_p is {R_1^p, R_2^p, …, R_{S_p}^p}, where S_p is the number of second data acquisition requests corresponding to C_p, R_i^p denotes the i-th second request in L_p, 0 ≤ S_p ≤ N, and i ranges from 1 to S_p. As shown in the example of Fig. 1, the M first requests may be issued by the scheduler. It should be noted that a cache can only process one cache line per cycle, and first requests with the same virtual address information need to access the same cache line; therefore, fusing first requests with the same virtual address information reduces the number of second requests, reduces the number of cycles required, and improves the GPU's data processing efficiency.
The P caches are all connected to the memory and process their corresponding second requests in parallel; C_p allocates S_p cycles, one cycle for each R_i^p, and processes each R_i^p in L_p one by one. It should be noted that the memory may be inside the GPU or outside it, for example the memory of a Central Processing Unit (CPU), depending on the specific application scenario.
As an embodiment, the first data acquisition request includes a target byte count, offset information, cache identification information, and virtual address information. The target byte count may be set through access format information: for example, format BYTE corresponds to one byte, WORD to two bytes, DWORD to four bytes, and so on. The offset information, cache identification information and virtual address information are stored as U-bit data in a preset format: bits 0 to a store the offset information, bits a+1 to b store the cache identification information, and bits b+1 to U-1 store the virtual address information, where a and b satisfy 2^a = W and 2^(b-a-1) = P, and W is the bit width of the cache. To give a specific example: the cache may be set to a width of 128 bytes and a depth of 64 lines, with U = 48 (bits 0-47). Bits 0-6 may then be set as the offset; once the corresponding cache line is located, the start position of the data to be acquired is determined based on the offset. Assuming 4 caches are set (P = 4) and the scheduler issues 16 first data acquisition requests at a time (M = 16), bits 7-8 may be set as the cache identification information and bits 9-47 as the virtual address information; placing the address information in the high bits and the offset information in the low bits is convenient for data acquisition.
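This bit layout can be sketched in software (a hypothetical model; the patent describes a hardware register format). The example values W = 128 bytes, P = 4 and U = 48 are used, so the offset occupies bits 0-6, the cache identification bits 7-8 and the virtual address bits 9-47:

```python
# Sketch of the 48-bit request word from the example above
# (hypothetical software model; field widths follow the text:
# offset = bits 0-6, cache id = bits 7-8, virtual address = bits 9-47).

OFFSET_BITS = 7      # 2^7 = 128 = W, the cache width in bytes
CACHE_ID_BITS = 2    # 2^2 = 4 = P, the number of caches
U = 48               # total width of the packed word

def pack_request(offset, cache_id, vaddr):
    """Pack offset, cache id and virtual address into one U-bit word."""
    assert 0 <= offset < (1 << OFFSET_BITS)
    assert 0 <= cache_id < (1 << CACHE_ID_BITS)
    assert 0 <= vaddr < (1 << (U - OFFSET_BITS - CACHE_ID_BITS))
    return (vaddr << (OFFSET_BITS + CACHE_ID_BITS)) | (cache_id << OFFSET_BITS) | offset

def unpack_request(word):
    """Split a U-bit word back into (offset, cache_id, vaddr)."""
    offset = word & ((1 << OFFSET_BITS) - 1)
    cache_id = (word >> OFFSET_BITS) & ((1 << CACHE_ID_BITS) - 1)
    vaddr = word >> (OFFSET_BITS + CACHE_ID_BITS)
    return offset, cache_id, vaddr
```

With the address in the high bits, requests destined for the same cache line share the same upper bits of the word, which is what makes the comparator-based conflict detection below straightforward.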
As an embodiment, the conflict detection module includes Y comparators, a fusion unit, and a distribution unit, where Y is the number of pairwise combinations of the M first data acquisition requests. The value of Y follows directly from combinatorics (Y = M(M-1)/2) and is not detailed here. Each comparator takes one pair of first requests as input and performs conflict detection, specifically by comparing whether the virtual address information in the pair is the same and outputting the judgment result to the fusion unit. It should be noted that if the virtual address information is the same, the corresponding first requests do not conflict; if it differs, they conflict. For example, the two input ports of the first comparator take the first and second first requests for comparison, the second comparator takes the first and third first requests, and so on.
As an embodiment, the fusion unit is configured to fuse all first requests having the same virtual address information into one second data acquisition request and output it to the distribution unit. The second request contains the offset information of all fused first requests and shares their common cache identification information and virtual address information; multiple first requests are thus merged into one second request, reducing the number of data acquisition requests and the number of cycles needed for data processing, and improving the GPU's data processing efficiency. It should be noted that among the M first requests there may be one whose virtual address information differs from that of every other first request; the fusion unit is therefore also configured to turn such a first request directly into a corresponding second request, i.e., to take the information in the first request as the information of the corresponding second request, and output it to the distribution unit.
The distribution unit is configured to transmit each second request to its corresponding cache based on the cache identification information the request carries; the distribution unit thus splits the N second requests into P paths.
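The fusion and distribution steps can be sketched as follows (a hypothetical software model; the patent describes comparator hardware, but grouping requests by virtual address in a dictionary yields the same fusion result as the Y pairwise comparators). Field names such as "vaddr" and "nbytes" are illustrative:

```python
# Sketch of fusing first requests that share a virtual address into
# second requests, then splitting them into per-cache queues
# (hypothetical model of the fusion and distribution units).

def fuse_requests(first_requests):
    """first_requests: list of dicts with keys
    'vaddr', 'cache_id', 'offset', 'nbytes'.
    Returns the list of N second requests (N <= M)."""
    merged = {}
    for req in first_requests:
        key = req["vaddr"]
        if key not in merged:
            # A request with a new address becomes a second request.
            merged[key] = {"vaddr": req["vaddr"],
                           "cache_id": req["cache_id"],
                           "offsets": [(req["offset"], req["nbytes"])]}
        else:
            # Same virtual address: fuse, keeping every offset.
            merged[key]["offsets"].append((req["offset"], req["nbytes"]))
    return list(merged.values())

def distribute(second_requests, p):
    """Split the second requests into P per-cache queues L_1..L_P
    using the cache identification information."""
    queues = [[] for _ in range(p)]
    for req in second_requests:
        queues[req["cache_id"]].append(req)
    return queues
```

A fused second request keeps every offset of the requests it absorbed, so a single cache-line read can serve all of them in one cycle.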
As an embodiment, each of the P caches corresponds to one physical address storage interval and is used to fetch from the memory the data whose physical addresses fall within that interval; the P intervals do not overlap. It can be understood that, based on this correspondence between caches and physical address intervals, the upstream device can directly designate the target cache when it issues a first data acquisition request. Each physical address interval contains multiple physical addresses; the cache contains multiple cache lines, each storing an address tag and the data corresponding to one physical address, the address tag being the physical address information.
As a preferred example, the P caches have the same width and depth, for example all set to a width of 128 bytes and a depth of 64 lines; identical width and depth simplify hardware design and layout and also improve GPU data processing efficiency.
As an example, C_p is specifically configured to:
In the i-th allocated cycle, judge, based on the physical address information corresponding to the virtual address information in R_i^p, whether the corresponding data is stored in one of C_p's cache lines. If so, locate the start point in the corresponding cache line based on the offset information and read the target number of bytes from that start point. (Any prior-art technique for mapping a virtual address to a physical address falls within the protection scope of this application and is not detailed here.) If not, generate a third data acquisition request based on the corresponding physical address information, fetch the corresponding data from the memory based on the third request, store it into the corresponding cache line of C_p, then locate the start point in that cache line based on the offset information and read the target number of bytes. As those skilled in the art will understand, any existing implementation of fetching the corresponding data from memory based on the third request and storing it into the corresponding cache line of C_p falls within the protection scope of the invention and is not detailed here.
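The hit/miss flow of C_p described above can be sketched as follows (a minimal hypothetical model; the virtual-to-physical mapping is assumed to have already happened, and the memory is stubbed as a dictionary of 128-byte lines):

```python
# Minimal sketch of C_p serving one second request per cycle
# (hypothetical model; the memory fetch that a "third data
# acquisition request" triggers is stubbed as a dictionary read).

LINE_WIDTH = 128  # bytes per cache line, as in the 128-byte example

class CacheModel:
    def __init__(self, memory):
        self.lines = {}       # physical line address -> bytes of the line
        self.memory = memory  # stub for the external memory

    def handle(self, paddr, offset, nbytes):
        """Return nbytes of data starting at offset within the cache
        line holding physical address paddr; fetch the line on a miss."""
        line_addr = paddr - (paddr % LINE_WIDTH)
        if line_addr not in self.lines:
            # Miss: model the third request as a full-line memory read.
            self.lines[line_addr] = self.memory[line_addr]
        line = self.lines[line_addr]
        return line[offset:offset + nbytes]
```

A second request that fused several first requests would simply call `handle` once per recorded offset against the same, now-resident line.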
It can be understood that the cache stores no data in its initial state; as the system runs, it keeps storing data and may become full. As an embodiment, during the process in which C_p fetches corresponding data from the memory based on a third data acquisition request and stores it into the corresponding cache line, C_p is further configured to: judge whether the cache lines of the current C_p are full; if full, select one cache line according to storage time and/or use frequency, clear its data, and then fetch the corresponding data from memory and store it into that line. Specifically, the line to be cleared may be selected by storage time alone, i.e., clearing the line that has been stored the longest; or by use frequency alone, i.e., clearing the line least frequently used within the current preset period; or both storage time and use frequency may be weighed with corresponding weights to select the line to be cleared. The existing LRU (Least Recently Used) algorithm may also be adopted directly to replace the cache line.
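The LRU variant of this replacement policy can be sketched with Python's OrderedDict (a minimal hypothetical model, using a small depth for brevity; the storage-time-only and use-frequency-only policies mentioned above would track different metadata):

```python
from collections import OrderedDict

# Minimal LRU replacement sketch for a fixed-depth set of cache lines
# (hypothetical model; a real cache would track 64 lines of 128 bytes).

class LruLines:
    def __init__(self, depth):
        self.depth = depth
        self.lines = OrderedDict()  # line address -> data, LRU first

    def access(self, line_addr, fetch):
        """Return the line for line_addr; on a miss, evict the least
        recently used line if the cache is full, then fetch the line."""
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)   # mark as recently used
        else:
            if len(self.lines) >= self.depth:
                self.lines.popitem(last=False)  # clear the LRU line
            self.lines[line_addr] = fetch(line_addr)
        return self.lines[line_addr]
```

The OrderedDict keeps lines in recency order, so eviction is the pop of the front entry; a weighted time-plus-frequency policy would instead score each line before choosing the victim.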
In Embodiment 1, conflict detection over the M first data acquisition requests fuses those with the same virtual address information, reducing the number of requests; P parallel caches are provided and the N second requests are processed in parallel, improving the GPU's data acquisition efficiency.
Embodiment 2
In Embodiment 1, when the P caches process requests in parallel, it may happen that several caches all miss; in that situation, the P caches may output multiple third data acquisition requests in parallel within one cycle, and these parallel third requests must be stored into one FIFO simultaneously. Those skilled in the art will understand that, in GPU data processing, there may be scenarios other than that of Embodiment 1 that also require request information to be input into one FIFO in parallel.
Embodiment 2 of the present invention provides a GPU data processing system based on a multi-input single-output FIFO structure, as shown in Fig. 2, comprising a mapper, a FIFO, and a write pointer. It can be understood that, building on Embodiment 1, the mapper, the FIFO and the write pointer sit between the P caches and the memory, and all of them are located in the GPU. The mapper has P input ports and P output ports {E_1, E_2, …, E_P}, where E_p denotes the p-th output port and p ranges from 1 to P. The P input ports are used to input Q third data acquisition requests in parallel and map them to the first Q output ports {E_1, E_2, …, E_Q}, with Q ≤ P; {E_1, E_2, …, E_Q} then store the Q third requests into the FIFO in parallel.
Building on Embodiment 1, as an example, the mapper is connected to the P parallel-processing caches, with the output port of each cache connected to one input port of the mapper; in a cycle in which Q caches miss, those Q caches input their corresponding third data acquisition requests to the corresponding input ports in parallel. The technical details of the third requests output by the P parallel-processing caches are described in Embodiment 1 and are not expanded here.
Building on Embodiment 1, as an example, the third data acquisition request includes physical address information; the FIFO is used to output the third requests it holds to the memory, and the corresponding data is fetched from memory based on the physical address information and stored into the corresponding cache. The related technical details of Embodiment 1 are not expanded here.
The FIFO is a multi-input single-output FIFO: it inputs Q third data acquisition requests in parallel, i.e., stores the corresponding information into Q lines of the FIFO simultaneously, and outputs third requests one at a time; the output side behaves like an ordinary FIFO and can be read line by line, which is not detailed here.
The write pointer always points to the next line of the FIFO to be written, that is, the line immediately following the most recently written line, into which data can be stored next. The line value pointed to by the current write pointer is WR; after the mapper stores Q third requests into the FIFO in parallel, WR is updated. As an example, the write pointer is always aligned with E_1.
As an embodiment, the mapper is further configured to judge whether the number of remaining storable lines in the current FIFO is greater than or equal to the current value of Q; if so, it executes C1 directly, otherwise it executes C1 once the number of remaining storable lines becomes greater than or equal to Q:
C1: output port E_j stores its third data acquisition request into line WR + j - 1 of the FIFO, where j ranges from 1 to Q; then judge whether WR + Q is greater than T: if WR + Q > T, set WR = WR + Q - T; if WR + Q ≤ T, set WR = WR + Q, where T is the depth of the FIFO.
As an example, the P input ports are {F_1, F_2, …, F_P}, where F_p denotes the p-th input port. Inputting Q third data acquisition requests in parallel and mapping them to the first Q output ports {E_1, E_2, …, E_Q} specifically comprises:
S1: if every port in {F_1, F_2, …, F_P} has a third request as input, then Q = P and E_p = F_p; otherwise, execute S2;
S2: traverse F_1, F_2, …, F_P in order, and map the third request from the y-th input port that has one to E_y, where y ranges from 1 to Q and Q < P.
To give a specific example, assume P = 4 and only input ports F_2 and F_4 input corresponding third data acquisition requests. Then the third request of F_2 is mapped to output port E_1 and the third request of F_4 to output port E_2; output ports E_1 and E_2 store the third requests of F_2 and F_4 into the FIFO in parallel, with E_1 writing F_2's request into line WR of the FIFO and E_2 writing F_4's request into line WR + 1.
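Steps S1/S2 and the pointer update C1 can be sketched together as follows (a hypothetical software model of the mapper and the multi-input single-output FIFO; real hardware performs the Q writes within one cycle, and the mapper's free-line check is omitted here). A 0-based write pointer is used, so the modulo expresses the same wrap-around as C1:

```python
# Sketch of the mapper compacting P input ports into the first Q
# output ports (S1/S2), then writing the Q requests into a depth-T
# circular FIFO and updating the write pointer WR (C1).
# Hypothetical software model; 0-based indexing instead of the
# 1-based WR + j - 1 of the text.

class MultiInputFifo:
    def __init__(self, depth):
        self.depth = depth           # T, the depth of the FIFO
        self.lines = [None] * depth
        self.wr = 0                  # WR: next line to be written

    def push_parallel(self, ports):
        """ports: list of P entries, None for ports with no request.
        Compacts the requests (S1/S2) and stores them in parallel (C1)."""
        compacted = [r for r in ports if r is not None]  # maps to E_1..E_Q
        q = len(compacted)
        for j, req in enumerate(compacted):
            # E_j stores into line WR + j, wrapping at the depth T.
            self.lines[(self.wr + j) % self.depth] = req
        # C1 pointer update: WR = WR + Q, minus T on wrap-around.
        self.wr = (self.wr + q) % self.depth
        return q
```

Running the worked example (P = 4, only F_2 and F_4 active) places the two requests in consecutive lines starting at WR, exactly as described above.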
By providing the mapper, the multi-input single-output FIFO and the write pointer, the embodiment of the invention can input Q parallel third data acquisition requests into the FIFO in parallel, avoiding blockage of any third-request acquisition channel and improving the GPU's data acquisition efficiency.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A data acquisition system for a GPU based on conflict detection,
comprising a conflict detection module located in a GPU and P caches {C_1, C_2, …, C_P}, where C_p denotes the p-th cache, p ranges from 1 to P, and P is a positive integer greater than or equal to 2;
the conflict detection module is used for acquiring M first data acquisition requests, fusing the first data acquisition requests with the same virtual address information through conflict detection to generate N second data acquisition requests, wherein N is not more than M, and directionally outputting each second data acquisition request to a corresponding cache C p Corresponding second get data request queue L p Is { R 1 p ,R 2 p ,…, R pS p }, pS is C p Corresponding second quantity of requests for data, R i p Represents L p In the ith second data acquisition request, pS is more than or equal to 0 and less than or equal to N, and the value range of i is from 1 to pS;
the P caches are all connected with the memory, and the P caches process corresponding second data acquisition requests in parallel, C p For allocating pS cycles, for each R i p Allocating a cycle to process L one by one p Each R in (1) i p
The first data acquisition request comprises the number of target bytes, offset information, cache identification information and virtual address information;
the offset information, the cache identification information and the virtual address information are stored into U-bit data according to a preset format, wherein the 0-a bitUsed for storing offset information, a +1 to b bits are used for storing cache identification information, b +1 to U-1 bits are used for storing virtual address information, and a and b satisfy 2 a =W,2 b-a-1 = P, W is the bit width of the cache.
2. The system of claim 1,
the conflict detection module comprises Y comparators, a fusion unit and a distribution unit, wherein Y is the combination number of M first data acquisition requests randomly combined in pairs, each comparator is used for inputting a group of first data acquisition requests to carry out conflict detection, and specifically comprises the steps of comparing whether the virtual address information in the group of first data acquisition requests is the same or not and outputting the judgment result to the fusion unit;
the fusion unit is used for fusing all first data acquisition requests with the same virtual address information to generate a second data acquisition request, and outputting the second data acquisition request to the distribution unit, wherein the second data acquisition request comprises corresponding offset information in all the fused first data acquisition requests and shares the same cache identification information and virtual address information; the fusion unit is further used for directly generating a first data acquisition request with different virtual address information from other first data acquisition requests into a corresponding second data acquisition request and outputting the second data acquisition request to the distribution unit;
the distribution unit is used for transmitting the second data acquisition requests to the corresponding caches based on the cache identification information in each second data acquisition request.
3. The system of claim 1,
in the P caches, each cache corresponds to a physical address storage interval and is used for acquiring data corresponding to physical addresses in the corresponding physical address storage intervals from the memory, and the P physical address storage intervals are not overlapped;
each physical address storage interval comprises a plurality of physical addresses, the cache comprises a plurality of cache lines, and each cache line is used for storing address identification and data corresponding to one physical address.
4. The system of claim 1,
the width and the depth of the P caches are the same.
5. The system of claim 1,
wherein C_p is specifically configured to:
in the i-th allocated cycle, judge, based on the physical address information corresponding to the virtual address information in R_i^p, whether the corresponding data is stored in the current C_p; if so, locate the start point in the corresponding cache line based on the offset information and read the target number of bytes from the start point; if not, generate a third data acquisition request based on the corresponding physical address information, fetch the corresponding data from the memory based on the third request and store it into the corresponding cache line of C_p, then locate the start point in that cache line based on the offset information and read the target number of bytes.
6. The system of claim 5,
wherein, during the process in which C_p fetches corresponding data from the memory based on a third data acquisition request and stores it into the corresponding cache line, C_p is further configured to:
judge whether the cache lines of the current C_p are full; if full, select one cache line according to storage time and/or use frequency, clear the data in it, and then fetch the corresponding data from the memory and store it into that cache line.
CN202210546502.9A 2022-05-20 2022-05-20 Data acquisition system of GPU (graphic processing Unit) based on conflict detection Active CN114637609B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210546502.9A CN114637609B (en) 2022-05-20 2022-05-20 Data acquisition system of GPU (graphic processing Unit) based on conflict detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210546502.9A CN114637609B (en) 2022-05-20 2022-05-20 Data acquisition system of GPU (graphic processing Unit) based on conflict detection

Publications (2)

Publication Number Publication Date
CN114637609A CN114637609A (en) 2022-06-17
CN114637609B true CN114637609B (en) 2022-08-12

Family

ID=81953030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210546502.9A Active CN114637609B (en) 2022-05-20 2022-05-20 Data acquisition system of GPU (graphic processing Unit) based on conflict detection

Country Status (1)

Country Link
CN (1) CN114637609B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107209681A (en) * 2015-10-21 2017-09-26 Huawei Technologies Co., Ltd. Storage device access method, apparatus and system
CN114428749A (en) * 2022-04-07 2022-05-03 Muxi Technology (Beijing) Co., Ltd. Detector for verifying cache

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6011194B2 (en) * 2012-09-21 2016-10-19 Fujitsu Limited Arithmetic processing device and control method of arithmetic processing device
CN103955435B (en) * 2014-04-09 2017-01-11 University of Shanghai for Science and Technology Method for establishing access by fusing multiple levels of cache directories
CN104461400B (en) * 2014-12-25 2017-09-19 Inspur (Beijing) Electronic Information Industry Co., Ltd. Method and apparatus for handling access request conflicts
CN105550979A (en) * 2015-12-11 2016-05-04 Xi'an Aeronautics Computing Technique Research Institute of AVIC High-data-throughput texture cache hierarchy structure
CN106776377B (en) * 2016-12-12 2020-04-28 Xi'an Aeronautics Computing Technique Research Institute of AVIC Address merging processing circuit for concurrently reading multiple memory units
CN106683158B (en) * 2016-12-12 2020-06-09 Xi'an Aeronautics Computing Technique Research Institute of AVIC Modeling system of GPU texture mapping non-blocking storage Cache
US11194722B2 (en) * 2018-03-15 2021-12-07 Intel Corporation Apparatus and method for improved cache utilization and efficiency on a many core processor
CN110457238B (en) * 2019-07-04 2023-01-03 Civil Aviation University of China Method for slowing down GPU (graphics processing unit) access request and pause when instructions access cache


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of the On-chip Memory of YHFT-DSPX; Xiao Liya; China Master's Theses Full-text Database; 2012-04-15; Vol. 2012, No. 04; full text *
An Efficient Conflict-free Parallel Memory Access Model for Image Processing Applications with Multiple Regions of Interest; Xu Jinbo et al.; Chinese Journal of Computers; 2008-11-15; Vol. 31, No. 11; full text *

Also Published As

Publication number Publication date
CN114637609A (en) 2022-06-17

Similar Documents

Publication Publication Date Title
CN101751246B System and method for prefetching data
CN103218309B (en) The method of multistage instruction cache prefetch, system and equipment
US9348762B2 (en) Technique for accessing content-addressable memory
US20190266193A1 (en) Data processing method for bloom filter, and bloom filter
US20130046951A1 (en) Parallel dynamic memory allocation using a nested hierarchical heap
US20130198419A1 (en) Lock-free fifo
EP0507066A1 (en) Ownership interlock for cache data units
CN105677580A (en) Method and device for accessing cache
JP2010134929A (en) Compression status bit cache and backing storage device
CN101918925B (en) Second chance replacement mechanism for a highly associative cache memory of a processor
TW200933524A (en) Memory systems, memory accessing methods and graphic processing systems
US6745292B1 (en) Apparatus and method for selectively allocating cache lines in a partitioned cache shared by multiprocessors
US20110072438A1 (en) Fast mapping table register file allocation algorithm for simt processors
CN106528049A (en) Branch predictor for updating generation of random number of events in multi-memory-bank conditional branch predictor
US8151058B2 (en) Vector computer system with cache memory and operation method thereof
CN105446897A (en) High-speed cache Hashing
CN110297787A (en) The method, device and equipment of I/O equipment access memory
US7562204B1 (en) Identifying and relocating relocatable kernel memory allocations in kernel non-relocatable memory
EP0675443A1 (en) Apparatus and method for accessing direct mapped cache
JP2009015509A (en) Cache memory device
CN109710309B (en) Method for reducing memory bank conflict
US11016802B2 (en) Techniques for ordering atomic operations
CN114637609B (en) Data acquisition system of GPU (graphic processing Unit) based on conflict detection
KR20190030189A (en) Cache storage
CN111341374B (en) Memory test method and device and readable memory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant