WO2022099925A1 - Adaptive unified memory management method and system for large-scale graphs - Google Patents
Adaptive unified memory management method and system for large-scale graphs
- Publication number
- WO2022099925A1 (PCT/CN2021/072376)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gpu
- memory
- graph
- data
- read
- Prior art date
Links
- 238000007726 management method Methods 0.000 title claims abstract description 32
- 230000003044 adaptive effect Effects 0.000 title claims abstract description 12
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 29
- 238000000034 method Methods 0.000 claims abstract description 10
- 238000012545 processing Methods 0.000 abstract description 6
- 238000005516 engineering process Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 2
- 238000003491 array Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 238000010845 search algorithm Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0207—Addressing or allocation; Relocation with multidimensional access, e.g. row/column, matrix
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5021—Priority
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/30—Providing cache or TLB in specific location of a processing system
- G06F2212/302—In image processor or graphics adapter
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/45—Caching of specific data in cache memory
- G06F2212/455—Image or video data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/6028—Prefetching based on hints or prefetch instructions
Definitions
- The invention relates to a technology in the field of graph processing, and in particular to a method and system that, under a unified memory architecture, adaptively configures the memory read strategy for large graph data whose size exceeds video memory capacity.
- Unified memory adds a unified memory space to the existing memory management scheme, so that a program can use a single pointer to directly access data stored either in central processing unit (CPU) memory or in graphics processing unit (GPU) video memory.
- This technology enlarges the address space available to the graphics processor, so that the GPU can process graph data that exceeds its video memory capacity. Using the technique directly, however, often comes with a significant performance penalty.
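- For context, a minimal sketch of the unified-memory model this builds on, assuming a CUDA-capable device: one cudaMallocManaged allocation is dereferenced by both host and device through the same pointer, and the driver migrates pages on demand; those on-demand page faults are the source of the performance penalty noted above. The kernel and sizes are illustrative, not from the patent.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= 2.0f;   // device touches the page -> on-demand migration
}

int main() {
    const int n = 1 << 20;
    float *v;
    // One allocation visible to both CPU and GPU through the same pointer.
    cudaMallocManaged(&v, n * sizeof(float));
    for (int i = 0; i < n; ++i) v[i] = 1.0f;   // CPU writes: pages live in host memory
    scale<<<(n + 255) / 256, 256>>>(v, n);     // GPU reads: pages fault and migrate
    cudaDeviceSynchronize();
    printf("v[0] = %f\n", v[0]);               // CPU reads: pages migrate back
    cudaFree(v);
    return 0;
}
```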
- To address this, the present invention proposes an adaptive, large-graph-oriented unified memory management method and system. According to the characteristics of the graph data and of the different graph algorithms, combined with the size of the GPU's available memory, it significantly improves the performance of processing large graphs that exceed video memory capacity under the unified memory architecture: it improves GPU bandwidth utilization, reduces the number of page faults and the overhead of handling them, and shortens the running time of graph computing programs.
- The invention relates to an adaptive, large-graph-oriented unified memory management method which, for each of the different types of graph data in a graph computing application and in priority order, first checks whether the current GPU memory is full, then judges whether the size of the current graph data exceeds the GPU's available memory capacity, and then performs the corresponding unified memory management policy configuration.
- The different types of graph data include: vertex offsets (VertexOffset), vertex attribute labels (VertexProperty), edges (Edge), and the vertex frontier to be processed (Frontier), wherein VertexOffset, VertexProperty, and Edge are the three arrays of the compressed sparse row (CSR) format.
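- For concreteness, a sketch of how the four structures might be declared over unified memory; the struct and field names are illustrative, while the CSR layout itself (offsets indexing into a flat edge array) is standard:

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Illustrative CSR graph in unified memory: the three CSR arrays plus the frontier.
struct UMGraph {
    int64_t  numVertices;
    int64_t  numEdges;
    int64_t *vertexOffset;   // VertexOffset: numVertices+1 entries; edges of v are
                             // edge[vertexOffset[v] .. vertexOffset[v+1])
    float   *vertexProperty; // VertexProperty: per-vertex label (distance, rank, ...)
    int64_t *edge;           // Edge: numEdges destination-vertex IDs
    int64_t *frontier;       // Frontier: vertex IDs still to be processed
};

// Allocate every array with cudaMallocManaged so the same pointers work on CPU and GPU.
inline cudaError_t allocUMGraph(UMGraph &g, int64_t nv, int64_t ne) {
    g.numVertices = nv; g.numEdges = ne;
    cudaMallocManaged(&g.vertexOffset,   (nv + 1) * sizeof(int64_t));
    cudaMallocManaged(&g.vertexProperty, nv * sizeof(float));
    cudaMallocManaged(&g.edge,           ne * sizeof(int64_t));
    cudaMallocManaged(&g.frontier,       nv * sizeof(int64_t));
    return cudaGetLastError();
}
```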
- The priority order refers to descending order of access frequency during execution of the graph algorithm, specifically: vertex properties, vertex offsets, frontiers, and edges.
- The graph algorithms can be divided into traversal algorithms and computational algorithms, including but not limited to the single-source shortest path algorithm (SSSP), the breadth-first search algorithm (BFS), the page ranking algorithm (PageRank, PR), and the connected components algorithm (CC).
- The GPU memory judgment calls cudaMemGetInfo to check the remaining capacity of the current GPU memory.
- The data-exceeded judgment compares whether the size of the current graph data exceeds the GPU's available memory capacity.
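- A sketch of the two judgments; cudaMemGetInfo is the actual CUDA runtime call, the helper names are illustrative:

```cuda
#include <cuda_runtime.h>

// GPU-memory judgment: query how much video memory is still free, in bytes.
size_t queryAvailGPUMemSize() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);  // free/total device memory
    return freeBytes;
}

// Data-exceeded judgment: does this graph array fit in what is left?
bool dataExceedsGPUMem(size_t dataBytes, size_t availGPUMemSize) {
    return dataBytes > availGPUMemSize;
}
```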
- The unified memory management strategy configuration sets the management strategy of the current graph data by calling, but not limited to, cudaMemPrefetchAsync and cudaMemAdvise, wherein: cudaMemPrefetchAsync can move some data to GPU video memory in advance, and cudaMemAdvise can set a data usage hint (Memory Usage Hint, hereinafter "hint") for specified data, helping the GPU driver control data movement in an appropriate way and improve final performance.
- The optional data usage hints include AccessedBy and ReadMostly; these hints apply to NVIDIA's various series of GPUs.
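- A sketch of how the two hints and the explicit prefetch are issued on a managed allocation; the advice enums (cudaMemAdviseSetAccessedBy, cudaMemAdviseSetReadMostly) and call signatures are the standard CUDA runtime ones, the helper names are illustrative, and the comment semantics follow the CUDA documentation:

```cuda
#include <cuda_runtime.h>

// `data` is a cudaMallocManaged range of `bytes` bytes; `device` is the GPU
// that will run the graph algorithm. The two hints are alternatives.
void adviseAccessedBy(void *data, size_t bytes, int device) {
    // AccessedBy: keep the range mapped into this GPU's page tables so it can
    // be accessed in place over the interconnect instead of faulting pages in.
    cudaMemAdvise(data, bytes, cudaMemAdviseSetAccessedBy, device);
}

void adviseReadMostly(void *data, size_t bytes, int device) {
    // ReadMostly: the driver may keep read-duplicated copies of the pages,
    // suitable for data that is read far more often than it is written.
    cudaMemAdvise(data, bytes, cudaMemAdviseSetReadMostly, device);
}

void prefetchPart(void *data, size_t prefetchBytes, int device) {
    // Explicit prefetch: migrate the first prefetchBytes of the range to the
    // GPU ahead of use, avoiding first-touch page faults. 0 = default stream.
    cudaMemPrefetchAsync(data, prefetchBytes, device, 0);
}
```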
- The invention as a whole solves the technical problem that existing GPUs cannot process large graphs that exceed video memory capacity.
- The present invention uses unified memory technology to manage graph data, adopts a targeted management strategy for the different graph data structures according to a specific priority order, and adjusts the management strategy according to the size of the graph data relative to the GPU's available memory and the type of graph algorithm, significantly improving the operating efficiency of graph algorithms.
- Fig. 1 is a schematic diagram of the system of the present invention.
- Fig. 2 is a flow chart of the memory management policy setting of the present invention.
- The adaptive, large-graph-oriented unified memory management system involved in this embodiment includes: a system parameter setting module, a data reading module, and a memory management strategy setting module, wherein: the system parameter setting module calls the CUDA programming interface to obtain the operating parameters of the memory management strategy and initializes them.
- The data reading module reads the graph data file from storage and builds the corresponding graph data structure in CPU memory.
- The memory management strategy setting module sets the data read-ahead and hint strategies for the graph data structure by calling application program interfaces supported since CUDA 8.0.
- The operating parameters include: memory full (GPUIsFull), the GPU's currently available memory capacity (availGPUMemSize), and the read-ahead rate τ.
- The initialization refers to: setting GPUIsFull to false; obtaining availGPUMemSize through cudaMemGetInfo.
- The read-ahead rate τ is set to 0.5 for a traversal graph algorithm (such as BFS), and to 0.8 for a computational graph algorithm (such as CC).
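- As a small sketch of that parameter choice (the enum and function names are illustrative, not from the patent):

```cuda
enum class GraphAlgoKind { Traversal /* BFS, SSSP */, Compute /* PR, CC */ };

// Read-ahead rate tau from the embodiment: 0.5 for traversal, 0.8 for compute.
constexpr float readAheadRate(GraphAlgoKind k) {
    return k == GraphAlgoKind::Traversal ? 0.5f : 0.8f;
}
```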
- The APIs supported since CUDA 8.0 include, but are not limited to, those that provide the same functionality as the explicit memory copy and pinning APIs without reintroducing the limitations of explicit GPU memory allocation: explicit prefetch (cudaMemPrefetchAsync) and memory usage hints (cudaMemAdvise).
- This embodiment relates to an adaptive memory management method based on the above system, comprising the following steps:
- Step 1 (B0 in the figure): Obtain the initial values of the operating parameters (GPUIsFull, availGPUMemSize, τ).
- Step 2 (B1, C0 in the figure): Set a memory management strategy for each data item Data in the graph data structures (VertexProperty, VertexOffset, Frontier, Edge) in turn, judging for each item in turn:
- Step 2.1: When the value of the variable GPUIsFull is false, execute step 2.1.1; otherwise, execute step 2.1.2.
- Step 2.1.1: When the size of Data is less than availGPUMemSize, perform step 2.1.1.1; otherwise, perform step 2.1.1.2.
- Step 2.1.1.1: Call cudaMemPrefetchAsync to prefetch Data into GPU memory; set availGPUMemSize -= size of Data; return to step 2.
- Step 2.1.1.2: Set the variable GPUIsFull to true, call cudaMemAdvise to set the hint of Data to AccessedBy, and call cudaMemPrefetchAsync to prefetch graph data of size τ × availGPUMemSize into GPU memory; return to step 2.
- Step 2.1.2 (B8 in the figure): Call cudaMemAdvise to set the hint of Data to AccessedBy; return to step 2.
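- Pulling steps 1 through 2.1.2 together, a hedged end-to-end sketch of the policy-setting pass (the struct and helper names are illustrative; the control flow follows the steps above and claim 3):

```cuda
#include <cuda_runtime.h>
#include <vector>

// One graph array (VertexProperty, VertexOffset, Frontier or Edge) in unified memory.
struct GraphArray { void *ptr; size_t bytes; };

// Steps 1-2: walk the arrays in priority order and configure a policy for each.
void setMemoryPolicies(const std::vector<GraphArray> &arraysInPriorityOrder,
                       float tau, int device) {
    bool gpuIsFull = false;                        // step 1: GPUIsFull = false
    size_t availGPUMemSize = 0, totalBytes = 0;
    cudaMemGetInfo(&availGPUMemSize, &totalBytes); // step 1: availGPUMemSize

    for (const GraphArray &data : arraysInPriorityOrder) {   // step 2
        if (!gpuIsFull) {                                    // step 2.1
            if (data.bytes < availGPUMemSize) {
                // Step 2.1.1.1: the array fits; prefetch it whole, shrink the budget.
                cudaMemPrefetchAsync(data.ptr, data.bytes, device, 0);
                availGPUMemSize -= data.bytes;
            } else {
                // Step 2.1.1.2: it does not fit; GPU is now full. Hint AccessedBy
                // and prefetch only tau * availGPUMemSize bytes of it.
                gpuIsFull = true;
                cudaMemAdvise(data.ptr, data.bytes, cudaMemAdviseSetAccessedBy, device);
                cudaMemPrefetchAsync(data.ptr,
                                     (size_t)(tau * (double)availGPUMemSize),
                                     device, 0);
            }
        } else {
            // Step 2.1.2: memory already full; AccessedBy only, no prefetch.
            cudaMemAdvise(data.ptr, data.bytes, cudaMemAdviseSetAccessedBy, device);
        }
    }
}
```

Such a pass would be called once before launching the graph kernels, with τ chosen as above (0.5 for traversal algorithms, 0.8 for computational ones).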
- The graph algorithms are executed on different datasets and the execution time of each algorithm is measured, i.e., the total running time from the start of GPU execution to its end, excluding preprocessing and data transfer time. During measurement, each algorithm is repeated 5 times and the average of the 5 execution times is taken.
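- A sketch of that measurement protocol using CUDA events, which time only the GPU portion and so exclude preprocessing and transfers; runGraphAlgorithm is a placeholder for the graph kernel launches:

```cuda
#include <cuda_runtime.h>

// Average GPU execution time over 5 runs, in milliseconds.
template <typename F>
float timeKernelAvg(F runGraphAlgorithm) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float totalMs = 0.0f;
    for (int run = 0; run < 5; ++run) {
        cudaEventRecord(start);
        runGraphAlgorithm();          // GPU work only; preprocessing done beforehand
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        totalMs += ms;
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return totalMs / 5.0f;
}
```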
- The datasets are multiple graph datasets of different sizes, including social network graph datasets (LiveJournal, Orkut, Twitter, Friendster) and Internet snapshot graph datasets (UK-2005, SK-2005, UK-union), where LiveJournal contains 5M vertices and 69M edges with a volume of 1.4GB, and UK-union contains 133M vertices and 5.5B edges with a volume of 110GB.
- The graph algorithms are four: SSSP, BFS, PR, and CC, wherein SSSP and BFS are traversal algorithms, and PR and CC are computational algorithms.
- The BFS and SSSP algorithms take the first vertex in each graph dataset as the source vertex; for PR, 0.85 is set as the decay coefficient and 0.01 as the error tolerance.
- For PR, the algorithm terminates when it converges or when the number of iterations reaches 100.
- The present invention can thus significantly shorten the running time of graph computing programs.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Memory System (AREA)
Abstract
Description
Claims (6)
- 1. An adaptive, large-graph-oriented unified memory management method, characterized in that, for the different types of graph data in a graph computing application, in priority order, a GPU memory judgment first checks whether the current GPU memory is full, a data-exceeded judgment then checks whether the size of the current graph data exceeds the GPU's available memory capacity, and the unified memory management policy configuration is then performed; the unified memory management policy configuration sets the management policy of the current graph data by calling the cudaMemPrefetchAsync and cudaMemAdvise instructions, with data usage hints including AccessedBy and ReadMostly; the unified memory management policy configuration specifically includes: ① for vertex-property graph data: when GPU memory is full, set the hint of VertexProperty to AccessedBy; otherwise, i.e., when GPU memory is not full and VertexProperty does not exceed the GPU's available memory capacity, set the read-ahead amount of VertexProperty to the size of VertexProperty; when VertexProperty exceeds the GPU's available memory capacity, set the hint of VertexProperty to AccessedBy and set the read-ahead amount of VertexProperty to: read-ahead rate × GPU available memory capacity, in bytes; ② for vertex-offset graph data: when GPU memory is full, set the hint of VertexOffset to AccessedBy; otherwise, i.e., when GPU memory is not full and VertexOffset does not exceed the GPU's available memory capacity, set the read-ahead amount of VertexOffset to the size of VertexOffset; when VertexOffset exceeds the GPU's available memory capacity, set the hint of VertexOffset to AccessedBy and set the read-ahead amount of VertexOffset to: read-ahead rate × GPU available memory capacity, in bytes; ③ for frontier graph data: when GPU memory is full, set the hint of Frontier to AccessedBy; otherwise, i.e., when GPU memory is not full and Frontier does not exceed the GPU's available memory capacity, set the read-ahead amount of Frontier to the size of Frontier; when Frontier exceeds the GPU's available memory capacity, set the hint of Frontier to AccessedBy and set the read-ahead amount of Frontier to: read-ahead rate × GPU available memory capacity, in bytes; ④ for edge graph data: when GPU memory is full, set the hint of Edge to AccessedBy; otherwise, i.e., when GPU memory is not full and Edge does not exceed the GPU's available memory capacity, set the read-ahead amount of Edge to the size of Edge; when Edge exceeds the GPU's available memory capacity, set the hint of Edge to AccessedBy and set the read-ahead amount of Edge to: read-ahead rate × GPU available memory capacity, in bytes.
- 2. The adaptive, large-graph-oriented unified memory management method according to claim 1, characterized in that the graph algorithm is a traversal algorithm or a computational algorithm, and the corresponding read-ahead rate τ is set to 0.5 for traversal graph algorithms and to 0.8 for computational graph algorithms.
- 3. The adaptive, large-graph-oriented unified memory management method according to claim 1, characterized by specifically comprising: Step 1: obtain the initial values of the operating parameters; Step 2: set a memory management strategy for each data item in the graph data structures in turn, judging for each item: Step 2.1: when the value of the variable GPUIsFull in the operating parameters is false, execute step 2.1.1; otherwise execute step 2.1.2; Step 2.1.1: when the size of the graph data is less than availGPUMemSize, execute step 2.1.1.1; otherwise execute step 2.1.1.2; Step 2.1.1.1: call cudaMemPrefetchAsync to prefetch the graph data into GPU memory; set availGPUMemSize -= size of Data; return to step 2; Step 2.1.1.2: set the variable GPUIsFull in the operating parameters to true, call cudaMemAdvise to set the hint of Data to AccessedBy, and call cudaMemPrefetchAsync to prefetch graph data of size read-ahead rate τ × availGPUMemSize into GPU memory; return to step 2; Step 2.1.2: call cudaMemAdvise to set the hint of Data to AccessedBy; return to step 2.
- 4. The adaptive, large-graph-oriented unified memory management method according to claim 3, characterized in that the operating parameters include: memory full (GPUIsFull), the GPU's currently available memory capacity (availGPUMemSize), and the read-ahead rate τ.
- 5. The adaptive, large-graph-oriented unified memory management method according to claim 4, characterized in that the initialization refers to: setting GPUIsFull to false; obtaining availGPUMemSize through cudaMemGetInfo.
- 6. An adaptive, large-graph-oriented unified memory management system implementing the method of any one of the preceding claims, characterized by comprising: a system parameter setting module, a data reading module, and a memory management strategy setting module, wherein: the system parameter setting module calls the CUDA programming interface to obtain the operating parameters of the memory management strategy and initializes them; the data reading module reads graph data files from storage and builds the corresponding graph data structures in CPU memory; and the memory management strategy setting module sets the data read-ahead and hint strategies for the graph data structures by calling application program interfaces supported since CUDA 8.0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/040,905 US20230297234A1 (en) | 2020-11-10 | 2021-01-18 | Adaptive unified memory management method and system for large-scale graphs |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011244031.3A CN112346869B (zh) | 2020-11-10 | 2020-11-10 | 自适应的面向大图的统一内存管理方法及系统 |
CN202011244031.3 | 2020-11-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022099925A1 true WO2022099925A1 (zh) | 2022-05-19 |
Family
ID=74362382
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/072376 WO2022099925A1 (zh) | 2020-11-10 | 2021-01-18 | 自适应的面向大图的统一内存管理方法及系统 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230297234A1 (zh) |
CN (1) | CN112346869B (zh) |
WO (1) | WO2022099925A1 (zh) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102999946A (zh) * | 2012-09-17 | 2013-03-27 | Tcl集团股份有限公司 | 一种3d图形数据处理方法、装置及设备 |
CN104835110A (zh) * | 2015-04-15 | 2015-08-12 | 华中科技大学 | 一种基于gpu的异步图数据处理系统 |
US20160125566A1 (en) * | 2014-10-29 | 2016-05-05 | Daegu Gyeongbuk Institute Of Science And Technology | SYSTEM AND METHOD FOR PROCESSING LARGE-SCALE GRAPHS USING GPUs |
CN110187968A (zh) * | 2019-05-22 | 2019-08-30 | 上海交通大学 | 异构计算环境下的图数据处理加速方法 |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8373710B1 (en) * | 2011-12-30 | 2013-02-12 | GIS Federal LLC | Method and system for improving computational concurrency using a multi-threaded GPU calculation engine |
KR20130093995A (ko) * | 2012-02-15 | 2013-08-23 | 한국전자통신연구원 | 계층적 멀티코어 프로세서의 성능 최적화 방법 및 이를 수행하는 멀티코어 프로세서 시스템 |
US9430400B2 (en) * | 2013-03-14 | 2016-08-30 | Nvidia Corporation | Migration directives in a unified virtual memory system architecture |
CN103226540B (zh) * | 2013-05-21 | 2015-08-19 | 中国人民解放军国防科学技术大学 | 基于分组多流的gpu上多区结构网格cfd加速方法 |
US9400767B2 (en) * | 2013-12-17 | 2016-07-26 | International Business Machines Corporation | Subgraph-based distributed graph processing |
US9886736B2 (en) * | 2014-01-20 | 2018-02-06 | Nvidia Corporation | Selectively killing trapped multi-process service clients sharing the same hardware context |
CN111930498B (zh) * | 2020-06-29 | 2022-11-29 | 苏州浪潮智能科技有限公司 | 一种高效的gpu资源分配优化方法和系统 |
-
2020
- 2020-11-10 CN CN202011244031.3A patent/CN112346869B/zh active Active
-
2021
- 2021-01-18 WO PCT/CN2021/072376 patent/WO2022099925A1/zh active Application Filing
- 2021-01-18 US US18/040,905 patent/US20230297234A1/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102999946A (zh) * | 2012-09-17 | 2013-03-27 | Tcl集团股份有限公司 | 一种3d图形数据处理方法、装置及设备 |
US20160125566A1 (en) * | 2014-10-29 | 2016-05-05 | Daegu Gyeongbuk Institute Of Science And Technology | SYSTEM AND METHOD FOR PROCESSING LARGE-SCALE GRAPHS USING GPUs |
CN104835110A (zh) * | 2015-04-15 | 2015-08-12 | 华中科技大学 | 一种基于gpu的异步图数据处理系统 |
CN110187968A (zh) * | 2019-05-22 | 2019-08-30 | 上海交通大学 | 异构计算环境下的图数据处理加速方法 |
Also Published As
Publication number | Publication date |
---|---|
CN112346869B (zh) | 2021-07-13 |
US20230297234A1 (en) | 2023-09-21 |
CN112346869A (zh) | 2021-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
- CN106991011B (zh) | Method based on CPU multithreading and GPU multi-granularity parallelism and collaborative optimization | |
- KR101253012B1 (ko) | Method and apparatus for sharing pointers across heterogeneous platforms | |
US8196147B1 (en) | Multiple-processor core optimization for producer-consumer communication | |
US20080005473A1 (en) | Compiler assisted re-configurable software implemented cache | |
US11720496B2 (en) | Reconfigurable cache architecture and methods for cache coherency | |
- JP2015505091A (ja) | Mechanism for using a GPU controller for cache preloading | |
WO2009123492A1 (en) | Optimizing memory copy routine selection for message passing in a multicore architecture | |
- WO2022233195A1 (zh) | Neural network weight storage method, reading method, and related device | |
- CN113590508A (zh) | Dynamically reconfigurable memory address mapping method and apparatus | |
US8380962B2 (en) | Systems and methods for efficient sequential logging on caching-enabled storage devices | |
- CN116227599A (zh) | Inference model optimization method and apparatus, electronic device, and storage medium | |
- JPH05274252A (ja) | Transaction execution method in a computer system | |
- CN112801856B (zh) | Data processing method and apparatus | |
- CN113448897B (zh) | Optimization method for pure user-mode remote direct memory access | |
- WO2022099925A1 (zh) | Adaptive unified memory management method and system for large-scale graphs | |
- CN116438543A (zh) | Shared memory space in data and model parallelization | |
US20230126783A1 (en) | Leveraging an accelerator device to accelerate hash table lookups | |
- CN116132369A (zh) | Traffic distribution method for multiple network ports in a cloud gateway server and related device | |
- TWI718634B (zh) | Feature map access method for convolutional neural networks and system thereof | |
Feng et al. | Understanding Scalability of Multi-GPU Systems | |
- CN112487352B (zh) | Fast Fourier transform operation method on a reconfigurable processor and reconfigurable processor | |
US11915138B2 (en) | Method and device for reducing a size of a neural network model | |
US20240054179A1 (en) | Systems and methods for inference system caching | |
US20240070107A1 (en) | Memory device with embedded deep learning accelerator in multi-client environment | |
- CN108762666B (zh) | Storage system access method, system, medium, and device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21890463 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21890463 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.01.2024) |