WO2013113206A1 - Intelligent cache and intelligent terminal - Google Patents
Intelligent cache and intelligent terminal
- Publication number
- WO2013113206A1 (PCT/CN2012/077953)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cache
- data
- memory
- intelligent
- unit
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0864—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
- G06F12/0831—Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0893—Caches characterised by their organisation or structure
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90339—Query processing by using parallel associative memories or content-addressable memories
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1012—Design facilitation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/601—Reconfiguration of cache memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/6032—Way prediction in set-associative cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/62—Details of cache specific to multiprocessor cache arrangements
- G06F2212/621—Coherency control relating to peripheral accessing, e.g. from DMA or I/O device
Definitions
- the present invention relates to intelligent cache technology, and more particularly to a flexible, configurable, software-defined intelligent cache implemented through a tight combination of software and hardware, and to a smart terminal having the above intelligent cache.
- a traditional Cache Memory exploits the temporal or spatial locality of processor program execution by temporarily storing the most recent and/or frequently executed instructions and data in a Cache Memory close to the processor unit; when an instruction or data item needs to be accessed, the Cache Memory is accessed first, and if the Cache Memory misses (Miss), the slower but larger next-level memory is accessed.
- FIG. 1 is a schematic diagram of a typical cache structure.
- the processor core first searches the tag (Tag) array in the cache to confirm whether the required instruction or data is in the cache; on a Cache Miss, the tag lookup and data comparison operations are wasted, and the next-level memory must then be accessed, which wastes multiple processor execution cycles and Cache power.
- to increase the Cache hit rate, set associativity, complex replacement algorithms, prefetching, speculative reads, and hierarchical multi-level Cache structures are usually used; obviously, these performance gains are bought entirely with increased hardware complexity and chip-area overhead. Since the cache structure shown in FIG. 1 is a typical cache structure, the functions of its parts and its working principle are not described here.
- a TCM (Tightly Coupled Memory) is a static random access memory (SRAM, Static Random Access Memory) close to the processor core, characterized by high speed and fixed latency.
- the content of the TCM cannot be replaced in real time; its capacity is fixed and generally small.
- the refresh of the TCM depends entirely on software scheduling: before refreshing the TCM, the software must determine when to refresh and perform the corresponding configuration operations, and the TCM cannot be accessed during configuration, which limits the application of the TCM.
- a CAM (Content Addressable Memory) is a special-purpose memory that performs queries by comparing all stored entries with the input entry simultaneously in parallel, which carries a very large hardware cost.
- the main purpose of the present invention is to provide an intelligent cache and an intelligent terminal that can be flexibly defined, configured, and reconstructed by software for specific applications, solving the traditional Cache's drawbacks of high complexity, high overhead, high power consumption, and unpredictable latency, as well as the problems of inefficient TCM data updates, low storage-unit flexibility, and narrow applicability.
- An intelligent cache includes a general interface, a software definition and reconstruction unit, a control unit, a storage unit, and an intelligent processing unit; wherein:
- a general purpose interface for receiving configuration information, and/or control information, and/or data information from a kernel or bus, and returning target data;
- a software definition and reconstruction unit configured to define the memory as the required cache Cache memory according to the configuration information;
- a control unit configured to control the read/write cache memory, and monitor the instruction or the data stream in real time, and control the storage unit to load the required data in advance according to the system information, the characteristics of the task to be executed, and the characteristics of the used data structure;
- a storage unit composed of a plurality of storage modules and configured to cache data, the storage modules being combined into the required Cache memory according to the definition of the software definition and reconstruction unit; and an intelligent processing unit configured to process the input and output data, and to transfer, transform, and operate on data among a plurality of structures defined in the control unit.
- the required Cache memory may be configured to include at least one of the following types of memory: tightly coupled memory TCM, content-addressed memory CAM, and cache Cache.
- the universal interface further includes a coherence interface for multi-core environments.
- the software definition and reconstruction unit is further configured to define a plurality of Cache memories of the same structure but different attributes, including at least one of the following structures: a fully associative Cache, a 16-way set-associative Cache, a 4-way set-associative Cache, a 2-way set-associative Cache, and a direct-mapped Cache.
- the software definition and reconstruction unit is further configured to dynamically reconstruct idle storage modules during operation.
- the intelligent processing unit transfers, transforms, and operates on data among the multiple structures defined in the control unit, including:
- matrix operations, bit-level operations, data lookup, data sorting, data comparison, logical operations, set/reset, read-modify-write operations, and increment/decrement and addition/subtraction operations.
- the intelligent processing unit is further configured to fill and update data and transfer data to the next level of memory.
- the control unit loads data according to the data block size defined by the software definition and reconstruction unit, or loads data automatically; and defines a dedicated storage area in the storage unit for loading exceptional or miscellaneous control programs.
- An intelligent terminal includes the aforementioned smart cache.
- the smart terminal includes a computer, a notebook, a mobile phone, a personal digital assistant, a game console, or the like.
- the intelligent cache of the invention lets the kernel process only complex operations and miscellaneous control, submitting large amounts of frequently used, simply processed data to the intelligent processing unit of the smart cache; the intelligent processing unit handles not only simple individual data items but also entire specific data structures.
- data processing is kept close to the memory, reducing dependence on the bus and the burden on the kernel, thereby achieving a balance among performance, power consumption, and cost.
- under the flexible organization and management of the control unit and the close cooperation of the intelligent processing unit, an efficient storage system can be realized.
- Figure 1 is a schematic diagram of a typical cache structure
- FIG. 2 is a schematic structural diagram of a smart cache according to an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of a smart cache according to Embodiment 1 of the present invention.
- FIG. 4 is a schematic structural diagram of a smart cache according to Embodiment 2 of the present invention.
- FIG. 5 is a schematic structural diagram of a smart cache according to Embodiment 3 of the present invention.
- FIG. 6 is a schematic structural diagram of a smart cache according to Embodiment 4 of the present invention.
- FIG. 7 is a schematic structural diagram of a smart cache according to Embodiment 5 of the present invention.
- FIG. 8 is a schematic structural diagram of a smart cache according to Embodiment 6 of the present invention.
- FIG. 9 is a schematic structural diagram of a smart cache according to Embodiment 7 of the present invention.
- FIG. 10 is a schematic structural diagram of a smart cache according to Embodiment 8 of the present invention.
- FIG. 11 is a schematic structural diagram of a smart cache according to Embodiment 9 of the present invention.
- the basic idea of the present invention is to provide an intelligent cache comprising a general interface, a software definition and reconstruction unit, a control unit, a storage unit, and an intelligent processing unit; wherein: the general interface receives configuration information, and/or control information, and/or data information from a kernel or a bus and returns target data; the software definition and reconstruction unit defines the storage system as the required cache Cache memory according to the configuration information; the control unit controls reads and writes of the Cache memory, monitors instruction or data streams in real time, and controls the storage unit to load required data in advance according to system information, the characteristics of the tasks to be executed, and the properties of the data structures used; the storage unit is composed of a large number of storage modules for caching data and, according to the definition of the software definition and reconstruction unit, takes on the associative-array function for cached data (e.g., the Cache TAG) and combines with the data-cache storage units into the required storage system structure; the intelligent processing unit processes input and output data and transfers, transforms, and operates on data among a plurality of structures defined in the control unit.
- the intelligent cache of the present invention mainly includes five processing units, namely a general interface (GI, General Interface), a software definition and reconstruction unit (SDRU, Software Define and Reconfiguration Unit), a control unit (CU, Control Unit), a memory unit (MU, Memory Unit), and an intelligence processing unit (IPU, Intelligence Processing Unit).
- these Cache Memories of the same structure can coexist with other kinds of storage structures, such as TCM and TCAM; idle Memory can also be dynamically reconstructed during operation to make full use of the system's storage resources. The control unit (CU) not only controls reads and writes of the memory, but also monitors instruction or data streams in real time and, according to system information and the characteristics of the tasks to be executed and of the data structures used, loads required data in advance to maximize the hit rate.
- the storage unit (MU) is composed of a large number of storage modules whose functions can be defined as needed; they can be used to store indexes, tags, identifiers, data, or other information, and these storage modules can be freely combined to implement complex storage structures, such as the aforementioned TCM, TCAM, or Cache;
- the intelligence processing unit (IPU) can process the input and output data of the SDCM, and can also transfer, transform, and operate on data among the structures defined in the MU, such as matrix operations, bit-level operations, data lookup, data sorting, comparison, logical operations, set/reset, and read-modify-write, as well as simple operations such as increment/decrement and addition/subtraction.
- the IPU can also cooperate with the CU to fill and update data and to transfer data to the next level of memory.
- the entire storage system can be defined as a Cache, TCM, CAM, or other storage structure according to the requirements of the software, and the attributes of these storage structures can be configured, such as Cache size, associativity, line size, allocation policy, and write-back mode;
- the Cache interface can also be configured as a coherence interface for multi-core architectures; a translation lookaside buffer (TLB, Translation Lookaside Buffer) can even be defined for the Cache to translate between virtual and physical addresses.
- the sizes of the TCM and CAM can also be configured; the system can even be configured as a storage structure in which multiple Cache structures and TCMs coexist.
- the present invention does not describe the connection relationships between the functional units one by one; those skilled in the art should understand that the above processing units can be connected by a bus or by dedicated interfaces.
- FIG. 3 is a schematic structural diagram of a smart cache according to Embodiment 1 of the present invention; as shown in FIG. 3, the SDCM defines a fully associative Cache, a 4-way set-associative Cache, a CAM, a TCM, and other types of storage structures.
- all input or output data or indexes of the storage structures defined by the SDCM can be processed directly by the intelligent processing unit inside the storage system, including simple operations such as transformation, bit insertion, bit extraction, set/reset, shifting, bit reversal, increment/decrement, and addition/subtraction; not all data-processing tasks need to be handed to the kernel, and data can flow between storage structures within a storage system, saving bus bandwidth and reducing the processor's burden.
- the kernel is responsible only for complex operations and control, thereby improving processing performance.
- IF_CU is used to denote all units of the SDCM of the embodiments of the present invention except the Memory and the Intelligent Processing Unit.
- the SDCM can be defined as the Cache structure shown in FIG. 4 by a simple definition command, and under the combined control of the control unit, the Cache so defined can work like an ordinary Cache.
- the Cache defined here uses neither complex replacement algorithms nor prefetching and speculative reads to improve performance.
- instead, according to the information provided by the software and the characteristics of the tasks and data, the control unit organizes and manages the data, loading data from the next-level storage system and updating the next-level storage system.
- FIG. 5 is a schematic structural diagram of a smart cache according to Embodiment 3 of the present invention.
- when the SDCM is defined as a TCAM, many memory blocks are needed to read out data records in parallel; readout is controlled by the IF_CU, the data read out in parallel (Parallel Data) is compared simultaneously with the key (Key) in the intelligence processing unit (IPU), and the IPU outputs a result (Result) indicating whether the data was successfully found. If the data is found, the IF_CU reads the corresponding index data from the Data Record memory blocks and outputs it.
- FIG. 6 is a schematic structural diagram of a smart cache according to Embodiment 4 of the present invention.
- when the SDCM is defined as a TCM, the control unit only needs to perform simple functions similar to RAM read/write operations; the TCM can use as little as one memory block, with a read/write cycle of as little as one clock cycle.
- TCMs of different sizes can be assembled by selecting different numbers of memory blocks.
- the SDCM can also be defined as a storage system having simple data-processing capabilities, such as bit operations, data search, and matrix operations, though the processing capability of the SDCM is not limited to these.
- FIG. 7 is a schematic structural diagram of a smart cache according to Embodiment 5 of the present invention, an example of an SDCM-defined intelligent cache structure capable of bit operations.
- the data to be processed is stored in a Cache; the kernel only needs to send a single leading-zero, leading-one, or bit-reversal command, and the SDCM can return the result to the kernel or keep it stored in the Cache. How the SDCM computes is transparent to the kernel.
- FIG. 8 is a schematic diagram showing the structure of an intelligent cache according to Embodiment 6 of the present invention.
- the SDCM can be defined as the structure shown in FIG. 8.
- the kernel only needs to give the storage location of the initial matrix and the size of the matrix data block, then issue a command to start the computation, and the SDCM completes the transposition.
- the SDCM first uses the IF_CU to read the column vectors of the initial matrix out of the Cache into the TCM and write them back to the Cache as row vectors.
- the TCM is composed of a plurality of small blocks; writes go to a single block, while multiple blocks can be read simultaneously, so that bit extraction from the read-out data realizes the inversion.
- all address-offset calculations are performed by the IPU.
- the transpose of the initial matrix is formed in the Cache; the processing time required for this transposition is related to the size of the matrix, and the software needs to know the latency of completing the transposition.
- FIG. 9 is a schematic structural diagram of a smart cache according to Embodiment 7 of the present invention, an example of the SDCM used as shared memory.
- the SDCM is connected to the operating system through the universal interface; the connection may be a standard bus or a network-on-chip, and its position in the system may be that of the shared storage system shown in FIG. 9.
- besides a slave interface, the SDCM also has the function of a master interface, and the SDCM can initiate data transfers at any time and from any position.
- SDCM can also be used as a private memory for the kernel, or it can form a hierarchical storage structure.
- FIG. 10 is a schematic structural diagram of a smart cache according to Embodiment 8 of the present invention.
- FIG. 11 is a schematic structural diagram of a smart cache according to Embodiment 9 of the present invention.
- FIG. 10 and FIG. 11 show examples of the SDCM used as in-core private memory or as a hierarchical storage system: SDCMs can be connected into a symmetric multiprocessing (SMP, Symmetrical Multiple Processor) structure, as shown in FIG. 10, or into an asymmetric multiprocessing (AMP, Asymmetric Multiple Processor) structure, as shown in FIG. 11; in either case, coherence among the SDCMs can be realized.
- the SDCM can also be connected into many other structures according to the needs of the application, and one or more SDCMs can even be added to other existing storage structures.
- the invention also describes an intelligent terminal, which comprises the above intelligent cache.
- the above intelligent terminal includes all intelligent terminals having a CPU control unit, such as a computer, a notebook, a mobile phone, a personal digital assistant, or a game console.
- the intelligent cache of the invention lets the kernel process only complex operations and miscellaneous control, submitting large amounts of frequently used, simply processed data to the intelligent processing unit of the smart cache for processing.
- the intelligent processing unit handles not only simple individual data items but also entire specific data structures, keeping data processing as close to the memory as possible, thereby reducing dependence on the bus and the burden on the kernel, and achieving a balance among performance, power consumption, and cost.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The present invention discloses an intelligent cache and an intelligent terminal. The intelligent cache includes: a general-purpose interface for receiving configuration information, and/or control information, and/or data information from a kernel or a bus, and returning target data; a software definition and reconfiguration unit for defining the memory as the required cache memory according to the configuration information; a control unit for controlling reads and writes of the Cache and monitoring instruction or data streams in real time; a storage unit composed of a large number of storage modules for caching data, the storage modules being combined into the required Cache memory according to the definition of the software definition and reconfiguration unit; and an intelligent processing unit for processing input and output data and transferring, transforming, and operating on data among the multiple structures defined in the control unit. According to the running state of the software, the characteristics of the executed tasks, and the properties of the data structures, the present invention can realize an efficient storage system under the flexible organization and management of the control unit and the close cooperation of the intelligent processing unit.
Description
Intelligent Cache and Intelligent Terminal

Technical Field

The present invention relates to intelligent cache technology, and in particular to a flexible, configurable, software-defined intelligent cache implemented through a tight combination of software and hardware, and to an intelligent terminal having such an intelligent cache.

Background

A traditional cache (Cache Memory) exploits the temporal or spatial locality of processor program execution by temporarily storing recently and/or frequently executed instructions and data in a Cache Memory close to the processor unit. When an instruction or data item needs to be accessed, the Cache Memory is accessed first; if the Cache Memory misses (Miss), the slower but larger next-level memory is accessed.

FIG. 1 is a schematic diagram of a typical cache structure. As shown in FIG. 1, because the instructions or data loaded into the Cache can only be updated in real time according to the dynamic execution of the program, the processor core must first search the tag (Tag) array in the Cache on every memory access to confirm whether the required instruction or data is present in the Cache. On a Cache Miss, the Tag lookup and data comparison operations are wasted, and the next-level memory must then be accessed, wasting multiple processor execution cycles and Cache power. To increase the Cache hit (Hit) rate, set associativity, complex replacement algorithms, prefetching, speculative reads, and hierarchical multi-level Cache structures are usually employed; clearly, these performance gains are bought entirely with increased hardware complexity and chip-area overhead. Since the structure shown in FIG. 1 is a typical existing cache structure, the functions of its parts and its working principle are not described again here.
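The tag-array lookup and miss handling described above can be pictured with a small simulation. This is an illustrative sketch, not part of the patent: the class name, the 4-way/LRU parameters, and the address layout are all assumptions chosen for the example.

```python
# Hypothetical model of the tag-array lookup described above: a small
# 4-way set-associative cache with LRU replacement. On a miss, the wasted
# tag comparison is followed by a (conceptual) slow next-level access.
class SetAssociativeCache:
    def __init__(self, num_sets=16, ways=4, line_size=64):
        self.num_sets, self.ways, self.line_size = num_sets, ways, line_size
        # Each set holds up to `ways` tags, ordered most- to least-recently used.
        self.sets = [[] for _ in range(num_sets)]
        self.hits = self.misses = 0

    def access(self, address):
        line = address // self.line_size
        index = line % self.num_sets          # which set's tag array to search
        tag = line // self.num_sets           # value compared against stored tags
        tags = self.sets[index]
        if tag in tags:                       # tag-array hit
            tags.remove(tag)
            tags.insert(0, tag)               # move to MRU position
            self.hits += 1
            return "hit"
        self.misses += 1                      # Cache Miss: go to next-level memory
        if len(tags) == self.ways:
            tags.pop()                        # evict the LRU line
        tags.insert(0, tag)
        return "miss"

cache = SetAssociativeCache()
results = [cache.access(a) for a in (0, 64, 0, 4096, 0)]
```

Running the five accesses above yields three misses and two hits; addresses 0 and 4096 map to the same set, illustrating why a tag comparison is needed even when the set index matches.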
Another drawback of the Cache is that the access latencies of a Hit and a Miss are completely different, so memory access latency cannot be predicted. For this reason, tightly coupled memory (TCM, Tightly Coupled Memory) has been introduced in many situations. A TCM is a static random access memory (SRAM, Static Random Access Memory) close to the processor core, characterized by high speed and fixed latency. The contents of a TCM cannot be replaced in real time, and its capacity is fixed and generally small. Refreshing a TCM depends entirely on software scheduling: before refreshing the TCM, the software must determine when to refresh and perform the corresponding configuration operations, and the TCM cannot be accessed during configuration. All of this limits the application of TCMs.

Content-addressable memory (CAM, Content Addressable Memory) is a special-purpose memory. As a general-purpose module it cannot deliver its full performance in certain specific application scenarios, and it performs queries by comparing all stored entries with the input entry simultaneously in parallel, which carries a very large hardware cost.

In summary, it is difficult to improve performance by relying entirely on hardware complexity and power consumption, or entirely on software intervention. Moreover, the granularity of processor execution and memory access (per instruction) is fine, and resources are classified and partitioned in a fixed way, which is not only inefficient but also wastes the system's storage resources. If software and hardware are tightly combined, and processing is made flexible and intelligent according to program execution behavior and the characteristics of the data structures, there is more room for performance improvement, and performance, power consumption, and cost will be better balanced.

Summary of the Invention
In view of this, the main purpose of the present invention is to provide an intelligent cache and an intelligent terminal that can be flexibly defined, configured, and reconfigured by software for specific applications, solving the traditional Cache's drawbacks of high complexity, high overhead, high power consumption, and unpredictable latency, as well as the problems of inefficient TCM data updates, low storage-unit flexibility, and narrow applicability.

To achieve the above purpose, the technical solution of the present invention is implemented as follows:

An intelligent cache includes a general-purpose interface, a software definition and reconfiguration unit, a control unit, a storage unit, and an intelligent processing unit; wherein:

the general-purpose interface is configured to receive configuration information, and/or control information, and/or data information from a kernel or a bus, and to return target data;

the software definition and reconfiguration unit is configured to define the memory as the required cache (Cache) memory according to the configuration information;

the control unit is configured to control reads and writes of the Cache memory, to monitor instruction or data streams in real time, and to control the storage unit to load required data in advance according to system information, the characteristics of the tasks to be executed, and the properties of the data structures used;

the storage unit is composed of a large number of storage modules and is configured to cache data, the storage modules being combined into the required Cache memory according to the definition of the software definition and reconfiguration unit; and the intelligent processing unit is configured to process input and output data, and to transfer, transform, and operate on data among the multiple structures defined in the control unit.

Preferably, the required Cache memory can be configured to include at least one of the following kinds of memory:

tightly coupled memory (TCM), content-addressable memory (CAM), and cache (Cache).

Preferably, the general-purpose interface further includes a coherence interface for multi-core environments.

Preferably, the software definition and reconfiguration unit is further configured to define multiple Cache memories of the same structure but different attributes, including at least one of the following: a fully associative Cache, a 16-way set-associative Cache, a 4-way set-associative Cache, a 2-way set-associative Cache, and a direct-mapped Cache.

Preferably, the software definition and reconfiguration unit is further configured to dynamically reconfigure idle storage modules during operation.

Preferably, the intelligent processing unit transfers, transforms, and operates on data among the multiple structures defined in the control unit, including: matrix operations, bit-level operations, data lookup, data sorting, data comparison, logical operations, set/reset, read-modify-write operations, and increment/decrement and addition/subtraction operations.

Preferably, the intelligent processing unit is further configured to fill and update data, and to transfer data to the next-level memory.

Preferably, the control unit loads data according to the data block size defined by the software definition and reconfiguration unit, or loads data automatically; and defines a dedicated storage area in the storage unit for loading exceptional or miscellaneous control programs.

An intelligent terminal includes the aforementioned intelligent cache.

The intelligent terminal includes a computer, a notebook, a mobile phone, a personal digital assistant, a game console, or the like.

The intelligent cache of the present invention allows the kernel to handle only complex operations and miscellaneous control, while large amounts of frequently used, simply processed data are submitted to the intelligent processing unit of the intelligent cache for processing. The intelligent processing unit handles not only simple individual data items but also entire specific data structures, keeping data processing as close to the memory as possible, thereby reducing dependence on the bus and lightening the kernel's burden, so as to balance performance, power consumption, and cost. In close cooperation with software, even without prefetching, speculative reads, or complex replacement algorithms, an efficient storage system can be realized according to the running state of the software, the characteristics of the executed tasks, and the properties of the data structures, under the flexible organization and management of the control unit and the close cooperation of the intelligent processing unit.

Brief Description of the Drawings
FIG. 1 is a schematic diagram of a typical cache structure;

FIG. 2 is a schematic diagram of the structure of an intelligent cache according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the structure of an intelligent cache according to Embodiment 1 of the present invention;

FIG. 4 is a schematic diagram of the structure of an intelligent cache according to Embodiment 2 of the present invention;

FIG. 5 is a schematic diagram of the structure of an intelligent cache according to Embodiment 3 of the present invention;

FIG. 6 is a schematic diagram of the structure of an intelligent cache according to Embodiment 4 of the present invention;

FIG. 7 is a schematic diagram of the structure of an intelligent cache according to Embodiment 5 of the present invention;

FIG. 8 is a schematic diagram of the structure of an intelligent cache according to Embodiment 6 of the present invention;

FIG. 9 is a schematic diagram of the structure of an intelligent cache according to Embodiment 7 of the present invention;

FIG. 10 is a schematic diagram of the structure of an intelligent cache according to Embodiment 8 of the present invention;

FIG. 11 is a schematic diagram of the structure of an intelligent cache according to Embodiment 9 of the present invention.

Detailed Description
The basic idea of the present invention is to provide an intelligent cache including a general-purpose interface, a software definition and reconfiguration unit, a control unit, a storage unit, and an intelligent processing unit; wherein: the general-purpose interface receives configuration information, and/or control information, and/or data information from a kernel or a bus and returns target data; the software definition and reconfiguration unit defines the storage system as the required cache (Cache) memory according to the configuration information; the control unit controls reads and writes of the Cache memory, monitors instruction or data streams in real time, and controls the storage unit to load required data in advance according to system information, the characteristics of the tasks to be executed, and the properties of the data structures used; the storage unit is composed of a large number of storage modules for caching data and, according to the definition of the software definition and reconfiguration unit, takes on the associative-array function for cached data (e.g., the Cache TAG) and combines with the data-cache storage units into the required storage system structure; the intelligent processing unit processes input and output data and transfers, transforms, and operates on data among the multiple structures defined in the control unit.
To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below through embodiments and with reference to the accompanying drawings.
FIG. 2 is a schematic diagram of the structure of an intelligent cache according to an embodiment of the present invention. It should be noted that the intelligent cache of the embodiments of the present invention is also called a software-defined cache memory (SDCM, Software Define Cache Memory); in this application, SDCM denotes the intelligent cache of the embodiments of the present invention. As shown in FIG. 2, the intelligent cache of the present invention mainly includes five processing units: a general interface (GI, General Interface), a software definition and reconfiguration unit (SDRU, Software Define and Reconfiguration Unit), a control unit (CU, Control Unit), a memory unit (MU, Memory Unit), and an intelligence processing unit (IPU, Intelligence Processing Unit). The general interface (GI) receives configuration, control, and data information from the kernel and/or bus and returns target data to the kernel and/or bus; it also contains direct data-loading interfaces, such as a direct memory access (DMA, Direct Memory Access) interface, as well as coherence interfaces for multi-core environments. The software definition and reconfiguration unit (SDRU) defines the SDCM's Memory as the required storage system structure according to the configuration information, for example as a TCM, TCAM, or Cache; multiple Cache Memories of the same structure but different attributes can be defined simultaneously (a fully associative Cache, a 4-way set-associative Cache, a 16-way set-associative Cache, a 2-way set-associative Cache, a direct-mapped Cache, and so on), and these can coexist with other kinds of storage structures, such as TCMs and TCAMs. Idle Memory can also be dynamically reconfigured during operation so that system storage resources are fully utilized. Besides controlling reads and writes of the Memory, the control unit (CU) monitors instruction or data streams in real time and, in close cooperation with the intelligence processing unit, loads required data in advance according to system information, the characteristics of the tasks to be executed, and the properties of the data structures used, so as to maximize the hit rate. The memory unit (MU) is composed of a large number of storage modules whose functions can be defined entirely as needed; they can store indexes, Tags, identifiers, data, or other information, and can be freely combined to implement complex storage structures, such as the aforementioned TCM, TCAM, or Cache. The intelligence processing unit (IPU) can process the SDCM's input and output data and can also transfer, transform, and operate on data among the structures defined in the MU, such as matrix operations, bit-level operations, data lookup, data sorting, comparison, logical operations, set/reset, and read-modify-write, as well as simple operations such as increment/decrement and addition/subtraction. The IPU can also cooperate with the CU to fill and update data and to transfer data to the next-level memory.
Based on the SDCM, the entire storage system can be defined as a Cache, TCM, CAM, or another storage structure according to the needs of the software, and the attributes of these storage structures are all configurable, for example the Cache's size, associativity, line size, allocation policy, and write-back mode. The Cache interface can also be configured as a coherence interface for multi-core architectures; a translation lookaside buffer (TLB, Translation Lookaside Buffer) can even be defined for the Cache to translate between virtual and physical addresses. The sizes of TCMs and CAMs are also configurable, and the system can even be configured as a storage system structure in which multiple Cache structures and TCMs coexist.
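The configurable attributes listed above (kind of structure, size, associativity, line size, write-back mode) can be pictured as a configuration record that an SDRU-like unit would consume. This is a sketch under stated assumptions: the patent does not specify a configuration API, so the class and field names below are invented for illustration.

```python
# Illustrative configuration record for one software-defined memory region.
# Field names are assumptions for this sketch, not the patent's interface.
from dataclasses import dataclass

@dataclass
class MemoryRegionConfig:
    kind: str            # "cache", "tcm", or "cam"
    size_bytes: int      # total capacity assembled from storage modules
    ways: int = 1        # associativity (1 = direct-mapped)
    line_bytes: int = 64
    write_back: bool = True

    def num_sets(self) -> int:
        # the number of sets follows from size, associativity, and line size
        return self.size_bytes // (self.ways * self.line_bytes)

# Several regions of different kinds can coexist, as in FIG. 3:
regions = [
    MemoryRegionConfig("cache", 16 * 1024, ways=4),
    MemoryRegionConfig("tcm", 8 * 1024),
    MemoryRegionConfig("cam", 2 * 1024),
]
```

A 16 KB 4-way cache with 64-byte lines resolves to 64 sets, which is the kind of derived parameter a reconfiguration unit would compute when carving storage modules into a cache.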
In the following embodiments, since the structures are relatively simple, the present invention does not describe the connection relationships between the functional units one by one; those skilled in the art should understand that the above processing units can be connected by a bus or by dedicated interfaces.
FIG. 3 is a schematic diagram of the structure of an intelligent cache according to Embodiment 1 of the present invention. As shown in FIG. 3, the SDCM defines a fully associative Cache, a 4-way set-associative Cache, a CAM, a TCM, and other types of storage structures. In the present invention, all input or output data, indexes, and the like of the storage structures defined by the SDCM can be processed directly inside the storage system by the intelligence processing unit, including simple operations such as transformation, bit insertion, bit extraction, set/reset, shifting, bit reversal, increment/decrement, and addition/subtraction; not all data-processing tasks need to be handed to the kernel, and data can flow among the storage structures within one storage system, saving bus bandwidth and lightening the processor's burden. The kernel is responsible only for complex operations and control, which improves processing performance.
The technical solution of the present invention can realize all kinds of required storage system structures; that is, a storage system structure can be freely defined according to actual needs. Several common application embodiments are given below. In FIG. 4 to FIG. 8, IF_CU denotes all units of the SDCM of the embodiments of the present invention other than the Memory and the intelligence processing unit.
FIG. 4 is a schematic diagram of the structure of an intelligent cache according to Embodiment 2 of the present invention. As shown in FIG. 4, the SDCM can be defined as the Cache structure shown in FIG. 4 by a simple definition command, and under the combined control of the control unit, the Cache so defined can work like an ordinary Cache. Of course, to reduce processing complexity and cooperate as closely as possible with the software, the Cache defined here uses neither complex replacement algorithms nor performance-improving methods such as prefetching and speculative reads; instead, according to the information provided by the software and the characteristics of the tasks and data, the control unit organizes and manages the data, loading data from the next-level storage system and updating the next-level storage system. The granularity of loading and of updating to the next level is per task, so data that is still needed will not be replaced while a task is being processed. Those skilled in the art should understand that, given the intelligent cache structure provided by the present invention, the cache configuration shown in FIG. 4 is easy to realize.
FIG. 5 is a schematic diagram of the structure of an intelligent cache according to Embodiment 3 of the present invention. As shown in FIG. 5, when the SDCM is defined as a TCAM, many memory blocks (Blocks) are needed to read out data records in parallel. Readout is controlled by the IF_CU; the data read out in parallel (Parallel Data) is compared simultaneously with the key (Key) in the intelligence processing unit (IPU), and the IPU outputs a result (Result) indicating whether the data was successfully found. If the data is found, the IF_CU reads the corresponding index data from the Data Record memory blocks and outputs it.
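The TCAM lookup flow just described can be sketched in a few lines. This is a hedged illustration, not the patent's implementation: the "parallel" comparison of all entries is modelled sequentially, and the class and method names are invented for the example.

```python
# Minimal model of the TCAM lookup flow described above: every stored
# entry is compared with the key (a CAM does this in parallel hardware;
# the comprehension below only models the outcome), and the associated
# data record of the first match is returned. Names are illustrative.
class TcamSketch:
    def __init__(self):
        self.keys = []     # searchable fields, read out in parallel
        self.records = []  # associated index/data records

    def store(self, key, record):
        self.keys.append(key)
        self.records.append(record)

    def lookup(self, key):
        # IPU role: simultaneous comparison of the parallel data with the Key
        matches = [i for i, k in enumerate(self.keys) if k == key]
        if not matches:
            return None                    # Result: not found
        return self.records[matches[0]]    # IF_CU reads out the Data Record

tcam = TcamSketch()
tcam.store(0xCAFE, "route-A")
tcam.store(0xBEEF, "route-B")
```

The hardware cost the background section warns about comes precisely from the fact that every `keys` entry needs its own comparator to make this lookup single-cycle.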
FIG. 6 is a schematic diagram of the structure of an intelligent cache according to Embodiment 4 of the present invention. As shown in FIG. 6, when the SDCM is defined as a TCM, the control unit only needs to perform simple functions similar to RAM read/write operations; the TCM can use as little as one memory block, with a read/write cycle of as little as one clock cycle, and TCMs of different sizes can of course be assembled by selecting different numbers of memory blocks.
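Assembling a TCM from a selectable number of fixed-size blocks, as described above, can be sketched as follows. The block size and all names here are assumptions made for the illustration; the patent does not fix these parameters.

```python
# Sketch of a TCM assembled from N fixed-size storage blocks. Each
# read/write is a plain array access, modelling the single-cycle,
# fixed-latency behaviour of a TCM. Names and sizes are illustrative.
class TcmSketch:
    BLOCK_WORDS = 256  # assumed capacity of one storage block

    def __init__(self, num_blocks=1):
        self.blocks = [[0] * self.BLOCK_WORDS for _ in range(num_blocks)]

    @property
    def capacity(self):
        return len(self.blocks) * self.BLOCK_WORDS

    def write(self, addr, value):
        # address splits into (block, offset); no tag lookup is involved
        self.blocks[addr // self.BLOCK_WORDS][addr % self.BLOCK_WORDS] = value

    def read(self, addr):
        return self.blocks[addr // self.BLOCK_WORDS][addr % self.BLOCK_WORDS]

tcm = TcmSketch(num_blocks=4)   # a larger TCM built from four blocks
tcm.write(300, 42)              # lands in the second block
```

Unlike the cache sketch earlier, there is no hit/miss path at all, which is exactly why TCM latency is predictable.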
In the present invention, the SDCM can also be defined as a storage system with simple data-processing capabilities, such as bit operations, data search, and matrix operations, though the SDCM's processing capabilities are not limited to these.
Under the control of the IPU, data-bit operations such as counting leading zeros or leading ones and bit reversal (swapping the most significant bit with the least significant bit, the second most significant with the second least significant, and so on for all bits) are convenient to perform inside the SDCM. It should be noted that these bit operations are conventional in data storage processing, and their implementation details are not repeated in the present invention.
FIG. 7 is a schematic diagram of the structure of an intelligent cache according to Embodiment 5 of the present invention, an example of an SDCM-defined intelligent cache structure capable of bit operations. As shown in FIG. 7, the data to be processed is stored in the Cache; the kernel only needs to send a single leading-zero, leading-one, or bit-reversal command for reading data, and the SDCM can return the result to the kernel or keep it stored in the Cache. How the SDCM computes is transparent to the kernel.
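The two bit-level commands named above (leading-bit counting and full bit reversal) can be written out explicitly. This is a reference sketch assuming a 32-bit word width; the word width and the function names are assumptions, not taken from the patent.

```python
# Sketch of the bit-level commands described above, for an assumed
# 32-bit word. Function names are invented for illustration.
WIDTH = 32

def count_leading(value, bit):
    # count how many copies of `bit` (0 or 1) lead the word, MSB first
    count = 0
    for i in range(WIDTH - 1, -1, -1):
        if (value >> i) & 1 != bit:
            break
        count += 1
    return count

def bit_reverse(value):
    # swap the MSB with the LSB, the next-MSB with the next-LSB, and so on
    result = 0
    for i in range(WIDTH):
        result = (result << 1) | ((value >> i) & 1)
    return result
```

In the SDCM these loops would run inside the IPU next to the data; the kernel would see only the command and the result.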
Data search can also be realized with the structure shown in FIG. 7. The kernel only needs to tell the SDCM the array to be searched and the data of interest; then, with a simple search-task instruction, the IF_CU reads the array data from the Cache and submits it to the IPU, which compares it to determine whether it is the target data. If the target data is found, the address where it resides is returned through the interface. This structure of the present invention not only lightens the processor's burden but also saves processor bus bandwidth, allowing the processor to focus on other tasks.
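The search-task flow just described can be sketched as a single function. The flat address-to-value mapping and all names below are illustrative assumptions; the patent describes the division of labour (IF_CU reads, IPU compares), not a concrete interface.

```python
# Sketch of the search task described above: the kernel supplies the
# array's base address, its length, and the value of interest; the
# cache-side logic scans and returns the address of the first match,
# or None if the value is absent. Names are invented for illustration.
def sdcm_search(memory, base_addr, length, target):
    for offset in range(length):
        # IF_CU reads an array element from the Cache...
        value = memory[base_addr + offset]
        if value == target:            # ...and the IPU performs the comparison
            return base_addr + offset  # address returned through the interface
    return None

# A small array placed at (assumed) address 100:
memory = {addr: value for addr, value in enumerate([7, 3, 9, 3, 5], start=100)}
```

The kernel issues one instruction and receives one address, instead of streaming every element across the bus to compare it itself.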
The SDCM can also undertake data transformations involving large volumes of data, such as matrix transposition. FIG. 8 is a schematic diagram of the structure of an intelligent cache according to Embodiment 6 of the present invention, an example of an SDCM-defined intelligent cache structure capable of matrix operations; as shown in FIG. 8, the SDCM can be defined as the structure shown in FIG. 8. For a matrix transposition, the kernel only needs to give the storage location of the initial matrix and the size of the matrix data block, then issue a command to start the computation, and the SDCM completes the transposition. The SDCM first uses the IF_CU to read the column vectors of the initial matrix out of the Cache into the TCM and write them back to the Cache as row vectors. Here, the TCM is composed of multiple small Blocks; writes go to a single Block, while under the control of the IPU multiple Blocks can be read simultaneously, so that bit extraction from the read-out data realizes the inversion. In addition, all address-offset calculations are performed by the IPU; after all column-vector data has been transferred, the transpose of the initial matrix has been formed in the Cache. The processing time required for this transposition is related to the size of the matrix, and the software needs to know the latency of completing the transposition.
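The column-to-row staging step above can be sketched at the word level. This is a simplified illustration: the flat row-major layout, the function name, and the explicit `tcm` list all stand in for hardware details the patent leaves abstract, and the bit-extraction trick is reduced to plain element copies.

```python
# Sketch of the transpose flow described above: each column vector of the
# initial matrix is read out of the "Cache" into a TCM staging buffer and
# written back as a row vector of the result. Layout and names are assumed.
def transpose_via_tcm(cache, rows, cols):
    out = [0] * (rows * cols)
    for c in range(cols):
        # IF_CU reads one column vector of the initial matrix into the TCM
        tcm = [cache[r * cols + c] for r in range(rows)]
        # ...and writes it back to the Cache as row c of the transpose
        for r in range(rows):
            out[c * rows + r] = tcm[r]   # address offsets computed by the IPU
    return out

matrix = [1, 2,
          3, 4,
          5, 6]                 # 3x2 matrix, row-major
result = transpose_via_tcm(matrix, rows=3, cols=2)
```

As the text notes, the latency is proportional to the matrix size: the staging loop runs once per column, so software must account for `cols` passes before reading the result.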
FIG. 9 is a schematic diagram of the structure of an intelligent cache according to Embodiment 7 of the present invention, an example of the SDCM used as shared memory. The SDCM is connected to the operating system through the general interface; the connection can be a standard bus or a network-on-chip, and its position in the system can be that of the shared storage system shown in FIG. 9. In addition, besides a slave interface, the SDCM also has the function of a master interface and can initiate data transfers at any time and from any position.
Of course, the SDCM can also serve as a kernel's private memory or form a hierarchical storage structure. FIG. 10 is a schematic diagram of the structure of an intelligent cache according to Embodiment 8 of the present invention, and FIG. 11 is a schematic diagram of the structure of an intelligent cache according to Embodiment 9 of the present invention. As shown in FIG. 10 and FIG. 11, these are examples of the SDCM used as in-core private memory or as a multi-level storage system: SDCMs can be connected into a symmetric multiprocessing (SMP, Symmetrical Multiple Processor) structure, as shown in FIG. 10, or into an asymmetric multiprocessing (AMP, Asymmetric Multiple Processor) structure, as shown in FIG. 11; in either case, whether SMP or AMP, coherence among the SDCMs can be realized.
Those skilled in the art should understand that SDCMs can also be connected into many other structures as the application requires, and one or more SDCMs can even be added to other existing storage structures.
The present invention also describes an intelligent terminal that includes the above intelligent cache. The above intelligent terminal includes all intelligent terminals having a CPU control unit, such as a computer, a notebook, a mobile phone, a personal digital assistant, or a game console.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection.
Industrial Applicability
The intelligent cache of the present invention allows the kernel to handle only complex operations and miscellaneous control, while large amounts of frequently used, simply processed data are submitted to the intelligent processing unit of the intelligent cache for processing. The intelligent processing unit handles not only simple individual data items but also entire specific data structures, keeping data processing as close to the memory as possible, thereby reducing dependence on the bus and lightening the kernel's burden, so as to achieve a balance among performance, power consumption, and cost.
Claims
1. An intelligent cache, wherein the intelligent cache comprises a general-purpose interface, a software definition and reconfiguration unit, a control unit, a storage unit and an intelligent processing unit, wherein:
the general-purpose interface is configured to receive configuration information, and/or control information, and/or data information from a core or a bus, and to return target data;
the software definition and reconfiguration unit is configured to define the memory as a required Cache memory according to the configuration information;
the control unit is configured to control reading from and writing to the Cache memory, and to monitor instruction or data streams in real time, controlling the storage unit to load required data in advance according to system information, characteristics of tasks to be executed, and properties of the data structures used;
the storage unit is composed of a large number of memory modules and is configured to cache data, and, according to the definition of the software definition and reconfiguration unit, the memory modules are combined into the required Cache memory; and the intelligent processing unit is configured to process input and output data, transferring, transforming and operating on data among a plurality of structures defined in the control unit.
2. The intelligent cache according to claim 1, wherein the required Cache memory can be configured to include at least one of the following memory types:
a tightly coupled memory (TCM), a content-addressable memory (CAM), and a cache (Cache).
3. The intelligent cache according to claim 1, wherein the general-purpose interface further includes a coherence interface for multi-core environments.
4. The intelligent cache according to claim 1, wherein the software definition and reconfiguration unit is further configured to define a plurality of Cache memories of the same type with different attributes, the Cache memories of the same type with different attributes including at least one of the following: a fully associative Cache, a 16-way set-associative Cache, a 4-way set-associative Cache, a 2-way set-associative Cache, and a direct-mapped Cache.
5. The intelligent cache according to claim 1, wherein the software definition and reconfiguration unit is further configured to dynamically reconfigure idle memory modules during operation.
6. The intelligent cache according to claim 1, wherein the intelligent processing unit transferring, transforming and operating on data among the plurality of structures defined in the control unit includes: matrix operations, bit-level operations, data search, data sorting, data comparison, logical operations, set/reset, read-modify-write operations, and increment/decrement and addition/subtraction operations.
7. The intelligent cache according to claim 1, wherein the intelligent processing unit is further configured to fill and update data, and to transfer data to a next-level memory.
8. The intelligent cache according to claim 1, wherein the control unit loads data according to the data block size defined by the software definition and reconfiguration unit, or loads data automatically; and defines a dedicated storage area in the storage unit for loading exception or irregular control programs.
9. An intelligent terminal, wherein the intelligent terminal includes the intelligent cache according to any one of claims 1 to 8.
10. The intelligent terminal according to claim 9, wherein the intelligent terminal includes a computer, a notebook, a mobile phone, a personal digital assistant, or a game console.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP12867234.2A EP2808783B1 (en) | 2012-02-01 | 2012-06-29 | Smart cache and smart terminal |
US14/375,720 US9632940B2 (en) | 2012-02-01 | 2012-06-29 | Intelligence cache and intelligence terminal |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210022513.3A CN103246542B (zh) | 2012-02-01 | 2012-02-01 | Intelligent cache and intelligent terminal |
CN201210022513.3 | 2012-02-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013113206A1 true WO2013113206A1 (zh) | 2013-08-08 |
Family
ID=48904397
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2012/077953 WO2013113206A1 (zh) | 2012-02-01 | 2012-06-29 | Intelligent cache and intelligent terminal |
Country Status (4)
Country | Link |
---|---|
US (1) | US9632940B2 (zh) |
EP (1) | EP2808783B1 (zh) |
CN (1) | CN103246542B (zh) |
WO (1) | WO2013113206A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20150108148A (ko) * | 2014-03-17 | 2015-09-25 | 한국전자통신연구원 | Cache control apparatus and cache management method using partial reconfiguration of cache associativity |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902502B (zh) * | 2014-04-09 | 2017-01-04 | 上海理工大学 | Scalable split heterogeneous thousand-core system |
WO2015198356A1 (en) * | 2014-06-23 | 2015-12-30 | Unicredit S.P.A. | Method and system for database processing units |
CN105830038B (zh) * | 2014-06-30 | 2019-03-05 | 华为技术有限公司 | Method for accessing a storage device and host |
US9658963B2 (en) * | 2014-12-23 | 2017-05-23 | Intel Corporation | Speculative reads in buffered memory |
CN106708747A (zh) * | 2015-11-17 | 2017-05-24 | 深圳市中兴微电子技术有限公司 | Memory switching method and device |
US10915453B2 (en) * | 2016-12-29 | 2021-02-09 | Intel Corporation | Multi level system memory having different caching structures and memory controller that supports concurrent look-up into the different caching structures |
CN106936901B (zh) * | 2017-02-27 | 2019-09-17 | 烽火通信科技股份有限公司 | Bidirectional communication system based on the MSA protocol and implementation method thereof |
CN107229722A (zh) * | 2017-06-05 | 2017-10-03 | 商丘医学高等专科学校 | Intelligent mathematical operation processing system |
CN113396402A (zh) * | 2019-02-14 | 2021-09-14 | 瑞典爱立信有限公司 | Method and apparatus for controlling memory handling |
CN112306558A (zh) * | 2019-08-01 | 2021-02-02 | 杭州中天微系统有限公司 | Processing unit, processor, processing system, electronic device and processing method |
CN111090393A (zh) * | 2019-11-22 | 2020-05-01 | Oppo广东移动通信有限公司 | Stored-data processing method, stored-data processing apparatus and electronic apparatus |
CN113419709B (zh) * | 2021-06-22 | 2023-03-24 | 展讯通信(上海)有限公司 | Software optimization method and apparatus, electronic device, and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1746865A (zh) * | 2005-10-13 | 2006-03-15 | 上海交通大学 | Method for implementing the reconfigurable instruction cache portion of a digital signal processor |
CN101814039A (zh) * | 2010-02-02 | 2010-08-25 | 北京航空航天大学 | GPU-based Cache simulator and spatially parallel accelerated simulation method therefor |
US20110145626A2 (en) * | 2003-05-30 | 2011-06-16 | Steven Frank | Virtual processor methods and apparatus with unified event notification and consumer-produced memory operations |
CN102289390A (zh) * | 2010-06-01 | 2011-12-21 | 微软公司 | Hypervisor scheduler |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5367653A (en) * | 1991-12-26 | 1994-11-22 | International Business Machines Corporation | Reconfigurable multi-way associative cache memory |
US6678790B1 (en) * | 1997-06-09 | 2004-01-13 | Hewlett-Packard Development Company, L.P. | Microprocessor chip having a memory that is reconfigurable to function as on-chip main memory or an on-chip cache |
US6334177B1 (en) * | 1998-12-18 | 2001-12-25 | International Business Machines Corporation | Method and system for supporting software partitions and dynamic reconfiguration within a non-uniform memory access system |
EP1045307B1 (en) * | 1999-04-16 | 2006-07-12 | Infineon Technologies North America Corp. | Dynamic reconfiguration of a micro-controller cache memory |
US6347346B1 (en) | 1999-06-30 | 2002-02-12 | Chameleon Systems, Inc. | Local memory unit system with global access for use on reconfigurable chips |
US6931488B2 (en) | 2001-10-30 | 2005-08-16 | Sun Microsystems, Inc. | Reconfigurable cache for application-based memory configuration |
US20050138264A1 (en) | 2003-02-27 | 2005-06-23 | Fujitsu Limited | Cache memory |
US7039756B2 (en) | 2003-04-28 | 2006-05-02 | Lsi Logic Corporation | Method for use of ternary CAM to implement software programmable cache policies |
US7257678B2 (en) * | 2004-10-01 | 2007-08-14 | Advanced Micro Devices, Inc. | Dynamic reconfiguration of cache memory |
JP4366298B2 (ja) * | 2004-12-02 | 2009-11-18 | 富士通株式会社 | Storage device, control method therefor, and program |
US7467280B2 (en) | 2006-07-05 | 2008-12-16 | International Business Machines Corporation | Method for reconfiguring cache memory based on at least analysis of heat generated during runtime, at least by associating an access bit with a cache line and associating a granularity bit with a cache line in level-2 cache |
CN101788927B (zh) * | 2010-01-20 | 2012-08-01 | 哈尔滨工业大学 | FPGA-based method for dynamic allocation of internal resources in an adaptive spaceborne computer |
2012
- 2012-02-01 CN CN201210022513.3A patent/CN103246542B/zh active Active
- 2012-06-29 EP EP12867234.2A patent/EP2808783B1/en active Active
- 2012-06-29 WO PCT/CN2012/077953 patent/WO2013113206A1/zh active Application Filing
- 2012-06-29 US US14/375,720 patent/US9632940B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110145626A2 (en) * | 2003-05-30 | 2011-06-16 | Steven Frank | Virtual processor methods and apparatus with unified event notification and consumer-produced memory operations |
CN1746865A (zh) * | 2005-10-13 | 2006-03-15 | 上海交通大学 | Method for implementing the reconfigurable instruction cache portion of a digital signal processor |
CN101814039A (zh) * | 2010-02-02 | 2010-08-25 | 北京航空航天大学 | GPU-based Cache simulator and spatially parallel accelerated simulation method therefor |
CN102289390A (zh) * | 2010-06-01 | 2011-12-21 | 微软公司 | Hypervisor scheduler |
Non-Patent Citations (1)
Title |
---|
See also references of EP2808783A4 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20150108148A (ko) * | 2014-03-17 | 2015-09-25 | 한국전자통신연구원 | Cache control apparatus and cache management method using partial reconfiguration of cache associativity |
KR102317248B1 (ko) * | 2014-03-17 | 2021-10-26 | 한국전자통신연구원 | Cache control apparatus and cache management method using partial reconfiguration of cache associativity |
Also Published As
Publication number | Publication date |
---|---|
EP2808783B1 (en) | 2019-11-27 |
EP2808783A4 (en) | 2015-09-16 |
CN103246542B (zh) | 2017-11-14 |
US9632940B2 (en) | 2017-04-25 |
CN103246542A (zh) | 2013-08-14 |
EP2808783A1 (en) | 2014-12-03 |
US20150309937A1 (en) | 2015-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2013113206A1 (zh) | Intelligent cache and intelligent terminal | |
CN110741356B (zh) | Relay consistent memory management in a multiprocessor system | |
US8954674B2 (en) | Scatter-gather intelligent memory architecture for unstructured streaming data on multiprocessor systems | |
US9384134B2 (en) | Persistent memory for processor main memory | |
US7526608B2 (en) | Methods and apparatus for providing a software implemented cache memory | |
JP3197866B2 (ja) | Method and computer system for improving cache operation | |
JP2010532517A (ja) | Cache memory with configurable associativity | |
CN104937568B (zh) | Apparatus and method for a multiple page size translation lookaside buffer (TLB) | |
US20040225840A1 (en) | Apparatus and method to provide multithreaded computer processing | |
CN104169892A (zh) | Concurrently accessed set-associative overflow cache | |
CN105393210A (zh) | Memory unit for emulating shared-memory architectures | |
US20210224213A1 (en) | Techniques for near data acceleration for a multi-core architecture | |
WO2023165317A1 (zh) | Memory access method and apparatus | |
TWI453584B (zh) | Apparatus, system and method for handling unaligned memory accesses | |
Zhang et al. | Fuse: Fusing stt-mram into gpus to alleviate off-chip memory access overheads | |
CN112527729A (zh) | Tightly coupled heterogeneous multi-core processor architecture and processing method therefor | |
US20170109277A1 (en) | Memory system | |
CN117435251A (zh) | Post-quantum cryptographic algorithm processor and system-on-chip therefor | |
US10620958B1 (en) | Crossbar between clients and a cache | |
Lu et al. | Achieving efficient packet-based memory system by exploiting correlation of memory requests | |
US9804985B2 (en) | Hardware abstract data structure, data processing method and system | |
Sahoo et al. | CAMO: A novel cache management organization for GPGPUs | |
WO2024183678A1 (zh) | Method for acquiring a lock on a data object, network interface card, and computing device | |
CN118279126B (zh) | Graphics processing unit memory processing method, server, product, device and medium | |
KR20240092601A (ko) | On-chip memory device with convertible modes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 12867234 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14375720 Country of ref document: US Ref document number: 2012867234 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |