WO2014206229A1 - Accelerator and data processing method - Google Patents

Accelerator and data processing method

Info

Publication number
WO2014206229A1
WO2014206229A1 PCT/CN2014/080162 CN2014080162W WO2014206229A1 WO 2014206229 A1 WO2014206229 A1 WO 2014206229A1 CN 2014080162 W CN2014080162 W CN 2014080162W WO 2014206229 A1 WO2014206229 A1 WO 2014206229A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory access
access request
memory
selector
accelerator
Prior art date
Application number
PCT/CN2014/080162
Other languages
French (fr)
Chinese (zh)
Inventor
崔泽汉
陈明宇
刘垚
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2014206229A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus

Abstract

An accelerator and a data processing method are provided for upgrading existing computer equipment and improving its data processing efficiency. The accelerator comprises a controller interface, a row address determining unit, a first selector, an acceleration register unit, an acceleration engine, a bus control arbiter, a second selector and a memory interface.

Description

Accelerator and data processing method
This application claims priority to Chinese Patent Application No. 201310269782.4, filed with the Chinese Patent Office on June 28, 2013 and entitled "Accelerator and data processing method", which is incorporated herein by reference in its entirety.
Technical field
The present invention relates to the field of computer data processing, and in particular to an accelerator and a data processing method.
Background
In a computer system, main memory has always been much slower to access than the central processing unit (CPU) is to operate. As a result the CPU cannot make full use of its processing speed, much time is wasted waiting for data to return, and the efficiency of the whole system suffers.
To ease this speed mismatch between the CPU and main memory, a common approach is to add a cache to the memory hierarchy. A cache has only a few thousandths of the capacity of main memory, but it is much faster to access. According to the principle of program locality, a main memory unit that is in use is likely to be accessed again in the near future (temporal locality), and the units near it are also likely to be used (spatial locality). Therefore, when the CPU accesses a unit of main memory, the hardware automatically loads the group of units containing that unit into the cache, and the unit the CPU accesses next is likely to be within the group just loaded. The CPU can then access the cache instead. If most of the CPU's main memory accesses can be replaced by cache accesses, the processing speed of the computer improves significantly.
Although a cache can improve performance significantly, its capacity is limited. If the unit the CPU needs is not in the cache, the CPU must still access main memory, with its higher latency, and this remains a key performance constraint. Moreover, when data units with poor locality are fetched into the cache, they may evict data units with good locality, which is known as cache pollution. The CPU then has to keep accessing the slower main memory, and the overall efficiency of the computer suffers.
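The cache pollution effect described above can be illustrated with a small simulation. The following Python sketch is not part of the patent: it assumes a hypothetical 4-line, fully associative cache with LRU replacement, and the `LRUCache` class, the `run` helper and both access traces are invented for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative LRU cache that only tracks resident addresses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first, most recent last
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def run(trace, capacity=4):
    """Replay a trace of addresses and return (hits, misses)."""
    cache = LRUCache(capacity)
    for addr in trace:
        cache.access(addr)
    return cache.hits, cache.misses


hot = [0, 1, 2, 3]  # data with good locality: reused on every pass

# Trace 1: only the hot set, repeated 10 times.
clean_trace = hot * 10

# Trace 2: each pass over the hot set is followed by 4 addresses
# that are used exactly once (poor locality).
polluted_trace = []
for i in range(10):
    polluted_trace += hot + [100 + 4 * i + k for k in range(4)]

print(run(clean_trace))     # (36, 4)
print(run(polluted_trace))  # (0, 80)
```

With only the hot set, 36 of 40 accesses hit. Once each pass is followed by four single-use addresses, the hot set is evicted before it is reused and every access misses, which is the situation the background section calls cache pollution.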
In the prior art, as shown in FIG. 1, an accelerator is added to the memory controller, and operations on data units with poor locality are executed in the accelerator. Because the data units do not have to be fetched to the CPU and are instead processed in the memory controller, which is closer to main memory, part of the memory access latency is saved. And because these operations have poor locality, the data need not be brought into the cache, so subsequent CPU operations are not slowed down.
However, this prior art requires modifying the memory controller. The memory controller and the CPU are usually integrated in a single CPU chip, so changing the memory controller means redesigning, verifying, taping out and testing the whole chip, which is too costly. It is also difficult to retrofit existing computers in this way.
Summary of the invention
Embodiments of the present invention provide an accelerator and a data processing method for upgrading existing computer equipment and improving its data processing efficiency.
The accelerator provided by a first aspect of the embodiments of the present invention includes:
a controller interface, a row address determining unit, a first selector, an acceleration register unit, an acceleration engine, a bus control arbiter, a second selector and a memory interface;
the controller interface is configured to receive memory access requests transmitted by a memory controller, where the memory access requests include normal memory access requests and accelerated memory access requests;
the row address determining unit is configured to determine the request type of a memory access request according to the row address of the request, and to generate and send a first control signal to the first selector: if the request is a normal memory access request, the first control signal directs transmission to the second selector; if the request is an accelerated memory access request, the first control signal directs transmission to the acceleration register unit;
the first selector is configured to select the transmission direction of the memory access request according to the first control signal;
the acceleration register unit is configured to store processing information of the accelerated memory access request;
the acceleration engine is configured to fetch the processing information of the accelerated memory access request from the acceleration register unit and, according to the processing information, to access the main memory through the second selector in order to perform the data processing operation of the accelerated memory access request;
the bus control arbiter is configured to generate and send a second control signal to the second selector;
the second selector is configured to receive the normal memory access request transmitted by the first selector, the accelerated memory access request transmitted by the acceleration engine, and the second control signal sent by the bus control arbiter, and to select, according to the second control signal, the memory access request that currently accesses the main memory;
the memory interface is configured to transmit the memory access request to the main memory, and to transmit the response data corresponding to the memory access request to the second selector.
With reference to the accelerator provided by the first aspect, in a first possible implementation, the bus control arbiter is specifically configured to: when the second selector must choose between the normal memory access request and the accelerated memory access request for access to the main memory, generate a second control signal that gives priority to the normal memory access request.
With reference to the first possible implementation, in a second possible implementation, the bus control arbiter is further configured to: when the second selector receives a normal memory access request while the memory interface is processing an accelerated memory access request, determine the access type of the normal memory access request; if it is a write request, postpone sending it until the memory interface is released; if it is a read request, send an error correction code (ECC) error message to the memory controller through the first selector.
With reference to the accelerator provided by the first aspect, in a third possible implementation, the acceleration register unit includes:
a command queue, configured to store command information when the accelerated memory access request is an acceleration command, where the command information includes a command type, a source operand or a source operand address;
a configuration register, configured to store configuration information when the accelerated memory access request is a configuration request, where the configuration information includes the mapping from physical addresses of the main memory to row addresses and column addresses;
a result register, configured to store the execution status and response data of the accelerated memory access request.
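The three parts of the acceleration register unit listed above can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class names, the field names, the dictionary layout of a request (`kind`, `payload`) and the example values are all hypothetical, since the text does not define a request format.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Command:
    """One entry of the command queue (field names are hypothetical)."""
    command_type: str
    source_operand: Optional[int] = None
    source_operand_address: Optional[int] = None


@dataclass
class AccelerationRegisterUnit:
    """Command queue, configuration register and result register."""
    command_queue: deque = field(default_factory=deque)
    config: dict = field(default_factory=dict)   # e.g. address mapping info
    status: str = "idle"                         # execution status
    response_data: Optional[int] = None

    def store(self, request: dict) -> None:
        """File an accelerated request as a command or a configuration."""
        if request["kind"] == "command":
            self.command_queue.append(Command(**request["payload"]))
        elif request["kind"] == "config":
            self.config.update(request["payload"])
        else:
            raise ValueError(f"unknown request kind: {request['kind']!r}")

    def next_command(self) -> Optional[Command]:
        """Called by the acceleration engine to fetch work, oldest first."""
        return self.command_queue.popleft() if self.command_queue else None

    def write_result(self, status: str, data: Optional[int]) -> None:
        """Record the execution status and response data of a command."""
        self.status = status
        self.response_data = data
```

In this sketch the acceleration engine would call `next_command()` to fetch work and `write_result()` when it finishes, so that a later read of the result register returns the status and data.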
With reference to the third possible implementation, in a fourth possible implementation, the first selector is further configured to: when the result register returns response data of the accelerated memory access request and the second selector returns response data of the normal memory access request, select the response data to be transmitted to the controller interface according to the second control signal generated by the bus control arbiter.
With reference to the third possible implementation, in a fifth possible implementation, the second selector is further configured to:
when the memory interface returns response data of a memory access request, select, according to the request type of the memory access request to which the response data corresponds, whether to transmit the response data to the acceleration engine or to the first selector.
With reference to the accelerator provided by the first aspect, in a sixth possible implementation, the accelerator further includes a routing module, configured to transmit the accelerated memory access request to the corresponding main memory. The routing module is connected to the acceleration engine and to another accelerator. When the data required by the accelerated memory access request is not in the main memory connected to the local accelerator, the acceleration engine transmits the request to the routing module, and the routing module forwards it to the other accelerator, so that the other accelerator accesses the data in the main memory connected to it according to the request.
With reference to the sixth possible implementation, in a seventh possible implementation, the routing module is connected to the second selector, so that the routing module transmits an accelerated memory access request received from another accelerator, through the second selector, to the main memory connected to the local accelerator.
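The forwarding behaviour of the routing module can be sketched in a few lines. The following Python model is an assumption-laden illustration, not the patent's design: each main memory is a plain dictionary, exactly two accelerators are linked, and the `Accelerator` class with its `connect`, `accelerated_read` and `serve_remote` methods is hypothetical.

```python
class Accelerator:
    """Accelerator with a local main memory and a link to one peer."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory  # local main memory: address -> value
        self.peer = None      # other accelerator reached via the routing module

    def connect(self, other):
        """Link the routing modules of two accelerators in both directions."""
        self.peer = other
        other.peer = self

    def accelerated_read(self, addr):
        """Serve an accelerated read locally, or forward it to the peer."""
        if addr in self.memory:
            return self.name, self.memory[addr]
        if self.peer is None:
            raise KeyError(f"address {addr:#x} is not reachable")
        # Data is not in the local main memory: hand the request to the
        # routing module, which passes it to the other accelerator.
        return self.peer.serve_remote(addr)

    def serve_remote(self, addr):
        """Request received from the peer: access the local main memory."""
        return self.name, self.memory[addr]


acc0 = Accelerator("acc0", {0x00: 10})
acc1 = Accelerator("acc1", {0x100: 20})
acc0.connect(acc1)
print(acc0.accelerated_read(0x00))   # ('acc0', 10)
print(acc0.accelerated_read(0x100))  # ('acc1', 20)
```

The returned name shows which accelerator actually touched its main memory: the second read is issued at `acc0` but served from the memory attached to `acc1`.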
The data processing method provided by a second aspect of the embodiments of the present invention includes:
an accelerator receives a memory access request transmitted by a memory controller, where memory access requests include normal memory access requests and accelerated memory access requests;
the accelerator determines the request type of the memory access request according to the row address of the request; if the request is an accelerated memory access request, the accelerator buffers it and processes it within the accelerator; if the request is a normal memory access request, the accelerator transmits it to the main memory for processing.
With reference to the method provided by the second aspect, in a first possible implementation, the method further includes:
when the accelerator must choose between the normal memory access request and the accelerated memory access request for access to the main memory, selecting the normal memory access request preferentially.
With reference to the method provided by the second aspect, in a second possible implementation, the method further includes:
when the second selector of the accelerator receives a normal memory access request while the memory interface of the accelerator is processing an accelerated memory access request, determining the access type of the normal memory access request; if it is a write request, postponing the normal memory access request until the memory interface of the accelerator is released and then sending it to the main memory; if it is a read request, sending an error correction code (ECC) error message to the memory controller.
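The two arbitration rules above (normal requests take priority; a normal request that arrives while the memory interface is busy with an accelerated request is deferred if it is a write and answered with an ECC error if it is a read) can be sketched as follows. This is a minimal Python model under stated assumptions, not the patent's circuit: the `BusArbiter` class, its method names, the request dictionaries and the string return values are all hypothetical.

```python
from collections import deque


class BusArbiter:
    """Decides which request type may use the memory interface."""

    def __init__(self):
        self.interface_busy_with_accelerated = False
        self.deferred_writes = deque()

    def choose(self, normal_pending, accelerated_pending):
        """Rule 1: when both kinds are waiting, the normal request wins."""
        if normal_pending:
            return "normal"
        if accelerated_pending:
            return "accelerated"
        return None

    def on_normal_request(self, request):
        """Rule 2: a normal request arrives at the second selector."""
        if not self.interface_busy_with_accelerated:
            return "forward"
        if request["op"] == "write":
            # Writes can wait until the memory interface is released.
            self.deferred_writes.append(request)
            return "deferred"
        # Reads cannot be answered in time: report an ECC error
        # to the memory controller instead.
        return "ecc_error"

    def release_interface(self):
        """The accelerated access has finished: return the deferred writes."""
        self.interface_busy_with_accelerated = False
        released = list(self.deferred_writes)
        self.deferred_writes.clear()
        return released
```

How the memory controller reacts to the ECC error message is not described in this passage, so the sketch stops at returning `"ecc_error"`.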
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
The accelerator in the embodiments of the present invention is connected to the memory controller and the main memory of a computer device through its controller interface and memory interface, respectively. When a memory access request arrives from the memory controller, the row address determining unit determines whether it is a normal memory access request or an accelerated memory access request, where a normal memory access request targets data units with good locality and an accelerated memory access request targets data units with poor locality. If the request is a normal memory access request, the row address determining unit instructs the first selector to send it to the second selector, which passes it directly to the main memory for processing. If the request is an accelerated memory access request, the row address determining unit instructs the first selector to send it to the acceleration register unit, which buffers the processing information in the request and triggers the acceleration engine to process the poor-locality data units that the request refers to. Data units with good locality can thus be handled mainly in the cache, and the acceleration engine also processes poor-locality data units more efficiently than the CPU does, so the computer processes data units faster. In addition, because the accelerator attaches to the memory controller and the main memory through its controller interface and memory interface, it is compatible with the hardware structure of existing computers and can upgrade the data processing capability of existing computer equipment.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a computer in the prior art;
FIG. 2 is a schematic structural diagram of an accelerator in an embodiment of the present invention;
FIG. 3 is another schematic structural diagram of an accelerator in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer in an embodiment of the present invention;
FIG. 5 is another schematic structural diagram of an accelerator in an embodiment of the present invention;
FIG. 6 is a schematic flowchart of a data processing method in an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Embodiments of the present invention provide an accelerator and a data processing method for upgrading existing computer equipment and improving its data processing efficiency.
Referring to FIG. 2, one embodiment of the accelerator in the embodiments of the present invention includes:
a controller interface 101, a row address determining unit 102, a first selector 103, an acceleration register unit 104, an acceleration engine 105, a bus control arbiter 106, a second selector 107 and a memory interface 108. The connections among these units are shown in FIG. 2.
The controller interface 101 is configured to receive memory access requests transmitted by the memory controller 20. Specifically, a memory access request is an instruction by which the CPU accesses the main memory for a read or write operation, and it carries the memory access address in the main memory.
In this embodiment, memory access requests include normal memory access requests and accelerated memory access requests. A normal memory access request targets data units with good locality, and an accelerated memory access request targets data units with poor locality. Before a memory access request is generated, the data unit to be requested is analyzed to determine how good its locality is. Specifically, a threshold can be set: when the locality of the data unit is greater than or equal to the threshold, the locality is considered good and a normal memory access request is generated; when the locality is below the threshold, the locality is considered poor and an accelerated memory access request is generated.
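The threshold rule above can be written down directly once a locality measure is chosen. The text does not say how locality is measured, so the following Python sketch assumes a hypothetical measure, the fraction of accesses in a trace that revisit an address already seen; the function names and the default threshold of 0.5 are likewise illustrative only.

```python
def reuse_fraction(trace):
    """Hypothetical locality score: share of accesses that repeat an address."""
    seen = set()
    reused = 0
    for addr in trace:
        if addr in seen:
            reused += 1
        seen.add(addr)
    return reused / len(trace) if trace else 0.0


def request_kind(trace, threshold=0.5):
    """Locality >= threshold gives a normal request, otherwise accelerated."""
    return "normal" if reuse_fraction(trace) >= threshold else "accelerated"


print(request_kind([1, 2, 1, 2, 1, 2, 1, 2]))  # normal (score 0.75)
print(request_kind(list(range(8))))            # accelerated (score 0.0)
```

A score exactly equal to the threshold counts as good locality, matching the "greater than or equal to" wording of the text.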
In the prior art, a data unit with good locality is one that has a high probability of being accessed again (temporal locality), or one for which, after it has been accessed once, other data units stored near it also have a high probability of being accessed (spatial locality).
The row address determining unit 102 is configured to determine the request type of a memory access request according to the row address of the request, and to generate and send a first control signal to the first selector: if the request is a normal memory access request, the first control signal directs transmission to the second selector 107; if the request is an accelerated memory access request, the first control signal directs transmission to the acceleration register unit 104.
In memory controller interface protocols used in practice, the memory access address carried in a request is split into a row address and a column address. The row address is sent first, the column address is sent after a preset interval, and data must be returned a fixed number of clock cycles after the column address is sent.
In the prior art of FIG. 1, the data interface is located inside the memory controller, so the address is not split into a row address and a column address. Because there is no requirement to return data within a fixed number of cycles, the request can be classified and the data path switched after the address is received, which usually takes one cycle. In the embodiments of the present invention, by contrast, the accelerator is attached to the external interface of the memory controller and must therefore observe the rule that data is returned a fixed number of cycles after the column address is sent. To avoid added delay, the embodiments use the interval between sending the row address and sending the column address: before the memory access request has been sent in full, the row address determining unit 102 determines the request type from the row address, which saves the time that would otherwise be spent waiting for the determination and improves data transmission efficiency.
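The idea of classifying a request from its row address alone, before the column address arrives, can be sketched as follows. The text states only that the row address is used; the concrete rule in this Python sketch (a reserved range of rows marks accelerated requests), the address widths and all names are assumptions made for illustration.

```python
ROW_BITS = 14            # hypothetical row address width
COLUMN_BITS = 10         # hypothetical column address width
ACCEL_ROW_BASE = 0x3F00  # hypothetical: rows at or above this are reserved


def split_address(addr):
    """Split a physical address into (row, column), as the protocol does."""
    row = (addr >> COLUMN_BITS) & ((1 << ROW_BITS) - 1)
    column = addr & ((1 << COLUMN_BITS) - 1)
    return row, column


def classify_by_row(row):
    """Decide the request type from the row address only."""
    return "accelerated" if row >= ACCEL_ROW_BASE else "normal"


def handle_request(addr):
    row, column = split_address(addr)
    # Row phase: the row address arrives first, so the route is chosen here,
    # during the interval before the column address is sent.
    route = classify_by_row(row)
    # Column phase: the data path is already set when the column arrives.
    return route, row, column


print(handle_request((0x0001 << COLUMN_BITS) | 0x2A))  # ('normal', 1, 42)
print(handle_request((0x3F00 << COLUMN_BITS) | 0x05))  # ('accelerated', 16128, 5)
```

Because `classify_by_row` never looks at the column, the decision costs no extra cycles after the column address is sent, which is the point the paragraph above makes about the fixed return latency.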
The first selector 103 is configured to select the transmission direction of the memory access request according to the first control signal; specifically, the request is sent either to the second selector 107 or to the acceleration register unit 104. In practice, the first selector 103 can act as both a demultiplexer and a multiplexer for the controller interface 101. For memory access requests from the controller interface 101, it acts as a demultiplexer and, according to the control signal generated by the row address determining unit 102, outputs the request to the second selector 107 or to the acceleration register unit 104. For data returned by units such as the bus control arbiter 106, the second selector 107 and the acceleration register unit 104, it acts as a multiplexer and, according to the control signal generated by the bus control arbiter 106, selects one of them to output to the controller interface 101.
The acceleration register unit 104 is configured to store processing information of the accelerated memory access request.
The acceleration engine 105 is configured to fetch the processing information of the accelerated memory access request (which may specifically be an acceleration command) from the acceleration register unit 104 and, according to the processing information, to access the main memory through the second selector 107 in order to perform the data processing operation of the accelerated memory access request (specifically, to perform computation on the poor-locality data units indicated in the request).
The bus control arbiter 106 is configured to generate and send a second control signal to the second selector, so that when the second selector 107 has to handle both kinds of memory access request at the same time, the processing order can be assigned according to set rules and conflicts between different memory access requests are avoided.
The second selector 107 is configured to receive the normal memory access request transmitted by the first selector, the accelerated memory access request transmitted by the acceleration engine, and the second control signal sent by the bus control arbiter, and to select, according to the second control signal, the memory access request that currently accesses the main memory 30. In practice, the second selector 107 likewise acts as both a multiplexer and a demultiplexer for the memory interface 108. For the normal memory access requests from the first selector 103 and the accelerated memory access requests and write data from the acceleration engine 105, it acts as a multiplexer and selects one to output to the memory interface 108 according to the bus arbitration result of the bus control arbiter 106. For data returned on the memory interface 108, it acts as a demultiplexer and outputs the returned data to the acceleration engine 105 or the first selector 103 according to the bus arbitration result of the bus control arbiter 106.
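The dual role of the second selector, a multiplexer on the request path and a demultiplexer on the response path, can be sketched as two small functions. This Python model is illustrative only: the function names, the string encoding of the second control signal (`"normal"` or `"accelerated"`) and the destination labels are assumptions.

```python
VALID_GRANTS = ("normal", "accelerated")


def mux_requests(normal_request, accelerated_request, grant):
    """Request path: pick the one request that goes to the memory interface.

    `grant` stands for the second control signal from the bus control arbiter.
    """
    if grant not in VALID_GRANTS:
        raise ValueError(f"unknown grant: {grant!r}")
    return normal_request if grant == "normal" else accelerated_request


def demux_response(data, grant):
    """Response path: steer returned data back to the unit that asked for it."""
    if grant not in VALID_GRANTS:
        raise ValueError(f"unknown grant: {grant!r}")
    destination = "first_selector" if grant == "normal" else "acceleration_engine"
    return destination, data


# A normal read wins arbitration and its data flows back toward the controller.
chosen = mux_requests({"op": "read", "addr": 0x10}, {"op": "read", "addr": 0x80}, "normal")
print(chosen["addr"])                  # 16
print(demux_response(0xAB, "normal"))  # ('first_selector', 171)
```

Using the same arbitration result for both directions keeps a response from ever reaching the wrong requester, which is why the text has both paths follow the bus control arbiter.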
The memory interface 108 is configured to transmit the memory access request to the main memory 30, and to transmit the response data corresponding to the memory access request to the second selector.
The accelerator in the embodiments of the present invention is connected to the memory controller and the main memory of a computer device through its controller interface and memory interface, respectively. When a memory access request arrives from the memory controller, the row address determining unit determines whether it is a normal memory access request or an accelerated memory access request, where a normal memory access request targets data units with good locality and an accelerated memory access request targets data units with poor locality. If the request is a normal memory access request, the row address determining unit instructs the first selector to send it to the second selector, which passes it directly to the main memory for processing. If the request is an accelerated memory access request, the row address determining unit instructs the first selector to send it to the acceleration register unit, which buffers the processing information in the request and triggers the acceleration engine to process the poor-locality data units that the request refers to. Data units with good locality can thus be handled mainly in the cache, and the acceleration engine also processes poor-locality data units more efficiently than the CPU does, so the computer processes data units faster. In addition, because the accelerator attaches to the memory controller and the main memory through its controller interface and memory interface, it is compatible with the hardware structure of existing computers and can upgrade the data processing capability of existing computer equipment.
Because normal memory access requests and accelerated memory access requests coexist, while the main memory can process only one request at a time, the bus control arbiter encounters various request conflicts in practice. The embodiments of the present invention provide corresponding solutions. Referring to FIG. 3, another embodiment of the accelerator in the embodiments of the present invention includes:
a controller interface 101, a row address judgment unit 102, a first selector 103, an acceleration register unit 104, an acceleration engine 105, a bus control arbiter 106, a second selector 107, and a memory interface 108.
The controller interface 101 is configured to receive a memory access request transmitted by the memory controller 20. Specifically, the memory access request is an instruction issued when the central processing unit needs to access the main memory for a read or write operation, and it carries a memory access address of the main memory.
In this embodiment of the present invention, memory access requests include normal memory access requests and accelerated memory access requests. A normal memory access request targets a data unit with good locality, and an accelerated memory access request targets a data unit with poor locality. Before a memory access request is generated, the data unit to be requested is analyzed to determine how good its locality is. Specifically, a threshold may be set: when the locality of the data unit is greater than or equal to the threshold, its locality is considered good and a normal memory access request is generated; when the locality is below the threshold, its locality is considered poor and an accelerated memory access request is generated.
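The threshold rule can be sketched as follows. The source does not say how locality is measured, so the `locality_score` metric (the fraction of consecutive accesses that stay within one cache line), the 64-byte line size, and the 0.5 threshold are illustrative assumptions only.

```python
from enum import Enum


class RequestType(Enum):
    NORMAL = "normal"            # good locality: handled on the regular memory path
    ACCELERATED = "accelerated"  # poor locality: handed to the acceleration engine


def locality_score(addresses, line_size=64):
    """Hypothetical metric: share of consecutive accesses that hit the same cache line."""
    if len(addresses) < 2:
        return 1.0
    same_line = sum(
        1
        for a, b in zip(addresses, addresses[1:])
        if a // line_size == b // line_size
    )
    return same_line / (len(addresses) - 1)


def classify(addresses, threshold=0.5):
    """Locality >= threshold gives a normal request; below it, an accelerated request."""
    if locality_score(addresses) >= threshold:
        return RequestType.NORMAL
    return RequestType.ACCELERATED
```

Under these assumptions, a sequential scan such as `[0, 8, 16, 24]` is classified as normal, while page-sized strides such as `[0, 4096, 8192, 12288]` are classified as accelerated.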
The row address judgment unit 102 is configured to determine the request type of the memory access request according to its row address, and to generate and send a first control signal to the first selector. If the request is a normal memory access request, the unit generates a first control signal directing transmission to the second selector 107; if the request is an accelerated memory access request, the unit generates a first control signal directing transmission to the acceleration register unit 104.
In memory controller interface protocols used in practice, the memory access address carried in a memory access request is split into a row address and a column address. The row address is sent first, the column address is sent after a preset interval, and the data must be returned a fixed number of clock cycles after the column address is sent.
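One way to picture a decision made from the row address alone is the sketch below. The source does not state which row addresses mark an accelerated request, so the reserved window starting at `ACCEL_ROW_BASE` and the 10-bit column width are hypothetical values chosen for illustration.

```python
COLUMN_BITS = 10          # assumed width of the column address
ACCEL_ROW_BASE = 0x7F00   # assumed first row of a window reserved for the accelerator


def split_address(physical_address):
    """Split an address into (row, column); the row part is the one sent first."""
    row = physical_address >> COLUMN_BITS
    column = physical_address & ((1 << COLUMN_BITS) - 1)
    return row, column


def first_control_signal(row):
    """Choose the routing target of the first selector from the row address only."""
    if row >= ACCEL_ROW_BASE:
        return "acceleration_register_unit"  # accelerated memory access request
    return "second_selector"                 # normal memory access request
```

Because the decision uses only the row address, it can be taken during the preset interval before the column address arrives, which is what allows the fixed return latency to be respected.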
The first selector 103 is configured to select the transmission direction of the memory access request according to the first control signal. Specifically, the request is sent either to the second selector 107 or to the acceleration register unit 104. In practice, the first selector 103 may act as both a demultiplexer and a multiplexer for the controller interface 101. For memory access requests from the controller interface 101, it acts as a demultiplexer and, according to the control signal generated by the row address judgment unit 102, outputs the request to the second selector 107 or the acceleration register unit 104. For data returned by units such as the bus control arbiter 106, the second selector 107, and the acceleration register unit 104, it acts as a multiplexer and, according to the control signal generated by the bus control arbiter 106, selects one of them to output to the controller interface 101.
The acceleration register unit 104 is configured to store the processing information of the accelerated memory access request.
The acceleration engine 105 is configured to retrieve the processing information of the accelerated memory access request (which may specifically be an acceleration command) from the acceleration register unit 104, and to access the main memory through the second selector 107 according to the processing information, so as to perform the data processing operation of the accelerated memory access request (specifically, to perform computation on the data unit with poor locality indicated in the request).
The bus control arbiter 106 is configured to generate and send a second control signal to the second selector, so that when the second selector 107 needs to handle both types of memory access request at the same time, the processing order can be assigned according to a given rule, preventing conflicts between different memory access requests during processing.
The second selector 107 is configured to receive the normal memory access request transmitted by the first selector, the accelerated memory access request transmitted by the acceleration engine, and the second control signal sent by the bus control arbiter, and to select, according to the second control signal, the memory access request that currently accesses the main memory 30. In practice, the second selector 107 likewise acts as both a demultiplexer and a multiplexer for the memory interface 108. For the normal memory access requests from the first selector 103 and the accelerated memory access requests and write data from the acceleration engine 105, it acts as a multiplexer and selects one to output to the memory interface 108 according to the bus arbitration result of the bus control arbiter 106. For data returned on the memory interface 108, it acts as a demultiplexer and outputs the returned data to the acceleration engine 105 or the first selector 103 according to the same arbitration result.
The memory interface 108 is configured to transmit the memory access request to the main memory 30, and to transmit the response data corresponding to the memory access request to the second selector.
Further, the acceleration register unit 104 includes:
a command queue 1041, configured to store command information when the accelerated memory access request is an acceleration command, where the command information includes a command type, a source operand, or a source operand address;
a configuration register 1042, configured to store configuration information when the accelerated memory access request is a configuration request, where the configuration information includes the mapping from physical addresses of the main memory to row addresses and column addresses; and
a result register 1043, configured to store the execution status and response data of the accelerated memory access request.
In practice, accelerated memory access requests may be further divided into acceleration commands and configuration requests. For an acceleration command, the first selector 103 transmits the command to the command queue 1041; for a configuration request, the first selector 103 transmits the request to the configuration register 1042. Configuration requests are sent at system initialization, and the configuration information is used to convert the address in an acceleration command into the row address and column address used to access the main memory.
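A minimal software model of the three parts of the acceleration register unit might look like the sketch below. The request dictionaries, the `kind` and `settings` keys, and the `column_bits` configuration entry are hypothetical names introduced for illustration; the source only states what each part stores.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AccelerationRegisterUnit:
    command_queue: deque = field(default_factory=deque)  # pending acceleration commands
    config: dict = field(default_factory=dict)           # address-mapping configuration
    result: dict = field(default_factory=dict)           # execution status and response data

    def accept(self, request):
        """Store a request that the first selector routed to this unit."""
        if request["kind"] == "config":
            self.config.update(request["settings"])
        elif request["kind"] == "command":
            self.command_queue.append(request)
        else:
            raise ValueError(f"unknown accelerated request kind: {request['kind']!r}")

    def to_row_column(self, physical_address):
        """Map a physical address to (row, column) using the stored configuration."""
        column_bits = self.config["column_bits"]
        row = physical_address >> column_bits
        column = physical_address & ((1 << column_bits) - 1)
        return row, column
```

In this sketch a configuration request must arrive before any address conversion, which mirrors the statement that configuration requests are sent at system initialization.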
Specifically, the first selector 103 is further configured to: when the result register 1043 returns the response data of an accelerated memory access request and the second selector 107 returns the response data of a normal memory access request, select the response data to be transmitted to the controller interface 101 according to the second control signal generated by the bus control arbiter 106.
Specifically, the second selector 107 is further configured to: when the memory interface 108 returns the response data of a memory access request, transmit the response data to the acceleration engine 105 or the first selector 103 according to the type of the memory access request to which the response data corresponds.
In practice, the central processing unit may run faster than the accelerator of this embodiment, and normal memory access requests handle data units with good locality. Therefore, when the second selector needs to choose between a normal memory access request and an accelerated memory access request to access the main memory, a second control signal that gives priority to the normal memory access request is generated.
If the second selector receives a normal memory access request while the memory interface 108 is processing an accelerated memory access request, the access type of the normal memory access request is determined. If it is a write request, the write request is held back and sent only after the memory interface is released. If it is a read request, an error correction code (ECC) error message is sent to the memory controller through the first selector, causing the memory controller 20 to resend the read request and thereby avoiding a logic error in the system.
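The two arbitration rules above can be summarized in a short sketch. The function names and the string labels for the outcomes are hypothetical; the source describes the behavior but not any encoding of the second control signal.

```python
from typing import Optional


def select_request(normal_pending: bool, accelerated_pending: bool) -> Optional[str]:
    """Choice made while the memory interface is idle: normal requests take priority."""
    if normal_pending:
        return "normal"
    if accelerated_pending:
        return "accelerated"
    return None  # nothing to forward


def handle_normal_while_busy(access_type: str) -> str:
    """A normal request arrives while an accelerated request occupies the memory interface."""
    if access_type == "write":
        # Writes can wait: hold the request until the memory interface is released.
        return "defer_until_interface_released"
    if access_type == "read":
        # Reads have a fixed return latency, so signal an ECC error to force a retry.
        return "send_ecc_error"
    raise ValueError(f"unknown access type: {access_type!r}")
```

The asymmetry between reads and writes follows from the interface protocol: a read must return data a fixed number of cycles after the column address, so it cannot simply be delayed, whereas a write expects no data in return.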
The bus control arbiter in this embodiment of the present invention can generate appropriate control signals for the first selector and the second selector according to the current request processing state, so that data is transmitted without conflict.
As shown in FIG. 4, the accelerator in the embodiments of the present invention can also be extended to a multi-channel connection scenario, in which the internal structure of the accelerator is as shown in FIG. 5. Another embodiment of the accelerator in the embodiments of the present invention includes:
a controller interface 101, a row address judgment unit 102, a first selector 103, an acceleration register unit 104, an acceleration engine 105, a bus control arbiter 106, a second selector 107, and a memory interface 108.
The controller interface 101 is configured to receive a memory access request transmitted by the memory controller 20. Specifically, the memory access request is an instruction issued when the central processing unit needs to access the main memory for a read or write operation, and it carries a memory access address of the main memory.
In this embodiment of the present invention, memory access requests include normal memory access requests and accelerated memory access requests. A normal memory access request targets a data unit with good locality, and an accelerated memory access request targets a data unit with poor locality. Before a memory access request is generated, the data unit to be requested is analyzed to determine how good its locality is. Specifically, a threshold may be set: when the locality of the data unit is greater than or equal to the threshold, its locality is considered good and a normal memory access request is generated; when the locality is below the threshold, its locality is considered poor and an accelerated memory access request is generated.
The row address judgment unit 102 is configured to determine the request type of the memory access request according to its row address, and to generate and send a first control signal to the first selector. If the request is a normal memory access request, the unit generates a first control signal directing transmission to the second selector 107; if the request is an accelerated memory access request, the unit generates a first control signal directing transmission to the acceleration register unit 104.
In memory controller interface protocols used in practice, the memory access address carried in a memory access request is split into a row address and a column address. The row address is sent first, the column address is sent after a preset interval, and the data must be returned a fixed number of clock cycles after the column address is sent.
The first selector 103 is configured to select the transmission direction of the memory access request according to the first control signal. Specifically, the request is sent either to the second selector 107 or to the acceleration register unit 104. In practice, the first selector 103 may act as both a demultiplexer and a multiplexer for the controller interface 101. For memory access requests from the controller interface 101, it acts as a demultiplexer and, according to the control signal generated by the row address judgment unit 102, outputs the request to the second selector 107 or the acceleration register unit 104. For data returned by units such as the bus control arbiter 106, the second selector 107, and the acceleration register unit 104, it acts as a multiplexer and, according to the control signal generated by the bus control arbiter 106, selects one of them to output to the controller interface 101.
The acceleration register unit 104 is configured to store the processing information of the accelerated memory access request.
The acceleration engine 105 is configured to retrieve the processing information of the accelerated memory access request (which may specifically be an acceleration command) from the acceleration register unit 104, and to access the main memory through the second selector 107 according to the processing information, so as to perform the data processing operation of the accelerated memory access request (specifically, to perform computation on the data unit with poor locality indicated in the request).
The bus control arbiter 106 is configured to generate and send a second control signal to the second selector, so that when the second selector 107 needs to handle both types of memory access request at the same time, the processing order can be assigned according to a given rule, preventing conflicts between different memory access requests during processing.
The second selector 107 is configured to receive the normal memory access request transmitted by the first selector, the accelerated memory access request transmitted by the acceleration engine, and the second control signal sent by the bus control arbiter, and to select, according to the second control signal, the memory access request that currently accesses the main memory 30. In practice, the second selector 107 likewise acts as both a demultiplexer and a multiplexer for the memory interface 108. For the normal memory access requests from the first selector 103 and the accelerated memory access requests and write data from the acceleration engine 105, it acts as a multiplexer and selects one to output to the memory interface 108 according to the bus arbitration result of the bus control arbiter 106. For data returned on the memory interface 108, it acts as a demultiplexer and outputs the returned data to the acceleration engine 105 or the first selector 103 according to the same arbitration result.
The memory interface 108 is configured to transmit the memory access request to the main memory 30, and to transmit the response data corresponding to the memory access request to the second selector.
Further, the acceleration register unit 104 includes a command queue 1041, a configuration register 1042, and a result register 1043.
Still further, the accelerator 10 may also include:
a routing module 109, configured to transmit the accelerated memory access request to the corresponding main memory 30. The routing module is connected to the acceleration engine and to another accelerator. When the data required by an accelerated memory access request is not in the main memory connected to the local accelerator, the acceleration engine transmits the request to the routing module, which forwards it to the other accelerator, so that the other accelerator accesses the main memory connected to it according to the request.
Specifically, the routing module 109 may also be connected to the second selector, so that an accelerated memory access request received from another accelerator is transmitted through the second selector to the main memory connected to the local accelerator.
In practice, using the routing module 109, multiple accelerators can be organized into various topologies, such as a ring or a fat tree.
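For the ring topology, forwarding between routing modules can be sketched as below. The source does not describe how accelerators are numbered or how the owner of an address is found, so the integer indices, the one-directional ring, and the assumption that the owning accelerator is already known are all illustrative.

```python
def ring_path(source: int, owner: int, ring_size: int) -> list:
    """Accelerators visited when a request is forwarded one hop at a time around a ring.

    `source` issues the accelerated request; `owner` is the accelerator whose
    main memory holds the data. Both are hypothetical indices in 0..ring_size-1.
    """
    if not (0 <= source < ring_size and 0 <= owner < ring_size):
        raise ValueError("accelerator index out of range")
    path = [source]
    while path[-1] != owner:
        # Each routing module passes the request to its next neighbor.
        path.append((path[-1] + 1) % ring_size)
    return path
```

With four accelerators, a request issued by accelerator 2 for data attached to accelerator 0 travels through `[2, 3, 0]`; a request for local data never leaves the source.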
The data processing method corresponding to the accelerator in the foregoing embodiments is described below. Referring to FIG. 6, an embodiment of the data processing method in the embodiments of the present invention includes:
601. The accelerator receives a memory access request transmitted by the memory controller.
The accelerator receives a memory access request transmitted by the memory controller, where memory access requests include normal memory access requests and accelerated memory access requests.
In this embodiment of the present invention, memory access requests include normal memory access requests and accelerated memory access requests. A normal memory access request targets a data unit with good locality, and an accelerated memory access request targets a data unit with poor locality. Before a memory access request is generated, the data unit to be requested is analyzed to determine how good its locality is. Specifically, a threshold may be set: when the locality of the data unit is greater than or equal to the threshold, its locality is considered good and a normal memory access request is generated; when the locality is below the threshold, its locality is considered poor and an accelerated memory access request is generated.
602. The accelerator determines the request type of the memory access request according to the row address of the memory access request.
The accelerator determines the request type of the memory access request according to its row address. If the request is an accelerated memory access request, it is buffered and processed within the accelerator; if the request is a normal memory access request, it is transmitted to the main memory for processing.
603. The accelerator selects the memory access request that accesses the main memory.
When the accelerator needs to choose between a normal memory access request and an accelerated memory access request to access the main memory, the normal memory access request is selected preferentially.
In practice, the central processing unit may run faster than the accelerator of this embodiment, and normal memory access requests handle data units with good locality. Therefore, when the second selector needs to choose between a normal memory access request and an accelerated memory access request to access the main memory, a second control signal that gives priority to the normal memory access request is generated.
If the second selector of the accelerator receives a normal memory access request while the memory interface of the accelerator is processing an accelerated memory access request, the access type of the normal memory access request is determined. If it is a write request, the normal memory access request is deferred until the memory interface of the accelerator is released and is then sent to the main memory; if it is a read request, an error correction code (ECC) error message is sent to the memory controller.
That is, a write request is intercepted and sent only after the memory interface is released, whereas for a read request the ECC error message is sent to the memory controller through the first selector of the accelerator, causing the memory controller to resend the read request and thereby avoiding a logic error in the system.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function; in an actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An accelerator, characterized by comprising:
a controller interface, a row address judgment unit, a first selector, an acceleration register unit, an acceleration engine, a bus control arbiter, a second selector, and a memory interface;
the controller interface is configured to receive a memory access request transmitted by a memory controller, where memory access requests comprise normal memory access requests and accelerated memory access requests;
所述行地址判断单元用于根据所述访存请求的行地址判断所述访存请求 的请求类型, 生成并向所述第一选择器发送第一控制信号; 若所述访存请求为 正常访存请求, 则生成向所述第二选择器传输的第一控制信号; 若所述访存请 求为加速访存请求, 则生成向所述加速寄存单元传输的第一控制信号; The row address determination unit is configured to determine the request type of the memory access request according to the row address of the memory access request, generate and send a first control signal to the first selector; if the memory access request is normal If the memory access request is an accelerated memory access request, a first control signal transmitted to the second selector is generated; if the memory access request is an accelerated memory access request, a first control signal transmitted to the accelerated register unit is generated;
所述第一选择器用于根据所述第一控制信号选择所述访存请求的传输方 向; The first selector is used to select the transmission direction of the memory access request according to the first control signal;
所述加速寄存单元用于存储所述加速访存请求的处理信息; The acceleration register unit is used to store the processing information of the accelerated memory access request;
所述加速引擎用于通过向所述加速寄存单元调用所述加速访存请求的处 理信息, 并根据所述处理信息通过所述第二选择器访问主存储器, 以执行所述 加速访存请求的数据处理操作; The acceleration engine is configured to call the processing information of the accelerated memory access request to the acceleration register unit, and access the main memory through the second selector according to the processing information to execute the accelerated memory access request. data processing operations;
所述总线控制裁决器用于生成并向所述第二选择器发送第二控制信号; 所述第二选择器用于接收第一选择器传输的正常访存请求,加速引擎传输 的加速访存请求以及总线控制裁决器发送的第二控制信号;并根据所述第二控 制信号选择当前访问所述主存储器的访存请求; The bus control arbiter is used to generate and send a second control signal to the second selector; the second selector is used to receive a normal memory access request transmitted by the first selector, an accelerated memory access request transmitted by the acceleration engine, and A second control signal sent by the bus control arbiter; and selecting a memory access request currently accessing the main memory according to the second control signal;
所述存储器接口用于向所述主存储器传输所述访存请求,以及向所述第二 选择器传输所述访存请求对应的响应数据。 The memory interface is used to transmit the memory access request to the main memory, and transmit response data corresponding to the memory access request to the second selector.
2. The accelerator according to claim 1, characterized in that the bus control arbiter is specifically configured to: when the second selector needs to select one of the normal memory access request and the accelerated memory access request to access the main memory, generate a second control signal that gives priority to the normal memory access request.
3. The accelerator according to claim 2, characterized in that the bus control arbiter is further specifically configured to: when the second selector receives the normal memory access request while the memory interface is processing the accelerated memory access request, determine the access type of the normal memory access request; if it is a write request, postpone sending it until the memory interface is released; if it is a read request, send an error correction code (ECC) error message to the memory controller through the first selector.
4. The accelerator according to claim 1, characterized in that the acceleration register unit comprises:
a command queue, configured to store command information when the accelerated memory access request is an acceleration command, the command information comprising a command type and a source operand or source operand address;
a configuration register, configured to store configuration information when the accelerated memory access request is a configuration request, the configuration information comprising the mapping from physical addresses of the main memory to row addresses and column addresses;
a result register, configured to store the execution status and response data of the accelerated memory access request.
5. The accelerator according to claim 4, characterized in that the first selector is further configured to: when the result register returns response data of the accelerated memory access request and the second selector returns response data of the normal memory access request, select the response data to be transmitted to the controller interface according to the second control signal generated by the bus control arbiter.
6. The accelerator according to claim 4, characterized in that the second selector is further configured to: when the memory interface returns response data of the memory access request, select, according to the request type of the memory access request to which the response data corresponds, whether to transmit the response data to the acceleration engine or to the first selector.
7. The accelerator according to claim 1, characterized in that the accelerator further comprises:
a routing module, configured to transmit the accelerated memory access request to the corresponding main memory, the routing module being connected to the acceleration engine and to another accelerator; when the data required by the accelerated memory access request is not in the main memory connected to the local accelerator, the acceleration engine transmits the accelerated memory access request to the routing module, and the routing module in turn transmits the accelerated memory access request to the other accelerator, so that the other accelerator accesses data in the main memory connected to the other accelerator according to the accelerated memory access request.
8. The accelerator according to claim 7, characterized in that the routing module is connected to the second selector, so that the routing module transmits an accelerated memory access request received from another accelerator, through the second selector, to the main memory connected to the local accelerator.
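As a non-limiting illustration, the acceleration register unit of claim 4 and the routing behaviour of claims 7 and 8 may be modelled in software as follows. This is only a sketch under stated assumptions: the class and method names, the 10-bit column field used to split a physical address, and the use of an address range to decide which accelerator holds the data are all hypothetical and are not specified by the claims.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConfigRegister:
    """Physical address -> (row, column) mapping; the bit split is an assumption."""
    column_bits: int = 10

    def split(self, phys_addr: int):
        row = phys_addr >> self.column_bits
        column = phys_addr & ((1 << self.column_bits) - 1)
        return row, column


@dataclass
class AccelerationRegisterUnit:
    """Command queue, configuration register and result register (claim 4)."""
    config: ConfigRegister = field(default_factory=ConfigRegister)
    command_queue: deque = field(default_factory=deque)  # (command type, source operand address)
    result: dict = field(default_factory=lambda: {"status": "idle", "data": None})


@dataclass
class Accelerator:
    name: str
    local_addrs: range                    # hypothetical: addresses held by the attached main memory
    registers: AccelerationRegisterUnit = field(default_factory=AccelerationRegisterUnit)
    memory: dict = field(default_factory=dict)   # stands in for the main memory, keyed by (row, column)
    peer: Optional["Accelerator"] = None          # the other accelerator reached via the routing module

    def serve(self, phys_addr: int):
        """Access the local main memory, as the second selector path would (claims 1 and 8)."""
        value = self.memory.get(self.registers.config.split(phys_addr))
        return self.name, value

    def engine_read(self, phys_addr: int):
        """Acceleration engine access: local if possible, otherwise via the peer (claim 7)."""
        if phys_addr in self.local_addrs:
            return self.serve(phys_addr)
        if self.peer is None or phys_addr not in self.peer.local_addrs:
            raise LookupError(f"no accelerator owns address {phys_addr:#x}")
        return self.peer.serve(phys_addr)

    def run_next_command(self):
        """Pop one queued command and record its status and data in the result register."""
        kind, src_addr = self.registers.command_queue.popleft()
        owner, value = self.engine_read(src_addr)
        self.registers.result = {"status": "done", "data": value,
                                 "served_by": owner, "command": kind}
        return self.registers.result
```

In this model, a command whose source operand address falls outside `local_addrs` is served by the peer accelerator from the peer's own memory, and the outcome is recorded in the local result register.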
9. A data processing method, characterized in that it comprises:
receiving, by an accelerator, a memory access request transmitted by a memory controller, the memory access request comprising a normal memory access request and an accelerated memory access request;
determining, by the accelerator, the request type of the memory access request according to the row address of the memory access request; if the memory access request is an accelerated memory access request, buffering the accelerated memory access request and processing it within the accelerator; if the memory access request is a normal memory access request, transmitting the normal memory access request to a main memory for processing.
10. The method according to claim 9, characterized in that the method further comprises: when the accelerator needs to select one of the normal memory access request and the accelerated memory access request to access the main memory, giving priority to the normal memory access request.
11. The method according to claim 9, characterized in that the method further comprises: when the second selector of the accelerator receives the normal memory access request while the memory interface of the accelerator is processing the accelerated memory access request, determining the access type of the normal memory access request; if it is a write request, postponing the normal memory access request until the memory interface of the accelerator is released and then sending it to the main memory; if it is a read request, sending an error correction code (ECC) error message to the memory controller.
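As a non-limiting illustration, the decisions recited in claims 9 to 11 may be sketched as three small functions. The reserved row range `ACCEL_ROWS`, the `Request` type, and the returned action names are assumptions introduced for the example; the claims state only that the request type is determined from the row address, and do not fix how that determination is encoded.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    NORMAL = "normal"
    ACCELERATED = "accelerated"


@dataclass(frozen=True)
class Request:
    row: int         # row address carried by the memory access request
    is_write: bool   # True for a write request, False for a read request


# Hypothetical: rows in this range are treated as addressed to the accelerator.
ACCEL_ROWS = range(0xFF00, 0x10000)


def classify(req: Request) -> Kind:
    """Claim 9: determine the request type from the row address."""
    return Kind.ACCELERATED if req.row in ACCEL_ROWS else Kind.NORMAL


def arbitrate(normal_pending: bool, accel_pending: bool):
    """Claim 10: when both kinds compete for the main memory, the normal request wins."""
    if normal_pending:
        return Kind.NORMAL
    if accel_pending:
        return Kind.ACCELERATED
    return None  # nothing is waiting for the main memory


def on_normal_while_busy(req: Request) -> str:
    """Claim 11: a normal request arrives while an accelerated one holds the memory interface."""
    # Writes are postponed until the memory interface is released;
    # reads are answered with an ECC error message to the memory controller.
    return "defer_until_released" if req.is_write else "reply_ecc_error"
```

A request with row `0xFF10` would be classified as accelerated under the assumed range, while any row below `0xFF00` would be forwarded to the main memory as a normal request.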
PCT/CN2014/080162 2013-06-28 2014-06-18 Accelerator and data processing method WO2014206229A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310269782.4A CN104252416B (en) 2013-06-28 2013-06-28 A kind of accelerator and data processing method
CN201310269782.4 2013-06-28

Publications (1)

Publication Number Publication Date
WO2014206229A1 true WO2014206229A1 (en) 2014-12-31

Family

ID=52141035

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/080162 WO2014206229A1 (en) 2013-06-28 2014-06-18 Accelerator and data processing method

Country Status (2)

Country Link
CN (1) CN104252416B (en)
WO (1) WO2014206229A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017065379A1 (en) * 2015-10-16 2017-04-20 삼성전자 주식회사 Method and apparatus for processing instructions using processing-in-memory
CN109308280B (en) * 2017-07-26 2021-05-18 华为技术有限公司 Data processing method and related equipment
CN109756390B (en) * 2018-12-06 2020-12-01 网易(杭州)网络有限公司 Method and device for automatically testing connectivity of network accelerator
CN110018839B (en) * 2019-03-27 2021-04-13 联想(北京)有限公司 Hardware accelerator multiplexing method and hardware accelerator
CN114328311A (en) * 2021-12-15 2022-04-12 珠海一微半导体股份有限公司 Storage controller architecture, data processing circuit and data processing method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101221538A (en) * 2008-01-24 2008-07-16 杭州华三通信技术有限公司 System and method for implementing fast data search in caching
CN101290610A (en) * 2008-06-03 2008-10-22 浙江大学 Embedded heterogeneous chip multiprocessor on-chip communications interconnecting organization level accomplishing method
US20110307647A1 (en) * 2010-06-11 2011-12-15 California Institute Of Technology Systems and methods for rapid processing and storage of data
CN103345429A (en) * 2013-06-19 2013-10-09 中国科学院计算技术研究所 High-concurrency access and storage accelerating method and accelerator based on on-chip RAM, and CPU


Also Published As

Publication number Publication date
CN104252416B (en) 2017-09-05
CN104252416A (en) 2014-12-31


Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 14818028; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 14818028; Country of ref document: EP; Kind code of ref document: A1