WO2014206229A1 - Accelerator and data processing method - Google Patents

Accelerator and data processing method

Info

Publication number
WO2014206229A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory access
access request
memory
selector
accelerator
Prior art date
Application number
PCT/CN2014/080162
Other languages
English (en)
Chinese (zh)
Inventor
崔泽汉
陈明宇
刘垚
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2014206229A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus

Definitions

  • the present invention relates to the field of computer data processing, and more particularly to an accelerator and a data processing method.
  • the access speed of the main memory is always much slower than that of the central processing unit, so that the high-speed processing capability of the central processing unit cannot be fully utilized, and a lot of time is wasted waiting for data to be returned. The efficiency of computer systems is affected.
  • a cache level is therefore placed between the central processing unit and the main memory.
  • the cache is only a few thousandths of the size of the main memory, but it is much faster than the main memory.
  • programs exhibit locality: a unit of the main memory that is being used is highly likely to be accessed again in the near future (temporal locality), and the units in its vicinity are also likely to be used (spatial locality).
  • when the central processor accesses a unit of the main memory, the computer hardware automatically transfers the group of units containing that unit into the cache, and the main memory unit that the central processor accesses next is very likely to be in the group that has just been loaded into the cache.
  • the central processor can then access the cache directly. If most of the central processor's accesses to the main memory can be replaced by accesses to the cache, the processing speed of the computer can be significantly improved.
  • although the cache can significantly improve performance, its capacity is limited. If the unit to be accessed by the central processor is not in the cache, the main memory, with its high access latency, must still be accessed, and this remains a key factor limiting performance. In addition, if a data unit with poor locality is fetched into the cache, data units with good locality may be evicted from the cache (so-called cache pollution), which forces the central processor to keep accessing the slow main memory and reduces the overall operating efficiency of the computer.
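  • the cache pollution effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the LRU replacement policy, the four-entry capacity, and the access patterns are all hypothetical and are not taken from the patent.

```python
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Simulate a fully associative LRU cache and return its hit rate.
    LRU replacement is an assumption made for illustration only."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used unit
    return hits / len(accesses)

# Hot working set of 4 units reused 50 times (good locality).
hot = [0, 1, 2, 3] * 50

# The same hot accesses interleaved with a streaming scan in which
# every unit is touched exactly once (poor locality).
polluted = []
stream = 1000
for i in range(0, len(hot), 4):
    polluted.extend(hot[i:i + 4])
    polluted.extend(range(stream, stream + 4))
    stream += 4

clean_rate = hit_rate(hot, capacity=4)
polluted_rate = hit_rate(polluted, capacity=4)
print(clean_rate, polluted_rate)
```

  • in this toy setup the streaming units evict the hot units before they are reused, so the hot working set no longer benefits from the cache at all.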
  • an accelerator is added to the memory controller so that operations on data units with poor locality are performed in the accelerator. Since the data units do not need to be fetched to the central processor and are processed directly in the memory controller, which is closer to the main memory, part of the memory access latency is saved. At the same time, since these data units have poor locality, they are not brought into the cache and do not degrade the performance of subsequent operations of the central processing unit.
  • Embodiments of the present invention provide an accelerator and a data processing method for upgrading an existing computer device to improve data processing efficiency of the computer device.
  • a controller interface, a row address determining unit, a first selector, an acceleration register unit, an acceleration engine, a bus control arbiter, a second selector, and a memory interface;
  • the controller interface is configured to receive a memory access request transmitted by the memory controller, where the memory access request includes: a normal memory access request and an accelerated memory access request;
  • the row address determining unit is configured to determine, according to the row address of the memory access request, the request type of the memory access request, and to generate and send a first control signal to the first selector; if the memory access request is a normal memory access request, it generates a first control signal for transmission to the second selector; if the memory access request is an accelerated memory access request, it generates a first control signal for transmission to the acceleration register unit;
  • the first selector is configured to select a transmission direction of the memory access request according to the first control signal
  • the acceleration register unit is configured to store processing information of the accelerated memory access request
  • the acceleration engine is configured to retrieve the processing information of the accelerated memory access request from the acceleration register unit and, according to the processing information, access the main memory through the second selector to perform the data processing operation of the accelerated memory access request;
  • the bus control arbiter is configured to generate and send a second control signal to the second selector; the second selector is configured to receive the normal memory access request transmitted by the first selector, the accelerated memory access request transmitted by the acceleration engine, and the second control signal sent by the bus control arbiter, and to select, according to the second control signal, the memory access request that currently accesses the main memory;
  • the memory interface is configured to transmit the memory access request to the main memory, and transmit response data corresponding to the memory access request to the second selector.
  • the bus control arbiter is specifically configured to: when the second selector needs to select one of the normal memory access request and the accelerated memory access request to access the main memory, generate a second control signal that gives priority to the normal memory access request.
  • the bus control arbiter is further configured to: when the memory interface is processing the accelerated memory access request and the second selector receives the normal memory access request, determine the access type of the normal memory access request; if it is a write request, defer it until the memory interface is released and then send it; if it is a read request, send an error correction code (ECC) error message to the memory controller through the first selector.
  • the acceleration register unit includes:
  • a command queue configured to store command information when the accelerated memory access request is an acceleration command, where the command information includes a command type, a source operand, or a source operand address;
  • a configuration register configured to store configuration information when the accelerated memory access request is a configuration request, where the configuration information includes a mapping relationship between a physical address of the main memory and a row address and a column address;
  • a result register configured to store an execution state and response data of the accelerated memory access request.
  • the first selector is further configured to: when the result register returns response data of the accelerated memory access request and the second selector returns response data of the normal memory access request, select the response data to be transmitted to the controller interface according to the second control signal generated by the bus control arbiter.
  • the second selector is further configured to:
  • the accelerator further includes: a routing module, configured to transmit the accelerated memory access request to the corresponding main memory, where the routing module is connected to the acceleration engine and to another accelerator;
  • when the data required for the accelerated memory access request is not in the main memory connected to the local accelerator, the acceleration engine transmits the accelerated memory access request to the routing module;
  • the routing module transmits the accelerated memory access request to the other accelerator, so that the other accelerator accesses the main memory connected to it according to the accelerated memory access request.
  • the routing module is connected to the second selector, so that an accelerated memory access request that the routing module receives from another accelerator is transmitted through the second selector to the main memory to which the local accelerator is connected.
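  • the routing behaviour can be sketched as follows. This is a minimal illustration, assuming that each accelerator knows which addresses live in its own main memory; the class `Accelerator`, its methods, and the dictionary-based memory are hypothetical names introduced here, not part of the patent.

```python
class Accelerator:
    """Toy model of an accelerator with a routing module (hypothetical API)."""

    def __init__(self, name, local_memory):
        self.name = name
        self.local_memory = local_memory  # address -> value in the local main memory
        self.peers = []                   # other accelerators reachable by routing

    def accelerated_read(self, address):
        """Acceleration engine path: serve locally or hand over to the routing module."""
        if address in self.local_memory:
            return self.local_memory[address], self.name
        return self.route(address)

    def route(self, address):
        """Routing module: forward the request to the accelerator that owns the data."""
        for peer in self.peers:
            if address in peer.local_memory:
                return peer.serve_remote(address)
        raise KeyError(f"address {address:#x} is not mapped to any accelerator")

    def serve_remote(self, address):
        """Request received from another accelerator: access the local main memory."""
        return self.local_memory[address], self.name


a = Accelerator("A", {0x00: 10})
b = Accelerator("B", {0x100: 20})
a.peers.append(b)
b.peers.append(a)

print(a.accelerated_read(0x00))   # served from A's own main memory
print(a.accelerated_read(0x100))  # routed to B and served from B's main memory
```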
  • the accelerator receives the memory access request transmitted by the memory controller, where the memory access request includes: a normal memory access request and an accelerated memory access request;
  • the accelerator determines the request type of the memory access request according to the row address of the memory access request; if the memory access request is an accelerated memory access request, the accelerator buffers the accelerated memory access request and processes it in the accelerator; if it is a normal memory access request, the accelerator transmits the normal memory access request to the main memory for processing.
  • the method further includes:
  • the normal memory access request is preferentially selected.
  • the method further includes:
  • the second selector of the accelerator receives the normal memory access request and determines the access type of the normal memory access request; if it is a write request, the normal memory access request is delayed until the memory interface of the accelerator is released and is then sent; if it is a read request, an error correction code (ECC) error message is sent to the memory controller.
  • the embodiments of the present invention have the following advantages:
  • the accelerator in the embodiment of the present invention is connected to the memory controller and the main memory in the computer device through a controller interface and a memory interface, respectively.
  • when a memory access request transmitted by the memory controller is received, the row address determining unit determines whether the memory access request is a normal memory access request or an accelerated memory access request, where a normal memory access request corresponds to a data unit with good locality and an accelerated memory access request corresponds to a data unit with poor locality; if the memory access request is a normal memory access request, the row address determining unit instructs the first selector to send the normal memory access request to the second selector, so that the second selector directly transmits the normal memory access request to the main memory for processing;
  • if the memory access request is an accelerated memory access request, the row address determining unit instructs the first selector to send the accelerated memory access request to the acceleration register unit, so that the acceleration register unit caches the processing information in the accelerated memory access request and triggers the acceleration engine to process the data unit with poor locality corresponding to the accelerated memory access request.
  • data units with good locality can thus be concentrated in the cache, and the acceleration engine processes data units with poor locality more efficiently than the central processing unit, which improves the data processing speed of the computer;
  • because the accelerator can be connected to the memory controller and the main memory of the computer device through the controller interface and the memory interface, it is compatible with the hardware structure of existing computers and upgrades the data processing capability of existing computer equipment.
  • FIG. 1 is a schematic structural view of a computer in the prior art
  • FIG. 2 is a schematic structural view of an accelerator in an embodiment of the present invention.
  • FIG. 3 is another schematic structural view of an accelerator in an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a computer in an embodiment of the present invention.
  • FIG. 5 is another schematic structural diagram of an accelerator in an embodiment of the present invention.
  • FIG. 6 is a schematic flow chart of a data processing method in an embodiment of the present invention.
  • Embodiments of the present invention provide an accelerator and a data processing method for upgrading an existing computer device to improve data processing efficiency of the computer device.
  • an embodiment of the accelerator in the embodiment of the present invention includes:
  • a controller interface 101, a row address determining unit 102, a first selector 103, an acceleration register unit 104, an acceleration engine 105, a bus control arbiter 106, a second selector 107, and a memory interface 108, as shown in FIG. 2.
  • the controller interface 101 is configured to receive a memory access request transmitted by the memory controller 20.
  • the memory access request is an instruction issued when the central processing unit needs to access the main memory for read and write operations, and it carries the memory address of the main memory.
  • the memory access request includes: a normal memory access request and an accelerated memory access request; a normal memory access request corresponds to a data unit with good locality, and an accelerated memory access request corresponds to a data unit with poor locality.
  • data analysis is performed on the data unit to be requested, and the locality of the data unit is determined.
  • the locality of the data unit may be determined by setting a threshold. When the locality of the data unit is greater than or equal to the threshold, the locality is considered good, and a normal memory access request is generated; when the locality of the data unit is less than the threshold, the locality is considered poor, and an accelerated memory access request is generated.
  • a data unit with good locality is one that is likely to be accessed again (temporal locality), or one for which, once it has been accessed, other data units stored near it are also likely to be accessed (spatial locality).
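  • the threshold test can be sketched as follows. The text does not define how locality is measured, so the metric below (the fraction of accesses that touch an address equal or adjacent to one seen in a recent window) is purely an assumption, as are the names `locality_score` and `request_type`, the window size, and the threshold value.

```python
def locality_score(trace, window=8):
    """Fraction of accesses whose address, or a direct neighbour of it,
    was touched within the previous `window` accesses.
    Hypothetical metric chosen for illustration only."""
    if not trace:
        return 0.0
    reused = 0
    for i, addr in enumerate(trace):
        recent = trace[max(0, i - window):i]
        if any(abs(addr - r) <= 1 for r in recent):
            reused += 1
    return reused / len(trace)

def request_type(trace, threshold=0.5):
    """Good locality (score >= threshold) maps to a normal request,
    poor locality to an accelerated request."""
    return "normal" if locality_score(trace) >= threshold else "accelerated"

sequential = list(range(32))               # neighbouring units: good spatial locality
scattered = [i * 4096 for i in range(32)]  # widely spaced units, no reuse

print(request_type(sequential))  # normal
print(request_type(scattered))   # accelerated
```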
  • the row address determining unit 102 is configured to determine, according to the row address of the memory access request, the request type of the memory access request, and to generate and send a first control signal to the first selector; if the memory access request is a normal memory access request, a first control signal for transmission to the second selector 107 is generated; if the memory access request is an accelerated memory access request, a first control signal for transmission to the acceleration register unit 104 is generated.
  • the memory access address carried in the memory access request is split into two parts: the row address and the column address.
  • the row address is sent first, and the column address is sent after a preset time interval; data must be returned a fixed number of beats after the column address is sent.
  • if the address were not split into a row address and a column address, there would be no "return data after a fixed number of beats" restriction, and the data path could be judged and switched after the address is received, which usually takes one beat.
  • however, since the accelerator of the present invention is connected through an external interface of the memory controller, it must comply with the restriction that data be returned a fixed number of beats after the column address is sent, and must therefore avoid adding delay.
  • the embodiment of the present invention uses the interval between the row address transmission and the column address transmission: the row address determining unit 102 determines the request type from the row address before the memory access request has been completely transmitted, which saves the time spent waiting for the judgment and improves the efficiency of data transmission.
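  • the early decision on the row address can be sketched as below. The text does not say how a row address encodes the request type, so the reserved row range `ACCEL_ROW_BASE` is a hypothetical convention, and `Route`, `first_control_signal`, and `handle_request` are illustrative names only.

```python
from enum import Enum

class Route(Enum):
    SECOND_SELECTOR = "second selector"                 # normal request path
    ACCEL_REGISTER_UNIT = "acceleration register unit"  # accelerated request path

# Hypothetical convention: rows at or above this value are reserved for
# accelerated requests. The actual encoding is not given in the text.
ACCEL_ROW_BASE = 0xF000

def first_control_signal(row_address: int) -> Route:
    """Decide the route using only the row address."""
    if row_address >= ACCEL_ROW_BASE:
        return Route.ACCEL_REGISTER_UNIT
    return Route.SECOND_SELECTOR

def handle_request(row_address: int, column_address: int):
    # The route is fixed as soon as the row address arrives.
    route = first_control_signal(row_address)
    # The column address arrives after the preset interval; the data path
    # is already switched, so no extra beat is spent on the decision.
    return route, (row_address, column_address)

print(handle_request(0x0123, 0x10))
print(handle_request(0xF001, 0x10))
```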
  • the first selector 103 is configured to select a transmission direction of the memory access request according to the first control signal; specifically, the request is sent either to the second selector 107 or to the acceleration register unit 104.
  • the first selector 103 can act as both a demultiplexer and a multiplexer for the controller interface 101: for a memory access request from the controller interface 101, it acts as a demultiplexer and, according to the control signal generated by the row address determining unit 102, outputs the request to the second selector 107 or the acceleration register unit 104; for data returned by the bus control arbiter 106, the second selector 107, the acceleration register unit 104, and the like, it acts as a multiplexer and selects one output to the controller interface 101 according to the control signal generated by the bus control arbiter 106.
  • the acceleration register unit 104 is configured to store processing information of the accelerated memory access request; the acceleration engine 105 is configured to retrieve the processing information of the accelerated memory access request (specifically, an acceleration command) from the acceleration register unit 104 and, according to the processing information, access the main memory through the second selector 107 to perform the data processing operation of the accelerated memory access request (specifically, arithmetic processing on the data unit with poor locality indicated in the accelerated memory access request);
  • the bus control arbiter 106 is configured to generate and send a second control signal to the second selector, so that when the second selector 107 needs to process two memory access requests at the same time, the processing order is allocated according to a certain rule and different memory access requests do not conflict in the processing flow.
  • the second selector 107 is configured to receive the normal memory access request transmitted by the first selector, the accelerated memory access request transmitted by the acceleration engine, and the second control signal sent by the bus control arbiter, and to select, according to the second control signal, the memory access request that currently accesses the main memory 30; in practical applications, the second selector 107 likewise acts as both a multiplexer and a demultiplexer for the memory interface 108: for the normal memory access request transmitted by the first selector 103 and the accelerated memory access request and write data transmitted by the acceleration engine 105, it acts as a multiplexer and selects one output to the memory interface 108 according to the bus arbitration result of the bus control arbiter 106;
  • for data returned on the memory interface 108, it acts as a demultiplexer and outputs the returned data to the acceleration engine 105 or the first selector 103 according to the bus arbitration result of the bus control arbiter 106.
  • the memory interface 108 is configured to transmit the memory access request to the main memory 30, and transmit response data corresponding to the memory access request to the second selector.
  • the accelerator in the embodiment of the present invention is respectively connected to the memory controller and the main memory in the computer device through the controller interface and the memory interface.
  • when a memory access request transmitted by the memory controller is received, the row address determining unit determines whether the memory access request is a normal memory access request or an accelerated memory access request, where a normal memory access request corresponds to a data unit with good locality and an accelerated memory access request corresponds to a data unit with poor locality;
  • if the memory access request is a normal memory access request, the row address determining unit instructs the first selector to send the normal memory access request to the second selector, so that the second selector directly transmits the normal memory access request to the main memory for processing.
  • if the memory access request is an accelerated memory access request, the row address determining unit instructs the first selector to send the accelerated memory access request to the acceleration register unit, so that the acceleration register unit caches the processing information in the accelerated memory access request and triggers the acceleration engine to process the data unit with poor locality corresponding to the accelerated memory access request.
  • data units with good locality can therefore be concentrated in the cache, and the acceleration engine processes data units with poor locality better than the central processing unit, which improves the data processing speed of the computer.
  • the accelerator in the embodiment of the present invention can be respectively connected to the memory controller and the main memory in the computer device through the controller interface and the memory interface, and is compatible with the hardware structure of the existing computer, and realizes data processing on the existing computer device. Upgrade of capabilities.
  • another embodiment of the accelerator in the embodiment of the present invention includes:
  • the controller interface 101 is configured to receive a memory access request transmitted by the memory controller 20.
  • the memory access request is an instruction issued when the central processing unit needs to access the main memory for read and write operations, and it carries the memory address of the main memory.
  • the memory access request includes: a normal memory access request and an accelerated memory access request; a normal memory access request corresponds to a data unit with good locality, and an accelerated memory access request corresponds to a data unit with poor locality.
  • data analysis is performed on the data unit to be requested, and the locality of the data unit is determined.
  • the locality of the data unit may be determined by setting a threshold. When the locality of the data unit is greater than or equal to the threshold, the locality is considered good, and a normal memory access request is generated; when the locality of the data unit is less than the threshold, the locality is considered poor, and an accelerated memory access request is generated.
  • the row address determining unit 102 is configured to determine, according to the row address of the memory access request, the request type of the memory access request, and to generate and send a first control signal to the first selector; if the memory access request is a normal memory access request, a first control signal for transmission to the second selector 107 is generated; if the memory access request is an accelerated memory access request, a first control signal for transmission to the acceleration register unit 104 is generated.
  • the memory access address carried in the memory access request is split into two parts: the row address and the column address.
  • the row address is sent first, and the column address is sent after a preset time interval; data must be returned a fixed number of beats after the column address is sent.
  • the first selector 103 is configured to select a transmission direction of the memory access request according to the first control signal; specifically, the request is sent either to the second selector 107 or to the acceleration register unit 104.
  • the first selector 103 can act as both a demultiplexer and a multiplexer for the controller interface 101: for a memory access request from the controller interface 101, it acts as a demultiplexer and, according to the control signal generated by the row address determining unit 102, outputs the request to the second selector 107 or the acceleration register unit 104; for data returned by the bus control arbiter 106, the second selector 107, the acceleration register unit 104, and the like, it acts as a multiplexer and selects one output to the controller interface 101 according to the control signal generated by the bus control arbiter 106.
  • the acceleration register unit 104 is configured to store processing information of the accelerated memory access request; the acceleration engine 105 is configured to retrieve the processing information of the accelerated memory access request (specifically, an acceleration command) from the acceleration register unit 104 and, according to the processing information, access the main memory through the second selector 107 to perform the data processing operation of the accelerated memory access request (specifically, arithmetic processing on the data unit with poor locality indicated in the accelerated memory access request);
  • the bus control arbiter 106 is configured to generate and send a second control signal to the second selector, so that when the second selector 107 needs to process two memory access requests at the same time, the processing order is allocated according to a certain rule and different memory access requests do not conflict in the processing flow.
  • the second selector 107 is configured to receive the normal memory access request transmitted by the first selector, the accelerated memory access request transmitted by the acceleration engine, and the second control signal sent by the bus control arbiter, and to select, according to the second control signal, the memory access request that currently accesses the main memory 30; in practical applications, the second selector 107 likewise acts as both a multiplexer and a demultiplexer for the memory interface 108: for the normal memory access request transmitted by the first selector 103 and the accelerated memory access request and write data transmitted by the acceleration engine 105, it acts as a multiplexer and selects one output to the memory interface 108 according to the bus arbitration result of the bus control arbiter 106;
  • for data returned on the memory interface 108, it acts as a demultiplexer and outputs the returned data to the acceleration engine 105 or the first selector 103 according to the bus arbitration result of the bus control arbiter 106.
  • the memory interface 108 is configured to transmit the memory access request to the main memory 30, and transmit response data corresponding to the memory access request to the second selector.
  • the acceleration register unit 104 includes:
  • a command queue 1041 configured to store command information when the accelerated memory access request is an acceleration command, where the command information includes a command type, a source operand, or a source operand address;
  • the configuration register 1042 is configured to store configuration information when the accelerated memory access request is a configuration request, where the configuration information includes a mapping relationship between a physical address of the main memory and a row address and a column address;
  • the result register 1043 is configured to store an execution status and response data of the accelerated memory access request.
  • the accelerated memory access request may be further divided into an acceleration command and a configuration request.
  • when the accelerated memory access request is an acceleration command, the first selector 103 transmits the acceleration command to the command queue 1041; when it is a configuration request, the first selector 103 transmits the configuration request to the configuration register 1042.
  • the configuration request is sent at system initialization to convert the address of the acceleration command to the row address and column address used to access the main memory based on the configuration information.
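  • the three parts of the acceleration register unit and the address conversion can be sketched as below. The mapping used here (low-order bits select the column, the remaining bits select the row) is an assumption, since the text only says that the configuration information holds the mapping between physical addresses and row and column addresses; `AccelRegisterUnit` and its method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class AccelRegisterUnit:
    """Toy model of the command queue, configuration register and result register."""
    command_queue: deque = field(default_factory=deque)  # pending acceleration commands
    config: dict = field(default_factory=dict)           # address mapping information
    result: dict = field(default_factory=dict)           # execution state and response data

    def configure(self, column_bits: int) -> None:
        """Configuration request, sent at system initialization.
        Assumed layout: the low `column_bits` bits are the column address."""
        self.config["column_bits"] = column_bits

    def push_command(self, command_type: str, operand_address: int) -> None:
        """Acceleration command: command type plus a source operand address."""
        self.command_queue.append((command_type, operand_address))

    def to_row_col(self, physical_address: int):
        """Convert a physical address into (row, column) using the configuration."""
        bits = self.config["column_bits"]
        return physical_address >> bits, physical_address & ((1 << bits) - 1)


unit = AccelRegisterUnit()
unit.configure(column_bits=10)
unit.push_command("sum", 0x12345)

command_type, address = unit.command_queue.popleft()
row, col = unit.to_row_col(address)
unit.result["state"] = "done"
print(command_type, row, col)
```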
  • the first selector 103 is further configured to: when the result register 1043 returns the response data of the accelerated memory access request and the second selector 107 returns the response data of the normal memory access request, select the response data to be transmitted to the controller interface 101 according to the second control signal generated by the bus control arbiter 106.
  • the second selector 107 is further configured to: when the memory interface 108 returns the response data of the memory access request, select, according to the type of the memory access request to which the response data corresponds, whether to transmit the response data to the acceleration engine 105 or to the first selector 103.
  • the central processing unit may run faster than the accelerator of the embodiment of the present invention, and normal memory access requests handle data units with good locality; therefore, when the second selector needs to select one of the normal memory access request and the accelerated memory access request to access the main memory, the bus control arbiter generates a second control signal that gives priority to the normal memory access request.
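  • this priority rule amounts to a fixed-priority arbiter, which can be sketched as follows. The function name `second_control_signal` and the string return values are illustrative assumptions, not identifiers from the patent.

```python
from typing import Optional

def second_control_signal(normal_pending: bool, accel_pending: bool) -> Optional[str]:
    """Fixed-priority arbitration: a pending normal request always wins;
    an accelerated request is granted only when no normal request is waiting."""
    if normal_pending:
        return "normal"
    if accel_pending:
        return "accelerated"
    return None  # nothing to grant this cycle

for normal, accel in [(True, True), (False, True), (True, False), (False, False)]:
    print(normal, accel, second_control_signal(normal, accel))
```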
  • when the memory interface is processing the accelerated memory access request and the second selector receives the normal memory access request, the access type of the normal memory access request is determined; if it is a write request, the write request is intercepted and deferred until the memory interface is released, and is then sent; if it is a read request, an error correction code (ECC) error message is sent to the memory controller through the first selector.
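  • the handling of a normal request that arrives while the memory interface is occupied by an accelerated request can be sketched as below. The class `BusArbiter`, its attribute names, and the string results are hypothetical; the sketch only mirrors the two cases described in the text (defer a write, answer a read with an ECC error message).

```python
from collections import deque

class BusArbiter:
    """Toy model of the conflict handling between normal and accelerated requests."""

    def __init__(self):
        self.interface_busy_with_accel = False  # memory interface serving an accelerated request
        self.deferred_writes = deque()          # intercepted normal writes awaiting release

    def on_normal_request(self, kind: str, address: int) -> str:
        if not self.interface_busy_with_accel:
            return "forward"                    # no conflict: pass through to main memory
        if kind == "write":
            self.deferred_writes.append(address)
            return "deferred"                   # sent once the interface is released
        return "ecc_error"                      # read: ECC error message to the memory controller

    def release_interface(self):
        """Accelerated request finished: send the deferred writes in arrival order."""
        self.interface_busy_with_accel = False
        drained = list(self.deferred_writes)
        self.deferred_writes.clear()
        return drained


arbiter = BusArbiter()
arbiter.interface_busy_with_accel = True
print(arbiter.on_normal_request("write", 0x10))
print(arbiter.on_normal_request("read", 0x20))
print(arbiter.release_interface())
```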
  • the bus control arbiter in the embodiment of the present invention can generate a reasonable control signal for the first selector and the second selector according to the request processing state in the actual application, so that the data can be transmitted without conflict.
  • the accelerator in the embodiment of the present invention can also be extended to multi-channel interconnection scenarios.
  • the internal structure of the accelerator is as shown in FIG. 5.
  • Another embodiment of the accelerator in the embodiment of the present invention includes:
  • the controller interface 101 is configured to receive a memory access request transmitted by the memory controller 20.
  • the memory access request is an instruction issued when the central processing unit needs to access the main memory for read and write operations, and it carries the memory address of the main memory.
  • the memory access request includes: a normal memory access request and an accelerated memory access request; a normal memory access request corresponds to a data unit with good locality, and an accelerated memory access request corresponds to a data unit with poor locality.
  • data analysis is performed on the data unit to be requested, and the locality of the data unit is determined.
  • the data unit may be determined by setting a threshold value. When the locality is greater than or equal to a certain threshold, it can be determined that the locality of the data unit is good, and a normal memory access request is generated correspondingly; when the locality of the data unit is less than a certain threshold, the data unit can be determined. If the locality is poor, an accelerated access request is generated correspondingly.
  • the row address determining unit 102 is configured to determine the request type of the memory access request according to the row address of the memory access request, and to generate and send a first control signal to the first selector: if the memory access request is a normal memory access request, it generates a first control signal that directs the request to the second selector 107; if the memory access request is an accelerated memory access request, it generates a first control signal that directs the request to the acceleration register unit 104.
  • the memory access address carried in the memory access request is split into two parts: the row address and the column address.
  • the row address is sent first, and the column address is sent after a preset time interval.
  • data must be returned a fixed number of beats after the column address is sent.
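A minimal sketch of the address split and the row-based decision follows. The text states only that the request type is determined from the row address. The 10-bit column width, the reserved row range starting at `ACCEL_ROW_BASE`, and the signal names are assumptions made for illustration.

```python
from typing import NamedTuple

COLUMN_BITS = 10         # assumed column-address width
ACCEL_ROW_BASE = 0x3F00  # assumed first row of a range reserved for accelerated requests


class SplitAddress(NamedTuple):
    row: int
    column: int


def split_address(address):
    """Split a flat memory address into its row and column parts."""
    column_mask = (1 << COLUMN_BITS) - 1
    return SplitAddress(row=address >> COLUMN_BITS, column=address & column_mask)


def first_control_signal(row):
    """Choose the destination from the row address alone (hypothetical rule)."""
    if row >= ACCEL_ROW_BASE:
        return "to_acceleration_register_unit"
    return "to_second_selector"
```

Because the row address arrives before the column address, a decision that depends only on the row can be made before the rest of the request has been received.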
  • the first selector 103 is configured to select a transmission direction of the memory access request according to the first control signal; specifically, the memory access request is sent either to the second selector 107 or to the acceleration register unit 104.
  • the first selector 103 can act as both a demultiplexer and a multiplexer for the controller interface 101: for a memory access request arriving from the controller interface 101, it functions as a demultiplexer and, according to the control signal generated by the row address determining unit 102, outputs the request to the second selector 107 or the acceleration register unit 104; for data returned by the bus control arbiter 106, the second selector 107, the acceleration register unit 104 and the like, it functions as a multiplexer and, based on the control signal generated by the bus control arbiter 106, selects one of them as the output to the controller interface 101.
  • the acceleration register unit 104 is configured to store processing information of the accelerated memory access request; the acceleration engine 105 is configured to fetch the processing information of the accelerated memory access request (specifically, an acceleration command) from the acceleration register unit 104 and, according to the processing information, access the main memory through the second selector 107 to perform the data processing operation of the accelerated memory access request (specifically, arithmetic processing on the poor-locality data unit indicated in the accelerated memory access request).
  • the bus control arbiter 106 is configured to generate and send a second control signal to the second selector 107, so that when the second selector 107 has two memory access requests to process at the same time, a processing order can be assigned according to a defined rule, preventing different memory access requests from conflicting in the processing flow.
  • the second selector 107 is configured to receive a normal memory access request transmitted by the first selector, an accelerated memory access request transmitted by the acceleration engine, and a second control signal sent by the bus control arbiter, and to select, according to the second control signal, the memory access request that currently accesses the main memory 30; in practical applications, the second selector 107 also acts as both a multiplexer and a demultiplexer for the memory interface 108: for the normal memory access requests from the first selector 103 and the accelerated memory access requests and write data from the acceleration engine 105, it functions as a multiplexer that selects an output to the memory interface 108 based on the bus arbitration result of the bus control arbiter 106; for the data returned on the memory interface 108, it functions as a demultiplexer that outputs the return data to the acceleration engine 105 or the first selector 103 in accordance with the bus arbitration result of the bus control arbiter 106.
  • the memory interface 108 is configured to transmit the memory access request to the main memory 30, and transmit response data corresponding to the memory access request to the second selector.
  • the acceleration register unit 104 includes a command queue 1041, a configuration register 1042, and a result register 1043.
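One way to picture the three parts of the acceleration register unit, and how an acceleration engine might consume them, is sketched below. The command format (a dictionary with `id`, `op`, and `addresses`) and the single `sum` operation are hypothetical. The text names only the command queue, the configuration register, and the result register.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AccelerationRegisterUnit:
    command_queue: deque = field(default_factory=deque)  # pending acceleration commands
    config: dict = field(default_factory=dict)           # configuration register contents
    results: dict = field(default_factory=dict)          # result register contents


def run_engine(unit, main_memory):
    """Drain the command queue and store each result under its command id.

    main_memory is modeled as a dict from address to value (an assumption).
    """
    while unit.command_queue:
        command = unit.command_queue.popleft()
        if command["op"] == "sum":
            values = [main_memory.get(addr, 0) for addr in command["addresses"]]
            unit.results[command["id"]] = sum(values)
        else:
            raise ValueError(f"unsupported op: {command['op']}")
```

In this model the processor side only writes a command and later reads the result register, while the scattered reads happen between the engine and main memory.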
  • the accelerator 10 may further include:
  • the routing module 109 is configured to transmit the accelerated memory access request to the corresponding main memory 30; the routing module is connected to the acceleration engine and to another accelerator. When the data required for the accelerated memory access request is held in the main memory connected to another accelerator, the acceleration engine transmits the accelerated memory access request to the routing module, and the routing module transmits it to the other accelerator, so that the other accelerator accesses the main memory connected to it according to the accelerated memory access request.
  • the routing module 109 may be further connected to the second selector, so that the routing module can receive an accelerated memory access request sent by another accelerator and pass it through the second selector to the local main memory.
  • multiple accelerators can be organized into various topology structures, such as a ring, a fat tree, and the like.
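As an illustration of routing between accelerators, the sketch below forwards a request around a ring to the accelerator whose main memory holds the data. Mapping addresses to accelerators by fixed-size regions and taking the shorter direction around the ring are assumptions. The text says only that the routing module passes the request to another accelerator.

```python
def owning_accelerator(address, region_size, accelerator_count):
    """Hypothetical mapping: consecutive address regions belong to consecutive accelerators."""
    return (address // region_size) % accelerator_count


def next_hop(current, target, accelerator_count):
    """Pick the neighbour on the ring that is closer to the target."""
    if current == target:
        return current
    clockwise = (target - current) % accelerator_count
    counter_clockwise = (current - target) % accelerator_count
    if clockwise <= counter_clockwise:
        return (current + 1) % accelerator_count
    return (current - 1) % accelerator_count


def route(source, target, accelerator_count):
    """Return the list of accelerators a request visits, including both ends."""
    path = [source]
    while path[-1] != target:
        path.append(next_hop(path[-1], target, accelerator_count))
    return path
```

A fat-tree topology would replace `next_hop` with an up-then-down traversal, and the rest of the sketch would stay the same.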
  • an embodiment of the data processing method in the embodiment of the present invention includes:
  • the accelerator receives a memory access request transmitted by the memory controller.
  • the accelerator receives the memory access request transmitted by the memory controller, and the memory access request includes: a normal memory access request and an accelerated memory access request.
  • a normal memory access request is one whose requested data unit has good locality, and an accelerated memory access request is one whose requested data unit has poor locality.
  • data analysis is performed on the data unit to be requested, and the locality of the data unit is determined.
  • the locality of the data unit may be judged against a threshold: when the locality is greater than or equal to the threshold, the locality of the data unit is considered good and a normal memory access request is generated; when the locality is less than the threshold, the locality is considered poor and an accelerated memory access request is generated.
  • the accelerator determines, according to the row address of the memory access request, the request type of the memory access request.
  • the accelerator determines the request type of the memory access request according to its row address; if the memory access request is an accelerated memory access request, the accelerator buffers it and processes it within the accelerator; if it is a normal memory access request, the accelerator transmits it to the main memory for processing.
  • the accelerator selects a memory access request for accessing the main memory.
  • the normal memory access request is preferentially selected.
  • the running speed of the central processing unit may be faster than that of the accelerator of the embodiment of the present invention, and a normal memory access request corresponds to a data unit with better locality; therefore, when the second selector must choose between a normal memory access request and an accelerated memory access request to access the main memory, a second control signal that gives priority to the normal memory access request is generated.
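The priority rule can be sketched as a fixed-priority arbiter. The queue representation and the names `second_control_signal` and `drain` are assumptions. The text states only that the normal memory access request is preferred when both kinds are waiting.

```python
from collections import deque


def second_control_signal(normal_pending, accelerated_pending):
    """Grant the memory interface to a normal request whenever one is waiting."""
    if normal_pending:
        return "normal"
    if accelerated_pending:
        return "accelerated"
    return None


def drain(normal_queue, accelerated_queue):
    """Return the order in which queued requests would reach main memory."""
    order = []
    while True:
        grant = second_control_signal(bool(normal_queue), bool(accelerated_queue))
        if grant is None:
            return order
        queue = normal_queue if grant == "normal" else accelerated_queue
        order.append(queue.popleft())
```

A strict priority such as this one could starve accelerated requests under a steady stream of normal traffic. The text does not say whether any fairness mechanism is applied.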
  • the second selector of the accelerator receives the normal memory access request and determines its access type; if it is a write request, the normal memory access request is delayed until the memory interface of the accelerator is released and is then sent; if it is a read request, an error correction code (ECC) error message is sent to the memory controller.
  • the second selector of the accelerator receives the normal memory access request and then determines its access type: if it is a write request, the second selector intercepts the write request and postpones it until the memory interface is released, then sends it; if it is a read request, an error correction code (ECC) error message is sent to the memory controller through the first selector of the accelerator.
  • the memory controller is thereby caused to resend the read request, which avoids a system logic error.
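The handling of a normal request that arrives while the memory interface is occupied can be sketched as below. The `interface_busy` flag, the request dictionaries, and the return strings are hypothetical. The sketch assumes the interface is busy because the acceleration engine is using it, which the text implies but does not state in this passage.

```python
from collections import deque


class SecondSelector:
    def __init__(self):
        self.interface_busy = False    # assumed: True while the memory interface is occupied
        self.deferred_writes = deque()

    def handle_normal(self, request):
        """Decide what happens to a normal request given the interface state."""
        if not self.interface_busy:
            return "forwarded"
        if request["kind"] == "write":
            # Writes are intercepted and held until the interface is released.
            self.deferred_writes.append(request)
            return "deferred"
        # Reads cannot be held, so an ECC error is reported and the
        # memory controller is expected to resend the read.
        return "ecc_error"

    def release_interface(self):
        """Free the interface and return the writes that can now be sent."""
        self.interface_busy = False
        released = list(self.deferred_writes)
        self.deferred_writes.clear()
        return released
```

The ECC error here does not indicate corrupted data. It is used as a retry signal that the memory controller already understands.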
  • the disclosed apparatus and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be in electrical, mechanical, or other form.
  • the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the part of the technical solution of the present invention that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • the software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

The present invention relates to an accelerator and a data processing method that are used to upgrade existing computer equipment in order to improve the data processing efficiency of the computer equipment. The accelerator comprises a controller interface, a row address determining unit, a first selector, an acceleration register unit, an acceleration engine, a bus control arbiter, a second selector, and a memory interface.
PCT/CN2014/080162 2013-06-28 2014-06-18 Accelerator and data processing method WO2014206229A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310269782.4A CN104252416B (zh) 2013-06-28 2013-06-28 一种加速器以及数据处理方法
CN201310269782.4 2013-06-28

Publications (1)

Publication Number Publication Date
WO2014206229A1 true WO2014206229A1 (fr) 2014-12-31

Family

ID=52141035

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/080162 WO2014206229A1 (fr) 2013-06-28 2014-06-18 Accelerator and data processing method

Country Status (2)

Country Link
CN (1) CN104252416B (fr)
WO (1) WO2014206229A1 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101814577B1 (ko) * 2015-10-16 2018-01-03 삼성전자주식회사 프로세싱-인-메모리를 이용한 명령어 처리 방법 및 그 장치
CN109308280B (zh) * 2017-07-26 2021-05-18 华为技术有限公司 数据处理方法和相关设备
CN109756390B (zh) * 2018-12-06 2020-12-01 网易(杭州)网络有限公司 自动测试网络加速器连通性方法和装置
CN110018839B (zh) * 2019-03-27 2021-04-13 联想(北京)有限公司 硬件加速器复用方法和硬件加速装置
CN114328311A (zh) * 2021-12-15 2022-04-12 珠海一微半导体股份有限公司 一种存储控制器架构、数据处理电路及数据处理方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101221538A (zh) * 2008-01-24 2008-07-16 杭州华三通信技术有限公司 实现对缓存中数据快速查找的系统和方法
CN101290610A (zh) * 2008-06-03 2008-10-22 浙江大学 嵌入式异构多核体系片上通信互连组织层次的实现方法
US20110307647A1 (en) * 2010-06-11 2011-12-15 California Institute Of Technology Systems and methods for rapid processing and storage of data
CN103345429A (zh) * 2013-06-19 2013-10-09 中国科学院计算技术研究所 基于片上ram的高并发访存加速方法、加速器及cpu

Also Published As

Publication number Publication date
CN104252416B (zh) 2017-09-05
CN104252416A (zh) 2014-12-31

Similar Documents

Publication Publication Date Title
EP3796179A1 (fr) System, apparatus and method for processing remote direct memory access operations with a device-attached memory
US11755203B2 (en) Multicore shared cache operation engine
US9760386B2 (en) Accelerator functionality management in a coherent computing system
WO2018076793A1 (fr) NVMe device and methods for reading and writing NVMe data
US9003082B2 (en) Information processing apparatus, arithmetic device, and information transferring method
US7555597B2 (en) Direct cache access in multiple core processors
US7600077B2 (en) Cache circuitry, data processing apparatus and method for handling write access requests
WO2015078219A1 (fr) Information caching method and apparatus, and communication device
EP2630579B1 (fr) Unified I/O adapter
US9256555B2 (en) Method and system for queue descriptor cache management for a host channel adapter
WO2014206229A1 (fr) Accelerator and data processing method
US7975090B2 (en) Method for efficient I/O controller processor interconnect coupling supporting push-pull DMA read operations
US11960945B2 (en) Message passing circuitry and method
WO2013185660A1 (fr) Device and method for storing network processor instructions
US20230153153A1 (en) Task processing method and apparatus
US8850159B2 (en) Method and system for latency optimized ATS usage
US11275707B2 (en) Multi-core processor and inter-core data forwarding method
JP3873589B2 (ja) プロセッサシステム
US20080109639A1 (en) Execution of instructions within a data processing apparatus having a plurality of processing units
US11960727B1 (en) System and method for large memory transaction (LMT) stores
EP4339776A1 (fr) Task scheduling method, system, and hardware task scheduler

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14818028

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14818028

Country of ref document: EP

Kind code of ref document: A1