CN110704343A - Data transmission method and device for memory access and on-chip communication of many-core processor - Google Patents

Data transmission method and device for memory access and on-chip communication of many-core processor Download PDF

Info

Publication number
CN110704343A
CN110704343A (application CN201910852824.4A)
Authority
CN
China
Prior art keywords
rma
channel
dma
instruction
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910852824.4A
Other languages
Chinese (zh)
Other versions
CN110704343B (en
Inventor
施晶晶
唐勇
谢军
张清波
陈芳园
陈庆强
过锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute filed Critical Wuxi Jiangnan Computing Technology Institute
Priority to CN201910852824.4A priority Critical patent/CN110704343B/en
Publication of CN110704343A publication Critical patent/CN110704343A/en
Application granted granted Critical
Publication of CN110704343B publication Critical patent/CN110704343B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention provides a data transmission method and device for memory access and on-chip communication of a many-core processor, belonging to the field of computer architecture and processor microarchitecture. The method comprises the following steps: S1: a channel instruction buffer unit acquires one or more channel instructions sent by a source core processor; S2: a DMA channel instruction or an RMA channel instruction is extracted from the channel instruction buffer unit; S3: DMA micro-accesses are split from the DMA channel instruction and sent to the memory, and RMA micro-accesses are split from the RMA channel instruction and sent to the target core processor; S4: a reply word operation is initiated after a response returned by the memory or by the target core processor is acquired. The invention reduces hardware logic overhead, achieves efficient on-chip data reuse, and improves the computing capability of the many-core processor.

Description

Data transmission method and device for memory access and on-chip communication of many-core processor
Technical Field
The invention belongs to the field of computer architecture and processor microarchitecture design, and relates to a data transmission method and device for memory access and on-chip communication of a many-core processor.
Background
With the development of semiconductor technology, chip integration density keeps increasing. Many-core processors usually integrate hundreds of cores, and unlike traditional processors built around a main memory plus multi-level cache hierarchy, a considerable number of many-core processors adopt a storage hierarchy of main memory plus on-chip local memory. Relative to its computing capability, the memory bandwidth available to a many-core processor is limited, so it faces an increasingly severe memory-wall problem. On the one hand, new technologies such as high-bandwidth memory can raise the basic memory-access capability; on the other hand, the network on chip can be used to share data among cores, enabling larger-scale data reuse and reducing memory access requests.
How to reuse on-chip data more effectively and reduce memory-access demand is a hard problem in many-core processor design. Conventionally, a core can fetch new data from other cores one item at a time using LD/ST instructions to increase data reuse. However, because a many-core processor has many cores and the access latency between different cores varies widely, typically tens or even hundreds of cycles, a core can only stall while waiting for new data, which severely limits the usable computing power of the many-core processor.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a data transmission method and device for memory access and on-chip communication of a many-core processor. The technical problem to be solved is to provide a method and device that allow bulk data transfers between the many-core array and memory (DMA) and bulk data transfers between cores (RMA) to proceed in parallel, so as to reduce hardware logic overhead.
The purpose of the invention can be realized by the following technical scheme:
The data transmission method for memory access and on-chip communication of the many-core processor comprises the following steps:
S1: a channel instruction buffer unit acquires one or more channel instructions sent by the source core processor;
S2: a DMA channel instruction or an RMA channel instruction is extracted from the channel instruction buffer unit;
S3: DMA micro-accesses are split from the DMA channel instruction and sent to the memory; RMA micro-accesses are split from the RMA channel instruction and sent to the target core processor;
S4: a reply word operation is initiated after a response returned by the memory or by the target core processor is acquired.
Preferably, step S2 specifically includes: when an extraction request for a DMA channel instruction or an RMA channel instruction is obtained from the channel instruction buffer unit, the channel state register set performs unified allocation to obtain the DMA or RMA channel instruction, after which the DMA channel instruction is dispatched to the DMA splitting station and the RMA channel instruction to the RMA splitting station; in step S3, the DMA splitting station splits the DMA channel instruction into DMA micro-accesses, and the RMA splitting station splits the RMA channel instruction into RMA micro-accesses.
Preferably, step S4 specifically includes: after a response returned by the memory or by the target core processor is acquired, the channel state register set tracking internal state is updated in real time; when the channel state register has collected all responses, a reply word operation is initiated, the reply word operation being the setting of a flag in the local memory of the source core processor or the target core processor.
Preferably, the splitting of DMA channel instructions by the DMA splitting station and the splitting of RMA channel instructions by the RMA splitting station are performed concurrently.
Preferably, after a DMA channel instruction or an RMA channel instruction is obtained, arbitration is performed according to the operation type and program order of the channel instructions so as to dispatch DMA channel instructions to the DMA splitting station and RMA channel instructions to the RMA splitting station; after DMA or RMA micro-accesses are split out, arbitration is performed according to the operation type and order of the micro-accesses so as to send DMA micro-accesses to the memory and RMA micro-accesses to the target core processor.
Preferably, the method further comprises a channel barrier instruction set for controlling the flow of channel instructions, the set comprising a DMA barrier instruction, an RMA barrier instruction, and a full barrier instruction. After a DMA barrier instruction is received, subsequent DMA channel instructions are executed only once the transfers initiated by previously received DMA channel instructions have completed; after an RMA barrier instruction is received, subsequent RMA channel instructions are executed only once the transfers initiated by previously received RMA channel instructions have completed; and after a full barrier instruction is received, subsequent DMA or RMA instructions are executed only once the transfers initiated by all previously received DMA and RMA instructions have completed.
Preferably, the split DMA or RMA micro-accesses are forwarded through the network on chip in parallel, by round-robin rotation or by weight, so as to send the DMA micro-accesses to the memory and the RMA micro-accesses to the target core processor.
A data transmission device for memory access and on-chip communication of a many-core processor is used for data transmission between a source core processor and a memory and between the source core processor and a target core processor. It comprises a channel instruction buffer unit, a channel instruction extraction unit, a channel instruction splitting unit, and a channel instruction distribution unit. The channel instruction buffer unit receives and stores DMA channel instructions or RMA channel instructions sent by the source core processor. The channel instruction extraction unit extracts DMA or RMA channel instructions from the channel instruction buffer unit. The channel instruction splitting unit splits DMA micro-accesses from DMA channel instructions and sends them to the memory, and splits RMA micro-accesses from RMA channel instructions and sends them to the target core processor. The channel instruction distribution unit receives the DMA or RMA channel instructions extracted by the channel instruction extraction unit and forwards them to the channel instruction splitting unit. The memory sends a response to the channel instruction distribution unit after receiving a DMA micro-access, and the target core processor sends a response after receiving an RMA micro-access; after acquiring the response returned by the memory or by the target core processor, the channel instruction distribution unit sets a flag in the local memory of the source core processor or the target core processor.
Preferably, the channel instruction splitting unit includes a DMA splitting station configured to split DMA micro-accesses from DMA channel instructions and send them to the memory, and an RMA splitting station configured to split RMA micro-accesses from RMA channel instructions and send them to the target core processor.
Preferably, the device further comprises a barrier instruction set management unit configured to issue, to the channel instruction buffer unit, a channel barrier instruction set for controlling the flow of channel instructions, the set comprising a DMA barrier instruction, an RMA barrier instruction, and a full barrier instruction. After receiving a DMA barrier instruction issued by the barrier instruction set management unit, the channel instruction buffer unit continues executing subsequent DMA channel instructions only once the transfers initiated by previously received DMA channel instructions have completed; after receiving an RMA barrier instruction, it continues executing subsequent RMA channel instructions only once the transfers initiated by previously received RMA channel instructions have completed; and after receiving a full barrier instruction, it continues executing subsequent DMA or RMA instructions only once the transfers initiated by all previously received DMA and RMA instructions have completed.
The channel instruction buffer unit acquires one or more channel instructions sent by a source core processor and buffers them; DMA or RMA channel instructions are then extracted from the channel instruction buffer unit; DMA micro-accesses are split from the DMA channel instructions and sent to the memory, while RMA micro-accesses are split from the RMA channel instructions and sent to the target core processor; finally, a reply word operation is initiated after a response returned by the memory or by the target core processor is acquired.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of two data transmission processes when the apparatus of the present invention is applied;
FIG. 3 is a schematic diagram of the structure of the device of the present invention.
Detailed Description
The following are specific embodiments of the present invention, further described with reference to the drawings; however, the present invention is not limited to these embodiments.
Referring to FIG. 1, the data transmission method for memory access and on-chip communication of a many-core processor in this embodiment includes the following steps:
S1: a channel instruction buffer unit acquires one or more channel instructions sent by the source core processor;
S2: a DMA channel instruction or an RMA channel instruction is extracted from the channel instruction buffer unit;
S3: DMA micro-accesses are split from the DMA channel instruction and sent to the memory; RMA micro-accesses are split from the RMA channel instruction and sent to the target core processor;
S4: a reply word operation is initiated after a response returned by the memory or by the target core processor is acquired.
The channel instruction buffer unit acquires one or more channel instructions sent by a source core processor and buffers them. DMA or RMA channel instructions are then extracted from the channel instruction buffer unit; DMA micro-accesses are split from the DMA channel instructions and sent to the memory, while RMA micro-accesses are split from the RMA channel instructions and sent to the target core processor; finally, a reply word operation is initiated after a response returned by the memory or by the target core processor is acquired. A channel instruction specifies parameters such as the operation type of the bulk transfer, the transfer length, the source and target addresses, the access mode, and the reply word address. After issuing the channel instruction, the source core processor can continue executing its computing task.
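The parameters carried by a channel instruction can be modelled as a simple descriptor. The following is a minimal Python sketch; the field names, types, and mode strings are illustrative assumptions, not taken verbatim from the patent:

```python
from dataclasses import dataclass

# Hypothetical model of one channel instruction as described in the text:
# operation type, transfer length, source/target addresses, access mode,
# and the reply word address polled for completion.
@dataclass
class ChannelInstr:
    op: str          # operation type: "DMA" (core <-> memory) or "RMA" (core <-> core)
    length: int      # transfer length in bytes
    src_addr: int    # source address (local memory or main memory)
    dst_addr: int    # target address
    mode: str        # access mode, e.g. "p2p", "row", "col_multicast" (assumed names)
    reply_addr: int  # local-memory address where the reply word is set on completion
```

A source core would build such a descriptor, hand it to the channel instruction buffer unit, and then continue computing while the transfer proceeds.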
Step S2 may specifically include: when an extraction request for a DMA channel instruction or an RMA channel instruction is obtained from the channel instruction buffer unit, the channel state register set performs unified allocation to obtain the DMA or RMA channel instruction, after which the DMA channel instruction is dispatched to the DMA splitting station and the RMA channel instruction to the RMA splitting station; in step S3, the DMA splitting station splits the DMA channel instruction into DMA micro-accesses, and the RMA splitting station splits the RMA channel instruction into RMA micro-accesses. When such an extraction request is obtained, the channel state register set alternately performs unified allocation of transfer channel numbers to obtain the instruction for a given channel number, and the channel instruction is then distributed, according to its operation type, to the dedicated DMA splitting station or RMA splitting station for parallel split processing.
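The dispatch-and-split step can be sketched as follows; the 256-byte micro-access granularity and the dict-based instruction encoding are assumptions for illustration only:

```python
MICRO = 256  # assumed micro-access size in bytes

def split(instr):
    """Yield (offset, size) micro-accesses covering the instruction's length."""
    off = 0
    while off < instr["length"]:
        size = min(MICRO, instr["length"] - off)
        yield (off, size)
        off += size

def dispatch(instrs):
    """Route each channel instruction, by operation type, to the DMA or RMA
    split station queue, where it is broken into micro-accesses."""
    dma_q, rma_q = [], []
    for ins in instrs:
        (dma_q if ins["op"] == "DMA" else rma_q).append(list(split(ins)))
    return dma_q, rma_q
```

For example, a 600-byte DMA transfer splits into micro-accesses of 256, 256, and 88 bytes, while an RMA transfer is split independently on the other station, so the two streams can be processed in parallel.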
Step S4 may specifically include: after a response returned by the memory or by the target core processor is acquired, the channel state register set tracking internal state is updated in real time; when the channel state register has collected all responses, a reply word operation is initiated, the reply word operation being the setting of a flag in the local memory of the source core processor or the target core processor. Whether a data transfer has completed is determined by polling the reply word address in local memory (this address can be specified in the channel instruction).
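The reply-word mechanism amounts to counting outstanding micro-access responses and setting a flag only when all of them have arrived. A minimal sketch, modelling local memory as a dict (an assumption for illustration):

```python
class ChannelState:
    """Tracks outstanding responses for one channel; writes the reply word
    into the (simulated) local memory when every response has arrived."""
    def __init__(self, expected, reply_addr, ldm):
        self.remaining = expected   # number of micro-access responses awaited
        self.reply_addr = reply_addr
        self.ldm = ldm              # dict modelling the core's local memory

    def on_response(self):
        self.remaining -= 1
        if self.remaining == 0:
            self.ldm[self.reply_addr] = 1  # reply word: transfer complete

ldm = {}
st = ChannelState(expected=3, reply_addr=0x40, ldm=ldm)
for _ in range(3):
    st.on_response()
# the source core polls ldm[0x40] to detect completion
```

The source core thus never blocks on the transfer itself; it simply polls the reply word address when it actually needs the data.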
The splitting of DMA channel instructions by the DMA splitting station and the splitting of RMA channel instructions by the RMA splitting station proceed concurrently, so that the two kinds of data transfer execute in parallel and transmission efficiency is improved.
After a DMA channel instruction or an RMA channel instruction is obtained, arbitration is performed according to the operation type and program order of the channel instructions so as to dispatch DMA channel instructions to the DMA splitting station and RMA channel instructions to the RMA splitting station; after DMA or RMA micro-accesses are split out, arbitration is performed according to the operation type and order of the micro-accesses so as to send DMA micro-accesses to the memory and RMA micro-accesses to the target core processor.
The data transmission method for memory access and on-chip communication of the many-core processor in this embodiment may also include a channel barrier instruction set for controlling the flow of channel instructions, comprising a DMA barrier instruction, an RMA barrier instruction, and a full barrier instruction. After a DMA barrier instruction is received, subsequent DMA channel instructions continue to execute only once the transfers initiated by previously received DMA channel instructions have completed; after an RMA barrier instruction is received, subsequent RMA channel instructions continue only once the transfers initiated by previously received RMA channel instructions have completed; and after a full barrier instruction is received, subsequent DMA or RMA instructions continue only once the transfers initiated by all previously received DMA and RMA instructions have completed. Under the control of these three barrier instructions, the device can control the ordering between the two data streams and within each data stream, providing low-level support for the many-core processor to implement different communication models. The two kinds of data transfer execute concurrently, and with matching channel instructions their relative order can be controlled and nested, mixed operation of the two can be achieved. Data transfer between the source core processor and the target core processor is thus fully parallel with data transfer between the source core processor and the memory, on-chip data reuse is achieved efficiently, and the computing capability of the many-core processor is improved.
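The three barrier semantics can be illustrated with a toy simulation. The tuple encoding of the instruction stream and the assumption that draining completes instantaneously are purely illustrative:

```python
def run(stream):
    """Process a channel-instruction stream; each barrier drains the matching
    set of in-flight transfers before later instructions may proceed."""
    inflight_dma, inflight_rma, log = set(), set(), []
    for ins in stream:
        kind = ins[0]
        if kind == "dma":
            inflight_dma.add(ins[1]); log.append(f"issue {ins[1]}")
        elif kind == "rma":
            inflight_rma.add(ins[1]); log.append(f"issue {ins[1]}")
        elif kind == "dma_barrier":
            log.append(f"drain DMA {sorted(inflight_dma)}"); inflight_dma.clear()
        elif kind == "rma_barrier":
            log.append(f"drain RMA {sorted(inflight_rma)}"); inflight_rma.clear()
        else:  # full barrier: wait for every outstanding transfer of both kinds
            log.append(f"drain all {sorted(inflight_dma | inflight_rma)}")
            inflight_dma.clear(); inflight_rma.clear()
    return log
```

Note that a DMA barrier leaves in-flight RMA transfers untouched (and vice versa), which is what lets the two streams stay independent except where the program explicitly orders them.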
The split DMA or RMA micro-accesses are forwarded through the network on chip in parallel, by round-robin rotation or by weight, so as to send DMA micro-accesses to the memory and RMA micro-accesses to the target core processor. With weighted forwarding, micro-accesses are forwarded in parallel in order of weight from large to small, or in rotation according to priority order.
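One common reading of weighted rotation between the two micro-access streams is weighted round-robin arbitration; the sketch below assumes a 2:1 DMA-to-RMA weighting purely for illustration:

```python
def weighted_rr(dma_q, rma_q, w_dma=2, w_rma=1):
    """Interleave the two micro-access queues onto the network on chip,
    taking up to w_dma DMA entries and then up to w_rma RMA entries per round."""
    out = []
    while dma_q or rma_q:
        for _ in range(w_dma):
            if dma_q:
                out.append(dma_q.pop(0))
        for _ in range(w_rma):
            if rma_q:
                out.append(rma_q.pop(0))
    return out
```

With weights 2:1, three DMA and two RMA micro-accesses interleave as d1, d2, r1, d3, r2, so neither stream starves while the heavier stream gets proportionally more NoC bandwidth.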
Referring to FIG. 2 and FIG. 3, a data transmission device for memory access and on-chip communication of a many-core processor is used for data transmission between a source core processor and a memory and between the source core processor and a target core processor. It includes a channel instruction buffer unit for receiving and storing DMA or RMA channel instructions sent by the source core processor; a channel instruction extraction unit for extracting DMA or RMA channel instructions from the channel instruction buffer unit; a channel instruction splitting unit for splitting DMA micro-accesses from DMA channel instructions and sending them to the memory, and splitting RMA micro-accesses from RMA channel instructions and sending them to the target core processor; and a channel instruction distribution unit for receiving the DMA or RMA channel instructions extracted by the channel instruction extraction unit and forwarding them to the channel instruction splitting unit. The memory sends a response to the channel instruction distribution unit after receiving a DMA micro-access, and the target core processor sends a response after receiving an RMA micro-access; after acquiring the response returned by the memory or by the target core processor, the channel instruction distribution unit sets a flag in the local memory of the source core processor or the target core processor. The device is compatible with bulk data transfer between the cores' local data memories (LDM) and main memory (DMA) and between the LDMs of different cores (RMA),
and supports mixed operation of the two transfer modes, using a unified framework so that the two share part of their resources and implementation overhead is minimized. For inter-core bulk transfer it also supports multiple transfer modes such as point-to-point transfer, row transfer, and column multicast, and each type of transfer can move multiple batches of data in parallel. This saves overhead, allows resources to be adjusted and used dynamically, lets the many-core processor fully decouple computation from data transfer, and makes it convenient for users to build large-scale, complex scientific computing programs.
The device reserves a small amount of resources (such as message channel numbers) for each type of channel instruction, dynamically allocates the remaining resources between the two types, and provides dedicated DMA and RMA (remote memory access) splitting stations to support the transfer, splitting, and processing of both kinds of data. The ordering of DMA and RMA transfers can be flexibly scheduled and controlled, and a flexible, configurable reply-word completion notification mechanism is implemented. A DMA transfer is a transfer between a source core processor and the memory; an RMA transfer is a transfer between a source core processor and a target core processor.
The channel instruction distribution unit can dynamically allocate and recycle the channel state register set.
The channel instruction extraction unit may include a DMA extraction module for extracting DMA channel instructions from the channel instruction buffer unit and an RMA extraction module for extracting RMA channel instructions from the channel instruction buffer unit.
The data transmission device for memory access and on-chip communication of the many-core processor in this embodiment may further include an arbitration unit that, after a DMA or RMA channel instruction is obtained, arbitrates according to the operation type and program order of the channel instructions so as to dispatch DMA channel instructions to the DMA splitting station and RMA channel instructions to the RMA splitting station, and that, after DMA or RMA micro-accesses are split out, arbitrates according to the operation type and order of the micro-accesses so as to send DMA micro-accesses to the memory and RMA micro-accesses to the target core processor.
The channel instruction splitting unit comprises a DMA splitting station for splitting DMA micro-accesses from DMA channel instructions and sending them to the memory, and an RMA splitting station for splitting RMA micro-accesses from RMA channel instructions and sending them to the target core processor; splitting the two separately improves transmission efficiency. The DMA splitting station sends DMA micro-accesses to the memory through the network on chip.
The data transmission device for memory access and on-chip communication of the many-core processor in this embodiment may also include a barrier instruction set management unit for issuing, to the channel instruction buffer unit, a channel barrier instruction set for controlling the flow of channel instructions. The channel barrier instruction set comprises a DMA barrier instruction, an RMA barrier instruction, and a full barrier instruction. After receiving a DMA barrier instruction issued by the barrier instruction set management unit, the channel instruction buffer unit must wait until the transfers initiated by previously received DMA channel instructions have completed before allowing subsequent DMA channel instructions to start executing. After receiving an RMA barrier instruction, it must wait until the transfers initiated by previously received RMA channel instructions have completed before allowing subsequent RMA channel instructions to start executing.
After receiving a full barrier instruction, it must wait until the transfers initiated by all previously received channel transfer instructions have completed before allowing subsequent channel instructions to start executing. A unified channel transfer and barrier instruction set is thus defined; under the control of the three barrier instructions, the channel instruction buffer unit can control the ordering between the two data streams and within each data stream, providing low-level support for the many-core processor to implement different communication models.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute alternatives, without departing from the spirit of the invention or the scope defined by the appended claims.

Claims (10)

1. A data transmission method for memory access and on-chip communication of a many-core processor, characterized by comprising the following steps:
S1: a channel instruction buffer unit acquires one or more channel instructions sent by a source core processor;
S2: a DMA channel instruction or an RMA channel instruction is extracted from the channel instruction buffer unit;
S3: DMA micro-accesses are split from the DMA channel instruction and sent to a memory; RMA micro-accesses are split from the RMA channel instruction and sent to a target core processor;
S4: a reply word operation is initiated after a response returned by the memory or by the target core processor is acquired.
2. The data transmission method for many-core processor memory access and on-chip communication of claim 1, wherein: step S2 specifically includes: when an extraction request for a DMA channel instruction or an RMA channel instruction is obtained from the channel instruction buffer unit, the channel state register set performs unified allocation to obtain the DMA or RMA channel instruction, after which the DMA channel instruction is dispatched to the DMA splitting station and the RMA channel instruction to the RMA splitting station; in step S3, the DMA splitting station splits the DMA channel instruction into DMA micro-accesses, and the RMA splitting station splits the RMA channel instruction into RMA micro-accesses.
3. The data transmission method for many-core processor memory access and on-chip communication according to claim 1 or 2, characterized in that: step S4 includes: after a response returned by the memory or by the target core processor is acquired, the channel state register set tracking internal state is updated in real time; when the channel state register has collected all responses, a reply word operation is initiated, the reply word operation being the setting of a flag in the local memory of the source core processor or the target core processor.
4. The data transmission method for many-core processor memory access and on-chip communication of claim 3, wherein: the splitting of DMA channel instructions by the DMA splitting station and the splitting of RMA channel instructions by the RMA splitting station are performed concurrently.
5. The data transmission method for many-core processor memory access and on-chip communication of claim 2, wherein: after a DMA channel instruction or an RMA channel instruction is obtained, arbitration is performed according to the operation type and program order of the channel instructions so as to dispatch DMA channel instructions to the DMA splitting station and RMA channel instructions to the RMA splitting station; after DMA or RMA micro-accesses are split out, arbitration is performed according to the operation type and order of the micro-accesses so as to send DMA micro-accesses to the memory and RMA micro-accesses to the target core processor.
6. The data transmission method for many-core processor memory access and on-chip communication according to claim 3, further comprising a channel barrier instruction set for ordering the channel instruction stream, the channel barrier instruction set comprising a DMA barrier instruction, an RMA barrier instruction and a full barrier instruction, wherein: after the DMA barrier instruction is received, subsequent DMA channel instructions are executed only once the transmissions initiated by previously received DMA channel instructions are determined to be complete; after the RMA barrier instruction is received, subsequent RMA channel instructions are executed only once the transmissions initiated by previously received RMA channel instructions are determined to be complete; and after the full barrier instruction is received, subsequent DMA or RMA instructions are executed only once the transmissions initiated by all previously received DMA and RMA instructions are determined to be complete.
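The three barrier kinds of claim 6 differ only in which pending transfer types they wait on. A minimal sketch of that decision (illustrative Python; names are hypothetical):

```python
def barrier_blocks(barrier, pending):
    """Sketch of claim 6: returns True if the barrier must stall the
    channel instruction stream. `pending` is the set of transfer types
    ("DMA"/"RMA") with transmissions not yet determined complete."""
    if barrier == "DMA_BARRIER":
        return "DMA" in pending    # wait only for earlier DMA transfers
    if barrier == "RMA_BARRIER":
        return "RMA" in pending    # wait only for earlier RMA transfers
    if barrier == "FULL_BARRIER":
        return bool(pending)       # wait for all earlier transfers
    return False                   # not a barrier: never stalls
```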
7. The data transmission method for many-core processor memory access and on-chip communication according to claim 1 or 2, wherein the split DMA micro-accesses or RMA micro-accesses are forwarded through the network-on-chip in weighted rotation or in parallel, so as to send the DMA micro-accesses to the memory and the RMA micro-accesses to the target core processor.
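The weighted rotation of claim 7 can be sketched as a weighted round-robin over per-channel micro-access queues (illustrative Python; the weight values and names are hypothetical, and the real network-on-chip grant logic is not disclosed at this level of detail):

```python
def weighted_round_robin(queues, weights):
    """Sketch of claim 7: forward micro-accesses from several channel
    queues in rotation, each channel receiving up to `weights[name]`
    grants per round. Returns the forwarding order."""
    order = []
    while any(queues.values()):
        for name, w in weights.items():
            for _ in range(w):             # grant up to w slots this round
                if queues[name]:
                    order.append(queues[name].pop(0))
    return order
```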
8. A data transmission device for many-core processor memory access and on-chip communication, used for data transmission between a source core processor and a memory and between the source core processor and a target core processor, comprising a channel instruction buffer unit, a channel instruction extracting unit, a channel instruction splitting unit and a channel instruction distributing unit, wherein: the channel instruction buffer unit is used for receiving and storing a DMA channel instruction or an RMA channel instruction sent by the source core processor; the channel instruction extracting unit is used for extracting the DMA channel instruction or the RMA channel instruction from the channel instruction buffer unit; the channel instruction splitting unit is used for splitting DMA micro-accesses from the DMA channel instruction and sending the DMA micro-accesses to the memory, and for splitting RMA micro-accesses from the RMA channel instruction and sending the RMA micro-accesses to the target core processor; the channel instruction distributing unit is used for receiving the DMA channel instruction or the RMA channel instruction extracted by the channel instruction extracting unit and sending the DMA channel instruction and the RMA channel instruction to the channel instruction splitting unit; the memory sends a response to the channel instruction distributing unit after receiving a DMA micro-access, and the target core processor sends a response to the channel instruction distributing unit after receiving an RMA micro-access; and the channel instruction distributing unit sets a flag in the local memory of the source core processor or of the target core processor after acquiring the response returned by the memory or by the target core processor.
9. The data transmission device for many-core processor memory access and on-chip communication according to claim 8, wherein the channel instruction splitting unit comprises a DMA splitting station used for splitting DMA micro-accesses from a DMA channel instruction and sending the DMA micro-accesses to the memory, and an RMA splitting station used for splitting RMA micro-accesses from an RMA channel instruction and sending the RMA micro-accesses to the target core processor.
10. The data transmission device for many-core processor memory access and on-chip communication according to claim 8 or 9, wherein: after receiving a DMA barrier instruction sent by a barrier instruction set management unit, the channel instruction buffer unit executes subsequent DMA channel instructions only once the transmissions initiated by previously received DMA channel instructions are determined to be complete; after receiving an RMA barrier instruction sent by the barrier instruction set management unit, it executes subsequent RMA channel instructions only once the transmissions initiated by previously received RMA channel instructions are determined to be complete; and after receiving a full barrier instruction sent by the barrier instruction set management unit, it executes subsequent DMA or RMA instructions only once the transmissions initiated by all previously received DMA and RMA instructions are determined to be complete.
CN201910852824.4A 2019-09-10 2019-09-10 Data transmission method and device for memory access and on-chip communication of many-core processor Active CN110704343B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910852824.4A CN110704343B (en) 2019-09-10 2019-09-10 Data transmission method and device for memory access and on-chip communication of many-core processor


Publications (2)

Publication Number Publication Date
CN110704343A true CN110704343A (en) 2020-01-17
CN110704343B CN110704343B (en) 2021-01-05

Family

ID=69195146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910852824.4A Active CN110704343B (en) 2019-09-10 2019-09-10 Data transmission method and device for memory access and on-chip communication of many-core processor

Country Status (1)

Country Link
CN (1) CN110704343B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101000596A (en) * 2007-01-22 2007-07-18 北京中星微电子有限公司 Chip and communication method of implementing communicating between multi-kernel in chip and communication method
US20090259789A1 (en) * 2005-08-22 2009-10-15 Shuhei Kato Multi-processor, direct memory access controller, and serial data transmitting/receiving apparatus
CN103019655A (en) * 2012-11-28 2013-04-03 中国人民解放军国防科学技术大学 Internal memory copying accelerating method and device facing multi-core microprocessor
CN104346285A (en) * 2013-08-06 2015-02-11 华为技术有限公司 Memory access processing method, device and system
CN104699631A (en) * 2015-03-26 2015-06-10 中国人民解放军国防科学技术大学 Storage device and fetching method for multilayered cooperation and sharing in GPDSP (General-Purpose Digital Signal Processor)
CN105389120A (en) * 2014-09-02 2016-03-09 英特尔公司 Supporting RMA API over active message
CN105512088A (en) * 2015-11-27 2016-04-20 中国电子科技集团公司第三十八研究所 Processor architecture capable of being reconstructed and reconstruction method thereof
CN106201939A (en) * 2016-06-30 2016-12-07 中国人民解放军国防科学技术大学 Multinuclear catalogue concordance device towards GPDSP framework



Similar Documents

Publication Publication Date Title
US11782870B2 (en) Configurable heterogeneous AI processor with distributed task queues allowing parallel task execution
US11789895B2 (en) On-chip heterogeneous AI processor with distributed tasks queues allowing for parallel task execution
US10262390B1 (en) Managing access to a resource pool of graphics processing units under fine grain control
CN106663028B (en) Dynamic fragmentation allocation adjustment
RU2597556C2 (en) Computer cluster arrangement for executing computation tasks and method for operation thereof
EP2339468A2 (en) Accelerating opencl applications by utilizing a virtual opencl device as interface to compute clouds
WO2015117565A1 (en) Methods and systems for dynamically allocating resources and tasks among database work agents in smp environment
CN105893083A (en) Container-based mobile code unloading support system under cloud environment and unloading method thereof
CN103019838B (en) Multi-DSP (Digital Signal Processor) platform based distributed type real-time multiple task operating system
CN101276294B (en) Method and apparatus for parallel processing heteromorphism data
TW201816595A (en) Processor and method of controlling work flow
CN101288049A (en) Use of a data engine within a data processing apparatus
CN103793255B (en) Starting method for configurable multi-main-mode multi-OS-inner-core real-time operating system structure
CN110569312B (en) Big data rapid retrieval system based on GPU and use method thereof
JP2020027613A (en) Artificial intelligence chip and instruction execution method used in artificial intelligence chip
JP2023511467A (en) Task scheduling for machine learning workloads
CN110704343B (en) Data transmission method and device for memory access and on-chip communication of many-core processor
CN115775199B (en) Data processing method and device, electronic equipment and computer readable storage medium
CN111653317A (en) Gene comparison accelerating device, method and system
CN110879753A (en) GPU acceleration performance optimization method and system based on automatic cluster resource management
Geyer et al. Pipeline group optimization on disaggregated systems
CN113608861A (en) Software load computing resource virtualization distribution method and device
CN112732634B (en) ARM-FPGA (advanced RISC machine-field programmable gate array) cooperative local dynamic reconstruction processing method for edge calculation
CN114327926A (en) Heterogeneous edge intelligent micro server and construction method thereof
CN102736949A (en) Scheduling of tasks to be performed by a non-coherent device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Gao Jiangang

Inventor after: Shi Jingjing

Inventor after: Tang Yong

Inventor after: Xie Jun

Inventor after: Zhang Qingbo

Inventor after: Chen Fangyuan

Inventor after: Chen Qingqiang

Inventor after: Guo Feng

Inventor before: Shi Jingjing

Inventor before: Tang Yong

Inventor before: Xie Jun

Inventor before: Zhang Qingbo

Inventor before: Chen Fangyuan

Inventor before: Chen Qingqiang

Inventor before: Guo Feng

GR01 Patent grant