CN114996205A - On-chip data scheduling controller and method for auxiliary 3D architecture near memory computing system - Google Patents

Info

Publication number
CN114996205A
CN114996205A
Authority
CN
China
Prior art keywords
instruction
data
information
unit
state
Prior art date
Legal status
Granted
Application number
CN202210856427.6A
Other languages
Chinese (zh)
Other versions
CN114996205B (en)
Inventor
曹玥 (Cao Yue)
杨建国 (Yang Jianguo)
张文君 (Zhang Wenjun)
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202210856427.6A priority Critical patent/CN114996205B/en
Publication of CN114996205A publication Critical patent/CN114996205A/en
Application granted granted Critical
Publication of CN114996205B publication Critical patent/CN114996205B/en
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00: Digital computers in general; Data processing equipment in general
    • G06F 15/76: Architectures of general purpose stored program computers
    • G06F 15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7807: System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F 15/7825: Globally asynchronous, locally synchronous, e.g. network on chip
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/445: Program loading or initiating
    • G06F 9/44505: Configuring for program initiating, e.g. using registry, configuration files
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses an on-chip data scheduling controller and method for assisting a 3D-architecture near-memory computing system. The scheduling controller is attached to the system bus as a memory-mapped IO device, so that the processor can write preset instructions to the corresponding memory-mapped addresses to realize scheduling control. The scheduling controller is connected to the host's external interrupt receiving module, allowing it to send an execution-completion interrupt signal to the host and to receive accelerator interrupt signals for judging accelerator state; it can also take over the host's memory-access path and access memory directly. It can receive pre-written data scheduling instructions from the host and take over the host access port, so that all on-chip memory addresses become accessible. According to the preset instructions, the scheduling controller executes data scheduling in sequence, sends a completion signal to the host at the preset node, and returns control of the access port so that the host can read the final data.

Description

On-chip data scheduling controller and method for auxiliary 3D architecture near memory computing system
Technical Field
The invention relates to the technical field of data transmission, and in particular to an on-chip data scheduling controller and an on-chip data scheduling method for assisting a 3D-architecture near-memory computing system.
Background
A 3D-architecture near-memory computing system stacks an accelerator die fabricated in a conventional process on a DRAM die and connects the upper and lower signal ports using Through-Silicon Via (TSV) or Hybrid Bonding (HB) technology to complete data interaction. Compared with the traditional processor/memory structure, such a system greatly shortens the distance between the computing units and the memory units and reduces memory-access latency; at the same time, the TSV/HB connections allow data to be fetched directly from a memory bank without passing through the DRAM system bus, greatly increasing memory-access bandwidth. The system can effectively alleviate the memory-wall problem and thereby improve processor-system performance, so it has great development potential.
However, because the memory connection of the 3D architecture bypasses the DRAM system bus, the access range of a single accelerator is limited: it can only access the bank directly connected beneath its own die. If the data processed in a single pass of an application exceeds the storage capacity of that single bank, data-transfer scheduling between accelerators must still be performed through the host. Since the host and the DRAM are still connected through the conventional structure, this path remains a system performance bottleneck if no optimization is performed.
Disclosure of Invention
In order to overcome the defects of the prior art, further reduce the amount of data transferred through the traditional host-DRAM memory port, convert the interaction between on-chip accelerators into on-chip communication, raise the upper limit on the data size of purely on-chip near-memory computation, greatly reduce the number of host memory accesses, and effectively improve the data-handling efficiency and energy-efficiency ratio of the system, the invention adopts the following technical scheme:
an on-chip data scheduling controller of an assisted 3D architecture near memory computing system, comprising: presetting an instruction storage module, a data handling module and a state controller;
the preset instruction storage module is used for storing a preset data scheduling instruction sent by the host and respectively sending the carrying information and the state information to the data carrying module and the state controller;
the data carrying module carries data from one accelerator to another accelerator through carrying information of the preset instruction storage module according to a data carrying starting instruction of the state controller, and sends an instruction completion signal to the state controller;
the state controller enters a chip data scheduling state according to a chip carrying takeover request of the host, judges an executable data carrying instruction according to state information of a preset instruction storage module and an interrupt signal of an accelerator, sends the executable data carrying starting instruction to the data carrying module, acquires an instruction completion signal, judges whether to send an execution completion interrupt signal to the host or not according to the state information after carrying is completed, and exits the chip data scheduling state.
Further, the preset instruction storage module comprises: an instruction decoder, a preset instruction queue, and an instruction information register;
the instruction decoder receives a preset data scheduling instruction and judges whether the preset instruction queue is full; if full, it feeds back write-failure information to the host; if not full, it decodes the instruction and judges whether the decoded instruction is correct; if wrong, it feeds back write-failure information to the host; if correct, it sends the handling information and the state information in the instruction to the preset instruction queue and the state controller, respectively;
the preset instruction queue writes in the handling information and updates the queue according to the update information acquired from the state controller;
the instruction information register reads the corresponding handling information from the preset instruction queue according to the read request of the state controller, for the data handling module to read.
Further, judging whether the decoded preset data scheduling instruction is correct comprises calculating the source and target memory address ranges of the preset accelerators with an adder, and comparing the source memory start address and the target memory start address in the handling information against those ranges for consistency; the instruction is correct if both addresses fall within their ranges, and erroneous otherwise.
Further, before the consistency comparison, a validity comparison is performed: the instruction is valid when the source memory start address and the target memory start address in the handling information are not both empty, and invalid otherwise.
Further, updating the queue according to the update information acquired from the state controller means that the preset instruction queue writes the handling information to the tail of the queue and updates the tail-unit information, and updates the head-unit information according to the head update request transmitted by the state controller.
Further, the data handling module comprises a data handling controller and a temporary data cache. The data handling controller receives the data-handling start instruction from the state controller, sequentially generates memory-access instructions according to the handling information provided by the preset instruction storage module, reads data from the corresponding addresses of the source accelerator into the temporary data cache, then reads the data from the temporary data cache and writes it to the corresponding addresses of the target accelerator, repeats this data-handling operation until all data has been carried, and sends an instruction-completion signal to the state controller.
Further, the state controller comprises a state information storage queue and a judging module;
the state information storage queue is used for storing the state information, updating the queue information according to the existing queue information, the accelerator interrupt signals, and the completion signal of the data handling module, and clearing completed queue units according to the head update instruction of the judging module;
the judging module judges whether an executable data-handling instruction exists according to the on-chip handling takeover request sent by the host and the state information in the state information storage queue. If not, it waits for the next cycle and judges again; if so, it determines the next instruction to be executed according to the state information, initiates a data read request to the preset instruction storage module, and sends a state-confirmation instruction to the target accelerator. If the target accelerator feeds back that it can receive data writes, the judging module transmits the data-handling start instruction to the data handling module; if the target accelerator feeds back that it cannot be written, the judging module re-enters the selection of the instruction to be executed and confirms the accelerator state in the next round. After receiving the handling-completion signal of the data handling module, it judges, according to the signal and its own state information, whether to update the head unit of the queue, whether to send an execution-completion interrupt signal to the host, and whether to exit the on-chip data scheduling state.
Further, the state information storage queue contains state information and additional information. The state information comprises: source accelerator id, target accelerator id, and whether to exit the on-chip scheduling state after completion. The additional information comprises: whether the source data is valid, whether the unit is completed, whether read/write dependencies exist, and the ids of the related dependency units.
The state information storage queue is updated according to the following rules:
The additional information is first generated when a unit is written to the tail of the queue:
the source-data-valid flag is set to 0;
if a unit ahead in the queue carries the exit-on-chip-scheduling-after-completion flag, or its source accelerator id is consistent with this unit's target accelerator id, the read dependency is set to 1 (a read dependency exists), and the read-dependency unit id is set to the id of the nearest queue unit satisfying the condition; if no such unit exists, the read dependency and the read-dependency unit id are both set to 0;
if a unit ahead in the queue has a target accelerator id consistent with this unit's source accelerator id, the write dependency is set to 1 (a write dependency exists), and the write-dependency unit id is set to the id of the nearest queue unit satisfying the condition; if no such unit exists, the write dependency and the write-dependency unit id are both set to 0;
when an accelerator execution-completion interrupt signal occurs, the additional information is updated as follows:
check whether any existing unit of the queue has a source accelerator id consistent with the completed accelerator's id and no read dependency; if so, set the source-data-valid flag of all units satisfying the condition to 1 (the source data is valid); if not, perform no update;
when a completion signal of the data handling module occurs, the additional information is updated as follows:
set the completion flag of the unit whose id corresponds to the completed handling information to 1;
if that unit's exit-takeover-after-completion flag is 0 and units in the queue hold read/write dependencies on it, update the corresponding read/write dependencies of all units satisfying the condition to 0; if that unit's exit-takeover-after-completion flag is 1 and it is not the head unit, the read dependencies on it are not cleared;
if the unit whose id corresponds to the completed handling information is the head unit and, when completed units are cleared, a unit whose exit-takeover-after-completion flag is 1 is among them, the read dependencies corresponding to that unit are cleared;
meanwhile, the state information storage queue clears completed queue units according to the head update instruction of the judging module, i.e. the head is updated to the requested unit id.
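Under one reading of the enqueue-time rules above (a read dependency points at the nearest earlier unit that carries the exit-after flag or still reads the accelerator this unit targets; a write dependency points at the nearest earlier unit that writes the accelerator this unit reads from), the generation of the additional information can be sketched as follows. All field and function names are illustrative, not taken from the patent.

```python
# Sketch of the extra-information generation when a unit is written to the
# tail of the state information storage queue. The dependency conditions
# follow one reading of the translated update rules; names are hypothetical.

def enqueue_unit(queue, uid, src_id, dst_id, exit_after):
    unit = {"id": uid, "src": src_id, "dst": dst_id, "exit_after": exit_after,
            "src_valid": 0, "done": 0,            # source data valid / completed
            "read_dep": 0, "read_dep_id": 0,      # read dependency + its unit id
            "write_dep": 0, "write_dep_id": 0}    # write dependency + its unit id
    for prev in reversed(queue):  # nearest matching earlier unit wins
        # Read dependency: an earlier unit has the exit-after flag, or still
        # reads (as source) the accelerator this unit writes to.
        if not unit["read_dep"] and (prev["exit_after"] or prev["src"] == dst_id):
            unit["read_dep"], unit["read_dep_id"] = 1, prev["id"]
        # Write dependency: an earlier unit writes (as target) the
        # accelerator this unit reads from.
        if not unit["write_dep"] and prev["dst"] == src_id:
            unit["write_dep"], unit["write_dep_id"] = 1, prev["id"]
        if unit["read_dep"] and unit["write_dep"]:
            break
    queue.append(unit)
    return unit
```

For example, after enqueueing (src 0, dst 1) then (src 1, dst 2), the second unit gains a write dependency on the first, since the first writes accelerator 1 which the second reads.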
Further, after judging that an executable data-handling instruction exists, the judging module inputs the state information into the arbitration unit, determines the id of the next unit to be executed, sends a data read request to the preset instruction storage module, and reads the corresponding information of that unit. Judging whether to update the head unit of the queue means judging whether the unit corresponding to the completion signal is the head unit; if not, the head is not updated. If so, the head is updated to the first uncompleted unit behind it, and the judging module confirms whether the cleared units include a unit whose exit-takeover-after-completion flag is 1; if not, it proceeds to the next round of instruction selection; if so, it sends an execution-completion interrupt signal to the host, resets the handling-takeover register to 0, and exits the on-chip data scheduling state.
An on-chip data scheduling control method for assisting a 3D-architecture near-memory computing system comprises the following steps:
Step S1: before the source accelerators are started, the data scheduling controller obtains the preset data scheduling instructions from the host, ensuring that it can correctly detect the execution-completion information of the source accelerators. Besides the information needed for data scheduling, each instruction contains the flag indicating whether to exit on-chip scheduling after completion. The controller judges whether the preset instruction queue is full; if full, it feeds back write-failure information to the host; if not full, it decodes the preset data scheduling instruction and judges whether the decoded instruction is correct; if wrong, it feeds back write-failure information; if correct, it stores the handling information in the instruction and judges, according to the state information in the instruction, whether the instruction has dependency relationships with existing instructions;
Step S2: acquire the on-chip handling takeover request sent by the host, enter the on-chip data scheduling state, and judge which data handling is executable according to the state information in the preset data scheduling instructions and the accelerator interrupt signals;
Step S3: carry data from one accelerator to another according to the handling information in the preset data scheduling instruction, and generate an instruction-completion signal;
Step S4: after handling is completed, clear the dependency relationships related to the completed instruction and update the queue; judge, according to the flag indicating whether to exit the on-chip scheduling state after completion, whether to send an execution-completion interrupt signal to the host and exit the on-chip data scheduling state.
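The four steps can be condensed into a minimal scheduling loop. The sketch below is a software simulation only, assuming instructions are plain records and that `accel_ready` holds the ids of accelerators whose execution-completion interrupt has been received; all names are illustrative.

```python
# Minimal simulation of steps S1-S4: repeatedly pick an executable
# instruction whose source accelerator has finished, "carry" its data, and
# exit the on-chip scheduling state when an instruction flagged
# exit-after-completion finishes. Names are hypothetical.

def run_schedule(instructions, accel_ready):
    """instructions: dicts with 'src', 'dst', 'exit_after';
    accel_ready: set of accelerator ids whose interrupt has been seen."""
    completed = []
    pending = list(instructions)
    while pending:
        # S2: judge executable handling: the source accelerator must be done
        ready = [i for i in pending if i["src"] in accel_ready]
        if not ready:
            break  # nothing executable yet; wait for accelerator interrupts
        ins = ready[0]
        # S3: carry data src -> dst (the data movement itself is omitted);
        # afterwards the target accelerator's data is considered ready
        accel_ready.add(ins["dst"])
        pending.remove(ins)
        completed.append(ins)
        # S4: clear the finished instruction; if flagged, interrupt the
        # host and exit the on-chip data scheduling state
        if ins["exit_after"]:
            break
    return completed
```

With a two-instruction chain 0 to 1, then 1 to 2, only the first is initially executable; completing it makes accelerator 1's data ready and unlocks the second, matching the dependency behavior described above.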
The invention has the advantages and beneficial effects that:
the scheduling controller and the scheduling control method can receive the pre-written data scheduling instruction from the host and can manage the access port of the host so as to access all memory addresses on a chip. According to the preset instruction, the controller sequentially executes data transferring and sends a completion signal to the host at the preset node, returns the control right of the access port and allows the host to read the final data. The invention can convert the interaction among all accelerators on the chip into on-chip communication, and the upper limit of the data size of pure on-chip near memory calculation is promoted to a single memory chip from a single memory bank, thereby greatly reducing the access frequency of a host and effectively improving the data handling efficiency and the energy efficiency ratio of a system. Meanwhile, the invention can multiplex the access path of the original host, and the additionally generated hardware expense is also smaller.
Drawings
FIG. 1 is a schematic diagram of a 3D architecture near memory computing accelerator system including an on-chip data scheduling controller according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an on-chip data scheduling controller according to an embodiment of the present invention.
FIG. 3 is a diagram of the structure and stored information of a state information store queue according to an embodiment of the present invention.
Fig. 4 is a flowchart of a method for controlling on-chip data scheduling according to an embodiment of the present invention.
Fig. 5a is a flowchart illustrating a process from obtaining a preset command to obtaining a transportation takeover request in the control method according to the embodiment of the invention.
Fig. 5b is a flowchart illustrating the process of acquiring the accelerator interrupt signal and sending the data transfer request according to the control method of the embodiment of the invention.
Fig. 5c is a flowchart illustrating data transfer to completion of scheduling in the control method according to the embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
An on-chip data scheduling controller for assisting a 3D-architecture near-memory computing system can be used to assist a 3D-architecture near-memory computing accelerator system. As shown in fig. 1, the scheduling controller is attached to the system bus as a memory-mapped IO device, so that the processor can write preset instructions to the corresponding memory-mapped addresses to realize scheduling control. The scheduling controller is connected to the host's external interrupt receiving module, allowing it to send an execution-completion interrupt signal to the host and to receive accelerator interrupt signals for judging accelerator state; it can also take over the host's memory-access path and access memory directly. As shown in fig. 2, the scheduling controller comprises: a preset instruction storage module, a data handling module, and a state controller;
the preset instruction storage module is used for storing the preset data scheduling instructions sent by the host and sending the handling information and the state information to the data handling module and the state controller, respectively;
the data handling module, according to a data-handling start instruction from the state controller, carries data from one accelerator to another using the handling information in the preset instruction storage module, and sends an instruction-completion signal to the state controller;
the state controller enters the on-chip data scheduling state according to an on-chip handling takeover request from the host, judges which data-handling instructions are executable according to the state information in the preset instruction storage module and the accelerator interrupt signals, sends the start instruction of an executable data-handling instruction to the data handling module, acquires the instruction-completion signal, and, after handling is completed, judges from the state information whether to send an execution-completion interrupt signal to the host and exit the on-chip data scheduling state.
The preset instruction storage module receives the preset data scheduling instruction from the host, decodes it and stores it in the preset instruction queue, transmits the required state information to the state controller, updates the queue according to the information sent by the state controller, and provides the required information to the data handling module.
In the embodiment of the invention, the accelerator system addresses are 32 bits wide, there are 16 accelerators, each accelerator corresponds to a 32MB address range, and the host memory-access port has a bit width of 64 bits.
The preset instruction storage module comprises: the system comprises an instruction decoder, a preset instruction queue and an instruction information register;
the instruction decoder receives a preset data scheduling instruction and judges whether the preset instruction queue is full; if full, it feeds back write-failure information to the host; if not full, it decodes the instruction and judges whether the decoded instruction is correct; if wrong, it feeds back write-failure information to the host; if correct, it sends the handling information and the state information in the instruction to the preset instruction queue and the state controller, respectively;
the preset instruction queue writes in the handling information and updates the queue according to the update information acquired from the state controller;
the instruction information register reads the corresponding handling information from the preset instruction queue according to the read request of the state controller, for the data handling module to read.
The preset scheduling instruction comprises handling information and state information. The handling information comprises the source memory start address, the target memory start address, and the data size; the state information comprises the source accelerator id, the target accelerator id, and the flag indicating whether to exit the on-chip scheduling state after completion;
in the embodiment of the present invention, the instruction decoder includes a memory-mapped register for receiving the preset scheduling instruction from the host, and the upper limit of a single data-handling operation is set to 16MB. One possible ordering of the instruction fields is shown in Table 1:
TABLE 1. Field layout of the preset scheduling instruction

Field                                              Bits
Source accelerator id                              95-92
Source memory start address                        91-60
Target accelerator id                              59-56
Target memory start address                        55-24
Data size                                          23-1
Exit on-chip scheduling state after completion     0
The instruction receiving memory-mapped register has a bit width of 96 bits.
When the preset instruction queue is full, the judging module feeds back write failure to the host; when the preset instruction queue is not full, the preset scheduling instruction can be successfully written into the preset instruction queue.
The instruction decoding module decodes the whole preset scheduling instruction into the format of table 1.
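The 96-bit layout of Table 1 can be illustrated with plain shift-and-mask field packing. The function names below are illustrative sketches, not part of the patent; only the bit positions come from Table 1.

```python
# Illustrative encoder/decoder for the 96-bit preset scheduling instruction,
# laid out as in Table 1. Field and function names are hypothetical.

FIELDS = [                    # (name, low bit, width in bits)
    ("exit_after", 0, 1),     # exit on-chip scheduling state after completion
    ("size",       1, 23),    # data size
    ("dst_addr",  24, 32),    # target memory start address
    ("dst_id",    56, 4),     # target accelerator id
    ("src_addr",  60, 32),    # source memory start address
    ("src_id",    92, 4),     # source accelerator id
]

def encode_instruction(**vals):
    """Pack the named fields into one 96-bit integer."""
    word = 0
    for name, low, width in FIELDS:
        v = vals[name]
        assert 0 <= v < (1 << width), f"{name} out of range"
        word |= v << low
    return word

def decode_instruction(word):
    """Unpack a 96-bit instruction word back into its named fields."""
    return {name: (word >> low) & ((1 << width) - 1)
            for name, low, width in FIELDS}
```

A round trip through these two functions recovers every field, which is a convenient way to sanity-check the bit positions in Table 1.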
Judging whether the decoded preset data scheduling instruction is correct begins with a validity comparison: the instruction is valid if the source memory start address and the target memory start address in the handling information are not both empty, and invalid otherwise;
then the source and target memory address ranges of the preset accelerators are calculated with an adder, and the source memory start address and the target memory start address in the handling information are compared against those ranges for consistency; the instruction is correct if both addresses fall within their ranges, and erroneous otherwise.
In the embodiment of the invention, validity is judged by checking whether the source memory start address and the target memory start address are both 0; the data is considered valid when the two addresses are not both 0.
For the consistency judgment, the preset accelerator memory base address is 0x80000000 and the accelerator address ranges are arranged in sequence, i.e. 0x80000000-0x803fffff, 0x80400000-0x807fffff, and so on. The decoder includes an adder to calculate the source and target memory address ranges and judges whether the addresses in the preset scheduling instruction fall within the ranges of the preset accelerators. If not, an instruction-error signal is sent to the host; if consistent, the source memory start address, the target memory start address, and the data size are transmitted to the preset instruction queue, while the source accelerator id, the target accelerator id, and the exit-after-completion flag are transmitted to the state controller, and the valid signals of both transmission interfaces are pulled up. After completion, the module resets the instruction-receiving memory-mapped register to 0.
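The validity and consistency checks can be sketched as below. The base address and the 0x400000 range stride follow the ranges quoted in the text (0x80000000-0x803fffff, 0x80400000-0x807fffff, ...); the function and parameter names are illustrative assumptions.

```python
# Sketch of the decoder's validity and consistency checks. Base address and
# per-accelerator stride follow the ranges listed in the embodiment; all
# names are hypothetical.

BASE = 0x80000000    # memory base address of accelerator 0
STRIDE = 0x400000    # size of each accelerator's address range, per the text

def check_instruction(src_id, src_addr, dst_id, dst_addr,
                      base=BASE, stride=STRIDE):
    # Validity comparison: the two start addresses must not both be 0.
    if src_addr == 0 and dst_addr == 0:
        return False

    def in_range(acc_id, addr):
        # The adder computes the accelerator's range from its id.
        lo = base + acc_id * stride
        return lo <= addr < lo + stride

    # Consistency comparison: each address must lie inside the range of
    # its own accelerator.
    return in_range(src_id, src_addr) and in_range(dst_id, dst_addr)
```

For example, a source address of 0x80400000 paired with source accelerator id 0 fails the check, since that address belongs to accelerator 1's range.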
Updating the queue according to the update information acquired from the state controller means that the preset instruction queue writes the handling information to the tail of the queue and updates the tail-unit information, and updates the head-unit information according to the head update request transmitted by the state controller.
In the embodiment of the invention, the preset instruction queue receives the information when the valid signal of the decoder's transmission interface is pulled up, writes the information to the tail of the queue, and updates the tail-unit information; meanwhile, it updates the head-unit information according to the head update request transmitted by the state controller, updates its empty/full status according to the changes in each cycle, and transmits that status to the instruction decoder. In this example, the queue length is assumed to be 16.
When the state controller sends a valid read request, the instruction information register reads the corresponding source memory start address, target memory start address, and data size from the preset instruction queue according to the unit id provided by the state controller, and transmits this information to the data handling module.
The data carrying module comprises a data carrying controller and a temporary data cache, wherein the data carrying controller receives a data carrying starting instruction of the state controller, sequentially generates an access instruction according to carrying information provided by the preset instruction storage module, reads a data corresponding address from the source accelerator, stores the data into the temporary data cache, reads data from the temporary data cache, writes the data into a target accelerator corresponding address, and circularly carries out data carrying operation until all data are carried completely, and sends an instruction finishing signal to the state controller.
In the embodiment of the invention, the data handling module obtains the data-handling start instruction from the state controller, obtains the source memory start address, the target memory start address, and the data size from the instruction information register, moves the data from the source memory start address in the source accelerator to the target memory start address in the target accelerator, and sends an instruction-completion signal to the state controller on completion;
the data handling module comprises a data handling controller and a temporary data cache unit; the data handling controller receives the data-handling start instruction from the state controller, takes the source memory start address, target memory start address, and data size provided by the instruction information register in the preset instruction storage module as the handling information, sequentially generates memory-access instructions, reads data from the source memory start address into the temporary data cache unit, then reads data from the temporary data cache unit and writes it to the target memory start address, loops through this data-handling operation until all data has been moved, and sends an instruction-completion signal to the state controller;
the size of the temporary data cache unit may be set as an upper limit of single data transmission allowed in the memory access protocol, for example: the memory support protocol is DDR4 (Burst upper limit of 8), in which case the temporary data cache unit size may be set to 64B.
The state controller enters the on-chip data scheduling state according to the host's on-chip handling takeover request, judges which data-handling instructions are executable according to the state information in the preset instruction storage module and the accelerators' interrupt signals, sends data-handling requests to the data handling module, reads the necessary information from the preset instruction storage module, judges after each transfer completes whether to send an execution-completion interrupt signal to the host, and exits the on-chip data scheduling state. The state controller comprises a state information storage queue and a judging module;
the state information storage queue stores the state information (the source accelerator id, the target accelerator id, and whether to exit the on-chip scheduling state after completion), updates the queue information according to the existing queue information, the accelerator interrupt signals, and the completion signal of the data handling module, and clears completed queue units according to the head-update instruction of the judging module.
The state information storage queue comprises state information and additional information; the state information comprises: the source accelerator id, the target accelerator id, and whether to exit the on-chip scheduling state after completion; the additional information comprises: whether the source data is valid, whether the unit is complete, whether read/write dependencies exist, and the ids of the units depended on; one possible queue state is shown in fig. 3;
the state information storage queue update rule is as follows:
the additional information is generated for the first time when a unit is written to the queue tail:
the source-data-valid flag is set to 0 (not yet valid);
if a unit ahead of it in the queue has the exit-after-completion flag set, or has a target accelerator id matching this unit's source accelerator id, the read-dependency flag is set to 1 (i.e. a read dependency exists) and the read-dependency unit id is set to the id of the nearest queue unit meeting the condition; if no unit qualifies, the read-dependency flag is set to 0 and the read-dependency unit id is set to 0;
if a unit ahead of it in the queue has a source accelerator id matching this unit's target accelerator id, the write-dependency flag is set to 1 (i.e. a write dependency exists) and the write-dependency unit id is set to the id of the nearest queue unit meeting the condition; if no unit qualifies, the write-dependency flag is set to 0 and the write-dependency unit id is set to 0;
when an accelerator execution-completion interrupt signal occurs, the additional information is updated as follows:
check the existing queue units for those whose source accelerator id matches the completed accelerator's id and which have no outstanding read dependency; if such units exist, set the source-data-valid flag of all qualifying units to 1 (i.e. the source data is valid); otherwise, do not update;
when the completion signal of the data handling module occurs, the additional information is updated as follows:
set the completion state of the unit whose id corresponds to the finished transfer to 1;
if that unit's exit-after-completion flag is 0 and other units in the queue hold read or write dependencies on it, update the corresponding read and write dependencies of all qualifying units to 0; if its exit-after-completion flag is 1 and it is not the head unit, read dependencies on it are not cleared;
if the unit corresponding to the finished transfer is the head unit and a unit with the exit-after-completion flag set to 1 is among the completed units being cleared, the read dependencies corresponding to that unit are cleared;
meanwhile, the state information storage queue clears the completed queue units according to the head-update instruction of the judging module, i.e. the head is updated to the requested unit id.
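The tail-write rules above (initial flags, nearest-match read and write dependencies) can be modeled as follows; the field names are illustrative, and the direction of each id comparison follows one plausible reading of the rules (an earlier unit's target feeding this unit's source creates a read dependency, and vice versa for writes):

```python
# Sketch of the extra-information generation when a new unit is written
# to the tail of the state information storage queue.

def new_unit(src_id, dst_id, exit_after, queue):
    unit = {
        "src": src_id, "dst": dst_id, "exit_after": exit_after,
        "src_valid": 0, "done": 0,
        "rdep": 0, "rdep_id": 0,  # read-dependency flag / unit index
        "wdep": 0, "wdep_id": 0,  # write-dependency flag / unit index
    }
    # Scan earlier units from nearest to oldest for the closest match.
    for i in range(len(queue) - 1, -1, -1):
        prev = queue[i]
        # Read dependency: an earlier exit-after unit, or an earlier unit
        # whose target accelerator matches this unit's source.
        if unit["rdep"] == 0 and (prev["exit_after"] or prev["dst"] == unit["src"]):
            unit["rdep"], unit["rdep_id"] = 1, i
        # Write dependency: an earlier unit whose source accelerator
        # matches this unit's target.
        if unit["wdep"] == 0 and prev["src"] == unit["dst"]:
            unit["wdep"], unit["wdep_id"] = 1, i
        if unit["rdep"] and unit["wdep"]:
            break  # both nearest matches found
    queue.append(unit)
    return unit
```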
The judging module judges whether an executable data-handling instruction exists according to the on-chip handling takeover request sent by the host and the state information in the state information storage queue; if not, it waits and judges again in the next cycle; if so, it determines the next instruction to be executed according to the state information, initiates a data read request to the preset instruction storage module, and sends a state-confirmation instruction to the target accelerator; if the target accelerator feeds back that it can accept data writes, a data-handling start instruction is sent to the data handling module; if the target accelerator feeds back that it cannot accept writes, the module re-enters the to-be-executed instruction judgment and confirms the accelerator state in the next round; after receiving the handling-completion signal from the data handling module, it judges from the signal and its own state information whether the queue head unit is updated, judges whether to send an execution-completion interrupt signal to the host, and whether to exit the on-chip data scheduling state.
After judging that an executable data-handling instruction exists, the state information is input into the arbitration unit, which determines the id of the unit to be executed next; a data read request is initiated to the preset instruction storage module and the unit's corresponding information is read; whether the queue head unit is updated is judged, i.e. whether the unit corresponding to the signal is the head unit; if not, the head is not updated; if so, the head is updated to the first uncompleted unit after it, and it is confirmed whether the cleared units include a unit with the exit-after-completion flag set to 1; if not, the next round of to-be-executed instruction judgment proceeds; if so, an execution-completion interrupt signal is sent to the host, the handling takeover register is reset to 0, and the on-chip data scheduling state is exited.
In the embodiment of the invention, the judging module comprises a single-byte handling-takeover memory-mapped register; when the host writes 1 to this register, the controller enters the on-chip data scheduling state and takes over the chip's memory access path. The module judges whether an executable data-handling instruction exists according to the state information in the state information storage queue, i.e. a queue unit whose source data is valid and which is not yet complete; this information is input into the arbitration unit, which determines the id of the unit to be executed next (no unit is selected if none meets the condition); a data read request is initiated to the preset instruction storage module and the corresponding unit information is read; a state-confirmation instruction is sent to the target accelerator of the unit to be executed; if the target accelerator feeds back that it can accept data writes, a data-handling start request is sent to the data handling module; if the target accelerator feeds back that it cannot accept writes, the module re-enters the to-be-executed instruction judgment and confirms the accelerator state in the next round. The arbitration unit should include a variable-weight mechanism to prevent deadlock, such as a round-robin design, but the highest weight needs to be updated back to the head unit after a data-handling request is successfully initiated. After receiving the handling-completion signal from the data handling module, the module judges from the signal and its own state information whether the queue head unit is updated, i.e. whether the unit corresponding to the signal is the head unit; if not, the head is not updated; if so, the head is updated to the first uncompleted unit after it, and it is confirmed whether the cleared units include a unit with the exit-after-completion flag set to 1; if not, the next round of to-be-executed instruction judgment proceeds; if so, an execution-completion interrupt signal is sent to the host, the handling takeover register is reset to 0, and the on-chip data scheduling state is exited.
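The variable-weight arbitration described above, with the highest weight returning to the head unit after a successful request, might be modeled as a rotating-priority picker; this is a sketch under illustrative names, not the patent's circuit:

```python
# Rotating-priority (round-robin style) arbiter: among ready queue
# units, pick the one closest to the current highest-priority index.
# After a data-handling request is successfully initiated, the highest
# priority is returned to the head unit, as the text requires, which
# prevents any ready unit from being starved indefinitely.

class RoundRobinArbiter:
    def __init__(self, depth):
        self.depth = depth
        self.priority = 0  # index currently holding the highest weight

    def pick(self, ready):
        """ready: set of unit ids meeting the execute condition
        (source data valid, not complete). Returns the chosen id,
        or None if no unit qualifies this cycle."""
        for off in range(self.depth):
            cand = (self.priority + off) % self.depth
            if cand in ready:
                return cand
        return None

    def request_initiated(self, head):
        """Called after a data-handling request is successfully
        initiated: highest weight goes back to the head unit."""
        self.priority = head
```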
As shown in fig. 4, a method for controlling on-chip data scheduling of an auxiliary 3D architecture near memory computing system includes the following steps:
step S1: before the source accelerator is started, the host's preset data scheduling instruction is obtained, ensuring that the data scheduling controller correctly detects the source accelerator's execution-completion information; in addition to the information needed for data scheduling, the instruction contains a flag indicating whether to exit the on-chip scheduling state after completion; whether the preset instruction queue is full is judged: if full, write-failure information is fed back to the host; if not full, the preset data scheduling instruction is decoded and its correctness judged; if incorrect, write-failure information is fed back; if correct, the handling information in the preset data scheduling instruction is stored, and whether the instruction has dependencies on existing instructions is judged according to the state information in the preset data scheduling instruction;
step S2: the on-chip handling takeover request sent by the host is obtained, the on-chip data scheduling state is entered, and which data transfers are executable is judged according to the state information in the preset data scheduling instructions and the accelerators' interrupt signals;
step S3: data is moved from one accelerator to another according to the handling information in the preset data scheduling instruction, and an instruction-completion signal is generated;
step S4: after handling is finished, the dependencies related to the completed instruction are cleared and the queue is updated; whether to send an execution-completion interrupt signal to the host is judged according to the completed instruction's exit-on-chip-scheduling-state flag, and the on-chip data scheduling state is exited.
In an embodiment of the present invention, a method for host control of the on-chip data scheduling controller and for the controller's scheduling decisions is provided, taking the flow of a single instruction as an example; as shown in figs. 5a, 5b, and 5c, the method includes the following steps:
the host needs to write a preset data scheduling instruction into the instruction receiving memory mapping register before starting the source accelerator so as to ensure that the data scheduling controller correctly detects the execution completion information of the source accelerator;
after receiving a preset data scheduling instruction, the data scheduling controller passes it to the preset instruction storage module, which judges whether the instruction queue is full; if full, write failure is fed back to the host; if not full, the module decodes the instruction and judges its consistency; if the instruction is erroneous, an instruction error is fed back to the host; if correct, the required information is stored in the preset instruction queue and the state information storage queue respectively, the instruction-receiving memory-mapped register is reset to 0, and the state controller generates the additional state information for the first time;
after finishing writing all initial data and preset data scheduling instructions, the host writes 1 to the handling-takeover memory-mapped register, thereby sending an on-chip handling takeover request that puts the controller into the on-chip data scheduling state and takes over the chip's memory access path;
the data scheduling controller detects the execution-completion status of the accelerators involved in all preset instructions written to the controller, regardless of whether it has entered the on-chip data scheduling state; after receiving execution-completion information from an accelerator, it sets the source-data-valid flag of all corresponding units in the state information storage queue (those without preceding uncompleted dependent instructions) to 1;
after the data scheduling controller enters the on-chip data scheduling state, the state controller checks every cycle, whenever no data handling has been initiated, for units whose source data is valid; if no unit meets the condition, it waits and repeats the check in the next cycle; if one does, it determines the next handling instruction to execute through arbitration, reads the corresponding information from the preset instruction storage module, and sends a read request to the target accelerator's state memory-mapped register to confirm the target accelerator's state; if the accelerator is busy, arbitration is performed again in the next cycle; if the accelerator is idle, the corresponding data-handling instruction is issued to the data handling module;
after receiving a data-handling instruction, the data handling module obtains the data-handling information from the preset instruction storage module, sequentially generates memory-access instructions, reads data from the source memory address into the temporary data cache, then reads data from the temporary data cache and writes it to the target memory address, loops through this data-handling operation until all data has been moved, and sends an instruction-completion signal to the state controller;
the state controller marks the queue unit corresponding to the instruction as complete and clears the dependencies on it according to the update rules; if that unit is the head unit, the first uncompleted unit after it is marked as the new head unit, and this information is passed to the preset instruction storage module; meanwhile, if a cleared unit carries the exit-on-chip-scheduling-state information, the related dependencies are cleared, the handling takeover register is reset to 0, and the data scheduling controller sends an execution-completion interrupt to the host and exits the on-chip scheduling state.
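The completion handling in this last step (mark the unit done, release dependencies on it, advance the head, decide whether to interrupt the host) can be condensed into one function; the field names follow no fixed convention in the patent and are illustrative:

```python
# Sketch of the state controller's completion handling. The queue is a
# list of unit dicts; `head` indexes the current head unit.

def on_transfer_complete(queue, unit_id, head):
    """Mark the finished unit, clear dependencies per the update rules,
    advance the head past completed units, and decide whether to exit
    the on-chip scheduling state. Returns (new_head, exit_scheduling)."""
    queue[unit_id]["done"] = 1
    exit_scheduling = False
    # Dependencies on an exit-after unit are kept until that unit is
    # cleared at the head; otherwise they are released immediately.
    if not queue[unit_id]["exit_after"] or unit_id == head:
        for u in queue:
            if u["rdep"] and u["rdep_id"] == unit_id:
                u["rdep"] = 0
            if u["wdep"] and u["wdep_id"] == unit_id:
                u["wdep"] = 0
    if unit_id == head:
        # Advance the head past completed units; a cleared exit-after
        # unit triggers the execution-complete interrupt and ends the
        # takeover (takeover register reset to 0 by the controller).
        while head < len(queue) and queue[head]["done"]:
            if queue[head]["exit_after"]:
                exit_scheduling = True
                for u in queue:  # release read deps deferred earlier
                    if u["rdep"] and u["rdep_id"] == head:
                        u["rdep"] = 0
            head += 1
    return head, exit_scheduling
```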
The above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the scope of the embodiments of the present invention in nature.

Claims (10)

1. An on-chip data scheduling controller of an assisted 3D architecture near memory computing system, comprising: the device comprises a preset instruction storage module, a data handling module and a state controller, and is characterized in that:
the preset instruction storage module stores a preset data scheduling instruction sent by the host computer and sends the carrying information and the state information to the data carrying module and the state controller respectively;
the data carrying module carries data from one accelerator to another accelerator through carrying information of the preset instruction storage module according to a data carrying starting instruction of the state controller, and sends an instruction completion signal to the state controller;
the state controller enters a chip data scheduling state according to a chip carrying takeover request of the host, judges an executable data carrying instruction according to state information of a preset instruction storage module and an interrupt signal of an accelerator, sends the executable data carrying starting instruction to the data carrying module, acquires an instruction completion signal, judges whether to send an execution completion interrupt signal to the host or not according to the state information after carrying is completed, and exits the chip data scheduling state.
2. The on-chip data scheduling controller of an assisted 3D architecture near memory computing system of claim 1, wherein: the preset instruction storage module comprises: the system comprises an instruction decoder, a preset instruction queue and an instruction information register;
the instruction decoder receives the preset data scheduling instruction and judges whether the preset instruction queue is full; if full, it feeds back write-failure information to the host; if not full, it decodes the preset data scheduling instruction and judges whether the decoded instruction is correct; if incorrect, it feeds back write-failure information to the host; if correct, it sends the handling information and the state information in the preset data scheduling instruction to the preset instruction queue and the state controller respectively;
the preset instruction queue writes in the carrying information and updates the queue according to the updating information acquired from the state controller;
and the instruction information register reads corresponding carrying information from a preset instruction queue according to the reading request of the state controller so as to enable the data carrying module to read.
3. The on-chip data scheduling controller of an auxiliary 3D architecture near memory computing system according to claim 2, wherein: judging whether the decoded preset data scheduling instruction is correct comprises calculating a source memory address range and a target memory address range of the preset accelerator through an adder, and comparing the source memory start address and the target memory start address in the handling information against the source memory address range and the target memory address range of the preset accelerator for consistency; the result is correct if the addresses fall within the ranges, and erroneous otherwise.
4. The on-chip data scheduling controller of an auxiliary 3D architecture near memory computing system of claim 3, wherein: a validity comparison is performed before the consistency comparison; the source memory start address and the target memory start address in the handling information are valid when both are non-empty, and invalid otherwise.
5. The on-chip data scheduling controller of an auxiliary 3D architecture near memory computing system according to claim 2, wherein: the queue is updated according to update information obtained from the state controller, i.e. the preset instruction queue writes the handling information to the queue tail, updates the tail-unit information, and updates the head-unit information according to head-update requests sent by the state controller.
6. The on-chip data scheduling controller of an assisted 3D architecture near memory computing system of claim 1, wherein: the data handling module comprises a data handling controller and a temporary data cache; the data handling controller receives the data-handling start instruction from the state controller, sequentially generates memory-access instructions according to the handling information provided by the preset instruction storage module, reads data from the corresponding address in the source accelerator into the temporary data cache, then reads data from the temporary data cache and writes it to the corresponding address in the target accelerator, loops through this data-handling operation until all data has been moved, and sends an instruction-completion signal to the state controller.
7. The on-chip data scheduling controller of an assisted 3D architecture near memory computing system of claim 1, wherein: the state controller comprises a state information storage queue and a judgment module;
the state information storage queue is used for storing the state information, updating the queue information according to the existing queue information, the accelerator interrupt signal and the completion signal of the data handling module, and clearing the completed queue unit according to the head updating instruction of the judging module;
the judging module judges whether an executable data-handling instruction exists according to the on-chip handling takeover request sent by the host and the state information in the state information storage queue; if not, it waits and judges again in the next cycle; if so, it determines the next instruction to be executed according to the state information, initiates a data read request to the preset instruction storage module, and sends a state-confirmation instruction to the target accelerator; if the target accelerator feeds back that it can accept data writes, a data-handling start instruction is sent to the data handling module; if the target accelerator feeds back that it cannot accept writes, the module re-enters the to-be-executed instruction judgment and confirms the accelerator state in the next round; after receiving the handling-completion signal from the data handling module, it judges from the signal and its own state information whether the queue head unit is updated, judges whether to send an execution-completion interrupt signal to the host, and whether to exit the on-chip data scheduling state.
8. The on-chip data scheduling controller of an auxiliary 3D architecture near memory computing system according to claim 7, wherein: the state information storage queue comprises state information and additional information, wherein the state information comprises: the source accelerator id, the target accelerator id, and whether to exit the on-chip scheduling state after completion; the additional information comprises: whether the source data is valid, whether the unit is complete, whether read/write dependencies exist, and the ids of the units depended on;
the state information storage queue update rule is as follows:
the additional information is generated for the first time when a unit is written to the queue tail:
the source-data-valid flag is set to 0 (not yet valid);
if a unit ahead of it in the queue has the exit-after-completion flag set, or has a target accelerator id matching this unit's source accelerator id, the read-dependency flag is set to 1 and the read-dependency unit id is set to the id of the nearest queue unit meeting the condition; if no unit qualifies, the read-dependency flag is set to 0 and the read-dependency unit id is set to 0;
if a unit ahead of it in the queue has a source accelerator id matching this unit's target accelerator id, the write-dependency flag is set to 1 and the write-dependency unit id is set to the id of the nearest queue unit meeting the condition; if no unit qualifies, the write-dependency flag is set to 0 and the write-dependency unit id is set to 0;
when the accelerator execution-completion interrupt signal occurs, the additional information is updated as follows:
check the existing queue units for those whose source accelerator id matches the completed accelerator's id and which have no outstanding read dependency; if such units exist, update the source-data-valid flag of all qualifying units to 1; if not, do not update;
when the completion signal of the data handling module occurs, the additional information is updated as follows:
set the completion state of the unit whose id corresponds to the finished transfer to 1;
if that unit's exit-after-completion flag is 0 and other units in the queue hold read or write dependencies on it, update the corresponding read and write dependencies of all qualifying units to 0; if its exit-after-completion flag is 1 and it is not the head unit, read dependencies on it are not cleared;
if the unit corresponding to the finished transfer is the head unit and a unit with the exit-after-completion flag set to 1 is among the completed units being cleared, the read dependencies corresponding to that unit are cleared;
meanwhile, the state information storage queue clears the completed queue units according to the head-update instruction of the judging module, i.e. the head is updated to the requested unit id.
9. The on-chip data scheduling controller of an auxiliary 3D architecture near memory computing system of claim 7, wherein: after judging that an executable data-handling instruction exists, the judging module inputs the state information into the arbitration unit, determines the id of the unit to be executed next, initiates a data read request to the preset instruction storage module, and reads the unit's corresponding information; it judges whether the queue head unit is updated, i.e. whether the unit corresponding to the signal is the head unit; if not, the head is not updated; if so, the head unit is updated to the first uncompleted unit after it, and it is confirmed whether the cleared units include a unit with the exit-after-completion flag set to 1; if not, the next round of to-be-executed instruction judgment proceeds; if so, an execution-completion interrupt signal is sent to the host, the handling takeover register is reset to 0, and the on-chip data scheduling state is exited.
10. A method for controlling on-chip data scheduling of an auxiliary 3D architecture near memory computing system is characterized by comprising the following steps:
step S1: before the source accelerator is started, obtaining the host's preset data scheduling instruction, wherein the instruction contains a flag indicating whether to exit the on-chip scheduling state after completion; judging whether the preset instruction queue is full: if full, feeding back write-failure information to the host; if not full, decoding the preset data scheduling instruction and judging whether the decoded instruction is correct; if incorrect, feeding back write-failure information; if correct, storing the handling information in the preset data scheduling instruction, and judging whether the instruction has dependencies on existing instructions according to the state information in the preset data scheduling instruction;
step S2: obtaining the on-chip handling takeover request sent by the host, entering the on-chip data scheduling state, and judging which data transfers are executable according to the state information in the preset data scheduling instructions and the accelerators' interrupt signals;
step S3: moving data from one accelerator to another according to the handling information in the preset data scheduling instruction, and generating an instruction-completion signal;
step S4: after handling is finished, clearing the dependencies related to the completed instruction, updating the queue, judging whether to send an execution-completion interrupt signal to the host according to the completed instruction's exit-on-chip-scheduling-state flag, and exiting the on-chip data scheduling state.
CN202210856427.6A 2022-07-21 2022-07-21 On-chip data scheduling controller and method for auxiliary 3D architecture near memory computing system Active CN114996205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210856427.6A CN114996205B (en) 2022-07-21 2022-07-21 On-chip data scheduling controller and method for auxiliary 3D architecture near memory computing system


Publications (2)

Publication Number Publication Date
CN114996205A true CN114996205A (en) 2022-09-02
CN114996205B CN114996205B (en) 2022-12-06

Family

ID=83022523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210856427.6A Active CN114996205B (en) 2022-07-21 2022-07-21 On-chip data scheduling controller and method for auxiliary 3D architecture near memory computing system

Country Status (1)

Country Link
CN (1) CN114996205B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680230A (en) * 2023-05-22 2023-09-01 无锡麟聚半导体科技有限公司 Hardware acceleration circuit and chip

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820657A (en) * 2015-05-14 2015-08-05 西安电子科技大学 Inter-core communication method and parallel programming model based on embedded heterogeneous multi-core processor
CN105843775A (en) * 2016-04-06 2016-08-10 中国科学院计算技术研究所 On-chip data partitioning read-write method, system and device
US9740235B1 (en) * 2015-03-05 2017-08-22 Liming Xiu Circuits and methods of TAF-DPS based interface adapter for heterogeneously clocked Network-on-Chip system
US10003554B1 (en) * 2015-12-22 2018-06-19 Amazon Technologies, Inc. Assisted sideband traffic management
EP3576328A1 (en) * 2017-03-24 2019-12-04 Huawei Technologies Co., Ltd. Data transmission method and apparatus
US10755772B1 (en) * 2019-07-31 2020-08-25 Shanghai Cambricon Information Technology Co., Ltd Storage device and methods with fault tolerance capability for neural networks
WO2021195949A1 (en) * 2020-03-31 2021-10-07 华为技术有限公司 Method for scheduling hardware accelerator, and task scheduler
CN114297101A (en) * 2021-12-31 2022-04-08 海光信息技术股份有限公司 Method and system for recording memory access source
CN114328322A (en) * 2022-03-17 2022-04-12 之江实验室 DMA controller operation method capable of configuring function mode
CN114398308A (en) * 2022-01-18 2022-04-26 上海交通大学 Near memory computing system based on data-driven coarse-grained reconfigurable array
CN114399035A (en) * 2021-12-30 2022-04-26 北京奕斯伟计算技术有限公司 Method for transferring data, direct memory access device and computer system
CN114610394A (en) * 2022-03-14 2022-06-10 海飞科(南京)信息技术有限公司 Instruction scheduling method, processing circuit and electronic equipment
CN114661644A (en) * 2022-02-17 2022-06-24 之江实验室 Pre-stored DMA device of auxiliary 3D architecture near memory computing accelerator system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Filip Adamec et al.: "Introduction to the new Packet Triggered Architecture for pipelined and parallel data processing", Proceedings of the 21st International Conference Radioelektronika 2011 *
Liu Yipeng et al.: "Design of a Zynq-based subcutaneous fingerprint OCT data acquisition system", Journal of Zhejiang University of Technology *
Zhang Lei et al.: "Reshapable processor: a user-definable processor architecture in accelerators", Network New Media Technology *
Zeng Sitao et al.: "Design of a convolutional neural network accelerator based on an eFLASH computing-in-memory architecture", Wanfang *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680230A (en) * 2023-05-22 2023-09-01 无锡麟聚半导体科技有限公司 Hardware acceleration circuit and chip
CN116680230B (en) * 2023-05-22 2024-04-09 无锡麟聚半导体科技有限公司 Hardware acceleration circuit and chip

Also Published As

Publication number Publication date
CN114996205B (en) 2022-12-06

Similar Documents

Publication Publication Date Title
US8185713B2 (en) Flexible sequencer design architecture for solid state memory controller
CN114328322B (en) DMA controller operation method capable of configuring function mode
CN114996205B (en) On-chip data scheduling controller and method for auxiliary 3D architecture near memory computing system
US20080147969A1 (en) Separate Handling of Read and Write of Read-Modify-Write
US7337260B2 (en) Bus system and information processing system including bus system
CN114661644B (en) Pre-storage DMA device for auxiliary 3D architecture near-memory computing accelerator system
WO2011155096A1 (en) Data transfer control device, integrated circuit of same, data transfer control method of same, data transfer completion notification device, integrated circuit of same, data transfer completion notification method of same, and data transfer control system
US20150033090A1 (en) Memory system capable of increasing data transfer efficiency
CN111858141B (en) System-on-chip memory control device and system-on-chip
US20050114556A1 (en) System for improving PCI write performance
CN112286852B (en) Data communication method and data communication device based on IIC bus
WO2021068850A1 (en) Transaction management method and system, network device and readable storage medium
CN114968863A (en) Data transmission method based on DMA controller
WO2022028223A1 (en) Method and system for controlling data transmission by data flow architecture neural network chip
CN112825028B (en) Method for writing in a volatile memory and corresponding integrated circuit
US7987301B1 (en) DMA controller executing multiple transactions at non-contiguous system locations
CN111338998B (en) FLASH access processing method and device based on AMP system
CN108234147B (en) DMA broadcast data transmission method based on host counting in GPDSP
CN114490106A (en) Information exchange system and method
US6799293B2 (en) Sparse byte enable indicator for high speed memory access arbitration method and apparatus
CN112799974B (en) Control method and system of memory card
KR101041838B1 (en) Mobile storage control device and method
CN1234550B (en) Input/output bus system
US20230315573A1 (en) Memory controller, information processing apparatus, and information processing method
JP2004126911A (en) Control unit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant