CN116680230A - Hardware acceleration circuit and chip - Google Patents


Info

Publication number
CN116680230A
Authority
CN
China
Prior art keywords
unit
interface module
instruction
chip
write operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310573241.4A
Other languages
Chinese (zh)
Other versions
CN116680230B (en)
Inventor
邓炯麟
梅平
王吉
尹棋烽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Linju Semiconductor Technology Co ltd
Original Assignee
Wuxi Linju Semiconductor Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Linju Semiconductor Technology Co ltd
Priority to CN202310573241.4A
Publication of CN116680230A
Application granted
Publication of CN116680230B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4063 Device-to-bus coupling
    • G06F13/4068 Electrical coupling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/781 On-chip cache; Off-chip memory
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a hardware acceleration circuit and a chip for performing one-to-one acceleration of devices mounted on a bus inside the chip. A front-stage interface module is connected to the bus and performs custom compiling and configuration on incoming read operation instructions, which access a device, and write operation instructions, which accelerate a device. A main control module, using a time reference provided by the chip, performs the corresponding operation on each write operation instruction before its effective time and transmits the compiling and configuration information corresponding to the write operation instruction to the lower-stage module. A back-stage interface module accesses the device based on the transmitted read operation instruction and schedules the device based on the received compiling and configuration information. The custom compiling and configuration performed by the front-stage interface module caches the device register configuration in advance, and executing the corresponding operations in the main control module saves the computing resources of the chip, so that the devices can be scheduled at the precise effective time. Because the devices are decoupled from one another, the circuit is applicable to a wide range of scenarios.

Description

Hardware acceleration circuit and chip
Technical Field
The present application relates to the field of chip design and application technologies, and in particular, to a hardware acceleration circuit and a chip.
Background
Inside a chip, the processor sends register configurations over the bus to each device mounted on the bus. These register configurations are issued serially on the bus, and the issuing process is generally configured and scheduled by software according to the service sequence. The execution speed of software is limited by factors such as the processor and the memory. In a system with strict requirements on system latency, device execution order and device execution time, conventional serial issuing can hardly meet these requirements. The common practice is to add extra synchronization processing between the devices, but doing so makes the devices difficult to decouple, which greatly limits the usage scenarios and increases the complexity of the system.
It should be noted that the foregoing description of the background art is provided only to give a clear and complete description of the technical solution of the present application and to facilitate understanding by those skilled in the art. The above solutions should not be regarded as known to those skilled in the art merely because they are set forth in the background section of the application.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present application is to provide a hardware acceleration circuit and a chip that solve the prior-art problem that devices inside a chip are difficult to configure and schedule while remaining decoupled from one another.
In order to achieve the above object, the present application provides a hardware acceleration circuit for performing configuration scheduling on a device mounted on a bus inside a chip, the hardware acceleration circuit comprising a front-stage interface module, a main control module and a back-stage interface module, wherein:
the front-stage interface module is connected to the bus and is used for performing custom compiling and configuration on incoming read operation instructions for accessing the device and write operation instructions for accelerating the device;
the main control module is connected to the front-stage interface module, performs the corresponding operation on the write operation instruction before its effective time based on a time reference provided by the chip, and transmits the compiling and configuration information corresponding to the write operation instruction to the lower-stage module, wherein the time reference is provided by a time module in the chip;
the back-stage interface module is connected to the front-stage interface module and the main control module, accesses the device based on the transmitted read operation instruction, and schedules the device based on the received compiling and configuration information;
the devices being scheduled are decoupled from one another by their respective hardware acceleration circuits.
Optionally, the time module includes a counter, wherein the counter counts cyclically at a constant frequency once the chip reset takes effect.
Optionally, the front-stage interface module includes an address decoding unit, an acceleration operation register, a head memory unit and a data memory unit, wherein:
the address decoding unit is connected to the bus and is used for converting and distributing incoming instructions according to their addresses;
the acceleration operation register is connected to the address decoding unit and is mapped to the corresponding device register through configuration;
the head memory unit is connected to the address decoding unit and is used for mapping the header information of the instruction;
the data memory unit is connected to the address decoding unit and is used for mapping the data information of the instruction.
Optionally, the head memory unit and the data memory unit have the same depth within the same hardware acceleration circuit.
Optionally, the main control module includes a sorting unit, a state machine control unit, a preprocessing and executing unit and a release unit. The sorting unit, the preprocessing and executing unit and the release unit are all connected to the state machine control unit, and the state machine control unit controls them to perform the corresponding operations based on the write operation instruction. After the chip is reset, the state machine control unit is in an idle state. After the device register is mapped, the state machine control unit sorts the instructions by effective time through the sorting unit. After the device generates a start signal, the state machine control unit transmits the header information and data information of the write operation instruction to the back-stage interface module, through the preprocessing and executing unit, before the effective time. Once the preprocessing and executing unit has transmitted an instruction to the back-stage interface module, it outputs a completion signal so that the state machine control unit enters a release state through the release unit, and the state machine control unit then continues with the next instruction. When the last instruction has been executed, the state machine control unit returns to the idle state through the release unit.
Optionally, the sorting process includes: after sorting is triggered, first clearing the sorting result, and then sorting all valid instructions according to the effective time in the header information; the preprocessing and executing unit transmits the header information and data information of the write operation instruction to the back-stage interface module before the effective time, based on the sorting result.
Optionally, the effective time of the instruction is compared with the time reference, and the header information and data information of the write operation in the instruction are transmitted to the back-stage interface module K clock cycles before the effective time, wherein the K clock cycles equal the time consumed by the operation of the back-stage interface module, and the value of K is determined by the communication protocol between the bus and the device.
Optionally, the back-stage interface module includes a read-write operation conversion unit and an output control unit. The read-write operation conversion unit is connected to the front-stage interface module and the main control module and is used to keep the communication protocols of the bus and the device consistent; the output control unit is connected to the read-write operation conversion unit.
In order to achieve the above object, the present application further provides a chip that includes at least one hardware acceleration circuit as described above for performing configuration scheduling on devices mounted on a bus inside the chip, wherein the hardware acceleration circuits correspond to the devices one to one.
As described above, the hardware acceleration circuit and the chip have the following beneficial effects:
the custom compiling and configuration performed by the front-stage interface module caches the device register configuration in advance, and executing the corresponding operations in the main control module saves the computing resources of the chip, so that the devices can be configured and scheduled at the precise effective time; because the devices are decoupled from one another, the hardware acceleration circuit and the chip are applicable to a wide range of scenarios.
Drawings
Fig. 1 shows a schematic diagram of an exemplary chip internal simplified frame of the present application.
Fig. 2 shows a first schematic diagram of the hardware acceleration circuit of the present application.
FIG. 3 is a second schematic diagram of the hardware acceleration circuit of the present application.
Fig. 4 is a schematic diagram showing a state machine control unit performing a state jump according to the present application.
Fig. 5 shows a schematic diagram of a process of sorting by the sorting unit of the present application.
FIG. 6 is a diagram illustrating the result of ordering an instruction of the present application when it is in effect.
FIG. 7 is a schematic diagram illustrating the operation of the preprocessing and execution unit of the present application.
Fig. 8 shows a schematic diagram of the operations performed by the release unit of the present application.
Description of the reference numerals
1 - hardware acceleration circuit; 11 - front-stage interface module; 12 - main control module; 13 - back-stage interface module; 2 - time module.
Description of the embodiments
The embodiments of the present application are described below with reference to specific examples, and other advantages and effects of the present application will become apparent to those skilled in the art from this disclosure. The application may also be practiced or applied in other embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the present application.
Please refer to fig. 1 to 8. It should be noted that the drawings provided in this embodiment illustrate the basic concept of the present application only schematically. They show only the components related to the present application and are not drawn according to the number, shape and size of the components in an actual implementation; the form, number and proportion of the components in an actual implementation may be changed arbitrarily, and the component layout may be more complicated.
Fig. 1 shows a simplified block diagram of the inside of an SOC (system on chip), in which a processor sends register configurations over the bus to the devices mounted on the bus (the devices in fig. 1 are device 1, device 2 and device 3; the dashed lines represent the data flow). Suppose the SOC in fig. 1 is used in a communication system. Such a system places very strict requirements on system latency, and on the execution order and execution time of each device. Further, if the communication system is a MIMO (Multiple-Input Multiple-Output) system, that is, a communication system using multiple antennas at both the transmitting end and the receiving end, the data of the multiple antenna receive/transmit channels must be aligned and the channels must operate simultaneously. In fig. 1 this corresponds to requiring device 1, device 2 and device 3 to operate simultaneously. However, the register configurations of the devices are issued serially on the bus, so device 1, device 2 and device 3 cannot operate simultaneously.
The conventional approach is to add extra synchronization processing among devices 1, 2 and 3. Because existing communication systems are mostly implemented in hardware, the devices cannot be decoupled from one another, and because the usage scenarios of communication systems are flexible and changeable, such synchronization processing is very difficult to implement. In addition, the issuing process is generally configured and scheduled by software according to the service sequence, and the execution speed of software is limited by factors such as the processor and the memory. If configuration and scheduling are instead performed by a pure hardware circuit, their speed can be greatly improved, and the computing resources of the processor and the memory can be saved at the same time. It should be noted that, for convenience of description, only three devices are shown in fig. 1; in actual use, the number of devices should be set as required.
Therefore, the application provides a hardware acceleration circuit and a chip, which are implemented as follows:
As shown in fig. 2 and 3, the present embodiment provides a hardware acceleration circuit 1 for performing configuration scheduling on devices mounted on a bus inside a chip, where the hardware acceleration circuit 1 includes a front-stage interface module 11, a main control module 12 and a back-stage interface module 13, wherein:
As shown in fig. 2 and 3, the front-stage interface module 11 is connected to the bus and is configured to perform custom compiling and configuration on incoming read operation instructions for accessing the device and write operation instructions for accelerating the device.
Specifically, as an example, as shown in fig. 3, the front-stage interface module 11 includes an address decoding unit, an acceleration operation register, a head memory unit Head Ram and a data memory unit Data Ram, wherein:
the address decoding unit is connected to the bus and is used for converting and distributing incoming instructions according to their addresses;
the acceleration operation register is connected to the address decoding unit and is mapped to the corresponding device register through configuration;
the head memory unit is connected to the address decoding unit and is used for mapping the header information of the instruction;
the data memory unit is connected to the address decoding unit and is used for mapping the data information of the instruction.
It should be noted that operations on the bus directed at a device are classified into read operations and write operations, and the processor controls a device by configuring its device registers. Operations that must be accelerated precisely against the time reference are therefore write operations, while operations that access the device are read operations. Each write operation on the bus includes a write operation address and write operation data. The hardware acceleration circuit 1 defines its own internal instruction: a complete instruction comprises an instruction header and instruction data, where the instruction header contains the time at which the operation takes effect and the instruction data contains the address and the data of the operation. The instruction header and the instruction data are stored separately, and each address stores one instruction. The following tables describe the instruction format and the instruction storage, respectively:
More specifically, the head memory unit Head Ram and the data memory unit Data Ram have the same depth within the same hardware acceleration circuit. It should be noted that, when the bus transmits instructions to the front-stage interface module 11, a table is maintained for the memory Ram of the front-stage interface module 11 to record whether each instruction is valid. The depth of the memory Ram is assumed to be N, where N is a natural number greater than 1. As shown in the following table, the value recorded for each address of the memory Ram indicates whether the instruction at that address is valid: 0 indicates invalid and 1 indicates valid, and the sequence number is the address in the memory Ram. When both header information and data information have been written to the same address, that address has received a complete instruction, and its value is set to 1 to indicate that the instruction at that address is valid. Valid instructions are mapped by the head memory unit Head Ram and the data memory unit Data Ram.
It should be further noted that the front-stage interface module 11 may be implemented as an application specific integrated circuit (ASIC), that is, a dedicated chip designed and manufactured to the requirements of a specific user and a specific electronic system, whose computing power and computing efficiency can be customized accordingly. Alternatively, it may be implemented as an IP core (IP stands for Intellectual Property; an IP core is a hardware description language program with a specific circuit function that is independent of the integrated circuit process and can be ported to different semiconductor processes to produce integrated circuit chips). Any implementation of the front-stage interface module 11 is applicable as long as it can perform custom compiling and configuration on incoming read operation instructions for accessing the device and write operation instructions for accelerating the device; it is not limited to this embodiment.
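The validity rule described above can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the class name `InstructionStore`, its method names and the example values are hypothetical, and the real Head Ram and Data Ram word formats are not given in the text.

```python
from typing import List, Optional, Tuple


class InstructionStore:
    """Toy model of Head Ram, Data Ram and the per-address valid table."""

    def __init__(self, depth: int) -> None:
        if depth <= 1:
            raise ValueError("the text assumes a depth N greater than 1")
        # Head Ram and Data Ram share the same depth N.
        self.head: List[Optional[int]] = [None] * depth              # effective time
        self.data: List[Optional[Tuple[int, int]]] = [None] * depth  # (register address, value)
        self.valid: List[int] = [0] * depth                          # 0 = invalid, 1 = valid

    def write_head(self, slot: int, effective_time: int) -> None:
        self.head[slot] = effective_time
        self._mark_if_complete(slot)

    def write_data(self, slot: int, reg_addr: int, value: int) -> None:
        self.data[slot] = (reg_addr, value)
        self._mark_if_complete(slot)

    def _mark_if_complete(self, slot: int) -> None:
        # An address holds a complete instruction only once both halves are written.
        if self.head[slot] is not None and self.data[slot] is not None:
            self.valid[slot] = 1


store = InstructionStore(depth=8)
store.write_head(2, effective_time=1000)
flag_after_head_only = store.valid[2]   # still 0: data half is missing
store.write_data(2, reg_addr=0x10, value=0x1)
flag_after_both = store.valid[2]        # now 1: complete instruction
```

In this sketch an address becomes valid regardless of which half arrives first, which is one reading of "written with header information and data information respectively".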
As shown in fig. 2 and 3, the main control module 12 is connected to the front-stage interface module 11, performs the corresponding operation on the write operation instruction before its effective time based on the time reference provided by the chip, and transmits the compiling and configuration information corresponding to the write operation instruction to the lower-stage module, wherein the time reference is provided by the time module 2 inside the chip and the lower-stage module is the back-stage interface module 13.
Specifically, as an example, as shown in fig. 2 and 3, the time module 2 includes a counter, wherein the counter counts cyclically at a constant frequency once the chip reset takes effect. It should be noted that the component providing the time reference may also be a phase-locked loop, a clock chip, or the like; any component that can provide a time reference is applicable, and the choice is not limited to this embodiment.
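A minimal sketch of such a cyclic counter is shown below, assuming a fixed bit width so that the count wraps to zero on overflow. The class name `FreeRunningCounter` and the 4-bit width in the usage lines are illustrative assumptions; the text does not state the counter width.

```python
class FreeRunningCounter:
    """Time reference modeled as a counter that wraps at 2**width_bits."""

    def __init__(self, width_bits: int = 16) -> None:
        self.modulus = 1 << width_bits  # number of distinct count values
        self.value = 0

    def reset(self) -> None:
        # Counting restarts from zero when the chip reset takes effect.
        self.value = 0

    def tick(self, cycles: int = 1) -> int:
        # Advance by a whole number of clock cycles, wrapping on overflow.
        self.value = (self.value + cycles) % self.modulus
        return self.value


counter = FreeRunningCounter(width_bits=4)  # wraps after 16 counts
before_wrap = counter.tick(15)              # 15
after_wrap = counter.tick()                 # wraps back to 0
```

The wrap-around is what makes the "next period" case in the sorting rule necessary: an effective time numerically below the current count can still lie in the future.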
Specifically, as an example, as shown in fig. 2 and 3, the main control module 12 includes a sorting unit, a state machine control unit, a preprocessing and executing unit and a release unit. The sorting unit, the preprocessing and executing unit and the release unit are all connected to the state machine control unit, and the state machine control unit controls them to perform the corresponding operations based on the write operation instruction. The state transitions of the state machine control unit are shown in fig. 4:
after the chip is reset, the state machine control unit is in an idle state;
after the device register is mapped, the state machine control unit sorts the instructions by effective time through the sorting unit;
after the device generates a start signal, the state machine control unit transmits the header information and data information of the write operation instruction to the back-stage interface module, through the preprocessing and executing unit, before the effective time;
once the preprocessing and executing unit has transmitted an instruction to the back-stage interface module, it outputs a completion signal so that the state machine control unit enters a release state through the release unit, and the state machine control unit then continues with the next instruction; when the last instruction has been executed, the state machine control unit returns to the idle state through the release unit.
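The state transitions listed above can be sketched as a small transition table. The state and event names below are hypothetical labels chosen for this illustration; fig. 4 is not reproduced here, so the exact state names used in the patent may differ.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    SORT = auto()
    EXECUTE = auto()
    RELEASE = auto()


class Event(Enum):
    REGISTERS_MAPPED = auto()  # device register mapping finished
    START = auto()             # start signal after sorting
    DONE = auto()              # completion signal from preprocessing and executing unit
    NEXT_PENDING = auto()      # release unit found another queued instruction
    QUEUE_EMPTY = auto()       # release unit found no further instruction


TRANSITIONS = {
    (State.IDLE, Event.REGISTERS_MAPPED): State.SORT,
    (State.SORT, Event.START): State.EXECUTE,
    (State.EXECUTE, Event.DONE): State.RELEASE,
    (State.RELEASE, Event.NEXT_PENDING): State.EXECUTE,
    (State.RELEASE, Event.QUEUE_EMPTY): State.IDLE,
}


def step(state: State, event: Event) -> State:
    # Events that are not meaningful in the current state leave it unchanged.
    return TRANSITIONS.get((state, event), state)


trace = [State.IDLE]  # state after chip reset
for ev in (Event.REGISTERS_MAPPED, Event.START, Event.DONE,
           Event.NEXT_PENDING, Event.DONE, Event.QUEUE_EMPTY):
    trace.append(step(trace[-1], ev))
```

The trace walks through two instructions: the first release loops back to execution, and the second returns the machine to idle.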
Further, the sorting process is shown in fig. 5:
Step one: when sorting is triggered, the sorting result is cleared first; specifically, the sorting result queue is emptied and the read pointer is reset.
Step two: all valid instructions are sorted according to the effective time in the header information. In this embodiment, the instructions are sorted in ascending order of effective time. In particular, when the effective time of an instruction is smaller than the current value of the time reference (the current count value, if the time reference is a counter), the instruction is considered to take effect in the next period, after the time reference overflows; when the effective time of an instruction is larger than the current value of the time reference, the instruction is considered to take effect in the current period. Referring to fig. 6, if T1 < current time reference < T2 at the time of sorting, the sorting result places T2 before T1.
Step three: the sorted results are written into the sorting result queue in order, and the read pointer is maintained. The queue stores the Ram addresses of the corresponding instructions; the sorting result queue is shown in the following table:
It should be noted that the sorting result queue stores the Ram addresses of the corresponding instructions, and sequence number 0 holds the maximum value of the sorting result. Suppose the Ram addresses of the valid instructions are 2, 4 and 6, and their effective times satisfy: time of address 6 < current time reference < time of address 2 < time of address 4. According to the sorting criterion, the instruction at address 6 takes effect in the next period after overflow, while the instructions at address 4 and address 2 take effect in the current period. The final sorting result queue is shown in the table, where the Ram address of the instruction corresponding to the read pointer is address 6.
Step four: after sorting is completed, the read pointer points to the Ram address of the instruction with the smallest effective time, the state machine control unit generates a start signal, and the preprocessing and executing unit transmits the header information and data information of the write operation instruction to the back-stage interface module before the effective time, based on the sorting result.
Further, the operations performed by the preprocessing and executing unit are shown in fig. 7 and include:
Step five: after receiving the start signal, the preprocessing and executing unit fetches the instruction header and the instruction data at the Ram address pointed to by the read pointer.
Step six: the effective time of the instruction is compared with the time reference to determine whether the time reference equals the effective time of the instruction minus K clock cycles. If so, step seven is executed; if not, the flow returns to step five. Here K clock cycles means K system clock cycles and equals the time consumed by the operation of the back-stage interface module; the value of K is determined by the communication protocol between the bus and the device, and the configuration of the communication protocol is not described in detail herein.
Step seven: the header information and data information of the write operation instruction are transmitted to the back-stage interface module.
Step eight: the preprocessing and executing unit generates a completion signal so that the state machine control unit enters the release state through the release unit.
Further, the operations performed by the release unit are shown in fig. 8 and include:
Step nine: after the completion signal generated by the preprocessing and executing unit is received, the value of the Ram address of the instruction pointed to by the read pointer is set to 0.
Step ten: the read pointer is maintained by directing it to the last member of the sorting result queue, and it is determined whether the member pointed to by the read pointer exists. If so, the flow jumps to step four and the preprocessing and executing unit operates; if not, the state machine control unit returns to the idle state through the release unit.
It should be further noted that the main control module 12 may be implemented as an application specific integrated circuit or as an IP core. Any implementation of the main control module 12 is applicable as long as it can perform the corresponding operation on the write operation instruction before its effective time based on the time reference provided by the chip and transmit the compiling and configuration information corresponding to the write operation instruction to the lower-stage module; it is not limited to this embodiment.
As shown in fig. 2 and 3, the back-stage interface module 13 is connected to the front-stage interface module 11 and the main control module 12, accesses the device based on the transmitted read operation instruction, and schedules the device based on the received compiling and configuration information; the devices being scheduled are decoupled from one another by their respective hardware acceleration circuits 1.
Specifically, as an example, as shown in fig. 2 and 3, the back-stage interface module 13 includes a read-write operation conversion unit and an output control unit. The read-write operation conversion unit is connected to the front-stage interface module 11 and the main control module 12 and is used to keep the communication protocols of the bus and the device consistent; the specific operation process is not described in detail herein. The output control unit is connected to the read-write operation conversion unit.
It should be noted that the back-stage interface module 13 may be implemented as an application specific integrated circuit or as an IP core. Any implementation of the back-stage interface module 13 is applicable as long as it can access the device based on the transmitted read operation instruction and accelerate the device based on the received compiling and configuration information; it is not limited to this embodiment.
This embodiment also provides a chip, which includes at least one hardware acceleration circuit according to this embodiment for performing configuration scheduling on devices mounted on a bus inside the chip, wherein the hardware acceleration circuits correspond to the devices one to one.
It should be noted that the chip may be implemented with an FPGA (Field Programmable Gate Array), or with an application specific integrated circuit, an IP core, or the like. The specific implementation should take the actual usage scenario into account and is not described in detail here.
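The benefit of pairing each device with its own hardware acceleration circuit can be illustrated with simple arithmetic. The sketch below contrasts serial register writes, which land one after another, with per-device accelerators that each send K cycles early so that all writes land at the same effective time. All function names, cycle counts and device names are hypothetical.

```python
from typing import Dict, List


def serial_effect_times(first_write_done: int, cycles_per_write: int,
                        devices: List[str]) -> Dict[str, int]:
    # Serial issuing: each later device is configured one bus write later.
    return {name: first_write_done + i * cycles_per_write
            for i, name in enumerate(devices)}


def accelerated_effect_times(effective_time: int,
                             k_by_device: Dict[str, int]) -> Dict[str, int]:
    result = {}
    for name, k in k_by_device.items():
        send_time = effective_time - k  # main control module hands off K cycles early
        result[name] = send_time + k    # back-stage interface module consumes K cycles
    return result


serial = serial_effect_times(100, 4, ["device1", "device2", "device3"])
accelerated = accelerated_effect_times(
    1000, {"device1": 3, "device2": 3, "device3": 5})
```

In the serial case the three devices take effect at three different times, whereas in the accelerated case they all take effect at cycle 1000 even though their K values differ, because each K is compensated independently per device.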
In summary, the hardware acceleration circuit and the chip of the present application perform one-to-one acceleration of devices mounted on a bus inside the chip, and the circuit comprises at least a front-stage interface module, a main control module and a back-stage interface module, wherein: the front-stage interface module is connected with the bus and performs custom compiling and configuration on the input read operation instructions for accessing a device and write operation instructions for accelerating a device; the main control module is connected with the front-stage interface module, performs the corresponding operation on the write operation instruction before the effective moment based on the time reference provided by the chip, and transmits the compiling and configuration information corresponding to the write operation instruction to the next-stage module; the back-stage interface module is connected with the front-stage interface module and the main control module, accesses the device by transmitting the read operation instruction, and accelerates the device based on the received compiling and configuration information; the accelerated devices are decoupled from one another under the action of their corresponding hardware acceleration circuits. With this hardware acceleration circuit and chip, the device registers are cached in advance through the custom compiling and configuration operation of the front-stage interface module, and the corresponding operations are executed by the main control module, which saves the computing resources of the chip, allows the devices to be configured and scheduled at the exact effective moment, and decouples the devices from one another, so the circuit and the chip are applicable to a wide range of scenarios. The application therefore effectively overcomes various shortcomings of the prior art and has high industrial utilization value.
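The timing behaviour summarized above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the patented circuit: the names `WriteInstruction`, `schedule`, `run`, `register` and `value` are hypothetical, the time reference is modelled as a plain cycle counter that does not wrap, and the back-stage interface module is assumed to take exactly `k_cycles` cycles to deliver a write to the device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteInstruction:
    effective_time: int  # cycle at which the write should take effect at the device
    register: int        # hypothetical device register address
    value: int           # hypothetical data word


def schedule(instructions, k_cycles):
    """Sort writes by effective time and compute when each must be issued.

    Each write is handed to the back-stage interface k_cycles before its
    effective moment, so that it reaches the device exactly on time.
    """
    ordered = sorted(instructions, key=lambda ins: ins.effective_time)
    return [(ins.effective_time - k_cycles, ins) for ins in ordered]


def run(instructions, k_cycles, total_cycles):
    """Step a cycle counter and record when each write reaches the device."""
    pending = schedule(instructions, k_cycles)
    in_flight = []  # (arrival_cycle, instruction) inside the back-stage interface
    applied = []    # (cycle, register, value) as observed at the device
    for now in range(total_cycles):
        # Issue every write whose issue cycle has been reached.
        while pending and pending[0][0] <= now:
            _, ins = pending.pop(0)
            in_flight.append((now + k_cycles, ins))
        # Deliver writes that have spent k_cycles in the back-stage interface.
        for arrival, ins in in_flight:
            if arrival == now:
                applied.append((now, ins.register, ins.value))
        in_flight = [item for item in in_flight if item[0] > now]
    return applied
```

With `k_cycles = 3`, a write whose effective time is cycle 5 is issued at cycle 2 and observed at the device at cycle 5, regardless of the order in which the instructions were supplied. A write whose effective time is smaller than `k_cycles` would arrive late in this model; how the real circuit treats that case is not stated in the text.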
The above embodiments merely illustrate the principles and effects of the present application and are not intended to limit it. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present application.

Claims (9)

1. A hardware acceleration circuit for performing configuration scheduling on devices mounted on a bus inside a chip, the hardware acceleration circuit comprising: a front-stage interface module, a main control module and a back-stage interface module, wherein:
the front-stage interface module is connected with the bus and is used for performing custom compiling and configuration on input read operation instructions for accessing a device and write operation instructions for accelerating a device;
the main control module is connected with the front-stage interface module, performs the corresponding operation on the write operation instruction before the effective moment based on a time reference, and transmits the compiling and configuration information corresponding to the write operation instruction to the next-stage module, wherein the time reference is provided by a time module in the chip;
the back-stage interface module is connected with the front-stage interface module and the main control module, and is used for accessing the device by transmitting the read operation instruction and for scheduling the device based on the received compiling and configuration information;
the scheduled devices are decoupled from one another under the action of their corresponding hardware acceleration circuits.
2. The hardware acceleration circuit of claim 1, wherein: the time module comprises a counter, wherein the counter counts cyclically at a constant frequency after the chip reset takes effect.
3. The hardware acceleration circuit of claim 1, wherein: the front-stage interface module comprises: an address decoding unit, an acceleration operation register, a head memory unit and a data memory unit, wherein:
the address decoding unit is connected with the bus and is used for converting and distributing the input instruction according to its address;
the acceleration operation register is connected with the address decoding unit and is mapped to a corresponding device register through configuration;
the head memory unit is connected with the address decoding unit and is used for mapping the header information of the instruction;
the data memory unit is connected with the address decoding unit and is used for mapping the data information of the instruction.
4. The hardware acceleration circuit of claim 3, wherein: the head memory unit and the data memory unit in the same hardware acceleration circuit have the same depth.
5. The hardware acceleration circuit of claim 4, wherein: the main control module comprises: a sequencing unit, a state machine control unit, a preprocessing and executing unit and a releasing unit, wherein: the sequencing unit, the preprocessing and executing unit and the releasing unit are all connected with the state machine control unit, and the state machine control unit controls the sequencing unit, the preprocessing and executing unit and the releasing unit to execute the corresponding operations based on a write operation instruction, wherein: after the chip is reset, the state machine control unit is in an idle state; after the device register is mapped, the state machine control unit sorts the instructions by effective moment through the sequencing unit; after the device generates a start signal, the state machine control unit transmits the header information and data information of the write operation instruction to the back-stage interface module through the preprocessing and executing unit before the effective moment; when the preprocessing and executing unit has transmitted an instruction to the back-stage interface module, it outputs a finish signal so that the state machine control unit enters a release state through the releasing unit, after which the state machine control unit continues with the next instruction; and when the last instruction has been executed, the state machine control unit returns to the idle state through the releasing unit.
6. The hardware acceleration circuit of claim 5, wherein: the sorting process comprises: after the sorting is triggered, first clearing the previous sorting result, and then sorting all valid instructions according to the effective moment in their header information; the preprocessing and executing unit transmits the header information and data information of the write operation instruction to the back-stage interface module before the effective moment based on the sorting result.
7. The hardware acceleration circuit of claim 5, wherein: the operations performed by the preprocessing and executing unit comprise: comparing the effective moment of the instruction with the time reference; and transmitting the header information and data information of the write operation in the instruction to the back-stage interface module K clock cycles before the effective moment, wherein the K clock cycles equal the time consumed by the operation of the back-stage interface module, and the value of K is determined by the communication protocol between the bus and the device.
8. The hardware acceleration circuit of claim 1, wherein: the back-stage interface module comprises a read-write operation conversion unit and an output control unit, wherein the read-write operation conversion unit is connected with the front-stage interface module and the main control module and is used for keeping the communication protocol between the bus and the device consistent; the output control unit is connected with the read-write operation conversion unit.
9. A chip, characterized in that: the chip comprises at least one hardware acceleration circuit as set forth in any one of claims 1-8 for performing configuration scheduling on devices mounted on the bus inside the chip, wherein the hardware acceleration circuits are in one-to-one correspondence with the devices.
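For readers who find the state sequence recited in claim 5 easier to follow as a transition function, it can be sketched as below. This is an illustrative model only: the claim names only the idle and release states, so the labels `SORT` and `EXECUTE` and the signal names `registers_mapped`, `start`, `done` and `remaining` are assumptions introduced here.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()     # state after chip reset
    SORT = auto()     # hypothetical label: sequencing unit orders instructions
    EXECUTE = auto()  # hypothetical label: preprocessing and executing unit sends a write
    RELEASE = auto()  # release state entered after the finish signal


def next_state(state, *, registers_mapped=False, start=False, done=False, remaining=0):
    """Return the next state for one step of the simplified controller.

    registers_mapped: the device registers have been mapped
    start:            the device has generated its start signal
    done:             the current instruction was handed to the back-stage interface
    remaining:        number of instructions still waiting to be executed
    """
    if state is State.IDLE:
        return State.SORT if registers_mapped else State.IDLE
    if state is State.SORT:
        return State.EXECUTE if start else State.SORT
    if state is State.EXECUTE:
        return State.RELEASE if done else State.EXECUTE
    # RELEASE: continue with the next instruction, or return to idle after the last one.
    return State.EXECUTE if remaining > 0 else State.IDLE
```

Modelling the controller as a pure function of the current state and its inputs keeps the sketch easy to check by hand; a hardware implementation would register the state on a clock edge instead.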

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310573241.4A CN116680230B (en) 2023-05-22 2023-05-22 Hardware acceleration circuit and chip

Publications (2)

Publication Number Publication Date
CN116680230A (en) 2023-09-01
CN116680230B (en) 2024-04-09

Family

ID=87777899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310573241.4A Active CN116680230B (en) 2023-05-22 2023-05-22 Hardware acceleration circuit and chip

Country Status (1)

Country Link
CN (1) CN116680230B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5341495A (en) * 1991-10-04 1994-08-23 Bull Hn Information Systems, Inc. Bus controller having state machine for translating commands and controlling accesses from system bus to synchronous bus having different bus protocols
US5471638A (en) * 1991-10-04 1995-11-28 Bull Hn Inforamtion Systems Inc. Bus interface state machines with independent access to memory, processor and registers for concurrent processing of different types of requests
US5603047A (en) * 1995-10-06 1997-02-11 Lsi Logic Corporation Superscalar microprocessor architecture
US5671443A (en) * 1995-02-21 1997-09-23 International Business Machines Corporation Direct memory access acceleration device for use in a data processing system
TW564350B (en) * 2002-03-01 2003-12-01 Via Tech Inc Control chip for speeding up memory access and the operation method
CN1558332A (en) * 2004-01-18 2004-12-29 中兴通讯股份有限公司 Device and method for implementing automatically reading and writing internal integrated circuit equipment
WO2006004166A1 (en) * 2004-07-02 2006-01-12 Ssd Company Limited Data processing unit and compatible processor
US20070240011A1 (en) * 2006-04-05 2007-10-11 Texas Instruments Incorporated FIFO memory data pipelining system and method for increasing I²C bus speed
CN103092785A (en) * 2013-02-08 2013-05-08 豪威科技(上海)有限公司 Double data rate (DDR) 2 synchronous dynamic random access memory (SDRAM) controller
US20130318323A1 (en) * 2012-03-30 2013-11-28 Eliezer Weissmann Apparatus and method for accelerating operations in a processor which uses shared virtual memory
US20140277590A1 (en) * 2013-03-15 2014-09-18 Micron Technology, Inc. Overflow detection and correction in state machine engines
CN114253884A (en) * 2022-03-01 2022-03-29 四川鸿创电子科技有限公司 FPGA-based multi-master-to-multi-slave access arbitration method, system and storage medium
CN114996205A (en) * 2022-07-21 2022-09-02 之江实验室 On-chip data scheduling controller and method for auxiliary 3D architecture near memory computing system
CN115048334A (en) * 2022-05-18 2022-09-13 西安科技大学 Programmable array processor control apparatus
CN116088940A (en) * 2022-11-24 2023-05-09 爱芯元智半导体(上海)有限公司 Hardware acceleration system, control method, chip and electronic equipment

Also Published As

Publication number Publication date
CN116680230B (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant