CN116680230A

CN116680230A - Hardware acceleration circuit and chip

Info

Publication number: CN116680230A
Application number: CN202310573241.4A
Authority: CN
Inventors: 邓炯麟; 梅平; 王吉; 尹棋烽
Original assignee: Wuxi Linju Semiconductor Technology Co ltd
Current assignee: Wuxi Linju Semiconductor Technology Co ltd
Priority date: 2023-05-22
Filing date: 2023-05-22
Publication date: 2023-09-01
Anticipated expiration: 2043-05-22
Also published as: CN116680230B

Abstract

The application provides a hardware acceleration circuit and a chip for performing one-to-one acceleration operations on devices mounted on a bus inside the chip. A front-stage interface module is connected to the bus and performs custom compiling and configuration on input read operation instructions, which access a device, and write operation instructions, which accelerate a device. A main control module, using a time reference provided by the chip, performs the corresponding operation on each write operation instruction before its effective moment and transmits the compiling and configuration information corresponding to the write operation instruction to the lower-stage module. A back-stage interface module accesses the device based on the transmitted read operation instruction and schedules the device based on the received compiling and configuration information. The custom compiling and configuration performed by the front-stage interface module caches the device register configuration in advance, and the operations performed by the main control module save computing resources of the chip, so that each device can be scheduled at its precise effective moment. Because the devices are decoupled from one another, the circuit is applicable to a wide range of scenarios.

Description

Hardware acceleration circuit and chip

Technical Field

The present application relates to the field of chip design and application technologies, and in particular, to a hardware acceleration circuit and a chip.

Background

In a chip, the processor sends register configurations over the bus to each device mounted on the bus. These register configurations are issued serially on the bus, and the issuing process is generally configured and scheduled by software according to the service sequence, so its speed is limited by factors such as the processor and the memory. In a system with strict requirements on system delay, device execution order and device execution time, conventional serial issuing can hardly meet the requirements. The common practice is to add extra synchronization processing between the devices, but this makes the devices difficult to decouple, which greatly limits the usage scenarios and increases the complexity of the system.

It should be noted that the foregoing description of the background art is only for the purpose of providing a clear and complete description of the technical solution of the present application and is presented for the convenience of understanding by those skilled in the art. The above-described solutions are not considered to be known to the person skilled in the art simply because they are set forth in the background of the application section.

Disclosure of Invention

In view of the above-mentioned drawbacks of the prior art, an object of the present application is to provide a hardware acceleration circuit and a chip, which are used for solving the problem that it is difficult to implement configuration scheduling on devices inside the chip under the requirement of mutual decoupling in the prior art.

In order to achieve the above object, the present application provides a hardware acceleration circuit for performing configuration scheduling on a device mounted on a bus inside a chip, the hardware acceleration circuit comprising: front-stage interface module, main control module and back-stage interface module, wherein:

the front-stage interface module is connected with the bus and is used for carrying out custom compiling and configuration on an input read operation instruction of the access equipment and a write operation instruction of the acceleration equipment;

the main control module is connected with the front-stage interface module, performs the corresponding operation on the write operation instruction before its effective moment based on a time reference provided by the chip, and transmits compiling and configuration information corresponding to the write operation instruction to the lower-stage module, wherein the time reference is provided by a time module in the chip;

the back-stage interface module is connected with the front-stage interface module and the main control module, is used for equipment access based on transmission of a read operation instruction, and is used for scheduling equipment based on received compiling and configuration information;

the equipment for executing scheduling is mutually decoupled under the action of the corresponding hardware acceleration circuit.

Optionally, the time module includes a counter, wherein the counter counts cycles at a constant frequency after the chip reset is validated.

Optionally, the pre-stage interface module includes: address decoding unit, accelerating operation register, head memory unit and data memory unit, wherein:

the address decoding unit is connected with the bus and is used for converting and distributing the input instruction according to the address;

the acceleration operation register is connected with the address decoding unit and mapped to a corresponding device register through configuration;

the head memory unit is connected with the address decoding unit and is used for mapping head information of the instruction;

the data memory unit is connected with the address decoding unit and is used for mapping data information of the instruction.

Optionally, the header memory unit and the data memory unit are identical in depth in the same hardware acceleration circuit.

Optionally, the main control module includes: a sorting unit, a state machine control unit, a preprocessing and executing unit and a release unit, wherein: the sorting unit, the preprocessing and executing unit and the release unit are all connected with the state machine control unit, and the state machine control unit controls the sorting unit, the preprocessing and executing unit and the release unit to execute corresponding operations based on a write operation instruction, wherein after the chip is reset, the state machine control unit is in an idle state; after the device register is mapped, the state machine control unit sorts instructions through the sorting unit based on the effective moment; after the device generates a start signal, the state machine control unit transmits the header information and data information of the write operation instruction to the back-stage interface module through the preprocessing and executing unit before the effective moment; and when the preprocessing and executing unit has transmitted an instruction to the back-stage interface module, the preprocessing and executing unit outputs a completion signal so that the state machine control unit enters a release state through the release unit, the state machine control unit continues to execute the next instruction, and when the last instruction has been executed, the state machine control unit returns to the idle state through the release unit.

Optionally, the sorting process includes: after sorting is triggered, first clearing the sorting result, and then sorting all valid instructions according to the effective moment in the header information; the preprocessing and executing unit transmits the header information and data information of the write operation instruction to the back-stage interface module before the effective moment based on the sorting result.

Optionally, the operations performed by the preprocessing and executing unit include: comparing the effective moment of the instruction with the time reference; and transmitting the header information and data information of the write operation in the instruction to the back-stage interface module K clock cycles before the effective moment, wherein the K clock cycles are equal to the time consumed by the operation of the back-stage interface module, and the value of K is determined by the communication protocol between the bus and the device.

Optionally, the back-stage interface module includes a read-write operation conversion unit and an output control unit, where the read-write operation conversion unit is connected with the front-stage interface module and the main control module, where the read-write operation conversion unit is used to keep the communication protocol of the bus and the device consistent; the output control unit is connected with the read-write operation conversion unit.

In order to achieve the above purpose, the present application provides a chip, which includes at least one hardware acceleration circuit for performing configuration scheduling on devices mounted on a bus inside the chip, where the hardware acceleration circuit corresponds to the devices one by one.

As described above, the hardware acceleration circuit and the chip have the following beneficial effects:

according to the hardware acceleration circuit and the chip, the device register is cached in advance through the custom compiling and configuration operation of the front-stage interface module, the corresponding operation is executed through the main control module, so that the computing resources of the chip are saved, the devices can be configured and scheduled at the accurate effective moment, the devices are mutually decoupled, and the hardware acceleration circuit and the chip have wide application scenes.

Drawings

Fig. 1 shows a schematic diagram of an exemplary chip internal simplified frame of the present application.

Fig. 2 shows a first schematic diagram of the hardware acceleration circuit of the present application.

FIG. 3 is a second schematic diagram of the hardware acceleration circuit of the present application.

Fig. 4 is a schematic diagram showing a state machine control unit performing a state jump according to the present application.

Fig. 5 shows a schematic diagram of a process of sorting by the sorting unit of the present application.

Fig. 6 shows a schematic diagram of the result of sorting instructions by effective moment according to the present application.

FIG. 7 is a schematic diagram illustrating the operation of the preprocessing and execution unit of the present application.

Fig. 8 shows a schematic diagram of the operations performed by the release unit of the present application.

Description of the reference numerals

1-a hardware acceleration circuit; 11-a front-stage interface module; 12-a main control module; 13-a back-stage interface module; 2-a time module.

Description of the embodiments

Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The application may also be practiced or applied through other, different embodiments, and the details in this description may be modified or varied without departing from the spirit and scope of the present application.

Please refer to fig. 1 to 8. It should be noted that the illustrations provided in the present embodiment merely illustrate the basic concept of the present application by way of illustration, and only the components related to the present application are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.

Fig. 1 shows a simplified block diagram of the inside of an SoC (system on chip), in which a processor sends register configurations over the bus to the devices mounted on it (the devices in fig. 1 are device 1, device 2 and device 3; the dashed lines represent the data flow). Assume that the SoC in fig. 1 is used in a communication system. Such a system places very strict requirements on system delay and on the execution order and execution time of each device. Further, if the communication system is a MIMO (Multiple-Input Multiple-Output) system, that is, a communication system using multiple antennas at both the transmitting end and the receiving end, the data of the multiple antenna receive/transmit channels must be aligned and the channels must operate simultaneously. In fig. 1 this corresponds to requiring device 1, device 2 and device 3 to operate simultaneously. However, the register configurations of the devices are issued serially on the bus, so device 1, device 2 and device 3 cannot operate simultaneously.

The conventional method is to add extra synchronization processing among devices 1, 2 and 3. Because existing communication systems are mostly implemented in hardware, the devices cannot be decoupled from one another, and because the usage scenarios of a communication system are flexible and changeable, such synchronization processing is very difficult to implement. In addition, the issuing process is generally configured and scheduled by software according to the service sequence, and the execution speed of software is limited by factors such as the processor and the memory. If configuration and scheduling are instead performed by a pure hardware circuit, their speed can be greatly improved and the computing resources of the processor and the memory can be saved. It should be noted that, for convenience of description, only three devices are shown in fig. 1; in actual use, the number of devices should be set as required.

Therefore, the application provides a hardware acceleration circuit and a chip, which are implemented as follows:

as shown in fig. 2 and 3, the present embodiment provides a hardware acceleration circuit 1 for performing configuration scheduling on devices mounted on a bus inside a chip, where the hardware acceleration circuit 1 includes: a front-stage interface module 11, a main control module 12 and a rear-stage interface module 13, wherein:

as shown in fig. 2 and 3, the front interface module 11 is connected to the bus, and is configured to perform custom compiling and configuration on an input read operation instruction of the access device and a write operation instruction of the acceleration device.

Specifically, as an example, as shown in fig. 3, the front-stage interface module 11 includes: address decoding unit, acceleration operation register, head memory unit Head Ram and Data memory unit Data Ram, wherein:

the acceleration operation register is connected with the address decoding unit and mapped to the corresponding equipment register through configuration;

It should be noted that operations on the bus directed at a device are divided into read operations and write operations. Because the processor controls a device by configuring its device registers, the operations that need to be accelerated precisely according to the time reference are write operations, while the operations that only access the device are read operations. Each write operation on the bus includes a write operation address and write operation data. The hardware acceleration circuit 1 defines its own internal instruction: a complete instruction comprises an instruction header and instruction data, where the instruction header contains the moment at which the operation takes effect and the instruction data contains the address and the data of the operation. The instruction header and the instruction data are stored separately, and each address stores one instruction. The following tables show the instruction format and the instruction storage layout, respectively:

More specifically, the head memory unit Head Ram and the data memory unit Data Ram have the same depth within the same hardware acceleration circuit. It should be noted that, when the bus transmits instructions to the front-stage interface module 11, a table is maintained in the memory Ram of the front-stage interface module 11 to record whether each instruction is valid. The depth of the memory Ram is assumed to be N, where N is a natural number greater than 1. As shown in the following table, the value at each address of the memory Ram indicates whether the instruction at that address is valid: the value 0 indicates invalid, the value 1 indicates valid, and the sequence number is the address in the memory Ram. When both header information and data information have been written to the same address, a complete instruction has been received at that address, so the value of the address is set to 1 to mark the instruction as valid. Valid instructions are mapped by the head memory unit Head Ram and the data memory unit Data Ram.
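The storage scheme described above can be illustrated with a small behavioural model. This is only a sketch in Python, not the circuit itself: the class name `FrontStageStore`, its method names and the example depth are hypothetical and do not appear in the application.

```python
from typing import List, Optional, Tuple


class FrontStageStore:
    """Behavioural model of Head Ram, Data Ram and the valid table (depth N)."""

    def __init__(self, depth: int) -> None:
        if depth <= 1:
            raise ValueError("depth N must be greater than 1")
        self.depth = depth
        # Head Ram: effective moment of each instruction.
        self.head_ram: List[Optional[int]] = [None] * depth
        # Data Ram: (device register address, register data) of each instruction.
        self.data_ram: List[Optional[Tuple[int, int]]] = [None] * depth
        # Valid table: 1 = complete instruction stored at this address, 0 = none.
        self.valid: List[int] = [0] * depth

    def write_head(self, addr: int, effective_moment: int) -> None:
        self.head_ram[addr] = effective_moment
        self._update_valid(addr)

    def write_data(self, addr: int, reg_addr: int, reg_data: int) -> None:
        self.data_ram[addr] = (reg_addr, reg_data)
        self._update_valid(addr)

    def _update_valid(self, addr: int) -> None:
        # An instruction is complete only when both halves have been written.
        if self.head_ram[addr] is not None and self.data_ram[addr] is not None:
            self.valid[addr] = 1

    def valid_addresses(self) -> List[int]:
        return [a for a in range(self.depth) if self.valid[a] == 1]


store = FrontStageStore(depth=8)
store.write_head(2, effective_moment=500)
store.write_data(2, reg_addr=0x10, reg_data=0x1)
store.write_head(4, effective_moment=900)  # header only, so address 4 stays invalid
```

In this model only address 2 is reported as valid, because address 4 has received a header but no data.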

It should be further noted that the front-stage interface module 11 may be implemented as an application specific integrated circuit (ASIC), whose computing power and efficiency can be customized to the requirements of a specific user and a specific electronic system. Alternatively, it may be implemented as an IP core (IP stands for Intellectual Property; an IP core is a piece of hardware description language code with a specific circuit function that is independent of the integrated circuit process and can be ported to different semiconductor processes to produce integrated circuit chips). Any implementation of the front-stage interface module 11 is applicable as long as it can perform custom compiling and configuration on the input read operation instruction for accessing the device and the write operation instruction for accelerating the device; it is not limited to this embodiment.

As shown in fig. 2 and 3, the main control module 12 is connected to the front-stage interface module 11, performs the corresponding operation on the write operation instruction before its effective moment based on the time reference provided by the chip, and transmits the compiling and configuration information corresponding to the write operation instruction to the lower-stage module, wherein the time reference is provided by the time module 2 inside the chip, and the lower-stage module refers to the back-stage interface module 13.

Specifically, as an example, as shown in fig. 2 and 3, the time module 2 includes a counter, wherein the counter counts cycles at a constant frequency after the chip reset is validated. It should be noted that the component for providing the time reference may be a phase-locked loop, a clock chip, etc., and any component is applicable as long as the component can provide the time reference, and is not limited to the embodiment.
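As a rough illustration of such a time reference, the sketch below models a free-running counter in Python. It assumes that the counter wraps around to zero when it overflows, which is consistent with the overflow handling described for the sorting step; the class name `CycleCounter` and the 4-bit width are hypothetical choices made only to keep the example short.

```python
class CycleCounter:
    """Free-running counter that wraps to 0 after reaching its maximum value."""

    def __init__(self, width_bits: int) -> None:
        self.modulus = 1 << width_bits
        self.value = 0  # value immediately after chip reset

    def tick(self, cycles: int = 1) -> int:
        # Advance by the given number of clock cycles, wrapping on overflow.
        self.value = (self.value + cycles) % self.modulus
        return self.value


counter = CycleCounter(width_bits=4)  # counts 0..15, then wraps
counter.tick(15)
wrapped = counter.tick(2)
```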

Specifically, as shown in fig. 2 and 3, as an example, the main control module 12 includes: a sorting unit, a state machine control unit, a preprocessing and executing unit and a release unit, wherein: the sorting unit, the preprocessing and executing unit and the release unit are all connected with the state machine control unit, and the state machine control unit controls the sorting unit, the preprocessing and executing unit and the release unit to execute corresponding operations based on the write operation instruction. For the state transitions of the state machine control unit, please refer to fig. 4:

after the chip is reset, the state machine control unit executes an idle state;

after the device register is mapped, the state machine control unit orders the instructions through the ordering unit based on the effective moment;

after the device generates a start signal, the state machine control unit transmits the header information and data information of the write operation instruction to the back-stage interface module before the effective moment through the preprocessing and executing unit;

when the preprocessing and executing unit has transmitted an instruction to the back-stage interface module, the preprocessing and executing unit outputs a completion signal so that the state machine control unit enters a release state through the release unit; the state machine control unit then continues to execute the next instruction, and when the last instruction has been executed, the state machine control unit returns to the idle state through the release unit.
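The state transitions listed above can be sketched as a small transition function. The state names `IDLE`, `SORT`, `EXECUTE` and `RELEASE` and the event flags are hypothetical labels chosen for this Python illustration; fig. 4 may name or group the states differently.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    SORT = auto()
    EXECUTE = auto()
    RELEASE = auto()


def next_state(state: State, *, mapped: bool = False, start: bool = False,
               done: bool = False, more_instructions: bool = False) -> State:
    if state is State.IDLE and mapped:
        return State.SORT        # device register mapped: sort instructions
    if state is State.SORT and start:
        return State.EXECUTE     # start signal: preprocess and execute
    if state is State.EXECUTE and done:
        return State.RELEASE     # completion signal: release the instruction
    if state is State.RELEASE:
        # Continue with the next instruction, or return to idle after the last one.
        return State.EXECUTE if more_instructions else State.IDLE
    return state                 # no qualifying event: stay in the same state


trace = [State.IDLE]
for event in ({"mapped": True}, {"start": True}, {"done": True},
              {"more_instructions": True}, {"done": True},
              {"more_instructions": False}):
    trace.append(next_state(trace[-1], **event))
```

The trace walks through two instructions: idle, sort, execute, release, execute, release, and back to idle.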

Further, the sorting process is as shown in fig. 5:

step one: when the ordering is triggered, the ordering result is emptied first, specifically, the ordering result queue is emptied and the read pointer is reset.

Step two: sort all valid instructions according to the effective moment in the header information. In this embodiment, the instructions are sorted in ascending order of effective moment. In particular, when the effective moment of an instruction is smaller than the current value of the time reference (the current count value, if the time reference is a counter), the instruction is considered to take effect in the next period, after the time reference overflows; when the effective moment of an instruction is larger than the current value of the time reference, the instruction is considered to take effect in the current period. Referring to fig. 6, if T1 < the current time reference < T2 at the time of sorting, the sorting result is T2 < T1, that is, T2 is ordered before T1.

Step three: and sequentially writing the ordered results into an ordered result queue, and maintaining a read pointer. The contents stored in the queue are Ram addresses of corresponding instructions, wherein the ordering result queue refers to the following table:

It should be noted that the ordering result queue stores the Ram addresses of the corresponding instructions, and sequence number 0 holds the maximum value of the ordering result. Assume that the Ram addresses of the valid instructions are 2, 4 and 6, and that their effective moments satisfy: the effective moment of address 6 < the current time reference < the effective moment of address 2 < the effective moment of address 4. According to the ordering criterion, the instruction at address 6 takes effect in the next period, after overflow, while the instructions at address 4 and address 2 take effect in the current period. The final ordering result queue is shown in the table, wherein the Ram address of the instruction corresponding to the read pointer is address 6.

Step four: after the sequencing is completed, the read pointer points to the Ram address of the instruction with the smallest effective moment, the state machine control unit generates a starting signal, and the preprocessing and executing unit transmits the head information and the data information of the write operation instruction to the later interface module before the effective moment based on the sequencing result.
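The ordering criterion of step two, including the overflow case, can be sketched as a sort with an adjusted key. This Python example is an illustration under stated assumptions: the function name, the counter period of 1024 and the concrete effective moments are invented, and an effective moment exactly equal to the current time reference (a case the text does not specify) is treated here as belonging to the current period. How the hardware queue indexes the result and positions its read pointer is not modeled.

```python
from typing import Dict, List


def sort_by_effective_moment(effective: Dict[int, int], now: int,
                             period: int) -> List[int]:
    """Return Ram addresses ordered from earliest to latest effective moment.

    effective maps a Ram address to the effective moment in its header.
    A moment below the current counter value is treated as belonging to the
    next counter period, so it sorts after all current-period moments.
    """
    def key(addr: int) -> int:
        t = effective[addr]
        return t + period if t < now else t

    return sorted(effective, key=key)


# Example from the text: moment(6) < now < moment(2) < moment(4).
order = sort_by_effective_moment({2: 600, 4: 900, 6: 100}, now=500, period=1024)
```

With these numbers the instructions at addresses 2 and 4 take effect in the current period, and the instruction at address 6 follows after the counter overflows.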

Further, the operations performed by the preprocessing and executing unit are shown in fig. 7, and include:

step five: after receiving the starting signal, the preprocessing and executing unit acquires the instruction head and the instruction data of the Ram address of the instruction pointed by the read pointer.

Step six: compare the effective moment of the instruction with the time reference, and judge whether the time reference is equal to the effective moment of the instruction minus K clock cycles. If so, continue to step seven; if not, return to step five. Here the K clock cycles are K system clock cycles and are equal to the time consumed by the operation of the back-stage interface module; the value of K is determined by the communication protocol between the bus and the device, and the setting of the communication protocol is not described in detail herein.

Step seven: and transmitting the header information and the data information of the write operation instruction in the instruction to a later interface module.

Step eight: the preprocessing and executing unit generates a completion signal to enable the state machine control unit to enter a release state through the release unit.
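The comparison in step six amounts to waiting until the time reference reaches the effective moment minus K. The Python sketch below illustrates this with a wrapping counter; the modular subtraction for effective moments smaller than K is an assumption added for the example, and the function names and numbers are hypothetical.

```python
def dispatch_cycle(effective_moment: int, k: int, period: int) -> int:
    """Counter value at which the instruction must be sent to the back stage."""
    return (effective_moment - k) % period


def run_until_dispatch(effective_moment: int, k: int, period: int,
                       start: int) -> int:
    """Step a wrapping counter from `start`; return the cycles waited."""
    target = dispatch_cycle(effective_moment, k, period)
    now, waited = start, 0
    while now != target:          # step six: compare on every clock cycle
        now = (now + 1) % period
        waited += 1
    return waited                 # step seven would transmit header and data here


waited = run_until_dispatch(effective_moment=600, k=4, period=1024, start=500)
```

With an effective moment of 600 and K = 4, the instruction is dispatched at counter value 596, so that the back-stage interface module finishes its work exactly at the effective moment.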

Further, the operations performed by the release unit, as shown in fig. 8, include:

step nine: after receiving the completion signal generated by the preprocessing and executing unit, the value of the Ram address of the instruction pointed by the read pointer is set to 0.

Step ten: maintain the read pointer by pointing it to the last member of the ordering result queue, and judge whether the member pointed to by the read pointer exists. If so, jump to step four and run the preprocessing and executing unit; if not, the state machine control unit returns to the idle state through the release unit.
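Steps nine and ten can be sketched as follows. This Python model makes two assumptions that the text does not fix: the queue is represented as a list in dispatch order, and the read pointer moves to the adjacent list element by incrementing an index. The function name `release` is hypothetical.

```python
from typing import List, Optional


def release(valid: List[int], queue: List[int], read_ptr: int) -> Optional[int]:
    """Clear the finished instruction and move the read pointer.

    queue holds Ram addresses in dispatch order; read_ptr indexes into it.
    Returns the new read pointer, or None when no member is left (idle).
    """
    valid[queue[read_ptr]] = 0          # step nine: mark the instruction invalid
    nxt = read_ptr + 1                  # step ten: move to the adjacent member
    return nxt if nxt < len(queue) else None


valid = [0, 0, 1, 0, 1, 0, 1, 0]        # instructions at Ram addresses 2, 4, 6
queue = [2, 4, 6]                       # dispatch order (assumed)
ptr: Optional[int] = 0
executed = []
while ptr is not None:
    executed.append(queue[ptr])         # stands in for the execute step
    ptr = release(valid, queue, ptr)
```

After the loop every valid flag has been cleared and the model would return to the idle state.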

It should be further noted that the main control module 12 may be implemented as an application specific integrated circuit or as an IP core. Any implementation of the main control module 12 is applicable as long as it can perform the corresponding operation on the write operation instruction before its effective moment based on the time reference provided by the chip and transmit the compiling and configuration information corresponding to the write operation instruction to the lower-stage module; it is not limited to this embodiment.

As shown in fig. 2 and 3, the back-end interface module 13 is connected with the front-end interface module 11 and the main control module 12, is used for accessing the device based on transmitting a read operation instruction, and schedules the device based on received compiling and configuration information; wherein the devices performing the scheduling are mutually decoupled from each other under the action of the corresponding hardware acceleration circuit 1.

Specifically, as an example, as shown in fig. 2 and fig. 3, the back-stage interface module 13 includes a read-write operation conversion unit and an output control unit, where the read-write operation conversion unit is connected to the front-stage interface module 11 and the main control module 12 and is used to keep the communication protocols of the bus and the device consistent (the specific operation process is not described in detail herein); the output control unit is connected with the read-write operation conversion unit.

It should be noted that the back-stage interface module 13 may be implemented as an application specific integrated circuit or as an IP core. Any implementation of the back-stage interface module 13 is applicable as long as it can access the device based on the transmitted read operation instruction and accelerate the device based on the received compiling and configuration information; it is not limited to this embodiment.

The embodiment also provides a chip, which comprises at least one hardware acceleration circuit according to the embodiment and is used for carrying out configuration scheduling on equipment mounted on a bus in the chip, wherein the hardware acceleration circuit corresponds to the equipment one by one.
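To illustrate why one accelerator per device allows several devices to take effect at the same moment, the Python sketch below contrasts serial issuing with timed dispatch. All numbers are invented: the per-device latencies in `K_PER_DEVICE` and the assumption that a serially issued write occupies the bus for K cycles are hypothetical simplifications, not values from the application.

```python
from typing import Dict

# Hypothetical latency K of each back-stage interface, in clock cycles.
K_PER_DEVICE: Dict[str, int] = {"device1": 3, "device2": 5, "device3": 4}


def serial_arrivals(start: int) -> Dict[str, int]:
    """Writes issued back to back on the bus: each lands later than the last."""
    arrivals, now = {}, start
    for name, k in K_PER_DEVICE.items():
        now += k
        arrivals[name] = now
    return arrivals


def accelerated_arrivals(effective_moment: int) -> Dict[str, int]:
    """Each accelerator dispatches K cycles early, so the write lands on time."""
    arrivals = {}
    for name, k in K_PER_DEVICE.items():
        dispatch = effective_moment - k   # computed independently per device
        arrivals[name] = dispatch + k
    return arrivals


serial = serial_arrivals(start=1000)
timed = accelerated_arrivals(effective_moment=1000)
```

In the serial case the three writes land at different cycles, whereas with per-device timed dispatch all three land on the shared effective moment without any synchronization logic between the devices.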

It should be noted that the chip may be implemented with an FPGA (Field Programmable Gate Array), or with an application specific integrated circuit, an IP core, etc. The specific implementation should take the actual usage scenario into account and is not described in detail here.

In summary, the hardware acceleration circuit and the chip of the present application are used for performing one-to-one acceleration operation on a device mounted on a bus inside the chip, and at least include: front-stage interface module, main control module and back-stage interface module, wherein: the front-stage interface module is connected with the bus and is used for carrying out custom compiling and configuration on an input read operation instruction of the access equipment and a write operation instruction of the acceleration equipment; the master control module is connected with the front-stage interface module, performs corresponding operation on the write operation instruction based on the time reference provided by the chip before the effective moment, and transmits compiling and configuration information corresponding to the write operation instruction to the lower-stage module; the back-stage interface module is connected with the front-stage interface module and the main control module, is used for equipment access based on transmission of a read operation instruction, and accelerates the equipment based on received compiling and configuration information; the devices for executing acceleration are mutually decoupled under the action of the corresponding hardware acceleration circuits. According to the hardware acceleration circuit and the chip, the device register is cached in advance through the custom compiling and configuration operation of the front-stage interface module, the corresponding operation is executed through the main control module, so that the computing resources of the chip are saved, the devices can be configured and scheduled at the accurate effective moment, the devices are mutually decoupled, and the hardware acceleration circuit and the chip have wide application scenes. Therefore, the application effectively overcomes various defects in the prior art and has high industrial utilization value.

The above embodiments merely illustrate the principles and effects of the present application and are not intended to limit it. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations made by persons of ordinary skill in the art without departing from the spirit and scope of the present disclosure shall still be covered by the claims of the application.

Claims

1. A hardware acceleration circuit for performing configuration scheduling on devices mounted on a bus inside a chip, the hardware acceleration circuit comprising: front-stage interface module, main control module and back-stage interface module, wherein:

2. The hardware acceleration circuit of claim 1, wherein: the time module comprises a counter, wherein the counter performs cycle counting at a constant frequency after chip reset is effective.

3. The hardware acceleration circuit of claim 1, wherein: the front interface module includes: address decoding unit, accelerating operation register, head memory unit and data memory unit, wherein:

4. A hardware acceleration circuit according to claim 3, characterized in that: the head memory unit and the data memory unit are identical in depth in the same hardware acceleration circuit.

5. The hardware acceleration circuit of claim 4, wherein: the main control module comprises: a sorting unit, a state machine control unit, a preprocessing and executing unit and a release unit, wherein: the sorting unit, the preprocessing and executing unit and the release unit are all connected with the state machine control unit, and the state machine control unit controls the sorting unit, the preprocessing and executing unit and the release unit to execute corresponding operations based on a write operation instruction, wherein after the chip is reset, the state machine control unit is in an idle state; after the device register is mapped, the state machine control unit sorts instructions through the sorting unit based on the effective moment; after the device generates a start signal, the state machine control unit transmits the header information and data information of the write operation instruction to the back-stage interface module through the preprocessing and executing unit before the effective moment; and when the preprocessing and executing unit has transmitted an instruction to the back-stage interface module, the preprocessing and executing unit outputs a completion signal so that the state machine control unit enters a release state through the release unit, the state machine control unit continues to execute the next instruction, and when the last instruction has been executed, the state machine control unit returns to the idle state through the release unit.

6. The hardware acceleration circuit of claim 5, wherein: the process of sorting comprises the following steps: after the sorting is triggered, firstly clearing the sorting result, and then sorting all the effective instructions according to the effective time of the header information; the preprocessing and executing unit transmits header information and data information of the write operation instruction to the later interface module before the effective moment based on the sequencing result.

7. The hardware acceleration circuit of claim 5, wherein: the operations performed by the preprocessing and executing unit include: comparing the effective time of the instruction with a time reference; and transmitting the head information and the data information of the write operation in the instruction to the later interface module by K clock cycles before the effective moment, wherein the K clock cycles are equal to the time consumption of the work of the later interface module, and the number of K is determined by a communication protocol between a bus and equipment.

8. The hardware acceleration circuit of claim 1, wherein: the back-stage interface module comprises a read-write operation conversion unit and an output control unit, wherein the read-write operation conversion unit is connected with the front-stage interface module and the main control module, and the read-write operation conversion unit is used for keeping the communication protocol of a bus and equipment consistent; the output control unit is connected with the read-write operation conversion unit.

9. A chip, characterized in that: the chip comprises at least one hardware acceleration circuit as set forth in any one of claims 1-8, for performing configuration scheduling on devices mounted on the bus inside the chip, wherein the hardware acceleration circuit corresponds to the devices one by one.