CN116414746A

CN116414746A - Systolic bus for improving bandwidth utilization of an HBM chip, and data processing method

Info

Publication number: CN116414746A
Application number: CN202310177002.7A
Authority: CN
Inventors: 陈铖
Original assignee: Jiuzhi Suzhou Intelligent Technology Co ltd
Current assignee: Jiuzhi Suzhou Intelligent Technology Co ltd
Priority date: 2023-02-28
Filing date: 2023-02-28
Publication date: 2023-07-11

Abstract

The embodiments of this specification disclose a systolic bus for improving the bandwidth utilization of an HBM chip, and a data processing method. The systolic bus comprises a systolic array formed by data processing nodes, wherein data connections are established between adjacent data processing nodes. The first row of data processing nodes is connected to the HBM chip and reads data from the HBM chip at the start of a beat period. After the current data processing node finishes its data processing, it responds to the corresponding beat and transmits data to the adjacent data processing nodes according to a directional data transmission rule. The last row of data processing nodes is connected to the HBM chip and, in response to the corresponding beat, writes data to the HBM chip to complete the data processing operation of the current beat period. The scheme raises the upper limit of the bandwidth of the HBM system and supports data transmission at larger bandwidth; at the same time, it makes full use of the bandwidth provided by the HBM and improves bandwidth utilization.

Description

Systolic bus for improving bandwidth utilization of an HBM chip, and data processing method

Technical Field

The present disclosure relates to the field of computer architecture, and in particular to a systolic bus, a data processing method, an electronic device, and a storage medium for improving the bandwidth utilization of an HBM chip.

Background

With the rapid development of algorithms, the demand for chip computing power is also increasing rapidly. High-compute-power chips require a high-bandwidth on-chip system to supply data reads and writes to the computing part of the chip, and the development of high bandwidth memory (HBM) meets this demand. HBM is a class of high-performance DRAM that has evolved rapidly in recent years and provides very high bandwidth for systems on chip. The development of HBM also presents challenges for network-on-chip (NoC) architecture. Because conventional chips are limited by their maximum frequency and by layout and routing, the available system bandwidth is limited, and the high bandwidth of HBM is difficult to utilize fully. Existing system-on-chip bus structures include Mesh bus structures, star bus structures, and the like. Existing bus structures cannot fully meet the bandwidth requirements of high-compute-power chips with HBM and cannot fully utilize the high bandwidth that HBM provides; how to overcome these defects and optimize the bus structure is a technical problem to be solved urgently.

Disclosure of Invention

An object of the embodiments of the present disclosure is to provide a ripple bus, a data processing method, an electronic device, and a storage medium, which can improve the bandwidth utilization of an HBM chip.

In order to solve the above technical problems, the embodiments of the present specification are implemented as follows:

In a first aspect, a systolic bus for improving the bandwidth utilization of an HBM chip is provided. The systolic bus includes a systolic array formed by arranging data processing nodes, wherein data connections are established between adjacent data processing nodes. The first row of data processing nodes is connected to the HBM chip and reads data from the HBM chip at the start of a beat period. After the current data processing node finishes its data processing, it responds to the corresponding beat and transmits data to the adjacent data processing nodes according to the directional data transmission rule. The last row of data processing nodes is connected to the HBM chip and, in response to the corresponding beat, writes data to the HBM chip to complete the data processing operation of the current beat period.

Further, the systolic array is an array formed by m×n data processing nodes; wherein m and n are the number of rows and columns of the systolic array, respectively, and m and n are both greater than 1.

Further, the data processing nodes are configured to implement the same or different data processing functions, where the data processing functions include at least one of data operation, data storage, and data interface call.

Further, the data processing node responds to a preset instruction corresponding to the beat to execute a corresponding data processing function.

Further, the data-directed transfer rule includes transferring data to neighboring systolic nodes above and to the right of the current data processing node.

Further, according to the size of the pulse array, the data directional transmission rule and the relevance between the data output by the last row of data processing nodes in the current beat period and the data read from the HBM chip at the beginning time of the next beat period, the beat number of the beat period is determined, wherein the beat number takes a value range of [ m, m+n ].

Further, the HBM chip temporarily stores the data processing result of the data processing node.

In a second aspect, a data processing method for improving the bandwidth utilization of an HBM chip is provided, wherein the HBM chip sends data to be processed to a systolic bus according to a preset beat; the systolic bus comprises a systolic array formed by arranging data processing nodes, and data connections are established between adjacent data processing nodes; during a beat period, the method comprises:

a first row of data processing nodes connected with the HBM chip read data from the HBM chip at the beginning time of a beat period;

after the current data processing node finishes corresponding data processing, responding to the corresponding beat, and transmitting data to the adjacent data processing nodes according to the data directional transmission rule;

and the last row of data processing nodes connected with the HBM chip responds to the corresponding beat to write data into the HBM chip so as to complete the data processing operation of the current beat period.

Further, the data processing nodes are used for realizing the same or different data processing functions, and the data processing functions comprise at least one of data operation, data storage and data interface call; and/or the data processing node responds to a preset instruction corresponding to the beat to execute the corresponding data processing function.

Further, the data directional transfer rule includes transferring data to adjacent systolic nodes above and to the right of the current data processing node; and/or determining the number of beats of the beat period according to the size of the pulse array, the data directional transmission rule and the relevance between the data output by the last row of data processing nodes in the current beat period and the data read from the HBM chip at the beginning time of the next beat period, wherein the range of the number of beats is [ m, m+n ].

In a third aspect, an electronic device is provided, comprising: a processor; and

a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the method of the second aspect.

In a fourth aspect, a computer readable storage medium is presented, the computer readable storage medium storing one or more programs, which when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of the second aspect.

The specification can achieve at least the following technical effects:

By constructing a systolic array of data processing nodes to form a systolic bus structure, and by designing the corresponding node connection scheme and the data transmission scheme of the systolic bus, the scheme of the invention can effectively raise the upper limit of the bandwidth of the HBM system and support data transmission at larger bandwidth; at the same time, it makes full use of the bandwidth provided by the HBM and improves bandwidth utilization.

Drawings

In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some of the embodiments described in the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of a systolic bus structure for improving the bandwidth utilization of an HBM chip according to an embodiment of the present disclosure.

Fig. 2 is a first schematic diagram of the beat pipeline of a systolic bus for improving the bandwidth utilization of an HBM chip according to an embodiment of the present disclosure.

Fig. 3 is a second schematic diagram of the beat pipeline of a systolic bus for improving the bandwidth utilization of an HBM chip according to an embodiment of the present disclosure.

Fig. 4 is a third schematic diagram of the beat pipeline of a systolic bus for improving the bandwidth utilization of an HBM chip according to an embodiment of the present disclosure.

Fig. 5 is a fourth schematic diagram of the beat pipeline of a systolic bus for improving the bandwidth utilization of an HBM chip according to an embodiment of the present disclosure.

Fig. 6 is a schematic flow chart of a data processing method for improving the bandwidth utilization of an HBM chip according to an embodiment of the present disclosure.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.

The systolic bus scheme for improving the bandwidth utilization of an HBM chip referred to in this specification is detailed below by way of specific examples.

Key terms

High bandwidth memory (HBM): a memory chip (i.e., RAM) for CPUs/GPUs in which multiple DDR dies are stacked together and packaged with the GPU to form a large-capacity, wide-bit-width DDR array. The HBM stack is not physically integrated into the CPU or GPU but is connected to it compactly and quickly through an interposer, so HBM has almost the same characteristics as on-chip RAM and therefore offers higher speed and higher bandwidth. The advent of on-chip HBM allows AI/deep-learning workloads to be placed entirely on chip; it raises the integration level while freeing bandwidth from the limit imposed by the number of chip interconnect pins, which relieves the I/O bottleneck to some extent.

Bus structure: buses are divided into internal buses, system buses, and external buses. Internal buses are further divided into on-chip buses and component-level buses. An on-chip bus connects the parts inside an integrated circuit chip. Early chips had few internal modules and a simple structure, and used star, fully connected, or crossbar (switch) topologies. As multi-core processors gradually replaced single-core processors, the number of IP blocks in a chip grew, and handling the communication among them became a key lever for chip performance. Newer on-chip buses have three basic structures: (1) the ring bus, currently widely used in chips for the consumer and server markets; (2) the Mesh bus, mainly used in server chips; (3) the Torus bus, derived from the Mesh by connecting the nodes of each row and column end to end to form multiple rings, which can be regarded as a combination of (1) and (2). In addition, there is the tree bus structure, i.e. a unidirectional loop, in which the failure of one node stops the subsequent nodes from working.

Systolic array: an array structure whose name reflects that it operates much like the human blood circulation system. In such an array, data "flows" rhythmically between the processing elements in a predetermined, pipelined fashion. As the data flows, all processing elements process the data reaching them simultaneously and in parallel, so the array achieves a high parallel processing speed. Because the data flow pattern is preset, all processing is completed between the moment data enters the processing element array and the moment it leaves; the data does not need to be input again, and only the boundary processing elements of the array communicate with the outside. The processing speed of the array is therefore raised without increasing its input/output rate. Because the array and the processing elements have simple structures and regular layout, a high degree of modularity can be achieved, which suits the design and manufacture of very large scale integrated circuits.

The invention addresses the technical problem that existing bus structures cannot fully meet the bandwidth requirements of high-compute-power chips with HBM and cannot fully utilize the high bandwidth that HBM provides, and it optimizes the bus structure accordingly. In general, improving HBM bandwidth characteristics can be considered from three perspectives: performance, power consumption, and area. Bandwidth refers to the amount of data that can be transmitted per unit time, and the high bandwidth that HBM supports makes it mainly useful in high-performance computing scenarios. The simplest way to increase bandwidth is to increase the number of data transmission lines; a similar result can be achieved by raising the system clock frequency or enlarging the chip area. As for the number of data transmission lines, each HBM actually has up to 1024 data pins, and the number of data transmission paths inside the HBM has grown significantly with each product generation. However, chip size limits further growth of the transmission paths, because each added data transmission line also requires its own transmit/receive circuit. In addition, as the number of transmission lines increases, it becomes harder to match the length and layout of all the lines, so the operating speed cannot be raised. Selecting an appropriate bus structure is therefore critical to reaching the best balance between HBM bandwidth utilization and computation speed.
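The relationship between the number of transmission lines, the per-line rate, and bandwidth can be illustrated with a short calculation. This is only a sketch: the 1024 data pins come from the text above, while the per-pin rate of 2.0 Gbit/s and the function name `peak_bandwidth_gb_per_s` are assumptions chosen for illustration.

```python
def peak_bandwidth_gb_per_s(data_pins: int, gbit_per_s_per_pin: float) -> float:
    """Peak bandwidth in GB/s: pins times per-pin rate (Gbit/s), divided by 8 bits per byte."""
    return data_pins * gbit_per_s_per_pin / 8


# 1024 data pins is the figure from the text; 2.0 Gbit/s per pin is an assumed value.
baseline = peak_bandwidth_gb_per_s(1024, 2.0)
wider = peak_bandwidth_gb_per_s(2048, 2.0)   # doubling the number of transmission lines
faster = peak_bandwidth_gb_per_s(1024, 4.0)  # doubling the per-pin rate instead

print(baseline, wider, faster)
```

Under these assumed numbers, doubling the lines and doubling the per-line rate give the same peak figure, which is why the text treats them as alternative routes with different costs in area and circuit complexity.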

Example 1

To improve the bandwidth utilization of HBM, the embodiment of the invention provides a systolic bus structure whose bandwidth utilization is higher than that of existing bus structures. The embodiment also explains in detail how the data processing nodes are connected in the systolic bus topology and how the systolic bus transmits data. Specifically, the systolic bus structure shown in Figs. 1 to 5 is formed by a 4×4 systolic array, and the following embodiments also use the systolic bus structure formed by the 4×4 systolic array as the example. It should be understood that the size of the systolic array can be adjusted according to the bandwidth-utilization requirement of the HBM chip, and that data transmission details such as the pipeline beats will change with the size of the systolic array. However, every improvement that raises the bandwidth utilization of the HBM chip by constructing a systolic array and applying the designed node connection scheme and systolic bus data transmission scheme falls within the protection scope of the technical scheme of the embodiment of the invention.

Fig. 1 shows a systolic bus for improving the bandwidth utilization of an HBM chip according to an embodiment of the present invention. The systolic bus comprises a systolic array formed by arranging data processing nodes, wherein data connections are established between adjacent data processing nodes. The first row of data processing nodes is connected to the HBM chip and reads data from the HBM chip at the start of a beat period. After the current data processing node finishes its data processing, it responds to the corresponding beat and transmits data to the adjacent data processing nodes according to the directional data transmission rule. The last row of data processing nodes is connected to the HBM chip and, in response to the corresponding beat, writes data to the HBM chip to complete the data processing operation of the current beat period.

Optionally, the systolic array is an array of m×n data processing nodes; wherein m and n are the number of rows and columns of the systolic array, respectively, and m and n are both greater than 1.

Optionally, the data processing nodes are configured to implement the same or different data processing functions, where the data processing functions include at least one of data operation, data storage, and data interface call. In particular, the data processing nodes act as nodes in the bus structure network, requiring data to be read from or written to other nodes on the bus.

Optionally, the data processing node responds to a preset instruction corresponding to the beat to execute a corresponding data processing function.
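One possible way to picture a node that executes a preset instruction for each beat is a small lookup table from beat index to function. The sketch below is illustrative only: the class `DataProcessingNode`, the functions `operate` and `store`, and the fallback to `store` when no instruction is preset are all hypothetical and are not taken from the specification.

```python
from typing import Callable, Dict


def operate(value: int) -> int:
    """Hypothetical 'data operation' instruction: double the value."""
    return value * 2


def store(value: int) -> int:
    """Hypothetical 'data storage' instruction: keep the value unchanged."""
    return value


class DataProcessingNode:
    """A node holding a preset instruction for each beat (illustrative model)."""

    def __init__(self, name: str, program: Dict[int, Callable[[int], int]]):
        self.name = name
        self.program = program  # beat index -> preset instruction

    def on_beat(self, beat: int, value: int) -> int:
        # Assumption: beats without a preset instruction fall back to 'store'.
        instruction = self.program.get(beat, store)
        return instruction(value)


node = DataProcessingNode("Node00", {0: operate, 1: store})
results = [node.on_beat(beat, 21) for beat in range(3)]
print(results)
```

Nodes built this way can share one program or carry different ones, matching the statement that nodes may implement the same or different functions.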

Optionally, the data-directed transfer rule includes transferring data to adjacent systolic nodes above and to the right of the current data processing node.

Optionally, according to the size of the systolic array, the data directional transmission rule and the correlation between the data output by the last row of data processing nodes in the current beat period and the data read from the HBM chip at the beginning time of the next beat period, determining the beat number of the beat period, wherein the beat number takes a value range of [ m, m+n ].

Optionally, the HBM chip temporarily stores a data processing result of the data processing node.

Specifically, the systolic bus topology, the data node connections, and the data transmission scheme of the systolic bus disclosed in the embodiments of the present invention are described in detail with reference to the diagrams shown in Figs. 2 to 5. Since the systolic bus structure shown in Figs. 2 to 5 is a 4×4 systolic array, m=4 and n=4. The first row of data processing nodes is Node00, Node01, Node02, Node03, and the last row of data processing nodes is Node30, Node31, Node32, Node33. Data flows in the direction of the node arrows shown in Figs. 2 to 5, i.e. to the adjacent data processing nodes upward and to the right in turn.
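The 4×4 layout and its directed links can be sketched as a small adjacency table. The node names follow the text (Node00 to Node33); the helper names `node_name` and `build_links` are hypothetical, and the sketch assumes that "upward" in the figures corresponds to moving from row i to row i+1, i.e. from the first row toward the last row.

```python
from typing import Dict, List


def node_name(row: int, col: int) -> str:
    """Name a node as in the text, e.g. row 3, column 0 -> 'Node30' (single-digit indices)."""
    return f"Node{row}{col}"


def build_links(m: int, n: int) -> Dict[str, List[str]]:
    """Map each node to the neighbors it sends data to under the directional rule."""
    links: Dict[str, List[str]] = {}
    for r in range(m):
        for c in range(n):
            targets = []
            if r + 1 < m:
                targets.append(node_name(r + 1, c))  # assumed 'upward': toward the last row
            if c + 1 < n:
                targets.append(node_name(r, c + 1))  # to the right
            links[node_name(r, c)] = targets
    return links


links = build_links(4, 4)
first_row = [node_name(0, c) for c in range(4)]  # nodes that read from the HBM chip
last_row = [node_name(3, c) for c in range(4)]   # nodes that write to the HBM chip
print(first_row, last_row, links["Node00"])
```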

As shown in Fig. 2, for one pipeline beat period, data enters Node00, Node01, Node02, Node03 from the HBM chip simultaneously at time P0. After being processed by the adjacent nodes upward and to the right, the data passes through Node30 at time P4; if the data written to the HBM chip through Node30 at time P4 is not the data needed by the next beat period, execution continues to the next beat, P5. When the data passes through Node31 at time P5, if the data written to the HBM chip through Node31 at time P5 is not the data needed by the next beat period, execution continues to the next beat, P6. When the data passes through Node32 at time P6, if the data written to the HBM chip through Node32 at time P6 is not the data needed by the next beat period, execution continues to the next beat, P7. The data then passes through Node33 at time P7 and is written to the HBM chip through Node33; whether or not this data is needed by the next beat period, all beats of the current period have now been executed, and the next beat period begins. Meanwhile, the HBM chip temporarily stores the data results computed from P0 to P7 for use in computing the corresponding data.

As shown in Fig. 3, for one pipeline beat period, data enters Node00, Node01, Node02, Node03 from the HBM chip simultaneously at time P0. After being processed by the adjacent nodes upward and to the right, the data passes through Node30 at time P4; if the data written to the HBM chip through Node30 at time P4 is not the data needed by the next beat period, execution continues to the next beat, P5. When the data passes through Node31 at time P5, if the data written to the HBM chip through Node31 at time P5 is not the data needed by the next beat period, execution continues to the next beat, P6. When the data passes through Node32 at time P6, if the data written to the HBM chip through Node32 at time P6 is the data needed by the next beat period, the current beat period is complete at that moment, and the next beat period begins. Meanwhile, the HBM chip temporarily stores the data results computed from P0 to P6 for use in computing the corresponding data.

Similarly, as shown in Fig. 4, for one pipeline beat period, data enters Node00, Node01, Node02, Node03 from the HBM chip simultaneously at time P0. After being processed by the adjacent nodes upward and to the right, the data passes through Node31 at time P5; if the data written to the HBM chip through Node31 at time P5 is the data needed by the next beat period, the current beat period is complete at that moment, and the next beat period begins. Meanwhile, the HBM chip temporarily stores the data results computed from P0 to P5 for use in computing the corresponding data.

As shown in Fig. 5, for one pipeline beat period, data enters Node00, Node01, Node02, Node03 from the HBM chip simultaneously at time P0. After being processed by the adjacent nodes upward and to the right, the data passes through Node30 at time P4; if the data written to the HBM chip through Node30 at time P4 is the data needed by the next beat period, the current beat period is complete at that moment, and the next beat period begins.
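The four cases in Figs. 2 to 5 follow one pattern: the last-row node in column j writes at time P(4+j), and the period ends at the first such write whose data the next period needs, or at Node33 otherwise. The sketch below encodes that pattern. The function name `period_end_beat` is hypothetical, and generalizing the write time to m + j for an arbitrary m×n array is an assumption extrapolated from the 4×4 example, not a statement from the specification.

```python
from typing import Optional


def period_end_beat(m: int, n: int, needed_col: Optional[int]) -> int:
    """Return k such that the beat period ends at time Pk.

    needed_col is the column j of the first last-row node whose write-back
    is needed by the next beat period, or None if no earlier write is needed
    and the period runs through the final column.
    """
    if needed_col is None:
        return m + n - 1  # the write through the final last-row node ends the period
    if not 0 <= needed_col < n:
        raise ValueError("needed_col must be a valid column index")
    return m + needed_col  # assumed write time of the last-row node in column j


fig2 = period_end_beat(4, 4, None)  # runs through Node33
fig3 = period_end_beat(4, 4, 2)     # Node32 supplies the needed data
fig4 = period_end_beat(4, 4, 1)     # Node31 supplies the needed data
fig5 = period_end_beat(4, 4, 0)     # Node30 supplies the needed data
print(fig2, fig3, fig4, fig5)
```

For the 4×4 array this reproduces the end times P7, P6, P5, and P4 described for Figs. 2 to 5.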

By constructing a systolic array of data processing nodes to form a systolic bus structure, together with the designed node connection scheme and the data transmission scheme of the systolic bus, the scheme of the invention can effectively raise the upper limit of the bandwidth of the HBM system and support data transmission at larger bandwidth; at the same time, it makes full use of the bandwidth provided by the HBM and improves bandwidth utilization.

Example two

Fig. 6 is a flow chart of a data processing method for improving the bandwidth utilization of an HBM chip according to an embodiment of the present invention. The HBM chip sends the data to be processed to the systolic bus according to a preset beat; the systolic bus comprises a systolic array formed by arranging data processing nodes, and data connections are established between adjacent data processing nodes.

During a beat period, the method comprises:

S1: the first row of data processing nodes connected to the HBM chip reads data from the HBM chip at the start of the beat period.

S2: after the current data processing node finishes corresponding data processing, responding to the corresponding beat, and transmitting data to the adjacent data processing nodes according to the data directional transmission rule.

Further, the data directional transfer rule includes transferring data to adjacent systolic nodes above and to the right of the current data processing node; and/or determining the number of beats of the beat period according to the size of the pulse array, the data directional transmission rule and the relevance between the data output by the last row of data processing nodes in the current beat period and the data read from the HBM chip at the beginning time of the next beat period, wherein the range of the number of beats is [ m+1, m+n ].

S3: the last row of data processing nodes connected to the HBM chip responds to the corresponding beat and writes data to the HBM chip, completing the data processing operation of the current beat period.
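Steps S1 to S3 can be traced with a deliberately simplified simulation. The sketch below models only the transfers from one row to the next and omits the rightward transfers; the function name `run_beat_period`, the per-node operation (adding 1), and the input values are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def run_beat_period(
    hbm_in: List[int], m: int, process: Callable[[int], int]
) -> Tuple[int, List[int]]:
    """Simulate one beat period of an m-row array, one value per column."""
    # S1: at beat P0 the first row reads one value per column from the HBM.
    in_flight = list(hbm_in)
    beat = 0
    for _row in range(m):
        # S2: every node in this row processes its value; the result is handed
        # on at the next beat (to the next row, or to the HBM after the last row).
        in_flight = [process(value) for value in in_flight]
        beat += 1
    # S3: the last row writes its results back to the HBM at this beat.
    return beat, in_flight


write_beat, hbm_out = run_beat_period([10, 20, 30, 40], m=4, process=lambda v: v + 1)
print(write_beat, hbm_out)
```

With four rows, the first write-back lands on beat 4, which agrees with Node30 writing at time P4 in the 4×4 example of Example 1.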

Example III

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present specification. Referring to Fig. 7, at the hardware level, the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.

The processor, network interface, and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on. For ease of illustration, only one bidirectional arrow is shown in Fig. 7, but this does not mean that there is only one bus or only one type of bus.

The memory is used for storing programs. Specifically, a program may include program code, and the program code includes computer operating instructions. The memory may include internal memory and non-volatile storage, and provides instructions and data to the processor.

The processor reads the corresponding computer program from the non-volatile storage into internal memory and then runs it, forming at the logical level an apparatus that carries out the data processing method. The processor executes the programs stored in the memory, and is specifically used for executing the following operations:

the HBM chip sends the data to be processed to the systolic bus according to a preset beat; the systolic bus comprises a systolic array formed by arranging data processing nodes, and data connections are established between adjacent data processing nodes; during a beat period, the method comprises steps S1 to S3 of the embodiment shown in Fig. 6.

The data processing method disclosed in the embodiment shown in Fig. 6 of the present specification can be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present specification. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the embodiments of the present specification may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.

Of course, in addition to the software implementation, the electronic device of the embodiments of the present disclosure does not exclude other implementations, such as a logic device or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to individual logic units, but may also be hardware or a logic device.

Example IV

The present specification also proposes a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the data processing method of the embodiment shown in Fig. 6.

In summary, the foregoing description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the protection scope of the present specification.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in a progressive manner. Identical and similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the corresponding parts of the description of the method embodiments.

Claims

1. A systolic bus for improving the bandwidth utilization of an HBM chip, characterized by comprising a systolic array formed by an arrangement of data processing nodes; wherein data connections are established between adjacent data processing nodes; the first row of data processing nodes is connected to the HBM chip and is configured to read data from the HBM chip at the start time of a beat period; after a current data processing node completes its corresponding data processing, it transmits data, in response to the corresponding beat, to the adjacent data processing nodes according to a data directional transmission rule; and the last row of data processing nodes is connected to the HBM chip and is configured to write data into the HBM chip in response to the corresponding beat, so as to complete the data processing operation of the current beat period.

2. The systolic bus for improving the bandwidth utilization of an HBM chip according to claim 1, wherein the systolic array is an m×n array of data processing nodes, where m and n are the numbers of rows and columns of the systolic array, respectively, and m and n are both greater than 1.

3. The systolic bus for improving the bandwidth utilization of an HBM chip according to claim 1, wherein the data processing nodes are configured to implement the same or different data processing functions, the data processing functions including at least one of data operation, data storage, and data interface invocation.

4. The systolic bus for improving the bandwidth utilization of an HBM chip according to claim 3, wherein each data processing node executes its corresponding data processing function in response to a preset instruction corresponding to the beat.

5. The systolic bus for improving the bandwidth utilization of an HBM chip according to claim 1, wherein the data directional transmission rule comprises transmitting data to the adjacent systolic nodes above and to the right of the current data processing node.

6. The systolic bus for improving the bandwidth utilization of an HBM chip according to claim 5, wherein the number of beats in a beat period is determined according to the size of the systolic array, the data directional transmission rule, and the correlation between the data output by the last row of data processing nodes in the current beat period and the data read from the HBM chip at the start time of the next beat period, and wherein the number of beats takes a value in the range [m, m+n].

7. The systolic bus for improving the bandwidth utilization of an HBM chip according to claim 1, wherein the HBM chip temporarily stores the data processing results of the data processing nodes.

8. A data processing method for improving the bandwidth utilization of an HBM chip, characterized in that the HBM chip sends data to be processed to a systolic bus according to a preset beat; the data processing nodes are connected to one another through the systolic bus, wherein the systolic bus comprises a systolic array formed by an arrangement of data processing nodes, and data connections are established between adjacent data processing nodes; during a beat period, the method comprises:

9. The data processing method for improving the bandwidth utilization of an HBM chip according to claim 8, wherein the data processing nodes are configured to implement the same or different data processing functions, the data processing functions including at least one of data operation, data storage, and data interface invocation; and/or each data processing node executes its corresponding data processing function in response to a preset instruction corresponding to the beat.

10. The data processing method for improving the bandwidth utilization of an HBM chip according to claim 8, wherein the data directional transmission rule comprises transmitting data to the adjacent systolic nodes above and to the right of the current data processing node; and/or the number of beats in the beat period is determined according to the size of the systolic array, the data directional transmission rule, and the correlation between the data output by the last row of data processing nodes in the current beat period and the data read from the HBM chip at the start time of the next beat period, wherein the number of beats takes a value in the range [m, m+n].

11. An electronic device, comprising: a processor; and a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the method of any of claims 8 to 10.

12. A computer-readable storage medium storing one or more programs which, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of any of claims 8 to 10.
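
The beat period recited in claims 1, 5 and 6 can be illustrated with a minimal Python sketch. It is not part of the claims and is only one possible reading of them. Several points are assumptions made for illustration: the function name `run_beat_period`, the modelling of the HBM chip as two plain lists (`hbm_in`, `hbm_out`), the encoding of the data directional transmission rule as a route string of `U` (above) and `R` (right) moves, the counting of the initial read as beat 1, and the write-back happening on the same beat as the last node's processing.

```python
from typing import Callable, List, Optional, Tuple


def run_beat_period(
    hbm_in: List[int],
    hbm_out: List[Optional[int]],
    m: int,
    n: int,
    start_col: int,
    route: str,
    process: Callable[[int], int],
) -> Tuple[int, int]:
    """Carry one data word through an m x n array during one beat period.

    Hypothetical model: row 0 is the first row (reads from the HBM), row m - 1
    is the last row (writes to the HBM), and `route` is a string of moves,
    'U' = pass to the node above, 'R' = pass to the node on the right.
    Returns (beats_used, column_written).
    """
    if m <= 1 or n <= 1:
        raise ValueError("m and n must both be greater than 1")
    if route.count("U") != m - 1:
        raise ValueError("route needs exactly m - 1 upward moves to reach the last row")

    row, col = 0, start_col
    value = process(hbm_in[col])  # beat 1: a first-row node reads and processes
    beats = 1
    for move in route:  # each later beat: hand the result to one neighbour
        if move == "U":
            row += 1
        elif move == "R":
            col += 1
        else:
            raise ValueError(f"unknown move {move!r}")
        if col >= n:
            raise ValueError("route leaves the array on the right")
        value = process(value)  # the receiving node does its own processing
        beats += 1

    hbm_out[col] = value  # the last-row node writes back on the final beat
    return beats, col


# Usage with assumed sizes m = 3, n = 4 and a toy "add one" processing function.
hbm_in = [10, 20, 30, 40]
hbm_out: List[Optional[int]] = [None] * 4
straight = run_beat_period(hbm_in, hbm_out, 3, 4, 0, "UU", lambda x: x + 1)
zigzag = run_beat_period(hbm_in, hbm_out, 3, 4, 0, "RURUR", lambda x: x + 1)
```

Under these assumptions, the straight-up route takes m = 3 beats and the route that also crosses every column takes m + n - 1 = 6 beats, so both fall inside the range [m, m+n] given in claim 6. How the actual beat count is chosen within that range depends on the correlation between successive beat periods, which this sketch does not model.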