WO2023123905A1 - Data transmission processing method in chip system and related apparatus

Info

Publication number
WO2023123905A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
chip
chiplet
data
area
Prior art date
Application number
PCT/CN2022/099849
Other languages
French (fr)
Chinese (zh)
Inventor
黎立煌
陈宁
王和国
曹庆新
Original Assignee
深圳云天励飞技术股份有限公司
Application filed by 深圳云天励飞技术股份有限公司
Publication of WO2023123905A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163: Interprocessor communication

Definitions

  • the present application relates to the field of communication technologies, and in particular to a data transmission processing method and related devices in a chip system.
  • a chip system may include multiple sub-chips, each of which has the function of independently processing data, and the multiple sub-chips are connected in a certain topology to realize mutual communication. Moreover, the multiple sub-chips can cooperatively process a single large-scale computing task in a model-parallel manner, so as to improve the processing efficiency of the task. In the process of cooperatively processing tasks, frequent interactive transmission of data is required among the multiple sub-chips, and the efficiency of the data transmission affects the processing performance of the entire chip system.
  • the embodiment of the present application discloses a data transmission processing method in a chip system and related devices, which can realize efficient data transmission between sub-chips in the chip system and improve the processing performance of the chip system.
  • the present application provides a data transmission processing method in a chip system, the method comprising:
  • the first sub-chip receives a first data packet, wherein the first data packet includes the identifier of a target sub-chip; the first sub-chip and the target sub-chip are sub-chips included in the chip system, the plurality of sub-chips in the chip system are arranged in the form of a matrix, and each of the plurality of sub-chips is connected to its adjacent sub-chips;
  • the first sub-chip sends the data in the first data packet to the target sub-chip based on a direction coordinate system and a principle of smaller bandwidth consumption, where the principle of smaller bandwidth consumption is to deliver the data to the target sub-chip using a smaller transmission bandwidth;
  • the direction coordinate system is constructed with the first sub-chip as its center and includes a first direction axis, a second direction axis, a third direction axis, and a fourth direction axis; the row in which the first sub-chip is located lies on at least one of the first direction axis and the second direction axis, which point in opposite directions; the column in which the first sub-chip is located lies on at least one of the third direction axis and the fourth direction axis, which point in opposite directions.
  • the direction coordinate system is constructed centered on the sub-chip that currently needs to send data, and that sub-chip then forwards the received data based on the direction coordinate system and the principle of smaller bandwidth consumption, which improves data transmission efficiency and, in turn, the processing performance of the chip system.
  • the first sub-chip sending the data in the first data packet to the target sub-chip based on the direction coordinate system and the principle of smaller bandwidth consumption includes: when the target sub-chip is on a target direction axis, the first sub-chip sends the data along the direction of the target direction axis; the target direction axis is the first direction axis, the second direction axis, the third direction axis, or the fourth direction axis.
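The on-axis case can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes the target position is given as an offset (dx, dy) from the current sub-chip, and the choice of which physical direction each axis points in (first = +x, second = -x, third = +y, fourth = -y) is hypothetical.

```python
from typing import Optional

# Hypothetical orientation: the source only states that the first/second axes
# lie along the row and the third/fourth axes lie along the column.
FIRST, SECOND, THIRD, FOURTH = "first", "second", "third", "fourth"


def axis_of(dx: int, dy: int) -> Optional[str]:
    """Return the direction axis the target lies on, or None if it is off-axis.

    (dx, dy) is the target position relative to the current sub-chip,
    which sits at the origin of the direction coordinate system.
    """
    if dy == 0 and dx > 0:
        return FIRST
    if dy == 0 and dx < 0:
        return SECOND
    if dx == 0 and dy > 0:
        return THIRD
    if dx == 0 and dy < 0:
        return FOURTH
    # The target is the current sub-chip itself or lies inside one of the areas.
    return None
```

When `axis_of` returns an axis, the data would simply be forwarded along that axis; a `None` result means one of the area-based cases applies instead.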
  • the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
  • the first data packet includes the identifiers of a first target sub-chip and a second target sub-chip; the first sub-chip sending the data in the first data packet to the target sub-chip based on the direction coordinate system and the principle of smaller bandwidth consumption includes:
  • when the first target sub-chip and the second target sub-chip are located in two adjacent areas among the first area, the second area, the third area, and the fourth area, the first sub-chip sends a second data packet along the direction of the common direction axis; the second data packet includes the data and the identifiers of the first target sub-chip and the second target sub-chip; the common direction axis is the direction axis forming the common boundary of the two adjacent areas.
  • the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
  • the first data packet includes the identifiers of a first target sub-chip and a second target sub-chip; the first sub-chip sending the data in the first data packet to the target sub-chip based on the direction coordinate system and the principle of smaller bandwidth consumption includes:
  • when the first target sub-chip is in the first area and the second target sub-chip is in the third area, the first sub-chip sends a third data packet along the direction of one of the two boundary direction axes of the first area, and sends a fourth data packet along the direction of one of the two boundary direction axes of the third area;
  • the third data packet includes the data and the identifier of the first target sub-chip, and the fourth data packet includes the data and the identifier of the second target sub-chip.
  • the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
  • the first data packet includes the identifiers of a first target sub-chip and a second target sub-chip; the first sub-chip sending the data in the first data packet to the target sub-chip based on the direction coordinate system and the principle of smaller bandwidth consumption includes:
  • when the first target sub-chip and the second target sub-chip are both located in a target area, the first sub-chip sends a fifth data packet along the direction of a direction axis bounding the target area; the fifth data packet includes the data and the identifiers of the first target sub-chip and the second target sub-chip; the target area is the first area, the second area, the third area, or the fourth area.
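The three two-target cases (adjacent areas, opposite areas, same area) can be summarised in one small planner. The sketch below is illustrative only: the packet layout (a dict with `data` and `targets`) and the function name are hypothetical, the fourth area is assumed to be bounded by the first and fourth direction axes, and the opposite-area branch generalises the first-area/third-area example to any pair of areas that share no boundary axis.

```python
from typing import Dict, List, Tuple

# Boundary axes of each area, following the area definitions in the text.
AREA_AXES = {
    1: {"first", "third"},
    2: {"second", "third"},
    3: {"second", "fourth"},
    4: {"first", "fourth"},
}


def plan_two_targets(
    data: bytes, target_a: int, area_a: int, target_b: int, area_b: int
) -> List[Tuple[List[str], Dict]]:
    """Return (candidate axes, packet) pairs for two off-axis targets."""
    shared = AREA_AXES[area_a] & AREA_AXES[area_b]
    if shared:
        # Same area (two shared axes) or adjacent areas (one common axis):
        # a single packet carrying both target identifiers is enough.
        return [(sorted(shared), {"data": data, "targets": [target_a, target_b]})]
    # Opposite areas share no boundary axis, so the data is sent twice,
    # each copy carrying only one target identifier.
    return [
        (sorted(AREA_AXES[area_a]), {"data": data, "targets": [target_a]}),
        (sorted(AREA_AXES[area_b]), {"data": data, "targets": [target_b]}),
    ]
```

Only the opposite-area case duplicates the payload, which is where the bandwidth saving of the other two cases comes from.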
  • the orientation of the target sub-chip relative to the first sub-chip is determined based on the direction coordinate system constructed above, and the shortest transmission path from the first sub-chip to the target sub-chip is quickly determined based on that orientation, which realizes fast forwarding of data, saves transmission bandwidth resources, and improves transmission efficiency.
  • the first sub-chip includes a plurality of ports, each of which is connected to another sub-chip; each port corresponds to a sending buffer used for storing data to be sent; the method further includes: when at least two ports are available to send the data, the first sub-chip selects a first port to send the data, where the first port is the port, among the at least two ports, whose sending buffer holds the least amount of data to be sent.
  • sending data through the port with the least data waiting to be sent reduces the time the data spends queuing and improves the efficiency of data sending.
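A minimal sketch of this port selection, assuming each sending buffer is a queue of byte strings and that "amount of data to be sent" is measured in queued bytes (the source does not specify the unit). The port names d0 to d3 follow the sub-chip description; the function name is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable


def select_port(candidates: Iterable[str], send_buffers: Dict[str, Deque[bytes]]) -> str:
    """Pick the candidate port whose sending buffer holds the fewest queued bytes."""
    return min(candidates, key=lambda port: sum(len(pkt) for pkt in send_buffers[port]))


# Usage: d0 and d1 both lead toward the target; d1 has less data queued.
buffers = {
    "d0": deque([b"aaaa", b"bb"]),
    "d1": deque([b"c"]),
    "d2": deque(),
}
chosen = select_port(["d0", "d1"], buffers)
```

On a tie, `min` returns the first candidate in iteration order, so the caller controls the tie-break by ordering the candidates.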
  • the first data packet includes the identifiers of multiple target sub-chips, and these identifiers include the identifier of the first sub-chip; the method further includes:
  • the first sub-chip stores the data in the first data packet;
  • the first sub-chip re-encapsulates the data to obtain a sixth data packet;
  • the first sub-chip sends the sixth data packet to the target sub-chips other than the first sub-chip.
  • the data packet can carry the identifiers of multiple destination sub-chips. Compared with the existing situation where each destination sends a data packet, the number of data packets to be sent can be reduced and the transmission bandwidth can be saved.
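The store-and-re-encapsulate step can be sketched as below. The packet layout (a dict with `data` and `targets`) and the function name are hypothetical; the source only states that the data is stored locally, re-encapsulated, and forwarded to the remaining targets.

```python
from typing import Dict, List, Optional


def handle_multicast(packet: Dict, my_id: int, local_store: List[bytes]) -> Optional[Dict]:
    """Keep a local copy if this sub-chip is a target, then build the packet to forward.

    Returns the re-encapsulated packet for the remaining targets, or None
    when no other target is left.
    """
    if my_id in packet["targets"]:
        local_store.append(packet["data"])
    remaining = [t for t in packet["targets"] if t != my_id]
    if not remaining:
        return None
    return {"data": packet["data"], "targets": remaining}


# Usage: sub-chip 5 is one of three targets of the incoming packet.
store: List[bytes] = []
forwarded = handle_multicast({"data": b"payload", "targets": [5, 6, 9]}, 5, store)
```

A single packet with three identifiers replaces three separate packets on the shared part of the path, which is the bandwidth saving described above.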
  • the present application provides a sub-chip, the sub-chip is a first sub-chip, and the aforementioned first sub-chip includes:
  • a receiving unit configured to receive a first data packet, wherein the first data packet includes the identifier of a target sub-chip; the first sub-chip and the target sub-chip are sub-chips included in the chip system, the plurality of sub-chips in the chip system are arranged in a matrix, and each of the plurality of sub-chips is connected to its adjacent sub-chips;
  • a sending unit configured to send the data in the first data packet to the target sub-chip based on a direction coordinate system and a principle of smaller bandwidth consumption, where the principle of smaller bandwidth consumption is to deliver the data to the target sub-chip using a smaller transmission bandwidth;
  • the direction coordinate system is constructed with the first sub-chip as its center and includes a first direction axis, a second direction axis, a third direction axis, and a fourth direction axis; the row in which the first sub-chip is located lies on at least one of the first direction axis and the second direction axis, which point in opposite directions; the column in which the first sub-chip is located lies on at least one of the third direction axis and the fourth direction axis, which point in opposite directions.
  • the sending unit is specifically configured to:
  • when the target sub-chip is on a target direction axis, send the data along the direction of the target direction axis;
  • the target direction axis is the first direction axis, the second direction axis, the third direction axis, or the fourth direction axis.
  • the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
  • the first data packet includes the identifiers of a first target sub-chip and a second target sub-chip; the sending unit is specifically configured to:
  • when the first target sub-chip and the second target sub-chip are located in two adjacent areas among the first area, the second area, the third area, and the fourth area, send a second data packet along the direction of the common direction axis; the second data packet includes the data and the identifiers of the first target sub-chip and the second target sub-chip;
  • the common direction axis is the direction axis forming the common boundary of the two adjacent areas.
  • the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
  • the first data packet includes the identifiers of a first target sub-chip and a second target sub-chip; the sending unit is specifically configured to:
  • when the first target sub-chip is in the first area and the second target sub-chip is in the third area, send a third data packet along the direction of one of the two boundary direction axes of the first area, and send a fourth data packet along the direction of one of the two boundary direction axes of the third area;
  • the third data packet includes the data and the identifier of the first target sub-chip, and the fourth data packet includes the data and the identifier of the second target sub-chip.
  • the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
  • the first data packet includes the identifiers of a first target sub-chip and a second target sub-chip; the sending unit is specifically configured to:
  • when the first target sub-chip and the second target sub-chip are both located in a target area, send a fifth data packet along the direction of a direction axis bounding the target area; the fifth data packet includes the data and the identifiers of the first target sub-chip and the second target sub-chip; the target area is the first area, the second area, the third area, or the fourth area.
  • the first sub-chip includes a plurality of ports, each of which is connected to another sub-chip; each port corresponds to a sending buffer used for storing data to be sent;
  • the first sub-chip also includes a selection unit configured to: when at least two ports are available to send the data, select a first port to send the data;
  • the first port is the port, among the at least two ports, whose sending buffer holds the least amount of data to be sent.
  • the first data packet includes the identifiers of multiple target sub-chips, and these identifiers include the identifier of the first sub-chip; the first sub-chip further includes:
  • a storage unit configured to store the data in the first data packet;
  • an encapsulation unit configured to re-encapsulate the data to obtain a sixth data packet;
  • the sending unit is further configured to send the sixth data packet to the target sub-chips other than the first sub-chip.
  • the present application provides a sub-chip, which includes a processor, a memory, and a communication port; the memory and the communication port are coupled to the processor, the communication port is used to send and receive data, the memory is used to store a computer program, and the processor is used to call the computer program so that the sub-chip executes any one of the methods of the first aspect; the sub-chip is a sub-chip included in the chip system, the plurality of sub-chips in the chip system are arranged in the form of a matrix, and each of the plurality of sub-chips is connected to its adjacent sub-chips.
  • the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium.
  • the present application provides a computer program product, the computer program product includes a computer program, and when the computer program is executed by a processor, the method described in any one of the first aspect is implemented.
  • the second to fifth aspects above all correspond to the method provided by any one of the first aspect; for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method, which are not repeated here.
  • FIG. 1 is a schematic diagram of the chip system provided by the present application.
  • FIG. 2 is a schematic structural diagram of a sub-chip provided by the present application.
  • FIGS. 3 to 6 are schematic diagrams of the chip system provided by the present application.
  • FIG. 7 is a schematic diagram of sub-chipset division provided by the present application.
  • FIG. 8 is a schematic flowchart of a data transmission processing method in the chip system provided by the present application.
  • FIG. 9 is a schematic diagram of the data packet structure provided by the present application.
  • FIG. 10 is a schematic diagram of the direction coordinate system provided by the present application.
  • FIG. 11 is a schematic diagram of a direction coordinate system based on subchip construction provided by the present application.
  • FIG. 12 is a schematic structural diagram of a virtual device provided by the present application.
  • FIG. 13 is a schematic structural diagram of a physical device provided by the present application.
  • FIG. 1 is a schematic structural diagram of a chip system provided by an embodiment of the present application.
  • the chip system 110 includes a plurality of sub-chips (16 sub-chips are exemplarily shown in FIG. 1), and the sub-chips are connected according to a preset topological connection relationship.
  • the 16 sub-chips in FIG. 1 can be arranged in a matrix, in which case a single sub-chip is connected to two, three, or four surrounding sub-chips.
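The two, three, or four connections per sub-chip follow directly from the matrix arrangement. The sketch below assumes a 4×4 matrix with row-major numbering from 0 to 15; the numbering is an assumption, chosen because it makes sub-chips 0, 3, 12, and 15 the corners.

```python
from typing import List


def neighbors(chip_id: int, rows: int = 4, cols: int = 4) -> List[int]:
    """Return the identifiers of the sub-chips directly connected to chip_id."""
    r, c = divmod(chip_id, cols)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        # Only positions inside the matrix correspond to real sub-chips.
        if 0 <= nr < rows and 0 <= nc < cols:
            result.append(nr * cols + nc)
    return result
```

Under this numbering, corner sub-chips have two neighbors, edge sub-chips three, and interior sub-chips four.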
  • FIG. 1 schematically shows the memory of some sub-chips.
  • the memory can be, for example, synchronous dynamic random access memory (SDRAM) or double data rate synchronous dynamic random access memory (DDR SDRAM), which can be abbreviated as DDR.
  • Each sub-chip in the chip system 110 has complete processing capability and can perform tasks independently. Multiple sub-chips in the chip system 110 can also cooperate with each other to execute large-scale processing tasks.
  • FIG. 2 exemplarily shows a schematic structural diagram of the chiplets in the above-mentioned chip system 110 .
  • the structure of the sub-chip can be presented in the form of a network-on-chip (NoC). The sub-chip may include a processing module, a routing module, a static memory, a memory controller, and four ports (d0, d1, d2, and d3).
  • the processing module is the control unit (CU) of the sub-chip and is responsible for managing each processing flow in the sub-chip.
  • the above-mentioned routing module is responsible for data synchronization inside the sub-chip, data synchronization between sub-chips, data broadcasting and data transmission.
  • the routing module also includes a control unit, which is used to manage the routing process in the routing module.
  • the routing module also includes a local buffer, which can be used to temporarily store data to be processed.
  • the routing module also includes a forwarding-port mapper (FPM), which can be a hardware module or a software module. The FPM stores a port forwarding mapping table that records the mapping between target sub-chips and sending ports, and this table is used to map a data packet to the corresponding port for sending.
  • the routing module stores a stream input table (stream in table, SIT) and a stream output table (stream out table, SOT); the SIT and SOT are used for data transmission between sub-chips and are introduced in detail later.
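A port forwarding mapping table can be pictured as a lookup from target sub-chip to sending port. The class below is a hypothetical software rendering of an FPM; the table contents, class name, and packet layout are illustrative assumptions, not taken from the application.

```python
from typing import Dict, List


class ForwardingPortMapper:
    """Hypothetical software FPM: maps a target sub-chip id to a sending port."""

    def __init__(self, table: Dict[int, str]) -> None:
        self._table = dict(table)

    def port_for(self, target: int) -> str:
        return self._table[target]

    def group_by_port(self, targets: List[int]) -> Dict[str, List[int]]:
        """Group target ids by the port their packets should leave through."""
        grouped: Dict[str, List[int]] = {}
        for target in targets:
            grouped.setdefault(self.port_for(target), []).append(target)
        return grouped


# Usage with an illustrative table: targets 6 and 7 share port d1.
fpm = ForwardingPortMapper({1: "d0", 6: "d1", 7: "d1", 9: "d2"})
plan = fpm.group_by_port([6, 9, 7])
```

Grouping targets per port is what allows one packet with several identifiers to be sent per port instead of one packet per target.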
  • the static memory may be a static random-access memory (SRAM) or the like, and is used for storing data in the sub-chip.
  • the above-mentioned memory controller is connected to the memory corresponding to the sub-chip, and the memory controller may be, for example, a DDR controller or the like.
  • the above four ports are network interfaces of the sub-chips, which can realize data transmission between the above-mentioned sub-chips.
  • the connection between the above sub-chips and sub-chips is realized through the four ports.
  • the sub-chip may also include two such ports, for example, sub-chip 0 , sub-chip 3 , sub-chip 12 and sub-chip 15 in FIG. 1 .
  • the sub-chip may also include three such ports, such as the sub-chip 4 in FIG. 1 .
  • FIG. 3 exemplarily shows a chip system 120, which also includes a plurality of sub-chips (8 sub-chips are exemplarily shown in FIG. 3). The sub-chips of the chip system 120 are arranged and connected in the form of a cuboid; this connection method can shorten the data transmission path between sub-chips as much as possible.
  • for the sub-chips in the chip system 120, reference may be made to the corresponding description of FIG. 2 above, and details are not repeated here.
  • the chip system 110 or the chip system 120 can be used as a chip subsystem, and a plurality of such subsystems can form a larger chip system, for example the chip system 130 shown in FIG. 4.
  • the chip system 130 may include multiple such chip subsystems.
  • FIG. 4 shows an example including 8 subsystems.
  • Each subsystem of the multiple subsystems may be the above-mentioned system-on-a-chip 110 or system-on-a-chip 120 .
  • Each subsystem can be regarded as a whole, then the multiple subsystems can be connected through a preset topological connection relationship, for example, the connection can be arranged in the form of a cuboid, as shown in FIG. 4 .
  • taking the chip system 110 as the subsystem, FIG. 5 illustrates the connection structure of the subsystems in the chip system 130.
  • the chip system 130 includes 8 subsystems, each of the 8 subsystems includes 16 sub-chips, and the 16 sub-chips can be arranged and connected in a matrix. Between two adjacent subsystems, the connection between the two subsystems can be realized by connecting any sub-chip in one subsystem to any sub-chip in the other subsystem.
  • the sub-chips arranged at the corners of the matrix in each subsystem are exemplarily used as the sub-chips connected to another subsystem. For example, the connection between subsystem 0 and subsystem 1 is established through sub-chip 3 in subsystem 0 and sub-chip 0 in subsystem 1.
  • the connection between subsystem 0 and subsystem 2 is established through sub-chip 12 in subsystem 0 and sub-chip 0 in subsystem 2.
  • the connection between subsystem 0 and subsystem 4 is established through sub-chip 0 in subsystem 0 and sub-chip 0 in subsystem 4.
  • the foregoing subsystems may further include a control bus, for example, refer to the chip system 140 shown in FIG. 6 .
  • the D-type bus shown in the chip system 140 is the control bus.
  • the system-on-a-chip 140 includes a central controller (the central controller may be a sub-chip or a control module in the system-on-a-chip 140 ), and the central controller may manage the task processing flow in the system-on-a-chip 140 .
  • the control bus is connected with the central controller for each subsystem to receive control instructions from the central controller.
  • each subsystem may be connected to the control line by a sub-chip, and after receiving the control command, the sub-chip may forward it to the corresponding sub-chip in the same subsystem.
  • each chiplet in each subsystem is connected to the control line for directly receiving control commands. This application does not limit the specific connection of the controller.
  • the system-on-a-chip provided in the embodiment of the present application also includes a central controller for managing the task processing flow of the entire system-on-a-chip.
  • the central controller may be a sub-chip or a control module in the chip system.
  • the central controller can obtain the load conditions and resource usage conditions of all sub-chips in the chip system, so as to assign tasks to each sub-chip based on these conditions.
  • a task scheduler (scheduler) in the central controller can be used to assign tasks to these sub-chips based on information such as load conditions and resource usage conditions of each sub-chip.
  • the central controller can also be responsible for data scheduling in the chip system. Specifically, the central controller obtains the data transmission status of each sub-chip through the control bus and, by analyzing that status, learns the congestion status of each transmission path and/or the port congestion status of each sub-chip. Based on this information, it formulates a data transmission strategy and sends it to each sub-chip in the form of scheduling information. Each sub-chip sends the corresponding data based on the scheduling information issued by the controller, which reduces the probability of congestion and improves data transmission efficiency.
  • each sub-chip is capable of processing data independently. But in the case of large data tasks, the processing efficiency of a single sub-chip is low.
  • the multiple sub-chips included in the above chip system may be divided into multiple sub-chip groups, and each sub-chip group includes at least one sub-chip. In this way, data tasks can be processed with the sub-chipset as a processing unit, thereby improving processing efficiency.
  • FIG. 7 takes the above chip system 110 as an example, and divides 16 sub-chips in the chip system into 9 sub-chip groups. Refer to the division situation shown in FIG. 7 for details, and each sub-chip group includes at least one sub-chip.
  • multiple subchips of the same subchipset may be adjacent subchips, such as subchipset 3 , subchipset 4 , subchipset 7 , and subchipset 8 .
  • the sub-chips of the same sub-chipset may be non-adjacent sub-chips, for example, the sub-chipset 2 is composed of non-adjacent sub-chips 1 and 12 .
  • the chip system provided in the embodiment of the present application may implement data task processing by data parallelism, model parallelism, or model parallelism plus data parallelism. Specifically:
  • Data parallelism refers to dividing the data to be processed into several data blocks and assigning the data blocks to different sub-chipsets, with each sub-chipset running the same processing program on the data allocated to it. For example, assuming that the data to be processed is divided into 3 data blocks and that 3 sub-chipsets can run the same processing program, the first data block is sent to the first sub-chipset for processing, the second data block is sent to the second sub-chipset for processing, and the third data block is sent to the third sub-chipset for processing.
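Data parallelism as described here can be sketched in a few lines. Sub-chipsets are modelled as plain function calls, and the splitting policy (near-equal contiguous blocks) is an assumption; the source does not say how the data is divided.

```python
from typing import Callable, List, Sequence


def split_blocks(data: Sequence, n: int) -> List[Sequence]:
    """Split data into n contiguous blocks whose sizes differ by at most one."""
    size, extra = divmod(len(data), n)
    blocks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def data_parallel(data: Sequence, program: Callable, n_groups: int) -> list:
    """Run the same program on every block, one block per sub-chipset."""
    return [program(block) for block in split_blocks(data, n_groups)]


# Usage: three sub-chipsets each sum their own block.
partial_sums = data_parallel(list(range(6)), sum, 3)
```

In hardware the per-block calls would run concurrently on different sub-chipsets; the list comprehension only shows the assignment of blocks to groups.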
  • Model parallelism means that multiple sub-chipsets jointly complete a data processing task, and each of the multiple sub-chipsets executes only some of the steps of the entire data processing task (one or more processing steps). For example, assuming that a data processing task requires three steps, two sub-chipsets can be configured to jointly complete the task: the first sub-chipset completes the first two of the three steps, and the second sub-chipset obtains the processed data from the first sub-chipset and completes the third step.
  • Alternatively, three sub-chipsets can be configured to jointly complete this task: the first sub-chipset completes the first step, the second sub-chipset obtains the processed data from the first sub-chipset and completes the second step, and the third sub-chipset obtains the processed data from the second sub-chipset and completes the third step. That is, each sub-chipset may complete one or more steps, which may be determined according to the load and resource usage of the sub-chipset.
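  • The two stage assignments in the model-parallel example can be sketched as follows. This is only an illustration under stated assumptions: the three step functions are arbitrary placeholders, and `run_pipeline`, `two_way` and `three_way` are hypothetical names that do not come from the application.

```python
def run_pipeline(x, steps, stage_plan):
    """Run `steps` in order, grouped by the sub-chipset that executes them.

    `stage_plan` is a list of (sub_chipset, [step indices]) pairs; the output
    of one sub-chipset is handed to the next one in the list.
    """
    trace = []
    for chipset, step_ids in stage_plan:
        for i in step_ids:
            x = steps[i](x)
        trace.append((chipset, x))  # data leaving this sub-chipset
    return x, trace


# Placeholder processing steps (assumptions for the sketch).
steps = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]

two_way = [("chipset_1", [0, 1]), ("chipset_2", [2])]
three_way = [("chipset_1", [0]), ("chipset_2", [1]), ("chipset_3", [2])]

result_two, trace_two = run_pipeline(5, steps, two_way)
result_three, trace_three = run_pipeline(5, steps, three_way)
```

  • Both plans produce the same result; they differ only in how many hand-offs between sub-chipsets occur, which is where the data transmission cost discussed later arises.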
  • The method of model parallelism plus data parallelism combines the above two methods to process data. For example, if a data processing task requires three steps, three sub-chipsets can be configured to jointly complete the task: the first sub-chipset completes the first step, the second sub-chipset obtains the processed data from the first sub-chipset and completes the second step, and the third sub-chipset obtains the processed data from the second sub-chipset and completes the third step. However, since the processing of the first step is relatively complicated, it takes more time to complete this step.
  • one or more sub-chipsets can be configured to jointly execute the processing task of the first step.
  • a fourth sub-chipset may be further configured to perform the processing task of the first step together with the aforementioned first sub-chipset.
  • the data for processing in the first step may be divided into two parts, one part is sent to the first sub-chipset for processing, and the other part is sent to the fourth sub-chipset for processing. Then, the processed data of the first sub-chipset and the fourth sub-chipset are sent to the second sub-chipset for processing in the second step.
  • Each processing step, or only some of the processing steps, can be processed in a data-parallel manner; this can be determined according to the specific implementation and is not limited by this application.
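  • The combined scheme can be sketched in the same style: the first step is data-parallel across the first and fourth sub-chipsets, and the later steps run on the second and third sub-chipsets. The function name `hybrid_run`, the half-and-half split and the placeholder step functions are assumptions made for the illustration.

```python
def hybrid_run(items, step1, step2, step3):
    """Step 1 is shared by two sub-chipsets; steps 2 and 3 run on one each."""
    half = len(items) // 2
    part_a = [step1(v) for v in items[:half]]   # first sub-chipset
    part_b = [step1(v) for v in items[half:]]   # fourth sub-chipset
    merged = part_a + part_b                    # both outputs go to the second sub-chipset
    after_step2 = [step2(v) for v in merged]    # second sub-chipset
    return [step3(v) for v in after_step2]      # third sub-chipset


out = hybrid_run([1, 2, 3, 4], lambda v: v * v, lambda v: v + 1, lambda v: v * 10)
```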
  • the central controller of the chip system may distribute data processing tasks to each sub-chipset.
  • Using model parallelism or model parallelism plus data parallelism to realize data task processing requires data transmission between sub-chips. Data transmission will generate time delay and reduce processing efficiency.
  • an embodiment of the present application provides a data transmission processing method in the chip system.
  • the data transmission processing method provided by the embodiment of the present application includes but is not limited to the following steps:
  • The first sub-chip receives a first data packet, where the first data packet includes the identifier of the target sub-chip; the first sub-chip and the target sub-chip are sub-chips included in the chip system, and the multiple sub-chips included in the chip system are connected in a preset topology.
  • the chip system may be the chip system 110 , the chip system 120 , the chip system 130 , or the chip system 140 described above.
  • the first sub-chip may be any sub-chip in any one of these chip systems.
  • Assume that the chip system where the first sub-chip is located is the first chip system.
  • The first sub-chip receives the first data packet from another sub-chip in the first chip system.
  • The first data packet may include one or more items of information such as the packet type (type), task identifier, data stream identifier, destination sub-chip identifier, packet number, and data, where:
  • the packet type is used to indicate a specific type of a packet, and the packet type may include a data packet (DATA), a header packet (Header), or a unblocked packet (unblock, UB) and the like.
  • the packet type of the first data packet is DATA.
  • the identifier of the task refers to the identifier of the data processing task corresponding to the package. Multiple data processing tasks can be processed simultaneously in the chip system, and each data processing task has its corresponding identifier.
  • the identifier of the task may be, for example, 1 or other identifiers, which is not limited in the present application.
  • the above-mentioned first data packet is a carrier for data in a certain data processing task to be transmitted between chiplets, therefore, the identifier of the task in the first data packet is the identifier of the certain data processing task.
  • Each data packet can carry 1kb of data. If the total size of the transmitted data is 64kb, the data can be split and encapsulated into 64 data packets for transmission, and the 64 data packets form a data stream.
  • Each data flow is configured with an identifier, which is the identifier of the data flow. The identifier of the data stream included in the first data packet is the identifier of the data stream where the first data packet is located.
  • the identification of the destination chiplet is used to indicate the destination to which the packet is destined.
  • the identifiers of the target chiplets in the first data packet may be identifiers of one or more target chiplets. If the data in the first data packet corresponds to one target sub-chip, the identifier of the target sub-chip in the first data packet is the identifier of the one destination sub-chip. If there are multiple target chiplets corresponding to the data in the first data packet, the identifiers of the target chiplets in the first data packet are the identifiers of the multiple target chiplets. For example, if there are two target chiplets of the data packet, which are chiplet 0 and chiplet 9 respectively, then the data packet includes the identifiers of chiplet 0 and chiplet 9 .
  • the packet number refers to the sequential numbering of a packet within the data stream to which it belongs.
  • The data is the payload of the packet, that is, the content actually being transmitted.
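  • The packet fields listed above, and the splitting of 64kb of data into a 64-packet data stream, can be sketched as follows. The class name `Packet`, the field names, the helper `split_into_stream` and the reading of "1kb" as 1024 bytes are all assumptions of this sketch; the application does not specify field widths or encodings.

```python
from dataclasses import dataclass
from typing import List, Tuple

PAYLOAD_BYTES = 1024  # assumption: "1kb" in the text is read as 1024 bytes


@dataclass(frozen=True)
class Packet:
    packet_type: str             # "DATA", "Header" or "UB"
    task_id: int                 # identifier of the data processing task
    stream_id: int               # identifier of the data stream
    dest_ids: Tuple[int, ...]    # one or more destination chiplet identifiers
    packet_number: int           # sequence number within the data stream
    data: bytes                  # payload


def split_into_stream(payload: bytes, task_id: int, stream_id: int,
                      dest_ids: Tuple[int, ...]) -> List[Packet]:
    """Encapsulate `payload` into consecutively numbered DATA packets."""
    return [
        Packet("DATA", task_id, stream_id, dest_ids, n, payload[off:off + PAYLOAD_BYTES])
        for n, off in enumerate(range(0, len(payload), PAYLOAD_BYTES))
    ]


stream = split_into_stream(bytes(64 * 1024), task_id=1, stream_id=7, dest_ids=(0, 9))
```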
  • the above-mentioned first data packet may also carry sideband information, and the sideband information may include one or more information of a task identifier, a data flow identifier, or a destination chiplet identifier.
  • the sideband information may not be encapsulated in the first data packet, but sent together with the first data packet.
  • The information included in the first data packet can only be parsed by the routing module of the sub-chip; other modules in the sub-chip, such as ports, are not aware of it. Therefore, to facilitate fast forwarding of the first data packet, the first data packet may be configured to carry the foregoing sideband information.
  • To facilitate understanding of the format of the first data packet and the format of the sideband information, refer to FIG. 9.
  • first data packet and the corresponding sideband information shown in FIG. 9 are only examples.
  • The first data packet may also include other information, and the sideband information may also include more information; the present application does not limit this.
  • The above-mentioned first sub-chip sends the data in the above-mentioned first data packet to the above-mentioned destination sub-chip based on a direction coordinate system and a principle of smaller bandwidth consumption. The principle of smaller bandwidth consumption is the principle of delivering the data to the target sub-chip with a smaller transmission bandwidth; the direction coordinate system is constructed with the first sub-chip as the center.
  • FIG. 10 exemplarily shows a schematic diagram of a direction coordinate system centered on the first chiplet.
  • the directional coordinate system includes four directional axes: a first directional axis, a second directional axis, a third directional axis and a fourth directional axis.
  • the four direction axes all diverge outward from the center of the first sub-chip.
  • the first direction axis and the second direction axis are collinear and opposite in direction; the third direction axis and the fourth direction axis are collinear and opposite in direction.
  • the direction coordinate system also includes four regions: a first region, a second region, a third region and a fourth region.
  • the first area is bounded by the first direction axis and the third direction axis
  • the second area is bounded by the second direction axis and the third direction axis
  • the third area is bounded by the second direction axis and the fourth direction axis
  • the fourth region is bounded by the first direction axis and the fourth direction axis.
  • The row in which the first sub-chip is located lies on at least one of the first direction axis and the second direction axis.
  • The column in which the first sub-chip is located lies on at least one of the third direction axis and the fourth direction axis.
  • An example can be seen in FIG. 11.
  • the sub-chip 5 in the chip system is the first sub-chip
  • a direction coordinate system is established with the sub-chip 5 as the center.
  • the second row of the chiplet 5 is located on the first direction axis and the second direction axis
  • the second column of the chiplet 5 is located on the third direction axis and the fourth direction axis .
  • Chiplet 2 and the chiplet 3 are located in the first region of the directional coordinate system.
  • Chiplet 0 is located in the second area of the orientation coordinate system.
  • Chiplet 8 and chiplet 12 are located in the third region of the directional coordinate system.
  • the chiplet 10 , the chiplet 11 , the chiplet 14 and the chiplet 15 are located in the fourth area of the directional coordinate system.
  • the sub-chip 0 is the first sub-chip in the above-mentioned chip system in FIG. 11 .
  • a direction coordinate system is established centering on the sub-chip 0 .
  • the first row of the chiplet 0 is located on the first directional axis
  • the first column of the chiplet 0 is located on the fourth directional axis. Then, except for the sub-chips in the row and column where the sub-chip 0 is located, all other sub-chips are located in the fourth area of this direction coordinate system.
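  • The placement of chiplets on axes and in areas can be sketched as a small classification function. The sketch assumes a 4x4 matrix numbered row by row from 0 to 15, which is consistent with the FIG. 11 examples above but is not stated explicitly in the text; the function name `locate` and the labels `axis1` to `axis4` and `area1` to `area4` are hypothetical.

```python
COLS = 4  # assumption: 4x4 matrix, chiplets numbered row by row from 0


def locate(center, target, cols=COLS):
    """Return where `target` sits in the direction coordinate system centred on `center`."""
    cr, cc = divmod(center, cols)
    tr, tc = divmod(target, cols)
    d_row, d_col = tr - cr, tc - cc
    if d_row == 0 and d_col == 0:
        return "center"
    if d_row == 0:                       # same row: first or second direction axis
        return "axis1" if d_col > 0 else "axis2"
    if d_col == 0:                       # same column: third or fourth direction axis
        return "axis3" if d_row < 0 else "axis4"
    if d_row < 0:                        # on the third-axis side
        return "area1" if d_col > 0 else "area2"
    return "area4" if d_col > 0 else "area3"
```

  • With chiplet 5 as the center, this reproduces the placement described above: chiplets 2 and 3 fall in the first area, chiplet 0 in the second, chiplets 8 and 12 in the third, and chiplets 10, 11, 14 and 15 in the fourth.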
  • The sending, by the first sub-chip, of the data in the first data packet to the destination chiplet based on the direction coordinate system and the principle of smaller bandwidth consumption includes: when the destination sub-chip indicated in the first data packet is on a target direction axis, the first sub-chip sends the data of the first data packet along the direction of the target direction axis; the target direction axis is the first direction axis, the second direction axis, the third direction axis or the fourth direction axis.
  • For ease of understanding, the following description takes the above-mentioned FIG. 11 as an example.
  • the sub-chip 5 is the above-mentioned first sub-chip, and it receives a data packet, and the identifier of the destination sub-chip in the data packet indicates that the destination sub-chip is the sub-chip 7 . If the data packet only includes the identifier of one destination chiplet, then, since the chiplet 7 is located on the first direction axis, the chiplet 5 sends the data packet along the direction of the first direction axis. That is, the sub-chip 5 first sends the data packet to the sub-chip 6 , and then the sub-chip 6 forwards the data packet to the sub-chip 7 .
  • the chiplet 7 as one of the target chiplets is located on the first direction axis. Therefore, the chiplet 5 copies the data in the data packet to generate a new data packet, and sends the new data packet along the direction of the first direction axis. That is, the sub-chip 5 first sends the new data packet to the sub-chip 6 , and then the sub-chip 6 forwards it to the sub-chip 7 .
  • the newly generated data packet includes the identification of the chiplet 7 .
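  • The on-axis case can be sketched as a hop-by-hop walk along the target direction axis. As before, the 4x4 row-by-row numbering is an assumption, and `path_along_axis` and the axis labels are hypothetical names.

```python
# Row/column step for each direction axis (assumed orientation of FIG. 11).
STEP = {"axis1": (0, 1), "axis2": (0, -1), "axis3": (-1, 0), "axis4": (1, 0)}


def path_along_axis(center, target, cols=4):
    """Return (axis, hops) for a target lying on a direction axis of `center`."""
    cr, cc = divmod(center, cols)
    tr, tc = divmod(target, cols)
    if cr == tr and cc != tc:
        axis = "axis1" if tc > cc else "axis2"
    elif cc == tc and cr != tr:
        axis = "axis3" if tr < cr else "axis4"
    else:
        raise ValueError("target is not on a direction axis")
    d_row, d_col = STEP[axis]
    hops, r, c = [], cr, cc
    while (r, c) != (tr, tc):
        r, c = r + d_row, c + d_col
        hops.append(r * cols + c)        # chiplet reached after this hop
    return axis, hops


axis, hops = path_along_axis(5, 7)
```

  • For chiplet 5 sending to chiplet 7, the walk passes through chiplet 6 and then reaches chiplet 7, matching the example.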
  • the first data packet includes identifiers of the first destination chiplet and the second destination chiplet.
  • The first chiplet sending the data in the first data packet to the target chiplet based on the direction coordinate system and the principle of smaller bandwidth consumption includes: in the direction coordinate system established with the first chiplet as the center, when the first target chiplet and the second target chiplet are respectively located in two adjacent areas among the first area, the second area, the third area and the fourth area, the first chiplet sends a second data packet along the direction of the common direction axis.
  • the second data packet includes the data, identifiers of the first destination chiplet and the second destination chiplet.
  • the common direction axis is the direction axis of the common boundary of the two adjacent regions. For ease of understanding, it will be described with reference to the above-mentioned FIG. 11 as an example.
  • the chiplet 5 is the above-mentioned first chiplet, and it receives a data packet, and the identification of the target chiplet in the data packet indicates that the target chiplets are the chiplet 8 and the chiplet 14 .
  • the sub-chip 8 is located in the third area, and the sub-chip 14 is located in the fourth area, these two areas are adjacent areas, and the common boundary is the fourth direction axis. Therefore, the chiplet 5 sends the data packet along the direction of the fourth directional axis. That is, the sub-chip 5 first sends the data packet to the sub-chip 9 , and then the sub-chip 9 further forwards it.
  • the sub-chip 9 can also be regarded as the above-mentioned first sub-chip, and a direction coordinate system is established with the sub-chip 9 as the center, and then data is forwarded based on the above-mentioned principle of smaller bandwidth consumption.
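  • The choice of the common direction axis for two adjacent areas can be sketched as a set intersection over the boundary axes of each area. The table `AREA_AXES` restates the area definitions given earlier; the function name `common_axis` is hypothetical.

```python
# Boundary axes of each area, as defined for the direction coordinate system.
AREA_AXES = {
    "area1": {"axis1", "axis3"},
    "area2": {"axis2", "axis3"},
    "area3": {"axis2", "axis4"},
    "area4": {"axis1", "axis4"},
}


def common_axis(area_a, area_b):
    """Return the shared boundary axis of two adjacent areas, else None."""
    shared = AREA_AXES[area_a] & AREA_AXES[area_b]
    # Exactly one shared axis means the areas are adjacent.
    return next(iter(shared)) if len(shared) == 1 else None


# Chiplet 8 is in the third area and chiplet 14 in the fourth area.
axis_for_8_and_14 = common_axis("area3", "area4")
```

  • A single packet carrying both identifiers is then sent along that axis, so the data crosses the first link only once instead of twice.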
  • the first data packet includes identifiers of the first destination chiplet and the second destination chiplet.
  • The first chiplet sending the data in the first data packet to the target chiplet based on the direction coordinate system and the principle of smaller bandwidth consumption includes: in the direction coordinate system established with the first chiplet as the center, when the first target chiplet is in the first area and the second target chiplet is in the third area, the first chiplet sends a third data packet along the direction of one of the two boundary direction axes of the first area, and sends a fourth data packet along the direction of one of the two boundary direction axes of the third area.
  • the third data packet includes the data and the identifier of the first destination chiplet.
  • the fourth data packet includes the data and the identifier of the second destination chiplet.
  • the sub-chip 5 is the above-mentioned first sub-chip, and it receives a data packet, and the identification of the destination sub-chip in the data packet indicates that the destination sub-chips are the sub-chip 2 and the sub-chip 12 .
  • the chiplet 2 is located in the first area, and the chiplet 12 is located in the third area.
  • the chiplet 5 can regenerate two data packets: data packet A and data packet B based on the data in the received data packet.
  • the data package A includes data and the identification of the sub-chip 2
  • the data package B includes the data and the identification of the sub-chip 12 .
  • the data packet A is sent along the direction of the first direction axis or the third direction axis.
  • the data packet A is sent along the direction of the first direction axis, that is, the data packet A is first sent to the sub-chip 6 , and then the sub-chip 6 forwards the data packet A to the sub-chip 2 .
  • the chiplet 5 sends the data packet B along the direction of the second direction axis or the fourth direction axis.
  • the data packet B is sent along the direction of the fourth direction axis, that is, the data packet B is first sent to the sub-chip 9 , and then the sub-chip 9 continues to forward it further.
  • the first data packet includes identifiers of the first destination chiplet and the second destination chiplet.
  • The first chiplet sending the data in the first data packet to the target chiplet based on the direction coordinate system and the principle of smaller bandwidth consumption includes: in the direction coordinate system established with the first chiplet as the center, when the first target chiplet is in the second area and the second target chiplet is in the fourth area, the first chiplet sends a third data packet along the direction of one of the two boundary direction axes of the second area, and sends a fourth data packet along the direction of one of the two boundary direction axes of the fourth area.
  • the third data packet includes the data and the identifier of the first destination chiplet.
  • the fourth data packet includes the data and the identifier of the second destination chiplet.
  • chiplet 5 is the above-mentioned first chiplet, and it receives a data packet, and the identification of the target chiplet in the data packet indicates that the target chiplets are chiplet 0 and chiplet 10 .
  • Chiplet 0 is located in the second area
  • chiplet 10 is located in the fourth area.
  • the chiplet 5 can regenerate two data packets: data packet C and data packet D based on the data in the received data packet.
  • the data package C includes data and the identification of the sub-chip 0
  • the data package D includes the data and the identification of the sub-chip 10 .
  • the data packet C is sent along the direction of the second direction axis or the third direction axis.
  • the data packet C is sent along the direction of the second direction axis, that is, the data packet C is first sent to the sub-chip 4 , and then the sub-chip 4 forwards the data packet C to the sub-chip 0 .
  • the chiplet 5 sends the data packet D along the direction of the first direction axis or the fourth direction axis.
  • the data packet D is sent along the direction of the first direction axis, that is, the data packet D is first sent to the sub-chip 6 , and then the sub-chip 6 forwards it to the sub-chip 10 .
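  • The two diagonal cases (first and third areas, second and fourth areas) can be sketched together: the areas share no boundary axis, so the data is copied into one packet per destination. The text allows either boundary axis of each area; in this sketch the first entry of each tuple is simply the axis used in the examples above, which is an assumption, as are the names `split_for_opposite_areas` and `pick`.

```python
# Candidate boundary axes per area; index 0 is the axis used in the examples.
AREA_AXES = {
    "area1": ("axis1", "axis3"),
    "area2": ("axis2", "axis3"),
    "area3": ("axis4", "axis2"),
    "area4": ("axis1", "axis4"),
}
OPPOSITE = {"area1": "area3", "area3": "area1", "area2": "area4", "area4": "area2"}


def split_for_opposite_areas(data, dest_a, area_a, dest_b, area_b, pick=0):
    """Build one packet per destination for two diagonally opposite areas."""
    if OPPOSITE[area_a] != area_b:
        raise ValueError("areas are not diagonally opposite")
    return [
        {"axis": AREA_AXES[area_a][pick], "dest_ids": (dest_a,), "data": data},
        {"axis": AREA_AXES[area_b][pick], "dest_ids": (dest_b,), "data": data},
    ]


packets_a_b = split_for_opposite_areas(b"x", 2, "area1", 12, "area3")   # packets A and B
packets_c_d = split_for_opposite_areas(b"x", 0, "area2", 10, "area4")   # packets C and D
```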
  • the first data packet includes identifiers of the first destination chiplet and the second destination chiplet.
  • The first chiplet sending the data in the first data packet to the target chiplet based on the direction coordinate system and the principle of smaller bandwidth consumption includes: in the direction coordinate system established with the first chiplet as the center, when the first target chiplet is in a target area and the second target chiplet is on a direction axis that bounds the target area, the first chiplet sends a fifth data packet along the direction of that boundary direction axis.
  • the fifth data packet includes the data and identifications of the first and second destination chiplets.
  • the target area is the first area, the second area, the third area or the fourth area.
  • the sub-chip 5 is the above-mentioned first sub-chip, and it receives a data packet, and the identification of the destination sub-chip in the data packet indicates that the destination sub-chips are the sub-chip 14 and the sub-chip 9 .
  • the chiplet 14 is located in the fourth area, and the chiplet 9 is located on the fourth direction axis.
  • the fourth direction axis is the boundary direction axis of the fourth area, then the chiplet 5 can send the received data packet along the direction of the fourth direction axis, that is, to the chiplet 9 .
  • After the chiplet 9 receives the data packet, it stores the data in the data packet, and makes a copy of the data to generate a new data packet.
  • the new data packet includes the identification of the sub-chip 14 , and then the new data packet is sent to the sub-chip 13 or the sub-chip 10 , and then forwarded to the sub-chip 14 by the sub-chip 13 or the sub-chip 10 .
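  • The area-plus-boundary-axis case can be sketched in two parts: the sender checks that the axis bounds the area and sends one shared packet, and the chiplet on the axis keeps a copy and re-addresses the rest. The dictionary packet layout and the names `shared_packet_axis` and `handle_at_hop` are assumptions of this sketch.

```python
AREA_AXES = {
    "area1": {"axis1", "axis3"},
    "area2": {"axis2", "axis3"},
    "area3": {"axis2", "axis4"},
    "area4": {"axis1", "axis4"},
}


def shared_packet_axis(area, axis):
    """Return `axis` if it bounds `area` (one shared packet suffices), else None."""
    return axis if axis in AREA_AXES[area] else None


def handle_at_hop(me, packet):
    """Store the data if `me` is a destination; re-address it to the remaining ones."""
    dest_ids = packet["dest_ids"]
    stored = packet["data"] if me in dest_ids else None
    remaining = tuple(d for d in dest_ids if d != me)
    forward = {"dest_ids": remaining, "data": packet["data"]} if remaining else None
    return stored, forward


# Chiplet 5 sends one packet for chiplets 14 (fourth area) and 9 (fourth axis).
first_axis = shared_packet_axis("area4", "axis4")
stored, forward = handle_at_hop(9, {"dest_ids": (14, 9), "data": b"payload"})
```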
  • the above-mentioned first sub-chip may also send the data in the above-mentioned first data packet based on the congestion situation of its own port.
  • each sub-chip includes a plurality of ports for communicating with other sub-chips.
  • each port is configured with a corresponding sending buffer, and the sending buffer is used to store data to be sent.
  • After the first sub-chip receives the first data packet, it parses the first data packet to obtain the identifier of the destination sub-chip. If the identifier indicates that the first sub-chip is itself the destination, the first sub-chip extracts the data stored in the first data packet for subsequent processing. Otherwise, the first sub-chip looks up the sending port for the first data packet in its own forwarding mapping table, using the identifier of the destination sub-chip as an index. For the introduction of the forwarding mapping table, reference may be made to the corresponding description in the foregoing description of FIG. 2, which will not be repeated here. If multiple sending ports are found, the specific sending port may be determined based on the congestion of the sending buffers of those ports. Specifically, to improve data transmission efficiency, the port whose sending buffer holds the least data to be sent may be selected to send the first data packet.
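  • The congestion-based port choice can be sketched as a minimum over the pending data in each candidate port's sending buffer. Measuring congestion as the total number of queued bytes, and the names `pending_bytes`, `choose_port`, `send_buffers` and `forwarding_table`, are assumptions of this sketch; the port labels d0 and d1 are borrowed from the text, and d2 is invented for contrast.

```python
def pending_bytes(buffer):
    """Total amount of data waiting in one sending buffer."""
    return sum(len(chunk) for chunk in buffer)


def choose_port(candidates, send_buffers):
    """Pick the candidate port with the least data waiting to be sent."""
    return min(candidates, key=lambda port: pending_bytes(send_buffers[port]))


# Hypothetical state: d2 is empty but is not a candidate for this destination.
send_buffers = {"d0": [b"aaaa", b"bb"], "d1": [b"c"], "d2": []}
forwarding_table = {2: ["d0", "d1"]}   # destination chiplet 2 -> candidate ports

port = choose_port(forwarding_table[2], send_buffers)
```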
  • the forwarding mapping table in the first chiplet may be initialized based on the above-mentioned direction coordinate system and the above-mentioned principle of smaller bandwidth consumption.
  • the chiplet 5 can be determined based on the constructed direction coordinate system and the principle of smaller bandwidth consumption:
  • a data packet destined for the sub-chip 2 can be sent through its own port d0 or d1. Therefore, in the forwarding mapping table of the chiplet 5 , the sending port corresponding to the destination chip 2 is port d0 or d1 .
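  • One way the forwarding mapping table could be initialized from the direction coordinate system is sketched below. It records, for each destination, the direction axes along which a packet may leave the first chiplet; the mapping from axes to physical ports such as d0 and d1 is not given in the text, so the sketch uses axis labels instead. The 4x4 row-by-row numbering and the name `build_forwarding_table` are assumptions.

```python
def build_forwarding_table(center, rows=4, cols=4):
    """Map each destination to the axes along which `center` may send towards it."""
    cr, cc = divmod(center, cols)
    table = {}
    for dest in range(rows * cols):
        if dest == center:
            continue
        tr, tc = divmod(dest, cols)
        axes = []
        if tc > cc:
            axes.append("axis1")
        if tc < cc:
            axes.append("axis2")
        if tr < cr:
            axes.append("axis3")
        if tr > cr:
            axes.append("axis4")
        table[dest] = axes   # one axis for on-axis targets, two for in-area targets
    return table


table_5 = build_forwarding_table(5)
```

  • For chiplet 5, destination chiplet 2 gets two candidate directions, which corresponds to the two candidate ports mentioned above.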
  • the first chiplet extracts the data in the first data packet and stores it for subsequent processing.
  • the first chiplet uses the identifiers of the remaining target chiplets as an index to look up the data sending port of the first data packet in its own forwarding mapping table.
  • If there is only one remaining target sub-chip, then, in the same way, after the corresponding sending ports are found, the port whose sending buffer holds the least data to be sent is selected to send the data of the first data packet. Specifically, the data is repackaged into a new data packet for transmission, and the target sub-chip identifiers in the repackaged data packet no longer include the identifier of the first sub-chip, but only the identifiers of the remaining target sub-chips.
  • the first sub-chip searches for the corresponding sending ports in its own forwarding mapping table. If the found sending ports are the same, the data included in the first data packet may be copied to generate a new data packet that includes the identifiers of the remaining destination chiplets, and this newly generated packet is sent from the common sending port that was found. Similarly, the sending port may be the port with the least amount of data to be sent in the sending buffer among the found sending ports.
  • the remaining target sub-chips are sub-chip A and sub-chip B.
  • the above-mentioned first sub-chip looks up the sending port mapped with the identifier of the sub-chip A and the sending port mapped with the identifier of the sub-chip B in its own forwarding mapping table. Assuming that the found sending ports are different, the first sub-chip may regenerate two data packets: data packet A and data packet B.
  • Both data packets include the data included in the above-mentioned first data packet, wherein the identifier of the target sub-chip included in the data packet A is the identifier of the sub-chip A, and the identifier of the target sub-chip included in the data packet B is the identifier of the sub-chip B. Then, the data packet A and the data packet B are sent through the respectively found sending ports. Similarly, the sending port may be the port with the least amount of data to be sent in the sending buffer among the found sending ports.
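  • The regeneration of packets for the remaining target sub-chips can be sketched as grouping destinations by their sending port: destinations that share a port share one packet, and destinations on different ports each get a copy. The sketch assumes one already-selected port per destination (for example after the congestion-based choice); `regroup_by_port` and the port labels `p0` and `p1` are hypothetical.

```python
def regroup_by_port(data, remaining_dests, port_of):
    """Build one outgoing packet per sending port for the remaining destinations."""
    by_port = {}
    for dest in remaining_dests:
        # `port_of[dest]` is the port already chosen for this destination.
        by_port.setdefault(port_of[dest], []).append(dest)
    return [
        {"port": port, "dest_ids": tuple(dests), "data": data}
        for port, dests in by_port.items()
    ]


different_ports = regroup_by_port(b"d", ["A", "B"], {"A": "p0", "B": "p1"})
same_port = regroup_by_port(b"d", ["A", "B"], {"A": "p0", "B": "p0"})
```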
  • the embodiment of the present application transmits the received data based on the data transmission situation in the chip system, so as to flexibly schedule the data transmission, improve the data transmission efficiency, and further improve the processing performance of the chip system.
  • each device includes a corresponding hardware structure and/or software module for performing each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software drives hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
  • the embodiments of the present application may divide the device into functional modules according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. It should be noted that the division of modules in this embodiment of the present application is schematic, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 12 shows a specific logical structural diagram of the device, which may be the above-mentioned first sub-chip.
  • the device 1200 includes:
  • The receiving unit 1201 is configured to receive the first data packet, where the aforementioned first data packet includes the identifier of the target sub-chip; the aforementioned device 1200 and the aforementioned target sub-chip are sub-chips included in the chip system, the multiple sub-chips in the aforementioned chip system are arranged in the form of a matrix, and each of the aforementioned multiple sub-chips is connected to its surrounding adjacent sub-chips;
  • The sending unit 1202 is configured to send the data in the aforementioned first data packet to the aforementioned target sub-chip based on the direction coordinate system and the principle of smaller bandwidth consumption, where the aforementioned principle of smaller bandwidth consumption is the principle of delivering the aforementioned data to the aforementioned target sub-chip with a smaller transmission bandwidth;
  • The aforementioned direction coordinate system is constructed with the aforementioned device 1200 as the center, and includes a first direction axis, a second direction axis, a third direction axis, and a fourth direction axis; the row in which the device 1200 is located lies on at least one of the aforementioned first direction axis and the aforementioned second direction axis, which are opposite in direction; the column in which the device 1200 is located lies on at least one of the aforementioned third direction axis and the aforementioned fourth direction axis, which are opposite in direction.
  • the foregoing sending unit 1202 is specifically configured to:
  • When the aforementioned target chiplet is on the target direction axis, send the aforementioned data along the direction of the aforementioned target direction axis; the aforementioned target direction axis is the aforementioned first direction axis, the aforementioned second direction axis, the aforementioned third direction axis or the aforementioned fourth direction axis.
  • The aforementioned direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the aforementioned first area is bounded by the aforementioned first direction axis and the aforementioned third direction axis, the aforementioned second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
  • The aforementioned first data packet includes the identifiers of the first target chiplet and the second target chiplet; the aforementioned sending unit 1202 is specifically configured to:
  • When the first target chiplet and the second target chiplet are respectively located in two adjacent areas among the first area, the second area, the third area, and the fourth area, send the second data packet along the direction of the common direction axis; the aforementioned second data packet includes the aforementioned data and the identifiers of the first target chiplet and the second target chiplet;
  • The aforementioned common direction axis is the direction axis of the common boundary of the aforementioned two adjacent areas.
  • The aforementioned direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the aforementioned first area is bounded by the aforementioned first direction axis and the aforementioned third direction axis, the aforementioned second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
  • The aforementioned first data packet includes the identifiers of the first target chiplet and the second target chiplet; the aforementioned sending unit 1202 is specifically configured to:
  • When the aforementioned first target chiplet is in the aforementioned first area and the aforementioned second target chiplet is in the aforementioned third area, send the third data packet along the direction of one of the two boundary direction axes of the aforementioned first area, and send the fourth data packet along the direction of one of the two boundary direction axes of the aforementioned third area;
  • The aforementioned third data packet includes the aforementioned data and the identifier of the aforementioned first target chiplet, and the aforementioned fourth data packet includes the aforementioned data and the identifier of the aforementioned second target chiplet.
  • The aforementioned direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the aforementioned first area is bounded by the aforementioned first direction axis and the aforementioned third direction axis, the aforementioned second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
  • The aforementioned first data packet includes the identifiers of the first target chiplet and the second target chiplet; the aforementioned sending unit 1202 is specifically configured to:
  • When the aforementioned first target chiplet is in the target area and the aforementioned second target chiplet is on a direction axis that bounds the target area, send the fifth data packet along the direction of that boundary direction axis; the fifth data packet includes the aforementioned data and the identifiers of the aforementioned first target chiplet and second target chiplet; the aforementioned target area is the first area, the second area, the third area or the fourth area.
  • the device 1200 includes multiple ports, each of which is connected to another sub-chip and has a corresponding send buffer for storing data to be sent;
  • the device 1200 further includes a selection unit configured to:
  • when at least two ports can send the data, select a first port to send the data; the first port is the port, among the at least two ports, whose send buffer holds the least data to be sent.
  • the first data packet includes identifiers of multiple destination sub-chips, and these identifiers include the identifier of the device 1200; the device 1200 further includes:
  • a storage unit configured to store the data in the first data packet;
  • an encapsulation unit configured to re-encapsulate the data to obtain a sixth data packet;
  • the sending unit 1202 is further configured to send the sixth data packet to the destination sub-chips other than the device 1200.
  • FIG. 13 is a schematic diagram of a specific hardware structure of the device provided by the present application.
  • the device 1300 may be the above-mentioned first chiplet.
  • the device 1300 includes: a processor 1301 , a memory 1302 and a communication port 1303 .
  • the processor 1301, the communication port 1303 and the memory 1302 may be connected to each other, for example through a bus 1304.
  • the memory 1302 is used to store computer programs and data of the device 1300, and may include, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM).
  • the communication port 1303 includes a sending port and a receiving port.
  • there may be multiple communication ports 1303, which support communication by the device 1300, for example receiving or sending data or messages.
  • the communication port 1303 may be the ports d0, d1, d2 and d3 shown in FIG. 2 above.
  • the processor 1301 may be a central processing unit, a general processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component or any combination thereof.
  • the processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
  • the processor 1301 may be the processing module shown in FIG. 2 above.
  • the processor 1301 may be used to read the program stored in the above-mentioned memory 1302, so that the device 1300 executes the operations performed by the first chiplet as described above in FIG. 8 and its specific embodiments.
  • an embodiment of the present application also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the operations performed by the first sub-chip in FIG. 8 above and any of its possible method embodiments are implemented.
  • an embodiment of the present application also provides a computer program product.
  • when the computer program product is read and executed by a computer, the operations performed by the first sub-chip in FIG. 8 above and any of its possible method embodiments are implemented.

Abstract

Embodiments of the present application provide a data transmission processing method in a chip system and a related apparatus. The method comprises: a first sub-chip receives a first data packet, wherein the first data packet comprises an identifier of a target sub-chip, the first sub-chip and the target sub-chip are sub-chips comprised in the chip system, a plurality of sub-chips in the chip system are arranged in the form of a matrix, and each sub-chip among the plurality of sub-chips is connected to surrounding adjacent sub-chips; and the first sub-chip sends data in the first data packet to the target sub-chip according to a small bandwidth consumption principle on the basis of a direction coordinate system, wherein the small bandwidth consumption principle is a principle that data is sent to the target sub-chip by using a small transmission bandwidth; and the direction coordinate system is constructed by using the first sub-chip as the center. According to the present application, efficient data transmission between sub-chips in a chip system can be achieved, and the processing performance of the chip system is improved.

Description

Data transmission processing method in chip system and related apparatus
Technical field
The present application relates to the field of communication technologies, and in particular to a data transmission processing method in a chip system and a related apparatus.
This application claims priority to Chinese patent application No. 202111633357.X, titled "Data transmission processing method in chip system and related apparatus" and filed with the China Patent Office on December 28, 2021, the entire contents of which are incorporated herein by reference.
Background
A chip system may include multiple sub-chips, each of which can process data independently. The sub-chips are connected in a certain topology so that they can communicate with one another, and they can cooperatively process a single large computing task in a model-parallel manner to improve processing efficiency. During cooperative processing, the sub-chips exchange data frequently, and the efficiency of that data transmission affects the processing performance of the entire chip system.
Technical problem
The embodiments of the present application disclose a data transmission processing method in a chip system and a related apparatus, which enable efficient data transmission between sub-chips in the chip system and improve the processing performance of the chip system.
In a first aspect, the present application provides a data transmission processing method in a chip system, the method including:
A first sub-chip receives a first data packet, where the first data packet includes an identifier of a destination sub-chip; the first sub-chip and the destination sub-chip are sub-chips included in the chip system, the multiple sub-chips in the chip system are arranged in a matrix, and each of the multiple sub-chips is connected to its adjacent surrounding sub-chips;
Based on a direction coordinate system, the first sub-chip sends the data in the first data packet to the destination sub-chip according to a principle of lower bandwidth consumption, which is the principle of delivering the data to the destination sub-chip using less transmission bandwidth;
The direction coordinate system is constructed with the first sub-chip at its center and includes a first direction axis, a second direction axis, a third direction axis, and a fourth direction axis; the row containing the first sub-chip lies on at least one of the first direction axis and the second direction axis, which point in opposite directions; the column containing the first sub-chip lies on at least one of the third direction axis and the fourth direction axis, which point in opposite directions.
In the present application, the direction coordinate system is constructed in the chip system with the sub-chip that currently needs to send data at its center, and that sub-chip then transmits the received data based on the direction coordinate system according to the principle of lower bandwidth consumption. This improves data transmission efficiency and, in turn, the processing performance of the chip system.
In a possible implementation, the first sub-chip sending the data in the first data packet to the destination sub-chip based on the direction coordinate system according to the principle of lower bandwidth consumption includes: when the destination sub-chip is on a target direction axis, the first sub-chip sends the data along the direction of the target direction axis; the target direction axis is the first direction axis, the second direction axis, the third direction axis, or the fourth direction axis.
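The on-axis case can be sketched as follows. This is only an illustration under stated assumptions: sub-chips are addressed by hypothetical `(row, column)` positions in the matrix, and the mapping of the four direction axes to concrete directions (first = column increasing, second = column decreasing, third = row decreasing, fourth = row increasing) is chosen here for the example; the text only requires that the first and second axes are opposite along the row and the third and fourth are opposite along the column. The names `Axis`, `axis_of` and `next_hop` are not from the application.

```python
from enum import Enum


class Axis(Enum):
    FIRST = 1   # assumed: same row, column increasing
    SECOND = 2  # assumed: same row, column decreasing (opposite of FIRST)
    THIRD = 3   # assumed: same column, row decreasing
    FOURTH = 4  # assumed: same column, row increasing (opposite of THIRD)


# Unit step (d_row, d_col) taken when sending one hop along each axis.
STEP = {
    Axis.FIRST: (0, 1),
    Axis.SECOND: (0, -1),
    Axis.THIRD: (-1, 0),
    Axis.FOURTH: (1, 0),
}


def axis_of(center, dest):
    """Return the axis of center's coordinate system that dest lies on, else None."""
    d_row = dest[0] - center[0]
    d_col = dest[1] - center[1]
    if d_row == 0 and d_col != 0:
        return Axis.FIRST if d_col > 0 else Axis.SECOND
    if d_col == 0 and d_row != 0:
        return Axis.THIRD if d_row < 0 else Axis.FOURTH
    return None  # dest is the center itself or lies off every axis


def next_hop(center, axis):
    """Neighbouring sub-chip reached by sending one hop along the given axis."""
    d_row, d_col = STEP[axis]
    return (center[0] + d_row, center[1] + d_col)
```

For a destination in the same row or column, `axis_of` gives the target direction axis and `next_hop` gives the adjacent sub-chip to forward to; each sub-chip on the way would repeat the same check with itself as the center.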
In a possible implementation, the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
The first data packet includes identifiers of a first destination sub-chip and a second destination sub-chip; the first sub-chip sending the data in the first data packet to the destination sub-chip based on the direction coordinate system according to the principle of lower bandwidth consumption includes:
When the first destination sub-chip and the second destination sub-chip are located in two adjacent areas among the first area, the second area, the third area, and the fourth area, the first sub-chip sends a second data packet along the direction of a common direction axis; the second data packet includes the data and the identifiers of the first destination sub-chip and the second destination sub-chip; the common direction axis is the direction axis that forms the shared boundary of the two adjacent areas.
In a possible implementation, the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
The first data packet includes identifiers of a first destination sub-chip and a second destination sub-chip; the first sub-chip sending the data in the first data packet to the destination sub-chip based on the direction coordinate system according to the principle of lower bandwidth consumption includes:
When the first destination sub-chip is in the first area and the second destination sub-chip is in the third area, the first sub-chip sends a third data packet along the direction of one of the two direction axes bounding the first area, and sends a fourth data packet along the direction of one of the two direction axes bounding the third area; the third data packet includes the data and the identifier of the first destination sub-chip, and the fourth data packet includes the data and the identifier of the second destination sub-chip.
In a possible implementation, the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
The first data packet includes identifiers of a first destination sub-chip and a second destination sub-chip; the first sub-chip sending the data in the first data packet to the destination sub-chip based on the direction coordinate system according to the principle of lower bandwidth consumption includes:
When the first destination sub-chip is in a target area and the second destination sub-chip is on a direction axis bounding the target area, the first sub-chip sends a fifth data packet along the direction of that boundary direction axis; the fifth data packet includes the data and the identifiers of the first destination sub-chip and the second destination sub-chip; the target area is the first area, the second area, the third area, or the fourth area.
In the above possible implementations, the position of the destination sub-chip relative to the first sub-chip is determined based on the direction coordinate system constructed above, and the shortest transmission path from the first sub-chip to the destination sub-chip is quickly determined from that position. This enables fast forwarding of data, saves transmission bandwidth, and improves transmission efficiency.
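The three two-destination cases above can be summarized in a small decision function. The sketch below is illustrative only and rests on several assumptions: sub-chips are addressed by hypothetical `(row, column)` positions; axis 1 is taken as column increasing, axis 2 as column decreasing, axis 3 as row decreasing and axis 4 as row increasing; for destinations in opposite areas the text leaves open which boundary axis is used, so the lowest-numbered one is picked arbitrarily; and treating the second and fourth areas like the first and third is an assumption by symmetry. The names `locate` and `plan_two_destinations` are not from the application.

```python
# Direction axes bounding each area, as stated in the text:
# area 1 -> axes 1 and 3, area 2 -> axes 2 and 3,
# area 3 -> axes 2 and 4, area 4 -> axes 1 and 4.
AREA_BOUNDS = {1: {1, 3}, 2: {2, 3}, 3: {2, 4}, 4: {1, 4}}


def locate(center, dest):
    """Classify dest as ('self', 0), ('axis', n) or ('area', n) relative to center.

    Assumed orientation: axis 1 = column increasing, axis 2 = column decreasing,
    axis 3 = row decreasing, axis 4 = row increasing.
    """
    d_row = dest[0] - center[0]
    d_col = dest[1] - center[1]
    if d_row == 0 and d_col == 0:
        return ("self", 0)
    if d_row == 0:
        return ("axis", 1 if d_col > 0 else 2)
    if d_col == 0:
        return ("axis", 3 if d_row < 0 else 4)
    if d_row < 0:
        return ("area", 1 if d_col > 0 else 2)
    return ("area", 4 if d_col > 0 else 3)


def plan_two_destinations(center, dest_a, dest_b):
    """Return a list of (axis, [destinations]) pairs, one pair per packet to send."""
    kind_a, n_a = locate(center, dest_a)
    kind_b, n_b = locate(center, dest_b)

    if kind_a == "area" and kind_b == "area" and n_a != n_b:
        common = AREA_BOUNDS[n_a] & AREA_BOUNDS[n_b]
        if common:
            # Adjacent areas: one packet along the shared boundary axis.
            return [(common.pop(), [dest_a, dest_b])]
        # Opposite areas: two packets, each along a boundary axis of its own area
        # (lowest-numbered axis chosen arbitrarily for this sketch).
        return [
            (min(AREA_BOUNDS[n_a]), [dest_a]),
            (min(AREA_BOUNDS[n_b]), [dest_b]),
        ]

    # One destination inside an area, the other on an axis bounding that area:
    # one packet along that boundary axis.
    if kind_a == "area" and kind_b == "axis" and n_b in AREA_BOUNDS[n_a]:
        return [(n_b, [dest_a, dest_b])]
    if kind_a == "axis" and kind_b == "area" and n_a in AREA_BOUNDS[n_b]:
        return [(n_a, [dest_a, dest_b])]

    raise NotImplementedError("case not described in this passage")
```

Sharing a packet along the common axis means the data crosses those links once instead of twice, which is where the bandwidth saving in the adjacent-area and area-plus-axis cases comes from.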
In a possible implementation, the first sub-chip includes multiple ports, each of which is connected to another sub-chip and has a corresponding send buffer for storing data to be sent; the method further includes: when at least two ports are available for sending the data, the first sub-chip selects a first port to send the data; the first port is the port, among the at least two ports, whose send buffer holds the least data to be sent.
In the present application, sending data through the port with less pending data reduces the time data spends queuing and improves sending efficiency.
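The port selection rule amounts to taking a minimum over the candidate ports. A minimal sketch, assuming the pending amount of each send buffer is available as a number keyed by port name (the function name `select_port`, the `pending` mapping and the tie-breaking rule are illustrative, not from the application):

```python
def select_port(candidate_ports, pending):
    """Pick the candidate port whose send buffer holds the least pending data.

    candidate_ports: ports that can all carry the data toward its destination.
    pending: mapping from port name to the amount of data waiting in its send buffer.
    Ties go to the first candidate listed (an arbitrary choice made here).
    """
    return min(candidate_ports, key=lambda port: pending[port])
```

For example, with candidates `["d0", "d1"]` and pending amounts `{"d0": 512, "d1": 128}`, the function returns `"d1"`. Ports that are not candidates are ignored even if their buffers are empty.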
In a possible implementation, the first data packet includes identifiers of multiple destination sub-chips, and these identifiers include the identifier of the first sub-chip; the method further includes:
The first sub-chip stores the data in the first data packet;
The first sub-chip re-encapsulates the data to obtain a sixth data packet;
The first sub-chip sends the sixth data packet to the destination sub-chips other than the first sub-chip.
In the present application, a data packet can carry the identifiers of multiple destination sub-chips. Compared with the existing approach of sending one data packet per destination, this reduces the number of data packets sent and saves transmission bandwidth.
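The store-and-re-encapsulate step can be sketched as below. The `Packet` layout, the `handle_packet` function and the use of a list as local storage are hypothetical; in particular, dropping the first sub-chip's own identifier from the re-encapsulated packet is an assumption, since the text only says that the sixth data packet is sent to the remaining destinations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dest_ids: tuple  # identifiers of all destination sub-chips
    payload: bytes   # the data being delivered


def handle_packet(self_id, packet, local_store):
    """Keep a local copy if this sub-chip is a destination, then re-encapsulate.

    Returns the packet to forward, or None when no destination remains.
    """
    if self_id not in packet.dest_ids:
        return packet  # pure transit: forward unchanged
    local_store.append(packet.payload)  # store the data locally
    remaining = tuple(d for d in packet.dest_ids if d != self_id)
    if not remaining:
        return None  # this sub-chip was the last destination
    # Re-encapsulated packet (assumed to list only the remaining destinations).
    return Packet(dest_ids=remaining, payload=packet.payload)
```

With this shape, one packet addressed to several sub-chips is consumed and forwarded hop by hop, rather than the source sending a separate packet to each destination.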
In a second aspect, the present application provides a sub-chip, the sub-chip being a first sub-chip, and the first sub-chip including:
A receiving unit, configured to receive a first data packet, where the first data packet includes an identifier of a destination sub-chip; the first sub-chip and the destination sub-chip are sub-chips included in a chip system, the multiple sub-chips in the chip system are arranged in a matrix, and each of the multiple sub-chips is connected to its adjacent surrounding sub-chips;
A sending unit, configured to send, based on a direction coordinate system and according to a principle of lower bandwidth consumption, the data in the first data packet to the destination sub-chip; the principle of lower bandwidth consumption is the principle of delivering the data to the destination sub-chip using less transmission bandwidth;
The direction coordinate system is constructed with the first sub-chip at its center and includes a first direction axis, a second direction axis, a third direction axis, and a fourth direction axis; the row containing the first sub-chip lies on at least one of the first direction axis and the second direction axis, which point in opposite directions; the column containing the first sub-chip lies on at least one of the third direction axis and the fourth direction axis, which point in opposite directions.
In a possible implementation, the sending unit is specifically configured to:
When the destination sub-chip is on a target direction axis, send the data along the direction of the target direction axis; the target direction axis is the first direction axis, the second direction axis, the third direction axis, or the fourth direction axis.
In a possible implementation, the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
The first data packet includes identifiers of a first destination sub-chip and a second destination sub-chip; the sending unit is specifically configured to:
When the first destination sub-chip and the second destination sub-chip are located in two adjacent areas among the first area, the second area, the third area, and the fourth area, send a second data packet along the direction of a common direction axis; the second data packet includes the data and the identifiers of the first destination sub-chip and the second destination sub-chip; the common direction axis is the direction axis that forms the shared boundary of the two adjacent areas.
In a possible implementation, the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
The first data packet includes identifiers of a first destination sub-chip and a second destination sub-chip; the sending unit is specifically configured to:
When the first destination sub-chip is in the first area and the second destination sub-chip is in the third area, send a third data packet along the direction of one of the two direction axes bounding the first area, and send a fourth data packet along the direction of one of the two direction axes bounding the third area; the third data packet includes the data and the identifier of the first destination sub-chip, and the fourth data packet includes the data and the identifier of the second destination sub-chip.
In a possible implementation, the direction coordinate system further includes a first area, a second area, a third area, and a fourth area; the first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis;
The first data packet includes identifiers of a first destination sub-chip and a second destination sub-chip; the sending unit is specifically configured to:
When the first destination sub-chip is in a target area and the second destination sub-chip is on a direction axis bounding the target area, send a fifth data packet along the direction of that boundary direction axis; the fifth data packet includes the data and the identifiers of the first destination sub-chip and the second destination sub-chip; the target area is the first area, the second area, the third area, or the fourth area.
In a possible implementation, the first sub-chip includes multiple ports, each of which is connected to another sub-chip and has a corresponding send buffer for storing data to be sent;
The first sub-chip further includes a selection unit, configured to:
When at least two ports are available for sending the data, select a first port to send the data; the first port is the port, among the at least two ports, whose send buffer holds the least data to be sent.
In a possible implementation, the first data packet includes identifiers of multiple destination sub-chips, and these identifiers include the identifier of the first sub-chip; the first sub-chip further includes:
A storage unit, configured to store the data in the first data packet;
An encapsulation unit, configured to re-encapsulate the data to obtain a sixth data packet;
The sending unit is further configured to send the sixth data packet to the destination sub-chips other than the first sub-chip.
In a third aspect, the present application provides a sub-chip including a processor, a memory, and a communication port, where the memory and the communication port are coupled to the processor, the communication port is used to send and receive data, the memory is used to store a computer program, and the processor is used to invoke the computer program so that the sub-chip performs the method of any implementation of the first aspect; the sub-chip is a sub-chip included in a chip system, the multiple sub-chips in the chip system are arranged in a matrix, and each of the multiple sub-chips is connected to its adjacent surrounding sub-chips.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the method of any implementation of the first aspect is implemented.
In a fifth aspect, the present application provides a computer program product including a computer program; when the computer program is executed by a processor, the method of any implementation of the first aspect is implemented.
It can be understood that the second to fifth aspects above all correspond to, and are used to perform, the method provided in any implementation of the first aspect. For their beneficial effects, refer to those of the corresponding method; they are not repeated here.
Description of drawings
The drawings used in the embodiments of the present application are introduced below.
FIG. 1 is a schematic diagram of the chip system provided by the present application;
FIG. 2 is a schematic structural diagram of a sub-chip provided by the present application;
FIG. 3 to FIG. 6 are schematic diagrams of the chip system provided by the present application;
FIG. 7 is a schematic diagram of the division into sub-chip groups provided by the present application;
FIG. 8 is a schematic flowchart of the data transmission processing method in the chip system provided by the present application;
FIG. 9 is a schematic diagram of the data packet structure provided by the present application;
FIG. 10 is a schematic diagram of the direction coordinate system provided by the present application;
FIG. 11 is a schematic diagram of constructing the direction coordinate system based on a sub-chip, provided by the present application;
FIG. 12 is a schematic structural diagram of a virtual apparatus provided by the present application;
FIG. 13 is a schematic structural diagram of a physical apparatus provided by the present application.
Embodiments of the Present Invention
Embodiments of the present application are described below with reference to the accompanying drawings.
FIG. 1 is a schematic structural diagram of a chip system provided by an embodiment of the present application. The chip system 110 includes multiple sub-chips (FIG. 1 shows 16 sub-chips as an example), and the sub-chips are connected according to a preset topology. For example, the 16 sub-chips in FIG. 1 may be arranged in a matrix, with each sub-chip connected to two, three, or four neighboring sub-chips.
Each sub-chip has its own memory; FIG. 1 shows the memory of some of the sub-chips as an example. The memory may be, for example, a synchronous dynamic random access memory (SDRAM) or a double data rate synchronous dynamic random access memory (DDR SDRAM), which may be abbreviated as DDR.
Each sub-chip in the chip system 110 has complete processing capability and can execute tasks independently. The sub-chips in the chip system 110 can also cooperate with each other to execute large processing tasks.
Refer to FIG. 2, which shows an example structure of a sub-chip in the chip system 110. The sub-chip may be structured as a network-on-chip (NoC). As shown, the sub-chip may include a processing module, a routing module, a static memory, a memory controller, and four ports (d0, d1, d2, and d3).
The processing module is the control unit (CU) of the sub-chip and manages the processing flows in the sub-chip.
The routing module is responsible for data synchronization within the sub-chip, data synchronization between sub-chips, data broadcasting, and data transmission. The routing module also includes a control unit, which manages the routing flow in the routing module, and a local buffer, which can temporarily store data awaiting processing. The routing module further includes a forwarding-port mapper (FPM), which may be a hardware module or a software module. The FPM stores a port forwarding mapping table that records the mapping between destination sub-chips and sending ports, and this table can be used to map a data packet to the corresponding port for sending. The routing module also stores a stream in table (SIT) and a stream out table (SOT), which are used for data transmission between sub-chips; they are described in detail later and are not elaborated here.
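As an illustrative sketch only, the port forwarding mapping table held by the FPM can be modeled as a lookup from destination sub-chip identifier to sending port. The class name `ForwardingPortMapper`, its methods, and the particular port assignments below are assumptions made for illustration; the application does not specify how the table is laid out.

```python
# Hypothetical model of the port forwarding mapping table of one sub-chip.
# Keys are destination sub-chip identifiers; values are local sending ports.
PORTS = ("d0", "d1", "d2", "d3")


class ForwardingPortMapper:
    def __init__(self, table):
        # Reject entries that name a port the sub-chip does not have.
        for port in table.values():
            if port not in PORTS:
                raise ValueError(f"unknown port: {port}")
        self._table = dict(table)

    def port_for(self, dest_id):
        """Return the sending port mapped to a destination sub-chip."""
        return self._table[dest_id]

    def group_by_port(self, dest_ids):
        """Group several destinations by the port they are sent through."""
        groups = {}
        for dest in dest_ids:
            groups.setdefault(self.port_for(dest), []).append(dest)
        return groups


# Assumed example table; the port-to-destination assignment is arbitrary.
fpm = ForwardingPortMapper({6: "d0", 7: "d0", 4: "d1", 1: "d2", 9: "d3"})
```

Grouping destinations by port, as in `group_by_port`, is one way a packet with several destinations could be sent once per port rather than once per destination.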
The static memory may be, for example, a static random-access memory (SRAM), and is used to store data in the sub-chip.
The memory controller is connected to the memory corresponding to the sub-chip, and may be, for example, a DDR controller.
The four ports (d0, d1, d2, and d3) are the network interfaces of the sub-chip and enable data transmission between sub-chips. Connections between sub-chips are made through these ports. Optionally, a sub-chip may include only two such ports, for example sub-chip 0, sub-chip 3, sub-chip 12, and sub-chip 15 in FIG. 1. Alternatively, a sub-chip may include three such ports, for example sub-chip 4 in FIG. 1.
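The relationship between a sub-chip's position in the matrix and the number of ports it needs can be sketched as follows. The sketch assumes that the 16 sub-chips of FIG. 1 are numbered row by row in a 4×4 matrix, which is consistent with sub-chips 0, 3, 12, and 15 being corners; the function name `neighbors` is hypothetical.

```python
def neighbors(chip_id, rows=4, cols=4):
    """Return the identifiers of the sub-chips directly connected to chip_id.

    Assumes row-major numbering in a rows x cols matrix in which each
    sub-chip is linked to the sub-chips above, below, left, and right of it.
    """
    row, col = divmod(chip_id, cols)
    result = []
    for drow, dcol in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + drow, col + dcol
        if 0 <= r < rows and 0 <= c < cols:
            result.append(r * cols + c)
    return result
```

Under this assumption a corner sub-chip has two neighbors, an edge sub-chip has three, and an interior sub-chip has four, matching the port counts described above.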
The chip system provided by the embodiments of the present application is not limited to the structure shown in FIG. 1 and may have other structures, for example the one shown in FIG. 3. FIG. 3 shows a chip system 120 as an example. The chip system 120 likewise includes multiple sub-chips (FIG. 3 shows 8 sub-chips as an example), which are arranged and connected in the form of a cuboid. This connection manner keeps the data transmission paths between sub-chips as short as possible. For the structure of the sub-chips in the chip system 120, refer to the description of FIG. 2 above; details are not repeated here.
In a possible implementation, a very large task requires more sub-chips working together to improve processing efficiency. In this case, the chip system 110 or the chip system 120 may be used as a subsystem, and multiple such subsystems may form a larger chip system, for example the chip system 130 shown in FIG. 4.
The chip system 130 may include multiple such subsystems; FIG. 4 shows 8 subsystems as an example. Each of the subsystems may be the chip system 110 or the chip system 120. Each subsystem can be regarded as a whole, and the subsystems can be connected according to a preset topology, for example arranged and connected in the form of a cuboid, as shown in FIG. 4. To make the connection manner of the subsystems in the chip system 130 easier to understand, FIG. 5 shows the connection structure of the subsystems for the case in which each subsystem is the chip system 110.
As shown in FIG. 5, the chip system 130 includes 8 subsystems, each including 16 sub-chips that may be arranged and connected in a matrix. Two adjacent subsystems may be connected by connecting any sub-chip in one subsystem to any sub-chip in the other. In FIG. 5, as an example, the sub-chips at the corners of the matrix in each subsystem are used to connect to other subsystems. For example, subsystem 0 is connected to subsystem 1 through sub-chip 3 in subsystem 0 and sub-chip 0 in subsystem 1; to subsystem 2 through sub-chip 12 in subsystem 0 and sub-chip 0 in subsystem 2; and to subsystem 4 through sub-chip 0 in subsystem 0 and sub-chip 0 in subsystem 4.
In a possible implementation, a control bus may also be provided between the subsystems, for example as in the chip system 140 shown in FIG. 6. The D-type bus shown in the chip system 140 is the control bus. The chip system 140 includes a central controller (which may be a sub-chip or a control module in the chip system 140) that manages the task processing flow in the chip system 140. The control bus is connected to the central controller so that each subsystem can receive control instructions from the central controller. In a specific implementation, each subsystem may be connected to the control bus through one sub-chip, which, after receiving a control instruction, forwards it to the corresponding sub-chip in the same subsystem. Alternatively, every sub-chip in every subsystem may be connected to the control bus to receive control instructions directly. The present application does not limit how the controller is specifically connected.
In addition to the chip system 140, the other chip systems provided by the embodiments of the present application (for example, the chip system 110, the chip system 120, and the chip system 130) also include a central controller that manages the task processing flow of the entire chip system. Likewise, the central controller may be a sub-chip or a control module in the chip system. The central controller can obtain the load and resource usage of all sub-chips in the chip system and assign tasks to the sub-chips on that basis. For example, a task scheduler in the central controller may assign tasks to the sub-chips based on information such as the load and resource usage of each sub-chip.
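The application does not specify the scheduling policy used by the task scheduler. Purely as an illustration, the sketch below assumes a greedy policy that assigns each task to the currently least-loaded sub-chip; the function name `assign_tasks`, the task cost values, and the tie-breaking rule are all assumptions.

```python
def assign_tasks(tasks, load):
    """Greedy load-based assignment (an assumed policy, not the application's).

    tasks: list of (task_id, cost) pairs, handled in the given order.
    load:  dict mapping sub-chip identifier to its current load.
    """
    load = dict(load)  # work on a copy so the caller's view is unchanged
    assignment = {}
    for task_id, cost in tasks:
        # Pick the least-loaded sub-chip; break ties by lowest identifier.
        target = min(load, key=lambda chip: (load[chip], chip))
        assignment[task_id] = target
        load[target] += cost
    return assignment
```

A real scheduler would also weigh resource usage such as memory, which this sketch omits.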
The central controller may also be responsible for data scheduling in the chip system. Specifically, the central controller obtains the data transmission status of each sub-chip through the control bus. By analyzing this status, it can learn the congestion of each transmission path and/or the port congestion of each sub-chip, formulate a data transmission strategy on that basis, and deliver the strategy to the sub-chips in the form of scheduling information. Each sub-chip sends data according to the scheduling information delivered by the central controller, which reduces the probability of congestion and improves data transmission efficiency.
In the chip system provided by the embodiments of the present application, each sub-chip is capable of processing data independently. However, when a data task is large, the processing efficiency of a single sub-chip is low. To improve task processing efficiency, the sub-chips in the chip system may be divided into multiple sub-chip groups, each including at least one sub-chip. Data tasks can then be processed with the sub-chip group as the processing unit, which improves processing efficiency. For an example of sub-chip groups, refer to FIG. 7.
FIG. 7 takes the chip system 110 as an example and divides its 16 sub-chips into 9 sub-chip groups, each including at least one sub-chip; see FIG. 7 for the specific division. The sub-chips of one sub-chip group may be adjacent, as in sub-chip group 3, sub-chip group 4, sub-chip group 7, and sub-chip group 8. Alternatively, the sub-chips of one sub-chip group may be non-adjacent, as in sub-chip group 2, which consists of the non-adjacent sub-chip 1 and sub-chip 12.
The chip system provided by the embodiments of the present application may process data tasks using data parallelism, model parallelism, or a combination of model parallelism and data parallelism, as described below.
Data parallelism means dividing the data to be processed into several data blocks and assigning the blocks to different sub-chip groups, each of which runs the same processing program on the data assigned to it. For example, suppose the data to be processed is divided into 3 data blocks and 3 sub-chip groups can run the same processing program. The first data block can then be sent to the first sub-chip group, the second data block to the second sub-chip group, and the third data block to the third sub-chip group.
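The three-block example above can be sketched as follows. This is a minimal illustration that runs sequentially on one machine; the function names, the group labels, and the even splitting rule are assumptions, since the application does not state how the data is divided.

```python
def split_into_blocks(data, num_blocks):
    """Split data into num_blocks contiguous blocks of near-equal size."""
    size, extra = divmod(len(data), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        end = start + size + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def run_data_parallel(data, groups, program):
    """Give each sub-chip group one block and run the same program on it."""
    blocks = split_into_blocks(data, len(groups))
    return {group: program(block) for group, block in zip(groups, blocks)}


# Three hypothetical sub-chip groups all running the same program (sum).
results = run_data_parallel(list(range(6)), ["group1", "group2", "group3"], sum)
```

Here every group executes the identical `program`, and only the input block differs, which is the defining property of data parallelism.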
Model parallelism means that multiple sub-chip groups jointly complete one data processing task, with each sub-chip group executing only some of the steps of the task (one or more processing steps). For example, suppose a data processing task takes 3 steps to complete. Two sub-chip groups may be configured to complete the task jointly: the first sub-chip group performs the first 2 steps, and the second sub-chip group obtains the processed data from the first sub-chip group and performs the third step. Alternatively, three sub-chip groups may be configured: the first sub-chip group performs the first step, the second sub-chip group obtains the processed data from the first sub-chip group and performs the second step, and the third sub-chip group obtains the processed data from the second sub-chip group and performs the third step. That is, each sub-chip group may perform one or more steps, as determined by the load and resource usage of the sub-chip groups.
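The two ways of distributing a three-step task can be sketched as a simple pipeline. The step functions and group labels below are placeholders chosen for illustration, and the stages run sequentially here rather than on separate hardware.

```python
def run_model_parallel(data, stages):
    """Run a task whose steps are split across sub-chip groups.

    stages: list of (group_name, [step functions]) executed in order; the
    output of one group is handed to the next group as its input.
    """
    trace = []
    for group, steps in stages:
        for step in steps:
            data = step(data)
        trace.append((group, data))  # data handed on by this group
    return data, trace


# Placeholder steps of a three-step task.
def step1(x): return x + 1
def step2(x): return x * 2
def step3(x): return x - 3


# Two groups: the first runs steps 1 and 2, the second runs step 3.
two_groups, _ = run_model_parallel(4, [("group1", [step1, step2]), ("group2", [step3])])
# Three groups: one step each.
three_groups, trace = run_model_parallel(
    4, [("group1", [step1]), ("group2", [step2]), ("group3", [step3])]
)
```

Both configurations produce the same final result; they differ only in where the intermediate data has to be transmitted between sub-chip groups.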
Model parallelism combined with data parallelism uses both of the above approaches together. For example, suppose a data processing task takes 3 steps to complete, and three sub-chip groups are configured to complete it jointly: the first sub-chip group performs the first step, the second sub-chip group obtains the processed data from the first sub-chip group and performs the second step, and the third sub-chip group obtains the processed data from the second sub-chip group and performs the third step. If the first step is relatively complex and time-consuming, one or more additional sub-chip groups may be configured to share it and improve processing efficiency. For example, a fourth sub-chip group may be configured to perform the first step together with the first sub-chip group. Specifically, the input data of the first step is divided into two parts, one sent to the first sub-chip group and the other to the fourth sub-chip group. The data processed by the first and fourth sub-chip groups is then sent to the second sub-chip group for the second step.
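The four-group example above can be sketched as follows. The step functions are placeholders, the data is split in half as an assumed division rule, and the code runs sequentially for illustration only.

```python
# Placeholder steps of a three-step task operating on lists of numbers.
def step1(xs): return [x + 1 for x in xs]
def step2(xs): return [x * 2 for x in xs]
def step3(xs): return sum(xs)


def run_hybrid(data):
    """Step 1 is data-parallel across two groups; steps 2 and 3 are pipelined."""
    half = len(data) // 2
    part_a = step1(data[:half])   # handled by sub-chip group 1
    part_b = step1(data[half:])   # handled by sub-chip group 4
    merged = part_a + part_b      # both results are sent to sub-chip group 2
    return step3(step2(merged))   # group 2 runs step 2, group 3 runs step 3
```

The result is the same as running the three steps on the whole input in sequence; only the expensive first step is shared between two groups.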
It should be noted that, when model parallelism is combined with data parallelism, data parallelism may be applied to every processing step or only to some of the processing steps, depending on the specific implementation. The present application does not limit this.
In a specific implementation, the central controller of the chip system may distribute data processing tasks to the sub-chip groups. Processing data tasks by model parallelism, or by model parallelism combined with data parallelism, requires data to be transmitted between sub-chips. Data transmission introduces delay, which reduces processing efficiency. To achieve efficient data transmission between the sub-chips of a chip system and improve the processing performance of the chip system, an embodiment of the present application provides a data transmission processing method in a chip system.
Referring to FIG. 8, the data transmission processing method provided by the embodiment of the present application includes, but is not limited to, the following steps:
S801. A first sub-chip receives a first data packet, where the first data packet includes an identifier of a destination sub-chip; the first sub-chip and the destination sub-chip are sub-chips of a chip system, and the multiple sub-chips of the chip system are connected in a preset topology.
The chip system may be the chip system 110, the chip system 120, the chip system 130, or the chip system 140 described above, and the first sub-chip may be any sub-chip in any of these chip systems. For ease of description, the chip system in which the first sub-chip is located is referred to as the first chip system. The first sub-chip receives the first data packet from another sub-chip in the first chip system.
In a specific implementation, the first data packet may include one or more of the following items of information: the packet type, a task identifier, a data stream identifier, the identifier of the destination sub-chip, a packet number, and data. Specifically:
The packet type indicates the specific type of a packet, which may be a data packet (DATA), a header packet (Header), an unblock packet (unblock, UB), and so on. The packet type of the first data packet is DATA.
The task identifier is the identifier of the data processing task to which the packet belongs. A chip system can process multiple data processing tasks at the same time, and each task has its own identifier. The task identifier may be, for example, 1 or another identifying symbol; the present application does not limit this. The first data packet carries data of a particular data processing task between sub-chips, so the task identifier in the first data packet is the identifier of that task.
The data stream identifier: when one sub-chip sends data to another sub-chip, the data is encapsulated into multiple data packets that are numbered and sent in sequence, and these consecutively sent data packets form a data stream. In one possible implementation, each data packet can carry 1kb of data; if the total size of the data to be transmitted is 64kb, the data can be split and encapsulated into 64 data packets, which together form one data stream. Each data stream is assigned an identifier, which is the data stream identifier. The data stream identifier in the first data packet is the identifier of the data stream to which the first data packet belongs.
The identifier of the destination sub-chip indicates the destination of the packet. The first data packet may contain the identifier of one or more destination sub-chips. If the data in the first data packet has one destination sub-chip, the first data packet contains the identifier of that destination sub-chip. If the data has multiple destination sub-chips, the first data packet contains the identifiers of all of them. For example, if a data packet has two destination sub-chips, sub-chip 0 and sub-chip 9, the data packet includes the identifiers of sub-chip 0 and sub-chip 9.
The packet number is the sequence number of a packet within the data stream to which it belongs.
The data is the payload of the packet, that is, the content actually being transmitted.
In a possible implementation, the first data packet may also carry sideband information, which may include one or more of the task identifier, the data stream identifier, and the identifier of the destination sub-chip. The sideband information need not be encapsulated inside the first data packet; it may instead be sent along with the first data packet. In a specific implementation, the information inside the first data packet can be read only by the routing module of a sub-chip; other modules in the sub-chip, such as the ports, are not aware of it. Therefore, to allow the first data packet to be forwarded quickly, it may be configured to carry the sideband information. For the format of the first data packet and of the sideband information, refer to the example in FIG. 9. The format of the first data packet and the corresponding sideband information shown in FIG. 9 are only examples; in a specific implementation, the first data packet may include other information, and the sideband information may include more information. The present application does not limit this.
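The packet fields and the splitting of a transfer into a data stream can be sketched as below. This is an illustration under stated assumptions, not the layout of FIG. 9: the field names, their types, the `Packet` class, and the reading of "1kb" as 1024 bytes are all assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Packet:
    ptype: str                  # "DATA", "Header", or "UB"
    task_id: int                # identifier of the data processing task
    stream_id: int              # identifier of the data stream
    dest_ids: Tuple[int, ...]   # one or more destination sub-chip identifiers
    seq: int                    # packet number within the data stream
    payload: bytes              # the data actually being transmitted

    def sideband(self):
        """Fields sent alongside the packet so that modules such as ports
        can forward it without reading the packet body (assumed field set)."""
        return {
            "task_id": self.task_id,
            "stream_id": self.stream_id,
            "dest_ids": self.dest_ids,
        }


def packetize(data, task_id, stream_id, dest_ids, payload_size=1024):
    """Split data into sequentially numbered DATA packets of one stream."""
    return [
        Packet("DATA", task_id, stream_id, tuple(dest_ids), seq,
               data[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(data), payload_size))
    ]


# A 64 KiB transfer addressed to sub-chip 0 and sub-chip 9.
stream = packetize(bytes(64 * 1024), task_id=1, stream_id=7, dest_ids=[0, 9])
```

With the assumed payload size, the 64 KiB transfer yields 64 packets numbered 0 to 63, mirroring the 64-packet example in the text.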
S802. The first sub-chip sends the data in the first data packet to the destination sub-chip based on a direction coordinate system and according to a lower-bandwidth-consumption principle. The lower-bandwidth-consumption principle is the principle of delivering the data to the destination sub-chip using less transmission bandwidth. The direction coordinate system is constructed with the first sub-chip as its center.
The direction coordinate system constructed around the first sub-chip is introduced first. FIG. 10 shows an example of such a direction coordinate system. The direction coordinate system includes four direction axes: a first direction axis, a second direction axis, a third direction axis, and a fourth direction axis, all of which extend outward from the first sub-chip at the center. The first direction axis and the second direction axis are collinear and opposite in direction, and the third direction axis and the fourth direction axis are collinear and opposite in direction. The direction coordinate system also includes four regions: a first region, a second region, a third region, and a fourth region. The first region is bounded by the first direction axis and the third direction axis; the second region is bounded by the second direction axis and the third direction axis; the third region is bounded by the second direction axis and the fourth direction axis; and the fourth region is bounded by the first direction axis and the fourth direction axis.
In the chip system, the row in which the first sub-chip is located lies on at least one of the first direction axis and the second direction axis, and the column in which the first sub-chip is located lies on at least one of the third direction axis and the fourth direction axis. See FIG. 11 for an example. Suppose sub-chip 5 in the chip system is the first sub-chip, so that the direction coordinate system is established with sub-chip 5 as the center. In this direction coordinate system, the second row, in which sub-chip 5 is located, lies on the first and second direction axes, and the second column, in which sub-chip 5 is located, lies on the third and fourth direction axes. Sub-chip 2 and sub-chip 3 are then located in the first region; sub-chip 0 is located in the second region; sub-chip 8 and sub-chip 12 are located in the third region; and sub-chip 10, sub-chip 11, sub-chip 14, and sub-chip 15 are located in the fourth region.
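The placement of sub-chips on axes and in regions can be sketched as a small classification function. The sketch assumes a 4×4 matrix numbered row by row, and infers the axis orientations from the FIG. 11 example: the first axis points toward higher column numbers, the second toward lower column numbers, the third toward lower row numbers, and the fourth toward higher row numbers. The function name `locate` and the returned labels are hypothetical.

```python
def locate(center, target, cols=4):
    """Classify target relative to the direction coordinate system of center.

    Returns "center", "axis1".."axis4", or "region1".."region4".
    Assumes row-major sub-chip numbering with cols sub-chips per row.
    """
    crow, ccol = divmod(center, cols)
    trow, tcol = divmod(target, cols)
    drow, dcol = trow - crow, tcol - ccol
    if drow == 0 and dcol == 0:
        return "center"
    if drow == 0:                      # same row: first or second axis
        return "axis1" if dcol > 0 else "axis2"
    if dcol == 0:                      # same column: third or fourth axis
        return "axis3" if drow < 0 else "axis4"
    if drow < 0:                       # rows before the center's row
        return "region1" if dcol > 0 else "region2"
    return "region4" if dcol > 0 else "region3"
```

With sub-chip 5 as the center, this classification reproduces the region membership listed above for FIG. 11.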
In a possible implementation, if sub-chip 0 in the chip system of FIG. 11 is the first sub-chip, the direction coordinate system is established with sub-chip 0 as the center. In this direction coordinate system, the first row, in which sub-chip 0 is located, lies on the first direction axis, and the first column, in which sub-chip 0 is located, lies on the fourth direction axis. All sub-chips other than those in the row and column of sub-chip 0 are then located in the fourth region.
In a possible implementation, the first sub-chip sending the data in the first data packet to the destination sub-chip based on the direction coordinate system and according to the lower-bandwidth-consumption principle includes: when a destination sub-chip indicated in the first data packet is located on a target direction axis, the first sub-chip sends the data of the first data packet along the direction of the target direction axis, where the target direction axis is the first direction axis, the second direction axis, the third direction axis, or the fourth direction axis. This is explained below using FIG. 11 as an example.
In FIG. 11, suppose sub-chip 5 is the first sub-chip and receives a data packet whose destination sub-chip identifier indicates sub-chip 7. If the data packet includes the identifier of only one destination sub-chip, then, because sub-chip 7 is located on the first direction axis, sub-chip 5 sends the data packet along the direction of the first direction axis. That is, sub-chip 5 first sends the data packet to sub-chip 6, which forwards it to sub-chip 7. If the data packet includes the identifiers of multiple destination sub-chips, one of which is sub-chip 7 on the first direction axis, sub-chip 5 copies the data in the data packet to generate a new data packet that includes the identifier of sub-chip 7, and sends the new data packet along the direction of the first direction axis. That is, sub-chip 5 first sends the new data packet to sub-chip 6, which forwards it to sub-chip 7.
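The on-axis case can be sketched as follows, under the same assumptions as before: a 4×4 matrix numbered row by row, with axis orientations inferred from the FIG. 11 example. The names `axis_of`, `path_along_axis`, and `STEP` are hypothetical, and the sketch covers only the single-destination case.

```python
# Row and column offset of one hop along each direction axis (assumed).
STEP = {"axis1": (0, 1), "axis2": (0, -1), "axis3": (-1, 0), "axis4": (1, 0)}


def axis_of(center, target, cols=4):
    """Return the axis of center on which target lies, or None."""
    crow, ccol = divmod(center, cols)
    trow, tcol = divmod(target, cols)
    if trow == crow and tcol != ccol:
        return "axis1" if tcol > ccol else "axis2"
    if tcol == ccol and trow != crow:
        return "axis4" if trow > crow else "axis3"
    return None


def path_along_axis(src, dest, cols=4):
    """List the sub-chips visited when forwarding from src to an on-axis dest."""
    axis = axis_of(src, dest, cols)
    if axis is None:
        raise ValueError("destination is not on an axis of the source")
    drow, dcol = STEP[axis]
    hops, current = [], src
    while current != dest:
        row, col = divmod(current, cols)
        current = (row + drow) * cols + (col + dcol)
        hops.append(current)
    return axis, hops
```

For a packet with several destinations, the copy step described above would build a new packet addressed only to the on-axis destination before sending it along this path.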
In a possible implementation, the first data packet includes the identifiers of a first destination sub-chip and a second destination sub-chip. The first sub-chip sending the data in the first data packet to the destination sub-chips based on the direction coordinate system and the principle of smaller bandwidth consumption includes: in the direction coordinate system established with the first sub-chip as the center, when the first destination sub-chip and the second destination sub-chip are located in two adjacent areas among the first area, the second area, the third area, and the fourth area, the first sub-chip sends a second data packet along the direction of the common direction axis. The second data packet includes the data and the identifiers of the first and second destination sub-chips. The common direction axis is the direction axis that forms the shared boundary of the two adjacent areas. For ease of understanding, FIG. 11 is used as an example.
In FIG. 11, assume that sub-chip 5 is the first sub-chip and that it receives a data packet whose destination identifiers indicate sub-chip 8 and sub-chip 14. Sub-chip 8 is in the third area and sub-chip 14 is in the fourth area. The two areas are adjacent, and their shared boundary is the fourth direction axis. Sub-chip 5 therefore sends the data packet along the direction of the fourth direction axis: it first sends the data packet to sub-chip 9, and sub-chip 9 forwards it further. Specifically, sub-chip 9 can in turn be regarded as the first sub-chip: a direction coordinate system is established with sub-chip 9 as the center, and the data is then forwarded based on the principle of smaller bandwidth consumption.
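The adjacency test can be expressed with the area boundaries this document defines (the first area is bounded by the first and third direction axes, the second by the second and third, the third by the second and fourth, and the fourth by the first and fourth). The following is a minimal sketch; the names `AREA_AXES` and `shared_axes` are hypothetical.

```python
# Boundary axes of each area, as defined for the direction coordinate system.
AREA_AXES = {1: {1, 3}, 2: {2, 3}, 3: {2, 4}, 4: {1, 4}}


def shared_axes(area_a: int, area_b: int) -> list:
    """Return the direction axes that bound both areas, in ascending order."""
    return sorted(AREA_AXES[area_a] & AREA_AXES[area_b])
```

Adjacent areas share exactly one axis, for example `shared_axes(3, 4)` yields the fourth direction axis used in the example with sub-chips 8 and 14, while diagonal areas such as the first and third share none.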
In a possible implementation, the first data packet includes the identifiers of a first destination sub-chip and a second destination sub-chip. The first sub-chip sending the data in the first data packet to the destination sub-chips based on the direction coordinate system and the principle of smaller bandwidth consumption includes: in the direction coordinate system established with the first sub-chip as the center, when the first destination sub-chip is in the first area and the second destination sub-chip is in the third area, the first sub-chip sends a third data packet along the direction of one of the two direction axes bounding the first area, and sends a fourth data packet along the direction of one of the two direction axes bounding the third area. The third data packet includes the data and the identifier of the first destination sub-chip. The fourth data packet includes the data and the identifier of the second destination sub-chip. For ease of understanding, FIG. 11 is used as an example.
In FIG. 11, assume that sub-chip 5 is the first sub-chip and that it receives a data packet whose destination identifiers indicate sub-chip 2 and sub-chip 12. Sub-chip 2 is in the first area and sub-chip 12 is in the third area. Sub-chip 5 can then generate two new data packets, data packet A and data packet B, from the data in the received data packet. Data packet A includes the data and the identifier of sub-chip 2, and data packet B includes the data and the identifier of sub-chip 12. Sub-chip 5 sends data packet A along the direction of the first direction axis or the third direction axis. For example, it sends data packet A along the first direction axis: data packet A is first sent to sub-chip 6, and sub-chip 6 forwards it to sub-chip 2. Sub-chip 5 also sends data packet B along the direction of the second direction axis or the fourth direction axis. For example, it sends data packet B along the fourth direction axis: data packet B is first sent to sub-chip 9, and sub-chip 9 forwards it further.
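The packet regeneration in this example can be sketched as follows. The `Packet` layout (a payload plus a tuple of destination identifiers) and the function `fan_out` are assumptions made for illustration; the text does not specify a packet format, and the choice of axis for each destination is supplied by the caller because the text leaves it open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    payload: bytes  # the data carried by the packet
    dests: tuple    # identifiers of the destination sub-chips


def fan_out(packet: Packet, axis_for_dest: dict) -> dict:
    """Build one new packet per destination, keyed by the axis it is sent along."""
    out = {}
    for dest in packet.dests:
        axis = axis_for_dest[dest]
        if axis in out:
            # Destinations in diagonal areas never share a boundary axis.
            raise ValueError("each destination must use a different axis")
        out[axis] = Packet(bytes(packet.payload), (dest,))  # copy of the data
    return out


received = Packet(b"data", (2, 12))
# Axis choice from the example: packet A on axis 1, packet B on axis 4.
packets = fan_out(received, {2: 1, 12: 4})
```

Here `packets[1]` plays the role of data packet A (destined for sub-chip 2) and `packets[4]` the role of data packet B (destined for sub-chip 12).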
In a possible implementation, the first data packet includes the identifiers of a first destination sub-chip and a second destination sub-chip. The first sub-chip sending the data in the first data packet to the destination sub-chips based on the direction coordinate system and the principle of smaller bandwidth consumption includes: in the direction coordinate system established with the first sub-chip as the center, when the first destination sub-chip is in the second area and the second destination sub-chip is in the fourth area, the first sub-chip sends a third data packet along the direction of one of the two direction axes bounding the second area, and sends a fourth data packet along the direction of one of the two direction axes bounding the fourth area. The third data packet includes the data and the identifier of the first destination sub-chip. The fourth data packet includes the data and the identifier of the second destination sub-chip. For ease of understanding, FIG. 11 is used as an example.
In FIG. 11, assume that sub-chip 5 is the first sub-chip and that it receives a data packet whose destination identifiers indicate sub-chip 0 and sub-chip 10. Sub-chip 0 is in the second area and sub-chip 10 is in the fourth area. Sub-chip 5 can then generate two new data packets, data packet C and data packet D, from the data in the received data packet. Data packet C includes the data and the identifier of sub-chip 0, and data packet D includes the data and the identifier of sub-chip 10. Sub-chip 5 sends data packet C along the direction of the second direction axis or the third direction axis. For example, it sends data packet C along the second direction axis: data packet C is first sent to sub-chip 4, and sub-chip 4 forwards it to sub-chip 0. Sub-chip 5 also sends data packet D along the direction of the first direction axis or the fourth direction axis. For example, it sends data packet D along the first direction axis: data packet D is first sent to sub-chip 6, and sub-chip 6 forwards it to sub-chip 10.
In a possible implementation, the first data packet includes the identifiers of a first destination sub-chip and a second destination sub-chip. The first sub-chip sending the data in the first data packet to the destination sub-chips based on the direction coordinate system and the principle of smaller bandwidth consumption includes: in the direction coordinate system established with the first sub-chip as the center, when the first destination sub-chip is in a target area and the second destination sub-chip is on a direction axis bounding that target area, the first sub-chip sends a fifth data packet along the direction of that boundary direction axis. The fifth data packet includes the data and the identifiers of the first and second destination sub-chips. The target area is the first area, the second area, the third area, or the fourth area. For ease of understanding, FIG. 11 is used as an example.
In FIG. 11, assume that sub-chip 5 is the first sub-chip and that it receives a data packet whose destination identifiers indicate sub-chip 14 and sub-chip 9. Sub-chip 14 is in the fourth area, and sub-chip 9 is on the fourth direction axis, which is a boundary direction axis of the fourth area. Sub-chip 5 can therefore send the received data packet along the direction of the fourth direction axis, that is, to sub-chip 9. After receiving the data packet, sub-chip 9 stores the data in it, copies the data, and generates a new data packet that includes the identifier of sub-chip 14. Sub-chip 9 then sends the new data packet to sub-chip 13 or sub-chip 10, which forwards it to sub-chip 14.
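One way to read the two-destination cases together is as a set intersection: a destination on an axis can be reached only along that axis, a destination in an area can be reached along either axis bounding the area, and a single packet suffices whenever the two candidate sets overlap. The sketch below encodes that reading. It is an interpretation rather than a rule stated in the text, it assumes a non-empty input, and the name `plan` and the lowest-numbered-axis tie-break are hypothetical.

```python
def plan(candidates: dict) -> list:
    """Decide which packets to send.

    `candidates` maps each destination identifier to the set of direction axes
    that lead toward it. Returns a list of (axis, destinations) pairs.
    """
    shared = set.intersection(*candidates.values())
    if shared:
        # One packet carrying every identifier travels along a common axis.
        return [(min(shared), tuple(sorted(candidates)))]
    # No common axis: one packet per destination, each along one of its own axes.
    return [(min(axes), (dest,)) for dest, axes in sorted(candidates.items())]
```

For sub-chips 14 and 9, the candidate sets are `{1, 4}` and `{4}`, so the sketch sends one packet along the fourth direction axis, in line with the example above. For sub-chips 2 and 12, the sets `{1, 3}` and `{2, 4}` do not overlap, so two packets are produced.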
In a possible implementation, the first sub-chip may also send the data in the first data packet based on the congestion state of its own ports.
Specifically, as described above for the chip system, each sub-chip includes multiple ports for communicating with other sub-chips. Each port is configured with a corresponding sending buffer that stores data waiting to be sent.
After receiving the first data packet, the first sub-chip parses it to obtain the identifier of the destination sub-chip. If the identifier indicates that the first sub-chip itself is the destination sub-chip, the first sub-chip extracts the data from the first data packet and stores it for subsequent processing. Otherwise, the first sub-chip uses the identifier of the destination sub-chip as an index to look up the sending port for the first data packet in its own forwarding mapping table. For a description of the forwarding mapping table, see the corresponding description of FIG. 2 above; details are not repeated here. If the lookup returns multiple sending ports, the specific sending port may be determined based on the congestion state of the sending buffers of those ports. Specifically, to improve data transmission efficiency, the port whose sending buffer holds the least data waiting to be sent may be selected to send the first data packet.
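The port selection described here amounts to taking a minimum over the candidate ports. A minimal sketch follows; the function name `pick_port` and the representation of buffer occupancy as a mapping from port name to queued bytes are assumptions made for illustration, and breaking ties by lookup order is a choice the text does not specify.

```python
def pick_port(candidates: list, queued: dict) -> str:
    """Return the candidate port whose sending buffer holds the least pending data.

    Ties are resolved in favor of the port listed first in `candidates`.
    """
    return min(candidates, key=lambda port: queued[port])


# Hypothetical occupancy of the four sending buffers, in bytes.
queued_bytes = {"d0": 512, "d1": 128, "d2": 0, "d3": 64}
chosen = pick_port(["d0", "d1"], queued_bytes)
```

Only ports returned by the forwarding lookup are considered, so an idle port that does not lead toward the destination (`d2` in this made-up occupancy table) is never chosen.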
Optionally, the forwarding mapping table in the first sub-chip may be initialized based on the direction coordinate system and the principle of smaller bandwidth consumption. For example, in FIG. 11, assume that sub-chip 5 is the first sub-chip and sub-chip 2 is the destination sub-chip. Based on the constructed direction coordinate system and the principle of smaller bandwidth consumption, sub-chip 5 can determine that a data packet destined for sub-chip 2 can be sent through its own port d0 or d1. Therefore, in the forwarding mapping table of sub-chip 5, the sending port corresponding to destination sub-chip 2 is port d0 or d1. Entries for other destination sub-chips follow the same pattern and are not described again.
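Such an initialization could look like the sketch below. It assumes a 4×4 row-major numbering of the sub-chips and, importantly, a mapping from direction axes to ports (`d0` for the first axis, `d1` for the third, `d2` for the second, `d3` for the fourth). The text only states that sub-chip 5 reaches sub-chip 2 through d0 or d1; which port faces which neighbor is not given, so `AXIS_PORT` is a guess chosen to be consistent with that one example. The sketch also interprets the principle of smaller bandwidth consumption as moving only toward the destination.

```python
# Assumed axis-to-port wiring; only {d0, d1} serving axes 1 and 3 is grounded in the text.
AXIS_PORT = {1: "d0", 3: "d1", 2: "d2", 4: "d3"}


def build_table(center: int, rows: int = 4, cols: int = 4) -> dict:
    """Map every other sub-chip to the ports that move a packet toward it."""
    cr, cc = divmod(center, cols)
    table = {}
    for dest in range(rows * cols):
        if dest == center:
            continue
        dr, dc = divmod(dest, cols)
        axes = []
        if dc > cc:
            axes.append(1)  # destination lies toward the first direction axis
        elif dc < cc:
            axes.append(2)  # toward the second direction axis
        if dr < cr:
            axes.append(3)  # toward the third direction axis
        elif dr > cr:
            axes.append(4)  # toward the fourth direction axis
        table[dest] = [AXIS_PORT[a] for a in axes]
    return table


table_5 = build_table(5)
```

With these assumptions, `table_5[2]` contains `d0` and `d1`, matching the entry described for destination sub-chip 2, and a destination that lies on an axis, such as sub-chip 7, maps to a single port.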
In a possible implementation, if the first data packet includes the identifiers of multiple destination sub-chips and the first sub-chip is one of them, the first sub-chip extracts the data from the first data packet and stores it for subsequent processing. The first sub-chip also uses the identifiers of the remaining destination sub-chips as indexes to look up, in its own forwarding mapping table, the sending port for the data of the first data packet.
If only one destination identifier remains, then, as before, after the corresponding sending ports are found, the port whose sending buffer holds the least data waiting to be sent is selected to send the data of the first data packet. Specifically, the data is re-encapsulated into a new data packet for sending. The destination identifiers in the re-encapsulated data packet no longer include the identifier of the first sub-chip and include only the identifier of the remaining destination sub-chip.
If multiple destination identifiers remain, the first sub-chip looks up the sending port for each of them in its own forwarding mapping table. If the sending ports found are the same, the first sub-chip may copy the data of the first data packet to generate a new data packet that includes the identifiers of all remaining destination sub-chips, and send the new data packet from that common sending port. As before, the sending port may be the one, among the ports found, whose sending buffer holds the least data waiting to be sent.
Alternatively, take two remaining destination identifiers as an example and assume that the remaining destination sub-chips are sub-chip A and sub-chip B. The first sub-chip looks up, in its own forwarding mapping table, the sending port mapped to the identifier of sub-chip A and the sending port mapped to the identifier of sub-chip B. If the sending ports found are different, the first sub-chip can generate two new data packets, data packet A and data packet B. Both include the data of the first data packet; the destination identifier in data packet A is the identifier of sub-chip A, and the destination identifier in data packet B is the identifier of sub-chip B. Data packet A and data packet B are then sent through their respective sending ports. As before, each sending port may be the one, among the ports found, whose sending buffer holds the least data waiting to be sent.
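The handling of a multi-destination packet at a sub-chip that is itself a destination can be sketched as follows. The function `forward`, the table layout (destination identifier mapped to a list of candidate ports), and the port names in the usage example are hypothetical. The sketch picks the least-loaded candidate port for each remaining destination and then merges destinations that end up on the same port, which is one possible way to combine the rules above, not necessarily the intended one.

```python
def forward(self_id: int, dests: tuple, payload: bytes, table: dict, queued: dict):
    """Return (data stored locally or None, {port: (payload, destinations)})."""
    stored = payload if self_id in dests else None  # keep a copy if we are a destination
    by_port = {}
    for dest in dests:
        if dest == self_id:
            continue  # our own identifier is dropped from re-encapsulated packets
        port = min(table[dest], key=lambda p: queued[p])  # least pending data wins
        by_port.setdefault(port, []).append(dest)
    # One re-encapsulated packet per port, carrying only the remaining identifiers.
    return stored, {port: (payload, tuple(ids)) for port, ids in by_port.items()}


# Sub-chip 9 receives a packet for sub-chips 9, 13 and 14 (port names assumed).
table_9 = {13: ["d3"], 14: ["d0", "d3"]}
stored, outgoing = forward(9, (9, 13, 14), b"data", table_9, {"d0": 3, "d3": 1})
```

In this usage example both remaining destinations resolve to the same port, so a single packet carrying the identifiers of sub-chips 13 and 14 is produced. With a busier `d3`, sub-chip 14 would be served through `d0` instead and two packets would result.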
In summary, the embodiments of the present application transmit received data based on the data transmission conditions within the chip system, so that the sending of data can be scheduled flexibly, data transmission efficiency is improved, and the processing performance of the chip system is improved in turn.
The foregoing mainly describes the data transmission processing method in a chip system provided by the embodiments of the present application. It can be understood that, to implement the corresponding functions, each device includes the hardware structures and/or software modules for performing each function. In combination with the units and steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
The embodiments of the present application may divide a device into functional modules according to the foregoing method examples. For example, each function may be assigned its own functional module, or two or more functions may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. It should be noted that the division of modules in the embodiments of the present application is schematic and is merely a division by logical function; other divisions are possible in an actual implementation.
Where each function is assigned its own functional module, FIG. 12 shows a schematic diagram of a specific logical structure of an apparatus, which may be the first sub-chip described above. The apparatus 1200 includes:
a receiving unit 1201, configured to receive a first data packet, where the first data packet includes an identifier of a destination sub-chip; the apparatus 1200 and the destination sub-chip are sub-chips included in a chip system, multiple sub-chips in the chip system are arranged in a matrix, and each of the multiple sub-chips is connected to its adjacent sub-chips; and
a sending unit 1202, configured to send the data in the first data packet to the destination sub-chip based on a direction coordinate system and the principle of smaller bandwidth consumption, where the principle of smaller bandwidth consumption is the principle of delivering the data to the destination sub-chip using less transmission bandwidth.
The direction coordinate system is constructed with the apparatus 1200 as the center and includes a first direction axis, a second direction axis, a third direction axis, and a fourth direction axis. The row in which the apparatus 1200 is located lies on at least one of the first direction axis and the second direction axis, which point in opposite directions; the column in which the apparatus 1200 is located lies on at least one of the third direction axis and the fourth direction axis, which point in opposite directions.
In a possible implementation, the sending unit 1202 is specifically configured to:
when the destination sub-chip is on a target direction axis, send the data along the direction of the target direction axis, where the target direction axis is the first direction axis, the second direction axis, the third direction axis, or the fourth direction axis.
In a possible implementation, the direction coordinate system further includes a first area, a second area, a third area, and a fourth area. The first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis.
The first data packet includes the identifiers of a first destination sub-chip and a second destination sub-chip, and the sending unit 1202 is specifically configured to:
when the first destination sub-chip and the second destination sub-chip are located in two adjacent areas among the first area, the second area, the third area, and the fourth area, send a second data packet along the direction of the common direction axis, where the second data packet includes the data and the identifiers of the first and second destination sub-chips, and the common direction axis is the direction axis that forms the shared boundary of the two adjacent areas.
In a possible implementation, the direction coordinate system further includes a first area, a second area, a third area, and a fourth area. The first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis.
The first data packet includes the identifiers of a first destination sub-chip and a second destination sub-chip, and the sending unit 1202 is specifically configured to:
when the first destination sub-chip is in the first area and the second destination sub-chip is in the third area, send a third data packet along the direction of one of the two direction axes bounding the first area, and send a fourth data packet along the direction of one of the two direction axes bounding the third area, where the third data packet includes the data and the identifier of the first destination sub-chip, and the fourth data packet includes the data and the identifier of the second destination sub-chip.
In a possible implementation, the direction coordinate system further includes a first area, a second area, a third area, and a fourth area. The first area is bounded by the first direction axis and the third direction axis, the second area is bounded by the second direction axis and the third direction axis, the third area is bounded by the second direction axis and the fourth direction axis, and the fourth area is bounded by the first direction axis and the fourth direction axis.
The first data packet includes the identifiers of a first destination sub-chip and a second destination sub-chip, and the sending unit 1202 is specifically configured to:
when the first destination sub-chip is in a target area and the second destination sub-chip is on a direction axis bounding the target area, send a fifth data packet along the direction of that boundary direction axis, where the fifth data packet includes the data and the identifiers of the first and second destination sub-chips, and the target area is the first area, the second area, the third area, or the fourth area.
In a possible implementation, the apparatus 1200 includes multiple ports, each of which is connected to another sub-chip. Each port has a corresponding sending buffer that stores data waiting to be sent.
The apparatus 1200 further includes a selection unit, configured to:
when at least two ports are available for sending the data, select a first port to send the data, where the first port is the port, among the at least two ports, whose sending buffer holds the least data waiting to be sent.
In a possible implementation, the first data packet includes the identifiers of multiple destination sub-chips, and these identifiers include the identifier of the apparatus 1200. The apparatus 1200 further includes:
a storage unit, configured to store the data in the first data packet; and
an encapsulation unit, configured to re-encapsulate the data to obtain a sixth data packet.
The sending unit 1202 is further configured to send the sixth data packet to the destination sub-chips other than the apparatus 1200.
FIG. 13 is a schematic diagram of a specific hardware structure of the apparatus provided by the present application. The apparatus 1300 may be the first sub-chip described above. The apparatus 1300 includes a processor 1301, a memory 1302, and a communication port 1303. The processor 1301, the communication port 1303, and the memory 1302 may be connected to one another directly or through a bus 1304.
For example, the memory 1302 is configured to store the computer programs and data of the apparatus 1300. The memory 1302 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM). For example, the memory 1302 may be the static memory shown in FIG. 2 above.
The communication port 1303 includes a sending port and a receiving port. There may be multiple communication ports 1303, which support communication by the apparatus 1300, for example receiving or sending data or messages. For example, the communication port 1303 may be the ports d0, d1, d2, and d3 shown in FIG. 2 above.
For example, the processor 1301 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. For example, the processor 1301 may be the processing module shown in FIG. 2 above. The processor 1301 may be configured to read the program stored in the memory 1302, so that the apparatus 1300 performs the operations performed by the first sub-chip as described in FIG. 8 and its specific embodiments.
For the specific operations and beneficial effects of each unit in the apparatus 1300 shown in FIG. 13, see the corresponding description of FIG. 8 and its specific method embodiments above; details are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the operations performed by the first sub-chip as described in FIG. 8 and any of its possible method embodiments above.
An embodiment of the present application further provides a computer program product. When the computer program product is read and executed by a computer, the operations performed by the first sub-chip as described in FIG. 8 and any of its possible method embodiments above are implemented.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

  1. A data transmission processing method in a chip system, wherein the method comprises:
    receiving, by a first sub-chip, a first data packet, wherein the first data packet comprises an identifier of a destination sub-chip; the first sub-chip and the destination sub-chip are sub-chips comprised in the chip system; a plurality of sub-chips in the chip system are arranged in a matrix; and each of the plurality of sub-chips is connected to its adjacent surrounding sub-chips; and
    sending, by the first sub-chip, data in the first data packet to the destination sub-chip based on a direction coordinate system and according to a lower-bandwidth-consumption principle, wherein the lower-bandwidth-consumption principle is a principle of delivering the data to the destination sub-chip using a smaller transmission bandwidth;
    wherein the direction coordinate system is constructed with the first sub-chip as its center and comprises a first direction axis, a second direction axis, a third direction axis, and a fourth direction axis; the row in which the first sub-chip is located lies on at least one of the first direction axis and the second direction axis, which point in opposite directions; and the column in which the first sub-chip is located lies on at least one of the third direction axis and the fourth direction axis, which point in opposite directions.
  2. The method according to claim 1, wherein the sending, by the first sub-chip, of the data in the first data packet to the destination sub-chip based on the direction coordinate system and according to the lower-bandwidth-consumption principle comprises:
    when the destination sub-chip is located on a target direction axis, sending, by the first sub-chip, the data along the direction of the target direction axis, wherein the target direction axis is the first direction axis, the second direction axis, the third direction axis, or the fourth direction axis.
  3. The method according to claim 1, wherein the direction coordinate system further comprises a first region, a second region, a third region, and a fourth region; the first region is bounded by the first direction axis and the third direction axis, the second region is bounded by the second direction axis and the third direction axis, the third region is bounded by the second direction axis and the fourth direction axis, and the fourth region is bounded by the first direction axis and the fourth direction axis;
    the first data packet comprises identifiers of a first destination sub-chip and a second destination sub-chip; and the sending, by the first sub-chip, of the data in the first data packet to the destination sub-chip based on the direction coordinate system and according to the lower-bandwidth-consumption principle comprises:
    when the first destination sub-chip and the second destination sub-chip are respectively located in two adjacent regions among the first region, the second region, the third region, and the fourth region, sending, by the first sub-chip, a second data packet along the direction of a common direction axis, wherein the second data packet comprises the data and the identifiers of the first destination sub-chip and the second destination sub-chip, and the common direction axis is the direction axis that forms the shared boundary of the two adjacent regions.
  4. The method according to claim 1, wherein the direction coordinate system further comprises a first region, a second region, a third region, and a fourth region; the first region is bounded by the first direction axis and the third direction axis, the second region is bounded by the second direction axis and the third direction axis, the third region is bounded by the second direction axis and the fourth direction axis, and the fourth region is bounded by the first direction axis and the fourth direction axis;
    the first data packet comprises identifiers of a first destination sub-chip and a second destination sub-chip; and the sending, by the first sub-chip, of the data in the first data packet to the destination sub-chip based on the direction coordinate system and according to the lower-bandwidth-consumption principle comprises:
    when the first destination sub-chip is located in the first region and the second destination sub-chip is located in the third region, sending, by the first sub-chip, a third data packet along the direction of one of the two direction axes bounding the first region, and sending a fourth data packet along the direction of one of the two direction axes bounding the third region, wherein the third data packet comprises the data and the identifier of the first destination sub-chip, and the fourth data packet comprises the data and the identifier of the second destination sub-chip.
  5. The method according to claim 1, wherein the direction coordinate system further comprises a first region, a second region, a third region, and a fourth region; the first region is bounded by the first direction axis and the third direction axis, the second region is bounded by the second direction axis and the third direction axis, the third region is bounded by the second direction axis and the fourth direction axis, and the fourth region is bounded by the first direction axis and the fourth direction axis;
    the first data packet comprises identifiers of a first destination sub-chip and a second destination sub-chip; and the sending, by the first sub-chip, of the data in the first data packet to the destination sub-chip based on the direction coordinate system and according to the lower-bandwidth-consumption principle comprises:
    when the first destination sub-chip is located in a target region and the second destination sub-chip is located on a direction axis bounding the target region, sending, by the first sub-chip, a fifth data packet along the direction of that bounding direction axis, wherein the fifth data packet comprises the data and the identifiers of the first destination sub-chip and the second destination sub-chip, and the target region is the first region, the second region, the third region, or the fourth region.
  6. The method according to claim 1, wherein the first sub-chip comprises a plurality of ports, each of the plurality of ports is connected to another sub-chip, and each port has a corresponding sending buffer for storing data to be sent;
    the method further comprises:
    when at least two ports are available for sending the data, selecting, by the first sub-chip, a first port to send the data, wherein the first port is the port, among the at least two ports, whose sending buffer holds the least amount of data to be sent.
  7. The method according to any one of claims 1-6, wherein the first data packet comprises identifiers of a plurality of destination sub-chips, and the identifiers of the plurality of destination sub-chips include the identifier of the first sub-chip; the method further comprises:
    storing, by the first sub-chip, the data in the first data packet;
    repackaging, by the first sub-chip, the data to obtain a sixth data packet; and
    sending, by the first sub-chip, the sixth data packet to the destination sub-chips other than the first sub-chip.
  8. A sub-chip, wherein the sub-chip is a first sub-chip, and the first sub-chip comprises:
    a receiving unit, configured to receive a first data packet, wherein the first data packet comprises an identifier of a destination sub-chip; the first sub-chip and the destination sub-chip are sub-chips comprised in a chip system; a plurality of sub-chips in the chip system are arranged in a matrix; and each of the plurality of sub-chips is connected to its adjacent surrounding sub-chips; and
    a sending unit, configured to send data in the first data packet to the destination sub-chip based on a direction coordinate system and according to a lower-bandwidth-consumption principle, wherein the lower-bandwidth-consumption principle is a principle of delivering the data to the destination sub-chip using a smaller transmission bandwidth;
    wherein the direction coordinate system is constructed with the first sub-chip as its center and comprises a first direction axis, a second direction axis, a third direction axis, and a fourth direction axis; the row in which the first sub-chip is located lies on at least one of the first direction axis and the second direction axis, which point in opposite directions; and the column in which the first sub-chip is located lies on at least one of the third direction axis and the fourth direction axis, which point in opposite directions.
  9. A sub-chip, comprising a processor, a memory, and a communication port, wherein the memory and the communication port are coupled to the processor, the communication port is configured to send and receive data, the memory is configured to store a computer program, and the processor is configured to invoke the computer program so that the sub-chip performs the method according to any one of claims 1-7;
    wherein the sub-chip is a sub-chip comprised in a chip system, a plurality of sub-chips in the chip system are arranged in a matrix, and each of the plurality of sub-chips is connected to its adjacent surrounding sub-chips.
  10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the method according to any one of claims 1-7.
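
The direction-based sending recited in claims 2 to 5 can be illustrated with a short sketch. It is not the applicant's implementation: the mapping of the four direction axes onto matrix rows and columns, the names `Axis`, `candidate_axes`, and `plan_sends`, and the greedy grouping of destinations are all assumptions made for illustration. Each destination is given the set of axes along which a packet may leave the first sub-chip (one axis if the destination lies on an axis, the two bounding axes if it lies in a region), and the sketch then repeatedly picks the axis that serves the most remaining destinations, so that destinations sharing an axis share one packet.

```python
from enum import Enum


class Axis(Enum):
    # Assumed orientation; the claims only require the pairs to be opposite.
    FIRST = 1   # along the source row, increasing column
    SECOND = 2  # along the source row, decreasing column
    THIRD = 3   # along the source column, decreasing row
    FOURTH = 4  # along the source column, increasing row


def candidate_axes(src, dst):
    """Axes along which a packet from src may start toward dst.

    src and dst are (row, column) positions in the sub-chip matrix.
    A destination on an axis yields one axis; a destination inside a
    region yields the two axes that bound that region.
    """
    d_row, d_col = dst[0] - src[0], dst[1] - src[1]
    axes = set()
    if d_col > 0:
        axes.add(Axis.FIRST)
    if d_col < 0:
        axes.add(Axis.SECOND)
    if d_row < 0:
        axes.add(Axis.THIRD)
    if d_row > 0:
        axes.add(Axis.FOURTH)
    return axes


def plan_sends(src, dests):
    """Group destinations by outgoing axis, one packet per chosen axis.

    Greedy choice (an assumption of this sketch): pick the axis that covers
    the most remaining destinations; ties go to the lower-numbered axis.
    """
    remaining = {d: candidate_axes(src, d) for d in dests if d != src}
    plan = {}
    while remaining:
        best = max(
            Axis,
            key=lambda a: (sum(a in axes for axes in remaining.values()), -a.value),
        )
        covered = [d for d, axes in remaining.items() if best in axes]
        plan[best] = covered
        for d in covered:
            del remaining[d]
    return plan
```

With the first sub-chip at (2, 2), destinations (0, 3) and (1, 0) lie in adjacent regions under the assumed orientation and are served by a single packet along the third direction axis, whereas destinations (0, 3) and (4, 0) lie in opposite regions and require two packets.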
PCT/CN2022/099849 2021-12-28 2022-06-20 Data transmission processing method in chip system and related apparatus WO2023123905A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111633357.XA CN114297130A (en) 2021-12-28 2021-12-28 Data transmission processing method in chip system and related device
CN202111633357.X 2021-12-28

Publications (1)

Publication Number Publication Date
WO2023123905A1 true WO2023123905A1 (en) 2023-07-06

Family

ID=80971560

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/099849 WO2023123905A1 (en) 2021-12-28 2022-06-20 Data transmission processing method in chip system and related apparatus

Country Status (2)

Country Link
CN (1) CN114297130A (en)
WO (1) WO2023123905A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114297130A (en) * 2021-12-28 2022-04-08 深圳云天励飞技术股份有限公司 Data transmission processing method in chip system and related device
CN117041186B (en) * 2023-10-07 2024-01-30 苏州仰思坪半导体有限公司 Data transmission method, chip system, computing device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6961782B1 (en) * 2000-03-14 2005-11-01 International Business Machines Corporation Methods for routing packets on a linear array of processors
CN101802810A (en) * 2007-05-31 2010-08-11 雷丁大学 Processor
CN112822124A (en) * 2020-12-31 2021-05-18 深圳云天励飞技术股份有限公司 Multi-chip communication system, method, chip and storage medium
CN113597621A (en) * 2018-12-29 2021-11-02 华为技术有限公司 Computing resource allocation technique and neural network system
CN114297130A (en) * 2021-12-28 2022-04-08 深圳云天励飞技术股份有限公司 Data transmission processing method in chip system and related device
CN114328623A (en) * 2021-12-28 2022-04-12 深圳云天励飞技术股份有限公司 Data transmission processing method in chip system and related device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6961782B1 (en) * 2000-03-14 2005-11-01 International Business Machines Corporation Methods for routing packets on a linear array of processors
CN101802810A (en) * 2007-05-31 2010-08-11 雷丁大学 Processor
CN113597621A (en) * 2018-12-29 2021-11-02 华为技术有限公司 Computing resource allocation technique and neural network system
CN112822124A (en) * 2020-12-31 2021-05-18 深圳云天励飞技术股份有限公司 Multi-chip communication system, method, chip and storage medium
CN114297130A (en) * 2021-12-28 2022-04-08 深圳云天励飞技术股份有限公司 Data transmission processing method in chip system and related device
CN114328623A (en) * 2021-12-28 2022-04-12 深圳云天励飞技术股份有限公司 Data transmission processing method in chip system and related device

Also Published As

Publication number Publication date
CN114297130A (en) 2022-04-08

Similar Documents

Publication Publication Date Title
US11882025B2 (en) System and method for facilitating efficient message matching in a network interface controller (NIC)
WO2023123905A1 (en) Data transmission processing method in chip system and related apparatus
US11256656B2 (en) Hybrid programmable many-core device with on-chip interconnect
CN104821887A (en) Device and Method for Packet Processing with Memories Having Different Latencies
US5991797A (en) Method for directing I/O transactions between an I/O device and a memory
US7996569B2 (en) Method and system for zero copy in a virtualized network environment
WO2023123902A1 (en) Data transmission processing method in chip system, and related device
US8312197B2 (en) Method of routing an interrupt signal directly to a virtual processing unit in a system with one or more physical processing units
WO2022094771A1 (en) Network chip and network device
US10936525B2 (en) Flexible routing of network data within a programmable integrated circuit
US20090006546A1 (en) Multiple node remote messaging
US10248315B2 (en) Devices and methods for interconnecting server nodes
CN110995598B (en) Variable-length message data processing method and scheduling device
WO2022001417A1 (en) Data transmission method, processor system, and memory access system
US9515963B2 (en) Universal network interface controller
US11074206B1 (en) Message protocol for a data processing system
US11403250B2 (en) Operation accelerator, switch, task scheduling method, and processing system
US20230229622A1 (en) Processing of ethernet packets at a programmable integrated circuit
KR20240024188A (en) network interface device
CN106844263B (en) Configurable multiprocessor-based computer system and implementation method
CN114039894B (en) Network performance optimization method, system, device and medium based on vector packet
WO2023185666A1 (en) Switch fabric unit, data forwarding method, switching frame, and network system
CN117812148A (en) High-efficient general memory data removes device based on FPGA

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913194

Country of ref document: EP

Kind code of ref document: A1