CN114328623A - Data transmission processing method in chip system and related device - Google Patents


Publication number
CN114328623A
Authority
CN
China
Prior art keywords
sub
chip
data
destination
chips
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111633371.XA
Other languages
Chinese (zh)
Inventor
黎立煌
陈宁
王和国
曹庆新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202111633371.XA (CN114328623A)
Publication of CN114328623A
Priority to PCT/CN2022/099777 (WO2023123902A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the present application provide a data transmission processing method in a chip system and a related device. The method comprises the following steps: a source sub-chip receives configuration parameters and configures a preset stream output table according to the configuration parameters, wherein the stream output table comprises an identifier of a data stream and an identifier of a destination sub-chip of the data stream, the source sub-chip and the destination sub-chip are sub-chips among a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure; the source sub-chip executes a first data processing task to obtain output data; and the source sub-chip generates a plurality of data packets from the output data based on the stream output table and transmits the data packets, wherein each data packet comprises the identifier of the data stream and the identifier of the destination sub-chip. The present application enables efficient data transmission among the sub-chips in the chip system and improves the processing performance of the chip system.

Description

Data transmission processing method in chip system and related device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a data transmission processing method and a related apparatus in a chip system.
Background
A chip system may include a plurality of sub-chips, each capable of processing data independently, which are connected in a topology so that they can communicate with each other. The sub-chips can also cooperatively process a single large-scale computing task in a model-parallel manner to improve the processing efficiency of the task. During such cooperative processing, the sub-chips need to exchange data frequently, and the efficiency of this data transmission affects the processing performance of the whole chip system.
Disclosure of Invention
The embodiment of the application discloses a data transmission processing method and a related device in a chip system, which can realize high-efficiency data transmission among sub-chips in the chip system and improve the processing performance of the chip system.
In a first aspect, the present application provides a data transmission processing method in a chip system, where the method includes:
the source sub-chip receives configuration parameters and configures a preset stream output table according to the configuration parameters; the stream output table comprises an identifier of a data stream and an identifier of a destination sub-chip of the data stream; the source sub-chip and the destination sub-chip are sub-chips among a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure;
the source sub-chip executes a first data processing task to obtain output data;
and the source sub-chip generates a plurality of data packets from the output data based on the stream output table and transmits the data packets, wherein each data packet comprises the identifier of the data stream and the identifier of the destination sub-chip.
In the present application, the stream output table in the source sub-chip is configured in advance, so that the source sub-chip can quickly generate and send data packets by querying the configured stream output table. This improves the sending efficiency of data, enables efficient data transmission among the sub-chips in the chip system, and improves the processing performance of the chip system.
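As an illustration only, the steps of the first aspect can be sketched in Python as follows. The names `SotEntry`, `Packet`, `configure_sot`, and `packetize`, the representation of the configuration parameters as (stream identifier, destination identifier) pairs, and the 4-byte payload size are all assumptions made for this sketch and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class SotEntry:
    """One row of a (hypothetical) stream output table."""
    stream_id: int  # identifier of the data stream
    dest_id: int    # identifier of the destination sub-chip


@dataclass
class Packet:
    stream_id: int
    dest_id: int
    payload: bytes


def configure_sot(config_params):
    """Build the stream output table from (stream_id, dest_id) pairs."""
    return {sid: SotEntry(sid, dst) for sid, dst in config_params}


def packetize(sot, stream_id, output_data, payload_size=4):
    """Split output data into packets stamped with the identifiers from the table."""
    entry = sot[stream_id]
    return [
        Packet(entry.stream_id, entry.dest_id, output_data[i:i + payload_size])
        for i in range(0, len(output_data), payload_size)
    ]


# Configure once, then packetize the output of a task by table lookup.
sot = configure_sot([(7, 5)])
packets = packetize(sot, 7, b"abcdefghij")
```

Because the table is filled in before the task runs, generating a packet header in this sketch reduces to a single dictionary lookup.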
In a possible implementation manner, the stream output table further includes one or more of: an identifier of the first data processing task, indication information of the number of data packets of the data stream that have been sent, and a start address in the source sub-chip for storing the data included in the data stream; the data packet further includes the identifier of the first data processing task.
In the present application, the stream output table includes information that can be used to distinguish different tasks, and each generated data packet also includes the corresponding task identifier to indicate the task to which the data in the data packet belongs. From the indication information of the number of data packets and the start address included in the stream output table, the storage address of the data to be sent can be calculated quickly, so that the data can be quickly fetched, packed, and sent.
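One way to read the address calculation described above is start address plus sent-packet count times payload size. The sketch below assumes a fixed payload size per packet (`PAYLOAD_BYTES`), which the application does not state; `SotRow`, `next_payload_addr`, and `send_next` are hypothetical names.

```python
from dataclasses import dataclass

PAYLOAD_BYTES = 64  # assumption: every packet carries a fixed-size payload


@dataclass
class SotRow:
    task_id: int           # identifier of the first data processing task
    stream_id: int         # identifier of the data stream
    dest_id: int           # identifier of the destination sub-chip
    start_addr: int        # where the stream's data begins in local memory
    packets_sent: int = 0  # number of packets of this stream already sent


def next_payload_addr(row):
    """Address of the next unsent payload: start + count * payload size."""
    return row.start_addr + row.packets_sent * PAYLOAD_BYTES


def send_next(row, memory):
    """Fetch the next payload, build its header, and advance the counter."""
    addr = next_payload_addr(row)
    payload = memory[addr:addr + PAYLOAD_BYTES]
    header = {"task_id": row.task_id, "stream_id": row.stream_id, "dest_id": row.dest_id}
    row.packets_sent += 1
    return header, payload


memory = bytes(range(256))
row = SotRow(task_id=1, stream_id=7, dest_id=5, start_addr=64)
first = send_next(row, memory)
second = send_next(row, memory)
```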
In a possible implementation manner, before the source sub-chip generates the plurality of data packets from the output data based on the stream output table and sends them, the method further includes: the source sub-chip receives an unblocking data packet, wherein the unblocking data packet includes the identifier of the data stream and is used to indicate to the source sub-chip that the destination sub-chip is ready to receive the data stream.
In the present application, the destination sub-chip sends the unblocking data packet to the source sub-chip only after it is ready to receive data, which avoids problems such as data packet loss caused by data arriving before the destination sub-chip is ready.
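The unblocking handshake can be sketched as a per-stream gate on the sender. The class `SourceSubChip` and its methods are hypothetical, and the sketch models only the gating logic, not the actual packet format or transport.

```python
class SourceSubChip:
    """Minimal model of a sender that waits for an unblocking packet per stream."""

    def __init__(self):
        self.unblocked = set()  # stream identifiers cleared for sending
        self.sent = []          # (stream_id, payload) pairs actually sent

    def on_unblock(self, stream_id):
        """Handle an unblocking packet carrying the stream identifier."""
        self.unblocked.add(stream_id)

    def try_send(self, stream_id, payloads):
        """Send only if the destination has signalled readiness for this stream."""
        if stream_id not in self.unblocked:
            return False
        self.sent.extend((stream_id, p) for p in payloads)
        return True


src = SourceSubChip()
blocked = src.try_send(7, [b"a", b"b"])   # destination not ready yet
src.on_unblock(7)                         # unblocking packet arrives
released = src.try_send(7, [b"a", b"b"])  # now the stream may flow
```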
In a possible implementation manner, the sending of the plurality of data packets by the source sub-chip includes: the source sub-chip sends the plurality of data packets based on a port forwarding mapping table; the port forwarding mapping table includes a mapping relationship between the identifier of the destination sub-chip and a sending port.
In the application, the sending port of the data packet is determined by inquiring the port forwarding mapping table so as to realize the rapid transmission of the data packet.
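A port forwarding mapping table can be modelled as a plain lookup from destination identifier to sending port. The table contents below are invented for illustration; only the port names d0 to d3 appear in the application, and `select_port` and `group_by_port` are hypothetical helpers.

```python
# Hypothetical port forwarding mapping table of one source sub-chip:
# destination sub-chip identifier -> sending port.
PORT_FORWARDING_MAP = {1: "d1", 4: "d2", 5: "d2"}


def select_port(dest_id, table=PORT_FORWARDING_MAP):
    """Return the sending port for a destination, or fail loudly if unmapped."""
    try:
        return table[dest_id]
    except KeyError:
        raise LookupError(f"no sending port configured for sub-chip {dest_id}") from None


def group_by_port(packets, table=PORT_FORWARDING_MAP):
    """Bucket packets by the port they should leave through."""
    queues = {}
    for pkt in packets:
        queues.setdefault(select_port(pkt["dest_id"], table), []).append(pkt)
    return queues


queues = group_by_port([{"dest_id": 1}, {"dest_id": 5}, {"dest_id": 4}])
```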
In a possible implementation manner, when the data stream has a plurality of destination sub-chips, the data packet includes the identifiers of the plurality of destination sub-chips.
In the present application, a data packet can carry the identifiers of a plurality of destination sub-chips. Compared with the existing approach of sending a separate data packet to each destination, this reduces the number of data packets sent and saves transmission bandwidth.
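The saving at the source can be illustrated by counting packets under both schemes. The dictionary-based packet layout is an assumption of this sketch, and how a multi-destination packet is replicated along the path is not modelled.

```python
def unicast_packets(stream_id, dest_ids, chunks):
    """One copy of every chunk per destination."""
    return [
        {"stream_id": stream_id, "dest_ids": (dest,), "payload": chunk}
        for dest in dest_ids
        for chunk in chunks
    ]


def multicast_packets(stream_id, dest_ids, chunks):
    """One packet per chunk, carrying all destination identifiers."""
    return [
        {"stream_id": stream_id, "dest_ids": tuple(dest_ids), "payload": chunk}
        for chunk in chunks
    ]


chunks = [b"abcd", b"efgh"]
dests = [1, 5, 9]
per_destination = unicast_packets(7, dests, chunks)
shared = multicast_packets(7, dests, chunks)
```

With three destinations and two chunks, the source emits six packets in the per-destination scheme and two in the shared scheme.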
In a possible implementation manner, the chip system includes a subsystem, the subsystem includes at least two sub-chips, and the subsystem is configured with a subsystem identifier; when the destination sub-chip is a sub-chip of the subsystem, the data packet further includes an identifier of the subsystem.
In the application, the destination of the data packet can be quickly positioned through the identification of the chip subsystem, and the data transmission efficiency is improved.
In a second aspect, the present application provides a method for processing data transmission in a chip system, where the method includes:
the destination sub-chip receives configuration parameters and configures a preset stream input table according to the configuration parameters, wherein the stream input table comprises at least one identifier of a data stream to be received; the destination sub-chip is a sub-chip among a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure;
when the destination sub-chip receives a data packet, it judges whether the stream input table contains the data stream identifier in the data packet;
and when the stream input table contains the data stream identifier in the data packet, the destination sub-chip stores the data in the data packet.
In the present application, the stream input table in the destination sub-chip is configured in advance, so that the destination sub-chip can quickly decide how to handle a received data packet by querying the configured stream input table. This improves the receiving efficiency of data, enables efficient data transmission among the sub-chips in the chip system, and improves the processing performance of the chip system.
In a possible embodiment, the stream input table further includes one or more of: an identifier of the first data processing task, indication information of the number of data packets of the data stream that have been received, and a start address in the destination sub-chip for storing the data included in the data stream; the data packet further includes the identifier of the first data processing task.
In the present application, the stream input table includes the identifier of the task, which can be used to distinguish the data packets of different tasks. From the indication information of the number of data packets and the start address included in the stream input table, the storage address of the data in a received data packet can be calculated quickly, so that the data can be stored quickly.
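The receive side of the second aspect can be sketched as a table lookup followed by a computed store. The sketch assumes fixed-size payloads that arrive in order, neither of which is stated in the application, and it only reports a non-matching packet, because the passage above does not say how such a packet is handled. `SitRow` and `on_packet` are hypothetical names.

```python
from dataclasses import dataclass

PAYLOAD_BYTES = 4  # assumption: fixed payload size, packets arrive in order


@dataclass
class SitRow:
    stream_id: int
    start_addr: int            # where this stream's data is stored locally
    packets_received: int = 0  # number of packets of this stream received so far


def on_packet(sit, memory, packet):
    """Store the payload if the stream is expected; report whether it was stored."""
    row = sit.get(packet["stream_id"])
    if row is None:
        return False  # stream identifier not in the stream input table
    addr = row.start_addr + row.packets_received * PAYLOAD_BYTES
    data = packet["payload"]
    memory[addr:addr + len(data)] = data
    row.packets_received += 1
    return True


memory = bytearray(16)
sit = {7: SitRow(stream_id=7, start_addr=8)}
incoming = [
    {"stream_id": 7, "payload": b"abcd"},
    {"stream_id": 7, "payload": b"efgh"},
    {"stream_id": 9, "payload": b"zzzz"},  # not configured in the table
]
accepted = [on_packet(sit, memory, pkt) for pkt in incoming]
```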
In a possible implementation manner, before the destination sub-chip receives the data packet, the method further includes: the destination sub-chip sends an unblocking data packet; the unblocking data packet is used to indicate to the source sub-chip that the destination sub-chip is ready to receive the data stream; the source sub-chip is the sub-chip in the chip system that transmits the data packet.
In the present application, the destination sub-chip sends the unblocking data packet to the source sub-chip only after it is ready to receive data, which avoids problems such as data packet loss caused by data arriving before the destination sub-chip is ready.
In a possible embodiment, the chip system includes a subsystem, the subsystem includes at least two sub-chips, and the subsystem is configured with a subsystem identifier; the data packet further includes an identification of the subsystem.
In the application, the destination of the data packet can be quickly positioned through the identification of the chip subsystem, and the data transmission efficiency is improved.
In a third aspect, the present application provides a data transmission processing method in a chip system, where the method includes:
the controller distributes a first data processing task for the source sub-chip, and data obtained after the first data processing task is executed is sent to the destination sub-chip in a data stream mode; the controller, the source sub-chip and the destination sub-chip are sub-chips in a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure;
the controller configures an identifier for the data stream;
the controller sends the identifier of the data stream and the identifier of the destination sub-chip to the source sub-chip; wherein the identifier of the data stream and the identifier of the destination sub-chip are to be stored in association with each other in the stream output table of the source sub-chip; the stream output table is the basis on which the source sub-chip sends data.
In the present application, the controller allocates a task to the source sub-chip and configures the stream output table of the source sub-chip, so that the source sub-chip can quickly generate and send data packets by querying the configured stream output table. This improves the sending efficiency of data, enables efficient data transmission among the sub-chips in the chip system, and improves the processing performance of the chip system.
In a possible embodiment, the method further comprises:
the controller allocates a second data processing task to the destination sub-chip, and the second data processing task is executed based on the data obtained after the execution of the first data processing task is completed;
the controller sends the identifier of the data stream to the destination sub-chip; the identifier of the data stream is to be stored in the stream input table of the destination sub-chip, and the stream input table is the basis on which the destination sub-chip receives data.
In the present application, the controller allocates a task to the destination sub-chip and configures the stream input table of the destination sub-chip, so that the destination sub-chip can quickly decide how to handle a received data packet by querying the configured stream input table. This improves the receiving efficiency of data, enables efficient data transmission among the sub-chips in the chip system, and improves the processing performance of the chip system.
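The controller's role in wiring up both tables can be sketched as follows. The sub-chips are modelled as plain dictionaries and the controller writes their tables directly, whereas in the application it sends the identifiers as configuration parameters; the class `Controller`, the method `link_tasks`, and the sequential allocation of stream identifiers are assumptions of this sketch.

```python
import itertools


class Controller:
    """Assigns stream identifiers and configures both ends of a data stream."""

    def __init__(self):
        self._stream_ids = itertools.count()  # assumed: sequential identifiers

    def link_tasks(self, source, destination):
        """Configure the source's stream output table and the destination's stream input table."""
        stream_id = next(self._stream_ids)
        source["sot"][stream_id] = destination["chip_id"]  # stream -> destination
        destination["sit"].add(stream_id)                  # stream expected here
        return stream_id


src = {"chip_id": 0, "sot": {}, "sit": set()}
dst = {"chip_id": 5, "sot": {}, "sit": set()}
ctrl = Controller()
sid = ctrl.link_tasks(src, dst)
```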
In a possible embodiment, the method further comprises:
the controller acquires data transmission conditions among the sub-chips in the chip system;
the controller generates scheduling information for the source sub-chip based on the data transmission condition, wherein the scheduling information indicates the sending port in the source sub-chip through which the data stream is sent to the destination sub-chip;
the controller sends the scheduling information to the source sub-chip.
In the application, the controller realizes the scheduling of the data packet sending port based on the data transmission condition of the whole chip system, so that the data packet is prevented from being transmitted from a crowded path, and the data packet transmission efficiency is improved.
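A simple scheduling policy consistent with this description is to pick the least loaded of the ports that can reach the destination. The application does not specify the policy or how the data transmission condition is measured; the per-port load figures and the function names below are hypothetical.

```python
def schedule_port(candidate_ports, port_load):
    """Pick the candidate port with the lowest observed load (first one wins ties)."""
    return min(candidate_ports, key=lambda port: port_load[port])


def build_scheduling_info(stream_id, dest_id, candidate_ports, port_load):
    """Scheduling information the controller would send to the source sub-chip."""
    return {
        "stream_id": stream_id,
        "dest_id": dest_id,
        "port": schedule_port(candidate_ports, port_load),
    }


# Hypothetical load per sending port of the source sub-chip (e.g. queued packets).
load = {"d0": 0, "d1": 5, "d2": 2, "d3": 9}
info = build_scheduling_info(stream_id=7, dest_id=5, candidate_ports=["d1", "d2"], port_load=load)
```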
In a fourth aspect, the present application provides a source chiplet comprising:
a receiving unit, configured to receive configuration parameters;
a configuration unit, configured to configure a preset stream output table according to the configuration parameters; the stream output table comprises an identifier of a data stream and an identifier of a destination sub-chip of the data stream; the source sub-chip and the destination sub-chip are sub-chips among a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure;
the execution unit is used for executing the first data processing task to obtain output data;
a generating unit, configured to generate a plurality of data packets from the output data based on the stream output table, where each data packet includes an identifier of the data stream and an identifier of the destination sub-chip;
and a transmitting unit, configured to transmit the plurality of data packets.
In a possible embodiment, the stream output table further includes one or more of: an identifier of the first data processing task, indication information of the number of data packets of the data stream that have been sent, and a start address in the source sub-chip for storing the data included in the data stream;
the data packet further includes an identification of the first data processing task.
In a possible embodiment, the receiving unit is further configured to, before the generating unit generates the plurality of data packets from the output data based on the stream output table,
receive an unblocking data packet; wherein the unblocking data packet includes the identifier of the data stream and is used to indicate to the source sub-chip that the destination sub-chip is ready to receive the data stream.
In a possible implementation manner, the foregoing sending unit is specifically configured to:
transmitting the plurality of data packets based on the port forwarding mapping table; the port forwarding mapping table includes a mapping relationship between the identifier of the destination sub-chip and the sending port.
In a possible implementation manner, when there are a plurality of destination sub-chips of the data stream, the data packet includes identifications of the plurality of destination sub-chips.
In a possible embodiment, the chip system includes a subsystem, the subsystem includes at least two sub-chips, and the subsystem is configured with a subsystem identifier;
when the destination sub-chip is a sub-chip of the subsystem, the data packet further includes an identifier of the subsystem.
In a fifth aspect, the present application provides a destination sub-chip, comprising:
a receiving unit, configured to receive configuration parameters;
a configuration unit, configured to configure a preset stream input table according to the configuration parameters, wherein the stream input table comprises at least one identifier of a data stream to be received; the destination sub-chip is a sub-chip among a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure;
a judging unit, configured to judge, when a data packet is received, whether the stream input table contains the data stream identifier in the data packet;
and a storage unit, configured to store the data in the data packet when the stream input table contains the data stream identifier in the data packet.
In a possible embodiment, the stream input table further includes one or more items of an identifier of the first data processing task, information indicating the number of packets that have received the data stream, and a start address of the destination sub-chip for storing data included in the data stream;
the data packet further includes an identification of the first data processing task.
In a possible implementation manner, the destination sub-chip further includes a sending unit, configured to, before the data packet is received,
send an unblocking data packet; the unblocking data packet is used to indicate to the source sub-chip that the destination sub-chip is ready to receive the data stream; the source sub-chip is the sub-chip in the chip system that transmits the data packet.
In a possible embodiment, the chip system includes a subsystem, the subsystem includes at least two sub-chips, and the subsystem is configured with a subsystem identifier;
the data packet further includes an identification of the subsystem.
In a sixth aspect, the present application provides a controller comprising:
the distribution unit is used for distributing a first data processing task for the source sub-chip, and the data obtained after the execution of the first data processing task is finished is sent to the destination sub-chip in a data stream form; the controller, the source sub-chip and the destination sub-chip are sub-chips in a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure;
a configuration unit, configured to configure an identifier for the data stream;
a sending unit, configured to send the identifier of the data stream and the identifier of the destination sub-chip to the source sub-chip; wherein the identifier of the data stream and the identifier of the destination sub-chip are to be stored in association with each other in the stream output table of the source sub-chip; the stream output table is the basis on which the source sub-chip sends data.
In a possible implementation manner, the allocating unit is further configured to allocate a second data processing task to the destination sub-chip, where the second data processing task is executed based on data obtained after the execution of the first data processing task is completed;
the sending unit is further configured to send the identifier of the data stream to the destination sub-chip; the identifier of the data stream is to be stored in the stream input table of the destination sub-chip, and the stream input table is the basis on which the destination sub-chip receives data.
In a possible embodiment, the controller further includes:
the acquisition unit is used for acquiring the data transmission condition among the sub-chips in the chip system;
a generating unit, configured to generate scheduling information for the source sub-chip based on the data transmission condition, where the scheduling information indicates the sending port in the source sub-chip through which the data stream is sent to the destination sub-chip;
the sending unit is further configured to send the scheduling information to the source sub-chip.
In a seventh aspect, the present application provides a sub-chip, the sub-chip comprising a processor, a memory, and a communication port; wherein the memory and the communication port are coupled to the processor, the communication port is used for transceiving data, the memory is used for storing a computer program, and the processor is used for calling the computer program to enable the sub-chip to execute the method according to any one of the first aspect;
the sub-chip is a sub-chip of a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure.
In an eighth aspect, the present application provides a sub-chip comprising a processor, a memory, and a communication port; wherein the memory and the communication port are coupled to the processor, the communication port is used for transceiving data, the memory is used for storing a computer program, and the processor is used for calling the computer program to enable the sub-chip to execute the method according to any one of the second aspect;
the sub-chip is a sub-chip of a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure.
In a ninth aspect, the present application provides a sub-chip comprising a processor, a memory, and a communication port; wherein the memory and the communication port are coupled to the processor, the communication port is used for transceiving data, the memory is used for storing a computer program, and the processor is used for calling the computer program to make the sub-chip execute the method according to any one of the third aspect;
the sub-chip is a sub-chip of a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure.
In a tenth aspect, the present application provides a chip system, which includes a source sub-chip, a destination sub-chip, and a controller; wherein the source sub-chip is the source sub-chip of any one of the above-mentioned fourth aspects, the destination sub-chip is the destination sub-chip of any one of the above-mentioned fifth aspects, and the controller is the controller of any one of the above-mentioned sixth aspects; or,
the source sub-chip is the sub-chip of the seventh aspect, the destination sub-chip is the sub-chip of the eighth aspect, and the controller is the sub-chip of the ninth aspect.
In an eleventh aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the first aspects; or,
the computer program, when executed by a processor, implements the method of any one of the second aspects; or,
the computer program, when executed by a processor, implements the method of any one of the third aspects.
In a twelfth aspect, the present application provides a computer program product comprising a computer program that, when executed by a processor, implements the method of any one of the first aspects; or,
the computer program, when executed by a processor, implements the method of any one of the second aspects; or,
the computer program, when executed by a processor, implements the method of any one of the third aspects.
It will be appreciated that the fourth to twelfth aspects described above all correspond to the implementation of the method provided by any of the first to third aspects described above. Therefore, the beneficial effects achieved by the method can refer to the beneficial effects in the corresponding method, and are not described herein again.
Drawings
The drawings to be used in the embodiments of the present application will be described below.
FIG. 1 is a schematic diagram of a chip system provided herein;
FIG. 2 is a schematic structural diagram of a sub-chip provided herein;
FIG. 3 to FIG. 6 are schematic diagrams of chip systems provided herein;
FIG. 7 is a schematic diagram illustrating the sub-chipset partitioning provided herein;
FIG. 8 is a schematic flowchart of a data transmission processing method in a chip system provided herein;
FIG. 9A and FIG. 9B are schematic diagrams of a data processing flow provided herein;
FIG. 10 is a schematic diagram of a data packet structure provided herein;
FIG. 11 is a schematic diagram of a port of a sub-chip in the chip system provided herein;
FIG. 12 is a schematic diagram of a data packet structure provided herein;
FIG. 13 is a schematic flowchart of a routing implementation provided herein;
FIG. 14 is a schematic diagram of a directional coordinate system provided herein;
FIG. 15 is a schematic diagram of constructing a directional coordinate system based on sub-chips provided herein;
FIG. 16 is a schematic diagram of a port of a sub-chip in a chip system provided herein;
FIG. 17 to FIG. 19 are schematic structural diagrams of a virtual device provided herein;
FIG. 20 is a schematic structural diagram of a physical device provided herein.
Detailed Description
Embodiments of the present application are described below with reference to the drawings.
Fig. 1 is a schematic structural diagram of a chip system according to an embodiment of the present disclosure. The chip system 110 includes a plurality of sub-chips (16 sub-chips are exemplarily shown in fig. 1) connected according to a predetermined topological connection relationship. For example, the 16 sub-chips in fig. 1 may be arranged in a matrix, so that each sub-chip is connected to two, three, or four surrounding sub-chips.
Each sub-chip has its own memory, and the memories of some of the sub-chips are exemplarily shown in fig. 1. The memory may be, for example, a Synchronous Dynamic Random Access Memory (SDRAM) or a Double Data Rate SDRAM (DDR SDRAM), which may be abbreviated as DDR. Each of the sub-chips in the chip system 110 has complete processing capability and can perform tasks independently. Of course, multiple sub-chips in the chip system 110 may also cooperate with each other to perform large processing tasks.
Referring to fig. 2, fig. 2 schematically illustrates a structure of a sub-chip in the chip system 110. The structure of the sub-chip may be presented in the form of a network-on-chip (NoC). It can be seen that the sub-chip can include a processing module, a routing module, a static memory, a memory controller, and four ports (d0, d1, d2, and d3).
The processing module is a Control Unit (CU) in the sub-chip and is responsible for managing each processing flow in the sub-chip.
The routing module is responsible for data synchronization inside the sub-chip, data synchronization among the sub-chips, data broadcasting, and data transmission. The routing module also comprises a control unit, which manages the routing process in the routing module, and a local buffer for temporarily storing data to be processed. The routing module further includes a forwarding-port mapper (FPM), which may be a hardware module or a software module. The FPM stores a port forwarding mapping table, which includes a mapping relationship between a destination sub-chip and a sending port and may be used to map a data packet to the corresponding port for sending. The routing module also stores a Stream In Table (SIT) and a Stream Out Table (SOT), which are used for data transmission between the sub-chips and are described in detail later.
The static memory may be a Static Random Access Memory (SRAM) or the like, and is used for storing data in the sub-chip.
The memory controller is connected to the memory corresponding to the sub-chip, and the memory controller may be, for example, a DDR controller.
The four ports (d0, d1, d2 and d3) are network interfaces of the sub-chip and enable data transmission between sub-chips; the connections between sub-chips are made through these ports. Optionally, a sub-chip may include only two of the ports, such as sub-chip 0, sub-chip 3, sub-chip 12, and sub-chip 15 in fig. 1. Alternatively, a sub-chip may include three ports, such as sub-chip 4 in fig. 1.
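The port counts quoted for fig. 1 follow from the matrix arrangement if the 16 sub-chips are numbered row by row in a 4 x 4 grid. That numbering is an assumption of the sketch below, although it agrees with the examples given (sub-chips 0, 3, 12, and 15 at the corners, sub-chip 4 on an edge).

```python
ROWS, COLS = 4, 4  # assumed: 16 sub-chips numbered row by row


def neighbors(chip_id):
    """Identifiers of the sub-chips directly connected to chip_id in the grid."""
    row, col = divmod(chip_id, COLS)
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [r * COLS + c for r, c in candidates if 0 <= r < ROWS and 0 <= c < COLS]


# Number of ports each sub-chip needs equals its number of grid neighbors.
port_counts = {chip: len(neighbors(chip)) for chip in range(ROWS * COLS)}
```

Under this numbering, corner sub-chips need two ports, edge sub-chips three, and interior sub-chips four.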
The chip system provided by the embodiment of the present application is not limited to the structure shown in fig. 1, and may also have another structure, for example, see fig. 3. Fig. 3 schematically shows a chip system 120, which also includes a plurality of sub-chips (8 sub-chips are exemplarily shown in fig. 3). The sub-chips of the chip system 120 are connected in a rectangular parallelepiped arrangement, which can shorten the data transmission path between the sub-chips as much as possible. For the structure of the sub-chips in the chip system 120, refer to the corresponding description of fig. 2, which is not repeated here.
In a possible implementation manner, for a very large task, more sub-chips are needed to be processed together to improve the task processing efficiency, and then, the above-mentioned system-on-chip 110 or system-on-chip 120 may be used as a subsystem of a chip, and a larger system-on-chip may be composed of a plurality of such subsystems, for example, see the system-on-chip 130 shown in fig. 4.
The system-on-chip 130 may include a plurality of subsystems of the chip, which is illustrated in fig. 4 as including 8 subsystems, and each subsystem of the plurality of subsystems may be the system-on-chip 110 or the system-on-chip 120. Each subsystem can be regarded as a whole, and then the plurality of subsystems can be connected through a preset topological connection relationship, for example, the subsystems can be connected in a rectangular parallelepiped arrangement, as shown in fig. 4. To facilitate understanding of the connection manner of each subsystem in the chip system 130, a schematic diagram of the connection structure of each subsystem in the chip system 130 is exemplarily illustrated by taking the subsystem as the chip system 110, and refer to fig. 5.
As can be seen in fig. 5, the chip system 130 includes 8 subsystems, each subsystem of the 8 subsystems includes 16 sub-chips, and the 16 sub-chips may be arranged and connected in a matrix. Between two adjacent subsystems, the connection between the two subsystems can be realized by connecting any one sub-chip in one subsystem with any one sub-chip in the other subsystem. The sub-chips arranged at the corners of the matrix in each sub-system are exemplarily shown in fig. 5 as sub-chips connected to another sub-system. For example, in the subsystem 0, the connection between the subsystem 0 and the subsystem 1 is established through the sub-chip 3 in the subsystem 0 and the sub-chip 0 in the subsystem 1. The connection between the subsystem 0 and the subsystem 2 is established through the sub-chip 12 in the subsystem 0 and the sub-chip 0 in the subsystem 2. The connection between the subsystem 0 and the subsystem 4 is established through the sub-chip 0 in the subsystem 0 and the sub-chip 0 in the subsystem 4.
In a possible implementation, a control bus may further be included between the subsystems, for example, see the chip system 140 shown in fig. 6. The chip system 140 includes a central controller (the central controller may be a sub-chip or a control module in the chip system 140, etc.), and the central controller may manage the task processing flow in the chip system 140. The control bus is connected to the central controller, and the subsystems receive the control instructions of the central controller over the control bus. In a specific implementation, each subsystem may be connected to the control bus through one sub-chip, and after receiving a control instruction that sub-chip may forward it to the corresponding sub-chip in the same subsystem. Alternatively, each sub-chip in each subsystem is connected to the control bus and receives control instructions directly. The specific connection of the central controller is not limited in this application.
In addition to the above-mentioned system-on-chip 140, the system-on-chip provided by the embodiment of the present invention (e.g., the above-mentioned system-on-chip 110, the system-on-chip 120, the system-on-chip 130, etc.) also includes a central controller for managing the task processing flow of the whole system-on-chip. Similarly, the central controller may be a sub-chip or a control module in a chip system. The central controller can obtain the load conditions, the resource use conditions and the like of all the sub-chips in the chip system, so that tasks can be distributed to the sub-chips based on the conditions. For example, tasks may be allocated to the sub-chips by a task scheduler (scheduler) in the central controller based on information such as loading and resource usage of the sub-chips.
In the chip system provided by the embodiment of the application, each sub-chip has the capability of independently processing data. However, in the case of a large data task, the processing efficiency of a single sub-chip is low. In order to improve the processing efficiency of tasks, the plurality of sub-chips included in the chip system may be divided into a plurality of sub-chipsets, each sub-chipset including at least one sub-chip. The data tasks can then be processed with the sub-chipset as the processing unit, which improves the processing efficiency. For ease of understanding of the sub-chipset, reference may be made to fig. 7.
Fig. 7 takes the above chip system 110 as an example, and divides 16 sub-chips in the chip system into 9 sub-chip groups, and specifically, referring to the division shown in fig. 7, each sub-chip group includes at least one sub-chip. In addition, the multiple sub-chips of the same sub-chipset may be adjacent sub-chips, such as sub-chipset 3, sub-chipset 4, sub-chipset 7, and sub-chipset 8. Alternatively, the sub-chips of the same sub-chip set may be non-adjacent sub-chips, such as sub-chip set 2, where the sub-chip set 2 is composed of non-adjacent sub-chips 1 and 12.
The chip system provided by the embodiment of the application can process data tasks in a data parallel mode, a model parallel mode, or a combined model parallel and data parallel mode. Wherein:
Data parallelism refers to dividing the data to be processed into a plurality of data blocks and distributing the data blocks to different sub-chipsets, with each sub-chipset running the same processing program to process the data block distributed to it. For example, assuming that the data to be processed is divided into 3 data blocks and that 3 sub-chipsets can run the same processing program to process the data blocks, the 1st data block may be sent to the 1st sub-chipset for processing, the 2nd data block may be sent to the 2nd sub-chipset for processing, and the 3rd data block may be sent to the 3rd sub-chipset for processing.
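The data parallel mode described above can be sketched as follows. This is an illustrative sketch only: the helper `split_into_blocks` and the processing program `process` are hypothetical names, and the doubling operation merely stands in for whatever program the sub-chipsets actually run.

```python
def split_into_blocks(data: list, n_blocks: int) -> list:
    """Split data into n_blocks contiguous blocks of near-equal size."""
    base, extra = divmod(len(data), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        end = start + base + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def process(block: list) -> list:
    """Hypothetical processing program, identical on every sub-chipset."""
    return [x * 2 for x in block]


data = list(range(9))
blocks = split_into_blocks(data, 3)
# Block i is sent to sub-chipset i + 1; every sub-chipset runs the same program.
results = {f"sub-chipset {i + 1}": process(b) for i, b in enumerate(blocks)}
```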
Model parallelism refers to a plurality of sub-chipsets collectively completing a data processing task, and each sub-chipset in the plurality of sub-chipsets only performs a partial step (the partial step may be one or more processing steps) of the entire data processing task. For example, assuming that a data processing task takes 3 steps to complete, two sub-chipsets may be configured to collectively complete the task. Wherein, the 1 st sub-chipset completes the processing of the first 2 steps in the 3 steps, and the 2 nd sub-chipset acquires the processed data from the 1 st sub-chipset to complete the processing of the 3 rd step. Alternatively, three sub-chipsets may be configured to collectively accomplish this task. The 1 st sub-chipset completes the processing of the 1 st step in the 3 steps, the 2 nd sub-chipset acquires the processed data from the 1 st sub-chipset to complete the processing of the 2 nd step, and the 3 rd sub-chipset acquires the processed data from the 2 nd sub-chipset to complete the processing of the 3 rd step. That is, each chipset may perform one or more steps, which may be specifically determined according to the load condition and resource usage condition of the chipset.
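The two assignments of a 3-step task described above can be sketched as follows. The step functions and `run_pipeline` are hypothetical placeholders chosen for illustration; the hand-over between sub-chipsets is modelled simply as passing a value along.

```python
def step1(x):
    return x + 1


def step2(x):
    return x * 10


def step3(x):
    return x - 3


def run_pipeline(assignment, value):
    """assignment lists, per sub-chipset, the steps that sub-chipset executes."""
    for steps in assignment:
        for step in steps:
            value = step(value)
        # At this point the intermediate result would be transmitted
        # to the next sub-chipset in the chain.
    return value


two_sets = [[step1, step2], [step3]]      # sub-chipset 1: steps 1-2, sub-chipset 2: step 3
three_sets = [[step1], [step2], [step3]]  # one step per sub-chipset
out_two = run_pipeline(two_sets, 4)
out_three = run_pipeline(three_sets, 4)
```

Both assignments produce the same result; they differ only in how the steps are spread over the sub-chipsets.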
The combined model parallel and data parallel mode processes data by combining the data parallel mode and the model parallel mode. For example, a data processing task requires 3 steps to complete, and three sub-chipsets may be configured to complete the task together. The 1st sub-chipset completes the processing of the 1st step, the 2nd sub-chipset acquires the processed data from the 1st sub-chipset to complete the processing of the 2nd step, and the 3rd sub-chipset acquires the processed data from the 2nd sub-chipset to complete the processing of the 3rd step. However, if the processing of step 1 is relatively complicated and takes a relatively long time to complete, one or more additional sub-chipsets may be configured to execute the processing task of step 1 jointly in order to improve the processing efficiency. For example, a 4th sub-chipset may additionally be configured to execute the processing task of step 1 together with the 1st sub-chipset. Specifically, the data for the processing of step 1 may be divided into two parts, one part being sent to the 1st sub-chipset for processing and the other part being sent to the 4th sub-chipset for processing. Then, the 1st sub-chipset and the 4th sub-chipset each send their processed data to the 2nd sub-chipset for the processing of the 2nd step.
It should be noted that, in the above model parallel and data parallel manner, each processing step may be processed in a data parallel processing manner, or a part of the processing steps may be processed in a data parallel processing manner, which may be determined specifically according to a specific implementation, and the present application does not limit this.
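The combined mode, in which only step 1 is processed in a data parallel manner, can be sketched as follows. The step functions are hypothetical placeholders, and splitting the input in half is an assumption made for illustration.

```python
def step1(block):
    return [x + 1 for x in block]


def step2(data):
    return [x * 10 for x in data]


def step3(data):
    return sum(data)


data = [1, 2, 3, 4]
half = len(data) // 2
part_a = step1(data[:half])  # processed by the 1st sub-chipset
part_b = step1(data[half:])  # processed by the 4th sub-chipset
merged = part_a + part_b     # both results are sent to the 2nd sub-chipset
result = step3(step2(merged))         # steps 2 and 3 run on sub-chipsets 2 and 3
reference = step3(step2(step1(data)))  # same task without splitting step 1
```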
In a specific implementation, the data processing tasks may be distributed to the sub-chipsets by the central controller of the chip system. When a data task is processed in the model parallel mode or in the combined model parallel and data parallel mode, data needs to be transmitted between the sub-chips. Data transmission introduces delay, which reduces processing efficiency. In order to realize efficient data transmission among the sub-chips in a chip system and improve the processing performance of the chip system, the embodiment of the application provides a data transmission processing method in the chip system.
Referring to fig. 8, the data transmission processing method provided in the embodiment of the present application includes, but is not limited to, the following steps:
S801, a source sub-chip receives configuration parameters and configures a preset stream output table according to the configuration parameters; the stream output table comprises an identification of a data stream and an identification of a destination sub-chip of the data stream; the source sub-chip executes a first data processing task to obtain output data.
In a specific implementation, in the process of implementing a data task by using a model parallel manner or a model parallel and data parallel manner, the source sub-chip may be any one of sub-chips in a first sub-chip set, where the first sub-chip set is a sub-chip set used for executing a processing task of a first step in a target task in a chip system. The first step may include one or more processing steps of the target task. The system-on-chip may be the previously described system-on-chip 110, system-on-chip 120, system-on-chip 130, or system-on-chip 140, among others.
The stream output table is initialized and stored in the source sub-chip in advance, and is the basis on which the source sub-chip sends data. Specifically, the stream output table includes an identifier of the data stream and an identifier of the destination sub-chip to which the data stream is forwarded. In a possible implementation manner, the stream output table may further include one or more of: an identification of the task to which the data included in the data stream belongs, indication information of the number of data packets of the data stream that have been sent, and a start address in the source sub-chip at which the data included in the data stream is stored.
Specifically, regarding the data stream: when one sub-chip transmits data to another sub-chip, the data is encapsulated into a plurality of data packets, the data packets are numbered and transmitted in sequence, and the consecutively transmitted data packets form the data stream. In a possible implementation, each data packet may carry 1 kb of data; if the total size of the transmitted data is 64 kb, the data may be split and encapsulated into 64 data packets for transmission, and the 64 data packets form one data stream.
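The splitting of data into numbered packets can be sketched as follows. The sketch assumes, for illustration only, that the per-packet payload of the example is 1024 bytes; the name `packetize` and the tuple layout are hypothetical.

```python
PACKET_PAYLOAD = 1024  # assumed payload size in bytes


def packetize(data: bytes, payload_size: int = PACKET_PAYLOAD) -> list:
    """Split data into (packet_number, payload) pairs in transmission order."""
    return [
        (number, data[offset:offset + payload_size])
        for number, offset in enumerate(range(0, len(data), payload_size))
    ]


data = bytes(64 * 1024)     # 64 payload units of output data
stream = packetize(data)    # the consecutive packets form one data stream
```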
In a particular implementation, the stream output table may be initialized by a central controller of the system-on-chip. Specifically, as can be seen from the foregoing description, the central controller allocates each data processing task to each sub-chipset, and then the central controller may configure corresponding task identifiers for each data processing task to distinguish different tasks.
In addition, for one data processing task, the central controller configures the sub chip set to execute each processing step of the data processing task. Therefore, the central controller may know the flow direction of the data flow corresponding to the data processing task, and configure a data flow identifier for the corresponding data flow for distinguishing different data flows. For ease of understanding, the following is exemplified.
See, for example, fig. 9A. Assuming that a certain data processing task includes eight processing steps, the central controller may perform the allocation of the processing tasks based on the data size of the data processing task, the processing complexity of the eight steps, and the load and resource usage of each sub-chipset in the chip system. For example, as shown in fig. 9A, the central controller configures step 1 in which the data processing task is executed by the chipset 1, steps 2 and 3 in which the data processing task is executed by the chipset 2, step 4 in which the data processing task is executed by the chipset 3, steps 5, 6 and 7 in which the data processing task is executed by the chipset 4, and step 8 in which the data processing task is executed by the chipset 5. The sub-chip set 1 comprises a sub-chip 4, the sub-chip set 2 comprises a sub-chip 5, the sub-chip set 3 comprises a sub-chip 6 and a sub-chip 7, the sub-chip set 4 comprises a sub-chip 8, and the sub-chip set 5 comprises a sub-chip 0 and a sub-chip 9.
In the above-mentioned fig. 9A, assuming that the data processed in step 1 is needed to perform step 2, step 5 and step 8, the sub-chipset 1 may send the data processed in step 1 to the sub-chipset 2, the sub-chipset 4 and the sub-chipset 5, respectively. Since the data sent by the sub-chipset 1 to the sub-chipset 2, the sub-chipset 4 and the sub-chipset 5 are the same, the data flow from the sub-chipset 1 to the sub-chipset 2, the data flow from the sub-chipset 1 to the sub-chipset 4 and the data flow from the sub-chipset 1 to the sub-chipset 5 have the same identification, for example, 17. In addition, the data obtained by the sub-chipset 2 completing the processing of step 2 and step 3 is sent to the sub-chipset 3 for the processing of step 4, and the identification of the data flow from the sub-chipset 2 to the sub-chipset 3 may be 18. The data obtained by the sub-chipset 3 completing the processing of step 4 is sent to the sub-chipset 5 for the processing of step 8, and the identification of the data flow from the sub-chipset 3 to the sub-chipset 5 may be 19. The data obtained by the sub-chipset 4 completing the processing of step 5, step 6 and step 7 is sent to the sub-chipset 5 for the processing of step 8, and the identification of the data flow from the sub-chipset 4 to the sub-chipset 5 may be 20.
In a specific implementation, data transmission between the sub-chipsets is actually data transmission between the sub-chips in the sub-chipsets. For example, when the sub-chipset 1 sends the data processed in step 1 to the sub-chipset 5, it is actually the sub-chip 4 in the sub-chipset 1 that sends the data processed in step 1 to the sub-chip 0 and the sub-chip 9 in the sub-chipset 5. The other cases are similar and are not described again.
In another possible embodiment, referring to fig. 9B, it is assumed that the data processed in step 1 is needed for performing step 2, step 5, and step 8, but the needed data is different or not identical. For example, it is assumed that data 1 in the data processed in step 1 is required for executing step 2, data 2 in the data processed in step 1 is required for executing step 5, and data 3 in the data processed in step 1 is required for executing step 8. Then the sub-chipset 1 may send the data 1 to the sub-chipset 2, the data 2 to the sub-chipset 4, and the data 3 to the sub-chipset 5. Since the data sent to the three sub-chipsets, sub-chipset 2, sub-chipset 4 and sub-chipset 5, are different, the identifications of the corresponding data streams are also different. For example, the identification of the data stream from sub-chipset 1 to sub-chipset 2 may be 15, the identification of the data stream from sub-chipset 1 to sub-chipset 4 may be 16, and the identification of the data stream from sub-chipset 1 to sub-chipset 5 may be 17. For the identifications of the data streams between the other sub-chipsets, refer to the description of fig. 9A, which is not repeated here.
In a specific implementation, data transmission between processing steps within a single sub-chip of a sub-chipset does not require a data stream identifier. For example, in the sub-chipset 2, the data obtained after the processing of step 2 is passed to the corresponding module in the sub-chip 5 for the processing of step 3, and this transmission within the sub-chip 5 does not require a data stream identifier. Likewise, data transmission within the sub-chip 8 in the sub-chipset 4 does not require a data stream identifier.
Based on the above description, the central controller can know the flow direction of the corresponding data stream, i.e. can know the identification of the destination sub-chip of the data stream. The central controller may store the identifier of the data processing task, the identifier of the data stream corresponding to the data processing task, the identifier of the source sub-chip of the data stream, the identifier of the destination sub-chip of the data stream, and the information of the step of the corresponding processing of the source sub-chip in association with each other. The association storage may be in the form of a table, which may be referred to as a data Stream Table (ST). To facilitate understanding of the data flow table, see table 1, for example.
TABLE 1
[Table 1 is provided as an image in the original publication.]
Exemplarily, based on the above fig. 9B, table 1 shows the information associated with step 1 of the data processing task; the information associated with the other steps is organized in the same way and is not repeated here. As shown in table 1, the data stream table includes an identifier of the task (task identifier, TID), an identifier of the source sub-chip (source mask, S_mask), the step executed by the source sub-chip, an identifier of the destination sub-chip (D_mask), the step executed by the destination sub-chip, and an identifier of the data stream (stream identifier, SID).
The identification of the task refers to the identification of the corresponding data processing task. The task identifier may be, for example, 1 or another identifier, which is not limited in this application.
The identifier of the source sub-chip and the step executed by the source sub-chip indicate which sub-chip executes step 1. The identifier of the destination sub-chip and the step executed by the destination sub-chip indicate where the data processed in step 1 is sent and which processing step the destination executes on it. The identifier of the data stream identifies the data stream formed when the source sub-chip sends the data processed in step 1 to each corresponding destination sub-chip. Specific identifiers are shown in table 1, but they are only examples and other identifier symbols may be used instead. In a program executed by a computer, the identifiers may be represented in binary, hexadecimal, or the like, and the present application does not limit the representation of each type of identifier.
It should be noted that, in the embodiment of the present application, the source sub-chip and the destination sub-chip are for data streams, and the source sub-chip and the destination sub-chip corresponding to different data streams may be different.
In addition, optionally, the content included in the data stream table is not limited to the content shown in table 1. In a specific implementation, the content included in the data flow table may be part or all of the contents shown in table 1. Or may also include other contents, such as other steps executed by the destination sub-chip (for example, the sub-chip 8 processes step 6 and step 7 in addition to step 5, and then the data flow table may include the information of step 6 and step 7, etc.).
Based on the above description, the central controller stores the data flow table, and then the central controller may initialize the flow output table in the source sub-chip based on the data flow table. Specifically, the central controller may find the associated information corresponding to the source sub-chip in the data flow table, that is, find the corresponding information such as the task identifier, the identifier of the destination sub-chip, and the identifier of the data flow. The association information is then sent to the source chiplet. And after the source sub-chip receives the association information, filling the information into a stream output table. One or more items of the associated information corresponding to the source sub-chip in the data flow table are the configuration parameters.
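How the central controller might derive the configuration parameters for a source sub-chip from the data stream table can be sketched as follows. The rows use the example values given for fig. 9B (task 1, source sub-chip 4, data streams 15, 16 and 17); the dictionary layout, the use of sets for S_mask and D_mask, and the function name `config_for_source` are assumptions made for illustration and are not taken from table 1.

```python
# Data stream table rows for step 1 of the fig. 9B example (layout assumed).
DATA_STREAM_TABLE = [
    {"TID": 1, "S_mask": {4}, "D_mask": {5}, "SID": 15},
    {"TID": 1, "S_mask": {4}, "D_mask": {8}, "SID": 16},
    {"TID": 1, "S_mask": {4}, "D_mask": {0, 9}, "SID": 17},
]


def config_for_source(table: list, source_chip: int) -> list:
    """Select the associated information that is sent to one source sub-chip."""
    return [
        {"TID": row["TID"], "SID": row["SID"], "D_mask": set(row["D_mask"])}
        for row in table
        if source_chip in row["S_mask"]
    ]


# The source sub-chip fills the received parameters into its stream output table.
stream_output_table = config_for_source(DATA_STREAM_TABLE, 4)
```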
In a possible implementation manner, based on the foregoing description, the stream output table in the source sub-chip may further include information indicating the number of data packets of the data stream that have been sent and a start address in the source sub-chip at which the data included in the data stream is stored. These two items may be filled into the stream output table by the source sub-chip based on its own information.
For example, the source chiplet receives a data processing task (the data processing task may be the first data processing task in S801) allocated by the central controller, and then, after completing the preset processing steps, obtains processed data (the data is the output data obtained by executing the first data processing task in S801). And storing the processed data in the memory buffer of the source sub-chip. Then, the source chiplet can know the size of the data and the starting address of the storage after the processing. The processed data is the data to be transmitted, and the size of the transmitted data packet is preset, so that the number of the data packets to be transmitted can be obtained by knowing the size of the processed data. Therefore, the source sub-chip can fill and write the number of the data packets to be sent and the storage initial address of the data to be sent into the stream output table. At this point, the initialization of the stream output table in the source sub-chip is completed.
The first data processing task received by the source sub-chip may be specific task data and/or a task execution instruction, and the source sub-chip may cache the task in a preset storage space after receiving the first data processing task, so as to be executed subsequently.
For ease of understanding the stream output table described above, reference may be made to table 2 for example.
TABLE 2
[Table 2 is provided as an image in the original publication.]
Table 2 above exemplarily shows the information corresponding to task 1 in the stream output table. The TID, SID and D_mask corresponding to task 1 in table 2 are obtained from table 1 by the central controller and sent to the source sub-chip, and are therefore the same as in table 1. In addition, the start address (S_address) in table 2 is the start address in the source sub-chip at which the data included in the data stream to be transmitted is stored. The packet count (C_packet) in table 2 is the information indicating the number of data packets of the data stream that have been sent. The packet count may be implemented as a countdown: for example, the packet count is set at initialization to the total number of data packets included in the data stream, and is then decreased by one every time the source sub-chip sends one data packet.
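A stream output table entry with a countdown packet count can be sketched as follows. The field names follow table 2 as described in the text; the class `StreamOutEntry`, the helper names, the start address value, and the 1024-byte packet size are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StreamOutEntry:
    TID: int
    SID: int
    D_mask: frozenset
    S_address: int  # start address of the data to be sent
    C_packet: int   # countdown: packets still to be sent


def make_entry(tid, sid, d_mask, s_address, data_size, packet_size=1024):
    """Initialize C_packet to the total number of packets (ceiling division)."""
    total = (data_size + packet_size - 1) // packet_size
    return StreamOutEntry(tid, sid, frozenset(d_mask), s_address, total)


def on_packet_sent(entry: StreamOutEntry) -> bool:
    """Decrease the count by one; return True once the whole stream is sent."""
    entry.C_packet -= 1
    return entry.C_packet == 0


entry = make_entry(1, 17, {0, 9}, 0x1000, 64 * 1024)
initial_count = entry.C_packet
done_flags = [on_packet_sent(entry) for _ in range(initial_count)]
```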
In addition, the flow output table may include information corresponding to a plurality of tasks, which is distinguished by the identifiers of the tasks, and the identifiers of the data flows of different tasks may be the same or different, which is specifically determined according to the data flow table of the central controller.
Alternatively, the source chiplet can, for example, complete initialization of the flow output table prior to executing the first data processing task assigned by the central controller. Specifically, the source chiplet can initialize the number of data packets of a transmitted data stream to zero in the stream output table and initialize a start address for storing data included in the data stream to a start address of a designated memory space. Then, the data after the data processing task assigned by the central controller executed by the source sub-chip is finished can be stored in the designated storage space. And, in the process of transmitting the processed data by the source sub-chip, adding one to the number of data packets of the transmitted data stream in the stream output table every time a data packet is transmitted.
S802, the source sub-chip generates a plurality of data packets from the output data based on the stream output table, wherein the data packets include the identifier of the data stream and the identifier of the destination sub-chip.
After the source chiplet initializes the flow output table and executes the first data processing task to obtain output data, a data packet corresponding to the data flow can be generated based on the flow output table. The type (type) of the packet, the identification of the task, the identification of the data stream, the identification of the destination sub-chip, the packet number, and the data may be included in the packet. The identification of the destination chiplet in the data packet can be the identification of one or more destination chiplets. And if the destination sub-chip corresponding to the data in the data packet is one, the identifier of the destination sub-chip in the data packet is the identifier of the destination sub-chip. If the destination sub-chips corresponding to the data in the data packet are multiple, the identifiers of the destination sub-chips in the data packet are the identifiers of the multiple destination sub-chips. For example, in table 2 above, there are two destination sub-chips corresponding to the data stream 17, which are the sub-chip 0 and the sub-chip 9, respectively, and then the data packet in the data stream 17 includes the identifications of the sub-chip 0 and the sub-chip 9.
In a specific implementation, the source chiplet obtains a start address for storing data to be sent (i.e. the output data) based on the task identifier and the data stream identifier in the stream output table, and reads the data to be sent based on the start address to generate the data packet.
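Generating the data packets of one data stream from a stream output table entry can be sketched as follows. The packet fields follow the list given above (type, task identifier, data stream identifier, destination sub-chip identifier, packet number, data); the class `Packet`, the type value "data", the addresses, and the 1024-byte payload are hypothetical and do not reproduce the format of fig. 10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    type: str
    TID: int
    SID: int
    D_mask: frozenset  # one or more destination sub-chip identifiers
    number: int
    data: bytes


def generate_packets(memory: bytes, entry: dict, n_bytes: int, payload: int = 1024) -> list:
    """Read the output data from S_address and wrap it into numbered packets."""
    start = entry["S_address"]
    output = memory[start:start + n_bytes]
    return [
        Packet("data", entry["TID"], entry["SID"], frozenset(entry["D_mask"]),
               number, output[offset:offset + payload])
        for number, offset in enumerate(range(0, len(output), payload))
    ]


memory = bytes(range(256)) * 16  # 4096 bytes standing in for the source buffer
entry = {"TID": 1, "SID": 17, "D_mask": {0, 9}, "S_address": 1024}
packets = generate_packets(memory, entry, 2048)
```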
In a possible implementation manner, the generated data packet may further carry sideband information, and the sideband information may include one or more of the identifier of the task, the identifier of the data stream, or the identifier of the destination sub-chip. The sideband information need not be encapsulated within the data packet; instead, it may be transmitted alongside the data packet. In a specific implementation, the information included in a data packet can only be read by the routing module of a sub-chip, and other modules in the sub-chip, such as the ports, are not aware of it. Therefore, to facilitate fast forwarding of the data packet, the data packet may be configured to carry the above sideband information. For ease of understanding the format of the data packet and the format of the sideband information, reference may be made exemplarily to fig. 10. The format of the data packet and the corresponding sideband information shown in fig. 10 are only examples; in a specific implementation, the data packet may further include other information, and the sideband information may also include more information, which is not limited in this application.
S803, the source chiplet sends the plurality of data packets to the destination chiplet.
After the source sub-chip generates a data packet based on the stream output table, the data packet can be sent to the destination sub-chip. Specifically, the source sub-chip may send the data packet based on the port forwarding mapping table. As described above for the chip system, the routing module of the source sub-chip includes the forwarding-port mapper (FPM), the FPM stores the port forwarding mapping table, and the port forwarding mapping table contains the mapping relationship between the identifier of the destination sub-chip and the sending port.
Specifically, the source sub-chip may query the port forwarding mapping table based on the identifier of the destination sub-chip of the data packet to obtain the corresponding sending port, and then send the data packet out of that sending port. See fig. 11 for ease of understanding. In fig. 11, it is assumed that the sub-chip 0 is the source sub-chip and the sub-chip 1 is the destination sub-chip. The port forwarding mapping table in the sub-chip 0 stores the association between the identifier of the sub-chip 1 and the port d1 of the sub-chip 0. Then, when the sub-chip 0 has a data packet to send to the sub-chip 1, the sub-chip 0 queries the port forwarding mapping table based on the identifier of the sub-chip 1, learns that the sending port of the data packet is d1, and sends the data packet from the port d1.
S804, the destination sub-chip receives the data packet.
After the source sub-chip sends out the data packet, the data packet travels along a path to the destination sub-chip, and the destination sub-chip receives the data packet. For example, taking the above fig. 11 as an example, the sub-chip 0 sends the data packet from its port d1, the data packet arrives at the port d3 of the sub-chip 1, which is the destination sub-chip, and the sub-chip 1 receives the data packet through the port d3.
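The port lookup of the fig. 11 example can be sketched as follows. The table holds only the single mapping given in the text (sub-chip 1 is reached through port d1 of sub-chip 0); the dictionary representation and the function name `sending_port` are assumptions made for illustration.

```python
# Port forwarding mapping table of sub-chip 0: destination sub-chip -> sending port.
PORT_FORWARDING_TABLE = {1: "d1"}


def sending_port(table: dict, destination_chip: int) -> str:
    """Look up the sending port for a packet addressed to destination_chip."""
    port = table.get(destination_chip)
    if port is None:
        raise KeyError(f"no sending port mapped for sub-chip {destination_chip}")
    return port


port = sending_port(PORT_FORWARDING_TABLE, 1)
```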
S805, the destination sub-chip judges whether a stream input table contains the data stream identifier in the received data packet, wherein the stream input table comprises at least one identifier of a data stream to be received; when the stream input table contains the data stream identifier in the received data packet, the data in the received data packet is stored.
In a specific implementation, after the destination sub-chip receives a data packet destined for itself, the data in the data packet may be stored in a buffer for subsequent processing. Specifically, the destination sub-chip may store the data in the data packet based on the stream input table. The stream input table is the basis on which the destination sub-chip receives data.
The stream input table may include an identification of the task and an identification of the data stream. In a specific implementation, the stream input table may be initialized by the central controller of the chip system. As can be seen from the foregoing description, the data stream table is stored in the central controller. The central controller may query the data stream table based on the identifier of the destination sub-chip, obtain the information associated with the destination sub-chip, and send the information to the destination sub-chip. The information associated with the destination sub-chip includes information such as the identifier of the corresponding task and the identifier of the data stream. After receiving the information, the destination sub-chip can write it into its own stream input table. One or more items of the associated information corresponding to the destination sub-chip in the data stream table are the configuration parameters for configuring the stream input table. The identifier of the data stream in the stream input table is the identifier of a data stream to be received by the destination sub-chip; that is, the destination sub-chip stores the data in a data packet for subsequent processing only if the data stream identifier in the data packet is one of the data stream identifiers in the stream input table.
The stream input table may further include information indicating the number of packets of the received data stream and information such as a start address for storing data included in the data stream in the destination sub-chip. The information may be that the destination chiplet populates the stream entry table based on its own information. For example, the destination sub-chip may configure a designated memory space for the data stream to be received, and then initialize the start address of the designated memory space into the stream input table. And for example, the destination sub-chip may initialize the number of packets of the received data stream in the stream input table to zero, and then, in the process of receiving the packets of the data stream by the destination sub-chip, each time a packet is received, add one to the number of packets of the corresponding received data stream in the stream input table. Or, for example, the destination sub-chip may learn, from the source sub-chip or the controller, the total number of packets included in the data stream to be received, and initialize the number of packets of the corresponding received data stream in the stream input table to the total number. Then, in the process of receiving the data packet of the data stream by the destination sub-chip, the number of the data packets of the received data stream corresponding to the stream input table is reduced by one every time one data packet is received.
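The two counting schemes described above can be sketched as follows. This is a minimal illustration only: the class name, field names and helper functions are hypothetical and are not taken from the application, which does not prescribe a concrete data layout.

```python
from dataclasses import dataclass


@dataclass
class StreamInputEntry:
    """One row of a hypothetical stream input table (names are illustrative)."""
    tid: int           # task identifier
    sid: int           # data stream identifier
    s_address: int     # start address reserved for the stream's data
    c_packet: int = 0  # packet counter


def init_count_up(tid, sid, s_address):
    # Counter starts at zero and is incremented for every received packet.
    return StreamInputEntry(tid, sid, s_address, c_packet=0)


def init_count_down(tid, sid, s_address, total_packets):
    # Counter starts at the stream's total packet count and is decremented.
    return StreamInputEntry(tid, sid, s_address, c_packet=total_packets)


def on_packet_count_up(entry):
    entry.c_packet += 1
    return entry.c_packet  # number of packets received so far


def on_packet_count_down(entry):
    entry.c_packet -= 1
    return entry.c_packet == 0  # True once the whole stream has arrived
```

With the countdown variant, the destination sub-chip can detect the end of the stream locally by checking for zero, whereas the count-up variant needs the total from elsewhere to make the same decision.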
To facilitate understanding of the stream input table, see Table 3 for an example.
TABLE 3
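[Table 3 appears as an image in the original publication (Figure BDA0003440847500000141); it shows the stream input table entry for task 1, with the fields TID, SID, S_address and C_packet.]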
Table 3 above shows, as an example, the information corresponding to task 1 in the stream input table. The TID and SID corresponding to task 1 in Table 3 are obtained from Table 1 by the central controller and sent to the destination sub-chip, and are therefore the same as those in Table 1. In addition, the start address (S_address) in Table 3 is the start address in the destination sub-chip for storing the data included in data stream 15. The packet count (C_packet) in Table 3 indicates the number of data packets of data stream 15 that have been received. The packet count may also be a countdown: at initialization it is set to the total number of data packets included in the data stream, and it is then decremented by one each time the destination sub-chip receives a data packet.
In addition, the stream input table may include information corresponding to a plurality of tasks. The information is distinguished by the task identifiers, and the data stream identifiers of different tasks may be the same or different, as determined by the data flow table of the central controller.
In another possible embodiment, the destination sub-chip may initialize the stream input table upon receiving a header packet from the source sub-chip. The header packet may include the packet type, the identifier of the task, the identifier of the data stream, the identifier of the destination sub-chip, the total number of data packets included in the data stream, and the start address in the destination sub-chip for storing the data of the data stream. Optionally, the header packet may also carry sideband information, whose content may be the same as that of the sideband information described above and is not repeated here. To facilitate understanding of the format of the header packet, reference may be made to fig. 12 as an example.
In fig. 12, N_PK represents the total number of data packets included in the data stream, Address represents the start address in the destination sub-chip for storing the data of the data stream, and the other identifiers are as described above and are not repeated here. The format of the data packet and the corresponding sideband information shown in fig. 12 are only examples; in a specific implementation, the data packet may include other information, and the sideband information may also include more information, which is not limited in this application.
In one possible embodiment, once the stream input table of the destination sub-chip has been initialized, the destination sub-chip is ready to receive the corresponding data stream. The destination sub-chip can therefore send an unblocking data packet to the source sub-chip, indicating to the source sub-chip that the destination sub-chip is ready to receive the corresponding data stream.
The unblocking data packet may include the packet type, the identifier of the task, the identifier of the data stream, and the identifier of the sub-chip that is to receive the unblocking data packet, which is the source sub-chip. Optionally, the unblocking data packet may also carry sideband information, whose content may be the same as that of the sideband information described above and is not repeated here.
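The header-packet handshake described above can be sketched as follows. This is a rough illustration under stated assumptions: the dictionary keys, the function name and the way the source sub-chip's identifier reaches the handler (here a plain `source_id` argument) are hypothetical, and the countdown style of packet counting is only one of the options the description allows.

```python
def handle_header_packet(stream_input_table, header, source_id):
    """Initialize a stream input table entry from a header packet and
    build the unblocking packet addressed to the source sub-chip.

    `source_id` is assumed to be known to the destination sub-chip; the
    description does not say how it is obtained.
    """
    key = (header["tid"], header["sid"])
    stream_input_table[key] = {
        "s_address": header["address"],  # start address for the stream's data
        "c_packet": header["n_pk"],      # countdown from the total packet count
    }
    # The unblocking packet carries type, task id, stream id and receiver id.
    return {
        "type": "unblock",
        "tid": header["tid"],
        "sid": header["sid"],
        "dest_id": source_id,
    }


# Usage with illustrative values (task 1, data stream 15):
table = {}
header = {"type": "header", "tid": 1, "sid": 15, "dest_id": 7,
          "n_pk": 4, "address": 0x2000}
unblock = handle_header_packet(table, header, source_id=3)
```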
Based on the above description, after the destination sub-chip has initialized the stream input table, for each data packet received from the source sub-chip it may first obtain the data stream identifier in the data packet and compare it with the data stream identifiers in its stream input table. If the stream input table includes the data stream identifier in the received data packet, the destination sub-chip may query the stream input table based on the task identifier and the data stream identifier in the data packet to obtain the start address of the corresponding storage space, calculate the corresponding storage address based on the start address, and store the data of the data packet at that storage address.
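The receive path described above might look like the sketch below. The description does not state how the storage address is derived from the start address; the formula used here (start address plus packet number times a fixed payload size) is an assumption made purely for illustration, as are all names and the use of a dictionary as stand-in memory.

```python
PAYLOAD_BYTES = 64  # assumed fixed payload size; not specified in the text


def receive_packet(stream_input_table, memory, packet):
    """Store the packet's data if its stream is expected; return True if stored."""
    # Step 1: compare the packet's stream id with the ids in the table.
    expected_sids = {sid for (_tid, sid) in stream_input_table}
    if packet["sid"] not in expected_sids:
        return False
    # Step 2: look up the entry by (task id, stream id).
    entry = stream_input_table.get((packet["tid"], packet["sid"]))
    if entry is None:
        return False
    # Step 3: derive the storage address from the start address (assumed formula).
    address = entry["s_address"] + packet["packet_no"] * PAYLOAD_BYTES
    memory[address] = packet["data"]
    entry["c_packet"] += 1  # count-up variant of the packet counter
    return True
```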
In one possible implementation, the central controller allocates a second data processing task to the destination sub-chip, where the second data processing task is executed based on the data obtained after execution of the first data processing task is completed. The source sub-chip executes the first data processing task to obtain output data and sends the output data to the destination sub-chip in the form of a data stream. The destination sub-chip receives and stores the output data based on its own stream input table. The destination sub-chip can then read the output data from the memory space at the known storage address in order to perform its own second data processing task.
In a possible implementation manner, based on the related description of the chip system, the chip system may include a plurality of subsystems, and each subsystem is connected according to a preset topological connection relationship. If the source sub-chip and the destination sub-chip are not sub-chips in the same subsystem, the data packet transmitted between the two sub-chips also includes an identifier of the subsystem where the sub-chip receiving the data packet is located. For example, if the source sub-chip sends a data packet (e.g., a data packet included in the data stream) to the destination sub-chip, the data packet further includes an identifier of a subsystem where the destination sub-chip is located. If the destination sub-chip sends a data packet (for example, the unblocking data packet mentioned above) to the source sub-chip, the data packet further includes an identifier of the subsystem where the source sub-chip is located. Data transmission among the sub-chips of the cross-subsystem can be realized through the identification of the subsystem in the data packet, so that the processing of large-scale data tasks is facilitated, and the processing efficiency is improved.
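The cross-subsystem rule above can be illustrated with a small sketch in which the subsystem identifier is added only when sender and receiver are in different subsystems. The field names and the dictionary representation of a sub-chip are hypothetical.

```python
def build_packet(sender, receiver, payload):
    """Build a packet; add the receiver's subsystem id only for
    cross-subsystem transfers (illustrative field names)."""
    packet = {"dest_id": receiver["chip_id"], "data": payload}
    if sender["subsystem_id"] != receiver["subsystem_id"]:
        packet["dest_subsystem_id"] = receiver["subsystem_id"]
    return packet


# Usage with illustrative identifiers:
same = build_packet({"chip_id": 1, "subsystem_id": 0},
                    {"chip_id": 6, "subsystem_id": 0}, b"x")
cross = build_packet({"chip_id": 1, "subsystem_id": 0},
                     {"chip_id": 6, "subsystem_id": 2}, b"x")
```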
To sum up, the embodiments of the present application can implement efficient data transmission in the chip system based on the initialized stream output table and stream input table, and thereby improve the data processing performance of the chip system.
In a possible implementation manner, in order to further improve the data transmission efficiency in the chip system, embodiments of the present application may provide a routing implementation manner, so that data can be flexibly scheduled and efficiently transmitted between the sub-chips of the chip system. The following description takes the first sub-chip as an example.
Referring to fig. 13, a routing implementation manner provided by the embodiment of the present application includes, but is not limited to, the following steps:
S1301, the first sub-chip obtains a first data packet, wherein the first data packet comprises the identifier of a destination sub-chip; the first sub-chip and the destination sub-chip are sub-chips included in a chip system, and the plurality of sub-chips included in the chip system are connected in a preset topological structure.
The chip system may be the previously described chip system 110, chip system 120, chip system 130, or chip system 140, among others. The first sub-chip can be any one of the sub-chips in any one of these chip systems.
For example, the first sub-chip may be the source sub-chip described in fig. 13, in which case obtaining the first data packet means that the first sub-chip generates the first data packet. For a specific implementation of generating the first data packet, reference may be made to the corresponding description in step S802, which is not repeated here.
Alternatively, the first sub-chip may be a sub-chip on the path along which a data packet is transmitted from a source sub-chip to a destination sub-chip, in which case obtaining the first data packet means that the first sub-chip receives the first data packet.
For convenience of the following description, the chip system in which the first sub-chip is located is referred to as the first chip system. The first sub-chip receives the first data packet from another sub-chip in the first chip system.
In a specific implementation, the first data packet may include one or more of information such as a type (type) of the packet (packet), an identifier of the task, an identifier of the data stream, an identifier of the destination sub-chip, a packet number, and data. In a possible implementation manner, the first data packet may further carry sideband information, and the sideband information may include one or more of an identifier of the task, an identifier of the data stream, or an identifier of the destination sub-chip. For the content included in the first data packet, reference may be made to the description about the data packet in step S802, which is not described herein again.
S1302, the first sub-chip sends the data in the first data packet based on the data transmission condition between the sub-chips in the chip system.
In a specific implementation, the data transmission between the sub-chips in the chip system includes multiple cases, and several possible implementations are exemplarily described below.
In a first possible implementation manner, the first sub-chip may send the data in the first data packet based on a congestion condition of its port.
In particular, as can be seen from the above description of the chip system, each sub-chip includes a plurality of ports for communicating with other sub-chips. Each port is configured with a corresponding transmission buffer, and the transmission buffer is used for storing data to be transmitted.
After receiving the first data packet, the first sub-chip parses it to obtain the identifier of the destination sub-chip. If the identifier of the destination sub-chip indicates that the first sub-chip is itself the destination sub-chip, the first sub-chip extracts the data in the first data packet and stores it for subsequent processing. Otherwise, the first sub-chip uses the identifier of the destination sub-chip as an index to look up the sending port for the first data packet in its forwarding mapping table. For an introduction to the forwarding mapping table, reference may be made to the foregoing description of fig. 2, which is not repeated here. If the lookup returns a plurality of sending ports, the specific sending port can be determined based on the congestion condition of the sending buffers of those ports. Specifically, in order to improve data transmission efficiency, the port whose sending buffer holds the least amount of data to be sent may be selected to send the first data packet.
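The congestion-based port selection just described can be sketched as below. The forwarding mapping table is modelled as a dictionary from destination identifier to a list of candidate ports, and congestion as the amount of pending data per port; both representations and all names are assumptions made for illustration.

```python
def route_unicast(self_id, dest_id, forwarding_map, tx_backlog):
    """Return the sending port for a packet, or None if it is for this sub-chip.

    forwarding_map: dest_id -> list of candidate sending ports (assumed layout)
    tx_backlog:     port -> amount of data waiting in its sending buffer
    """
    if dest_id == self_id:
        return None  # this sub-chip is the destination: store the data locally
    candidates = forwarding_map[dest_id]
    # Pick the candidate port whose sending buffer holds the least pending data.
    return min(candidates, key=lambda port: tx_backlog[port])
```

For instance, with candidate ports `d1` and `d2` holding 10 and 3 pending units respectively, the sketch selects `d2`.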
In a possible implementation manner, if the first data packet includes the identifiers of a plurality of destination sub-chips and the first sub-chip is one of them, the first sub-chip extracts the data in the first data packet and stores it for subsequent processing. The first sub-chip also uses the identifiers of the remaining destination sub-chips as indexes to look up, in its forwarding mapping table, the sending ports for the data of the first data packet.
If only one destination sub-chip identifier remains, then, in the same way, the corresponding sending ports are looked up and the port with the least amount of data to be sent in its sending buffer is selected to send the data in the first data packet. Specifically, the data is repackaged into a new data packet for transmission, and the destination identifiers in the repackaged data packet no longer include the identifier of the first sub-chip, only the identifier of the remaining destination sub-chip.
If more than one destination sub-chip identifier remains, the first sub-chip looks up the sending port corresponding to each of them in its own forwarding mapping table. If the sending ports found are the same, the data in the first data packet may be copied to generate one new data packet that includes the identifiers of the remaining destination sub-chips, and the new data packet is sent from that common sending port. Similarly, the sending port may be the one with the least amount of data to be sent in its sending buffer among the ports found.
Alternatively, if more than one destination sub-chip identifier remains, take two identifiers as an example and assume that the remaining destination sub-chips are sub-chip A and sub-chip B. The first sub-chip looks up, in its forwarding mapping table, the sending port mapped to the identifier of sub-chip A and the sending port mapped to the identifier of sub-chip B. Assuming that the sending ports found are different, the first sub-chip may generate two new data packets: data packet A and data packet B. Each of the two data packets includes the data of the first data packet; the destination identifier in data packet A is the identifier of sub-chip A, and the destination identifier in data packet B is the identifier of sub-chip B. Data packet A and data packet B are then sent through their respective sending ports. Similarly, each sending port may be the one with the least amount of data to be sent in its sending buffer among the ports found.
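The multi-destination handling above can be sketched as one function: keep a local copy if this sub-chip is a destination, pick a port for each remaining destination, and emit one regenerated packet per distinct port. The data layout and names are hypothetical, and selecting the least-loaded port per destination before grouping is one plausible reading of the description rather than the only one.

```python
def forward_multicast(self_id, packet, forwarding_map, tx_backlog):
    """Split a multi-destination packet by sending port.

    Returns (deliver_locally, {port: regenerated_packet}).
    """
    deliver_locally = self_id in packet["dests"]
    remaining = [d for d in packet["dests"] if d != self_id]

    dests_by_port = {}
    for dest in remaining:
        # Least pending data among the candidate ports for this destination.
        port = min(forwarding_map[dest], key=lambda p: tx_backlog[p])
        dests_by_port.setdefault(port, []).append(dest)

    # One copy of the data per distinct port, carrying only the destination
    # identifiers that are reached through that port.
    outgoing = {
        port: {"dests": dests, "data": packet["data"]}
        for port, dests in dests_by_port.items()
    }
    return deliver_locally, outgoing
```

Destinations that resolve to the same port share a single regenerated packet, while destinations that resolve to different ports each get their own copy, matching the sub-chip A and sub-chip B example.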
In a second possible implementation manner, the first sub-chip may send the data in the first data packet to a destination sub-chip based on a minimum bandwidth consumption principle. The minimum bandwidth consumption principle refers to the principle of delivering data to the destination sub-chip with the minimum transmission bandwidth.
To facilitate understanding of the present implementation, a directional coordinate system constructed with the first sub-chip as a center is first introduced. Fig. 14 exemplarily shows a schematic diagram of a direction coordinate system constructed centering on the first sub-chip. It can be seen that the directional coordinate system comprises four directional axes: a first directional axis, a second directional axis, a third directional axis, and a fourth directional axis. The four directional axes all diverge outwardly centered on the first sub-chip. The first direction axis and the second direction axis are collinear and opposite in direction; the third direction axis and the fourth direction axis are collinear and opposite in direction. The directional coordinate system also includes four regions: a first region, a second region, a third region, and a fourth region. Wherein the first region is bounded by the first directional axis and the third directional axis; the second region is bounded by the second directional axis and the third directional axis; the third region is bounded by the second directional axis and the fourth directional axis, and the fourth region is bounded by the first directional axis and the fourth directional axis.
In the chip system, the row in which the first sub-chip is located lies on at least one of the first direction axis and the second direction axis, and the column in which the first sub-chip is located lies on at least one of the third direction axis and the fourth direction axis. See fig. 15 for an example. Assuming that sub-chip 5 in the chip system is the first sub-chip, a direction coordinate system is established with sub-chip 5 as the center. In this coordinate system, the second row, in which sub-chip 5 is located, lies on the first and second direction axes, and the second column, in which sub-chip 5 is located, lies on the third and fourth direction axes. Sub-chip 2 and sub-chip 3 are then located in the first region of the direction coordinate system. Sub-chip 0 is located in the second region. Sub-chip 8 and sub-chip 12 are located in the third region. Sub-chip 10, sub-chip 11, sub-chip 14 and sub-chip 15 are located in the fourth region.
In a possible embodiment, if sub-chip 0 in the chip system of fig. 15 is the first sub-chip, a direction coordinate system is established with sub-chip 0 as the center. In this coordinate system, the first row, in which sub-chip 0 is located, lies on the first direction axis, and the first column, in which sub-chip 0 is located, lies on the fourth direction axis. All sub-chips other than those in the row and the column of sub-chip 0 are then located in the fourth region of the direction coordinate system.
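The direction coordinate system can be made concrete with a small classifier. The sketch assumes, consistently with the examples given for fig. 15 but not stated outright in the text, a 4x4 grid numbered row by row from sub-chip 0, with the first axis pointing toward higher column numbers, the second toward lower column numbers, the third toward lower row numbers and the fourth toward higher row numbers. The function name and return labels are hypothetical.

```python
def locate(first_id, other_id, cols=4):
    """Classify other_id relative to first_id as an axis, a region, or 'self'.

    Assumes row-major numbering on a grid with `cols` columns.
    """
    row0, col0 = divmod(first_id, cols)
    row, col = divmod(other_id, cols)
    d_row, d_col = row - row0, col - col0

    if d_row == 0 and d_col == 0:
        return "self"
    if d_row == 0:  # same row: first or second direction axis
        return "axis1" if d_col > 0 else "axis2"
    if d_col == 0:  # same column: third or fourth direction axis
        return "axis3" if d_row < 0 else "axis4"
    if d_row < 0:   # bounded by the third axis
        return "region1" if d_col > 0 else "region2"
    return "region4" if d_col > 0 else "region3"  # bounded by the fourth axis
```

Under these assumptions the classifier reproduces the placements listed for sub-chip 5 (for example, sub-chips 2 and 3 in the first region) and for sub-chip 0 (all off-axis sub-chips in the fourth region).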
In a possible implementation, the minimum bandwidth consumption principle includes: when the destination sub-chip indicated in the first data packet is on a target direction axis, the first sub-chip transmits the data of the first data packet along the direction of the target direction axis, where the target direction axis is the first direction axis, the second direction axis, the third direction axis or the fourth direction axis. For ease of understanding, an example is described with reference to fig. 15 above.
In fig. 15, it is assumed that the sub-chip 5 is the first sub-chip, and receives a data packet, and the identifier of the destination sub-chip in the data packet indicates that the destination sub-chip is the sub-chip 7. If the data packet only includes the identification of a destination sub-chip, the sub-chip 5 sends the data packet along the first direction axis because the sub-chip 7 is located on the first direction axis. That is, the sub-chip 5 first sends the data packet to the sub-chip 6, and then the sub-chip 6 forwards the data packet to the sub-chip 7. If the data packet includes the identifiers of the plurality of destination sub-chips, the sub-chip 7 as one of the destination sub-chips is located on the first direction axis. Therefore, the sub-chip 5 copies a copy of the data in the data packet to newly generate a data packet, and sends the new data packet along the direction of the first direction axis. That is, the sub-chip 5 first sends the new data packet to the sub-chip 6, and then the sub-chip 6 forwards the new data packet to the sub-chip 7. The newly generated data packet comprises an identification of the sub-chip 7.
In a possible implementation manner, the first data packet includes the identifiers of a first destination sub-chip and a second destination sub-chip. The minimum bandwidth consumption principle further includes: in the direction coordinate system established with the first sub-chip as the center, when the first destination sub-chip and the second destination sub-chip are located in two adjacent regions among the first region, the second region, the third region and the fourth region, the first sub-chip sends a second data packet along the direction of the common direction axis. The second data packet includes the data and the identifiers of the first destination sub-chip and the second destination sub-chip. The common direction axis is the direction axis that forms the common boundary of the two adjacent regions. For ease of understanding, an example is described with reference to fig. 15 above.
In fig. 15, it is assumed that the sub-chip 5 is the first sub-chip, and receives a data packet, and the identification of the destination sub-chip in the data packet indicates that the destination sub-chip is the sub-chip 8 and the sub-chip 14. The sub-chip 8 is located in a third region, the sub-chip 14 is located in a fourth region, the two regions are adjacent regions, and a common boundary is a fourth direction axis. Therefore, the sub-chip 5 transmits the data packet in the direction of the fourth direction axis. That is, the sub-chip 5 first sends the data packet to the sub-chip 9, and then the sub-chip 9 further forwards the data packet. Specifically, the sub-chip 9 may also be regarded as the first sub-chip, a direction coordinate system is established with the sub-chip 9 as a center, and then data is forwarded based on the minimum bandwidth consumption principle.
In a possible implementation manner, the first data packet includes the identifiers of a first destination sub-chip and a second destination sub-chip. The minimum bandwidth consumption principle further includes: in the direction coordinate system established with the first sub-chip as the center, when the first destination sub-chip is in the first region and the second destination sub-chip is in the third region, the first sub-chip sends a third data packet along the direction of one of the two direction axes bounding the first region, and sends a fourth data packet along the direction of one of the two direction axes bounding the third region. The third data packet includes the data and the identifier of the first destination sub-chip. The fourth data packet includes the data and the identifier of the second destination sub-chip. For ease of understanding, an example is described with reference to fig. 15 above.
In fig. 15, it is assumed that the sub-chip 5 is the first sub-chip, and receives a data packet, and the identifier of the destination sub-chip in the data packet indicates that the destination sub-chip is the sub-chip 2 and the sub-chip 12. The sub-chip 2 is located in the first area and the sub-chip 12 is located in the third area. Then, the sub-chip 5 may regenerate two data packets based on the data in the received data packet: packet a and packet B. The data packet a includes data and the identifier of the sub-chip 2, and the data packet B includes data and the identifier of the sub-chip 12. Then, the packet a is transmitted in the direction of the first directional axis or the third directional axis. For example, the data packet a is sent along the first direction axis, that is, the data packet a is sent to the sub-chip 6, and then the sub-chip 6 forwards the data packet a to the sub-chip 2. The sub-chip 5 additionally transmits the data packet B in the direction of the second directional axis or the fourth directional axis. For example, the data packet B is sent along the fourth direction axis, that is, the data packet B is sent to the sub-chip 9 first, and then is forwarded by the sub-chip 9.
In a possible implementation manner, the first data packet includes the identifiers of a first destination sub-chip and a second destination sub-chip. The minimum bandwidth consumption principle further includes: in the direction coordinate system established with the first sub-chip as the center, when the first destination sub-chip is in the second region and the second destination sub-chip is in the fourth region, the first sub-chip sends a third data packet along the direction of one of the two direction axes bounding the second region, and sends a fourth data packet along the direction of one of the two direction axes bounding the fourth region. The third data packet includes the data and the identifier of the first destination sub-chip. The fourth data packet includes the data and the identifier of the second destination sub-chip. For ease of understanding, an example is described with reference to fig. 15 above.
In fig. 15, it is assumed that the sub-chip 5 is the first sub-chip, and receives a data packet, and the identifier of the destination sub-chip in the data packet indicates that the destination sub-chip is the sub-chip 0 and the sub-chip 10. The sub-chip 0 is located in the second area, and the sub-chip 10 is located in the fourth area. Then, the sub-chip 5 may regenerate two data packets based on the data in the received data packet: packet C and packet D. The data packet C includes data and the identifier of the sub-chip 0, and the data packet D includes data and the identifier of the sub-chip 10. Then, the packet C is transmitted in the direction of the second directional axis or the third directional axis. For example, the data packet C is sent along the direction of the second direction axis, that is, the data packet C is sent to the sub-chip 4 first, and then the sub-chip 4 forwards the data packet C to the sub-chip 0. The sub-chip 5 additionally transmits the data packet D in the direction of the first directional axis or the fourth directional axis. For example, the data packet D is sent along the first direction axis, that is, the data packet D is sent to the sub-chip 6 and then forwarded to the sub-chip 10 by the sub-chip 6.
In a possible implementation manner, the first data packet includes the identifiers of a first destination sub-chip and a second destination sub-chip. The minimum bandwidth consumption principle further includes: in the direction coordinate system established with the first sub-chip as the center, if the first destination sub-chip is in a target region and the second destination sub-chip is on a direction axis bounding the target region, the first sub-chip sends a fifth data packet along the direction of that direction axis. The fifth data packet includes the data and the identifiers of the first destination sub-chip and the second destination sub-chip. The target region is the first region, the second region, the third region or the fourth region. For ease of understanding, an example is described with reference to fig. 15 above.
In fig. 15, it is assumed that sub-chip 5 is the first sub-chip and receives a data packet whose destination identifiers indicate that the destination sub-chips are sub-chip 14 and sub-chip 9. Sub-chip 14 is located in the fourth region and sub-chip 9 is located on the fourth direction axis. Because the fourth direction axis is a boundary of the fourth region, sub-chip 5 may send the received data packet along the fourth direction axis, that is, to sub-chip 9. Sub-chip 9 receives the data packet and stores the data in it. Sub-chip 9 also copies the data to generate a new data packet that includes the identifier of sub-chip 14. The new data packet is then sent to sub-chip 13 or sub-chip 10, which forwards it to sub-chip 14.
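The two-destination cases of the minimum bandwidth consumption principle can be folded into one rule: compute, for each destination, the set of direction axes that lead toward it, send a single packet if the two sets share an axis, and otherwise send one packet per destination. The sketch below assumes the same 4x4 row-major numbering as the fig. 15 examples; the function names are hypothetical, choosing the lowest-numbered axis when several qualify is an arbitrary tie-break, and the handling of cases the text does not list (such as two destinations in the same region) is an assumption.

```python
def reachable_axes(first_id, dest_id, cols=4):
    """Axes (1..4) leading from first_id toward dest_id.

    A destination on an axis yields one axis; a destination inside a
    region yields the two axes bounding that region.
    """
    row0, col0 = divmod(first_id, cols)
    row, col = divmod(dest_id, cols)
    axes = set()
    if col > col0:
        axes.add(1)
    if col < col0:
        axes.add(2)
    if row < row0:
        axes.add(3)
    if row > row0:
        axes.add(4)
    return axes


def plan_two_destinations(first_id, dest_a, dest_b, cols=4):
    """Return a list of (axis, destinations) pairs, one per packet to send.

    Both destinations are assumed to differ from first_id.
    """
    axes_a = reachable_axes(first_id, dest_a, cols)
    axes_b = reachable_axes(first_id, dest_b, cols)
    shared = axes_a & axes_b
    if shared:
        # Adjacent regions, or a region plus one of its boundary axes:
        # a single packet along the shared axis carries both identifiers.
        return [(min(shared), [dest_a, dest_b])]
    # Opposite regions: one packet per destination along one of its own axes.
    return [(min(axes_a), [dest_a]), (min(axes_b), [dest_b])]
```

For sub-chip 5 as the first sub-chip, destinations 8 and 14 yield one packet along the fourth direction axis, and destinations 14 and 9 likewise; destinations 2 and 12, or 0 and 10, yield two separate packets.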
In a third possible implementation manner, the first sub-chip may receive scheduling information from a central controller in the chip system, and correspondingly transmit data based on the scheduling information.
Specifically, as can be seen from the foregoing description of the chip system, the central controller of the chip system may also be responsible for data scheduling in the chip system. Specifically, the central controller obtains the data transmission conditions of each sub-chip through the control bus, and can obtain the congestion conditions of each transmission path and/or the port congestion conditions of each sub-chip through analyzing the data transmission conditions, so that a data transmission strategy can be formulated based on the conditions, and the data transmission strategy is issued to each sub-chip in the form of scheduling information. Each sub-chip correspondingly sends data based on the scheduling information issued by the controller, so that the probability of congestion is reduced, and the data transmission efficiency is improved.
For example, the scheduling information sent by the central controller to one of the sub-chips may include an identifier of one or more of the sub-chips and an identifier of a port corresponding to the one or more of the sub-chips. And after receiving the scheduling information, the sub-chip updates the scheduling information into a forwarding mapping table of the sub-chip for subsequent data forwarding. For ease of understanding, reference may be made to fig. 16 for example.
Fig. 16 exemplarily shows a structural diagram of a chip system, and assuming that the sub-chip 0 serves as a central controller of the chip system, the sub-chip 0 can communicate with other sub-chips in the chip system through a control bus (not shown in fig. 16). Specifically, the sub-chip 0 may collect information such as the amount of data to be sent of the ports in each sub-chip through the control bus, and may analyze the congestion condition of each port based on the information, so as to analyze the congestion condition of each transmission path. Based on the conditions obtained by the analysis, the sub-chip 0 can comprehensively formulate a data transmission strategy in the chip system and correspondingly issue the data transmission strategy to each sub-chip in a scheduling information mode.
For example, for data transferred from sub-chip 1 to sub-chip 6, sub-chip 0 determines through analysis that port d2 of sub-chip 1 is relatively idle and that port d1 of sub-chip 5 is also relatively idle. Sub-chip 0 then sends scheduling information to sub-chip 1 that includes the identifier of sub-chip 6 and the identifier of port d2. After receiving the scheduling information, sub-chip 1 updates its own forwarding mapping table with the information that the destination sub-chip is sub-chip 6 and the corresponding sending port is port d2. In addition, sub-chip 0 sends scheduling information to sub-chip 5 that includes the identifier of sub-chip 6 and the identifier of port d1. After receiving the scheduling information, sub-chip 5 updates its own forwarding mapping table with the information that the destination sub-chip is sub-chip 6 and the corresponding sending port is port d1. Then, when sub-chip 1 receives a data packet addressed to sub-chip 6, it queries its forwarding mapping table, learns that the sending port is d2, and sends the data packet out from port d2. After the data packet arrives at sub-chip 5, sub-chip 5 queries its forwarding mapping table, learns that the sending port is d1, and sends the data packet out from port d1 to sub-chip 6.
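The scheduling example above can be sketched as a forwarding mapping table update followed by a lookup. The table layout (destination identifier mapped to a single port) and the function names are assumptions for illustration; the description does not define the encoding of scheduling information.

```python
def apply_scheduling_info(forwarding_map, scheduling_info):
    """Overwrite forwarding entries with the ports chosen by the central controller.

    scheduling_info: iterable of (dest_id, port_id) pairs (assumed encoding).
    """
    for dest_id, port_id in scheduling_info:
        forwarding_map[dest_id] = port_id


def sending_port(forwarding_map, dest_id):
    # After the update, forwarding is a plain table lookup.
    return forwarding_map[dest_id]


# Sub-chip 1 and sub-chip 5 each keep their own table; the initial port
# values below are placeholders, not taken from the description.
map_chip1 = {6: "d0"}
map_chip5 = {6: "d0"}
apply_scheduling_info(map_chip1, [(6, "d2")])  # issued by sub-chip 0 to sub-chip 1
apply_scheduling_info(map_chip5, [(6, "d1")])  # issued by sub-chip 0 to sub-chip 5
```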
To sum up, the embodiment of the present application transmits the received data based on the data transmission condition in the chip system, so that the sending of the data can be flexibly scheduled, the data transmission efficiency is improved, and the processing performance of the chip system is further improved.
The foregoing mainly introduces the data transmission processing method in a chip system provided in the embodiments of the present application. It is understood that, in order to realize the corresponding functions, each device comprises corresponding hardware structures and/or software modules for executing those functions. The elements and steps of the various examples described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the device may be divided into the functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that the division of the modules in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
In the case of dividing each functional module according to each function, fig. 17 shows a specific logical structure diagram of the apparatus, which may be the source chiplet. The apparatus 1700 includes:
a receiving unit 1701 for receiving configuration parameters;
a configuration unit 1702, configured to configure a preset stream output table according to the foregoing configuration parameters; the stream output table comprises an identifier of a data stream and an identifier of a destination sub-chip of the data stream; the device 1700 and the destination sub-chip are sub-chips among a plurality of sub-chips included in a chip system, and the plurality of sub-chips are connected in a preset topological structure;
an execution unit 1703, configured to execute the first data processing task to obtain output data;
a generating unit 1704, configured to generate a plurality of data packets from the output data based on the stream output table, where the data packets include an identifier of the data stream and an identifier of the destination sub-chip;
a sending unit 1705, configured to send the plurality of data packets.
In a possible embodiment, the stream output table further includes one or more items of an identifier of the first data processing task, information indicating the number of packets of the data stream that have been sent, and a start address of the device 1700 for storing data included in the data stream; the data packet further includes an identification of the first data processing task.
In one possible embodiment, the receiving unit 1701 is further configured to receive an unblocking data packet before the generating unit 1704 generates the plurality of data packets from the output data based on the stream output table; the unblocking data packet includes an identifier of the data stream, and is used to indicate to the device 1700 that the destination sub-chip is ready to receive the data stream.
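As a rough illustration of this embodiment, the following sketch holds back a data stream until the matching unblocking data packet has arrived. The class `StreamGate`, the dictionary layout of the unblocking data packet, and the field name `stream_id` are assumptions made for illustration only.

```python
class StreamGate:
    """Hypothetical gate that blocks a data stream until it is unblocked."""

    def __init__(self) -> None:
        # Identifiers of data streams whose destination has signalled readiness.
        self._ready = set()

    def on_unblocking_packet(self, packet: dict) -> None:
        # The unblocking data packet carries the identifier of the data stream.
        self._ready.add(packet["stream_id"])

    def may_send(self, stream_id: int) -> bool:
        # Data packets of a stream are generated and sent only once unblocked.
        return stream_id in self._ready


gate = StreamGate()
before = gate.may_send(7)            # no unblocking data packet received yet
gate.on_unblocking_packet({"type": "unblock", "stream_id": 7})
after = gate.may_send(7)             # stream 7 has been unblocked
other = gate.may_send(8)             # stream 8 is still blocked
```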
In a possible implementation manner, the sending unit 1705 is specifically configured to send the plurality of data packets based on a port forwarding mapping table; the port forwarding mapping table includes a mapping relationship between the identifier of the destination sub-chip and a sending port.
In a possible implementation manner, when there are a plurality of destination sub-chips of the data stream, the data packet includes identifications of the plurality of destination sub-chips.
In a possible embodiment, the chip system includes a subsystem, the subsystem includes at least two sub-chips, and the subsystem is configured with a subsystem identifier; when the destination sub-chip is a sub-chip of the subsystem, the data packet further includes an identifier of the subsystem.
For specific operations and advantages of the units in the apparatus 1700 shown in fig. 17, reference may be made to the corresponding description in fig. 8 and possible method embodiments thereof, which are not described herein again.
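The way the generating unit 1704 may use the stream output table can be sketched as follows. This is a minimal sketch under stated assumptions: the field names, the dictionary layout of a data packet, and the fixed `payload_size` are all hypothetical, since the text does not specify a packet size or an encoding.

```python
from dataclasses import dataclass


@dataclass
class StreamOutputEntry:
    """Hypothetical row of the stream output table."""
    stream_id: int           # identifier of the data stream
    dest_ids: tuple          # identifier(s) of the destination sub-chip(s)
    task_id: int             # optional: identifier of the first data processing task
    start_address: int       # optional: where the data of the stream is stored
    packets_sent: int = 0    # optional: number of data packets already sent


def generate_packets(entry: StreamOutputEntry, output_data: bytes,
                     payload_size: int = 4) -> list:
    """Split output_data into data packets that carry the identifiers in entry."""
    packets = []
    for offset in range(0, len(output_data), payload_size):
        packets.append({
            "stream_id": entry.stream_id,
            "dest_ids": entry.dest_ids,
            "task_id": entry.task_id,
            "payload": output_data[offset:offset + payload_size],
        })
        entry.packets_sent += 1  # keep the sent-packet count up to date
    return packets


entry = StreamOutputEntry(stream_id=3, dest_ids=(6,), task_id=1,
                          start_address=0x1000)
packets = generate_packets(entry, b"0123456789")
```

A data stream with several destination sub-chips would simply carry more than one identifier in `dest_ids`; a subsystem identifier could be added as a further field in the same way.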
In the case of dividing each functional module according to each function, fig. 18 shows a specific logical structure diagram of the apparatus, which may be the destination sub-chip. The apparatus 1800 includes:
a receiving unit 1801, configured to receive configuration parameters;
a configuration unit 1802, configured to configure a preset stream input table according to the configuration parameters, where the stream input table includes an identifier of at least one data stream to be received; the apparatus 1800 is a sub-chip among a plurality of sub-chips included in a chip system, and the plurality of sub-chips are connected in a preset topology structure;
a determining unit 1803, configured to determine, when a data packet is received, whether the stream input table includes the data stream identifier carried in the data packet;
a storage unit 1804, configured to store the data in the data packet when the stream input table includes the data stream identifier carried in the data packet.
In a possible embodiment, the stream input table further includes one or more of an identifier of the first data processing task, information indicating the number of packets of the data stream that have been received, and a start address in the apparatus 1800 for storing data included in the data stream; the data packet further includes an identification of the first data processing task.
In one possible embodiment, the apparatus 1800 further includes a sending unit, configured to send an unblocking data packet before the receiving unit 1801 receives the data packet; the unblocking data packet includes an identifier of the data stream and an identifier of a source sub-chip, and is used to indicate to the source sub-chip that the apparatus 1800 is ready to receive the data stream; the source sub-chip is the sub-chip in the chip system that sends the data packet.
In a possible embodiment, the chip system includes a subsystem, the subsystem includes at least two sub-chips, and the subsystem is configured with a subsystem identifier; the data packet further includes an identification of the subsystem.
For specific operations and benefits of each unit in the apparatus 1800 shown in fig. 18, reference may be made to the corresponding description in fig. 8 and its possible method embodiments, which are not described herein again.
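The check performed by the determining unit 1803 and the storage unit 1804 can be sketched as follows. The class `DestinationSubChip`, the field names, and the choice to store payloads contiguously from the start address are assumptions for illustration; what happens to a data packet whose data stream identifier is not in the table is not modeled here.

```python
class DestinationSubChip:
    """Hypothetical model of the receive path of a destination sub-chip."""

    def __init__(self, memory_size: int = 64) -> None:
        # Stream input table: data stream id -> bookkeeping for that stream.
        self.stream_input_table = {}
        self.memory = bytearray(memory_size)

    def configure(self, stream_id: int, start_address: int) -> None:
        # Register a data stream that this sub-chip expects to receive.
        self.stream_input_table[stream_id] = {
            "start_address": start_address,
            "write_offset": 0,
            "packets_received": 0,
        }

    def on_packet(self, packet: dict) -> bool:
        """Store the payload if the stream id is in the table; report success."""
        entry = self.stream_input_table.get(packet["stream_id"])
        if entry is None:
            return False  # identifier not in the stream input table
        payload = packet["payload"]
        address = entry["start_address"] + entry["write_offset"]
        self.memory[address:address + len(payload)] = payload
        entry["write_offset"] += len(payload)
        entry["packets_received"] += 1
        return True


dest = DestinationSubChip()
dest.configure(stream_id=3, start_address=16)
accepted = dest.on_packet({"stream_id": 3, "payload": b"0123"})
rejected = dest.on_packet({"stream_id": 9, "payload": b"zzzz"})
```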
In the case of dividing each function module according to each function, fig. 19 shows a specific logical structure diagram of the apparatus, which may be the above-mentioned central controller. The apparatus 1900 includes:
an allocating unit 1901, configured to allocate a first data processing task to a source chiplet, where data obtained after the first data processing task is executed is sent to a destination chiplet in a data stream form; the device 1900, the source sub-chip, and the destination sub-chip are sub-chips of a plurality of sub-chips included in a chip system, and the plurality of sub-chips are connected in a preset topology structure;
a configuration unit 1902, configured to configure an identifier for the data stream;
a sending unit 1903, configured to send the identifier of the data stream and the identifier of the destination sub-chip to the source sub-chip; wherein the identifier of the data stream and the identifier of the destination sub-chip are to be stored in association with each other in the stream output table of the source sub-chip; the stream output table is the basis on which the source sub-chip sends data.
In a possible implementation manner, the allocating unit 1901 is further configured to allocate a second data processing task to the destination sub-chip, where the second data processing task is executed based on data obtained after the execution of the first data processing task is completed;
the sending unit 1903 is further configured to send the identifier of the data stream to the destination sub-chip; the identifier of the data stream is to be stored in a stream input table of the destination sub-chip, and the stream input table is the basis on which the destination sub-chip receives data.
In a possible embodiment, the apparatus 1900 further comprises:
an acquiring unit, configured to acquire the data transmission condition among the sub-chips in the chip system;
a generating unit, configured to generate scheduling information for the source sub-chip based on the data transmission condition, where the scheduling information indicates the sending port of the source sub-chip through which the data stream is sent to the destination sub-chip;
the sending unit 1903 is further configured to send the scheduling information to the source chiplet.
For specific operations and advantages of each unit in the apparatus 1900 shown in fig. 19, reference may be made to the corresponding descriptions in fig. 8 and fig. 13 and possible method embodiments thereof, which are not described herein again.
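One way the acquiring unit and the generating unit could cooperate is sketched below. The selection policy (pick the candidate port with the least pending data) is an assumption, as the text only states that congestion is analyzed; the function name, the `candidate_ports` input describing which ports lead toward the destination, and the numeric values are likewise hypothetical.

```python
def build_scheduling_info(pending_bytes: dict, candidate_ports: dict,
                          dest_id: int) -> dict:
    """Return, per sub-chip, the sending port chosen for the given destination.

    pending_bytes:   {(chip_id, port): amount of data waiting to be sent}
    candidate_ports: {chip_id: [ports that lead toward dest_id]} (assumed known)
    """
    info = {}
    for chip_id, ports in candidate_ports.items():
        # Assumed policy: the port with the least pending data is the most idle.
        best = min(ports, key=lambda port: pending_bytes.get((chip_id, port), 0))
        info[chip_id] = {"dest_id": dest_id, "port": best}
    return info


# Illustrative figures collected over the control bus (made up for this sketch).
pending = {(1, "d1"): 900, (1, "d2"): 100, (5, "d1"): 50, (5, "d3"): 700}
candidates = {1: ["d1", "d2"], 5: ["d1", "d3"]}

schedule = build_scheduling_info(pending, candidates, dest_id=6)
print(schedule)
```

With these figures the sketch selects port d2 for sub-chip 1 and port d1 for sub-chip 5, matching the fig. 16 example.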
Fig. 20 is a schematic diagram illustrating a specific hardware structure of the apparatus provided in the present application. The apparatus 2000 comprises: a processor 2001, a memory 2002, and a communication port 2003. The processor 2001, the communication port 2003, and the memory 2002 may be interconnected directly or via a bus 2004.
Illustratively, the memory 2002 is used to store computer programs and data of the apparatus 2000. The memory 2002 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM). Illustratively, the memory 2002 may be the static memory described above in fig. 2.
The communication port 2003 includes a sending port and a receiving port, and the number of the communication ports 2003 may be plural, and is used for supporting the apparatus 2000 to perform communication, such as receiving or sending data or messages. Illustratively, the communication port 2003 may be the ports d0, d1, d2, and d3 shown in FIG. 2 above.
The processor 2001 may be, for example, a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A processor may also be a combination of computing functions, e.g., a combination of one or more microprocessors, a digital signal processor and a microprocessor, or the like. Illustratively, the processor 2001 may be the processing module shown in fig. 2 described above.
In one possible embodiment, the apparatus 2000 is the source sub-chip in fig. 8 and its possible embodiments. The processor 2001 in the apparatus 2000 may be configured to read the program stored in the memory 2002, so that the apparatus 2000 performs the operations performed by the source chiplet as described above in fig. 8 and its specific embodiment.
In one possible embodiment, the apparatus 2000 is the destination sub-chip in fig. 8 and its possible embodiments. The processor 2001 in the apparatus 2000 may then be configured to read the program stored in the memory 2002, so that the apparatus 2000 performs the operations performed by the destination sub-chip as described above in fig. 8 and its specific embodiment.
In one possible embodiment, the device 2000 is a central controller in the above fig. 8 and its possible embodiments. The processor 2001 in the device 2000 may then be configured to read the program stored in the memory 2002 described above, such that the device 2000 performs the operations performed by the central controller as described above with respect to fig. 8 and its particular embodiment.
For specific operations and advantageous effects of each unit in the apparatus 2000 shown in fig. 20, reference may be made to the corresponding description in fig. 8 and the specific method embodiment thereof, which are not described herein again.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program is executed by a processor to perform the operations performed by the source chiplet as described in fig. 8 and any of the possible method embodiments thereof.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program is executed by a processor to perform operations performed by the destination sub-chip as described in any embodiment of fig. 8 and its possible method embodiments.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program is executed by a processor to perform the operations performed by the central controller in fig. 8 and any of the possible method embodiments thereof.
Embodiments of the present application further provide a computer program product, and when the computer program product is read and executed by a computer, the operations performed by the source chiplet in any embodiment of fig. 8 and its possible method embodiments are implemented.
Embodiments of the present application further provide a computer program product, and when the computer program product is read and executed by a computer, the operations performed by the destination sub-chip in any embodiment of fig. 8 and its possible method embodiments are implemented.
Embodiments of the present application also provide a computer program product, and when the computer program product is read and executed by a computer, the operations performed by the central controller in any of the embodiments of fig. 8 and its possible method embodiments are implemented.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (21)

1. A method for processing data transmission in a chip system, the method comprising:
the source sub-chip receives configuration parameters and configures a preset stream output table according to the configuration parameters; the stream output table comprises an identifier of a data stream and an identifier of a destination sub-chip of the data stream; the source sub-chip and the destination sub-chip are sub-chips among a plurality of sub-chips included in a chip system, and the plurality of sub-chips are connected in a preset topological structure;
the source sub-chip executes a first data processing task to obtain output data;
and the source sub-chip generates a plurality of data packets from the output data based on the stream output table and sends the data packets, wherein the data packets comprise the identification of the data stream and the identification of the destination sub-chip.
2. The method according to claim 1, wherein the stream output table further comprises one or more of an identification of the first data processing task, information indicating the number of packets of the data stream that have been sent, and a start address in the source chiplet for storing data included in the data stream;
the data packet further comprises an identification of the first data processing task.
3. The method of claim 1, wherein before the source sub-chip generates the plurality of data packets from the output data based on the stream output table and sends the data packets, the method further comprises:
the source sub-chip receives the unblocking data packet; the unblocking data packet includes an identifier of the data stream, and the unblocking data packet is used for indicating to the source sub-chip that the destination sub-chip is ready to receive the data stream.
4. The method of claim 1, wherein the source chiplet sending the plurality of data packets comprises:
the source sub-chip sends the plurality of data packets based on a port forwarding mapping table; the port forwarding mapping table includes a mapping relationship between the identifier of the destination sub-chip and a sending port.
5. The method of claim 1, wherein when the data stream has a plurality of destination sub-chips, the data packet includes identifications of the plurality of destination sub-chips.
6. The method according to any one of claims 1-5, wherein the chip system comprises a subsystem, the subsystem comprising at least two sub-chips, the subsystem configured with a subsystem identification;
when the destination sub-chip is a sub-chip of the subsystem, the data packet further includes an identifier of the subsystem.
7. A method for processing data transmission in a chip system, the method comprising:
the destination sub-chip receives configuration parameters and configures a preset stream input table according to the configuration parameters, wherein the stream input table comprises an identifier of at least one data stream to be received; the destination sub-chip is a sub-chip among a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure;
when receiving a data packet, the destination sub-chip determines whether the stream input table contains the data stream identifier carried in the data packet;
and when the stream input table contains the data stream identifier carried in the data packet, the destination sub-chip stores the data in the data packet.
8. The method according to claim 7, wherein the stream input table further comprises one or more of an identifier of the first data processing task, information indicating the number of packets of the received data stream, and a start address of the destination sub-chip for storing data included in the data stream;
the data packet further comprises an identification of the first data processing task.
9. The method of claim 7, wherein before the destination sub-chip receives the data packet, the method further comprises:
the destination sub-chip sends an unblocking data packet; the unblocking data packet comprises an identifier of the data stream and an identifier of a source sub-chip, and is used for indicating to the source sub-chip that the destination sub-chip is ready to receive the data stream; the source sub-chip is the sub-chip in the chip system that sends the data packet.
10. The method according to any one of claims 7-9, wherein the chip system comprises a subsystem, the subsystem comprising at least two sub-chips, the subsystem configured with a subsystem identification;
the data packet also includes an identification of the subsystem.
11. A method for processing data transmission in a chip system, the method comprising:
the controller distributes a first data processing task for the source sub-chip, and data obtained after the first data processing task is executed is sent to the destination sub-chip in a data stream mode; the controller, the source sub-chip and the destination sub-chip are sub-chips in a plurality of sub-chips included in a chip system, and the plurality of sub-chips are connected in a preset topological structure;
the controller configures an identifier for the data stream;
the controller sends the identifier of the data stream and the identifier of the destination sub-chip to the source sub-chip; wherein the identifier of the data stream and the identifier of the destination sub-chip are to be stored in association with each other in a stream output table of the source sub-chip; and the stream output table is the basis on which the source sub-chip sends data.
12. The method of claim 11, further comprising:
the controller allocates a second data processing task to the destination sub-chip, and the second data processing task is executed based on data obtained after the execution of the first data processing task is completed;
the controller sends the identifier of the data stream to the destination sub-chip; the identifier of the data stream is to be stored in a stream input table of the destination sub-chip, and the stream input table is the basis on which the destination sub-chip receives data.
13. The method according to claim 11 or 12, characterized in that the method further comprises:
the controller acquires data transmission conditions among the sub-chips in the chip system;
the controller generates scheduling information for the source sub-chip based on the data transmission conditions, wherein the scheduling information indicates the sending port in the source sub-chip through which the data stream is sent to the destination sub-chip;
and the controller sends the scheduling information to the source sub-chip.
14. A source sub-chip, characterized in that the source sub-chip comprises:
a receiving unit, configured to receive configuration parameters;
a configuration unit, configured to configure a preset stream output table according to the configuration parameters; the stream output table comprises an identifier of a data stream and an identifier of a destination sub-chip of the data stream; the source sub-chip and the destination sub-chip are sub-chips among a plurality of sub-chips included in a chip system, and the plurality of sub-chips are connected in a preset topological structure;
an execution unit, configured to execute a first data processing task to obtain output data;
a generating unit, configured to generate a plurality of data packets from the output data based on the stream output table, where the data packets include an identifier of the data stream and an identifier of the destination sub-chip;
a transmitting unit, configured to transmit the plurality of data packets.
15. A destination sub-chip, comprising:
a receiving unit, configured to receive configuration parameters;
a configuration unit, configured to configure a preset stream input table according to the configuration parameters, wherein the stream input table comprises an identifier of at least one data stream to be received; the destination sub-chip is a sub-chip among a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure;
a judging unit, configured to judge, when a data packet is received, whether the stream input table contains the data stream identifier carried in the data packet;
and a storage unit, configured to store the data in the data packet when the stream input table contains the data stream identifier carried in the data packet.
16. A controller, characterized in that the controller comprises:
the distribution unit is used for distributing a first data processing task for the source sub-chip, and the data obtained after the execution of the first data processing task is finished is sent to the destination sub-chip in a data stream form; the controller, the source sub-chip and the destination sub-chip are sub-chips in a plurality of sub-chips included in a chip system, and the plurality of sub-chips are connected in a preset topological structure;
a configuration unit, configured to configure an identifier for the data stream;
a sending unit, configured to send the identifier of the data stream and the identifier of the destination sub-chip to the source sub-chip; wherein the identifier of the data stream and the identifier of the destination sub-chip are to be stored in association with each other in a stream output table of the source sub-chip; and the stream output table is the basis on which the source sub-chip sends data.
17. A sub-chip comprising a processor, a memory, and a communication port; wherein the memory and a communication port are coupled to the processor, the communication port being for transceiving data, the memory being for storing a computer program, the processor being for invoking the computer program to cause the sub-chip to perform the method of any of claims 1-6;
the sub-chip is one of a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure.
18. A sub-chip comprising a processor, a memory, and a communication port; wherein the memory and a communication port are coupled to the processor, the communication port being for transceiving data, the memory being for storing a computer program, the processor being for invoking the computer program to cause the sub-chip to perform the method of any of claims 7-10;
the sub-chip is one of a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure.
19. A sub-chip comprising a processor, a memory, and a communication port; wherein the memory and a communication port are coupled to the processor, the communication port being for transceiving data, the memory being for storing a computer program, the processor being for invoking the computer program to cause the sub-chip to perform the method of any of claims 11-13;
the sub-chip is one of a plurality of sub-chips included in the chip system, and the plurality of sub-chips are connected in a preset topological structure.
20. A chip system, characterized by comprising a source sub-chip, a destination sub-chip and a controller; wherein the source sub-chip is the source sub-chip of claim 14, the destination sub-chip is the destination sub-chip of claim 15, and the controller is the controller of claim 16; or,
the source sub-chip is the sub-chip of claim 17, the destination sub-chip is the sub-chip of claim 18, and the controller is the sub-chip of claim 19.
21. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-6; or,
the computer program, when executed by a processor, implements the method of any one of claims 7-10; or,
the computer program, when executed by a processor, implements the method of any of claims 11-13.
CN202111633371.XA 2021-12-28 2021-12-28 Data transmission processing method in chip system and related device Pending CN114328623A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111633371.XA CN114328623A (en) 2021-12-28 2021-12-28 Data transmission processing method in chip system and related device
PCT/CN2022/099777 WO2023123902A1 (en) 2021-12-28 2022-06-20 Data transmission processing method in chip system, and related device

Publications (1)

Publication Number Publication Date
CN114328623A true CN114328623A (en) 2022-04-12

Family

ID=81014166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111633371.XA Pending CN114328623A (en) 2021-12-28 2021-12-28 Data transmission processing method in chip system and related device

Country Status (2)

Country Link
CN (1) CN114328623A (en)
WO (1) WO2023123902A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107508767B (en) * 2012-03-29 2020-10-02 英特尔公司 Techniques for using assigned switch identifications in input/output devices
CN103067295A (en) * 2013-01-04 2013-04-24 华为技术有限公司 Method and device and system for service transmission
CN111274197B (en) * 2018-12-05 2023-05-16 锐迪科(重庆)微电子科技有限公司 Data processing apparatus and method
CN112532714B (en) * 2020-11-25 2022-06-03 北京金山云网络技术有限公司 Data processing method, processing device, server and storage medium
CN112667557A (en) * 2021-03-16 2021-04-16 南京蓝洋智能科技有限公司 Data transmission method suitable for chiplet architecture
CN114328623A (en) * 2021-12-28 2022-04-12 深圳云天励飞技术股份有限公司 Data transmission processing method in chip system and related device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023123905A1 (en) * 2021-12-28 2023-07-06 深圳云天励飞技术股份有限公司 Data transmission processing method in chip system and related apparatus
WO2023123902A1 (en) * 2021-12-28 2023-07-06 深圳云天励飞技术股份有限公司 Data transmission processing method in chip system, and related device
CN117041186A (en) * 2023-10-07 2023-11-10 苏州仰思坪半导体有限公司 Data transmission method, chip system, computing device and storage medium
CN117041186B (en) * 2023-10-07 2024-01-30 苏州仰思坪半导体有限公司 Data transmission method, chip system, computing device and storage medium

Also Published As

Publication number Publication date
WO2023123902A1 (en) 2023-07-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination