CN112579510A - Chip cluster - Google Patents
- Publication number: CN112579510A
- Application number: CN202011497091.6A
- Authority
- CN
- China
- Prior art keywords
- chip
- chips
- port
- interconnection
- ports
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1605—Handling requests for interconnection or transfer for access to memory bus based on arbitration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4282—Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Multi Processors (AREA)
- Small-Scale Networks (AREA)
Abstract
The invention discloses a chip cluster comprising a plurality of chips, wherein the plurality of chips include master chips and slave chips, and at least two master chips are connected through at least one slave chip. Because the chip cluster allows master chips to be interconnected indirectly through slave chips, the length of each data transmission link is effectively shortened. Even when the physical distance between master chips is long, the link insertion loss of data transmission can be kept within a reasonable range and the chip cluster can maintain its highest data transmission rate, without adding extra devices such as repeaters.
Description
Technical Field
The invention relates to the technical field of chips, in particular to a chip cluster.
Background
Nowadays, with the development of information technology, information data is becoming increasingly complex, and the amount of computation required to process it keeps growing. Currently, a chip cluster can be built by connecting a plurality of chips through the interconnection ports inside each chip. Once built, the chip cluster allows data in different chips to be exchanged and transmitted, so that large-scale data can be processed quickly.
In current chip cluster designs, the master chips in the chip cluster are forced to be directly interconnected in order to reduce data transmission delay. In actual deployment, however, the physical distance between master chips may be relatively long, so the data transmission link between two master chips, which runs through board cards, backplanes, cables, and the like, becomes excessively long. The link insertion loss of data transmission can then easily exceed the maximum tolerable insertion loss, which directly reduces the data transmission rate of the chip cluster.
Disclosure of Invention
In view of the above problems, the present invention provides a chip cluster to overcome or at least partially solve the above problems, and the technical solution is as follows:
A chip cluster comprises a plurality of chips, wherein the plurality of chips include master chips and slave chips, and at least two master chips are connected through at least one slave chip.
Optionally, each chip is provided with N interconnection ports, and every M chips form a unit. For at least some of the chips, the N interconnection ports include at least one external connection port used to connect to a chip in another unit; at least some of the remaining interconnection ports of such a chip are internal connection ports, used to connect to other chips in the unit where the chip is located.
Optionally, each interconnection port is connected to only one chip.
Optionally, the N interconnection ports respectively have port numbers, and the number sets formed by the port numbers of the N interconnection ports of each chip are the same.
Optionally, when a first external connection port of a first chip is connected to a second external connection port of a second chip, the second external connection port is the interconnection port of the second chip that is closest to the first external connection port, and the first external connection port is the interconnection port of the first chip that is closest to the second external connection port.
Optionally, each chip is arranged on one board card, the board card is provided with interconnection ports, the number of the interconnection ports arranged on the board card is N, and each interconnection port of one chip is respectively connected with each interconnection port on the board card where the chip is located.
Optionally, every M chips form a unit, and the board cards on which the chips of one unit are located are all connected through a backplane.
Optionally, one or more units are determined as a chipset, and each chipset includes only one master chip.
Optionally, a plurality of the master chips are connected through a plurality of slave chips to form a target connection ring, and a difference between the number of slave chips arranged between any two adjacent master chips in the target connection ring and a preset number is smaller than a preset difference, wherein the two adjacent master chips are connected through the slave chip arranged between the two master chips.
Optionally, the target connection ring is one of a plurality of connection rings obtained according to the chip cluster, the plurality of connection rings all include the same master chip, slave chips included in the plurality of connection rings are not completely the same, and the target connection ring is a connection ring including the smallest number of slave chips in the plurality of connection rings.
By means of the above technical solution, the chip cluster comprises a plurality of chips, wherein the plurality of chips include master chips and slave chips, and at least two master chips are connected through at least one slave chip. Because the chip cluster allows master chips to be interconnected indirectly through slave chips, the length of each data transmission link is effectively shortened. Even when the physical distance between master chips is long, the link insertion loss of data transmission can be kept within a reasonable range and the chip cluster can maintain its highest data transmission rate, without adding extra devices such as repeaters.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 illustrates a schematic connection diagram of a master chip and a slave chip of a chip cluster according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a connection between a master chip and a slave chip of another chip cluster according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a structural connection of any chip in a chip cluster according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating connection between chips in different units in a chip cluster according to an embodiment of the present invention;
fig. 5 is a schematic diagram illustrating numbers of interconnection ports in any chip in a chip cluster according to an embodiment of the present invention;
fig. 6 is a schematic diagram illustrating a target connection ring in a chip cluster according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The chip cluster provided by the embodiment of the invention can comprise a plurality of chips, wherein the plurality of chips comprise a master chip and a slave chip, and at least two master chips are connected through at least one slave chip.
In the chip cluster shown in fig. 1, the shaded circle is a master chip, the blank circle is a slave chip, and the two master chips are connected through the slave chip. It will be appreciated that the chip cluster shown in fig. 1 only shows a connection between two master chips through only one slave chip. In actual chip cluster deployment, there may also be a connection situation between two master chips through more than two slave chips. For example: as shown in fig. 2, the master chip 1 may be connected to the master chip 2 via the slave chip 3, the master chip 1 may be connected to the master chip 2 via the slave chip 1 and the slave chip 3 in this order, and the master chip 1 may be connected to the master chip 2 via the slave chip 1, the slave chip 2, and the slave chip 3 in this order, depending on the communication path.
As shown in fig. 3, any chip in the chip cluster provided by the embodiment of the present invention may include at least a first interconnection port 100, a second interconnection port 200, a system bus 300, a first interconnection bus 400, a first arbitration distribution module 500, and a second arbitration distribution module 600. The first interconnection port 100 is connected to one side of the first interconnection bus 400 via the first arbitration distribution module 500, and the second interconnection port 200 is connected to the other side of the first interconnection bus 400 via the second arbitration distribution module 600. The system bus 300 is different from the first interconnection bus 400.
The first interconnect port 100 is connected to the system bus 300 via a first arbitration distribution module 500. The second interconnect port 200 is connected to the system bus 300 via a second arbitration distribution module 600.
The interconnection port in the embodiment of the present invention may be a high-speed interconnection port formed by a high-speed analog interface and the controller corresponding to that interface. For example: the interconnection port may be a high-speed interconnection port formed by a CCIX (Cache Coherent Interconnect for Accelerators) controller or an Ethernet controller together with its high-speed serializer/deserializer (SerDes). Optionally, one interconnection port in the embodiment of the present invention is connected to only one chip. Optionally, the interconnection port may be directly connected to at least one interconnection bus. By arranging high-speed interconnection ports in the chip, the embodiment of the invention allows data to be exchanged at high speed and high bandwidth, which reduces the time needed to transmit data through the chip. For example: the high-speed interconnection port of the embodiment of the invention can exchange data at a rate of 25Gbps with a bidirectional bandwidth of 50 GB/s. It is understood that, as chip port technology and manufacturing technology continue to develop, the speed and bandwidth of the interconnection port may keep increasing; the figures above are only an illustrative example, and the embodiments of the present invention do not specifically limit the speed and bandwidth of the interconnection port.
It should be noted that fig. 3 illustrates the case in which the first interconnection port 100 is connected to the system bus 300 via the first arbitration distribution module 500 and the second interconnection port 200 is connected to the system bus 300 via the second arbitration distribution module 600. It is understood that, in practice, two further cases are possible: only the first interconnection port 100 is connected to the system bus 300 (via the first arbitration distribution module 500) while the second interconnection port 200 is not, or only the second interconnection port 200 is connected to the system bus 300 (via the second arbitration distribution module 600) while the first interconnection port 100 is not. Normally, whether an interconnection port is connected to the system bus 300 does not affect data transmission between the interconnection ports through the interconnection bus.
The first arbitration distribution module 500 provided in the embodiment of the present invention may obtain connection information (i.e., routing information) of each chip node, and further determine, according to the connection information and a destination address of the data, a transmission line through which the data is sent to the destination address via the current chip node, and further determine, according to the transmission line, whether to send the data to the first interconnection port 100, the system bus 300, or the first interconnection bus 400. For example: when the destination address is the current chip node, the first arbitration distribution module 500 may send the data to the system bus 300 so that the current chip node may obtain the data. When the next chip node of the transmission line is a chip node to which the first interconnection port 100 is directly connected, the first arbitration distribution module 500 transmits data to the first interconnection port 100 so that the next chip node obtains the data. When the next chip node of the transmission line is a chip node directly connected to the second interconnect port 200, the first arbitration distribution module 500 transmits data to the first interconnect bus 400 to transmit the data to the second arbitration distribution module 600 through the first interconnect bus 400. Thus, the second arbitration distribution module 600 can transmit data to the second interconnect port 200 according to the transmission line, and further transmit data to the next chip node.
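The three-way decision described above can be sketched in Python. This is a minimal illustration rather than the patented implementation: the function name `arbitrate_first_module`, the `next_hop` routing table, and the chip names are all hypothetical stand-ins for the connection information the module obtains.

```python
from enum import Enum


class Target(Enum):
    SYSTEM_BUS_300 = "system bus 300"
    PORT_100 = "first interconnection port 100"
    INTERCONNECT_BUS_400 = "first interconnection bus 400"


def arbitrate_first_module(current, destination, next_hop, port1_peer, port2_peer):
    """Decide where the first arbitration distribution module sends data.

    next_hop   -- hypothetical routing table: destination -> next chip node
    port1_peer -- chip node directly connected to the first interconnection port
    port2_peer -- chip node directly connected to the second interconnection port
    """
    if destination == current:
        # The data is addressed to this chip: hand it to the system bus.
        return Target.SYSTEM_BUS_300
    hop = next_hop[destination]
    if hop == port1_peer:
        # The next node hangs off the first port: send it out directly.
        return Target.PORT_100
    if hop == port2_peer:
        # The next node hangs off the second port: cross the interconnection
        # bus so the second arbitration distribution module can forward it.
        return Target.INTERCONNECT_BUS_400
    raise ValueError(f"no route from {current} to {destination}")


# Assumed example: slave chip "S" sits between master chips "M1" and "M2".
routes = {"M1": "M1", "M2": "M2"}
decision = arbitrate_first_module("S", "M2", routes, port1_peer="M1", port2_peer="M2")
```

In this example `decision` is `Target.INTERCONNECT_BUS_400`, matching the case in which data entering on one port must leave through the other.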
Optionally, the master chip may be a chip in the chip cluster that computes and stores the reduction (reduce) result generated by one or more chips, performs the inter-ring AllReduce operation, and broadcasts the final result to the chips within its group. The slave chip may be a chip in the chip cluster that computes a reduction result and stores the final result.
The invention provides a chip cluster comprising a plurality of chips, wherein the plurality of chips include master chips and slave chips, and at least two master chips are connected through at least one slave chip. Because the chip cluster allows master chips to be interconnected indirectly through slave chips, the length of each data transmission link is effectively shortened. Even when the physical distance between master chips is long, the link insertion loss of data transmission can be kept within a reasonable range and the chip cluster can maintain its highest data transmission rate, without adding extra devices such as repeaters.
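The division of labour between master and slave chips can be illustrated with a small functional simulation in Python. It is only a sketch under stated assumptions: the reduction operator is taken to be a sum, the function name `hierarchical_allreduce` and the chip names are hypothetical, and the ring-based message passing itself is not modelled.

```python
def hierarchical_allreduce(groups):
    """Simulate the three phases of a grouped AllReduce with sum as the operator.

    groups -- dict: master chip -> dict of {chip: local value} for its group
              (the master chip itself is one of the chips in the group)
    Returns a dict {chip: final value} covering every chip in the cluster.
    """
    # Phase 1: each group reduces its values; the master stores the result.
    partial = {master: sum(values.values()) for master, values in groups.items()}
    # Phase 2: the master chips combine their partial results (inter-ring AllReduce).
    total = sum(partial.values())
    # Phase 3: each master broadcasts the final result to the chips in its group.
    return {chip: total for values in groups.values() for chip in values}


# Assumed example: two groups, each with one master and one slave chip.
result = hierarchical_allreduce({
    "M1": {"M1": 1, "s1": 2},
    "M2": {"M2": 3, "s2": 4},
})
```

After the call, every chip holds the same final value, which is the defining property of AllReduce.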
Optionally, each chip is provided with N interconnection ports, and every M chips form a unit. For at least some of the chips, the N interconnection ports include at least one external connection port used to connect to a chip in another unit; at least some of the remaining interconnection ports of such a chip are internal connection ports, used to connect to other chips in the unit where the chip is located.
For ease of understanding, the description here is made in conjunction with fig. 4: as shown in fig. 4, a blank circle is a chip, a black dot is an interconnection port, and a solid square frame is a unit. Each chip may be provided with 4 interconnection ports, and every 4 chips constitute a unit. Unit A includes chip a, chip b, chip c, and chip d; unit B includes chip e, chip f, chip g, and chip h. Chips a, b, c, and d are connected to one another through internal connection ports, as are chips e, f, g, and h. Unit A and unit B are connected through an external connection port of chip b and an external connection port of chip e.
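The port budget in this example can be checked with a short Python sketch. It assumes, purely for illustration, that the four chips of a unit are fully meshed through their internal connection ports (the text only says they are connected); the names `build_links` and `ports_used` are hypothetical.

```python
N_PORTS = 4  # interconnection ports per chip, as in the fig. 4 example
UNITS = {"A": ["a", "b", "c", "d"], "B": ["e", "f", "g", "h"]}


def build_links(units, external_links):
    """Return chip-to-chip links: a full mesh inside each unit (an assumption)
    plus the given links between units."""
    links = []
    for chips in units.values():
        for i, first in enumerate(chips):
            for second in chips[i + 1:]:
                links.append((first, second))  # internal connection
    links.extend(external_links)               # external connections
    return links


def ports_used(links):
    """Count occupied ports per chip; each link occupies one port at each end,
    because one interconnection port connects to only one chip."""
    used = {}
    for first, second in links:
        used[first] = used.get(first, 0) + 1
        used[second] = used.get(second, 0) + 1
    return used


links = build_links(UNITS, external_links=[("b", "e")])
used = ports_used(links)
```

Under the full-mesh assumption each chip spends three ports inside its unit, so exactly one port per chip remains available as an external connection port; chips b and e use theirs to join the two units.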
Optionally, the N interconnection ports respectively have port numbers, and the number sets formed by the port numbers of the N interconnection ports of each chip are the same.
For example: as shown in fig. 5, in the case where 4 interconnection ports are provided per chip, the 4 interconnection ports are numbered P0, P1, P2, and P3. The number set of the interconnection ports of every chip is then [P0, P1, P2, P3].
By giving the interconnection ports of every chip port numbers drawn from the same number set, the embodiment of the invention makes the internal and external connection ports easy to manage. For example: the port number of the external connection port in every chip is uniformly set to P2, and the port numbers of the other interconnection ports are set according to their positions relative to the external connection port.
Optionally, when a first external connection port of a first chip is connected to a second external connection port of a second chip, the second external connection port is the interconnection port of the second chip that is closest to the first external connection port, and the first external connection port is the interconnection port of the first chip that is closest to the second external connection port.
According to the embodiment of the invention, the two interconnection ports with the closest distance between the two chips are used as the external connection ports, so that the interconnection between the two units where the two chips are positioned can keep the highest transmission rate of data transmission without adding an additional repeater.
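One way to pick such a pair of external connection ports is to take the pair of ports with the smallest distance between them. The Python sketch below assumes that each port has a known 2D position; the coordinates, the function name `closest_port_pair`, and the use of straight-line distance are illustrative assumptions, since the actual link length depends on board and cable routing.

```python
from itertools import product
from math import dist


def closest_port_pair(first_ports, second_ports):
    """Return (port of first chip, port of second chip) with minimal distance.

    first_ports, second_ports -- dict: port number -> (x, y) position
    The globally closest pair is automatically mutually closest, which is
    the condition stated for the two external connection ports.
    """
    return min(
        product(first_ports, second_ports),
        key=lambda pair: dist(first_ports[pair[0]], second_ports[pair[1]]),
    )


# Assumed layout: two chips side by side, each with P2 facing the other chip.
chip_b = {"P0": (0, 1), "P1": (-1, 0), "P2": (1, 0), "P3": (0, -1)}
chip_e = {"P0": (4, 1), "P1": (5, 0), "P2": (3, 0), "P3": (4, -1)}
pair = closest_port_pair(chip_b, chip_e)
```

With this layout the selected pair is P2 on both chips, consistent with the example in which the external connection port is uniformly numbered P2.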
Optionally, each chip is arranged on one board card, the board card is provided with interconnection ports, the number of the interconnection ports arranged on the board card is N, and each interconnection port of one chip is respectively connected with each interconnection port on the board card where the chip is located.
Optionally, every M chips form a unit, and the board cards on which the chips of one unit are located are all connected through a backplane.
Optionally, the two external connection ports may be connected by a cable.
It can be understood that, in the embodiment of the present invention, board cards corresponding to the units may be preset, and when a chip is arranged on a board card, the chip belongs to the unit corresponding to the board card.
Optionally, in the embodiment of the present invention, one or more units may be determined as a chipset, and each chipset includes only one master chip.
Optionally, in the embodiment of the present invention, each chipset in the chip cluster may be determined by an algorithm having a grouping structure. For example: the algorithm having a grouping structure may be a Hierarchical Ring-AllReduce algorithm or a 2D-Mesh AllReduce algorithm.
Optionally, in the embodiment of the present invention, a plurality of units connected by cables may be set as one chipset.
Optionally, a plurality of the master chips are connected through a plurality of slave chips to form a target connection ring, and a difference between the number of slave chips arranged between any two adjacent master chips in the target connection ring and a preset number is smaller than a preset difference, wherein the two adjacent master chips are connected through the slave chip arranged between the two master chips.
Optionally, the preset number may be the ratio of the number of slave chips to the number of master chips in the target connection ring. It is understood that the numbers of slave chips arranged between any two adjacent master chips in the target connection ring may all be equal.
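The balance condition can be expressed as a short check in Python. This is a sketch under assumptions: the ring is given as an ordered list of chip names that closes on itself, and the names `slave_gaps` and `is_balanced` are hypothetical.

```python
def slave_gaps(ring, masters):
    """Number of slave chips between each pair of adjacent master chips.

    ring    -- chip names in ring order (the last chip connects back to the first)
    masters -- set of chip names that are master chips
    """
    positions = [i for i, chip in enumerate(ring) if chip in masters]
    gaps = []
    for k, pos in enumerate(positions):
        nxt = positions[(k + 1) % len(positions)]
        gaps.append((nxt - pos - 1) % len(ring))  # wraps around the ring
    return gaps


def is_balanced(ring, masters, preset_difference):
    """True if every gap differs from the preset number by less than the
    preset difference; the preset number is taken as slaves / masters."""
    n_masters = sum(1 for chip in ring if chip in masters)
    preset_number = (len(ring) - n_masters) / n_masters
    return all(abs(gap - preset_number) < preset_difference
               for gap in slave_gaps(ring, masters))


masters = {"M1", "M2", "M3"}
even_ring = ["M1", "s1", "M2", "s2", "M3", "s3"]
uneven_ring = ["M1", "s1", "s2", "s3", "M2", "M3"]
```

Both rings have three slave chips and three master chips, so the preset number is 1. The evenly spaced ring passes with a preset difference of 1, while the uneven ring, whose gaps are 3, 0, and 0, does not.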
Optionally, the target connection ring is one of a plurality of connection rings obtained according to the chip cluster, the plurality of connection rings all include the same master chip, slave chips included in the plurality of connection rings are not completely the same, and the target connection ring is a connection ring including the smallest number of slave chips in the plurality of connection rings.
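Selecting the target connection ring among the candidates reduces to a minimum search over the slave-chip count. A minimal Python sketch follows; the function name `select_target_ring` and the candidate rings are hypothetical, and how the candidate rings are obtained from the chip cluster is not modelled here.

```python
def select_target_ring(candidate_rings, masters):
    """Return the candidate ring that contains the fewest slave chips.

    candidate_rings -- list of rings, each a list of chip names in ring order;
                       every ring is assumed to contain the same master chips
    masters         -- set of chip names that are master chips
    """
    def slave_count(ring):
        return sum(1 for chip in ring if chip not in masters)

    return min(candidate_rings, key=slave_count)


masters = {"M1", "M2"}
candidates = [
    ["M1", "s1", "s2", "M2", "s3", "s4"],  # four slave chips
    ["M1", "s1", "M2", "s3"],              # two slave chips
]
target = select_target_ring(candidates, masters)
```

Here the second candidate is chosen because it joins the same two master chips with only two slave chips.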
It can be understood that, in the embodiments of the present invention, the position of the master chip in each chip set can be determined by setting a difference between the number of the slave chips arranged between any two adjacent master chips in the target connection ring and the preset number to be smaller than a preset difference, and/or setting the target connection ring as the connection ring including the smallest number of the slave chips in the plurality of connection rings.
The chip cluster provided by the embodiment of the invention does not force master chips to be directly connected to one another, so the interconnection ports of a master chip are not occupied by master-to-master links when the chip cluster is connected. The interconnection ports of a master chip other than its external connection port can be connected to slave chips in the same chip cluster, which increases the scalability of the interconnection forms available to the chip nodes in the chip cluster. For example: the interconnection ports saved on the master chip may be used for the broadcast step of Hierarchical Ring-AllReduce.
To facilitate an overall understanding of the chip cluster provided in the embodiment of the present invention, one of the possible chip clusters is described with reference to fig. 6: as shown in fig. 6, a shaded circle is a master chip, a blank circle is a slave chip, a black dot is an interconnection port, a solid square frame is a unit, and the chips in the dashed frame are connected to form a target connection ring. Each chipset comprises one unit, and the unit comprises one master chip and three slave chips. Within a unit, the master chip and the slave chips are connected to one another through interconnection ports. Different units are connected through the external connection ports of the chips that are physically closest to each other. In the target connection ring, any two adjacent master chips are connected through one slave chip.
The chip cluster provided by the embodiment of the invention can be deployed in practice as a server. The server includes one or more cabinets. Each cabinet may include multiple chassis, which may be stacked vertically within the cabinet. Multiple chipsets in one chassis may be connected horizontally through the physically closest external connection ports, and chipsets in different, vertically stacked chassis may be connected vertically through the physically closest external connection ports. A server deployed in this way keeps the physical length of the links in the chip cluster as short as possible, so that the links can reach the maximum data transmission rate without adding extra board-level devices.
Any chip in the chip cluster in the embodiments of the present invention may store the topology structure of the chip cluster, and may determine the flow direction of data by using the topology structure in the data transmission process.
Optionally, the topology of the chip cluster may be determined by an enumeration method. Each chip in the chip cluster is taken as a chip node. Optionally, an embodiment of the present invention provides an enumeration method of a topology structure of a chip cluster, where the method may include:
Take one chip node in the chip cluster as the current chip node. Determine the level of the current chip node as the first level, set its sequence number to the first sequence number, and write the level and the sequence number into the node resource register of the current chip node.
Traverse, in order, the interconnection ports of the current chip node that have not yet been traversed.
Determine whether the node resource registers of the other chip nodes connected to the interconnection ports traversed this time already store topology position information, where the topology position information includes at least the level and the sequence number of a chip node.
If no topology position information is stored, write the level following that of the current chip node into each such node resource register, assign sequence numbers to those chip nodes according to the traversal order, and write each sequence number into the node resource register of the corresponding chip node.
Form interconnection port pair connection information from the port information of each interconnection port of the current chip node and the port information of the interconnection port of the other chip node connected to it, and store the interconnection port pair connection information.
When the other chip nodes at the same level as the current chip node have also been traversed, take each chip node that was assigned a sequence number in this traversal as the current chip node in turn, and return to the step of traversing, in order, the interconnection ports of the current chip node that have not yet been traversed.
When the interconnection port pair connection information corresponding to each interconnection port in all chip nodes in the chip cluster is stored, determining the interconnection port pair connection information corresponding to each interconnection port as the topological structure of the chip cluster.
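The enumeration proceeds level by level, which corresponds to a breadth-first traversal. The Python sketch below is an illustration under assumptions, not the patented procedure itself: the physical wiring is represented by a hypothetical `port_links` dictionary standing in for hardware probing, the node resource registers are modelled as a dictionary, and sequence numbers are assumed to come from a single cluster-wide counter.

```python
from collections import deque


def enumerate_topology(port_links, start):
    """Breadth-first enumeration of a chip cluster from one chip node.

    port_links -- dict: (node, port) -> (peer node, peer port), symmetric
    Returns (registers, pairs) where
      registers -- dict: node -> (level, sequence number)
      pairs     -- set of frozensets, each holding the two (node, port)
                   ends of one interconnection port pair
    """
    registers = {start: (1, 1)}  # first level, first sequence number
    next_seq = 2
    pairs = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        level, _ = registers[node]
        # Traverse the ports of the current chip node in port-number order.
        for (owner, port), (peer, peer_port) in sorted(port_links.items()):
            if owner != node:
                continue
            if peer not in registers:  # no topology position stored yet
                registers[peer] = (level + 1, next_seq)
                next_seq += 1
                queue.append(peer)
            pairs.add(frozenset({(node, port), (peer, peer_port)}))
    return registers, pairs


def symmetric(links):
    """Expand one-directional link descriptions into a symmetric mapping."""
    table = {}
    for end_a, end_b in links:
        table[end_a] = end_b
        table[end_b] = end_a
    return table


# Assumed example: three chips wired in a ring.
wiring = symmetric([
    (("a", "P0"), ("b", "P1")),
    (("b", "P0"), ("c", "P1")),
    (("c", "P0"), ("a", "P1")),
])
registers, pairs = enumerate_topology(wiring, start="a")
```

Starting from chip a, chips b and c are both discovered at the second level, and the three stored port pairs together describe the whole ring.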
By the method for enumerating the topological structure of the chip cluster, the topological structure of the whole chip cluster can be enumerated through any chip node in the chip cluster, and data can be exchanged and transmitted among chips in the chip cluster.
Optionally, an embodiment of the present invention further provides a data handling method, where the data handling method is applied to the chip cluster, and the data handling method may include: and determining a communication path from the starting chip node to the destination chip node according to the topological structure of the chip cluster. And generating a data carrying request according to the communication path, wherein the data carrying request comprises a request access address and an ID set of the interconnection ports of each chip node which are sequentially arranged according to the communication path. And sequentially transmitting data carrying requests in each chip node from the initial chip by referring to the ID set so that when the destination chip node receives the data carrying requests, the data operation corresponding to the data carrying requests is executed.
When a converter connected to an interconnect port of any chip node in a chip cluster receives a data transfer request, the converter may modify a request access address in the data transfer request to the interconnect port, and the chip node may transmit the data transfer request to an interconnect port of another chip node through the interconnect port. It is understood that the chip node and the other chip node are both nodes on the communication path, and the other chip node is a node behind the one chip node in the communication path, and the chip node and the other chip node are connected through the two interconnection ports.
In the embodiment of the present invention, software, a CPU, or an MCU may determine the communication path from the starting chip node to the destination chip node according to the topology; the path may include transit chip nodes. For example, as shown in Fig. 1, if one master chip is the starting chip node and the other master chip is the destination chip node, the slave chip is a transit chip node. The topology is stored by the starting chip node.
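As an illustration of how a path could be derived from a stored topology, the sketch below runs a breadth-first search over a port-pair table. The text does not say which search the software, CPU, or MCU uses, so breadth-first search is an assumption here, as are the function name `find_path`, the chip names, and the example table.

```python
from collections import deque

def find_path(topology, start, dest):
    """Return the hops from start to dest as (chip, egress_port) pairs, or None."""
    # Build chip -> [(egress_port, peer_chip)] from the port-pair table.
    neighbours = {}
    for (chip, port), peer in topology.items():
        if peer is not None:
            neighbours.setdefault(chip, []).append((port, peer[0]))

    previous = {start: None}  # chip -> (previous chip, egress port used there)
    queue = deque([start])
    while queue:
        chip = queue.popleft()
        if chip == dest:
            break
        for port, peer_chip in neighbours.get(chip, []):
            if peer_chip not in previous:
                previous[peer_chip] = (chip, port)
                queue.append(peer_chip)
    if dest not in previous:
        return None  # destination is not reachable

    hops = []
    node = dest
    while previous[node] is not None:
        prev_chip, port = previous[node]
        hops.append((prev_chip, port))
        node = prev_chip
    hops.reverse()
    return hops

# Hypothetical topology: master A <-> slave S <-> master B.
TOPOLOGY = {
    ("A", 0): ("S", 0), ("A", 1): None,
    ("S", 0): ("A", 0), ("S", 1): ("B", 0),
    ("B", 0): ("S", 1), ("B", 1): None,
}

path = find_path(TOPOLOGY, "A", "B")
```

Here the slave chip S appears as the only transit chip node between the two master chips, matching the Fig. 1 example in the text.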
After the communication path is determined, a data transfer request may be generated from it by configuring a component such as a transfer engine, which moves data over the system bus or over the interconnection bus.
The data transfer request may include the set of IDs of the interconnection ports of each transit chip node that the request must pass through on the communication path. It is understood that the IDs in the set are arranged in the order in which they occur along the communication path.
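The ordered ID set can be pictured as a source route that each chip node consumes one entry at a time. The sketch below is a simplified model under stated assumptions: the class `TransferRequest`, its field names, the one-egress-port-per-hop encoding, and the wiring table are hypothetical, and the converter's rewriting of the request access address is deliberately not modelled.

```python
from dataclasses import dataclass

@dataclass
class TransferRequest:
    access_address: int   # address the destination operates on (hypothetical field)
    port_ids: list        # egress port IDs, ordered along the communication path
    hop_index: int = 0    # position of the next port ID to consume

# Hypothetical wiring: (chip, port) -> (peer chip, peer port).
LINKS = {
    ("A", 0): ("S", 0), ("S", 0): ("A", 0),
    ("S", 1): ("B", 0), ("B", 0): ("S", 1),
}

def forward(request, start_chip, links):
    """Pass the request hop by hop; return the chips it visits, in order."""
    chips_visited = [start_chip]
    chip = start_chip
    while request.hop_index < len(request.port_ids):
        egress_port = request.port_ids[request.hop_index]
        chip, _ingress_port = links[(chip, egress_port)]
        request.hop_index += 1
        chips_visited.append(chip)
    return chips_visited

request = TransferRequest(access_address=0x1000, port_ids=[0, 1])
route = forward(request, "A", LINKS)
```

Because the order of `port_ids` fixes the route, no chip node along the way needs its own copy of the topology in this model; only the node that builds the request does.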
Applying this data handling method to the chip cluster provided by the embodiment of the invention can effectively reduce the data transmission delay among chips in the cluster, especially between master chips, and improve the overall data transmission efficiency of the chip cluster.
Optionally, the chip cluster provided by the embodiment of the present invention may be applied to a distributed system. Further, the distributed system may be a traditional distributed system or an artificial intelligence distributed training system.
It can be understood that, in artificial intelligence deep learning, as deep neural network models grow in scale, the amount of training data and the amount of computation required both increase. Compared with a traditional distributed system, an artificial intelligence distributed training system requires each chip node in the chip cluster to exchange data with other chip nodes many times in each iteration, which places very high demands on the bandwidth and processing capacity of the chip nodes and on the transmission bandwidth and delay between them. Applying the chip cluster provided by the embodiment of the invention to an artificial intelligence distributed training system shortens the data transmission links and requires no additional devices such as repeaters, so that link insertion loss is kept within a reasonable range, the transmission delay between chip nodes is effectively reduced, and the performance of the system in executing training tasks can be improved.
In this application, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in relation to one another; for parts that are the same or similar, the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described briefly; for relevant details, refer to the corresponding description of the method embodiment.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above description covers only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A chip cluster, characterized by comprising a plurality of chips, wherein the plurality of chips comprise a master chip and a slave chip, and at least two master chips are connected through at least one slave chip in the plurality of chips.
2. The chip cluster according to claim 1, wherein each of the chips is provided with N interconnection ports and every M chips constitute a unit; among the N interconnection ports of at least some of the chips there is at least one external connection port for connecting to chips of other units; and, for a chip having an external connection port, at least some of its N interconnection ports other than the external connection port are internal connection ports, which are used for connecting to other chips in the unit where the internal connection ports are located.
3. The chip cluster of claim 1, wherein one interconnect port is connected to only one chip.
4. The chip cluster according to claim 2, wherein the N interconnection ports have port numbers respectively, and the number sets of the port numbers of the N interconnection ports of each chip are the same.
5. The chip cluster according to claim 1, wherein when the first external port of the first chip is connected to the second external port of the second chip, the second external port is an interconnection port closest to the first external port among the interconnection ports of the second chip, and the first external port is an interconnection port closest to the second external port among the interconnection ports of the first chip.
6. The chip cluster according to claim 1, wherein each of the chips is disposed on a board card, the board card is disposed with interconnection ports, the number of the interconnection ports disposed on the board card is N, and each interconnection port of one of the chips is connected to each interconnection port on the board card where the chip is disposed.
7. The chip cluster according to claim 1, wherein each M chips form a unit, and the board cards on which the chips in a unit are located are connected through a backplane.
8. The chip cluster according to claim 2, wherein one or more units are defined as a chip set, each chip set comprising only one master chip.
9. The chip cluster according to claim 1, wherein a plurality of the master chips are connected by a plurality of slave chips to form a target connection ring, and a difference between a preset number and a number of slave chips arranged between any two adjacent master chips in the target connection ring is smaller than a preset difference, wherein the two adjacent master chips are connected by the slave chips arranged between the two master chips.
10. The chip cluster according to claim 9, wherein the target connection ring is one of a plurality of connection rings obtained from the chip cluster, the plurality of connection rings each include the same master chip, the slave chips included in the plurality of connection rings are not exactly the same, and the target connection ring is the connection ring including the smallest number of slave chips in the plurality of connection rings.
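The conditions in claims 9 and 10 can be illustrated with a short sketch. It is only an illustrative reading of the claim language: rings are modelled as cyclic lists of chip names, and the function names, chip names, and example values of the preset number and preset difference are hypothetical.

```python
def slave_gaps(ring, masters):
    """Count the slave chips between each pair of adjacent masters in a cyclic ring."""
    positions = [i for i, chip in enumerate(ring) if chip in masters]
    gaps = []
    for k, pos in enumerate(positions):
        nxt = positions[(k + 1) % len(positions)]
        gaps.append((nxt - pos - 1) % len(ring))  # wraps around the end of the list
    return gaps

def is_balanced(ring, masters, preset_number, preset_difference):
    """Claim 9 condition: every gap differs from preset_number by less than preset_difference."""
    return all(abs(preset_number - gap) < preset_difference
               for gap in slave_gaps(ring, masters))

def pick_target_ring(rings, masters):
    """Claim 10 condition: choose the ring that contains the fewest slave chips."""
    return min(rings, key=lambda ring: sum(chip not in masters for chip in ring))

# Hypothetical candidate rings that contain the same two master chips.
MASTERS = {"M0", "M1"}
RING_A = ["M0", "s1", "M1", "s2", "s3"]
RING_B = ["M0", "s1", "M1", "s4"]

target = pick_target_ring([RING_A, RING_B], MASTERS)
```

With a preset number of 1 and a preset difference of 1, `RING_B` satisfies the balance check while `RING_A` does not, because one of its gaps holds two slave chips.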
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011497091.6A CN112579510B (en) | 2020-12-17 | 2020-12-17 | Chip cluster |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112579510A (en) | 2021-03-30 |
CN112579510B (en) | 2024-08-27 |
Family
ID=75135942
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011497091.6A Active CN112579510B (en) | 2020-12-17 | 2020-12-17 | Chip cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112579510B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115347894A (en) * | 2022-10-17 | 2022-11-15 | 杭州岸达科技有限公司 | Radio frequency interface circuit and multi-chip cascade method based on radio frequency interface circuit |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101908032A (en) * | 2010-08-30 | 2010-12-08 | 湖南大学 | Processor array with reconfigurable processor sets |
US20130155796A1 (en) * | 2011-12-15 | 2013-06-20 | Samsung Electronics Co., Ltd. | Fabrication and testing method for nonvolatile memory devices |
CN110401466A (en) * | 2019-06-25 | 2019-11-01 | 苏州浪潮智能科技有限公司 | A kind of data transmission method, device and medium based on high speed signal switching chip |
CN111782580A (en) * | 2020-06-30 | 2020-10-16 | 北京百度网讯科技有限公司 | Complex computing device, method, artificial intelligence chip and electronic equipment |
CN111901257A (en) * | 2020-08-10 | 2020-11-06 | 曙光信息产业(北京)有限公司 | Switch, message forwarding method and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112579510B (en) | 2024-08-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111104775B (en) | Network-on-chip topological structure and implementation method thereof | |
US6035360A (en) | Multi-port SRAM access control using time division multiplexed arbitration | |
US20180285302A1 (en) | Method and apparatus to manage the direct interconnect switch wiring and growth in computer networks | |
US10476697B2 (en) | Network-on-chip, data transmission method, and first switching node | |
CN101488922B (en) | Network-on-chip router having adaptive routing capability and implementing method thereof | |
CN105956659A (en) | Data processing device, data processing system and server | |
DE102005048585A1 (en) | Subscriber and communication controller of a communication system and method for implementing a gateway functionality in a subscriber of a communication system | |
CN103546299A (en) | 50 Gb/s ethernet using serializer/deserializer lanes | |
CN111555901A (en) | Chip configuration network system for flexibly supporting hybrid bus protocol | |
CN114564434B (en) | General multi-core brain processor, acceleration card and computer equipment | |
US10678730B2 (en) | Computing system framework and method for configuration thereof | |
CN108270877B (en) | Distributed network node data sharing system | |
CN115586964A (en) | Resource sharing device, resource management device, and resource management method | |
CN113902111A (en) | Multi-chip interconnection system and neural network accelerated processing method | |
CN106407154A (en) | On-chip optical network topology and data transmission method | |
CN112579510B (en) | Chip cluster | |
CN114445260B (en) | Distributed GPU communication method and device based on FPGA | |
CN117493237A (en) | Computing device, server, data processing method, and storage medium | |
US20230334000A1 (en) | Axi bus structure and chip system | |
CN205983537U (en) | Data processing device and system, server | |
CN113438171B (en) | Multi-chip connection method of low-power-consumption storage and calculation integrated system | |
JPH0635874A (en) | Parallel processor | |
CN117499348A (en) | Artificial intelligence computing service equipment based on PCIe exchange | |
CN112463680A (en) | Data transfer method and device | |
CN115809685B (en) | NPU cluster network structure and network interconnection method | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||