CN115687239A - On-chip processing array, processing method, electronic device, and computer-readable medium

Info

Publication number: CN115687239A
Application number: CN202210963361.0A
Authority: CN
Inventors: 徐兵; 蒲朝飞; 谢鑫; 张楠赓
Original assignee: Hangzhou Canaan Creative Information Technology Ltd
Current assignee: Hangzhou Canaan Creative Information Technology Ltd
Priority date: 2022-08-11
Filing date: 2022-08-11
Publication date: 2023-02-03

Abstract

The disclosure provides an on-chip processing array, a processing method, an electronic device, and a computer-readable medium, and belongs to the field of computer technologies. The on-chip processing array comprises a plurality of nodes distributed in an array, wherein connecting lines are arranged among the nodes of the on-chip processing array and are used for data transmission among the nodes; and the distance of the connecting line between any two nodes is less than or equal to a preset threshold. According to the embodiments of the disclosure, the node connection relationship in the on-chip processing array is preserved while direct connections between two distant nodes are reduced, which reduces the chip cost.

Description

On-chip processing array, processing method, electronic device, and computer-readable medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to an on-chip processing array, a processing method, an electronic device, and a computer-readable medium.

Background

Techniques such as artificial intelligence (AI) and blockchain rely on parallel computing and distributed storage. Because a distributed chip generally has a plurality of nodes, each configured with a corresponding storage resource, it has been widely used in technical fields such as AI and blockchain. In the related art, to ensure information transmission between the nodes on the chip, the nodes are usually connected to form a ring structure. However, in a distributed chip, if the distance between two directly connected nodes is long, the connection cost is high, and the chip cost is correspondingly high.

Disclosure of Invention

The disclosure provides an on-chip processing array, a processing method, an electronic device, and a computer-readable medium.

In a first aspect, the present disclosure provides an on-chip processing array comprising:

a plurality of nodes distributed in an array, wherein connecting lines are arranged among the nodes of the on-chip processing array and are used for data transmission among the nodes;

and the distance of the connecting line between any two nodes is less than or equal to a preset threshold.

In a second aspect, the present disclosure provides a processing method, comprising: executing, by an on-chip processing array, a target task in response to a received task processing request;

wherein the on-chip processing array is the on-chip processing array of any one of the embodiments of the present disclosure, and during execution of the target task, data is transmitted between nodes of the on-chip processing array over the connecting lines.

In a third aspect, the present disclosure provides an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor, the one or more computer programs being executable by the at least one processor to enable the at least one processor to perform the processing method described above.

In a fourth aspect, the present disclosure provides a computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the processing method described above.

The embodiments provided by the disclosure preserve the connection relationship of the nodes in the on-chip processing array, so that data is transmitted smoothly between the nodes while direct connections between two distant nodes are reduced, which reduces the connection cost of the nodes and, in turn, the cost of the chip.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:

FIG. 1 is a schematic diagram of an on-chip processing array provided by an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a cascaded chip provided by an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of an on-chip processing array provided by an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of an on-chip processing array based on distributed chips according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a cascaded-chip-based on-chip processing array according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a cascaded-chip-based on-chip processing array according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of a cascaded-chip-based on-chip processing array according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of a cascaded-chip based on-chip processing array according to an embodiment of the present disclosure;

FIG. 9 is a flow chart of a processing method provided by an embodiment of the present disclosure;

FIG. 10 is a block diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed Description

To facilitate a better understanding of the technical aspects of the present disclosure, exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, wherein various details of the embodiments of the present disclosure are included to facilitate an understanding, and they should be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.

As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Artificial intelligence is a technology that researches and develops ways to simulate, extend, and expand human intelligence, and it has been applied in many fields as it has entered a stage of rapid development in recent years. Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. It performs distributed ledger keeping through decentralization, sharing, and encryption, has notable characteristics such as disintermediation, openness, autonomy, tamper resistance, and anonymity, and is widely applied. However, both artificial intelligence and blockchain usually require substantial computing power and distributed storage resources.

In the related art, tasks in technical fields such as artificial intelligence and blockchain may be performed using an on-chip processing array. The on-chip processing array is composed of a plurality of nodes distributed in an array, and each node comprises a computing unit and a storage unit corresponding to the computing unit. During the execution of a task, data needs to be transferred between a plurality of nodes. To ensure that data can be transferred between any two nodes in an on-chip processing array, it is often necessary to connect these nodes in a ring.

The on-chip processing array may be a rectangular, circular, trapezoidal, or other regular or irregular array.

Fig. 1 is a schematic diagram of an on-chip processing array provided in an embodiment of the present disclosure, where the on-chip processing array is composed of 16 nodes distributed in a 4*4 array, and fig. 1 (a) -1 (c) respectively show three connection relationships between nodes in the on-chip processing array.

Referring to fig. 1 (a), in the on-chip processing array, nodes are connected only to nodes adjacent to the nodes, and the connection of the nodes does not form a ring in the row direction or the column direction.

Referring to FIG. 1(b), in the on-chip processing array, in addition to the connections between adjacent nodes, the two nodes at the edges of the same column are connected in the column direction, so that the nodes in the column direction form a ring-shaped connection relationship. In the row direction, the last node in the i-th row is connected to the first node in the (i+1)-th row (i is an integer greater than 1), and the first node in the first row is connected to the last node in the last row, so that the nodes in the row direction also form a ring-shaped connection relationship.

Referring to fig. 1 (c), in the on-chip processing array, in addition to establishing a connection relationship with nodes located adjacent to the nodes, two nodes located on the edge side of the same column are connected in the column direction, so that the nodes in the column direction form a ring-shaped connection relationship, and two nodes located on the edge side of the same row are connected in the row direction, so that the nodes in the row direction also form a ring-shaped connection relationship.

Based on this, in FIG. 1(a), when data is transmitted to a node on the edge side, congestion forms easily because few data transmission paths are available, and when the congestion is serious, the on-chip processing array cannot execute the task smoothly. In FIG. 1(b) and FIG. 1(c), when data is transmitted to an edge-side node, it can still reach the destination node through the ring connection relationship between the nodes, so congestion is less likely to form and the on-chip processing array can execute tasks smoothly.
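The difference between the adjacent-only connection of FIG. 1(a) and the ring connection of FIG. 1(c) can be sketched in Python as follows. This is a minimal illustration and not part of the disclosure: the function names `mesh_links`, `ring_links`, and `degree`, and the (row, column) coordinates, are assumptions introduced here.

```python
def mesh_links(rows, cols):
    """FIG. 1(a)-style links: each node connects only to its adjacent nodes."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                links.add(((r, c), (r, c + 1)))  # neighbour in the same row
            if r + 1 < rows:
                links.add(((r, c), (r + 1, c)))  # neighbour in the same column
    return links


def ring_links(rows, cols):
    """FIG. 1(c)-style links: adjacent links plus one edge-to-edge link per row and column."""
    links = mesh_links(rows, cols)
    for r in range(rows):
        links.add(((r, 0), (r, cols - 1)))  # closes the ring in row r
    for c in range(cols):
        links.add(((0, c), (rows - 1, c)))  # closes the ring in column c
    return links


def degree(links, node):
    """Number of links that touch the given node."""
    return sum(node in link for link in links)


if __name__ == "__main__":
    print(degree(mesh_links(4, 4), (0, 0)), degree(ring_links(4, 4), (0, 0)))
```

For a 4*4 array, a corner node has two links in the FIG. 1(a) style and four in the FIG. 1(c) style, which is consistent with the observation that edge-side nodes have more alternative paths once the rings are closed.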

In the related art, a distributed chip may be used as an on-chip processing array. The distributed chip is internally provided with a plurality of computing nodes distributed in an array, and under a normal condition, each computing node is allocated with corresponding storage resources, and the computing nodes have management authority for the allocated storage resources. One compute node and its storage resources on a distributed chip may be considered a node in an on-chip processing array.

In some possible implementations, if one distributed chip cannot meet the task processing requirements (including computing node requirements and/or storage requirements), multiple distributed chips may be cascaded using technologies such as chiplets to form an extended chip with a larger on-chip processing array, so that task processing is completed based on the extended chip.

Fig. 2 is a schematic diagram of a cascade chip according to an embodiment of the disclosure.

Referring to FIG. 2, the extended chip is obtained by cascading four 4*4 distributed chips. The on-chip processing array corresponding to the extended chip includes 8*8 = 64 nodes, and each node is allocated a corresponding storage resource (not shown in the figure).

In some possible implementations, to ensure that data can be transmitted smoothly between nodes in the on-chip processing array, the nodes in the edge rows or edge columns of one distributed chip may be interconnected, chip to chip, with the corresponding nodes of the distributed chip cascaded with it, so that the nodes in the on-chip processing array (i.e., the extended chip) form a ring-shaped connection relationship in both the horizontal and the vertical direction.

When inter-chip interconnection is performed, the connection mode or interface used depends on the distance between the nodes. For two nodes that are close together, for example, node 59 and node 60, node 51 and node 52, ..., node 3 and node 4, a low-cost interconnection mode can be adopted, for example, an interconnection scheme based on Integrated Fan-Out (InFO) packaging. However, for two nodes that are far apart, for example, node 56 and node 63, ..., node 0 and node 7, and node 56 and node 0, ..., node 63 and node 7, a relatively high-cost interconnection mode is usually required, for example, an interconnection based on a high-speed data interface (Serializer/Deserializer, SerDes). Such a connection is costly, so the chip that provides the on-chip processing array is correspondingly costly.
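To make the distance problem concrete, the sketch below lists how many adjacent-node spacings each link spans in one row of a conventionally connected ring. It assumes uniformly spaced nodes and measures a link by the difference in position along the row; `conventional_ring` and `link_spans` are hypothetical names used only for this illustration.

```python
def conventional_ring(n):
    """Links of a row ring that joins each node to its neighbour and the last node to the first."""
    return [(i, (i + 1) % n) for i in range(n)]


def link_spans(links):
    """Span of each link, counted in adjacent-node spacings."""
    return [abs(a - b) for a, b in links]


if __name__ == "__main__":
    spans = link_spans(conventional_ring(8))
    print(spans, max(spans))
```

With eight nodes per row, as in FIG. 2, seven links span one spacing and the closing link (node 7 back to node 0) spans seven, which is the kind of long connection that calls for the higher-cost interface.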

It should be noted that, similar to the cascade chip, for a single distributed chip, when two nodes at a longer distance are connected, a connection mode with higher cost is also needed, so that the chip cost is higher.

In view of this, the embodiments of the present disclosure provide an on-chip processing array and a processing method, where the on-chip processing array can ensure a node connection relationship, so that data is smoothly transmitted between nodes, and at the same time, reduce a situation of directly connecting two nodes with a longer distance, thereby reducing a connection cost of the nodes and further reducing a cost of a chip.

Fig. 3 is a schematic diagram of an on-chip processing array according to an embodiment of the disclosure. Referring to fig. 3, the on-chip processing array includes a plurality of nodes distributed in an array, and connection lines are provided between the plurality of nodes of the on-chip processing array, and are used for data transmission between the nodes; and the distance of the connecting line between any two nodes is less than or equal to a preset threshold value.

In some possible implementation manners, the preset threshold may be set according to any one or more of task processing requirements, experience, statistical data, and the like, and the setting manner of the preset threshold in the embodiment of the disclosure is not limited.
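The constraint that every connecting line stays within the preset threshold can be checked mechanically. The following is a minimal sketch under stated assumptions: node positions are given as 2-D coordinates, the distance of a connecting line is taken as the straight-line distance between its two nodes (the disclosure does not fix a particular metric), and the name `violations` is hypothetical.

```python
import math


def violations(positions, links, threshold):
    """Return the links whose node-to-node distance exceeds the preset threshold.

    positions: dict mapping a node id to its (x, y) coordinate (assumed layout)
    links:     iterable of (node_a, node_b) pairs
    """
    return [
        (a, b)
        for a, b in links
        if math.dist(positions[a], positions[b]) > threshold
    ]


if __name__ == "__main__":
    row = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0)}
    # A ring closed by a first-to-last link violates a threshold of 2 spacings.
    print(violations(row, [(0, 1), (1, 2), (2, 3), (3, 0)], threshold=2))
```

An empty result means that the given set of links satisfies the threshold; any returned pair is a direct connection that is longer than allowed.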

The on-chip processing array comprises a plurality of rows of nodes, and a plurality of nodes in the same row are connected through connecting lines.

The plurality of nodes in the same row may be defined in advance according to the needs of the processing task. They may be a plurality of nodes arranged in a row in the horizontal direction (e.g., 0, 1, 2, 3, 4, 5, 6, 7 in FIG. 2), a plurality of nodes arranged in a column in the vertical direction (e.g., 0, 8, 16, 24, 32, 40, 48, 56 in FIG. 2), a plurality of nodes arranged in any oblique direction on the on-chip processing array (e.g., 0, 9, 18, 27, 36, 45, 54, 63 in FIG. 2), or a plurality of nodes arranged in a non-linear manner (e.g., 0, 8, 16, 25, 34, 43, 52, 60 in FIG. 2).

It should be noted that the form of the plurality of nodes in the same row is not limited to the above example, and may also be any plurality of nodes in the on-chip processing array, and the selection of any plurality of nodes is to satisfy the requirement that the distance of the connecting line between any two nodes is smaller than or equal to the preset threshold while forming the closed-loop connecting line.

In some possible implementations, the preset threshold is equal to n times the distance between adjacent nodes.

Illustratively, n times may be two times, three times, four times, etc., which embodiments of the present disclosure are not limited to.

It should be noted that the embodiments of the present disclosure are applicable both to an on-chip processing array with uniformly distributed nodes and to one with non-uniformly distributed nodes. For an on-chip processing array with non-uniformly distributed nodes, a reasonable preset threshold can be set according to experience, statistical data, and the like. For example, where the distance between two adjacent nodes gradually increases or decreases in the row direction and/or the column direction, that distance may take a plurality of values; n times the minimum value, n times the maximum value, n times the average value, or n times the distance between some other pair of adjacent nodes may be taken as the preset threshold. The distance between any two directly connected nodes in the on-chip processing array is then less than or equal to the preset threshold, which reduces direct connections between two distant nodes and thereby reduces the node connection cost and the chip cost.
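As an illustration of the threshold choices listed above for non-uniformly distributed nodes, the sketch below computes n times the minimum, maximum, or average adjacent-node spacing. The function name `preset_threshold`, the `basis` parameter, and the sample spacings are assumptions made for this example only.

```python
from statistics import mean


def preset_threshold(spacings, n, basis="max"):
    """Preset threshold as n times the min, max, or mean adjacent-node spacing.

    spacings: distances between consecutive adjacent nodes (hypothetical values)
    basis:    "min", "max", or "mean" (an assumed parameter, not from the disclosure)
    """
    pick = {"min": min, "max": max, "mean": mean}[basis]
    return n * pick(spacings)


if __name__ == "__main__":
    spacings = [1.0, 1.5, 2.0]  # gradually increasing spacing along one row
    for basis in ("min", "max", "mean"):
        print(basis, preset_threshold(spacings, n=2, basis=basis))
```

A smaller basis gives a stricter threshold and therefore shorter permitted links; which basis is reasonable depends on the layout and on the interface options available.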

In some possible implementation manners, the connection lines among the plurality of nodes in the same row jointly form a closed-loop connection line with a ring structure, so that alternative communication paths (both clockwise and counterclockwise directions of the closed-loop connection line can be used for communication) can be added, and congestion is prevented.

It should be noted that there is an association relationship between the interface type of the connection node and the distance between the nodes. In general, the shorter the distance between nodes is, the easier the connection relationship is to be implemented, and the lower the corresponding interface cost is, whereas the longer the distance between nodes is, the less easy the connection relationship is to be implemented, and the higher the corresponding interface cost is.

In some possible implementations, the preset threshold is less than or equal to the distance threshold. The distance threshold is used as a reference value for the node connection adopting the long-distance connection scheme or the short-distance connection scheme.

The interface connecting the nodes may follow a short-distance connection scheme or a long-distance connection scheme, and the distance threshold determines which scheme is adopted: the long-distance connection scheme is used to connect nodes whose distance is greater than or equal to the distance threshold, and the short-distance connection scheme is used to connect nodes whose distance is less than the distance threshold. Because the preset threshold is less than or equal to the distance threshold, the node connections in the present disclosure are limited to the short-distance connection scheme, which keeps the cost low.
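The selection rule described above can be written as a one-line decision. This is only a sketch of the stated rule; the function name `connection_scheme` and the returned labels are hypothetical.

```python
def connection_scheme(distance, distance_threshold):
    """Pick a scheme by node distance, following the rule stated in the text:
    long-distance at or above the distance threshold, short-distance below it."""
    if distance >= distance_threshold:
        return "long-distance"
    return "short-distance"


if __name__ == "__main__":
    # Hypothetical numbers: links up to 2 spacings, distance threshold of 3 spacings.
    for span in (1, 2, 3):
        print(span, connection_scheme(span, distance_threshold=3))
```

Read literally, a link exactly as long as the distance threshold falls under the long-distance scheme in this sketch, so the example keeps the permitted link lengths strictly below that value.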

In some possible implementations, the node and the connection line are connected through a package interface.

Illustratively, long-range connection schemes employ relatively high-cost high-speed data interfaces (SerDes), and short-range connection schemes employ relatively low-cost integrated fan-out packages (InFO). The packaging interface related to the present disclosure is essentially: the distance between two directly connected nodes is short, a high-speed data interface is not required to be additionally adopted for connection, and only a low-cost short-distance connection scheme such as integrated fan-out packaging is adopted to connect the two nodes.

It should be understood that the above short-distance connection scheme and the long-distance connection scheme are only examples, and other types of connection manners may also be used to establish the connection between the nodes, which is not limited by the embodiment of the present disclosure.

In some possible implementations, the preset threshold is equal to twice the distance between adjacent nodes; the connecting lines are arranged such that, among the plurality of nodes in the same row, either two adjacent nodes are connected through a connecting line or two nodes separated by one node are connected through a connecting line.

Because the preset threshold is twice the distance between adjacent nodes, only two connection modes are available for the nodes in the same row: directly connecting two adjacent nodes, which are one spacing apart, or connecting two non-adjacent nodes that are two spacings apart.

Specifically, according to a value range determined by a preset threshold, in a preset direction of the on-chip processing array, nodes arranged at odd-numbered positions in the same row are sequentially connected in series, nodes arranged at even-numbered positions are sequentially connected in series, a first node at the odd-numbered positions is connected with a first node at the even-numbered positions, and a last node at the odd-numbered positions is connected with a last node at the even-numbered positions. The value range is used for representing the value range of the distance between two nodes directly connected in the on-chip processing array. The "row" may indicate a row or a column, where the "row" indicates a row when the preset direction is a row direction, and the "row" indicates a column when the preset direction is a column direction.

In one example, if the distance between adjacent nodes in the on-chip processing array is d, the preset threshold is 2d. It follows that the distance between any two directly connected nodes in the on-chip processing array must be less than or equal to 2d; that is, only two adjacent nodes, or two nodes separated by one node, can be directly connected. Therefore, in the preset direction of the on-chip processing array, the nodes at odd-numbered positions are connected in series in sequence, the nodes at even-numbered positions are connected in series in sequence, the first node at the odd-numbered positions is connected with the first node at the even-numbered positions, and the last node at the odd-numbered positions is connected with the last node at the even-numbered positions, which yields the corresponding connecting line. In this connecting line, the maximum distance between two directly connected nodes is 2d, which satisfies the preset threshold.
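The odd/even connection rule described above can be sketched as follows. This is a minimal illustration assuming uniformly spaced nodes and 1-based positions within one row (or column); the names `odd_even_ring` and `max_span` are introduced here and do not come from the disclosure.

```python
def odd_even_ring(n):
    """Links (1-based positions) for one row of n >= 2 nodes under the odd/even rule."""
    odd = list(range(1, n + 1, 2))    # positions 1, 3, 5, ...
    even = list(range(2, n + 1, 2))   # positions 2, 4, 6, ...
    # Chain the odd positions and the even positions separately.
    links = list(zip(odd, odd[1:])) + list(zip(even, even[1:]))
    # Join the two chains at the start: first odd to first even.
    links.append((odd[0], even[0]))
    # Join them at the end: last odd to last even (for n == 2 this is the same link).
    if n > 2:
        links.append(tuple(sorted((odd[-1], even[-1]))))
    return links


def max_span(links):
    """Longest link, counted in adjacent-node spacings d."""
    return max(abs(a - b) for a, b in links)


if __name__ == "__main__":
    links = odd_even_ring(8)
    print(links)
    print("longest link:", max_span(links), "x d")
```

For any row length of three or more, every node ends up with exactly two links and no link spans more than two spacings, so the resulting closed loop satisfies a preset threshold of 2d.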

In some possible implementations, the preset direction may represent a direction of connection between nodes in the on-chip processing array, which is related to a distribution of the nodes in the on-chip processing array.

In some possible implementations, the preset direction includes a row direction. And a plurality of nodes in the ith row in the on-chip processing array are connected through connecting lines to form a transverse connecting line, wherein i is an integer greater than or equal to 1.

In other words, when the preset direction includes the row direction, the nodes in each row are connected according to the connection manner, and the transverse connection line corresponding to each row node is obtained, so that the nodes of the on-chip processing array are connected in a ring shape in the transverse direction.

In some possible implementations, the i-th row of the on-chip processing array includes N nodes, where N is an integer greater than or equal to 2. In the i-th row of the on-chip processing array, the nodes at odd-numbered positions are connected in series in sequence through connecting lines, the nodes at even-numbered positions are connected in series in sequence through connecting lines, the 1st node is connected with the 2nd node through a connecting line, and the (N-1)-th node is connected with the N-th node through a connecting line.

In other words, for any row of nodes of the on-chip processing array, the corresponding transverse connecting line is a connecting line with a ring structure, formed by connecting the nodes at odd-numbered positions in series in sequence, connecting the nodes at even-numbered positions in series in sequence, connecting the 1st node with the 2nd node (i.e., connecting the first node at the odd-numbered positions with the first node at the even-numbered positions), and connecting the last node with the second-to-last node (i.e., connecting the last node at the odd-numbered positions with the last node at the even-numbered positions).

In some possible implementations, the preset direction includes a column direction. And a plurality of nodes in the jth column in the on-chip processing array are connected through connecting lines to form a longitudinal connecting line, wherein j is an integer greater than or equal to 1.

In other words, when the preset direction includes a column direction, the nodes in each column are connected according to the connection manner, and the longitudinal connection line corresponding to the node in each column is obtained, so that the nodes of the on-chip processing array are connected in a ring shape in the longitudinal direction.

In some possible implementations, the j-th column of the on-chip processing array includes M nodes, where M is an integer greater than or equal to 2. In the j-th column of the on-chip processing array, the nodes at odd-numbered positions are connected in series in sequence through connecting lines, the nodes at even-numbered positions are connected in series in sequence through connecting lines, the 1st node is connected with the 2nd node through a connecting line, and the (M-1)-th node is connected with the M-th node through a connecting line.

In other words, for any column of nodes of the on-chip processing array, the corresponding longitudinal connecting line is a connecting line with a ring structure, formed by connecting the nodes at odd-numbered positions in series in sequence, connecting the nodes at even-numbered positions in series in sequence, connecting the 1st node with the 2nd node (i.e., connecting the first node at the odd-numbered positions with the first node at the even-numbered positions), and connecting the last node with the second-to-last node (i.e., connecting the last node at the odd-numbered positions with the last node at the even-numbered positions).
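Applying the same rule to every row and every column gives the transverse and longitudinal connecting lines of the whole array. The sketch below assumes a rectangular array with at least three nodes per row and per column, uniform spacing, and 0-based (row, column) coordinates; `line_links` and `array_links` are hypothetical names.

```python
def line_links(n):
    """Odd/even ring over 0-based positions 0..n-1 of one row or column (n >= 3)."""
    first = list(range(0, n, 2))    # 1st, 3rd, 5th, ... nodes
    second = list(range(1, n, 2))   # 2nd, 4th, 6th, ... nodes
    links = list(zip(first, first[1:])) + list(zip(second, second[1:]))
    links += [(first[0], second[0]), (first[-1], second[-1])]
    return links


def array_links(rows, cols):
    """Transverse and longitudinal links of a rows x cols array as coordinate pairs."""
    links = []
    for r in range(rows):   # one transverse ring per row
        links += [((r, a), (r, b)) for a, b in line_links(cols)]
    for c in range(cols):   # one longitudinal ring per column
        links += [((a, c), (b, c)) for a, b in line_links(rows)]
    return links


if __name__ == "__main__":
    links = array_links(8, 8)
    longest = max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in links)
    print(len(links), "links, longest spans", longest, "spacings")
```

Under these assumptions, an 8*8 array is ringed in both directions with no link longer than two spacings.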

It should be noted that the on-chip processing array in the above embodiments may be composed of one or more distributed chips, and the embodiments of the present disclosure do not limit this. Wherein, if the on-chip processing array is composed of a distributed chip, the connection relationship between the nodes is the on-chip connection relationship between the nodes of the distributed chip; if the on-chip processing array is composed of a plurality of (at least two) distributed chips, the connection relationship between the nodes includes not only the on-chip connection relationship between the nodes of each distributed chip but also the inter-chip interconnection relationship between the nodes of different distributed chips.

An on-chip processing array according to an embodiment of the disclosure is described below with reference to fig. 4 and 5.

Fig. 4 is a schematic diagram of an on-chip processing array based on a distributed chip according to an embodiment of the present disclosure. Referring to fig. 4, the on-chip processing array consists of a single distributed chip that includes 16 nodes distributed in a 4*4 array. Accordingly, the node connection relationship includes only the on-chip connection relationship between the nodes within the distributed chip.

In the row direction, the number of nodes N is equal to 4. For any row of nodes, the 1st node is connected with the 3rd node, the 2nd node is connected with the 4th node, the 1st node is connected with the 2nd node, and the 3rd node (i.e., the (N-1)-th node) is connected with the 4th node (i.e., the N-th node), forming the transverse connecting line corresponding to that row of nodes.

In the column direction, the number of nodes M is equal to 4. Similar to the row direction, for any column of nodes, the 1st node is connected with the 3rd node, the 2nd node is connected with the 4th node, the 1st node is connected with the 2nd node, and the 3rd node (i.e., the (M-1)-th node) is connected with the 4th node (i.e., the M-th node), forming the longitudinal connecting line corresponding to that column of nodes.

Compared with the node connection relationship shown in fig. 1 (b) and 1 (c), in the on-chip processing array provided by the embodiment of the present disclosure, a horizontal ring connection relationship can be realized without connecting the first node and the last node in the row direction (or the last node in the previous row and the first node in the next row) in the distributed chip, and a vertical ring connection relationship can be realized without connecting the first node and the last node in the column direction, so that the distance between two directly connected nodes is reduced, and thus, an interface or a packaging mode with lower cost can be used to realize the same connection effect, thereby reducing the chip cost.
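The comparison above can be quantified for a single row. The sketch below assumes uniformly spaced nodes, counts link lengths in adjacent-node spacings, and uses closed-form span lists derived from the two connection rules; `conventional_spans` and `odd_even_spans` are hypothetical names.

```python
def conventional_spans(n):
    """Ring that joins neighbours plus the first and last node:
    n - 1 links of span 1 and one closing link of span n - 1."""
    return [1] * (n - 1) + [n - 1]


def odd_even_spans(n):
    """Odd/even ring (n >= 3): the two end links span 1,
    and the remaining n - 2 chain links span 2."""
    return [2] * (n - 2) + [1, 1]


if __name__ == "__main__":
    for n in (4, 8):
        conv, oe = conventional_spans(n), odd_even_spans(n)
        print(n, "conventional max", max(conv), "odd/even max", max(oe),
              "totals", sum(conv), sum(oe))
```

For the 4-node rows of FIG. 4, the longest link drops from three spacings to two. Under these assumptions the summed link length is the same for both arrangements, so the gain comes from avoiding the single long connection rather than from using less wiring in total.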

Fig. 5 is a schematic diagram of an on-chip processing array based on cascaded chips according to an embodiment of the present disclosure. Referring to fig. 5, the on-chip processing array is formed by cascading four 4*4 distributed chips, and comprises 64 nodes distributed in a 8*8 array. Correspondingly, the node connection relationship not only includes the intra-chip connection relationship among the nodes in the distributed chip, but also includes the inter-chip interconnection relationship among the nodes in different distributed chips.

In the row direction, the number of nodes N is equal to 8. Aiming at any row of nodes, connecting odd-number nodes in series in sequence, namely connecting a 1 st node with a 3 rd node, connecting the 3 rd node with a 5 th node, and connecting the 5 th node with a 7 th node; similarly, even-numbered nodes are also connected in series in sequence, that is, the 2 nd node is connected with the 4 th node, the 4 th node is connected with the 6 th node, the 6 th node is connected with the 8 th node, the 1 st node is connected with the 2 nd node, and the 7 th node (i.e., the N-1 th node) is connected with the 8 th node (i.e., the N-th node), so as to form a transverse connecting line corresponding to the node in the row.

In the column direction, the number of nodes M is equal to 8. Similar to the row direction, for any column of nodes, the odd-numbered nodes are connected in series in sequence, that is, the 1st node is connected with the 3rd node, the 3rd node with the 5th node, and the 5th node with the 7th node. Similarly, the even-numbered nodes are connected in series in sequence, that is, the 2nd node is connected with the 4th node, the 4th node with the 6th node, and the 6th node with the 8th node. In addition, the 1st node is connected with the 2nd node, and the 7th node (i.e., the (M-1)th node) is connected with the 8th node (i.e., the Mth node), so as to form the longitudinal connecting line corresponding to the nodes in that column.
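The row and column wiring described above follows a single rule, so it can be checked with a few lines of Python. The following is a minimal sketch and not part of the disclosure: the function names `odd_even_links` and `max_span` are hypothetical, positions are counted from 1 as in the text, and distance is measured in units of the spacing between adjacent nodes.

```python
def odd_even_links(n):
    """Return the links for one row (or column) of n nodes at positions 1..n.

    Odd positions are chained, even positions are chained, and the two
    chains are joined at both ends: 1 with 2, and (n - 1) with n.
    """
    if n < 3:
        raise ValueError("this sketch assumes at least 3 nodes")
    links = set()
    for p in range(1, n - 1):      # p and p + 2 have the same parity
        links.add((p, p + 2))
    links.add((1, 2))              # 1st node with 2nd node
    links.add((n - 1, n))          # (n-1)th node with nth node
    return links


def max_span(links):
    """Longest link, in units of the adjacent-node spacing."""
    return max(abs(a - b) for a, b in links)


links = odd_even_links(8)
print(sorted(links))
print("longest link:", max_span(links))
```

For N = 8 the sketch yields the eight links listed in the text, every node takes part in exactly two links, and no link is longer than two spacings.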

It should be understood that the connection relationship of the nodes in the on-chip processing array shown in fig. 5 is changed compared to the on-chip processing array shown in fig. 2, but the on-chip processing array shown in fig. 5 can still achieve the connection effect of the on-chip processing array shown in fig. 2.

The connection effect of the on-chip processing array is explained below in a row direction and a column direction, respectively.

In the row direction, the last row of nodes is taken as an example. In the last row of fig. 2, node 0 is directly connected to node 1, node 1 to node 2, node 2 to node 3, node 3 to node 4, node 4 to node 5, node 5 to node 6, node 6 to node 7, and node 7 to node 0, thereby forming a horizontal ring-shaped connection relationship among the nodes in the row. For convenience of description, the nodes in fig. 5 are renumbered so as to correspond to fig. 2; that is, in fig. 5, the nodes of the last row are numbered, from left to right, node 0 -> node 7 -> node 1 -> node 6 -> node 2 -> node 5 -> node 3 -> node 4 (only the node numbers change; the nodes themselves and their physical positions do not). On this basis, in the last row of fig. 5, node 0 is directly connected to node 1, node 1 to node 2, node 2 to node 3, node 3 to node 4, node 4 to node 5, node 5 to node 6, node 6 to node 7, and node 7 to node 0. The nodes of the other rows are similar and are not described again here.
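The renumbering can be reproduced by walking once around the closed loop formed by the row links of fig. 5 and handing out logical numbers in visiting order. The sketch below is illustrative only; `ring_order` is a hypothetical helper, and the walk is assumed to start at the leftmost node and to move first to the 3rd node, which is the direction that gives the numbering quoted in the text.

```python
# Row links of fig. 5 as listed in the text (physical positions 1..8).
FIG5_ROW_LINKS = [(1, 3), (3, 5), (5, 7), (2, 4), (4, 6), (6, 8), (1, 2), (7, 8)]


def ring_order(links, start, second):
    """Walk the closed loop from `start` towards `second`; return the visit order."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    order, prev, cur = [start], start, second
    while cur != start:
        order.append(cur)
        # Each node has exactly two neighbours; continue to the one not just left.
        nxt = [x for x in neighbours[cur] if x != prev][0]
        prev, cur = cur, nxt
    return order


order = ring_order(FIG5_ROW_LINKS, start=1, second=3)

# labels[p - 1] is the logical node number of physical position p.
labels = [None] * len(order)
for logical, position in enumerate(order):
    labels[position - 1] = logical

print(order)   # [1, 3, 5, 7, 8, 6, 4, 2]
print(labels)  # [0, 7, 1, 6, 2, 5, 3, 4]
```

Reading `labels` from left to right gives node 0, node 7, node 1, node 6, node 2, node 5, node 3, node 4, matching the numbering stated for the last row of fig. 5.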

In the column direction, the 1st column of nodes is taken as an example. In column 1 of fig. 2, node 0 is directly connected to node 8, node 8 to node 16, node 16 to node 24, node 24 to node 32, node 32 to node 40, node 40 to node 48, node 48 to node 56, and node 56 to node 0, thereby forming a longitudinal ring-shaped connection relationship among the nodes in the column. For convenience of description, the nodes in fig. 5 are renumbered so as to correspond to fig. 2; that is, in fig. 5, the nodes of the 1st column are numbered, from bottom to top, node 0 -> node 56 -> node 8 -> node 48 -> node 16 -> node 40 -> node 24 -> node 32. On this basis, in column 1 of fig. 5, node 0 is directly connected to node 8, node 8 to node 16, node 16 to node 24, node 24 to node 32, node 32 to node 40, node 40 to node 48, node 48 to node 56, and node 56 to node 0. The nodes of the other columns are similar and are not described again here.

It can be seen that although the connection relationships between the nodes in fig. 5 and fig. 2 are different, the connection effects of the nodes are the same.
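The equivalence can also be checked for the whole 8*8 array rather than for one row and one column. The sketch below is an illustration under stated assumptions: fig. 2 is taken to number its nodes row by row starting from the bottom-left corner (consistent with the row 0..7 and column 0, 8, ..., 56 examples above), and `logical_id`, `folded_links`, and `torus_links` are hypothetical helper names.

```python
N = 8
# Logical index of physical positions 1..8 along one line, as given in the text.
LABEL = [0, 7, 1, 6, 2, 5, 3, 4]
# Links along one line of fig. 5 (physical positions 1..8).
LINE_LINKS = [(1, 3), (3, 5), (5, 7), (2, 4), (4, 6), (6, 8), (1, 2), (7, 8)]


def logical_id(row, col):
    """Fig. 2 node number of the node at physical (row, col) in fig. 5.

    row is counted from the bottom and col from the left, both starting at 1.
    """
    return N * LABEL[row - 1] + LABEL[col - 1]


def folded_links():
    """All links of the fig. 5 array, expressed with fig. 2 node numbers."""
    links = set()
    for fixed in range(1, N + 1):
        for a, b in LINE_LINKS:
            links.add(frozenset((logical_id(fixed, a), logical_id(fixed, b))))  # row link
            links.add(frozenset((logical_id(a, fixed), logical_id(b, fixed))))  # column link
    return links


def torus_links():
    """All links of the fig. 2 array: each node to its right and upper neighbour, with wrap-around."""
    links = set()
    for r in range(N):
        for c in range(N):
            node = N * r + c
            links.add(frozenset((node, N * r + (c + 1) % N)))
            links.add(frozenset((node, N * ((r + 1) % N) + c)))
    return links


print(folded_links() == torus_links())  # True
```

Both constructions produce the same 128 logical links, which is the sense in which the two arrays have the same connection effect even though the physical wiring differs.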

It should be noted that setting the preset threshold equal to twice the distance between adjacent nodes, as described above, is only an example. In some possible implementations, the preset threshold may take another value (for example, three times the distance between adjacent nodes); the value of the preset threshold is not limited in the embodiment of the present disclosure.

The following description takes as examples the cases in which the preset threshold is equal to three times and four times the distance between adjacent nodes.

Fig. 6 is a schematic diagram of an on-chip processing array based on cascaded chips according to an embodiment of the present disclosure. Referring to fig. 6, the on-chip processing array is formed by cascading four 4*4 distributed chips and comprises 64 nodes distributed in an 8*8 array.

In the row direction, the number of nodes N is equal to 8. For any row of nodes, the 1st node is connected with the 2nd node, the 2nd node with the 4th node, the 4th node with the 7th node, the 7th node with the 8th node, the 8th node with the 6th node, the 6th node with the 5th node, the 5th node with the 3rd node, and the 3rd node with the 1st node, so that eight groups of closed-loop connecting lines (one per row) are formed in the row direction.

The column direction is similar to the row direction and also includes 8 nodes. For any column of nodes, the 1st node is connected with the 2nd node, the 2nd node with the 4th node, the 4th node with the 7th node, the 7th node with the 8th node, the 8th node with the 6th node, the 6th node with the 5th node, the 5th node with the 3rd node, and the 3rd node with the 1st node, so that eight groups of closed-loop connecting lines (one per column) are formed in the column direction.

Further, as can be seen from fig. 6, the maximum distance between two directly connected nodes (the 4 th node is directly connected to the 7 th node, and the distance between the two nodes is the largest compared to other directly connected nodes) is equal to three times the distance between adjacent nodes (i.e., n = 3) in either the row direction or the column direction.

Fig. 7 is a schematic diagram of an on-chip processing array based on cascaded chips according to an embodiment of the present disclosure; the array likewise includes 64 nodes distributed in an 8*8 array.

Referring to fig. 7, another node connection manner is shown for the same on-chip processing array as in fig. 5 and fig. 6.

For any row of nodes, the 1st node is connected with the 4th node, the 4th node with the 6th node, the 6th node with the 8th node, the 8th node with the 7th node, the 7th node with the 5th node, the 5th node with the 3rd node, the 3rd node with the 2nd node, and the 2nd node with the 1st node, so that eight groups of closed-loop connecting lines (one per row) are formed in the row direction.

The column direction is similar to the row direction: for any column of nodes, the 1st node is connected with the 4th node, the 4th node with the 6th node, the 6th node with the 8th node, the 8th node with the 7th node, the 7th node with the 5th node, the 5th node with the 3rd node, the 3rd node with the 2nd node, and the 2nd node with the 1st node, so that eight groups of closed-loop connecting lines (one per column) are formed in the column direction.

Further, as can be seen from fig. 7, the maximum distance between two directly connected nodes (the 1 st node is directly connected to the 4 th node, and the distance between the two is the largest compared to other directly connected nodes) is equal to three times the distance between adjacent nodes (i.e., n = 3) in either the row direction or the column direction.

Fig. 8 is a schematic diagram of an on-chip processing array based on cascaded chips according to an embodiment of the present disclosure; the array likewise includes 64 nodes distributed in an 8*8 array.

Referring to fig. 8, another node connection manner is shown for the same on-chip processing array as in fig. 5 to fig. 7.

For any row of nodes, the 1st node is connected with the 2nd node, the 2nd node with the 6th node, the 6th node with the 8th node, the 8th node with the 7th node, the 7th node with the 5th node, the 5th node with the 4th node, the 4th node with the 3rd node, and the 3rd node with the 1st node, so that eight groups of closed-loop connecting lines (one per row) are formed in the row direction.

The column direction is similar to the row direction: for any column of nodes, the 1st node is connected with the 2nd node, the 2nd node with the 6th node, the 6th node with the 8th node, the 8th node with the 7th node, the 7th node with the 5th node, the 5th node with the 4th node, the 4th node with the 3rd node, and the 3rd node with the 1st node, so that eight groups of closed-loop connecting lines (one per column) are formed in the column direction.

Further, as can be seen from fig. 8, the maximum distance between two directly connected nodes (the 2 nd node is directly connected to the 6 th node, and the distance between the two nodes is the largest compared to other directly connected nodes) is equal to four times the distance between adjacent nodes (i.e., n = 4) in both the row direction and the column direction.
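The maximum distances stated for fig. 6 to fig. 8 can be recomputed from the node sequences given in the text. The following is a small sketch for checking the arithmetic; the dictionary `RINGS` and the function `hop_spans` are hypothetical names, and each list gives the physical positions in the order in which the closed loop visits them.

```python
# Closed loops as described in the text, listed in visiting order.
RINGS = {
    "fig. 6": [1, 2, 4, 7, 8, 6, 5, 3],
    "fig. 7": [1, 4, 6, 8, 7, 5, 3, 2],
    "fig. 8": [1, 2, 6, 8, 7, 5, 4, 3],
}


def hop_spans(ring):
    """Length of every link of the loop, including the link that closes it."""
    return [abs(ring[i] - ring[(i + 1) % len(ring)]) for i in range(len(ring))]


for name, ring in RINGS.items():
    spans = hop_spans(ring)
    print(name, "link lengths:", spans, "maximum:", max(spans))
```

The maxima come out as 3, 3, and 4 spacings for fig. 6, fig. 7, and fig. 8 respectively, in agreement with n = 3, n = 3, and n = 4 above.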

It should be understood that, when the on-chip processing array shown in fig. 5 to fig. 8 corresponds to a single chip, the connection lines may likewise be established among the nodes within that chip in the connection manner shown in the drawings, which is not limited by the embodiment of the present disclosure.

In each of the above schematic diagrams, the connecting line between the nodes is only used to represent that there is a connection relationship between the two corresponding nodes, and is not used to limit the attributes such as the shape and the position of the actual connecting line. In practical applications, routing modes between nodes can be reasonably arranged according to chip structures and the like, and the routing modes are not limited in the embodiment of the disclosure.

It should be noted that the above schematic diagrams are only examples of the connection lines of the on-chip processing array in the embodiment of the present disclosure; other connection lines that meet the requirement of the preset threshold may also be used to establish the connection relationship of the on-chip processing array, which is not limited in the embodiment of the present disclosure.
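To illustrate what other connection lines meeting the preset threshold might look like, a brute-force search can list every closed loop over one line of nodes whose links all stay within a given threshold. This is only an exploratory sketch: `rings_within` is a hypothetical name, distance is counted in adjacent-node spacings, and exhaustive search is practical only for small node counts such as 8.

```python
from itertools import permutations


def rings_within(n, threshold):
    """List closed loops through positions 1..n whose links span at most `threshold`.

    Each loop is reported once, starting at position 1 and in one direction only.
    """
    found = []
    for rest in permutations(range(2, n + 1)):
        if rest[0] > rest[-1]:      # skip the mirror image of a loop already covered
            continue
        ring = (1,) + rest
        if all(abs(ring[i] - ring[(i + 1) % n]) <= threshold for i in range(n)):
            found.append(ring)
    return found


for t in (1, 2, 3, 4):
    print("threshold", t, "->", len(rings_within(8, t)), "candidate loops")
```

For eight nodes the search finds no loop at a threshold of one spacing and exactly one loop at two spacings, namely the odd/even arrangement of fig. 5; the loops of fig. 6 and fig. 7 appear among the candidates at three spacings and the loop of fig. 8 at four.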

It should be further noted that, for an on-chip processing array, whether the connection relationship as described above needs to be established in the row direction and the column direction at the same time may be determined according to the node distribution condition, task processing requirement, and the like of the on-chip processing array.

In some possible implementation manners, the above connection relationship may be established only for the nodes in the row direction, and the column direction is not required (any node connection manner may be adopted); the above connection relation can be established only for the nodes in the column direction, and the row direction is not required (any node connection mode can be adopted); the above connection relationship may also be established for nodes in the row direction and the column direction at the same time, which is not limited in the embodiment of the present disclosure.

In one example, the number of nodes of the on-chip processing array is large in the row direction and small in the column direction, so that the distance between the first node and the last node is long in the row direction and short in the column direction. If the nodes were connected according to the connection method in the related art, the distance between some directly connected nodes in the row direction could be long. Therefore, the above connection relationship may be established only for the nodes in the row direction, while the nodes in the column direction may be connected into a vertical ring according to any node connection method in the related art.

In one example, the number of nodes of the on-chip processing array is large in the column direction and small in the row direction, so that the distance between the first node and the last node is long in the column direction and short in the row direction. If the nodes were connected according to the connection method in the related art, the distance between some directly connected nodes in the column direction could be long. Therefore, the above connection relationship may be established only for the nodes in the column direction, while the nodes in the row direction may be connected into a horizontal ring according to any node connection method in the related art.

In one example, the number of nodes of the on-chip processing array is large in both the row direction and the column direction, so that the distance between the first node and the last node is long in both directions. If the nodes were connected according to the connection method in the related art, the distance between some directly connected nodes could be long in both the row direction and the column direction. Therefore, the above connection relationship may be established for the nodes in the row direction and in the column direction, respectively.
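The three examples can be condensed into one decision rule. The sketch below is a hypothetical illustration, not a rule stated in the disclosure: it assumes that a conventional ring closes each line with a single link from the first node to the last node, spanning (count - 1) adjacent-node spacings, and it applies the short-link wiring only where that closing link would exceed the preset threshold. The name `directions_to_fold` is invented for this example.

```python
def directions_to_fold(rows, cols, threshold):
    """Return the directions in which the short-link wiring is worthwhile.

    rows, cols: number of rows and columns of the array.
    threshold:  preset threshold, in adjacent-node spacings.
    Assumption: a conventional ring needs one closing link of (count - 1) spacings.
    """
    fold = []
    if cols - 1 > threshold:    # each row contains `cols` nodes
        fold.append("row")
    if rows - 1 > threshold:    # each column contains `rows` nodes
        fold.append("column")
    return fold


print(directions_to_fold(rows=2, cols=8, threshold=2))  # ['row']
print(directions_to_fold(rows=8, cols=2, threshold=2))  # ['column']
print(directions_to_fold(rows=8, cols=8, threshold=2))  # ['row', 'column']
```

In practice the choice may also depend on the task processing requirement and the packaging technology, which this simple rule does not model.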

It should be noted that the on-chip processing array of the disclosed embodiments can process various tasks by using the computing resources and storage resources of the nodes based on the connection lines.

Fig. 9 is a flowchart of a processing method according to an embodiment of the disclosure. Referring to fig. 9, the method includes:

Step S901, executing the target task in response to a received task processing request.

The on-chip processing array is any one of the on-chip processing arrays provided by the embodiments of the present disclosure, and during execution of the target task, data transmission is performed between the nodes of the on-chip processing array based on the connection lines.

In some possible implementations, the task processing request is used to instruct the on-chip processing array to perform a target task, and the target task includes any one of an image processing task, a voice processing task, a text processing task, a video processing task, and a blockchain computing task.

It should be noted that the above target tasks are only examples, and the embodiments of the present disclosure do not limit the type and content of the target tasks.

In some possible implementations, during the execution of the target task, the data transmitted based on the connection line includes, but is not limited to, processing results generated by the node processing the target task, intermediate data generated during the node processing the target task, data sent by other nodes, and data obtained from other storage spaces.
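As a purely illustrative sketch of data transmission over the connection lines, the snippet below passes a running total once around the closed loop of one row wired as in fig. 5, with each node adding its own intermediate value. The disclosure does not prescribe any particular communication pattern; the function `circulate` and the sample values are assumptions made for this example.

```python
def circulate(ring, local_values):
    """Pass a running total once around the closed loop.

    ring:         physical positions in loop order.
    local_values: mapping from position to that node's intermediate value.
    Returns the total and the list of (sender, receiver) hops that were used.
    """
    total = local_values[ring[0]]
    hops = []
    for i in range(1, len(ring)):
        sender, receiver = ring[i - 1], ring[i]
        hops.append((sender, receiver))
        total += local_values[receiver]      # receiver adds its own value
    hops.append((ring[-1], ring[0]))         # closing hop returns the result to the start
    return total, hops


ring = [1, 3, 5, 7, 8, 6, 4, 2]              # loop order of one row in fig. 5
values = {p: 10 * p for p in ring}           # made-up intermediate data
total, hops = circulate(ring, values)
print(total)                                 # 360
print(max(abs(a - b) for a, b in hops))      # 2
```

Every hop in the lap uses a link of at most two spacings, so the whole exchange stays within the preset threshold of the fig. 5 example.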

In some possible implementations, the preset threshold is equal to twice the distance between adjacent nodes; the connection line is arranged to:

according to a value range determined by the preset threshold, in a preset direction of the on-chip processing array, the nodes arranged at odd-numbered positions are sequentially connected in series, the nodes arranged at even-numbered positions are sequentially connected in series, the first node at an odd-numbered position is connected with the first node at an even-numbered position, and the last node at an odd-numbered position is connected with the last node at an even-numbered position. The value range represents the permissible range of the distance between two directly connected nodes in the on-chip processing array.

In some possible implementations, the ith row of the on-chip processing array includes N nodes; in the ith row of the on-chip processing array, nodes arranged at odd-numbered positions are sequentially connected in series through connecting lines, nodes arranged at even-numbered positions are sequentially connected in series through connecting lines, the 1st node is connected with the 2nd node through connecting lines, and the (N-1)th node is connected with the Nth node through connecting lines.

In some possible implementations, the jth column of the on-chip processing array includes M nodes; in the jth column of the on-chip processing array, nodes arranged at odd-numbered positions are sequentially connected in series through connecting lines, nodes arranged at even-numbered positions are sequentially connected in series through connecting lines, the 1st node is connected with the 2nd node through connecting lines, and the (M-1)th node is connected with the Mth node through connecting lines.

In some possible implementations, the on-chip processing array is composed of one or more distributed chips, and the distributed chips are provided with a plurality of nodes distributed in an array.

Referring to fig. 10, an embodiment of the present disclosure provides an electronic device including: at least one processor 1001; and memory 1002 communicatively coupled to the at least one processor 1001; the memory 1002 stores one or more computer programs executable by the at least one processor 1001, and the one or more computer programs are executed by the at least one processor 1001 to enable the at least one processor 1001 to execute the processing method.

Embodiments of the present disclosure also provide a computer-readable medium, on which a computer program is stored, wherein the computer program, when executed by a processor/processing core, implements the processing method described above. The computer-readable medium may be a volatile or non-volatile computer-readable storage medium.

The disclosed embodiments also provide a computer program product, which includes computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor executes the processing method described above.

It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).

The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM), Static Random Access Memory (SRAM), flash memory or other memory technology, portable Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer. In addition, communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, the electronic circuitry that can execute the computer-readable program instructions implements aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA).

The computer program product described herein may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK), or the like.

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. It will, therefore, be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims

1. An on-chip processing array, characterized by comprising a plurality of nodes distributed in an array, wherein connecting lines are arranged among the nodes of the on-chip processing array and are used for data transmission among the nodes; and the distance of the connecting line between any two nodes is less than or equal to a preset threshold value.

2. The on-chip processing array of claim 1, wherein the on-chip processing array comprises a plurality of rows of nodes, a plurality of nodes in a same row being connected by the connecting lines.

3. The on-chip processing array of claim 2, wherein the plurality of nodes in the same row include any one or a combination of nodes arranged horizontally, vertically, along a straight line, along an oblique line, and along a non-straight line.

4. The on-chip processing array of claim 1, wherein the preset threshold is less than or equal to a distance threshold.

5. The on-chip processing array of claim 1, wherein the nodes and the connecting lines are connected by a package interface.

6. The on-chip processing array of claim 2, wherein the preset threshold is equal to n times a distance between adjacent nodes of the same row.

7. The on-chip processing array of claim 6, wherein the preset threshold is equal to twice a distance between adjacent nodes of the same row.

8. The on-chip processing array of claim 7, wherein, among the plurality of nodes of the same row, two adjacent nodes are connected by the connecting line, or two nodes separated by one node are connected by the connecting line.

9. The on-chip processing array of claim 8, wherein nodes arranged at odd-numbered positions in the same row are sequentially connected in series by the connecting lines, nodes arranged at even-numbered positions are sequentially connected in series by the connecting lines, the first node at an odd-numbered position is connected with the first node at an even-numbered position by the connecting lines, and the last node at an odd-numbered position is connected with the last node at an even-numbered position by the connecting lines.

10. The on-chip processing array of claim 9, wherein a plurality of nodes in a same row are connected by the connecting line;

and a plurality of nodes of the ith row in the on-chip processing array are connected through the connecting lines to form transverse connecting lines, wherein i is an integer greater than or equal to 1.

11. The on-chip processing array of claim 10, wherein an ith row of the on-chip processing array comprises N nodes;

in the ith row of the on-chip processing array, nodes arranged at odd-numbered positions are sequentially connected in series through the connecting lines, nodes arranged at even-numbered positions are sequentially connected in series through the connecting lines, the 1st node is connected with the 2nd node through the connecting lines, and the (N-1)th node is connected with the Nth node through the connecting lines.

12. The on-chip processing array of claim 9, wherein a plurality of nodes in the same column are connected by the connecting line;

and a plurality of nodes in the jth column in the on-chip processing array are connected through the connecting lines to form a longitudinal connecting line, wherein j is an integer greater than or equal to 1.

13. The on-chip processing array of claim 12, wherein a jth column of the on-chip processing array comprises M nodes;

in the jth column of the on-chip processing array, nodes arranged at odd-numbered positions are sequentially connected in series through the connecting lines, nodes arranged at even-numbered positions are sequentially connected in series through the connecting lines, the 1st node is connected with the 2nd node through the connecting lines, and the (M-1)th node is connected with the Mth node through the connecting lines.

14. The on-chip processing array according to any of claims 1-13, wherein the on-chip processing array is comprised of one or more distributed chips provided with a plurality of nodes distributed in an array.

15. A method of processing, applied to an on-chip processing array, the method comprising:

executing the target task in response to the received task processing request;

the on-chip processing array adopts the on-chip processing array as claimed in any one of claims 1 to 14, and during the execution of the target task, data transmission is performed between the nodes of the on-chip processing array based on the connection line.

16. The processing method of claim 15, wherein the target task comprises any one of an image processing task, a voice processing task, a text processing task, and a video processing task.

17. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores one or more computer programs executable by the at least one processor to enable the at least one processor to perform the processing method of claim 15 or 16.

18. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the processing method of claim 15 or 16.