CN115562848A - Task processing method and device, distributed chip, electronic device and medium - Google Patents

Task processing method and device, distributed chip, electronic device and medium

Info

Publication number
CN115562848A
Authority
CN
China
Prior art keywords
node
computing node
computing
target
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210961374.4A
Other languages
Chinese (zh)
Inventor
谢鑫
徐兵
张楠赓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Canaan Creative Information Technology Ltd
Original Assignee
Hangzhou Canaan Creative Information Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Canaan Creative Information Technology Ltd filed Critical Hangzhou Canaan Creative Information Technology Ltd
Priority to CN202210961374.4A priority Critical patent/CN115562848A/en
Publication of CN115562848A publication Critical patent/CN115562848A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system

Abstract

The present disclosure provides a task processing method and apparatus, a distributed chip, an electronic device and a medium, and belongs to the field of computer technology. The method includes the following steps: determining an unoccupied first computing node and an occupied second computing node according to node occupation information of the distributed chip; selecting a target computing node from the second computing nodes, and copying the content in the storage resource corresponding to the target computing node into the storage resource of the first computing node; and executing a task to be processed based on the first computing node and the second computing node. According to the embodiments of the present disclosure, the resource utilization rate of the extended chip can be improved and the waste of chip resources reduced.

Description

Task processing method and device, distributed chip, electronic device and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a task processing method and apparatus, a distributed chip, an electronic device, and a computer-readable medium.
Background
Technologies such as Artificial Intelligence (AI) and blockchain rely on parallel computing and distributed storage. Since a distributed chip generally has a plurality of computing nodes, each of which is configured with corresponding storage resources, distributed chips have been widely used in technical fields such as AI and blockchain.
In the related art, in a single distributed chip, or after a current distributed chip is cascaded with other distributed chips to obtain an extended chip, only some rows or columns of computing nodes in the chip may be used. As a result, there are unoccupied computing nodes whose storage resources are likewise unoccupied, which wastes both computing resources and storage resources.
Disclosure of Invention
The disclosure provides a task processing method and device, a distributed chip, an electronic device and a computer readable medium.
In a first aspect, the present disclosure provides a task processing method, where a distributed chip is provided with a plurality of computing nodes distributed in an array, and each computing node is allocated with a corresponding storage resource, the task processing method including: determining an unoccupied first computing node and an occupied second computing node according to the node occupation information of the distributed chip; selecting a target computing node from the second computing node, and copying the content in the storage resource corresponding to the target computing node into the storage resource of the first computing node; executing a task to be processed based on the first compute node and the second compute node.
In a second aspect, the present disclosure provides a task processing apparatus, where a distributed chip is provided with a plurality of computing nodes distributed in an array, and each computing node is allocated with a corresponding storage resource, the task processing apparatus including: the determining module is used for determining an unoccupied first computing node and an occupied second computing node according to the node occupation information of the distributed chip; a selecting module, configured to select a target computing node from the second computing nodes; the copying module is used for copying the content in the storage resource corresponding to the target computing node to the storage resource of the first computing node; an execution module to execute a task to be processed based on the first compute node and the second compute node.
In a third aspect, the present disclosure provides a distributed chip, where the distributed chip is provided with a plurality of computing nodes distributed in an array, and each computing node is allocated with a corresponding storage resource, where the computing nodes include an unoccupied first computing node and an occupied second computing node, the second computing node includes a target computing node, and the target computing node is a second computing node whose storage resource is connected with the storage resource of the first computing node.
In a fourth aspect, the present disclosure provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores one or more computer programs executable by the at least one processor to enable the at least one processor to perform the task processing method described above.
In a fifth aspect, the present disclosure provides a computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the task processing method described above.
According to the embodiments provided by the present disclosure, when an unoccupied first computing node exists in the distributed chip, the content in at least part of the storage resources of an occupied second computing node can be copied into the storage resources of the first computing node as redundant storage of the distributed chip, and the task to be processed is executed based on the first computing node and the second computing node together, thereby improving the resource utilization rate of the distributed chip and reducing the waste of chip resources.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
fig. 1 is a schematic diagram of a distributed chip according to an embodiment of the disclosure;
fig. 2 is a flowchart of a task processing method based on a distributed chip according to an embodiment of the present disclosure;
fig. 3 (a) is a schematic diagram of a distributed chip provided in an embodiment of the present disclosure;
fig. 3 (b) is a schematic diagram of a distributed chip formed by chip cascading according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram of a distributed chip provided in an embodiment of the present disclosure;
fig. 5 is a schematic processing diagram of a distributed chip according to an embodiment of the disclosure;
fig. 6 is a schematic processing diagram of a distributed chip according to an embodiment of the disclosure;
fig. 7 is a schematic diagram of a distributed chip provided in an embodiment of the present disclosure;
fig. 8 is a schematic diagram of a region division according to an embodiment of the present disclosure;
fig. 9 is a schematic processing diagram of a distributed chip according to an embodiment of the disclosure;
fig. 10 is a schematic processing diagram of a distributed chip according to an embodiment of the disclosure;
fig. 11 is a schematic processing diagram of a distributed chip according to an embodiment of the disclosure;
fig. 12 is a block diagram of a task processing apparatus based on a distributed chip according to an embodiment of the present disclosure;
fig. 13 is a block diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
To facilitate a better understanding of the technical aspects of the present disclosure, exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, wherein various details of the embodiments of the present disclosure are included to facilitate an understanding, and they should be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Artificial intelligence is the study and development of techniques that simulate, extend and expand human intelligence; having entered a stage of rapid development in recent years, it has been applied in many fields. Blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. It performs distributed accounting through decentralization, sharing and encryption, has notable characteristics such as disintermediation, openness, autonomy, tamper resistance of information and anonymity, and is widely applied. However, whether for artificial intelligence or blockchain, implementation usually requires substantial computing power and distributed storage resources for support.
The distributed chip is provided with a plurality of computing nodes distributed in an array. In general, each computing node is allocated corresponding storage resources and has management authority over them, so the distributed chip is suitable for processing various distributed tasks and for technical fields such as artificial intelligence and blockchain.
The computing nodes of the distributed chip can be arranged in rectangular, circular, trapezoidal, or other regular or irregular forms.
Fig. 1 is a schematic diagram of a distributed chip according to an embodiment of the present disclosure. Referring to fig. 1, the distributed chip includes 16 computing nodes distributed in a 4 × 4 array; each computing node is configured with 4 storage units that provide its storage resources, and the arrowed lines between computing nodes represent the routes over which data is transferred.
In some possible implementations, the compute nodes in an edge row or column of the distributed chip are directly connected to the compute nodes in another edge row or column corresponding thereto (e.g., by packaging techniques to connect the corresponding nodes together).
Taking node 0 as an example: in addition to being directly connected to node 1 and node 4, node 0 is also directly connected to node 3 and node 12; that is, the hop counts between node 0 and nodes 1, 4, 3 and 12 are all 1 hop. Based on this, when node 0 transfers data to node 12, the data may be transmitted directly to node 12 in 1 hop, or transmitted to node 12 in 3 hops via nodes 4 and 8 in turn. Similarly, node 1 is directly connected to node 13 in addition to nodes 0, 5 and 2. The other nodes are similar to node 0 and node 1, and the description is not repeated here.
It should be noted that, by the above node connection manner, the computing nodes of the distributed chip may be connected in a ring structure, so that the average hop count between the computing nodes can be reduced, the time delay can be reduced, and the processing efficiency of the distributed chip can be improved.
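The hop-count saving from the ring connection can be sketched as follows. This is an illustrative calculation, not code from the disclosure; the node-to-position maps are assumptions read off Fig. 1.

```python
def ring_hops(a: int, b: int, n: int) -> int:
    """Minimum hop count between positions a and b on a ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)

# In the 4x4 chip of Fig. 1, row 0 holds nodes 0..3 and column 0 holds
# nodes 0, 4, 8, 12. With the edge nodes wrapped into rings, node 0
# reaches node 3 (same row) and node 12 (same column) in a single hop.
row_pos = {0: 0, 1: 1, 2: 2, 3: 3}      # node -> position in its row ring
col_pos = {0: 0, 4: 1, 8: 2, 12: 3}     # node -> position in its column ring

print(ring_hops(row_pos[0], row_pos[3], 4))   # 1 hop instead of 3 on a line
print(ring_hops(col_pos[0], col_pos[12], 4))  # 1 hop instead of 3 on a line
```

Without the wrap-around link, the same transfers would take 3 hops each, which is why the ring structure lowers the average hop count.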
In some possible implementations, if the current computing node completes the i-th computation and learns that the data required for the (i+1)-th computation is stored in the storage unit of another computing node, the relevant data from the i-th computation may be transferred to that computing node through the routing connections between the computing nodes, and that computing node executes the (i+1)-th computation, where i is an integer greater than or equal to 1.
In one example, the calculation order of the computing nodes may be determined in advance according to how the data is stored in the storage units. Suppose the order is determined to be node 3 -> node 6 -> node 0. Node 3 performs the calculation of step 1 using the data in its storage unit and, after completing it, transmits the relevant data from step 1 (including any one or more of node 3's calculation result, intermediate quantities, and the like) to node 6 through the route. Node 6 then performs the calculation of step 2 using the data in its own storage unit together with the relevant data transmitted by node 3, and after completing it, transmits its corresponding relevant data to node 0 through the route. Finally, node 0 completes the calculation of step 3 using the data in its storage unit together with the relevant data transmitted by node 6, thereby obtaining the final calculation result.
In some possible implementations, if one distributed chip cannot meet the task processing requirements (including computing node requirements and/or storage requirements), a plurality of distributed chips may be cascaded to form an extended chip, and the task may then be completed based on the extended chip. The extended chip can be regarded as a distributed chip with a larger node scale; the task processing method provided by the embodiments of the present disclosure is applicable both to a single distributed chip and to an extended chip formed by chip cascading. The storage resources are electronic devices capable of storing data, such as flip-flops, registers, latches, memories and memory cards.
However, in some possible implementations, in a single distributed chip, or after the current distributed chip is cascaded with other distributed chips to obtain an extended chip, only some rows or columns of computing nodes may be used. As a result, there are unoccupied computing nodes whose storage resources are likewise unoccupied, wasting both computing resources and storage resources.
In view of this, an embodiment of the present disclosure provides a distributed chip, where the distributed chip is provided with a plurality of computing nodes distributed in an array, and each computing node is allocated with a corresponding storage resource, and the computing nodes include an unoccupied first computing node and an occupied second computing node, and the second computing node includes a target computing node, where the target computing node is a second computing node whose storage resource is connected to the storage resource of the first computing node.
Wherein the number of target computing nodes is less than or equal to the number of first computing nodes.
The storage resources of the target computing nodes can be correspondingly connected with the storage resources of an equal number of first computing nodes, so that data replication is realized, data transfer between nodes is completed during task processing, and the hop count between computing nodes is reduced. The connection can be realized in two ways. In the first, the target computing nodes are determined in advance during packaging; the storage resources of the determined target computing nodes are connected correspondingly with the storage resources of the first computing nodes at that time, and data transfer and replication are completed directly during task processing. In the second, the storage resources of the second computing nodes are directly connected with the storage resources of the first computing nodes in advance, and specific target computing nodes are selected at task processing time to transfer data to specific first computing nodes and copy the content of the storage resources. That is, the target computing nodes may be determined at packaging time or confirmed according to the processing task requirements after packaging. The data can be copied through a data transfer unit, which can be triggered under the control of a main controller to move and copy the data.
Referring to the distributed chip shown in fig. 1, among the nodes 0 to 15, there may be a portion of nodes that are unoccupied and belong to a first computing node, and a portion of nodes that are already occupied and belong to a second computing node. For the first compute node, since its resources are not utilized, the resources are in an idle state, which may result in a low resource utilization.
In view of this, the embodiments of the present disclosure further provide a task processing method and apparatus based on a distributed chip. The content in at least part of the storage resources of the occupied second computing nodes can be copied into the storage resources of the unoccupied first computing nodes as redundant storage of the distributed chip, and the task to be processed is executed based on the first and second computing nodes together. That is, data processing is completed by the second computing nodes, while the storage resources of the first computing nodes are used only to complete data transfer between nodes. This reduces the hop count between computing nodes, improves the resource utilization rate of the distributed chip, and reduces the waste of chip resources.
Fig. 2 is a flowchart of a task processing method based on a distributed chip according to an embodiment of the present disclosure. Referring to fig. 2, the method includes:
step S201, determining an unoccupied first computing node and an occupied second computing node according to the node occupation information of the distributed chip.
Step S202, selecting a target computing node from the second computing nodes, and copying the content in the storage resource corresponding to the target computing node to the storage resource of the first computing node.
Step S203, a task to be processed is executed based on the first computing node and the second computing node.
For example, the distributed chip may be a single distributed chip, or may be an extended chip obtained by cascading at least two chips. The distributed chip is provided with a plurality of computing nodes distributed in an array, and each computing node is allocated with corresponding storage resources. The storage resources may be used to store a variety of content including, but not limited to, processing results generated by the compute nodes processing tasks, intermediate data generated during the compute nodes processing tasks, data sent by other compute nodes, and data retrieved from other storage spaces.
Fig. 3 (a) is a schematic diagram of a distributed chip provided in an embodiment of the present disclosure, and fig. 3 (b) is a schematic diagram of a distributed chip formed by chip cascading provided in an embodiment of the present disclosure.
Referring to fig. 3 (a), the distributed chip includes 4 × 4 computing units distributed in an array, where each computing unit is allocated with a corresponding storage resource (the storage resource is not shown in the figure).
Referring to fig. 3 (b), it is an extended chip obtained by cascading four 4 × 4 distributed chips. The expansion chip includes 64 computing units, and each computing unit is allocated with a corresponding storage resource (the storage resource is not shown in the figure).
In some possible implementations, the computing nodes in an edge row or edge column of a distributed chip may establish routing connections with the corresponding computing nodes of the distributed chip cascaded with it, so that the extended chip forms a ring-shaped computing array in both the horizontal and vertical directions.
In one example, in the horizontal cascade direction, computing node 56 is directly connected to computing node 63, computing node 48 to computing node 55, and so on, down to computing node 0 being directly connected to computing node 7, so that the computing nodes of the extended chip form a ring-shaped computing array in the horizontal direction. Similarly, in the vertical cascade direction, computing node 56 is directly connected to computing node 0, computing node 57 to computing node 1, and so on, with computing node 63 also directly connected to computing node 7, so that the computing nodes of the extended chip also form a ring-shaped computing array in the vertical direction.
In some possible implementations, if unoccupied first computing nodes exist in the distributed chip, those computing nodes and their storage resources may be wasted. Based on this, after the first and second computing nodes are determined in step S201 according to the node occupation information, a target computing node may be selected from the occupied second computing nodes so that the content in its storage resource can be copied into the storage resource of a first computing node, reducing the waste of resources.
In some possible implementations, the node occupancy information is used to characterize occupancy of the distributed on-chip computing nodes. In one example, the node occupancy information includes row and column distribution information of the first computing node and row and column distribution information of the second computing node. For example, the row-column distribution information of the first computing node includes which rows and/or columns of the distributed chip are occupied by the first computing node, and the row-column distribution information of the second computing node includes which rows and/or columns of the distributed chip are occupied by the second computing node. According to the node occupation information, the distribution condition of the first computing node, the distribution condition of the second computing node, the relative position relationship between the first computing node and the second computing node and other information in the distributed chip can be clarified.
In some possible implementation manners, the node occupation information includes quantity information of the first computing node and the second computing node, and the quantity of the target computing nodes is less than or equal to that of the first computing node; the target computing node may be selected from the second computing nodes based on the quantity information.
For example, it is determined in step S201 that the number of unoccupied first computing nodes is N1, and N2 computing nodes are then selected from the occupied second computing nodes as target computing nodes, where N2 is not greater than N1. The N2 second computing nodes may be selected randomly, by rows and columns, or according to preset information (for example, load information of the nodes), which is not limited in the present disclosure.
In some possible implementations, the node occupation information includes location information of the first and second computing nodes, and a target computing node may be selected from the second computing nodes according to the location information. The location information may be expressed as a planar distribution, as row-and-column distribution information, or as planar coordinate values. According to the location information, the content in the storage resource of a target computing node at a specific position can be copied to the storage resource of an unoccupied first computing node, either at a random position or at a specific position; as long as data transfer between nodes is completed during task processing, the hop count between computing nodes can be reduced.
In some possible implementations, the location information includes row-column distribution information of the first computing node and the second computing node; and selecting a target computing node from the second computing nodes according to the row-column distribution information. The row and column distribution information can more clearly and accurately reflect the specific positions of the computing nodes, such as several rows and several columns of computing nodes. Therefore, the content in the storage resource of the target computing node at the specific position can be copied to the storage resource of the unoccupied first computing node according to the arrangement rule of the specific row and column direction, so that the data transmission between the nodes is completed, and the hop count between the computing nodes is reduced.
In some possible implementations, the content in the storage resources corresponding to the target computing nodes may be copied to the storage resources of the unoccupied first computing nodes according to the row-and-column distribution information and a fixed direction. The fixed direction is the direction along which the target computing nodes are selected: part or all of the second computing nodes in the row direction and/or the column direction may be chosen as target computing nodes.
In some possible implementations, the selecting the target computing node in step S202 may be implemented as follows: firstly, determining the fixed direction of a target computing node according to the row-column distribution information of a first computing node; and secondly, selecting a target computing node from the second computing nodes according to the row-column distribution information of the first computing node, the row-column distribution information of the second computing nodes and the fixed direction of the target computing node.
Wherein the fixed direction comprises a row direction or a column direction, which is used to determine whether a row or a column is selected from the second computing nodes as the target computing node. In some possible implementation manners, if 1 or several rows of computing nodes are occupied in the distributed chip, determining that the fixed direction is a row direction, namely selecting a plurality of rows from second computing nodes of the distributed chip as target computing nodes; correspondingly, if 1 or more columns of computing nodes in the distributed chip are occupied in the distributed chip, the fixed direction is determined to be the column direction, namely, a plurality of columns of computing nodes are selected from the second computing nodes of the distributed chip to be used as target computing nodes.
In some possible implementation manners, under the condition that the fixed direction is the row direction, firstly, determining a target row number n1 according to row-column distribution information of the first computing node; secondly, according to the row-column distribution information of the second computing nodes and the target row number n1, selecting n2 rows of second computing nodes as target computing nodes, wherein n1 and n2 are integers greater than or equal to 1, and n2 is less than or equal to n1.
In other words, for the n1 rows of unoccupied first computing nodes, all n1 rows (i.e., n2 = n1) may be used, or only a part of them (i.e., n2 < n1), to hold redundant copies of the selected second computing nodes; the embodiments of the present disclosure do not limit this.
In one example, the target row number equals the number of rows of first computing nodes; that is, every first computing node receives a copy, so that all first computing nodes become occupied and all of them serve as redundant nodes.

In one example, the target row number is less than the number of rows of first computing nodes; that is, only some of the first computing nodes receive copies and become occupied, while the remaining first computing nodes stay unoccupied.
It should be noted that, because the number of rows of the second computing node is generally greater than the number of rows of the unoccupied first computing node, in order to reduce the average hop count between the computing nodes and improve the effective computing power, n2 rows of second computing nodes may be selected as the target computing node in a sparse and uniform manner. In some possible implementations, the target compute node may be chosen sparsely and uniformly based on node row spacing.
In one example, firstly, determining a node line interval according to the row-column distribution information of the second computing node and the target row number n1; and secondly, selecting n2 rows of second computing nodes as target computing nodes according to the node row interval.
It should be noted that, for the same distributed chip, the node row interval may be set to a uniform value or to different values. Different intervals are used mainly because, in some application scenarios, a uniform node row interval cannot be derived from the row-column distribution information of the first computing nodes and that of the second computing nodes.
In one example, the distributed chip includes computing nodes distributed in a 10 × 5 array, where the second computing nodes are located on the upper side and distributed in a 7 × 5 array, and the first computing nodes are located on the lower side and distributed in a 3 × 5 array; that is, the second computing nodes occupy rows 1 to 7 of the distributed chip and the first computing nodes occupy rows 8 to 10. The fixed direction of the target computing nodes is determined to be the row direction, and at most 3 rows can be selected from the second computing nodes in rows 1 to 7 as the target computing nodes (i.e., the target row number n1 equals 3). In this case, a uniform node row interval cannot be set. To nevertheless select the target computing nodes as sparsely and uniformly as possible, n2 may be set to 3 and the node row intervals set to 1 and 2, respectively. In other words, the second computing nodes in row 1 are selected as the target computing nodes corresponding to the first computing nodes in row 8; skipping 1 row, the second computing nodes in row 3 are selected as the target computing nodes corresponding to the first computing nodes in row 9; and skipping 2 rows, the second computing nodes in row 6 are selected as the target computing nodes corresponding to the first computing nodes in row 10. Of course, n2 may also be set to 2, so that 2 rows are selected from the second computing nodes in rows 1 to 7 as the target computing nodes; for example, the second computing nodes in rows 3 and 5 are selected as the target computing nodes corresponding to the first computing nodes in rows 8 and 9, respectively. The case where n2 is 1 is similar and is not described further here.
It should be noted that, the above node row spacing is only an example, and a person skilled in the art may flexibly set the node row spacing according to a requirement, which is not limited in the embodiment of the present disclosure.
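One possible way to spread the selected rows sparsely and uniformly is to step through the occupied rows at a computed stride; the function below is a sketch under that assumption (as noted above, the intervals need not be uniform — the 7-row example uses intervals of 1 and 2):

```python
def pick_sparse_rows(total_rows, n_pick):
    """Pick n_pick 0-based row indices out of total_rows occupied rows,
    spread as evenly as the integer arithmetic allows."""
    if n_pick >= total_rows:
        return list(range(total_rows))
    stride = total_rows / n_pick          # fractional stride between picks
    return [int(i * stride) for i in range(n_pick)]
```

With 7 occupied rows and n2 = 3 this yields rows 1, 3 and 5 (1-based) — close to, though not identical with, the 1/3/6 choice in the example above; the exact spacing is a design choice.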
The fixed direction may also be the column direction. In some possible implementations, when the fixed direction is the column direction, a target column number m1 is first determined according to the row-column distribution information of the first computing nodes; then, m2 columns of second computing nodes are selected as the target computing nodes according to the row-column distribution information of the second computing nodes and the target column number m1, where m1 and m2 are integers greater than or equal to 1 and m2 is less than or equal to m1.
In other words, for the m1 columns of unoccupied first computing nodes, all m1 columns (i.e., m2 = m1) may be used, or only a part of them (i.e., m2 < m1), to hold redundant copies of the selected second computing nodes; the embodiments of the present disclosure do not limit this.
In one example, the target column number equals the number of columns of first computing nodes; that is, every first computing node receives a copy, so that all first computing nodes become occupied and all of them serve as redundant nodes.

In one example, the target column number is less than the number of columns of first computing nodes; that is, only some of the first computing nodes receive copies and become occupied, while the remaining first computing nodes stay unoccupied.
It should be understood that copying the contents of storage resources between nodes consumes extra power. In the cases where n2 < n1 or m2 < m1, only part of the first computing nodes are occupied, so power is saved compared with the cases where n2 = n1 or m2 = m1.
It should be noted that, because the number of columns of the second computing nodes is usually greater than the number of columns of the unoccupied first computing nodes, in order to reduce the average hop count between the computing nodes and improve the effective computing power, m2 columns of second computing nodes may be selected as the target computing nodes in a sparse and uniform manner. In some possible implementations, the target compute node may be chosen sparsely and uniformly based on the node column spacing.
In one example, firstly, determining a node column interval according to the row-column distribution information of the second computing node and the target column number m1; and secondly, selecting m2 columns of second computing nodes as target computing nodes according to the node column intervals.
It should be noted that, similar to the node row interval, for the same distributed chip, the node column interval may be set to be a uniform interval, or may also be set to be a different interval, and the specific manner may refer to the foregoing setting regarding the node row interval, and is not described repeatedly here.
It should be further noted that, when second computing nodes are selected according to the target row number or the target column number, entire rows/columns of second computing nodes may be selected, or only some of the second computing nodes within a row/column, as long as the number of selected second computing nodes equals the number of target computing nodes to be determined.
In some possible implementations, in step S202, selecting a target computing node from the second computing nodes includes:
selecting a target computing node from the second computing nodes according to the load information of the second computing nodes; wherein the load information at least comprises a calculation frequency and a calculation amount.
In some possible implementations, the load information may be obtained through simulation of the distributed chip.
Illustratively, for each type of task to be processed, the distributed chip is simulated in advance to obtain load information such as the computing frequency and computing amount of each second computing node when that type of task is executed. When the target computing node is selected from the second computing nodes, second computing nodes with a high computing frequency and/or a large computing amount are preferentially selected as the target computing nodes.
It should be noted that the above description is only an example of the load information, and other information that can characterize the processing load of the second computing node is also within the protection scope of the embodiments of the present disclosure.
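Assuming the simulated load is available as per-node (frequency, amount) pairs — the dictionary layout below is an assumption, not the patent's data structure — preferring heavily loaded nodes could look like:

```python
def pick_by_load(load, n_pick):
    """load: {node_id: (compute_frequency, compute_amount)} measured in a
    pre-run simulation. Rank by frequency, tie-break by compute amount,
    and keep the n_pick busiest nodes as target computing nodes."""
    ranked = sorted(load, key=lambda n: (load[n][0], load[n][1]), reverse=True)
    return ranked[:n_pick]
```

Busy nodes are the ones whose data is most worth replicating closer to otherwise idle first computing nodes.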
After the selection of the target computing node is completed, in step S202, the content in the storage resource corresponding to the target computing node may be copied to the storage resource of the first computing node, so as to implement the utilization of the first computing node and the storage resource thereof, and reduce the waste of resources.
It should be noted that the target computing node and the first computing node have a corresponding relationship, and therefore, in the content copying process, the content copying operation needs to be executed based on the corresponding relationship.
In some possible implementations, if the column-1 target computing nodes correspond to the column-1 first computing nodes and the column-2 target computing nodes correspond to the column-2 first computing nodes in the distributed chip, then the content in the storage resources of the column-1 target computing nodes should be copied to the storage resources of the column-1 first computing nodes, and the content in the storage resources of the column-2 target computing nodes should be copied to the storage resources of the column-2 first computing nodes.
After copying the content in the storage resource of the target compute node to the storage resource of the first compute node, the pending task may be executed in step S203.
In some possible implementations, the first compute node and the second compute node based on distributed chips perform the pending task. In other words, after the content replication, when the distributed chip executes the task to be processed, the distributed chip does not rely on the second computing node and the storage resources of the second computing node any more, but executes the task to be processed based on the first computing node, the second computing node and the storage resources of the computing nodes together, so that the computing nodes and the storage resources in the distributed chip are fully utilized, the effective computing power is improved, and the task processing capability of the distributed chip is correspondingly improved.
It should be noted that the task to be processed may be any one of an image processing task, a voice processing task, a text processing task, a video processing task, and a blockchain computing task; the embodiments of the present disclosure do not limit the type or content of the task to be processed.
It should be further noted that, the task processing method based on the distributed chip provided in the embodiment of the present disclosure may be executed by a controller, where the controller may be a controller disposed on a chip or a controller disposed outside the chip, and the embodiment of the present disclosure does not limit this.
Fig. 4 is a schematic diagram of a distributed chip according to an embodiment of the present disclosure. Referring to fig. 4, the distributed chip is formed by cascading four smaller distributed chips (the first chip to the fourth chip), with the second computing nodes represented by black circles and the first computing nodes by white circles. As shown, all the computing nodes of the first chip and the second chip are occupied, while only the column-1 computing nodes of the third chip and the fourth chip are occupied and the remaining 3 columns are unoccupied. Based on this, the fixed direction of the target computing nodes may be determined to be the column direction.
In the distributed chip, the first computing node occupies the 6 th to 8 th columns, and thus, the target number of columns is determined to be 3. The second computing nodes occupy the 1 st column to the 5 th column, and the number of target columns is 3, so that it may be determined that the node column interval is 1, that is, 3 columns of second computing nodes are sparsely and uniformly selected from the 1 st column to the 5 th column as target computing nodes (i.e., m2= m1= 3) in a manner of 1 column interval. Therefore, the 1 st, 3 rd and 5 th column second computing nodes are selected as target computing nodes, the 1 st column second computing node has a corresponding relationship with the 6 th column first computing node, the 3 rd column second computing node has a corresponding relationship with the 7 th column first computing node, and the 5 th column second computing node has a corresponding relationship with the 8 th column first computing node.
After the target computing nodes are determined, the contents of the storage resources of the column-1 second computing nodes are copied to the storage resources of the column-6 first computing nodes: the contents of the storage resource of node 56 are copied to the storage resource of node 61, the contents of node 48 to node 53, the contents of node 40 to node 45, and so on, down to the contents of node 0 being copied to node 5. The copying operations between the column-3 second computing nodes and the column-7 first computing nodes, and between the column-5 second computing nodes and the column-8 first computing nodes, are similar and are not repeated here.
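Taking the layout of fig. 4 at face value — an 8 × 8 grid whose node id is row × 8 + column, which is an inference from the node numbers quoted above, not stated in the source — the per-column correspondence and the resulting copy pairs can be enumerated as follows:

```python
COLS, ROWS = 8, 8            # inferred grid size for fig. 4
col_map = {0: 5, 2: 6, 4: 7} # target column -> first-node column (0-based);
                             # i.e. columns 1, 3, 5 map to columns 6, 7, 8

# (source node id, destination node id) for every row of each mapped column
copy_pairs = [(r * COLS + src, r * COLS + dst)
              for src, dst in col_map.items()
              for r in range(ROWS)]
```

The pair (56, 61) appears in `copy_pairs`, matching the node-56-to-node-61 copy described in the text, as does (0, 5).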
Through the above operation, the distributed chip is changed from the chip shown in fig. 4 to the chip shown in fig. 5. Fig. 5 is a schematic processing diagram of a distributed chip according to an embodiment of the present disclosure. In fig. 5, the node 56' corresponds to the node 61 in the distributed chip shown in fig. 4, and is obtained by copying the contents of the storage resource of the node 56 into the storage resource of the node 61, so that the unoccupied computing node is changed into the occupied computing node, and the storage resource thereof is also utilized. Node 58', node 60', etc. are similar and will not be described again. In fig. 5, all the computing nodes in the distributed chip are occupied, the corresponding storage resources are also utilized, the computing nodes and the storage resources are fully utilized, and the target computing nodes are determined in a sparse and uniform manner, so that the average hop count among the computing nodes can be reduced, and the effective computing power of the distributed chip is improved.
It should be noted that, if a plurality of consecutive rows or a plurality of consecutive columns of second computing nodes are directly selected as the target computing node, although the number of computing nodes and the number of storage resources are both increased, the average hop count between the computing nodes is also correspondingly increased, which may result in that the effective computing power is not improved. Moreover, in some possible implementation manners, the above operations may also cause the length-to-width ratios of the computing nodes participating in task processing in the distributed chip to be greatly different, and narrow-edge blocking is aggravated, so that the utilization efficiency of computing power is seriously affected. Therefore, it is not suitable to directly select a plurality of consecutive rows or a plurality of consecutive columns of second computing nodes as the target computing nodes to process the distributed chip. According to the method provided by the embodiment of the disclosure, the target computing node is selected in a sparse and uniform manner, and the increase of the average hop count can be reduced while the number of computing nodes and storage resources is increased, so that the effective computing power of the distributed chip is improved.
In some possible implementations, after determining that the target number of columns is 3, 2 columns of second computing nodes may be selected from the 1 st column to the 5 th column as the target computing nodes (i.e., m1=3 and m2= 2).
Exemplarily, a second computing node in the 2 nd column and a second computing node in the 4 th column are selected as target computing nodes, the second computing node in the 2 nd column has a corresponding relationship with the first computing node in the 6 th column, the second computing node in the 4 th column has a corresponding relationship with the first computing node in the 7 th column, and the first computing node in the 8 th column is still in an unoccupied state.
After the target compute node is determined, the contents of the storage resources of the second compute node in column 2 are copied to the storage resources of the first compute node in column 6, i.e., the contents of the storage resources of node 57 are copied to the storage resources of node 61, the contents of the storage resources of node 49 are copied to the storage resources of node 53, the contents of the storage resources of node 41 are copied to the storage resources of node 45, and so on, the contents of the storage resources of node 1 are copied to the storage resources of node 5. The copy operation between the second computing node in column 4 and the first computing node in column 7 is similar and will not be repeated here.
Through the above operation, the distributed chip is changed from the chip shown in fig. 4 to the chip shown in fig. 6.
Fig. 6 is a schematic processing diagram of a distributed chip according to an embodiment of the disclosure. In fig. 6, node 57' corresponds to node 61 in the distributed chip shown in fig. 4, which is obtained by copying the contents of the storage resource of node 57 into the storage resource of node 61, so that the unoccupied computing node is changed into the occupied computing node, and the storage resource thereof is also utilized. Node 49', node 41', etc. are similar and will not be described again.
It should be noted that the case where m2 is equal to 1 is similar to the above, and will not be described herein.
It should be noted that, in the above description, the shape of the area of the first computation node/the second computation node is a regular rectangle, but in some possible implementations, if the computation nodes in the distributed chip are not occupied in a whole row or a whole column, the shape of the area of the first computation node/the second computation node may be irregular. For an irregular region shape, a target calculation node cannot be directly selected according to the above method, but the irregular region shape needs to be divided into a plurality of regular rectangular regions by a region division method before the target calculation node is selected.
In some possible implementations, before step S201, the task processing method according to the embodiment of the present disclosure may further include: determining the area shape of a first computing node according to the node occupation information of the distributed chip; in the case where the region shape does not belong to a rectangle, the region of the first calculation node is divided into at least two rectangular regions.
In one example, the node occupation information of the distributed chip includes the row-column distribution information of the first computing node and the row-column distribution information of the second computing node, and based on the information, the region shape of the first computing node can be determined more accurately and conveniently.
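Whether the unoccupied region is a rectangle can be checked by comparing it with its bounding box; a minimal sketch, assuming the occupation information has been reduced to a coordinate set:

```python
def is_rectangular(cells):
    """cells: non-empty set of (row, col) coordinates of first computing
    nodes. The region is a rectangle iff it exactly fills its bounding box."""
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return len(cells) == height * width
```

An L-shaped region such as the one in fig. 7 fails this test, which triggers the division into rectangular regions.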
After dividing the irregular area corresponding to the first computing node into a plurality of regular rectangular areas, in some possible implementations, step S201 may include: firstly, respectively determining the fixed direction of a target computing node in each rectangular area according to the row and column distribution information of a first computing node in each rectangular area; and secondly, sequentially determining target computing nodes corresponding to each rectangular area according to the fixed direction, the row-column distribution information of the first computing node and the row-column distribution information of the second computing node.
In some possible implementation manners, sequentially determining target computing nodes corresponding to each rectangular region according to the fixed direction, the row-column distribution information of the first computing node, and the row-column distribution information of the second computing node, includes:
firstly, selecting a target computing node corresponding to a 1 st rectangular area from second computing nodes according to the fixed direction of a first computing node in the 1 st rectangular area, the row-column distribution information of the first computing node in the 1 st rectangular area and the row-column distribution information of the second computing nodes; secondly, selecting a target calculation node corresponding to the kth rectangular area from the second calculation node and target calculation nodes corresponding to the first k-1 rectangular areas according to the fixed direction of the first calculation node in the kth rectangular area, the row and column distribution information of the first calculation node in the kth rectangular area, the distribution information of the target nodes in the first k-1 rectangular areas and the row and column distribution information of the second calculation nodes, wherein k is an integer greater than or equal to 2.
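The sequential procedure above — select targets for rectangle 1 from the second computing nodes, then let each later rectangle also draw on the targets already assigned — can be sketched as a driver loop; `pick` stands in for the per-rectangle selection logic, and its interface is an assumption:

```python
def assign_targets(rects, second_nodes, pick):
    """Assign target computing nodes rectangle by rectangle. For rectangle k,
    the candidate pool is the second computing nodes plus the targets already
    chosen for rectangles 1..k-1, as described above."""
    chosen = []
    for rect in rects:
        pool = list(second_nodes) + chosen   # earlier targets join the pool
        picked = pick(rect, pool)
        chosen.extend(picked)
    return chosen
```

This mirrors fig. 9 and fig. 10, where nodes such as 56' (a first-pass copy) later serve as target computing nodes for the second rectangular region.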
Fig. 7 is a schematic diagram of a distributed chip according to an embodiment of the disclosure. Referring to fig. 7, the distributed chip is formed by cascading four small distributed chips, namely, the first chip to the fourth chip, and the second computing node is represented by a black circle and the first computing node is represented by a white circle. As can be seen from FIG. 7, in the distributed chip, all the compute nodes of the first chip are occupied; the computing nodes in the 1 st row and the 2 nd row of the second chip are occupied, and the computing nodes in the 3 rd row and the 4 th row are not occupied; the 1 st column of computing nodes of the third chip are occupied, and the rest 3 columns of computing nodes are not occupied; the first two compute nodes in the first column of the fourth chip are occupied and the remaining compute nodes are unoccupied. Based on this, it can be determined that the region shape of the first computation node corresponds to "L" rotated 90 degrees counterclockwise, and does not belong to a regular rectangle, and therefore, it is necessary to divide the region of the first computation node into at least two rectangular regions.
Fig. 8 is a schematic diagram of region division according to an embodiment of the present disclosure. Referring to fig. 8, the area of the first computation node is divided into two rectangular areas, and the distribution of the first computation node in the two rectangular areas is 2 × 8 and 6 × 3, respectively.
For a 6 × 3 rectangular region, the fixed direction is the column direction, and the number of columns of the rectangular region is 3, so that the target number of columns is determined to be 3. Since the number of columns of the second compute node is 5 columns and the number of target columns is 3, it is determined that the node column interval is 1. Based on this, second calculation nodes of the 1 st column, the 3 rd column and the 5 th column are respectively selected as target calculation nodes corresponding to the 1 st column, the 2 nd column and the 3 rd column in the 6 × 3 rectangular region, and the obtained distributed chip is shown in fig. 9.
Fig. 9 is a schematic processing diagram of a distributed chip according to an embodiment of the present disclosure. Within the distributed chip shown in fig. 9, the first computing nodes within the 6 × 3 rectangular region have all been assigned corresponding target computing nodes (e.g., the first computing node 56' corresponds to node 61 in fig. 7, and its target computing node is node 56); only the first computing nodes within the remaining 2 × 8 rectangular region have not yet been assigned target computing nodes.
For a 2 × 8 rectangular region, the fixed direction is the row direction, and the number of rows is 2, so that the target number of rows is determined to be 2. Since the number of lines of the calculation nodes is 6 and the number of target lines is 2 in the new rectangular region composed of the second calculation node and the 6 × 3 rectangular region, the node line interval is determined to be 2. Based on this, the 1 st row and the 4 th row of computation nodes in the new rectangular region are selected as the 1 st row and the 2 nd row of target computation nodes in the 2 × 8 rectangular region, respectively, and the obtained distributed chip is shown in fig. 10.
Fig. 10 is a schematic processing diagram of a distributed chip according to an embodiment of the present disclosure. In fig. 10, all first computing nodes within the distributed chip have been assigned corresponding target computing nodes. Taking node 56″ and node 56‴ as examples: node 56″ corresponds to node 8 in fig. 7, and its target computing node is node 56; node 56‴ corresponds to node 13 in fig. 7, and its target computing node is node 56'. The other nodes are similar to node 56″ and node 56‴ and are not described again here. After the contents in the storage resources of the target computing nodes are copied to each first computing node, all computing nodes in the distributed chip are in an occupied state and their storage resources are utilized, so that when the distributed chip executes the task to be processed, its task execution capability is improved by relying on all the computing nodes and their storage resources.
In some possible implementations, after the distributed chip shown in fig. 9 is obtained, considering that nodes 56, 48, 40, etc. have already been redundantly stored once, nodes 56', 48', etc. need not be redundantly stored a second time for the remaining 2 × 8 rectangular region; based on this, the distributed chip shown in fig. 11 may be obtained.
Fig. 11 is a schematic processing diagram of a distributed chip according to an embodiment of the present disclosure. Referring to fig. 11, nodes 13, 14, 15, 5, 6 and 7 are in an unoccupied state because the nodes 56', 48', etc. are no longer redundantly stored again.
In some possible implementations, the area of the second computing node may be an irregular area, in which case the redundant setting of the nodes may be done in a similar manner.
It should be noted that the above region division manner for the first computing nodes is only an example, and the embodiments of the present disclosure do not limit it. In some possible implementations, the region of the first computing nodes may instead be divided into two other rectangular regions: the 3 columns × 8 rows of first computing nodes on the right side of the distributed chip shown in fig. 7 are divided into a first rectangular region, in which the first computing nodes are distributed as 8 × 3, and the remaining 2 rows × 5 columns of first computing nodes on the lower side are divided into a second rectangular region, in which the first computing nodes are distributed as 2 × 5. Accordingly, after the target computing nodes are determined for the two rectangular regions according to the above method and the contents in their storage resources are copied into the storage resources of the first computing nodes, the distributed chip may execute the task to be processed based on the first computing nodes and the second computing nodes.
In some possible implementation manners, when the area of the first computing node is an irregular area with other shapes, the area may be further divided into more than two rectangular areas, and a target computing node is selected for each rectangular area and content replication is performed. The specific processing manner is similar to the foregoing, and the description is not repeated here.
Fig. 12 is a block diagram of a task processing device based on a distributed chip according to an embodiment of the present disclosure. Referring to fig. 12, the task processing device includes:
a determining module 1201, configured to determine an unoccupied first computing node and an occupied second computing node according to node occupation information of the distributed chip.
The node occupation information is used for representing the occupation condition of the computing nodes in the distributed chip.
A selecting module 1202, configured to select a target computing node from the second computing nodes.
A copying module 1203, configured to copy the content in the storage resource corresponding to the target computing node into the storage resource of the first computing node.
An executing module 1204, configured to execute the task to be processed based on the first computing node and the second computing node.
In some possible implementations, the node occupation information includes quantity information of the first computing nodes and the second computing nodes, and the number of target computing nodes is less than or equal to the number of first computing nodes; the selecting module 1202 is configured to select the target computing node from the second computing nodes according to the quantity information.
In some possible implementations, the node occupation information includes position information of the first computing nodes and the second computing nodes; the selecting module 1202 is configured to select the target computing node from the second computing nodes according to the position information.
Correspondingly, the copying module 1203 is configured to copy, according to the position information, the content in the storage resource corresponding to the target computing node into the storage resource of the first computing node.
In some possible implementations, the node occupancy information includes row and column distribution information of the first computing node and row and column distribution information of the second computing node. Accordingly, the selecting module 1202 includes: the device comprises a first direction determining unit and a first node selecting unit. The first direction determining unit is used for determining a fixed direction of a target computing node according to the row-column distribution information of the first computing node, wherein the fixed direction comprises a row direction or a column direction; and the first node selection unit is used for selecting the target computing node from the second computing nodes according to the row-column distribution information of the first computing node, the row-column distribution information of the second computing node and the fixed direction of the target computing node.
In some possible implementations, the first node selection unit further includes a target row number determining unit and a first selection execution unit. The target row number determining unit is configured to determine a target row number n1 according to the row-column distribution information of the first computing nodes when the fixed direction is the row direction; the first selection execution unit is configured to select n2 rows of second computing nodes as the target computing nodes according to the row-column distribution information of the second computing nodes and the target row number n1, where n1 and n2 are both integers greater than or equal to 1 and n2 is less than or equal to n1.
In some possible implementations, the first selection execution unit further includes a first interval determining subunit and a first execution subunit. The first interval determining subunit is used for determining a node row interval according to the row-column distribution information of the second computing node and the target row number n1; the first execution subunit is used for selecting n2 rows of second computing nodes as the target computing nodes according to the node row interval.
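The row-direction selection can be sketched as below. One plausible reading, and only an assumption, is that the node row interval is the occupied-row count divided by the target row number n1, and that rows are then picked at that stride; the column-direction case of the following units is symmetric with rows and columns swapped.

```python
def select_target_rows(occupied_rows, n1, n2):
    """Pick n2 of the occupied rows at a stride (the node row interval)
    derived from the occupied-row count and the target row number n1.
    The stride formula is an illustrative assumption, not the
    disclosure's prescribed computation."""
    interval = max(1, len(occupied_rows) // n1)  # node row interval
    return occupied_rows[::interval][:n2]

# e.g. 8 occupied rows, target row number n1 = 4, select n2 = 2 rows
rows = select_target_rows(list(range(8)), n1=4, n2=2)
```

With these inputs the interval is 2, so rows 0 and 2 are chosen; spacing the chosen rows keeps the mirrored copies spread across the array rather than clustered.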
In some possible implementations, the first node selection unit further includes a target column number determining unit and a second selection execution unit. The target column number determining unit is used for determining a target column number m1 according to the row-column distribution information of the first computing node when the fixed direction is the column direction; the second selection execution unit is used for selecting m2 columns of second computing nodes as the target computing nodes according to the row-column distribution information of the second computing nodes and the target column number m1, where m1 and m2 are both integers greater than or equal to 1, and m2 is less than or equal to m1.
In some possible implementations, the second selection execution unit further includes a second interval determining subunit and a second execution subunit. The second interval determining subunit is used for determining a node column interval according to the row-column distribution information of the second computing node and the target column number m1; the second execution subunit is used for selecting m2 columns of second computing nodes as the target computing nodes according to the node column interval.
In some possible implementations, the task processing device further includes a shape determining module and a region dividing module. The shape determining module is used for determining the region shape of the first computing node according to the node occupancy information of the distributed chip; the region dividing module is used for dividing the region of the first computing node into at least two rectangular regions when the region shape is not rectangular.
In some possible implementations, after the region of the first computing node has been divided into at least two rectangular regions by the region dividing module, the selecting module includes a second direction determining unit and a second node selection unit. The second direction determining unit is used for respectively determining the fixed direction of the target computing node in each rectangular region according to the row-column distribution information of the first computing node in that rectangular region; the second node selection unit is used for sequentially determining the target computing nodes corresponding to the rectangular regions according to the fixed direction, the row-column distribution information of the first computing node, and the row-column distribution information of the second computing node.
In some possible implementations, the second node selection unit is configured to: select the target computing node corresponding to the 1st rectangular region from the second computing nodes according to the fixed direction of the first computing node in the 1st rectangular region, the row-column distribution information of the first computing node in the 1st rectangular region, and the row-column distribution information of the second computing nodes; and select the target computing node corresponding to the k-th rectangular region from the second computing nodes and the target computing nodes corresponding to the first k-1 rectangular regions according to the fixed direction of the first computing node in the k-th rectangular region, the row-column distribution information of the first computing node in the k-th rectangular region, the distribution information of the target nodes in the first k-1 rectangular regions, and the row-column distribution information of the second computing nodes, where k is an integer greater than or equal to 2.
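The region-division step (splitting a non-rectangular region of unoccupied nodes into rectangles) can be sketched with a simple greedy partition. The algorithm below is one possible approach under the stated assumption that a region is given as a set of (row, column) cells; the disclosure does not mandate any particular partitioning method.

```python
def split_into_rectangles(cells):
    """Greedily partition a set of (row, col) cells into rectangular blocks:
    repeatedly grow a rectangle rightward then downward from the
    top-left-most remaining cell. Returns (row, col, height, width) tuples.
    Illustrative only; not the disclosure's prescribed algorithm."""
    cells = set(cells)
    rects = []
    while cells:
        r0, c0 = min(cells)  # top-left-most remaining cell
        width = 1
        while (r0, c0 + width) in cells:  # grow rightward
            width += 1
        height = 1
        # grow downward while the full row span stays free
        while all((r0 + height, c) in cells for c in range(c0, c0 + width)):
            height += 1
        rects.append((r0, c0, height, width))
        cells -= {(r, c) for r in range(r0, r0 + height)
                  for c in range(c0, c0 + width)}
    return rects

# An L-shaped free region: a 2x2 block plus one extra cell below it
rects = split_into_rectangles({(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)})
```

For this L-shaped region the sketch yields a 2x2 rectangle and a 1x1 rectangle, after which target nodes can be selected per rectangle as described above.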
In some possible implementations, the task to be processed includes any one of an image processing task, a voice processing task, a text processing task, and a video processing task, which is not limited in this disclosure.
Fig. 13 is a block diagram of an electronic device provided in an embodiment of the present disclosure.
Referring to fig. 13, an embodiment of the present disclosure provides an electronic device including: at least one processor 1301; and memory 1302 communicatively coupled to the at least one processor 1301; the memory 1302 stores one or more computer programs that can be executed by the at least one processor 1301, and the one or more computer programs are executed by the at least one processor 1301, so that the at least one processor 1301 can perform the task processing method described above.
In some embodiments, the electronic device may be a brain-like chip and/or a neural network chip that adopts a vectorized calculation method and loads parameters, such as the weight information of a neural network model, from an external memory, for example a Double Data Rate (DDR) synchronous dynamic random access memory. The embodiments of the present disclosure therefore achieve high operation efficiency in batch processing.
Furthermore, the embodiments of the present disclosure also provide a computer-readable medium on which a computer program is stored, where the computer program, when executed by a processor/processing core, implements the task processing method described above.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVDs) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. It will, therefore, be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (20)

1. A task processing method is characterized in that a distributed chip is provided with a plurality of computing nodes distributed in an array, and each computing node is allocated with a corresponding storage resource, and the method comprises the following steps:
determining an unoccupied first computing node and an occupied second computing node according to the node occupation information of the distributed chip;
selecting a target computing node from the second computing nodes, and copying the content in the storage resource corresponding to the target computing node to the storage resource of the first computing node;
executing a task to be processed based on the first compute node and the second compute node.
2. The task processing method according to claim 1, wherein the node occupancy information includes quantity information of the first computing nodes and the second computing nodes, and the number of the target computing nodes is less than or equal to the number of the first computing nodes;
the selecting a target computing node from the second computing nodes comprises:
and selecting the target computing node from the second computing nodes according to the quantity information.
3. The task processing method according to claim 1, wherein the node occupancy information includes location information of the first computing node and the second computing node;
the selecting a target computing node from the second computing nodes comprises:
and selecting the target computing node from the second computing nodes according to the position information.
4. The task processing method according to claim 3, wherein the copying contents in the storage resource corresponding to the target computing node to the storage resource of the first computing node comprises:
and copying the content in the storage resource corresponding to the target computing node into the storage resource of the first computing node according to the position information and a random position.
5. The task processing method according to claim 3, wherein the location information includes row-column distribution information of the first computing node and the second computing node;
the selecting a target computing node from the second computing nodes comprises:
and selecting the target computing node from the second computing nodes according to the row-column distribution information.
6. The task processing method according to claim 5, wherein the copying contents in the storage resource corresponding to the target computing node to the storage resource of the first computing node comprises:
and copying the content in the storage resource corresponding to the target computing node into the storage resource of the first computing node according to the row and column distribution information and a fixed direction.
7. The task processing method according to claim 6, wherein the fixed direction includes a row direction and/or a column direction;
copying the content in the storage resource corresponding to the target computing node to the storage resource of the first computing node according to the row-column distribution information, wherein the copying comprises:
and copying the content in the storage resource corresponding to the target computing node into the storage resource of the unoccupied first computing node according to the row direction and/or the column direction.
8. The task processing method according to claim 7, wherein the fixed direction is a row direction, and a target row number n1 is determined according to the row-column distribution information of the first computing node;
the selection of the target computing node comprises the following steps: and selecting n2 rows of second computing nodes as the target computing nodes according to the row-column distribution information of the second computing nodes and the target row number n1, wherein n2 is less than or equal to n1.
9. The task processing method according to claim 8, wherein the selecting n2 rows of second computing nodes as the target computing nodes according to the row-column distribution information of the second computing nodes and the target row number n1 includes:
determining a node line interval according to the row-column distribution information of the second computing node and the target line number n1;
and selecting n2 rows of second computing nodes as the target computing nodes according to the node row interval.
10. The task processing method according to claim 7, wherein the fixed direction is a column direction, and a target column number m1 is determined according to the row-column distribution information of the first computing node;
the selection of the target computing node comprises the following steps: and selecting m2 columns of second computing nodes as the target computing nodes according to the row-column distribution information of the second computing nodes and the target column number m1, wherein m2 is not more than m1.
11. The task processing method according to claim 10, wherein the selecting m2 columns of second computing nodes as the target computing nodes according to the row-column distribution information of the second computing nodes and the target column number m1 includes:
determining a node column interval according to the row-column distribution information of the second computing nodes and the target column number m1;
and selecting m2 columns of second computing nodes as the target computing nodes according to the node column interval.
12. The task processing method according to claim 1, wherein before selecting a target computing node from the second computing nodes occupied by the distributed chip according to the node occupation information of the distributed chip, the method further comprises:
determining the area shape of the first computing node according to the node occupation information of the distributed chip;
in the case where the region shape does not belong to a rectangle, dividing the region of the first computation node into at least two rectangular regions.
13. The task processing method according to claim 12, wherein the selecting a target computing node from among the second computing nodes occupied by the distributed chip according to the node occupancy information of the distributed chip comprises:
respectively determining the fixed direction of a target computing node in each rectangular area according to the row-column distribution information of the first computing node in each rectangular area;
and sequentially determining target computing nodes corresponding to each rectangular area according to the fixed direction, the row and column distribution information of the first computing nodes and the row and column distribution information of the second computing nodes.
14. The task processing method according to claim 1, wherein the selecting a target computing node from among the occupied second computing nodes of the distributed chip comprises:
selecting the target computing node from the second computing nodes according to the load information of the second computing nodes;
wherein the load information includes at least a calculation frequency and a calculation amount.
15. The task processing method according to claim 1, wherein the distributed chip comprises a single chip or a distributed chip formed by cascading a plurality of chips.
16. The task processing method according to any one of claims 1 to 15, wherein the task to be processed includes any one of an image processing task, a voice processing task, a text processing task, and a video processing task.
17. A task processing apparatus in which a distributed chip is provided with a plurality of computing nodes distributed in an array, and each of the computing nodes is allocated with a corresponding storage resource, the apparatus comprising:
the determining module is used for determining an unoccupied first computing node and an occupied second computing node according to the node occupation information of the distributed chip;
a selecting module, configured to select a target computing node from the second computing nodes;
the copying module is used for copying the content in the storage resource corresponding to the target computing node to the storage resource of the first computing node;
and the execution module is used for executing the task to be processed based on the first computing node and the second computing node.
18. A distributed chip, comprising: the distributed chip is provided with a plurality of computing nodes distributed in an array, each computing node is allocated with corresponding storage resources, each computing node comprises an unoccupied first computing node and an occupied second computing node, each second computing node comprises a target computing node, and each target computing node is a second computing node with the storage resources connected with the storage resources of the corresponding first computing node.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores one or more computer programs executable by the at least one processor to enable the at least one processor to perform the task processing method of any one of claims 1-16.
20. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out a method of task processing according to any one of claims 1 to 16.
CN202210961374.4A 2022-08-11 2022-08-11 Task processing method and device, distributed chip, electronic device and medium Pending CN115562848A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210961374.4A CN115562848A (en) 2022-08-11 2022-08-11 Task processing method and device, distributed chip, electronic device and medium

Publications (1)

Publication Number Publication Date
CN115562848A true CN115562848A (en) 2023-01-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination