CN115470174A - Route generation method and device, many-core system and computer readable medium - Google Patents

Publication number
CN115470174A
Authority
CN
China
Prior art keywords
core
core object
input
output
vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110647124.9A
Other languages
Chinese (zh)
Inventor
吴臻志
何伟
丁瑞强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd filed Critical Beijing Lynxi Technology Co Ltd
Priority to CN202110647124.9A priority Critical patent/CN115470174A/en
Publication of CN115470174A publication Critical patent/CN115470174A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G06F15/17312 Routing techniques specific to parallel machines, e.g. wormhole, store and forward, shortest path problem congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present disclosure provides a route generation method, including: determining a connection relation between at least one first core object in a first core object cluster and at least one second core object in a second core object cluster according to a task docking relation between the first core object cluster and the second core object cluster, wherein each first core object describes configuration information of one processing core of a many-core system, and each second core object describes configuration information of one processing core of the many-core system; and determining the hardware route of the many-core system according to the connection relation. The disclosure also provides a route generation device, a many-core system and a computer readable medium.

Description

Route generation method and device, many-core system and computer readable medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a route generation method, a route generation apparatus, a many-core system, and a computer-readable medium.
Background
The many-core system may be composed of at least one chip, each chip having multiple computing units, and the smallest computing unit in each chip that can be independently scheduled and has full computing power is called a processing core.
In a many-core system, a plurality of processing cores can work jointly, and each processing core can independently run program instructions, so that parallel computing accelerates program execution and provides multitask processing capability.
There is a need for an efficient method of generating many-core system routes.
Disclosure of Invention
The present disclosure provides a route generation method, a route generation apparatus, a many-core system, and a computer readable medium.
In a first aspect, an embodiment of the present disclosure provides a method for generating a route, including:
determining a connection relation between at least one first core object in a first core object cluster and at least one second core object in a second core object cluster according to a task docking relation between the first core object cluster and the second core object cluster, wherein each first core object describes configuration information of one processing core of a many-core system, and each second core object describes configuration information of one processing core of the many-core system;
and determining the hardware route of the many-core system according to the connection relation.
In some embodiments, the step of determining a connection relationship between at least one first core object in the first core object cluster and at least one second core object in the second core object cluster according to the task docking relationship between the first core object cluster and the second core object cluster includes:
obtaining output vectors of the at least one first core object, wherein each output vector corresponds to one first core object;
scheduling the output vector to obtain at least one vector to be input, wherein each vector to be input corresponds to one second core object;
respectively acquiring a vector to be input corresponding to the at least one second core object through the at least one second core object to obtain an input vector of each second core object;
and determining the connection relation according to the corresponding relation between at least one output vector and at least one input vector.
In some embodiments, each of the first core objects corresponds to at least one of the output vectors, each of the output vectors carries a source address, and each of the vectors to be input carries a source address of its corresponding output vector; each second core object corresponds to at least one input vector, and each input vector carries a destination address; the step of determining the connection relation according to the correspondence between at least one of the output vectors and at least one of the input vectors comprises:
respectively acquiring a source address carried by a vector to be input corresponding to the second core object through the at least one second core object;
and determining the corresponding relation between at least one source address and at least one destination address to obtain a receiving table, wherein the receiving table represents the connection relation.
In some embodiments, the step of determining the connection relation according to the correspondence of at least one of the output vectors and at least one of the input vectors further comprises:
and generating a sending table according to the receiving table, wherein the sending table represents the connection relation, and one source address in the sending table corresponds to at least one destination address.
In some embodiments, there are a plurality of output vectors, and the step of scheduling said output vectors to obtain at least one vector to be input comprises:
generating an output matrix from a plurality of said output vectors;
and generating a plurality of vectors to be input according to the output matrix.
In some embodiments, the step of generating an output matrix from a plurality of said output vectors comprises:
and arranging the output vectors to construct the output matrix.
In some embodiments, the step of generating a plurality of said vectors to be input from said output matrix comprises:
arranging the output matrix to obtain an input matrix;
splitting the input matrix into a plurality of vectors to be input.
In some embodiments, the step of obtaining the output vector of the at least one first core object comprises:
and calling an output function of each first core object to obtain an output vector of each first core object.
In some embodiments, the step of obtaining the corresponding input vectors by the at least one second core object respectively comprises:
and calling the input function of each second core object to obtain the input vector of each second core object.
In some embodiments, each of the output vectors carries a time identifier, where the time identifier characterizes the time at which one of the second core objects performs the computation corresponding to the output vector; each vector to be input carries the time identifier of its corresponding output vector; the route generation method further comprises:
respectively acquiring, through a plurality of second core objects, the time identifiers carried by the corresponding vectors to be input;
and determining the corresponding relation between a plurality of the time identifiers and a plurality of the input vectors.
In some embodiments, the second core object corresponds to a plurality of storage subspaces, and the computation corresponding to the at least one input vector stored in each storage subspace is performed at the same time; the step of determining a correspondence of a plurality of the time identifiers to a plurality of the input vectors comprises:
respectively determining the storage subspaces for storing the input vectors corresponding to the time identifiers, and generating a calculation time table representing the corresponding relation between the time identifiers and the storage subspaces.
In some embodiments, determining the hardware route of the many-core system according to the connection relationship comprises:
mapping the first core object cluster and the second core object cluster to the many-core system respectively, wherein each first core object corresponds to one processing core, and each second core object corresponds to one processing core;
and generating a hardware routing table, which represents the hardware route, according to the corresponding relation between at least one first core object and at least one processing core, the corresponding relation between at least one second core object and at least one processing core, and the topological structure of the many-core system.
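As an illustration only, the following sketch shows how a placement of core objects and a known topology could be turned into a hardware routing table. The 2D mesh topology and the dimension-ordered (x first, then y) routing are assumptions, not taken from the disclosure, and the names `xy_route`, `build_hardware_routing_table`, `connections` and `placement` are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a processing core on an assumed 2D mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the hops from src to dst, moving along x first and then along y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def build_hardware_routing_table(
    connections: Dict[str, List[str]],  # core object -> connected core objects
    placement: Dict[str, Coord],        # core object -> processing core position
) -> Dict[Tuple[Coord, Coord], List[Coord]]:
    """Translate logical connections into per-pair hop lists on the mesh."""
    table = {}
    for sender, receivers in connections.items():
        for receiver in receivers:
            src, dst = placement[sender], placement[receiver]
            table[(src, dst)] = xy_route(src, dst)
    return table


# Hypothetical connection relation and mapping of core objects to processing cores.
connections = {"a0": ["b0", "b1"]}
placement = {"a0": (0, 0), "b0": (2, 0), "b1": (1, 1)}
routes = build_hardware_routing_table(connections, placement)
```

Keeping the logical connection relation separate from the placement means the same connection relation could be reused with a different mapping or topology.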
In a second aspect, an embodiment of the present disclosure provides a route generation apparatus, including:
one or more processors;
a memory on which one or more programs are stored, the one or more programs, when executed by the one or more processors, causing the one or more processors to implement any one of the route generation methods described in the first aspect of the embodiments of the present disclosure;
one or more I/O interfaces connected between the processor and the memory and configured to enable information interaction between the processor and the memory.
In a third aspect, an embodiment of the present disclosure provides a many-core system, including:
a plurality of processing cores; and
a network on chip configured to exchange data among the plurality of processing cores and to exchange external data;
one or more instructions are stored in one or more of the processing cores, and the one or more instructions are executed by the one or more processing cores to enable the one or more processing cores to execute any one of the route generation methods described in the first aspect of the embodiments of the present disclosure.
In a fourth aspect, the present disclosure provides a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the route generation methods described in the first aspect of the present disclosure.
The embodiment of the disclosure provides a route generation method, which can determine a connection relationship between core objects in each core object cluster according to a task docking relationship of the core object cluster defined by a software layer, and generate a hardware route mapping a plurality of core object clusters to a many-core system according to the connection relationship, so that the efficiency of route generation of the many-core system is improved, and the plurality of core object clusters are conveniently mapped to the many-core system.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
FIG. 1 is a flow chart of a method of route generation in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of task docking in an embodiment of the present disclosure;
FIG. 3 is a flow chart of some steps in another route generation method in an embodiment of the present disclosure;
FIG. 4 is a flow chart of some steps in a route generation method according to yet another embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating one implementation of generating a display look-up table in an embodiment of the present disclosure;
FIG. 6 is a flow chart of some steps in a further method of route generation according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a route generation apparatus according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of a many-core system in accordance with an embodiment of the disclosure.
Detailed Description
To facilitate a better understanding of the technical aspects of the present disclosure, exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, wherein various details of the embodiments of the present disclosure are included to facilitate an understanding, and they should be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In a first aspect, referring to FIG. 1, an embodiment of the present disclosure provides a route generation method, including:
in step S100, determining a connection relationship between at least one first core object in a first core object cluster and at least one second core object in a second core object cluster according to a task docking relationship between the first core object cluster and the second core object cluster, where each first core object describes configuration information of one processing core of a many-core system, and each second core object describes configuration information of one processing core of the many-core system;
in step S200, the hardware route of the many-core system is determined according to the connection relationship.
In the embodiments of the present disclosure, the processing cores of the many-core system may be regarded as reconfigurable functional units, that is, each processing core has a basic function and can be configured as a functional core having a specific function, and a plurality of processing cores configured as specific functions can constitute a processing core cluster having a specific function.
A core object is obtained by instantiating a processing core class. A core object is a software-level description of a processing core of the many-core system, and a core object cluster consisting of at least one core object is a software-level description of a processing core cluster consisting of at least one processing core.
The embodiment of the present disclosure does not specifically limit the functional core class. For example, there may be multiple functional core classes, with different functional core classes defining functional cores with different functions. In some embodiments, the plurality of first core objects may be instantiated by different functional core classes; multiple second core objects may also be instantiated by different functional core classes.
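A minimal sketch of this idea, assuming Python dataclasses as the software-level description: the class names `CoreObject`, `ConvCore`, `PoolCore` and `CoreObjectCluster`, and the fields `core_id` and `function`, are hypothetical and only illustrate that core objects in one cluster may be instantiated from different functional core classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoreObject:
    """Software-level description of one processing core (hypothetical fields)."""
    core_id: int
    function: str = "generic"


@dataclass
class ConvCore(CoreObject):
    """One possible functional core class."""
    function: str = "convolution"


@dataclass
class PoolCore(CoreObject):
    """Another possible functional core class."""
    function: str = "pooling"


@dataclass
class CoreObjectCluster:
    """A cluster groups core objects, which may come from different classes."""
    name: str
    cores: List[CoreObject] = field(default_factory=list)


# Two clusters whose core objects are instantiated from different classes.
first_cluster = CoreObjectCluster("A", [ConvCore(1), PoolCore(2)])
second_cluster = CoreObjectCluster("B", [ConvCore(3), ConvCore(4), PoolCore(5)])
```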
In the embodiments of the present disclosure, the first core object cluster and the second core object cluster are two core object clusters having a task docking relationship. The task docking relationship means that the first core object cluster executes its calculation task and transmits the resulting output data to the second core object cluster, and the second core object cluster receives the output data and executes its own calculation task. The first core object cluster and the second core object cluster may be any pair of clusters having a task docking relationship among a plurality of core object clusters. For example, when a neural network is built from a plurality of core object clusters and each core object cluster executes one computation node of the neural network, the first core object cluster and the second core object cluster each correspond to one computation node in adjacent layers of the neural network. In a program constructed from a plurality of core object clusters, the connection relationship between any two core object clusters can be determined through step S100, and the hardware route by which all the core object clusters are mapped to the many-core system is generated through step S200.
In some embodiments, the input, the output and the routing transmission between the first core object cluster and the second core object cluster may undergo transformations such as data transposition, order interchange, and neural network connection relations. Referring to FIG. 2, take tasks Task A and Task B as an example: the output of Task A is D, and D is the input of Task B. Task A is mapped to the core object cluster Cluster A, and its execution consists of an input routing process (Cluster A In Routing), a calculation process (Cluster A Computing) and an output routing process (Cluster A Out Routing). Task B is mapped to the core object cluster Cluster B, and its execution consists of Cluster B In Routing, Cluster B Computing and Cluster B Out Routing. Through route recording (Route Recording), Cluster A In Routing and Cluster B In Routing can run in parallel (In Parallel), Cluster A Computing and Cluster B Computing can run in parallel, and Cluster A Out Routing and Cluster B Out Routing can run in parallel. Through route extraction (Route Extracted), the docking of Task A and Task B can be characterized as a look-up table (Look Up Table, LUT), a data transfer (Data Transfer) process, and Cluster A Computing and Cluster B Computing running in parallel. FIG. 2 shows various permutation opportunities (Permuting Opportunities): for example, the Task A output and Task B input processes may apply transformations such as data transposition, order interchange, and neural network connection relations, and so may the parallel route input, parallel route output and parallel calculation processes. In the embodiments of the present disclosure, at least one of the above transformations can be characterized by the connection relationship determined in step S100.
The embodiment of the disclosure provides a route generation method, which can determine a connection relationship between core objects in each core object cluster according to a task docking relationship of the core object cluster defined by a software layer, and generate a hardware route mapping a plurality of core object clusters to a many-core system according to the connection relationship, so that the efficiency of route generation of the many-core system is improved, and the plurality of core object clusters are conveniently mapped to the many-core system.
In some embodiments, an abstract routing model is introduced. In the abstract routing model, the output vectors of the first core objects and the input vectors of the second core objects can be obtained, and multiple permutation opportunities, such as input permutation, output permutation and system-level permutation, can be supported. Supporting a permutation opportunity means that at least one output vector of a first core object can be scheduled according to the transformations (data transposition, order interchange, neural network connection relations, and the like) that may occur in the input, the output and the routing transmission between the first core object cluster and the second core object cluster, so that a logical connection between the first core object and the second core object is established in the abstract routing model. The connection relationship between the at least one first core object and the at least one second core object can then be determined from the logical connection.
Accordingly, referring to FIG. 3, in some embodiments, step S100 comprises:
in step S110, obtaining output vectors of the at least one first core object, where each of the output vectors corresponds to one of the first core objects;
in step S120, scheduling the output vectors to obtain at least one to-be-input vector, where each to-be-input vector corresponds to one second core object;
in step S130, obtaining a corresponding to-be-input vector through at least one second core object, to obtain an input vector of each second core object;
in step S140, the connection relationship is determined according to a corresponding relationship between at least one of the output vectors and at least one of the input vectors.
In some embodiments, the abstract routing model can call an output function of each first core object to obtain an output vector of each first core object.
In some embodiments, the abstract routing model can call an input function of each second core object to obtain an input vector of each second core object.
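Steps S110 to S140 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the disclosed implementation: the classes `FirstCore` and `SecondCore`, their `output` and `input` methods, the `schedule` function and the `plan` argument are all hypothetical names, and addresses are modelled as (core number, index) pairs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Address = Tuple[int, int]  # assumed form: (core number, index of the vector)


@dataclass
class Vector:
    data: List[float]
    src: Address  # source address carried by the vector


class FirstCore:
    def __init__(self, core_id: int, payloads: List[List[float]]):
        self.core_id = core_id
        self.payloads = payloads

    def output(self) -> List[Vector]:
        """Output function: one tagged output vector per payload."""
        return [Vector(p, (self.core_id, i)) for i, p in enumerate(self.payloads)]


class SecondCore:
    def __init__(self, core_id: int):
        self.core_id = core_id
        self.inputs: Dict[Address, Vector] = {}

    def input(self, pending: List[Vector]) -> Dict[Address, Vector]:
        """Input function: store each pending vector under a destination address."""
        for i, vec in enumerate(pending):
            self.inputs[(self.core_id, i)] = vec
        return self.inputs


def schedule(outputs: List[Vector], plan: Dict[int, List[Address]]) -> Dict[int, List[Vector]]:
    """Select, per second core, the output vectors it should receive."""
    by_src = {v.src: v for v in outputs}
    return {core: [by_src[s] for s in srcs] for core, srcs in plan.items()}


senders = [FirstCore(1, [[0.1], [0.2]]), FirstCore(2, [[0.3]])]
receivers = [SecondCore(3), SecondCore(4)]

outputs = [v for s in senders for v in s.output()]                    # S110
pending = schedule(outputs, {3: [(1, 0), (2, 0)], 4: [(2, 0)]})       # S120
inputs = {r.core_id: r.input(pending[r.core_id]) for r in receivers}  # S130
connection = {dst: vec.src                                            # S140
              for per_core in inputs.values()
              for dst, vec in per_core.items()}
```

In this sketch the connection relation falls out of the simulated transfer: each stored input vector still carries the source address of the output vector it came from.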
The embodiment of the present disclosure does not specifically limit how to express the connection relationship. In some embodiments, the connection relationship is expressed as a display Look-Up Table (LUT). The LUT comprises a receiving table and a sending table, and the corresponding relation between a source address and a destination address is recorded in the LUT.
It should be noted that, in the embodiment of the present disclosure, the first core object cluster is regarded as a sending end, and the second core object cluster is regarded as a receiving end. Correspondingly, relative to the second core object, the address corresponding to the output vector of the first core object is a source address; relative to the first core object, the address corresponding to the input vector of the second core object is the destination address.
In some embodiments, the abstract routing model simulates the data transfer process from a first core object to a second core object, so that a receiving table with the destination address as the index and the source address as the table value can be determined. In the receiving table, each destination address corresponds to one source address.
Accordingly, in some embodiments, each of the first core objects corresponds to at least one of the output vectors, each of the output vectors carries a source address, and each of the vectors to be input carries a source address of its corresponding output vector; each second core object corresponds to at least one input vector, and each input vector carries a destination address; referring to FIG. 4, step S140 includes:
in step S141, respectively obtaining, by the at least one second core object, a source address carried by a vector to be input corresponding to the second core object;
in step S142, a corresponding relationship between at least one source address and at least one destination address is determined, and a receiving table is obtained, where the receiving table represents the connection relationship.
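A receiving table of this kind can be sketched as a dictionary indexed by destination address. The function name `build_receiving_table` and the record format are hypothetical, and rejecting a destination that is bound to two different sources is an assumption made here to reflect that each destination address corresponds to one source address.

```python
from typing import Dict, Iterable, Tuple

Address = Tuple[int, int]  # (core number, virtual serial number)


def build_receiving_table(
    records: Iterable[Tuple[Address, Address]],
) -> Dict[Address, Address]:
    """Build a table indexed by destination address with the source address as value.

    Each record is (destination address, source address carried by the vector).
    """
    table: Dict[Address, Address] = {}
    for dst, src in records:
        if dst in table and table[dst] != src:
            raise ValueError(f"destination {dst} already bound to {table[dst]}")
        table[dst] = src
    return table


rtable = build_receiving_table([((3, 0), (1, 0)), ((3, 1), (2, 0)), ((4, 0), (2, 0))])
```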
In some embodiments, once the receiving table has been obtained, a sending table with the source address as the index and the destination address as the table value can be derived from it by reverse lookup. Since there may be a multicast relationship between the first core objects and the second core objects, i.e. the output vector of one first core object may be transmitted to a plurality of second core objects, each source address corresponds to at least one destination address in the sending table.
Accordingly, in some embodiments, referring to FIG. 4, step S140 further comprises:
in step S143, a sending table is generated according to the receiving table, the sending table represents the connection relationship, and one source address in the sending table corresponds to at least one destination address.
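Deriving the sending table amounts to inverting a one-to-one mapping (destination to source) into a one-to-many mapping (source to destinations). The sketch below assumes the receiving table is a dictionary; `build_sending_table` is a hypothetical name and the example addresses are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Address = Tuple[int, int]  # (core number, virtual serial number)


def build_sending_table(rtable: Dict[Address, Address]) -> Dict[Address, List[Address]]:
    """Invert destination -> source into source -> sorted list of destinations."""
    stable = defaultdict(list)
    for dst, src in rtable.items():
        stable[src].append(dst)
    return {src: sorted(dsts) for src, dsts in stable.items()}


rtable = {(3, 0): (1, 0), (3, 1): (2, 0), (4, 0): (2, 0)}
stable = build_sending_table(rtable)
# Source (2, 0) is multicast: it maps to two destination addresses.
```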
The embodiments of the present disclosure do not particularly limit the form of the source address and the destination address. In some embodiments, each core object has a core identifier (e.g., a core number), each output vector of the core object has an output address, and each input vector has an input address; the destination address is calculated from the core identifier and the input address, and the source address is calculated from the core identifier and the output address. In some embodiments, a virtual serial number is assigned to each output vector of the core object as the output address of that output vector, and the source address is represented by the two-tuple of the core number and the virtual serial number; likewise, a virtual serial number is assigned to each input vector as the input address of that input vector, and the destination address is represented by the two-tuple of the core number and the virtual serial number.
In some embodiments, the output vector and its source address are obtained simultaneously by calling the output function of the first core object; by calling the input function of the second core object, the input vector and the source address it carries are obtained simultaneously. The source address and the corresponding destination address are written into the receiving table of the LUT, and the sending table of the LUT is then derived from the receiving table by reverse lookup.
In some embodiments, as shown in FIG. 5, Core 1 and Core 2 are first core objects, and Core 3, Core 4 and Core 5 are second core objects. Core 1 has output vectors with source addresses (src.) (1,10) and (1,20); Core 2 has output vectors with source addresses (2,15) and (2,30); Core 3 has input vectors with destination addresses (dest.) (3,5), (3,10) and (3,15); Core 4 has an input vector with destination address (4,5); and Core 5 has input vectors with destination addresses (5,10) and (5,15). The output vectors of Core 2 with source addresses (2,15) and (2,30) are multicast to Core 3 and Core 5. FIG. 5 also shows the receiving table (Rtable) and the sending table (Ftable) derived from the receiving table by reverse lookup.
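The two-tuple form of an address can be sketched as a named tuple. The packed integer encoding shown alongside it is only one conceivable way to calculate a single address from the core number and the virtual serial number; the 16-bit field width and the names `Address`, `VIRTUAL_BITS`, `pack` and `unpack` are assumptions, not taken from the disclosure.

```python
from typing import NamedTuple


class Address(NamedTuple):
    core: int     # core identifier, e.g. core number
    virtual: int  # virtual serial number of an input or output vector


VIRTUAL_BITS = 16  # assumed field width, not taken from the disclosure


def pack(addr: Address) -> int:
    """Combine core number and virtual serial number into one integer."""
    if not 0 <= addr.virtual < (1 << VIRTUAL_BITS):
        raise ValueError("virtual serial number out of range")
    return (addr.core << VIRTUAL_BITS) | addr.virtual


def unpack(value: int) -> Address:
    """Recover the (core, virtual) pair from a packed address."""
    return Address(value >> VIRTUAL_BITS, value & ((1 << VIRTUAL_BITS) - 1))


src = Address(core=2, virtual=15)
```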
The embodiment of the present disclosure does not make any special limitation on how to execute step S120 to implement scheduling of the output vector and obtain the to-be-input vector. For example, the abstract routing model may shape a plurality of output vectors into an output matrix and then generate a plurality of vectors to be input from the output matrix.
Accordingly, in some embodiments, scheduling the output vectors to obtain a plurality of vectors to be input comprises:
generating an output matrix from a plurality of said output vectors;
generating a plurality of said vectors to be input from said output matrix.
In some embodiments, in the output routing process, when the output matrix is generated from the plurality of output vectors, the output vectors are first sorted, and the output matrix is generated from the sorted output vectors. This implements output permutation.
Accordingly, in some embodiments, the step of generating an output matrix from a plurality of said output vectors comprises:
and arranging the output vectors to construct the output matrix.
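Output permutation can be sketched as stacking the output vectors as matrix rows in a chosen order. The function name `build_output_matrix`, the `order` argument and the requirement that all vectors have the same length are assumptions made for this illustration.

```python
from typing import List, Sequence, Tuple

Address = Tuple[int, int]
TaggedVector = Tuple[Address, List[float]]  # (source address, data)


def build_output_matrix(
    vectors: Sequence[TaggedVector],
    order: Sequence[Address],
) -> List[List[float]]:
    """Arrange output vectors as matrix rows in the requested order."""
    by_src = dict(vectors)
    widths = {len(data) for data in by_src.values()}
    if len(widths) > 1:
        raise ValueError("all output vectors must have the same length")
    return [list(by_src[src]) for src in order]


vectors = [((1, 0), [1.0, 2.0]), ((2, 0), [3.0, 4.0]), ((1, 1), [5.0, 6.0])]
matrix = build_output_matrix(vectors, order=[(2, 0), (1, 0), (1, 1)])
```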
In some embodiments, in the input routing process, the output matrix may be rearranged to obtain an input matrix, and the input matrix is then split into a plurality of vectors to be input. This implements input permutation.
Accordingly, in some embodiments, the step of generating a plurality of said vectors to be input from said output matrix comprises:
arranging the output matrix to obtain an input matrix;
splitting the input matrix into a plurality of vectors to be input.
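Input permutation can be sketched in the same spirit. Here the arrangement is a transposition, which is only one of the possible transformations; `arrange` and `split` are hypothetical names, and splitting row by row is an assumption.

```python
from typing import List

Matrix = List[List[float]]


def arrange(output_matrix: Matrix) -> Matrix:
    """One possible arrangement: transpose rows and columns."""
    return [list(col) for col in zip(*output_matrix)]


def split(input_matrix: Matrix) -> List[List[float]]:
    """Split the input matrix row by row into vectors to be input."""
    return [list(row) for row in input_matrix]


output_matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
pending = split(arrange(output_matrix))
```

With this arrangement, two output vectors of length three become three vectors to be input of length two, so each vector to be input gathers one element from every output vector.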
In some embodiments, a first core object in the first core object cluster, serving as the sending end, carries a time identifier when it sends data; a second core object in the second core object cluster, serving as the receiving end, receives the time identifier together with the data and performs the corresponding calculation when the time indicated by the time identifier arrives. In some related technologies, the sending end must send data at a specific time so that the receiving end performs the corresponding calculation at that time; by contrast, the sending end in the embodiments of the present disclosure does not need to send data at a specific time.
In some embodiments, data corresponding to different times is stored at the receiving end in different storage subspaces, and data received by the receiving end is written into the corresponding storage subspace according to its time identifier. When the corresponding time arrives, the core object at the receiving end reads the data required for the calculation at that time from the corresponding storage subspace.
In the embodiments of the present disclosure, while generating the LUT, the receiving end determines the correspondence between the time identifiers and the input vectors, thereby generating the timing relationship between the first core object cluster and the second core object cluster.
In some embodiments, each of the output vectors carries a time identifier, where the time identifier characterizes the time at which one of the second core objects performs the computation corresponding to the output vector; each vector to be input carries the time identifier of its corresponding output vector; the route generation method further comprises:
respectively acquiring, through a plurality of second core objects, the time identifiers carried by the corresponding vectors to be input;
and determining the correspondence between a plurality of the time identifiers and a plurality of the input vectors.
In some embodiments, the correspondence between the time identifiers and the storage subspaces is characterized by configuring a calculation time table.
Accordingly, in some embodiments, the second core object corresponds to a plurality of storage subspaces, and the computations corresponding to the one or more input vectors stored in the same storage subspace are performed at the same time; the step of determining the correspondence between a plurality of said time identifiers and a plurality of said input vectors comprises:
and respectively determining the storage subspaces for storing the input vectors corresponding to the time identifiers, and generating a calculation time table representing the correspondence between the time identifiers and the storage subspaces.
For example, data 1 and data 2 correspond to time t1, and data 3 corresponds to time t2. When the receiving end receives data 1 and data 2, it stores them in storage subspace Memory 1; when it receives data 3, it stores data 3 in storage subspace Memory 2. A calculation time table is generated at the same time; it stores an entry representing the correspondence between time t1 and storage subspace Memory 1, and an entry representing the correspondence between time t2 and storage subspace Memory 2. When time t1 arrives, the second core object can read data 1 and data 2 from Memory 1, which corresponds to time t1 according to the calculation time table, and perform the corresponding calculation; when time t2 arrives, the second core object can read data 3 from Memory 2, which corresponds to time t2 according to the calculation time table, and perform the corresponding calculation.
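The example above can be sketched as a small receiving-end model. This is an illustrative sketch only: the class name `Receiver`, its methods, and the policy of allocating a new storage subspace the first time a time identifier is seen are assumptions made for the illustration and are not specified by the disclosure.

```python
from typing import Any, Dict, List


class Receiver:
    """Hypothetical receiving-end model: one storage subspace per time identifier."""

    def __init__(self) -> None:
        # storage subspace name -> data written into that subspace
        self.subspaces: Dict[str, List[Any]] = {}
        # calculation time table: time identifier -> storage subspace name
        self.time_table: Dict[str, str] = {}

    def receive(self, data: Any, time_id: str) -> None:
        """Write data into the subspace for its time identifier,
        creating the subspace and the time table entry on first use."""
        if time_id not in self.time_table:
            name = f"Memory {len(self.time_table) + 1}"
            self.time_table[time_id] = name
            self.subspaces[name] = []
        self.subspaces[self.time_table[time_id]].append(data)

    def read_for(self, time_id: str) -> List[Any]:
        """When the time arrives, look up the subspace in the time table
        and return the data needed for the calculation at that time."""
        return list(self.subspaces[self.time_table[time_id]])


receiver = Receiver()
receiver.receive("data 1", "t1")
receiver.receive("data 2", "t1")
receiver.receive("data 3", "t2")
```

Because the time identifier travels with the data, the order and time of arrival do not matter in this sketch; only the lookup at calculation time does.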
In the embodiments of the present disclosure, for a neural network, when the receiving end receives multiple pieces of data, the multiple pieces of data may be placed on one axon, or each piece of data may be placed on its own axon. The embodiments of the present disclosure are not particularly limited in this regard.
In some embodiments, referring to fig. 6, step S200 comprises:
in step S210, mapping the first core object cluster and the second core object cluster to the many-core system, respectively, where each of the first core objects corresponds to one of the processing cores, and each of the second core objects corresponds to one of the processing cores;
in step S220, a hardware routing table is generated according to a correspondence relationship between at least one first core object and at least one processing core, a correspondence relationship between at least one second core object and at least one processing core, and a topology structure of the many-core system, so as to characterize the hardware routing.
In some embodiments, the plurality of first core objects and the plurality of second core objects are treated as a one-dimensional (1D) sequence in step S100, and the 1D sequence of core objects is mapped onto a two-dimensional (2D) many-core system in step S210; where the correspondence between the first core objects and the second core objects is expressed by the LUT, a routing path in the many-core system is determined according to the LUT, and the hardware routing table is generated.
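Steps S210 and S220 can be illustrated with the following sketch. It rests on several assumptions that the disclosure does not state: the 1D sequence is placed row-major onto a rectangular mesh, the LUT is given in sending-table form (one source index mapped to a list of destination indices), and paths are chosen by dimension-order (XY) routing. The names `map_to_grid`, `xy_path` and `build_hardware_routing_table` are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, column) of a processing core in the 2D mesh


def map_to_grid(num_objects: int, width: int) -> Dict[int, Coord]:
    """Step S210 (assumed row-major placement): core object i -> processing core."""
    return {i: (i // width, i % width) for i in range(num_objects)}


def xy_path(src: Coord, dst: Coord) -> List[Coord]:
    """Assumed XY routing: move along the row (column index) first,
    then along the column (row index)."""
    row, col = src
    path = [src]
    step = 1 if dst[1] > col else -1
    while col != dst[1]:
        col += step
        path.append((row, col))
    step = 1 if dst[0] > row else -1
    while row != dst[0]:
        row += step
        path.append((row, col))
    return path


def build_hardware_routing_table(
    lut: Dict[int, List[int]], placement: Dict[int, Coord]
) -> Dict[Tuple[int, int], List[Coord]]:
    """Step S220: one routing path per (source, destination) pair in the LUT."""
    return {
        (src, dst): xy_path(placement[src], placement[dst])
        for src, dsts in lut.items()
        for dst in dsts
    }


placement = map_to_grid(num_objects=4, width=2)
routing_table = build_hardware_routing_table({0: [3], 3: [0, 1]}, placement)
```

A different topology or routing policy would change only `xy_path`; the placement and the table construction would keep the same shape.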
In some embodiments, after the LUT is generated in step S100, the correctness of the LUT may also be verified. The abstract routing model calls the output function of each core object cluster and transmits data to each destination address according to the receiving table of the LUT; if the data for a certain destination address cannot be delivered, an error is reported. In some embodiments, a virtual core (dummy core) may be used to collect unreachable, invalid data.
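A check of this kind might look like the sketch below. It assumes the receiving table is a mapping from destination address to source address and that reachability is supplied as a set of valid destination addresses; the function name `verify_lut`, its parameters, and the error message format are hypothetical.

```python
from typing import Any, Callable, Dict, List, Set, Tuple


def verify_lut(
    output_fns: Dict[str, Callable[[], Any]],
    receive_table: Dict[str, str],
    reachable: Set[str],
) -> Tuple[Dict[str, Any], List[Tuple[str, Any]], List[str]]:
    """Replay the receiving table: call each source's output function and
    try to deliver the result to its destination address."""
    delivered: Dict[str, Any] = {}
    dummy_core: List[Tuple[str, Any]] = []  # collects undeliverable data
    errors: List[str] = []
    for dst, src in receive_table.items():
        if src not in output_fns:
            errors.append(f"{dst}: unknown source {src}")
            continue
        data = output_fns[src]()
        if dst in reachable:
            delivered[dst] = data
        else:
            # Unreachable destination: park the data on the dummy core
            # and report the error.
            dummy_core.append((dst, data))
            errors.append(f"{dst}: unreachable destination")
    return delivered, dummy_core, errors


delivered, dummy_core, errors = verify_lut(
    output_fns={"s0": lambda: [1, 2], "s1": lambda: [3]},
    receive_table={"d0": "s0", "d1": "s1", "d9": "s0"},
    reachable={"d0", "d1"},
)
```

An empty `errors` list would indicate that every entry of the receiving table could be delivered under these assumptions.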
In some embodiments, verifying the correctness of the LUT also enables multicast detection. Multicast detection is performed by checking whether the input buffer addresses of the core object clusters that have a multicast relationship are consistent. In some embodiments, when the addresses are inconsistent, a rescheduling core re-sorts the data using a new routing table.
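The consistency check described above can be sketched as follows, assuming that a multicast relationship is any sending-table entry with more than one destination and that each destination has a single input buffer address. The function name `find_multicast_conflicts` and the data layout are hypothetical; the rescheduling step itself is not modeled.

```python
from typing import Dict, List


def find_multicast_conflicts(
    send_table: Dict[str, List[str]],
    input_buffer_addr: Dict[str, int],
) -> Dict[str, Dict[str, int]]:
    """Return, for every multicast source whose destinations disagree on the
    input buffer address, the addresses seen at each destination."""
    conflicts: Dict[str, Dict[str, int]] = {}
    for src, dsts in send_table.items():
        if len(dsts) < 2:
            continue  # unicast entry, nothing to compare
        addrs = {dst: input_buffer_addr[dst] for dst in dsts}
        if len(set(addrs.values())) > 1:
            conflicts[src] = addrs
    return conflicts


conflicts = find_multicast_conflicts(
    send_table={"s0": ["d0", "d1"], "s1": ["d2", "d3"], "s2": ["d4"]},
    input_buffer_addr={"d0": 0x10, "d1": 0x10, "d2": 0x20, "d3": 0x24, "d4": 0x30},
)
```

Sources reported in `conflicts` are the ones that, under this reading, would need rescheduling with a new routing table.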
In a second aspect, referring to fig. 7, an embodiment of the present disclosure provides a route generation apparatus, including:
one or more processors 101;
a memory 102, on which one or more programs are stored, which, when executed by one or more processors, cause the one or more processors to implement any one of the route generation methods described in the first aspect of the embodiments of the present disclosure;
one or more I/O interfaces 103 coupled between the processor and the memory and configured to enable information interaction between the processor and the memory.
The processor 101 is a device with data processing capability, including but not limited to a central processing unit (CPU); the memory 102 is a device with data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O interface (read/write interface) 103 is connected between the processor 101 and the memory 102 and enables information interaction between them, and includes but is not limited to a data bus (Bus).
In some embodiments, the processor 101, memory 102, and I/O interface 103 are interconnected via a bus 104, which in turn connects with other components of the computing device.
In a third aspect, with reference to fig. 8, embodiments of the present disclosure provide a many-core system, comprising:
a plurality of processing cores 201 and a network on chip 202, wherein the plurality of processing cores 201 are all connected to the network on chip 202, and the network on chip 202 is configured to exchange data among the plurality of processing cores and to exchange data with the outside.
One or more instructions are stored in the one or more processing cores 201, and the one or more instructions are executed by the one or more processing cores 201, so that the one or more processing cores 201 can execute any one of the route generation methods described in the first aspect of the embodiments of the present disclosure.
In a fourth aspect, the present disclosure provides a computer readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the route generation methods described in the first aspect of the present disclosure.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as is well known to those skilled in the art.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (15)

1. A route generation method, comprising:
determining a connection relation between at least one first core object in a first core object cluster and at least one second core object in a second core object cluster according to a task docking relation between the first core object cluster and the second core object cluster, wherein each first core object describes configuration information of one processing core of a many-core system, and each second core object describes configuration information of one processing core of the many-core system;
and determining the hardware route of the many-core system according to the connection relation.
2. The route generation method according to claim 1, wherein the step of determining a connection relationship between at least one first core object in the first core object cluster and at least one second core object in the second core object cluster according to a task interfacing relationship between the first core object cluster and the second core object cluster comprises:
obtaining output vectors of the at least one first core object, wherein each output vector corresponds to one first core object;
scheduling the output vector to obtain at least one vector to be input, wherein each vector to be input corresponds to one second core object;
respectively acquiring a vector to be input corresponding to the at least one second core object through the at least one second core object to obtain an input vector of each second core object;
and determining the connection relation according to the corresponding relation between at least one output vector and at least one input vector.
3. The route generation method according to claim 2, wherein each of the first core objects corresponds to at least one of the output vectors, each of the output vectors carries a source address, and each of the vectors to be input carries a source address of its corresponding output vector; each second core object corresponds to at least one input vector, and each input vector carries a destination address; the step of determining the connection relation according to the correspondence between at least one of the output vectors and at least one of the input vectors comprises:
respectively acquiring a source address carried by a vector to be input corresponding to the second core object through the at least one second core object;
and determining the corresponding relation between at least one source address and at least one destination address to obtain a receiving table, wherein the receiving table represents the connection relation.
4. The route generation method according to claim 3, wherein the step of determining the connection relation according to the correspondence of at least one of the output vectors and at least one of the input vectors further comprises:
and generating a sending table according to the receiving table, wherein the sending table represents the connection relation, and one source address in the sending table corresponds to at least one destination address.
5. The route generation method of claim 2, wherein a plurality of output vectors are included, and wherein scheduling the output vectors to obtain at least one input vector comprises:
generating an output matrix from a plurality of said output vectors;
and generating a plurality of vectors to be input according to the output matrix.
6. The route generation method according to claim 5, wherein the step of generating an output matrix from the plurality of output vectors comprises:
and arranging the output vectors to construct the output matrix.
7. The route generation method according to claim 5 or 6, wherein the step of generating a plurality of the input vectors from the output matrix comprises:
arranging the output matrix to obtain an input matrix;
and splitting the input matrix into a plurality of vectors to be input.
8. The route generation method according to any of claims 2 to 6, wherein the step of obtaining the output vector of the at least one first core object comprises:
and calling an output function of each first core object to obtain an output vector of each first core object.
9. The route generation method according to any one of claims 2 to 6, wherein the step of obtaining the corresponding input vector by the at least one second core object respectively comprises:
and calling the input function of each second core object to obtain the input vector of each second core object.
10. The route generation method according to claim 2, wherein each of the output vectors carries a time identifier, and the time identifier represents a time at which one of the plurality of second core objects performs the computation corresponding to the output vector; each vector to be input carries the time identifier of its corresponding output vector; the route generation method further comprises:
respectively acquiring, through a plurality of second core objects, the time identifiers carried by the corresponding vectors to be input;
and determining the correspondence between a plurality of the time identifiers and a plurality of the input vectors.
11. The route generation method according to claim 10, wherein the second core object corresponds to a plurality of storage subspaces, and the computation corresponding to at least one of the input vectors stored in the storage subspaces is performed at the same time; the step of determining the correspondence between a plurality of said time identifiers and a plurality of said input vectors comprises:
and respectively determining the storage subspaces for storing the input vectors corresponding to the time identifiers, and generating a calculation time table representing the correspondence between the time identifiers and the storage subspaces.
12. The route generation method according to any one of claims 1 to 6, 10, and 11, wherein the step of determining the hardware route of the many-core system according to the connection relationship comprises:
mapping the first core object cluster and the second core object cluster to the many-core system respectively, wherein each first core object corresponds to one processing core, and each second core object corresponds to one processing core;
and generating a hardware routing table according to the corresponding relation between at least one first core object and at least one processing core, the corresponding relation between at least one second core object and at least one processing core and the topological structure of the many-core system, and representing the hardware routing.
13. A route generation apparatus comprising:
one or more processors;
a memory having one or more programs stored thereon that, when executed by the one or more processors, cause the one or more processors to implement the route generation method of any one of claims 1 to 12;
one or more I/O interfaces connected between the processor and the memory and configured to enable information interaction between the processor and the memory.
14. A many-core system, comprising:
a plurality of processing cores; and
a network on chip configured to interact data among the plurality of processing cores and external data;
one or more of the processing cores have stored therein one or more instructions that are executed by the one or more processing cores to enable the one or more processing cores to perform the route generation method of any of claims 1 to 12.
15. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the route generation method according to any one of claims 1 to 12.
CN202110647124.9A 2021-06-10 2021-06-10 Route generation method and device, many-core system and computer readable medium Pending CN115470174A (en)

Publications (1)

Publication Number Publication Date
CN115470174A true CN115470174A (en) 2022-12-13

Family

ID=84365063

Country Status (1)

Country Link
CN (1) CN115470174A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination