CN113778938A

CN113778938A - Method and device for determining network-on-chip topological structure and chip

Info

Publication number: CN113778938A
Application number: CN202111014880.4A
Authority: CN
Inventors: 王坚烽
Original assignee: Shanghai Power Tensors Intelligent Technology Co Ltd
Current assignee: Shanghai Power Tensors Intelligent Technology Co Ltd
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2021-12-10
Anticipated expiration: 2041-08-31
Also published as: WO2023029487A1; CN113778938B

Abstract

The disclosure provides a method, a device and a chip for determining a network-on-chip topological structure, wherein the method comprises the following steps: acquiring first connection relations of a plurality of on-chip components of an on-chip system and attribute information of the plurality of on-chip components; simplifying the first connection relation based on the attribute information of the on-chip components to obtain second connection relations corresponding to the on-chip components; and adding a routing component for connecting the plurality of on-chip components in the system on chip based on the second connection relation to obtain the topological structures corresponding to the plurality of on-chip components.

Description

Method and device for determining network-on-chip topological structure and chip

Technical Field

The present disclosure relates to the field of network-on-chip technologies, and in particular, to a method, an apparatus, and a chip for determining a network-on-chip topology.

Background

As the demand of a System on Chip (SoC) for low interconnection delay, high throughput and scalability is continuously increasing, the interconnection mode based on the bus is difficult to meet the performance demand of the SoC, and the Network on Chip (NoC) based on information exchange has gradually become an interconnection architecture for communication between different components in the SoC.

In the related art, when constructing the network-on-chip topology structure, a designer needs to manually construct the network-on-chip topology structure according to the connection relationship, and as the network topology structure becomes more and more complex, the problem of low efficiency of manually constructing the network topology becomes more and more obvious.

Disclosure of Invention

The embodiment of the disclosure at least provides a method, a device and a chip for determining a network-on-chip topological structure.

In a first aspect, an embodiment of the present disclosure provides a method for determining a network-on-chip topology, including:

acquiring first connection relations of a plurality of on-chip components of an on-chip system and attribute information of the plurality of on-chip components;

simplifying the first connection relation based on the attribute information of the on-chip components to obtain second connection relations corresponding to the on-chip components;

and adding a routing component for connecting the plurality of on-chip components in the system on chip based on the second connection relation to obtain the topological structures corresponding to the plurality of on-chip components.

In this way, the first connection relation is simplified into the second connection relation, and the routing components for connecting the plurality of on-chip components are added to the system on chip based on the second connection relation, so that the network-on-chip topological structure is determined more efficiently.

In a possible embodiment, the attribute information of the on-chip component includes a bandwidth requirement of the on-chip component, and/or an address space range accessible by the on-chip component.

In a possible implementation manner, when the attribute information of the on-chip component includes an address space range that can be accessed by the on-chip component, the simplifying the first connection relationship based on the attribute information of the plurality of on-chip components to obtain second connection relationships corresponding to the plurality of on-chip components includes:

clustering the on-chip components whose accessible address space ranges meet a first preset condition to obtain a first clustering result;

and determining second connection relations corresponding to the plurality of on-chip components based on the first clustering result.

In this way, the connection relation can be simplified by clustering along the address-space dimension, which speeds up the subsequent construction of the topological structure.

In a possible implementation manner, in a case that the attribute information of the on-chip component includes a bandwidth requirement of the on-chip component, the determining, based on the first clustering result, a second connection relationship corresponding to the plurality of on-chip components includes:

clustering the on-chip components in the first clustering result whose bandwidth requirements meet a second preset condition to obtain a second clustering result;

and determining a second connection relation corresponding to the plurality of on-chip components based on the second clustering result.

In this way, the first clustering result is clustered again in other dimensions, so that a plurality of on-chip components after clustering are similar in a plurality of dimensions, and the clustering effect is better.

In a possible implementation manner, when the attribute information of the on-chip component includes a bandwidth requirement of the on-chip component, the simplifying processing on the first connection relationship based on the attribute information of the plurality of on-chip components to obtain a second connection relationship corresponding to the plurality of on-chip components includes:

clustering the on-chip components whose bandwidth requirements meet a second preset condition to obtain a third clustering result;

and determining a second connection relation corresponding to the plurality of on-chip components based on the third clustering result.

In a possible implementation manner, in a case that the attribute information of the on-chip component includes an address space range accessible by the on-chip component, the determining, based on the third clustering result, the second connection relationship corresponding to the plurality of on-chip components includes:

clustering the on-chip components in the third clustering result whose accessible address space ranges meet the first preset condition to obtain a fourth clustering result;

and determining a second connection relation corresponding to the plurality of on-chip components based on the fourth clustering result.

In a possible embodiment, the adding, in the system on chip, a routing component for connecting the plurality of on-chip components based on the second connection relationship includes:

acquiring attribute information of the routing component; the attribute information comprises the maximum input quantity and the maximum output quantity of the routing component, wherein the maximum input quantity and the maximum output quantity are used for representing the quantity of on-chip components connected by the routing component;

determining the type of the routing component and the deployment position of the routing component based on the attribute information of the routing component and the second connection relation;

and adding a routing component in the system on chip according to the type of the routing component and the deployment position of the routing component.

In a possible embodiment, the method further comprises:

for any routing component in the topological structure, determining a candidate data link formed by that routing component and a data end; the data end is an on-chip component that sends data or an on-chip component that receives data;

determining first target routing components whose candidate data links are identical;

integrating the first target routing components based on the numbers of inputs and outputs connected to the first target routing components;

adjusting the topology based on the integrated first target routing component.

In this way, integrating the first target routing components reduces the number of routing components used and improves their utilization.
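The integration step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `merge_candidates`, `router_links`, `router_ports`, `max_inputs`, and `max_outputs` are hypothetical names, each routing component is represented by the set of data ends on its candidate data links, and a group is treated as mergeable only when the conservatively summed input and output counts fit within one routing component.

```python
from collections import defaultdict


def merge_candidates(router_links, router_ports, max_inputs, max_outputs):
    """Find groups of routing components that could be integrated.

    router_links: router name -> iterable of data ends on its candidate links.
    router_ports: router name -> (connected inputs, connected outputs).
    Assumption: a group is mergeable only if the summed port counts still
    fit within max_inputs / max_outputs of a single routing component.
    """
    groups = defaultdict(list)
    for router, ends in router_links.items():
        # Routers with identical candidate data links share the same key.
        groups[frozenset(ends)].append(router)

    merges = []
    for routers in groups.values():
        if len(routers) < 2:
            continue  # nothing to integrate
        total_in = sum(router_ports[r][0] for r in routers)
        total_out = sum(router_ports[r][1] for r in routers)
        if total_in <= max_inputs and total_out <= max_outputs:
            merges.append(sorted(routers))
    return sorted(merges)


links = {"r0": ["ip0", "mem"], "r1": ["mem", "ip0"], "r2": ["ip1", "mem"]}
ports = {"r0": (1, 1), "r1": (1, 1), "r2": (1, 1)}
print(merge_candidates(links, ports, max_inputs=4, max_outputs=4))
```

In this toy input, r0 and r1 serve the same data ends and are reported as one merge group, while r2 is left alone.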

In a possible embodiment, the method further comprises:

determining an input bit width and an output bit width of a routing component based on an initial clock frequency of the routing component and a bandwidth requirement of the on-chip component;

and distributing a clock domain for each routing component in the topological structure based on the input bit width and the output bit width of the routing component.

In this way, clock domains are distributed to the routing components based on their input bit widths and output bit widths, so that the distribution involves the fewest clock domain crossings, which in turn reduces the loss caused by clock domain crossing.

In one possible embodiment, the determining the input bit width and the output bit width of the routing component based on the initial clock frequency of the routing component and the bandwidth requirement of the on-chip component includes:

determining an output bit width of the on-chip component based on the bandwidth requirement of the on-chip component and a clock domain corresponding to the on-chip component; and determining a second target routing component directly connected to the on-chip component;

determining the input bit width of each second target routing component based on the bandwidth requirement of the first on-chip component connected with each second target routing component; and determining an output bit width of each second target routing component based on the initial clock frequency and the bandwidth requirement of the first on-chip component;

and determining the input bit width and the output bit width of other routing components except the second target routing component in the topological structure based on the output bit width of each second target routing component and the topological structure.

In one possible embodiment, the determining the output bit width of each second target routing component based on the initial clock frequency and the bandwidth requirement of the first on-chip component includes:

determining input bandwidth for each second target routing component based on bandwidth requirements of the first on-chip component;

determining an output bit width of each second target routing component based on the initial clock frequency and the input bandwidth of each second target routing component.
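The relationship between bandwidth, clock frequency, and bit width used above can be sketched as follows. This is a minimal sketch, assuming one transfer per clock cycle (so bandwidth = bit width × clock frequency) and assuming the input bandwidth of a routing component is the sum of the bandwidth requirements of the on-chip components connected to it; the function names are hypothetical and the source does not specify any rounding rule beyond what is shown.

```python
def required_bit_width(bandwidth_bps, clock_hz):
    """Smallest integer bit width that carries bandwidth_bps at clock_hz,
    assuming one transfer per clock cycle."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (bandwidth_bps + clock_hz - 1) // clock_hz


def router_output_bit_width(component_bandwidths_bps, initial_clock_hz):
    """Output bit width of a routing component directly connected to the
    given on-chip components (assumed: input bandwidth is their sum)."""
    input_bandwidth = sum(component_bandwidths_bps)
    return required_bit_width(input_bandwidth, initial_clock_hz)


# Two components needing 8 Gbit/s and 16 Gbit/s behind a 1 GHz router.
print(router_output_bit_width([8_000_000_000, 16_000_000_000], 1_000_000_000))
```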

In one possible embodiment, the allocating a clock domain to each routing component in the topology based on the input bit width and the output bit width of the routing component includes:

and distributing a clock domain for each routing component in the topological structure based on the topological structure and the input bit width and the output bit width of the routing component, such that the sum of the bit widths crossing clock domains under the distributed clock domains is minimized.

In this way, the sum of the bit widths crossing clock domains is minimized, so that the hardware resources required for transmitting data across clock domains are minimized.

In a possible implementation manner, the allocating a clock domain to each routing component in the topology based on the topology and the input bit width and the output bit width of the routing component includes:

determining at least one distribution combination to be screened based on the topological structure, wherein different distribution combinations are used for distributing different clock domains for the routing components;

and determining, based on the input bit width and the output bit width of the routing components, the sum of the bit widths crossing clock domains under each distribution combination, and taking the distribution combination with the minimum sum of bit widths as the clock domains distributed to the routing components in the topological structure.
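The screening of distribution combinations can be sketched as an exhaustive search. This is a minimal sketch under stated assumptions: the topological structure is given as a list of `(node, node, bit_width)` links, the on-chip components have fixed clock domains, and every routing component may take any of the candidate clock domains. All names (`crossing_width`, `best_clock_assignment`, `clk_a`, `clk_b`) are hypothetical.

```python
from itertools import product


def crossing_width(edges, domain_of):
    """Sum of bit widths of links whose two ends sit in different clock domains."""
    return sum(width for u, v, width in edges if domain_of[u] != domain_of[v])


def best_clock_assignment(edges, fixed_domains, routers, candidate_domains):
    """Try every combination of clock domains for the routers and keep the
    one with the smallest clock-domain-crossing bit width."""
    best = None
    for combo in product(candidate_domains, repeat=len(routers)):
        domain_of = dict(fixed_domains)
        domain_of.update(zip(routers, combo))
        cost = crossing_width(edges, domain_of)
        if best is None or cost < best[0]:
            best = (cost, dict(zip(routers, combo)))
    return best


# ip0 (clk_a) -> r0 -> r1 -> mem (clk_b), with the link bit widths shown.
edges = [("ip0", "r0", 64), ("r0", "r1", 16), ("r1", "mem", 32)]
fixed = {"ip0": "clk_a", "mem": "clk_b"}
cost, assignment = best_clock_assignment(edges, fixed, ["r0", "r1"], ["clk_a", "clk_b"])
print(cost, assignment)  # 16 {'r0': 'clk_a', 'r1': 'clk_b'}
```

The search grows exponentially with the number of routing components, which is one reason to reduce the number of combinations before screening.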

In a possible embodiment, the determining, based on the topology, at least one allocation combination to be filtered includes:

based on the topological structure, carrying out aggregation processing on the on-chip component and the routing component;

and determining at least one distribution combination to be screened based on the aggregation result of the on-chip component and the routing component.

In this way, by performing aggregation processing on the on-chip components and the routing components, the number of generated distribution combinations to be screened is less, and thus the distribution efficiency of the clock domains can be improved.

In one possible implementation, after distributing clock domains for the routing components in the topology, the method further includes:

re-determining the input bit width and the output bit width of the routing component based on the target clock frequency corresponding to the clock domain distributed to the routing component and the bandwidth requirement of the on-chip component;

re-determining a plurality of distribution combinations to be screened based on the topological structure and the re-determined input bit width and output bit width of the routing component, and determining the sum of the bit widths crossing clock domains under each distribution combination;

and determining, among the re-determined distribution combinations to be screened, a target distribution combination with the minimum sum of bit widths, and, in the case that the target distribution combination differs from the clock domains already distributed, returning to the step of re-determining the input bit width and the output bit width of the routing component based on the target distribution combination.

In this way, the distribution result of the clock domain is verified, and the distribution process of the clock domain is re-executed under the condition that the verification fails, so that the sum of the finally determined clock domain crossing bit widths is minimum.

In a possible embodiment, the method further comprises:

and in the case that the number of times the step is returned to and re-executed exceeds a preset number, stopping the loop and sending first alarm information.

Therefore, the designer can be reminded to adjust the topological structure under the condition that the topological structure is unreasonable.
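The verify-and-retry loop with a bounded number of rounds can be sketched as follows. This is a minimal sketch: `refine_clock_domains` and its two callback parameters are hypothetical names standing in for the re-determination of bit widths and the selection of the minimum-sum distribution combination, and the boolean result stands in for whether the first alarm information needs to be sent.

```python
def refine_clock_domains(initial_assignment, recompute_widths, best_assignment, max_rounds):
    """Repeat bit-width re-determination and clock-domain selection until the
    selected assignment stops changing or max_rounds is exhausted.

    Returns (assignment, converged). converged == False means the caller
    should send the first alarm information.
    """
    assignment = initial_assignment
    for _ in range(max_rounds):
        widths = recompute_widths(assignment)   # uses the target clock frequencies
        candidate = best_assignment(widths)     # minimum-sum combination
        if candidate == assignment:
            return assignment, True             # verification passed
        assignment = candidate                  # retry with the new assignment
    return assignment, False


# Toy callbacks: the "assignment" is an integer that settles at 3.
result, converged = refine_clock_domains(
    0, recompute_widths=lambda a: a, best_assignment=lambda w: min(w + 1, 3), max_rounds=10
)
print(result, converged)  # 3 True
```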

In a possible implementation manner, after adding a routing component for connecting the plurality of on-chip components in the system on chip based on the second connection relationship to obtain a topology corresponding to the plurality of on-chip components, the method further includes:

and verifying the input bandwidth and the output bandwidth of the routing component based on the bandwidth requirements of the on-chip components, and sending second alarm information in the case that the verification fails.

Therefore, by verifying the input bandwidth and the output bandwidth of each routing component in the topological structure, a designer can be reminded to adjust the bandwidth under the condition that the data bandwidth corresponding to the topological structure is unreasonable.
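One way such a bandwidth check could look is sketched below. This is a minimal sketch under stated assumptions: the capacity of a routing component is taken as bit width × clock frequency, the demand is taken as the sum of the bandwidth requirements of the on-chip components it serves, and the dictionary layout and the name `verify_router_bandwidth` are hypothetical. A non-empty result corresponds to the case in which the second alarm information would be sent.

```python
def verify_router_bandwidth(routers, component_demand_bps):
    """Return the names of routing components whose capacity is below demand.

    routers: name -> {"bit_width": int, "clock_hz": int, "serves": [component names]}
    component_demand_bps: component name -> bandwidth requirement in bit/s
    """
    failures = []
    for name, router in routers.items():
        capacity = router["bit_width"] * router["clock_hz"]
        demand = sum(component_demand_bps[c] for c in router["serves"])
        if capacity < demand:
            failures.append(name)
    return failures


routers = {"r0": {"bit_width": 32, "clock_hz": 1_000_000_000, "serves": ["ip0", "ip1"]}}
demand = {"ip0": 16_000_000_000, "ip1": 24_000_000_000}
print(verify_router_bandwidth(routers, demand))  # ['r0']
```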

In a possible embodiment, the method further comprises:

responding to an adding operation instruction of a target device, and adding the target device in the topological structure; wherein the target device comprises a first-in-first-out storage unit and/or a network rate adapter.

Therefore, the adjustment of the topological structure by a designer can be responded, and the data transmission performance of the topological structure is better.

In a second aspect, an embodiment of the present disclosure further provides an apparatus for determining a network-on-chip topology, including:

an acquisition module, configured to acquire first connection relations of a plurality of on-chip components of the system on chip and attribute information of the plurality of on-chip components;

the simplification module is used for simplifying the first connection relation based on the attribute information of the on-chip components to obtain second connection relations corresponding to the on-chip components;

and the adding module is used for adding a routing component for connecting the plurality of on-chip components in the system on chip based on the second connection relation to obtain the topological structures corresponding to the plurality of on-chip components.

In a third aspect, an embodiment of the present disclosure further provides a chip, including: an on-chip component and a routing component;

the network topology between the routing component and the on-chip component is determined based on the method for determining a network-on-chip topology according to the first aspect or any one of the possible embodiments of the first aspect.

In a fourth aspect, an embodiment of the present disclosure further provides a computer device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect described above, or any possible implementation of the first aspect.

In a fifth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, performs the steps in the first aspect or any one of the possible implementation manners of the first aspect.

For the description of the effects of the above-mentioned determining apparatus, chip, computer device and storage medium for network-on-chip topology, refer to the description of the above-mentioned determining method for network-on-chip topology, which is not described herein again.

In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments will be briefly described below, and the drawings herein incorporated in and forming a part of the specification illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It is appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope, for those skilled in the art will be able to derive additional related drawings therefrom without the benefit of the inventive faculty.

Fig. 1 is a flowchart illustrating a method for determining a network-on-chip topology according to an embodiment of the present disclosure;

Fig. 2 is a flowchart illustrating a specific method for obtaining a second connection relationship in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Fig. 3 is a flowchart illustrating another specific method for obtaining a second connection relationship in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Fig. 4 is a flowchart illustrating a specific method for adding a routing component in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Fig. 5a is a schematic diagram illustrating a splitting algorithm in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Fig. 5b is a schematic diagram illustrating a cascade relationship of routing components in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Fig. 5c is a schematic diagram illustrating another cascade relationship of routing components in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Fig. 5d is a schematic diagram illustrating another cascade relationship of routing components in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Fig. 6 is a flowchart illustrating a specific method for adjusting a topology in a method for determining a network-on-chip topology according to an embodiment of the present disclosure;

Fig. 7a shows a schematic diagram of a topology before adjustment in a method for determining a network-on-chip topology according to an embodiment of the present disclosure;

Fig. 7b shows a schematic diagram of an adjusted topology in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Fig. 8 is a flowchart illustrating a specific method for allocating clock domains to each routing component in a network-on-chip topology determination method according to an embodiment of the present disclosure;

Fig. 9 is a flowchart illustrating a specific method for determining an input bit width and an output bit width of a routing component in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Fig. 10 is a flowchart illustrating another specific method for allocating clock domains to each routing component in a topology in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Figs. 11a to 11d are schematic diagrams illustrating clock domains allocated to each routing component in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Fig. 12 is a flowchart illustrating a specific method for verifying a clock domain allocation result in the method for determining a network-on-chip topology according to the embodiment of the present disclosure;

Fig. 13 is a schematic architecture diagram illustrating an apparatus for determining a network-on-chip topology according to an embodiment of the present disclosure;

Fig. 14 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the group consisting of A, B, and C.

Research shows that when the network-on-chip topological structure is constructed, designers need to manually construct the network-on-chip topological structure according to the connection relation, and the problem of low efficiency of manually constructing the network topology is more and more obvious along with the more and more complex network topological structure.

Based on the research, the present disclosure provides a method, an apparatus, and a chip for determining a network-on-chip topology, which obtain first connection relationships of a plurality of on-chip components of a system-on-chip and attribute information of the plurality of on-chip components; based on the attribute information of the plurality of on-chip components, simplifying the first connection relation to obtain a second connection relation corresponding to the plurality of on-chip components, so that the efficiency is higher when determining the on-chip network topology structure by simplifying the first connection relation; and adding a routing component for connecting the plurality of on-chip components in the system on chip based on the second connection relation to obtain a topological structure corresponding to the plurality of on-chip components.

To facilitate understanding of the present embodiment, first, a method for determining a network-on-chip topology disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the method for determining a network-on-chip topology provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, and the computer device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, or a server or other processing device. In some possible implementations, the method for determining the network-on-chip topology may be implemented by a processor calling computer-readable instructions stored in a memory.

Referring to fig. 1, a flowchart of a method for determining a network-on-chip topology according to an embodiment of the present disclosure is shown, where the method includes steps S101 to S103, where:

S101: acquiring first connection relations of a plurality of on-chip components of the system on chip and attribute information of the plurality of on-chip components.

S102: simplifying the first connection relation based on the attribute information of the on-chip components to obtain second connection relations corresponding to the on-chip components.

S103: adding a routing component for connecting the plurality of on-chip components in the system on chip based on the second connection relation to obtain the topological structures corresponding to the plurality of on-chip components.

The following is a detailed description of the above steps.

For S101, the on-chip component represents a component in the system on chip, and may be a pre-designed circuit function module (i.e., an IP core), where the IP core includes a network interface that can be used for network communication on chip, so that the corresponding IP core can be called through the network interface of the IP core to implement a corresponding function; the first connection relation represents a connection relation between the on-chip components in the system on chip.

Specifically, the attribute information of the on-chip component includes a bandwidth requirement of the on-chip component and an address space range that the on-chip component can access; wherein,

the bandwidth requirement represents the minimum data transmission bandwidth required for normal operation of the on-chip component. Taking an on-chip component that is an IP core as an example, the bandwidth requirement of the IP core is the minimum bandwidth required to realize the corresponding circuit function; for example, if the circuit function of a certain IP core is to process 100 tasks, the bandwidth requirement of that IP core is the minimum bandwidth required to transmit the 100 tasks;

the address space range represents the range of address space accessible by the on-chip component (such as the address space range of an off-chip memory component that the on-chip component can access), and is composed of a base address and an offset range. Adding the minimum offset in the offset range to the base address gives the minimum access address accessible by the on-chip component, and adding the maximum offset in the offset range to the base address gives the maximum access address; the address space between the minimum access address and the maximum access address is the address space range accessible by the on-chip component. For example, if the base address of an IP core is X and the offset range is Y to Z (Y < Z), the minimum access address is X + Y, the maximum access address is X + Z, and the address space range accessible by the IP core is X + Y to X + Z.
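The base-plus-offset calculation described above can be written down directly. This is a minimal sketch; the class name `AddressWindow` and its field names are hypothetical, and the numeric values are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AddressWindow:
    base: int        # base address X
    min_offset: int  # smallest offset Y
    max_offset: int  # largest offset Z, with Y < Z

    def accessible_range(self) -> Tuple[int, int]:
        """Return (minimum access address, maximum access address)."""
        if self.min_offset >= self.max_offset:
            raise ValueError("min_offset must be smaller than max_offset")
        return (self.base + self.min_offset, self.base + self.max_offset)


window = AddressWindow(base=0x8000_0000, min_offset=0x100, max_offset=0x1FF)
low, high = window.accessible_range()
print(hex(low), hex(high))  # 0x80000100 0x800001ff
```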

In a possible implementation manner, in a case that the attribute information of the on-chip component includes an address space range accessible by the on-chip component, as shown in fig. 2, the second connection relationships corresponding to the plurality of on-chip components may be obtained by:

S201: clustering the on-chip components whose accessible address space ranges meet a first preset condition to obtain a first clustering result.

S202: and determining second connection relations corresponding to the plurality of on-chip components based on the first clustering result.

Here, when clustering is performed according to the accessible address space, on-chip components whose address spaces are similar (that is, the two address spaces partially overlap) or identical may be clustered.

Illustratively, taking the address space corresponding to IP core 1 as A to C, the address space corresponding to IP core 2 as B to D, and A < B < C < D as an example, the address spaces corresponding to IP core 1 and IP core 2 have an overlapping portion B to C, so IP core 1 and IP core 2 can be clustered, and the address space corresponding to the clustered IP cores is A to D.
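The overlap-based clustering in this example can be sketched as an interval merge. This is a minimal sketch, assuming the first preset condition is simply "the inclusive address ranges overlap"; the function name `cluster_by_address_overlap` and the numeric ranges are hypothetical.

```python
def cluster_by_address_overlap(ranges):
    """Group components whose (low, high) address ranges overlap.

    ranges: component name -> inclusive (low, high) address pair.
    Returns a list of (member names, merged (low, high) range).
    """
    clusters = []
    # Visit ranges in ascending order of their low address.
    for name, (low, high) in sorted(ranges.items(), key=lambda kv: kv[1]):
        if clusters and low <= clusters[-1][1][1]:
            # Overlaps the current cluster: extend it.
            members, (c_low, c_high) = clusters[-1]
            clusters[-1] = (members + [name], (c_low, max(c_high, high)))
        else:
            clusters.append(([name], (low, high)))
    return clusters


# IP core 1 covers A..C = 0..20, IP core 2 covers B..D = 10..30.
ranges = {"ip1": (0, 20), "ip2": (10, 30), "ip3": (40, 50)}
print(cluster_by_address_overlap(ranges))
# [(['ip1', 'ip2'], (0, 30)), (['ip3'], (40, 50))]
```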

Further, after clustering the on-chip components, a simplified connection relationship may be determined according to the clustered on-chip components.

Illustratively, IP core A₁ and IP core A₂ are each connected to IP core B. After IP core A₁ and IP core A₂ are clustered into IP core A, the single connection between IP core A and IP core B achieves the same connection effect as before clustering, thereby simplifying the connection relationship.
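The effect of clustering on the connection relationship can be sketched by rewriting each connection in terms of cluster names. This is a minimal sketch; `simplify_connections` and `cluster_of` are hypothetical names, and dropping connections that end up inside a single cluster is an assumption not stated in the source.

```python
def simplify_connections(edges, cluster_of):
    """Rewrite (source, destination) connections using cluster names and
    remove the duplicates that clustering creates."""
    simplified = set()
    for src, dst in edges:
        s = cluster_of.get(src, src)  # unclustered components keep their name
        d = cluster_of.get(dst, dst)
        if s != d:  # assumption: connections inside one cluster are dropped
            simplified.add((s, d))
    return sorted(simplified)


edges = [("A1", "B"), ("A2", "B")]
cluster_of = {"A1": "A", "A2": "A"}
print(simplify_connections(edges, cluster_of))  # [('A', 'B')]
```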

In a possible implementation manner, in a case that the attribute information of the on-chip component includes a bandwidth requirement of the on-chip component, as shown in fig. 3, the second connection relationship may be further determined according to the following steps:

S2021: clustering the on-chip components in the first clustering result whose bandwidth requirements meet a second preset condition, to obtain a second clustering result.

S2022: determining the second connection relations corresponding to the plurality of on-chip components based on the second clustering result.

Here, when clustering is performed according to the bandwidth requirement, on-chip components with similar bandwidth requirements may be clustered, where the conditions for determining that bandwidth requirements are similar include: the absolute value of the difference between the bandwidth requirements of the two on-chip components is smaller than a first preset value, and the quotient of the larger bandwidth requirement divided by the smaller bandwidth requirement is smaller than a second preset value.

Illustratively, take the case where, in the first clustering result obtained after clustering based on the address space range, the bandwidth requirements of IP core A₁, IP core A₂ and IP core A₃ are 1 Mbps, 5 Mbps and 6 Mbps in sequence. Because the absolute value of the difference between the bandwidth requirements of IP core A₂ and IP core A₃ is less than a first preset value N, it can be determined that their bandwidth requirements are similar, so IP core A₂ and IP core A₃ can be clustered, while IP core A₁ is not clustered.
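The bandwidth-similarity test can be sketched as follows. Several points are assumptions rather than part of the disclosure: both conditions are required together, the threshold values (2 Mbps and a ratio of 2) are invented for the example, bandwidths are assumed positive, and components are grouped greedily in ascending bandwidth order.

```python
def similar(bw_a, bw_b, max_diff, max_ratio):
    """Assumed similarity test: small absolute difference AND small ratio."""
    small, large = sorted((bw_a, bw_b))
    return (large - small) < max_diff and (large / small) < max_ratio


def cluster_by_bandwidth(bandwidths, max_diff, max_ratio):
    """Greedy grouping (an assumption): walk components in ascending bandwidth
    and join a component to the current group if it is similar to the group's
    most recently added member."""
    groups = []
    for name, bw in sorted(bandwidths.items(), key=lambda kv: kv[1]):
        if groups and similar(groups[-1][-1][1], bw, max_diff, max_ratio):
            groups[-1].append((name, bw))
        else:
            groups.append([(name, bw)])
    return [[name for name, _ in group] for group in groups]


# Bandwidths in Mbps from the example; thresholds are hypothetical.
print(cluster_by_bandwidth({"A1": 1, "A2": 5, "A3": 6}, max_diff=2, max_ratio=2))
```

With these hypothetical thresholds, A₂ and A₃ form one group and A₁ remains on its own, matching the example.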

It should be noted that, when the attribute information of the on-chip components includes both the bandwidth requirement and the accessible address space range, clustering may first be performed according to the address space range and then, on the basis of that clustering result, according to the bandwidth requirement; alternatively, clustering may first be performed according to the bandwidth requirement and then, on the basis of that clustering result, according to the address space range. That is, the clustering order does not affect the clustering result.

Specifically, under the condition that the attribute information of the on-chip components includes the bandwidth requirements of the on-chip components, when the first connection relationship is simplified based on the attribute information of the plurality of on-chip components to obtain second connection relationships corresponding to the plurality of on-chip components, the on-chip components whose bandwidth requirements meet the second preset condition may be clustered first to obtain a third clustering result; and determining a second connection relation corresponding to the plurality of on-chip components based on the third clustering result.

In a possible implementation manner, when the attribute information of the on-chip component includes an address space range that can be accessed by the on-chip component, and when a second connection relationship corresponding to the plurality of on-chip components is determined based on the third clustering result, the on-chip components whose address space ranges that can be accessed meet the first preset condition in the third clustering result may be clustered to obtain a fourth clustering result; and determining a second connection relation corresponding to the plurality of on-chip components based on the fourth clustering result.

Here, the routing component is a component that provides a network routing service for the on-chip components, and may be a data selector (multiplexer) or a router.

The data selector has a plurality of input terminals and an output terminal, and in use, one input terminal can be selected from the plurality of input terminals for data transmission with the output terminal, for example, a 4-input 1-output (abbreviated as 4to1) data selector can select 1 input terminal from the 4 input terminals for data transmission with the output terminal.

The router has one input terminal and a plurality of output terminals, and in use, one output terminal can be selected from the plurality of output terminals for data transmission with the input terminal. For example, a 1-input 4-output (abbreviated as 1to4) router can select 1 output terminal from the 4 output terminals for data transmission with the input terminal.

In one possible implementation, as shown in fig. 4, a routing component may be added according to the following steps:

S301: acquiring attribute information of the routing component; the attribute information includes a maximum input number and a maximum output number of the routing component, where the maximum input number and the maximum output number represent the number of on-chip components and other routing components that the routing component can connect.

Here, the maximum number of inputs and the maximum number of outputs may be set by a designer according to design requirements, for example, the maximum number of inputs is 5, which means that the data selector can connect 5 on-chip components or other routing components at most. For another example, the maximum output number is 4, which means that the router can connect 4 on-chip components or other routing components at most.

S302: determining the type of the routing component and the deployment position of the routing component based on the attribute information of the routing component and the second connection relation.

Specifically, when determining the type of the routing component and the deployment position of the routing component, for any data path in the second connection relationship, the type and the number of routing components inserted into the data path are determined based on the number of on-chip components at the input end and the number of on-chip components at the output end of the data path.

Wherein, any data path is a data transmission channel including at least one data link, and may include a plurality of data links, and an input end of each data link belongs to one class cluster (data path) and an output end of each data link belongs to one class cluster (data path).

Here, for any data path in the second connection relationship, the following four cases can be distinguished based on the number of on-chip components at the input end and the number of on-chip components at the output end of the data path:

Case 1, the number of on-chip components that need to be connected to the input end of the routing component does not exceed the maximum input number, and the number of on-chip components that need to be connected to the output end of the routing component does not exceed the maximum output number.

At this time, the number of input ends of the routing component may be determined as the number of on-chip components that need to be connected to the input end. For example, if the number of on-chip components that need to be connected to the input end is N, where N is greater than 1 and less than M, and M is the maximum input number, the routing component may be determined to be an Nto1 data selector. The deployment position is the output end of the on-chip components that need to be connected, and the routing component is connected to the on-chip components by connecting the input end of the routing component directly to the output ends of the on-chip components.

For example, taking the maximum input number of the routing component as 5, when the number of on-chip components that need to be connected to the input end of a certain routing component is 4, the type of the routing component may be determined to be a 4to1 data selector.

Furthermore, the number of the output ends of the routing components can be determined as the number of the on-chip components required to be connected with the output ends, the deployment positions are the input ends of the on-chip components required to be connected, and the connection mode of the routing components and the on-chip components is that the output ends of the routing components are directly connected with the input ends of the on-chip components.

Taking the maximum output number of the routing component as 5 as an example, when the number of on-chip components required to be connected at the output end of a certain routing component is 3, the type of the routing component can be determined to be a router of 1to3, the routing components on the data path are a data selector of 4to1 (on-chip components connected with 4 input ends) and a router of 1to3 (on-chip components connected with 3 output ends), and the output end of the data selector of 4to1 is connected with the input end of the router of 1to 3.

Case 2, the number of on-chip components that need to be connected at the routing component input exceeds the maximum input number, but the number of on-chip components that need to be connected at the routing component output does not exceed the maximum output number.

Here, the one routing component that would originally be connected at this position may be split into a plurality of routing components having a cascade relationship, according to the number of on-chip components that need to be connected at the input end of the routing component, the maximum input number, and a preset splitting algorithm.

Specifically, the splitting algorithm may be as shown in fig. 5a, where the number of on-chip components that need to be connected to the input end of the routing component is denoted as K, the maximum input number is denoted as N (generally, N is greater than or equal to 3), and K is greater than N. When executed, the splitting algorithm includes the following steps:

Step 1, determining whether K is divisible by N.

Here, the calculation may start from N: the quotient of K divided by N is taken as M and the remainder as P. When K is divisible by N (that is, P is 0), step 2a is performed; when K is not divisible by N, step 2b is performed.

Step 2a, determining whether M is greater than N.

If M is less than or equal to N, step 3 is executed; if M is greater than N, M is taken as K and step 1 is executed again.

Step 2b, determining whether M + 1 is greater than N.

If M + 1 is less than or equal to N, step 3 is executed; if M + 1 is greater than N, M + 1 is taken as K and step 1 is executed again.

Step 3, determining the to-be-screened routing combination obtained after splitting according to the quotients, divisors and corresponding remainders obtained in the above steps, where the to-be-screened routing combination includes the type and the number of each routing component.

Here, the divisor at the time of executing step 1 for the first time may be determined as the number of connections of the input end of the routing component of the first type in the first layer in the cascade relationship (herein, referred to as the number of input ends of routing component 1); determining the quotient obtained when the step 1 is executed for the first time as the number of the routing components 1 in the cascade relation; and determining the remainder of the first execution of the step 1 as the connection number of the input ends of the second type routing components of the first layer in the cascade relation (here, the number is referred to as the number of the input ends of the routing components 2), wherein the number of the routing components 2 is 1.

Similarly, the divisor when step 1 is executed for the second time may be determined as the number of input-end connections of the first-type routing component at the second layer in the cascade relationship (here referred to as the number of input ends of routing component 3); the quotient obtained when step 1 is executed for the second time is determined as the number of routing components 3 in the cascade relationship; and the remainder of the second execution of step 1 is determined as the number of input-end connections of the second-type routing component at the second layer in the cascade relationship (here referred to as the number of input ends of routing component 4), where the number of routing components 4 is 1. These steps are executed iteratively until the types and numbers of the routing components of every layer except the last layer of the cascade relationship are determined. If the remainder of the last operation is not 0, the number of input-end connections of the routing component at the last layer is the quotient of the last operation plus 1; if the remainder of the last operation is 0, it is the quotient of the last operation. In both cases the number of routing components at the last layer is 1.

It can be seen that the number of layers of routing components in the finally obtained cascade relationship is the number of times step 1 is executed plus 1.

For example, taking the maximum input number of the routing component as 5, when the number of on-chip components and other routing components that need to be connected to the input end of a certain routing component is 32, first compute 32 ÷ 5 = 6 remainder 2; since 6 + 1 is greater than 5, next compute 7 ÷ 5 = 1 remainder 2; at this point 1 + 1 is smaller than 5, and the operation ends.

At this time, the cascade relationship of the routing components may be as shown in fig. 5b. The divisor 5 of the first operation determines that the first type of routing component in the first layer of the cascade relationship is a 5to1 data selector, and the quotient 6 of the first operation determines that the number of first-type routing components is 6; the remainder 2 of the first operation determines that the second type of routing component is a 2to1 data selector, and the number of second-type routing components is 1. The routing components of the first layer in the cascade relationship are therefore 6 5to1 data selectors and 1 2to1 data selector. In the same way, from the divisor, quotient and remainder of the second operation, the routing components of the second layer are 1 5to1 data selector and 1 2to1 data selector, and the routing component of the last layer (third layer) is 1 2to1 data selector.
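Steps 1 to 3 can be sketched as a short loop. This is a minimal sketch, not the disclosed implementation: the function name and the representation of a layer as a list of (number of inputs, number of components) pairs are hypothetical, and the sketch assumes K is greater than N and N is at least 2.

```python
def split_layers(k, n):
    """Split a k-input selection into cascaded data selectors of at most n inputs.

    Returns a list of layers; each layer is a list of (fan_in, count) pairs.
    Assumes k > n >= 2, as in the description of the splitting algorithm.
    """
    layers = []
    while True:
        m, p = divmod(k, n)            # step 1: quotient M, remainder P
        layer = [(n, m)]               # M first-type components with N inputs
        if p:
            layer.append((p, 1))       # one second-type component with P inputs
        layers.append(layer)
        k = m + 1 if p else m          # steps 2a / 2b: outputs feeding the next layer
        if k <= n:
            layers.append([(k, 1)])    # last layer: a single component
            return layers


for depth, layer in enumerate(split_layers(32, 5), start=1):
    print(depth, layer)
```

For K = 32 and N = 5 the sketch yields three layers: 6 five-input and 1 two-input selectors, then 1 five-input and 1 two-input selectors, then a single two-input selector.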

Step 4, updating N in the above steps with each positive integer in the interval [2, N) in turn, until the to-be-screened routing combinations corresponding to all positive integers in the interval [2, N] are obtained.

Taking the maximum input number of the routing component as 5 as an example, after the to-be-screened routing combination corresponding to N = 5 is obtained according to the above calculation, N may be updated to 4 and the calculation repeated: 32 ÷ 4 = 8; since 8 is greater than 4, next compute 8 ÷ 4 = 2; at this point 2 is smaller than 4, and the calculation ends.

At this time, the cascade relationship of the routing components may be as shown in fig. 5c. The divisor 4 of the first operation determines that the first type of routing component in the first layer of the cascade relationship is a 4to1 data selector, and the quotient 8 of the first operation determines that the number of first-type routing components is 8, so the routing components of the first layer in the cascade relationship are 8 4to1 data selectors. In the same way, from the divisor, quotient and remainder of the second operation, the routing components of the second layer are 2 4to1 data selectors, and the routing component of the last layer (third layer) is 1 2to1 data selector.

Further, the to-be-screened routing combinations corresponding to N = 3 and N = 2 are determined according to the above steps, so that 4 to-be-screened routing combinations corresponding to N = 5, 4, 3 and 2 are obtained.

Step 5, determining a target routing combination from the plurality of to-be-screened routing combinations based on a preset screening rule.

Here, when determining the target routing combination from the plurality of routing combinations to be screened, the determination may be performed based on a sum of remainders corresponding to the routing combinations to be screened, and/or a sum of quotients.

In a possible implementation manner, for each N, the remainders and quotients obtained while executing step 1 multiple times for the corresponding to-be-screened routing combination may be determined, together with the sum of the remainders and the sum of the quotients; the to-be-screened routing combinations with the smallest sum of remainders among the plurality of to-be-screened routing combinations are determined as first to-be-screened routing combinations; and the first to-be-screened routing combination with the smallest sum of quotients is determined as the target routing combination.

Specifically, a remainder in the operation process represents the type of the second-type routing component used in the cascade relationship. To ensure fairness among different input ends and output ends, the smaller the remainder the better: a smaller remainder is fairer, and a remainder of 0 indicates that no second-type routing component exists in the corresponding layer of the cascade relationship, so that only first-type routing components are used and fairness among the input ends and output ends is highest. A quotient in the operation process represents the number of routing components of a certain type used in the cascade relationship, and the sum of the quotients represents the number of routing components in the whole to-be-screened routing combination; the smallest sum of quotients therefore means that the fewest routing components are used.

Following the above example, when N is 5, the quotients in the operation process are 6 and 1, and the remainders are 2 and 2; when N is 4, the quotients are 8 and 2, and the remainders are 0; when N is 3, the quotients are 10, 3 and 1, and the remainders are 2, 2 and 1; when N is 2, the quotients are 16, 8, 4 and 2, and the remainders are 0.

Thus, when N is 5, the sum of the quotients is 7 and the sum of the remainders is 4; when N is 4, the sum of the quotients is 10 and the sum of the remainders is 0; when N is 3, the sum of the quotients is 14 and the sum of the remainders is 5; when N is 2, the sum of the quotients is 30 and the sum of the remainders is 0. At this time, two first to-be-screened routing combinations, N = 4 and N = 2, may be determined according to the minimum sum of the remainders, and then N = 4 may be determined as the target routing combination according to the minimum sum of the quotients.
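The screening in steps 4 and 5 can be sketched as follows. The function names are hypothetical, the sketch assumes K is greater than the maximum input number, and the final tie-break (preferring the smaller N when both sums are equal) is an assumption, since the disclosure does not state one.

```python
def division_trace(k, n):
    """Repeat step 1 until the next layer fits in n inputs.

    Returns (quotients, remainders) for all executions of step 1.
    Assumes k > n >= 2.
    """
    quotients, remainders = [], []
    while True:
        m, p = divmod(k, n)
        quotients.append(m)
        remainders.append(p)
        k = m + 1 if p else m
        if k <= n:
            return quotients, remainders


def choose_fan_in(k, max_inputs):
    """Pick N in [2, max_inputs]: smallest remainder sum, then smallest quotient sum."""
    candidates = []
    for n in range(max_inputs, 1, -1):
        quotients, remainders = division_trace(k, n)
        candidates.append((sum(remainders), sum(quotients), n))
    # Tuple ordering applies the two screening criteria in sequence;
    # the last element breaks remaining ties toward smaller n (an assumption).
    return min(candidates)[2]


print(choose_fan_in(32, 5))
```

For K = 32 and a maximum input number of 5, the candidates have (remainder sum, quotient sum) of (4, 7), (0, 10), (5, 14) and (0, 30) for N = 5, 4, 3 and 2, so N = 4 is selected.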

Taking the maximum output number of the routing component as 5 as an example, when the number of on-chip components required to be connected at the output end of a certain routing component is 3, the type of the routing component can be determined to be a router of 1to3, the routing components on the data path are a plurality of data selectors and a router of 1to3 (on-chip components connected with 3 output ends) in a cascade relationship as shown in fig. 5c, and the output end of the data selector of 2to1 in fig. 5c is connected with the input end of the router of 1to 3.

Case 3, the number of on-chip components that need to be connected at the input end of the routing component does not exceed the maximum input number, and the number of on-chip components that need to be connected at the output end of the routing component exceeds the maximum output number.

Here, the number of the input ends of the routing component may be determined as the number of the on-chip components required to be connected with the input ends, the deployment position may be the output end of the on-chip component required to be connected, and the connection manner between the routing component and the on-chip component is that the input end of the routing component is directly connected with the output end of the on-chip component.

Then, according to the number of on-chip components that need to be connected at the output end of the routing component, the maximum output number, and a preset splitting algorithm, the one routing component that would originally be connected is split into a plurality of routing components having a cascade relationship. Here, the number of on-chip components to be connected at the output end is substituted for K in case 2 and the maximum output number is substituted for N in case 2 (generally, N is greater than or equal to 3, and K is greater than N); steps 1 to 5 in case 2 are then executed in sequence to obtain the types and connection modes of the plurality of routing components having the cascade relationship after splitting.

For example, taking the maximum output number of the routing component as 5, when the number of on-chip components and other routing components that need to be connected to the output end of a certain routing component is 7, compute 7 ÷ 5 = 1 remainder 2; since 1 + 1 is less than 5, the operation ends. The order of determination is opposite to that in case 2 above: it starts from the quotient of the last operation and proceeds back to the quotient and remainder of the first operation. It can thus be determined that, in the cascade relationship, the first layer is 1 1to2 router and the second layer is 1 1to5 router and 1 1to2 router.
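The reversed layer order on the output side can be sketched as follows. This is an illustrative sketch only: the function name and the (number of outputs, number of routers) layer format are hypothetical, and the sketch assumes K is greater than N and N is at least 2.

```python
def split_outputs(k, n):
    """Split a k-output fan-out into cascaded routers of at most n outputs.

    Layers are listed from the data source toward the receivers, which is the
    reverse of the order in which the divisions are carried out.
    Each layer is a list of (fan_out, count) pairs. Assumes k > n >= 2.
    """
    layers = []
    while True:
        m, p = divmod(k, n)
        layers.append([(n, m)] + ([(p, 1)] if p else []))
        k = m + 1 if p else m
        if k <= n:
            layers.append([(k, 1)])
            return layers[::-1]        # reverse: last operation becomes the first layer


print(split_outputs(7, 5))
```

For 7 outputs and a maximum output number of 5, the sketch gives a first layer with one 2-output router and a second layer with one 5-output and one 2-output router.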

Case 4, the number of on-chip components that need to be connected at the routing component input exceeds the maximum input number, and the number of on-chip components that need to be connected at the routing component output also exceeds the maximum output number.

Here, according to the numbers of on-chip components that need to be connected at the input end and at the output end of the routing component, the maximum input number, the maximum output number, and a preset splitting algorithm, the one routing component that would originally be connected is split into a plurality of routing components having a cascade relationship. For the splitting algorithm, refer to the descriptions in case 2 and case 3: the operation process for the input end may refer to case 2, and the operation process for the output end may refer to case 3.

For example, taking both the maximum input number and the maximum output number of the routing component as 5, when the number of on-chip components and other routing components to be connected at the input end of a certain routing component is 32 and the number at the output end is 7, the cascade relationship of the routing components may be as shown in fig. 5d: the first layer is 8 4to1 data selectors, the second layer is 2 4to1 data selectors, the third layer is 1 2to1 data selector, the fourth layer is 1 1to2 router, and the fifth layer is 1 1to5 router and 1 1to2 router. For the determination of the routing components in each layer, refer to the examples in case 2 and case 3, which are not repeated here.

S303: adding the routing component in the system on chip according to the type of the routing component and the deployment position of the routing component.

After the type and the deployment position of the routing component are obtained according to the above steps, the routing component can be added in the system on chip accordingly, so that the topological structures corresponding to the plurality of on-chip components are obtained.

In practical application, after the topological structures corresponding to the plurality of on-chip components are obtained, the topological structures may be adjusted to better meet practical requirements. For example, the topological structure may be simplified to reduce the number of routing components used and improve their utilization; or a target device may be added to the topological structure in response to a designer's operation instruction for adding the target device, so as to improve the data transmission efficiency of the network topology.

In a possible implementation, after obtaining the topology corresponding to the on-chip component, as shown in fig. 6, the topology may be adjusted according to the following steps:

S401: for any routing component in the topological structure, determining the candidate data links formed by the routing component and data ends; a data end is an on-chip component for data sending or an on-chip component for data receiving.

Illustratively, take a topology in which the on-chip component for data sending is IP core A, the routing component is routing component B, and the on-chip components for data receiving are IP core C₁ and IP core C₂. The candidate data links formed by the routing component and the data ends are then A-B, B-C₁ and B-C₂.

S402: determining first target routing components whose candidate data links are identical.

Here, when determining the first target routing components, routing components in data links from the same on-chip component (that is, the data ends at the input ends are the same) or in data links to the same on-chip component (that is, the data ends at the output ends are the same) may be determined as first target routing components. In one embodiment, first target routing components whose corresponding candidate data links are the same may be understood as routing components connected to the same on-chip component directly, or connected to the same on-chip component through the same routing component.

For example, the topology before adjustment may be as shown in fig. 7a: there are 10 IP cores at the data sending end and 12 IP cores at the data receiving end, and the data path is formed by 10 1to2 routers, a 10to8 crossbar switch matrix and a 10to4 crossbar switch matrix. The 10to8 crossbar switch matrix and the 10to4 crossbar switch matrix are each connected, through the same 10 1to2 routers, to the same 10 IP cores, so their candidate data links from the same on-chip components are identical.
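One way to find routing components with identical input-side candidate data links is to group them by the set of components feeding them. This is a sketch under assumptions: the function name, the dictionary input format and the component names are hypothetical, and only the input-side comparison is shown.

```python
from collections import defaultdict


def find_merge_candidates(upstream):
    """Group routing components that are fed by exactly the same set of sources.

    upstream: dict mapping a routing component name to an iterable of the
    components connected to its input end (hypothetical representation).
    Returns only groups with more than one member.
    """
    groups = defaultdict(list)
    for component, sources in upstream.items():
        groups[frozenset(sources)].append(component)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Sample data loosely modelled on fig. 7a: both crossbars are fed by the same
# ten 1to2 routers; "mux_x" is an unrelated component added for contrast.
routers = [f"router{i}" for i in range(10)]
upstream = {"xbar_10to8": routers, "xbar_10to4": routers, "mux_x": ["ip_z"]}
print(find_merge_candidates(upstream))
```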

The crossbar switch matrix is formed from data selectors and routers, which makes the connection relationship in the topology more concise; for example, a 10to8 crossbar switch matrix may be formed by 1 10to1 data selector and 1 1to8 router. A 10to8 crossbar switch matrix or a 10to4 crossbar switch matrix is generally packaged as an integral component, and may thus be regarded as a single routing component.

S403: integrating the first target routing components based on the numbers of inputs and outputs connected to the first target routing components.

In practical applications, in order to reduce the number of routing components in the topology, the first target routing component may be integrated. Specifically, since the candidate data links connected to the plurality of first target routing components are completely the same, the plurality of first target routing components may be merged, and the first target routing components after merging may still ensure that the connection relationship between the data ends before merging does not change by adjusting the topology.

Illustratively, if the candidate data links corresponding to the first target routing component a and the first target routing component B are both data links 1, the first target routing component a and the first target routing component B may be merged, and the data links 1 may be directly connected to the merged routing components, so that the number of routing components may be reduced on the premise that normal data transmission between data ends can be ensured.

Specifically, when the first target routing components are integrated according to their numbers of inputs and outputs, the following 2 cases can be distinguished:

Case 1, the first target routing components have the same number of inputs and different numbers of outputs.

In this case, the input-side data paths of the first target routing components are the same and the output-side data paths are different. When integrating the first target routing components, similar to the role of step 5 in case 2 of S302 above, fairness among different input ends and output ends requires that the routing components added after integration all be of the same type (that is, the division is exact and the remainder is 0, at which point fairness is highest), so a divisor that divides exactly needs to be determined. To further reduce the use of routing components, the divisor may be chosen as the greatest common divisor, so that the number of routing components is reduced by increasing their numbers of inputs and outputs. Therefore, the greatest common divisor of the output numbers of the different first target routing components and the sum of the output numbers may be determined, and the first target routing components after integration are determined based on the greatest common divisor and the sum of the output numbers.

Specifically, the input number is the input number of the first target routing component after integration, and the routing component connected to the input end of the first target routing component needs to be deleted due to the integration of the first target routing component; the greatest common divisor is the output quantity of the first target routing component after integration; and the quotient of the sum of the output quantities and the greatest common divisor is the output quantity of the first target routing components which need to be newly added after integration, and the greatest common divisor is the quantity of the first target routing components which need to be newly added.

Taking the above example as an example, as shown in fig. 7a, the first destination routing components with the same number of inputs and different numbers of outputs, i.e., the crossbar matrix of 10to8 and the crossbar matrix of 10to4, are the same number of inputs, and it can be determined that the number of inputs of the first destination routing components after the integration is 10 by integrating the crossbar matrix of 10to8 and the crossbar matrix of 10to4 according to the same candidate data link on their left side and the number of inputs; a greatest common divisor of 8 and 4 of 4 may determine that the number of outputs of the first target routing element after the integration is 4, i.e., the first target routing element after the integration includes a crossbar of 10to 4; the sum of the output numbers of the crossbar switch matrix 10to8 and the crossbar switch matrix 10to4 is 12, the greatest common divisor is 4, the quotient of the sum of the output numbers and the greatest common divisor is 3, and it can be determined that the output number of the first target routing component that needs to be newly added after the integration is 3, and the number of the first target routing component that needs to be newly added is 4, that is, 4 routers 1to3 need to be added.
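The greatest-common-divisor calculation for this case can be sketched as follows. The function name and the returned dictionary layout are hypothetical, and the sketch covers only output numbers that share a useful common divisor; it does not handle cases where the splitting algorithm is used instead.

```python
from functools import reduce
from math import gcd


def integrate_same_inputs(num_inputs, output_counts):
    """Merge crossbars that share the same inputs but differ in output count.

    Returns the merged crossbar as (inputs, outputs) and the newly added
    routers as (count, fan_out), following the GCD rule described above.
    """
    divisor = reduce(gcd, output_counts)      # outputs of the merged crossbar
    total_outputs = sum(output_counts)
    return {
        "crossbar": (num_inputs, divisor),
        "routers": (divisor, total_outputs // divisor),
    }


# 10to8 and 10to4 crossbars from the example above.
print(integrate_same_inputs(10, [8, 4]))
```

For the 10to8 and 10to4 crossbar switch matrices, the sketch returns a merged 10to4 crossbar followed by 4 routers with 3 outputs each.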

As another example, take the first target routing components to be a 10to7 crossbar and a 10to4 crossbar. When performing the integration, the input number of the integrated first target routing component is 10 (as shown in fig. 7a to fig. 7b, 10 1to2 routers are removed after the integration). From the output numbers 7 and 4, the sum of the output numbers is 11. In this case the output number of the integrated first target routing component may be determined according to the splitting algorithm described above; for example, it may be determined to be 4, and the first target routing components that need to be newly added are then 3 1to3 routers and 1 1to2 router.
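The greatest-common-divisor rule of case 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the names `Crossbar` and `integrate_same_inputs` are hypothetical and do not come from the disclosure, and the sketch covers only the evenly divisible situation (such as 8 and 4). When the greatest common divisor is too small to be useful (such as 7 and 4), the text above falls back to the splitting algorithm, which is not modelled here.

```python
from dataclasses import dataclass
from functools import reduce
from math import gcd
from typing import List, Tuple


@dataclass(frozen=True)
class Crossbar:
    """A routing component with a fixed number of inputs and outputs."""
    inputs: int
    outputs: int


def integrate_same_inputs(
    inputs: int, output_counts: List[int]
) -> Tuple[Crossbar, List[Crossbar]]:
    """Merge crossbars that share `inputs` but differ in output count.

    Returns the integrated crossbar and the routers that must be added
    behind it. Hypothetical helper, GCD case only.
    """
    common = reduce(gcd, output_counts)   # outputs of the integrated crossbar
    total = sum(output_counts)            # outputs that must still be reachable
    fanout = total // common              # outputs of each newly added router
    merged = Crossbar(inputs, common)
    new_routers = [Crossbar(1, fanout)] * common
    return merged, new_routers


merged, routers = integrate_same_inputs(10, [8, 4])
print(merged, len(routers), routers[0])
```

For the 10to8 and 10to4 crossbars this yields one 10to4 crossbar followed by four 1to3 routers, matching the example above. Case 2 is symmetric, with the roles of inputs and outputs exchanged.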

In case 2, the first target routing component has different input numbers and the same output number.

In this case, the input-side data paths of the first target routing components are different and the output-side data paths are the same. When integrating the first target routing components, the greatest common divisor of the input numbers of the different first target routing components and the sum of those input numbers may be determined, and the integrated first target routing component may be determined based on the greatest common divisor and the sum of the input numbers.

Specifically, the output number is the output number of the first target routing component after integration, and the routing component connected to the output end of the first target routing component needs to be deleted due to the integration of the first target routing component; the greatest common divisor is the input quantity of the first target routing component after integration; the quotient of the sum of the input numbers and the greatest common divisor is the input number of the first target routing components that need to be newly added after the integration, and the greatest common divisor is the number of the first target routing components that need to be newly added, and related examples refer to related examples in case 1, which are not described herein again.

S404: adjusting the topology based on the integrated first target routing component.

For example, the adjusted topology may be as shown in fig. 7b: there are 10 IP cores at the data sending end and 12 IP cores at the data receiving end, and the data path is formed by one 10-input 4-output crossbar and 4 1-input 3-output routers. Compared with fig. 7a, this reduces the number of routing components used and improves their utilization efficiency.

In practical application, after the topology is obtained according to the above steps, the initial clock frequencies of the routing components in the topology may cause data to cross between different clock frequencies during transmission, which incurs a high hardware cost. A suitable clock domain may therefore be allocated to each added routing component to reduce the hardware cost incurred when a data path crosses clock domains.

In one possible implementation, as shown in fig. 8, clock domains may be allocated to each routing component in the topology according to the following steps:

S501: determining an input bit width and an output bit width of a routing component based on an initial clock frequency of the routing component and a bandwidth requirement of the on-chip component.

Here, for any routing component, the initial clock frequency of the routing component may be the clock frequency of the clock domain of any on-chip component connected to the routing component; the bit width indicates the number of bits of data transmitted in one clock cycle, the bandwidth indicates the amount of data transmitted per unit time, and the bandwidth can be represented by the product of the bit width and the clock frequency.

In one possible embodiment, as shown in fig. 9, the input bit width and the output bit width of the routing component may be determined by:

S5011: determining an output bit width of the on-chip component based on the bandwidth requirement of the on-chip component and a clock domain corresponding to the on-chip component; and determining a second target routing component directly connected to the on-chip component.

Here, the bandwidth requirement of the on-chip component may be divided by the clock frequency in the clock domain corresponding to the on-chip component, and an obtained result is the output bit width of the on-chip component, where the clock frequencies corresponding to different on-chip components may be different, and thus the output bit widths of the on-chip components that may be obtained may also be different.

S5012: determining the input bit width of each second target routing component based on the bandwidth requirement of the first on-chip component connected with each second target routing component; and determining an output bit width of each second target routing component based on the initial clock frequency and the bandwidth requirement of the first on-chip component.

Here, the output bit width of the first on-chip component is an input bit width of a second target routing component connected to the first on-chip component.

In a possible implementation manner, when determining the output bit width of each second target routing component, after determining the input bandwidth of each second target routing component based on the bandwidth requirement of the first on-chip component, the output bit width of each second target routing component may be determined based on the initial clock frequency and the input bandwidth of each second target routing component.

Specifically, for any one of the second routing components, the input bandwidth of the second routing component is the sum of bandwidth requirements of at least one first on-chip component connected to the second routing component, and after obtaining the input bandwidth of the second routing component, the output bandwidth of the second routing component should be greater than or equal to the input bandwidth, so that the output bandwidth of the second routing component is the input bandwidth at the minimum; the input bandwidth may be divided by an initial clock frequency corresponding to the second routing component to obtain an output bit width of the second routing component.

For example, suppose IP core A₁ and IP core A₂ connected to second routing component A have bandwidth requirements of 800 MBps and 800 MBps (megabytes per second), and the initial clock frequency of second routing component A is 100 MHz. The input bandwidth of second routing component A is then 1600 MBps. According to the formula bandwidth = bit width × frequency ÷ 8, the output bit width of second routing component A is 1600 MBps × 8 ÷ 100 MHz = 128 bits.
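The calculation in this example can be written as a short Python sketch. The function name `output_bit_width` is hypothetical. The sketch assumes the bandwidth figures are in megabytes per second, which is what the factor of 8 in the example implies, and it rounds up so that the output bandwidth is never below the input bandwidth; the rounding choice is an assumption of this sketch, not something stated in the disclosure.

```python
from typing import Iterable


def output_bit_width(bandwidths_mbytes_per_s: Iterable[int], clock_mhz: int) -> int:
    """Bit width needed to carry the summed bandwidth at the given clock.

    bandwidth [MB/s] = bit width [bit] * frequency [MHz] / 8
    =>  bit width = bandwidth * 8 / frequency
    """
    input_bandwidth = sum(bandwidths_mbytes_per_s)   # MB/s entering the router
    megabits_per_s = input_bandwidth * 8             # convert bytes to bits
    # Ceiling division (assumed): never provide less than the input bandwidth.
    return -(-megabits_per_s // clock_mhz)


print(output_bit_width([800, 800], 100))
```

With two 800 MB/s IP cores and a 100 MHz initial clock, the sketch returns 128, in line with the figure given above.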

S5013: and determining the input bit width and the output bit width of other routing components except the second target routing component in the topological structure based on the output bit width of each second target routing component and the topological structure.

After the output bit width of the second target routing component on a data link is obtained, the input bit width and output bit width of each routing component on the data link are determined in sequence, from the data sending end to the data receiving end. The input bit width of the current device on a given link is the output bit width of the previous device connected to it on the data link, and the sum of the input bit widths of the current device is the sum of the output bit widths of all previous devices connected to it on the data link.

S502: and distributing a clock domain for each routing component in the topological structure based on the input bit width and the output bit width of the routing component.

Here, a clock domain may be allocated to each routing component in the topology based on the topology and on the input bit width and output bit width of the routing components, such that the sum of the bit widths crossing clock domains is smallest under the allocated clock domains.

In one possible implementation, as shown in fig. 10, each routing component in the topology may be assigned a clock domain by:

S5021: determining at least one distribution combination to be screened based on the topological structure, wherein different distribution combinations are used for distributing different clock domains for the routing components.

In one possible implementation, when distributing the clock domain to the routing component, the clock domain of the on-chip component having a connection relationship with the routing component may be used for distribution.

Illustratively, for a data path in which the routing component a is located, a clock domain corresponding to an IP core at an input end is clock domain 1, and a clock domain corresponding to an IP core at an output end is clock domain 2, so when distributing the clock domains for the routing component a, the clock domains 1 and 2 may be used for distribution to obtain at least one distribution combination to be screened.

Further, after obtaining the clock domains possibly distributed to the routing components, at least one distribution combination to be screened is generated according to the clock domains possibly distributed to each routing component.

In another possible implementation, when determining the allocation combination to be screened, the on-chip component and the routing component may be further aggregated based on the topology structure; and determining at least one distribution combination to be screened based on the aggregation result of the on-chip component and the routing component, wherein the details are described in detail below and are not described herein.

S5022: and determining the sum of the bit widths of clock domains crossed under each distribution combination, and determining the distribution combination with the minimum sum of the bit widths as the clock domains distributed to each routing component in the topological structure.

Here, the sum of the bit widths across the clock domains under each allocation combination may be calculated according to the input bit width and the output bit width of each routing component, and the allocation combination with the smallest sum of the bit widths across the clock domains may be determined as the target allocation combination.

For example, schematic diagrams of distributing clock domains for each routing component may be shown in fig. 11a to 11d, where:

fig. 11a shows a schematic diagram of a data path before clock domains are allocated in a topology, in fig. 11a, NI represents a network interface of an IP core, different hatching types represent different clock domains, where NI0 and NI1 are located in clock domain 1, NI2 is located in clock domain 2, NI3 and NI4 are located in clock domain 3, s0, s1, and s2 represent routing component 1, routing component 2, and routing component 3, respectively, arrow directions represent transmission directions of data, and numbers represent bit widths of data transmitted on the data path;

fig. 11b is a schematic diagram illustrating aggregation processing performed on the on-chip components; in fig. 11b, NI0 and NI1, which are in the same clock domain 1, are aggregated to generate NI 0-1, and the output bit width after aggregation is 1024 + 512 = 1536; NI3 and NI4, which are in clock domain 3, are aggregated to generate NI 3-4, and the input bit width after aggregation is 512 + 512 = 1024;

further, all possibilities of performing aggregation processing on the routing component and the on-chip component can be determined according to the connection relation corresponding to the topological structure, and the screened distribution combination can be determined according to the result of the aggregation processing;

specifically, according to the connections of routing component 1 in the topology, routing component 1 and NI 0-1 may be aggregated on the basis of fig. 11b, that is, routing component 1 is allocated clock domain 1; according to the connections of routing component 2 and routing component 3 in the topology, routing component 2, routing component 3 and NI 3-4 may be aggregated on the basis of fig. 11b, that is, routing component 2 and routing component 3 are allocated clock domain 3;

for example, a schematic diagram of performing aggregation processing on the routing components and the on-chip components may be as shown in fig. 11c, where routing component 1 and NI 0-1 are aggregated to obtain "NI 0-1, s0", and routing component 2, routing component 3 and NI 3-4 are aggregated to obtain "s1, s2, NI 3-4";

further, according to the connection relationships between routing component 1, routing component 2, routing component 3 and the other components in the topology, 4 distribution combinations to be screened can be obtained after aggregation: (1) clock domain 1, clock domain 3 and clock domain 3 are allocated to routing component 1, routing component 2 and routing component 3 in sequence; (2) clock domain 1, clock domain 2 and clock domain 3 are allocated to routing component 1, routing component 2 and routing component 3 in sequence; (3) clock domain 1, clock domain 2 and clock domain 2 are allocated to routing component 1, routing component 2 and routing component 3 in sequence; (4) clock domain 1, clock domain 2 and clock domain 1 are allocated to routing component 1, routing component 2 and routing component 3 in sequence. After the sum of the cross-clock-domain bit widths of each distribution combination to be screened is determined in turn, the target distribution combination with the smallest sum is found to be distribution combination (1);

fig. 11d shows a schematic diagram of the data path after clock domains are allocated in the topology; in fig. 11d, clock domain 1, clock domain 3 and clock domain 3 are allocated to routing component 1, routing component 2 and routing component 3 in sequence, and the sum of the bit widths across clock domains is 512 + 128 = 640, which is the smallest among all the distribution combinations to be screened.
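The selection in S5022 can be illustrated with a minimal Python sketch. The edge list below is a hypothetical toy data path whose links and bit widths are assumptions chosen for illustration; it is not read from fig. 11a. For simplicity the sketch enumerates every combination of clock domains exhaustively rather than pruning the candidates by aggregation as described above.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical links: (source, destination, bit width on the link).
EDGES: List[Tuple[str, str, int]] = [
    ("NI0", "s0", 1024),
    ("NI1", "s0", 512),
    ("s0", "s1", 512),
    ("NI2", "s1", 128),
    ("s1", "s2", 1024),
    ("s2", "NI3", 512),
    ("s2", "NI4", 512),
]
# Clock domains of the network interfaces are fixed by their IP cores.
FIXED: Dict[str, int] = {"NI0": 1, "NI1": 1, "NI2": 2, "NI3": 3, "NI4": 3}
ROUTERS = ["s0", "s1", "s2"]
DOMAINS = [1, 2, 3]


def crossing_width(assignment: Dict[str, int]) -> int:
    """Sum of bit widths on links whose two ends lie in different domains."""
    domain = {**FIXED, **assignment}
    return sum(w for src, dst, w in EDGES if domain[src] != domain[dst])


def best_assignment() -> Dict[str, int]:
    """Try every distribution combination and keep the cheapest one."""
    candidates = (
        dict(zip(ROUTERS, combo))
        for combo in product(DOMAINS, repeat=len(ROUTERS))
    )
    return min(candidates, key=crossing_width)


best = best_assignment()
print(best, crossing_width(best))
```

On this toy data path the cheapest combination places s0 in clock domain 1 and s1 and s2 in clock domain 3, with a cross-clock-domain sum of 512 + 128 = 640. Exhaustive enumeration grows exponentially with the number of routing components, which is one reason to narrow the candidates first, as the aggregation step does.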

In practical application, the input bit width and output bit width of each routing component were previously determined based on its initial clock frequency. After a clock domain has been allocated to each added routing component according to the above steps and a new clock frequency is thereby determined, the input bit width and output bit width of the routing component may change accordingly. To confirm that the sum of the bit widths across clock domains is still the smallest after the bit widths change, the allocation result needs to be verified.

In a possible implementation, after distributing a clock domain to each routing component in the topology, as shown in fig. 12, the distribution result of the clock domain may be verified by the following steps:

S601: re-determining the input bit width and the output bit width of the routing component based on the target clock frequency corresponding to the clock domain distributed to the routing component and the bandwidth requirement of the on-chip component.

S602: and re-determining various distribution combinations to be screened based on the topological structure and the re-determined input bit width and output bit width of the routing component, and determining the sum of the bit widths of the cross clock domains of the distribution combinations.

S603: determining, among the re-determined distribution combinations to be screened, the target distribution combination with the smallest sum of bit widths, and, in a case where the target distribution combination differs from the allocated clock domains, returning to the step of re-determining the input bit width and output bit width of the routing component, now based on the target distribution combination.

Here, for the determination of the input bit width and output bit width of the routing component, of the distribution combinations to be screened, and of the target distribution combination, reference may be made to the related description above, which is not repeated here.

Specifically, if the target distribution combination is different from the allocated clock domain, the steps of determining the input bit width and the output bit width of the routing component and determining the target distribution combination may be performed in a loop until the target distribution combination is the same as the allocated clock domain.

In practical applications, it may happen that the above steps are performed in a loop multiple times, but the target distribution combination is still different from the distributed clock domains.

In a possible embodiment, in the case that the number of times of the return execution exceeds a preset number, the execution cycle process may be stopped, and the first alarm information may be sent.

Here, the first alarm information prompts the designer that the current topology cannot satisfy the design requirements and needs to be adjusted.
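The verification loop of S601 to S603, together with the preset limit on the number of returns, can be sketched as a fixed-point iteration. In the Python sketch below, `recompute_widths` and `select_best` are hypothetical callbacks standing in for S601 and for S602 plus S603 respectively, and returning `None` stands in for sending the first alarm information; none of these names come from the disclosure.

```python
from typing import Callable, Dict, Optional

Allocation = Dict[str, int]   # routing component -> clock domain id
Widths = Dict[str, int]       # link or component name -> bit width


def verify_allocation(
    allocation: Allocation,
    recompute_widths: Callable[[Allocation], Widths],
    select_best: Callable[[Widths], Allocation],
    max_rounds: int = 10,
) -> Optional[Allocation]:
    """Repeat S601-S603 until the allocation stops changing.

    Returns the stable allocation, or None once max_rounds is exhausted,
    which is the point at which the first alarm would be raised.
    """
    for _ in range(max_rounds):
        widths = recompute_widths(allocation)   # S601: widths at new clocks
        target = select_best(widths)            # S602 + S603: cheapest combo
        if target == allocation:
            return allocation                   # verified: nothing changed
        allocation = target                     # otherwise adopt and retry
    return None
```

A caller would treat a `None` result as the signal that the topology itself needs adjusting, rather than retrying with a larger limit.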

In a possible implementation manner, after a routing component for connecting the plurality of on-chip components is added to the system on chip based on the second connection relationship to obtain a topology corresponding to the plurality of on-chip components, an input bandwidth and an output bandwidth of the routing component may be verified based on a bandwidth requirement of each on-chip component, and second alarm information may be sent when the verification fails.

Specifically, when the input bandwidth and the output bandwidth of the routing component are verified, for any data path, it may be determined whether the output bandwidth of the last routing component can meet the bandwidth requirement of the on-chip component connected to the last routing component, and if the output bandwidth of the last routing component is smaller than the minimum bandwidth required by the on-chip component connected to the last routing component, it indicates that the bandwidth allocation at this time is unreasonable, and at this time, a second alarm message needs to be sent to prompt a designer that the allocated bandwidth is insufficient, and a corresponding adjustment needs to be performed.
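The check described above amounts to one comparison per data path. The following Python sketch is illustrative only: the function name `find_underprovisioned` and the representation of a data path as a pair of numbers are assumptions of this sketch.

```python
from typing import Dict, List, Tuple


def find_underprovisioned(paths: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the data paths that would trigger the second alarm.

    `paths` maps a path name to (output bandwidth of the last routing
    component, minimum bandwidth required by the connected on-chip component).
    """
    return [
        name
        for name, (provided, required) in paths.items()
        if provided < required   # allocated bandwidth is insufficient
    ]


print(find_underprovisioned({"p0": (1600, 800), "p1": (400, 800)}))
```

An empty result means the verification passes; any returned path name identifies where the designer would need to adjust the allocation.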

In a possible implementation manner, after obtaining the topology structures corresponding to the plurality of on-chip components, adding a target device in the topology structures in response to a target device adding operation instruction; wherein the target device comprises a first-in-first-out storage unit and/or a network rate adapter.

Here, the designer can adjust the topology according to the received alarm information by adding a corresponding target device, so that the data transmission performance of the topology is improved.

For example, after receiving the second alarm information, the designer may, through a target device adding operation instruction, add a first-in first-out storage unit and a network rate adapter to the topology to reduce network congestion and data transmission latency, thereby optimizing the topology.

The method for determining the network-on-chip topological structure, provided by the embodiment of the disclosure, includes acquiring first connection relations of a plurality of on-chip components of a system-on-chip and attribute information of the plurality of on-chip components; based on the attribute information of the plurality of on-chip components, simplifying the first connection relation to obtain a second connection relation corresponding to the plurality of on-chip components, so that the efficiency is higher when determining the on-chip network topology structure by simplifying the first connection relation; and adding a routing component for connecting the plurality of on-chip components in the system on chip based on the second connection relation to obtain a topological structure corresponding to the plurality of on-chip components.

It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.

Based on the same inventive concept, the embodiment of the present disclosure further provides a device for determining a network-on-chip topology corresponding to the method for determining a network-on-chip topology, and since the principle of solving the problem of the device in the embodiment of the present disclosure is similar to the method for determining a network-on-chip topology described in the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.

Referring to fig. 13, an architecture schematic diagram of an apparatus for determining a network-on-chip topology according to an embodiment of the present disclosure is shown, where the apparatus includes: an obtaining module 1301, a simplifying module 1302 and an adding module 1303; wherein,

an obtaining module 1301, configured to obtain first connection relationships of multiple on-chip components of a system on chip and attribute information of the multiple on-chip components;

a simplifying module 1302, configured to simplify the first connection relation based on the attribute information of the multiple on-chip components, to obtain second connection relations corresponding to the multiple on-chip components;

an adding module 1303, configured to add, in the system on chip, a routing component for connecting the multiple components on chip based on the second connection relationship, so as to obtain a topology structure corresponding to the multiple components on chip.

In a possible implementation manner, in a case that the attribute information of the on-chip component includes an address space range accessible by the on-chip component, the simplifying module 1302, when performing simplification processing on the first connection relationship based on the attribute information of the plurality of on-chip components to obtain second connection relationships corresponding to the plurality of on-chip components, is configured to:

In a possible implementation manner, in a case that the attribute information of the on-chip component includes a bandwidth requirement of the on-chip component, the simplifying module 1302, when determining, based on the first clustering result, second connection relationships corresponding to the plurality of on-chip components, is configured to:

In a possible implementation manner, in a case that the attribute information of the on-chip component includes a bandwidth requirement of the on-chip component, the simplifying module 1302, when performing simplification processing on the first connection relationship based on the attribute information of the plurality of on-chip components to obtain second connection relationships corresponding to the plurality of on-chip components, is configured to:

In a possible implementation manner, in a case that the attribute information of the on-chip component includes an address space range accessible by the on-chip component, the simplifying module 1302, when determining, based on the third clustering result, a second connection relationship corresponding to the plurality of on-chip components, is configured to:

In a possible implementation manner, the adding module 1303, when adding a routing component for connecting the plurality of components on chip in the system on chip based on the second connection relationship, is configured to:

In a possible implementation, the apparatus further includes an adjusting module 1304 configured to:

adjusting the topology based on the integrated first target routing component.

In a possible implementation, the apparatus further includes an allocating module 1305, configured to:

In one possible embodiment, the allocating module 1305, when determining the input bit width and the output bit width of the routing component based on the initial clock frequency of the routing component and the bandwidth requirement of the on-chip component, is configured to:

In one possible implementation, the allocating module 1305, when determining the output bit width of each second target routing component based on the initial clock frequency and the bandwidth requirement of the first on-chip component, is configured to:

In one possible implementation, the allocating module 1305, when allocating a clock domain to each routing component in the topology based on the input bit width and the output bit width of the routing component, is configured to:

In one possible embodiment, the allocating module 1305, when allocating a clock domain to each routing component in the topology based on the topology and on the input bit width and the output bit width of the routing components, is configured to:

In a possible embodiment, the allocating module 1305, when determining at least one distribution combination to be screened based on the topology, is configured to:

In one possible implementation, after allocating clock domains for the routing components in the topology, the allocating module 1305 is further configured to:

In a possible implementation, the allocating module 1305 is further configured to:

In a possible implementation manner, after adding, in the system on chip, a routing component for connecting the plurality of components on chip based on the second connection relationship to obtain a topology corresponding to the plurality of components on chip, the allocating module 1305 is further configured to:

In a possible implementation, the adding module 1303 is further configured to:

The device for determining the network-on-chip topological structure, provided by the embodiment of the disclosure, acquires first connection relations of a plurality of on-chip components of a system-on-chip and attribute information of the plurality of on-chip components; based on the attribute information of the plurality of on-chip components, simplifying the first connection relation to obtain a second connection relation corresponding to the plurality of on-chip components, so that the efficiency is higher when determining the on-chip network topology structure by simplifying the first connection relation; and adding a routing component for connecting the plurality of on-chip components in the system on chip based on the second connection relation to obtain a topological structure corresponding to the plurality of on-chip components.

The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.

Based on the same technical concept, the embodiment of the disclosure also provides a computer device. Referring to fig. 14, a schematic structural diagram of a computer device 1400 provided in the embodiment of the present disclosure includes a processor 1401, a memory 1402, and a bus 1403. The memory 1402 is used for storing execution instructions and includes an internal memory 14021 and an external memory 14022. The internal memory 14021 is used for temporarily storing operation data of the processor 1401 and data exchanged with the external memory 14022, such as a hard disk; the processor 1401 exchanges data with the external memory 14022 through the internal memory 14021. When the computer device 1400 runs, the processor 1401 and the memory 1402 communicate with each other through the bus 1403, so that the processor 1401 executes the following instructions:

The embodiment of the present disclosure further provides a chip, including: an on-chip component and a routing component; the network topology between the routing component and the on-chip component may be determined based on the method for determining the on-chip network topology in any embodiment of the present disclosure.

The embodiments of the present disclosure also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the method for determining a network-on-chip topology described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.

The embodiments of the present disclosure also provide a computer program product carrying program code, where instructions included in the program code may be used to execute the steps of the method for determining a network-on-chip topology in the foregoing method embodiments; for details, reference may be made to the foregoing method embodiments, which are not repeated here.

The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included in the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A method for determining a network-on-chip topology, comprising:

2. The method according to claim 1, wherein the attribute information of the on-chip component comprises a bandwidth requirement of the on-chip component and/or an address space range accessible by the on-chip component.

3. The method according to claim 2, wherein, in a case that the attribute information of the on-chip component includes an address space range that can be accessed by the on-chip component, the simplifying processing on the first connection relation based on the attribute information of the plurality of on-chip components to obtain second connection relations corresponding to the plurality of on-chip components includes:

4. The method according to claim 3, wherein in a case that the attribute information of the on-chip component includes a bandwidth requirement of the on-chip component, the determining, based on the first clustering result, the second connection relationships corresponding to the plurality of on-chip components includes:

5. The method of claim 2, wherein, when the attribute information of the on-chip component includes a bandwidth requirement of the on-chip component, the simplifying processing on the first connection relation based on the attribute information of the plurality of on-chip components to obtain second connection relations corresponding to the plurality of on-chip components includes:

6. The method according to claim 5, wherein in a case that the attribute information of the on-chip component includes an address space range accessible by the on-chip component, the determining, based on the third classification result, the second connection relationship corresponding to the plurality of on-chip components includes:

7. The method according to any one of claims 1 to 6, wherein adding a routing component for connecting the plurality of on-chip components in the system on chip based on the second connection relationship comprises:

8. The method of any one of claims 1 to 7, further comprising:

adjusting the topology based on the integrated first target routing component.

9. The method of any one of claims 1 to 8, further comprising:

10. The method of claim 9, wherein determining the input bit width and the output bit width of the routing component based on the initial clock frequency of the routing component and the bandwidth requirement of the on-chip component comprises:

11. The method of claim 10, wherein said determining an output bit width for each second target routing component based on said initial clock frequency and a bandwidth requirement of said first on-chip component comprises:

12. The method according to any one of claims 9 to 11, wherein said allocating a clock domain to each routing component in said topology based on an input bit width and an output bit width of said routing component comprises:

13. The method of claim 12, wherein said assigning a clock domain to each routing component in said topology based on input bit widths and output bit widths of said topology and said routing components comprises:

14. The method of claim 13, wherein determining at least one allocation combination to be filtered based on the topology comprises:

15. The method of any one of claims 9 to 14, wherein after assigning a clock domain to each routing component in the topology, the method further comprises:

16. The method of claim 15, further comprising:

17. The method according to any one of claims 1 to 16, wherein after adding a routing component for connecting the plurality of on-chip components in the system on chip based on the second connection relationship to obtain a topology corresponding to the plurality of on-chip components, the method further comprises:

18. The method of any one of claims 1 to 17, further comprising:

19. A chip, comprising: an on-chip component and a routing component;

the network topology between the routing component and the on-chip component is determined based on the method for determining a network-on-chip topology according to any one of claims 1 to 16.

20. An apparatus for determining a network-on-chip topology, comprising:

21. A computer device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the computer device is running, and the machine-readable instructions, when executed by the processor, performing the steps of the method for determining a network-on-chip topology according to any one of claims 1 to 18.

22. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the method for determining a network-on-chip topology according to any one of claims 1 to 18.