WO2023029487A1 - Method, device and chip for determining network-on-chip topology - Google Patents

Method, device and chip for determining network-on-chip topology

Info

Publication number
WO2023029487A1
Authority
WO
WIPO (PCT)
Prior art keywords
chip
routing component
topology
component
routing
Prior art date
Application number
PCT/CN2022/086325
Other languages
English (en)
French (fr)
Inventor
王坚烽
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023029487A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery

Definitions

  • the present disclosure relates to the field of network-on-chip technology, and in particular, to a method, device and chip for determining a network-on-chip topology.
  • Embodiments of the present disclosure at least provide a method, a device and a chip for determining a network-on-chip topology.
  • an embodiment of the present disclosure provides a method for determining a network-on-chip topology, including: acquiring a first connection relationship of a plurality of on-chip components of a system-on-chip and attribute information of the plurality of on-chip components; simplifying the first connection relationship based on the attribute information of the plurality of on-chip components to obtain a second connection relationship corresponding to the plurality of on-chip components; and adding, based on the second connection relationship, a routing component for connecting the plurality of on-chip components in the system-on-chip, to obtain a topology corresponding to the plurality of on-chip components.
  • in this way, the efficiency of determining the topology of the on-chip network can be made higher; by adding the routing component to the system-on-chip based on the second connection relationship, the topology corresponding to the plurality of on-chip components is obtained and can be generated automatically, thereby effectively improving the efficiency of network topology construction.
  • the attribute information of the on-chip component includes a bandwidth requirement of the on-chip component and/or an address space range that the on-chip component can access.
  • simplifying the first connection relationship based on the attribute information of the plurality of on-chip components to obtain the second connection relationship corresponding to the plurality of on-chip components includes: clustering the on-chip components whose accessible address space range meets a first preset condition to obtain a first clustering result; and determining, based on the first clustering result, the second connection relationship corresponding to the plurality of on-chip components. In this way, the connection relationship can be clustered along the address space dimension, so that the topology determined subsequently is more efficient.
  • determining the second connection relationship corresponding to the plurality of on-chip components based on the first clustering result includes: clustering, within the first clustering result, the on-chip components whose bandwidth requirements meet a second preset condition to obtain a second clustering result; and determining, based on the second clustering result, the second connection relationship corresponding to the plurality of on-chip components. In this way, by clustering the first clustering result again along another dimension, the clustered on-chip components may be similar in multiple dimensions, so that the clustering effect is better.
  • simplifying the first connection relationship based on the attribute information of the plurality of on-chip components to obtain the second connection relationship corresponding to the plurality of on-chip components includes: clustering the on-chip components whose bandwidth requirements meet the second preset condition to obtain a third clustering result; and determining, based on the third clustering result, the second connection relationship corresponding to the plurality of on-chip components.
  • determining the second connection relationship corresponding to the plurality of on-chip components based on the third clustering result includes: clustering, within the third clustering result, the on-chip components whose accessible address space range meets the first preset condition to obtain a fourth clustering result; and determining, based on the fourth clustering result, the second connection relationship corresponding to the plurality of on-chip components.
  • adding, based on the second connection relationship, a routing component for connecting the multiple on-chip components in the system-on-chip includes: acquiring attribute information of the routing component, where the attribute information includes the maximum input quantity and the maximum output quantity of the routing component, and the maximum input quantity and the maximum output quantity are used to indicate the number of on-chip components connected to the routing component; determining the type and deployment location of the routing component based on the attribute information of the routing component and the second connection relationship; and adding the routing component to the system-on-chip according to the type and deployment location of the routing component.
  • the method further includes: for any routing component in the topology, determining a candidate data link formed by the routing component and a data terminal, where the data terminal is an on-chip component or an on-chip component for receiving data; determining first routing components that have the same candidate data link; integrating the first routing components based on the number of inputs and outputs connected to the first routing components; and adjusting the topology based on a first target routing component obtained by the integration. In this way, by integrating the first routing components, fewer routing components are used and their utilization is improved.
  • the method further includes: determining the input bit width and output bit width of each routing component in the topology based on the initial clock frequency of each routing component in the topology and the bandwidth requirement of the on-chip component; and assigning a clock domain to each routing component in the topology based on the input bit width and output bit width of each routing component in the topology.
  • allocating clock domains to the routing components based on the input bit width and output bit width of each routing component in the topology keeps clock domain crossings in the resulting allocation to a minimum, thereby reducing the loss caused by crossing clock domains.
  • determining the input bit width and output bit width of each routing component in the topology based on the initial clock frequency of each routing component in the topology and the bandwidth requirement of the on-chip component includes: determining one or more second routing components directly connected to an on-chip component in the topology; for each of the second routing components, determining the input bit width of the second routing component based on the bandwidth of the first on-chip component connected to the second routing component, and determining the output bit width of the second routing component based on the initial clock frequency of the second routing component and the bandwidth requirement of the first on-chip component; and determining, based on the output bit width of each of the second routing components, the input bit width and output bit width of the other routing components in the topology.
  • determining the output bit width of the second routing component based on the initial clock frequency of the second routing component and the bandwidth requirement of the first on-chip component includes: determining the input bandwidth of the second routing component based on the bandwidth requirement of the first on-chip component; and determining the output bit width of the second routing component based on the initial clock frequency of the second routing component and the input bandwidth of the second routing component.
  • allocating a clock domain to each routing component in the topology based on the input bit width and output bit width of each routing component in the topology includes: allocating a clock domain to each routing component in the topology based on the topology and on the input bit width and output bit width of each routing component in the topology.
  • in this way, the sum of the bit widths that cross clock domains under the allocated clock domains is the smallest, so that the hardware resources required for data transmission across clock domains are minimized.
  • allocating a clock domain to each routing component in the topology based on the topology and the input bit width and output bit width of each routing component in the topology includes: determining at least one candidate allocation combination based on the topology, where different candidate allocation combinations allocate different clock domains to the routing components in the topology; determining, based on the input bit width and output bit width of each routing component in the topology, the sum of bit widths across clock domains under each candidate allocation combination; and determining the candidate allocation combination with the smallest sum of bit widths as the target allocation combination, so as to assign clock domains to the routing components in the topology according to the target allocation combination.
  • determining at least one candidate allocation combination based on the topology includes: performing aggregation processing on the on-chip components and routing components in the topology based on the topology; and determining at least one candidate allocation combination based on the result of the aggregation processing. In this way, by performing aggregation processing on the on-chip components and routing components in the topology, fewer candidate allocation combinations are generated, thereby improving the allocation efficiency of the clock domain.
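  • the selection of a target allocation combination described above can be illustrated with a minimal Python sketch. It assumes the topology is given as a list of links annotated with a bit width, and that the candidate allocation combinations have already been generated by some other means; the function names, link names, and clock-domain names are hypothetical and are not taken from the disclosure.

```python
def cross_domain_bit_width(links, allocation):
    """Sum the bit widths of links whose two endpoints sit in different clock domains."""
    return sum(width for src, dst, width in links if allocation[src] != allocation[dst])


def pick_target_allocation(links, candidates):
    """Return the candidate allocation with the smallest cross-domain bit-width sum."""
    return min(candidates, key=lambda allocation: cross_domain_bit_width(links, allocation))


# Hypothetical topology: (source, destination, bit width of the link).
links = [("ip0", "r0", 64), ("r0", "r1", 128), ("r1", "ip1", 32)]
candidates = [
    {"ip0": "clk_a", "r0": "clk_a", "r1": "clk_b", "ip1": "clk_b"},  # crosses the 128-bit link
    {"ip0": "clk_a", "r0": "clk_b", "r1": "clk_b", "ip1": "clk_b"},  # crosses the 64-bit link
]
target = pick_target_allocation(links, candidates)
print(cross_domain_bit_width(links, target))  # 64
```

  • the sketch only ranks candidates that are handed to it; how candidates are produced (for example after aggregation) is left open here.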
  • the method further includes: re-determining the input bit width and output bit width of the routing component based on the target clock frequency corresponding to the clock domain allocated to the routing component and the bandwidth requirement of the on-chip component; re-determining multiple candidate allocation combinations based on the topology and the re-determined input bit width and output bit width of the routing component, and determining the sum of bit widths across clock domains under each candidate allocation combination; determining, from the re-determined multiple candidate allocation combinations, the target allocation combination with the smallest sum of bit widths, and when the target allocation combination is different from the allocated clock domain
  • the method further includes: when the number of return executions exceeds the preset number of times, stopping the execution of the loop process and sending a first alarm message. In this way, the designer can be reminded in time to adjust the topology when the topology is unreasonable.
  • the method further includes: based on the bandwidth requirements of each on-chip component, verifying the input bandwidth and output bandwidth of each routing component in the topology, and sending a second alarm message if the verification fails. In this way, by verifying the input bandwidth and output bandwidth of each routing component in the topology, the designer can be reminded in time to adjust the bandwidth when the data bandwidth corresponding to the topology is unreasonable.
  • the method further includes: responding to a target device addition operation instruction, adding a target device to the topology structure, where the target device includes a first-in-first-out storage unit and/or a network rate adapter.
  • in this way, the topology can be adjusted in response to the designer's operation, so that the data transmission performance of the topology is better.
  • an embodiment of the present disclosure further provides an apparatus for determining the network-on-chip topology, including: an acquisition module, configured to acquire a first connection relationship of multiple on-chip components of a system-on-chip and attribute information of the multiple on-chip components; a simplification module, configured to simplify the first connection relationship based on the attribute information of the multiple on-chip components, to obtain a second connection relationship corresponding to the multiple on-chip components; and an adding module, configured to add, based on the second connection relationship, a routing component for connecting the multiple on-chip components in the system-on-chip, to obtain a topology corresponding to the multiple on-chip components.
  • an embodiment of the present disclosure further provides a chip, including an on-chip component and a routing component.
  • the network topology between the routing component and the on-chip component is determined based on the method for determining the on-chip network topology described in the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present disclosure further provides a computer device, including a processor, a memory, and a bus; the memory stores machine-readable instructions executable by the processor, and when the computer device is running, the processor communicates with the memory through the bus; when the machine-readable instructions are executed by the processor, the steps in the above-mentioned first aspect or any possible implementation manner of the first aspect are performed.
  • the embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps in the above-mentioned first aspect or any possible implementation manner of the first aspect are performed.
  • FIG. 1 shows a flow chart of a method for determining a network-on-chip topology provided by an embodiment of the present disclosure
  • FIG. 2 shows a flow chart of a specific method for obtaining a second connection relationship in the method for determining the network-on-chip topology provided by an embodiment of the present disclosure
  • FIG. 3 shows a flow chart of another specific method for obtaining the second connection relationship among the methods for determining the network-on-chip topology provided by the embodiments of the present disclosure
  • FIG. 4 shows a flow chart of a specific method for adding a routing component in the method for determining the network-on-chip topology provided by an embodiment of the present disclosure
  • Fig. 5a shows a schematic diagram of a splitting algorithm in the method for determining the network-on-chip topology provided by an embodiment of the present disclosure
  • FIG. 5b shows a schematic diagram of a cascading relationship of routing components in the method for determining the network-on-chip topology provided by an embodiment of the present disclosure
  • FIG. 5c shows a schematic diagram of another cascading relationship of routing components in the method for determining the network-on-chip topology provided by an embodiment of the present disclosure
  • FIG. 5d shows a schematic diagram of another cascading relationship of routing components in the method for determining the network-on-chip topology provided by an embodiment of the present disclosure
  • FIG. 6 shows a flow chart of a specific method for adjusting the topology in the method for determining the topology of the network-on-chip provided by an embodiment of the present disclosure
  • Fig. 7a shows a schematic diagram of the topology structure before adjustment in the method for determining the topology structure of the network-on-chip provided by the embodiment of the present disclosure
  • FIG. 7b shows a schematic diagram of the adjusted topology in the method for determining the topology of the network-on-chip provided by an embodiment of the present disclosure
  • FIG. 8 shows a flow chart of a specific method for allocating clock domains for each routing component in the topology in the method for determining the topology of the network-on-chip provided by an embodiment of the present disclosure
  • FIG. 9 shows a flow chart of a specific method for determining the input bit width and output bit width of the routing component in the method for determining the topology of the on-chip network provided by an embodiment of the present disclosure
  • FIG. 10 shows a flow chart of another specific method for allocating clock domains for each routing component in the topology among the methods for determining the topology of the network-on-chip provided by the embodiments of the present disclosure
  • Figures 11a to 11d show schematic diagrams of allocating clock domains for each routing component in the method for determining the network-on-chip topology provided by the embodiments of the present disclosure
  • FIG. 12 shows a flow chart of a specific method for verifying the clock domain allocation results in the method for determining the network-on-chip topology provided by an embodiment of the present disclosure
  • FIG. 13 shows a schematic structural diagram of an apparatus for determining a network-on-chip topology provided by an embodiment of the present disclosure
  • FIG. 14 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
  • the present disclosure provides a method, device, and chip for determining the topology of an on-chip network.
  • the efficiency of determining the topology of the on-chip network is higher; by adding, based on the second connection relationship, a routing component for connecting the plurality of on-chip components in the system-on-chip, the topology corresponding to the plurality of on-chip components is obtained, so the topology can be generated automatically, thereby effectively improving the efficiency of network topology construction.
  • the execution subject of the method for determining the network-on-chip topology is generally a computer device with certain computing capabilities, such as a terminal device, a server, or another processing device; the terminal device may be user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • the method for determining the topology of the on-chip network may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • FIG. 1 is a flowchart of a method for determining a network-on-chip topology provided by an embodiment of the present disclosure, the method includes steps S101-S103.
  • S101 Acquire a first connection relationship of multiple on-chip components of a system on chip and attribute information of the multiple on-chip components.
  • S102 Simplify the first connection relationship based on the attribute information of the multiple on-chip components to obtain a second connection relationship corresponding to the multiple on-chip components.
  • the on-chip component represents a component in the system-on-chip, which can be a pre-designed circuit function module (i.e., an IP core); the IP core includes a network interface that can be used for on-chip network communication, so that the corresponding IP core can be called through its network interface to realize the corresponding function; the first connection relationship represents the connection relationship between the various on-chip components in the system-on-chip.
  • the attribute information of the on-chip component includes the bandwidth requirement of the on-chip component and the address space range that the on-chip component can access.
  • the bandwidth requirement represents the minimum data transmission bandwidth required for the on-chip component to work normally.
  • the bandwidth requirement of the IP core represents the minimum bandwidth required to implement the corresponding circuit function. For example, if the circuit function of an IP core is to process 100 tasks, the corresponding bandwidth requirement of the IP core is the minimum bandwidth required to transmit 100 tasks.
  • the address space range indicates the range of the address space that the on-chip component can access (such as the range of the address space of an off-chip storage component that the on-chip component can access), and it consists of a base address and an offset range.
  • the offset range is used to indicate the range of the offset. Add the maximum offset within the offset range to the base address to obtain the maximum access address that the on-chip components can access, and add the minimum offset within the offset range to the base address to obtain The minimum access address that the on-chip component can access; the address space range between the minimum access address and the maximum access address is the address space range that the on-chip component can access.
  • for example, if the base address of an IP core is X and the offset range is Y to Z (Y < Z), then the minimum access address is X+Y, the maximum access address is X+Z, and the address space range that the IP core can access is X+Y to X+Z.
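  • as a minimal sketch of the address arithmetic above, the accessible range can be computed from a base address and an offset range in Python. The class name `AddressSpace`, its field names, and the sample numbers are illustrative assumptions and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressSpace:
    """Accessible address space described by a base address and an offset range."""
    base: int        # base address X
    min_offset: int  # smallest offset Y
    max_offset: int  # largest offset Z (Y < Z)

    def bounds(self) -> tuple[int, int]:
        # Minimum access address is X + Y, maximum access address is X + Z.
        return self.base + self.min_offset, self.base + self.max_offset


space = AddressSpace(base=0x8000_0000, min_offset=0x100, max_offset=0x1FF)
low, high = space.bounds()
print(hex(low), hex(high))  # 0x80000100 0x800001ff
```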
  • the second connection relationship corresponding to the multiple on-chip components can be obtained through the following steps:
  • S201 Cluster the on-chip components whose address space range accessible by the on-chip components meets a first preset condition, to obtain a first clustering result.
  • the address spaces corresponding to IP core 1 and IP core 2 are A-D.
  • a simplified connection relationship may be determined according to the clustered on-chip components.
  • for example, if IP core A1 and IP core A2 are each connected to IP core B, and IP core A1 and IP core A2 are clustered to form IP core A, then the connection between IP core A and IP core B alone achieves the connection effect that existed before clustering, thereby simplifying the connection relationship.
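  • the simplification in this example can be sketched in Python. The sketch assumes, purely for illustration, that the first preset condition is "identical accessible address range"; the function names, the cluster labels such as `A1+A2`, and the numeric ranges are hypothetical.

```python
from collections import defaultdict


def cluster_by_address_space(address_space):
    """Group component names that share exactly the same (low, high) range.

    Treating "identical range" as the first preset condition is an assumption.
    """
    groups = defaultdict(list)
    for name, addr_range in address_space.items():
        groups[addr_range].append(name)
    # Map every component to a cluster label built from its sorted members.
    label_of = {}
    for members in groups.values():
        label = "+".join(sorted(members))
        for name in members:
            label_of[name] = label
    return label_of


def simplify_connections(connections, label_of):
    """Rewrite each (source, destination) edge using cluster labels and drop duplicates."""
    simplified = {(label_of[src], label_of[dst]) for src, dst in connections}
    return sorted(simplified)


address_space = {"A1": (0, 99), "A2": (0, 99), "B": (100, 199)}
first_connection = [("A1", "B"), ("A2", "B")]
label_of = cluster_by_address_space(address_space)
second_connection = simplify_connections(first_connection, label_of)
print(second_connection)  # [('A1+A2', 'B')]
```

  • two edges in the first connection relationship collapse into one edge in the second, which is the effect the example describes.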
  • the second connection relationship may also be determined according to the following steps:
  • S2021 Perform clustering on the on-chip components whose bandwidth requirements meet the second preset condition in the first clustering result, to obtain a second clustering result.
  • the judging conditions for similar bandwidth requirements include at least one of the following: the absolute value of the difference between the bandwidth requirements of two on-chip components is less than a first preset value, or the quotient of dividing the larger bandwidth requirement by the smaller bandwidth requirement is less than a second preset value.
  • take bandwidth requirements of 1 Mbps, 5 Mbps, and 6 Mbps for IP core A1, IP core A2, and IP core A3 as an example. Since the absolute value of the difference between the bandwidth requirements of IP core A2 and IP core A3 is less than the first preset value N, it can be determined that their bandwidth requirements are similar, so IP core A2 and IP core A3 can be clustered, while IP core A1 is not clustered.
  • the multiple on-chip components after clustering can be similar in multiple dimensions, so that the clustering effect is better.
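  • the two similarity conditions for bandwidth requirements can be sketched in Python as follows. The greedy grouping strategy, the function names, and the threshold of 2 Mbps used for the first preset value are assumptions made for illustration; the disclosure does not fix them. Bandwidth requirements are assumed to be positive.

```python
def bandwidths_similar(bw_a, bw_b, max_diff=None, max_ratio=None):
    """Return True if two bandwidth requirements count as similar.

    max_diff plays the role of the first preset value and max_ratio the second;
    whichever thresholds are supplied are checked, and either passing is enough.
    """
    if max_diff is not None and abs(bw_a - bw_b) < max_diff:
        return True
    if max_ratio is not None and max(bw_a, bw_b) / min(bw_a, bw_b) < max_ratio:
        return True
    return False


def cluster_by_bandwidth(bandwidth, max_diff=None, max_ratio=None):
    """Greedy grouping: a component joins the first cluster whose every member is similar to it."""
    clusters = []
    for name, bw in bandwidth.items():
        for cluster in clusters:
            if all(bandwidths_similar(bw, bandwidth[m], max_diff, max_ratio) for m in cluster):
                cluster.append(name)
                break
        else:
            # No existing cluster fits, so the component starts its own cluster.
            clusters.append([name])
    return clusters


bandwidth_mbps = {"A1": 1, "A2": 5, "A3": 6}
print(cluster_by_bandwidth(bandwidth_mbps, max_diff=2))  # [['A1'], ['A2', 'A3']]
```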
  • when the attribute information of the on-chip component includes both the bandwidth requirement of the on-chip component and the address space range it can access, besides clustering first by address space range and then, on the basis of that result, clustering a second time by bandwidth requirement, it is also possible to cluster first by bandwidth requirement and then cluster a second time by address space range on the basis of the first result; that is, the order of the attribute information on which the clustering is based basically has no effect on the clustering results.
  • in that case, when the first connection relationship is simplified based on the attribute information of the plurality of on-chip components to obtain the corresponding second connection relationship, the on-chip components whose bandwidth requirements meet the second preset condition may be clustered first to obtain a third clustering result; the second connection relationship corresponding to the plurality of on-chip components is then determined based on the third clustering result.
  • specifically, the on-chip components in the third clustering result whose accessible address space range meets the first preset condition are clustered to obtain a fourth clustering result; based on the fourth clustering result, a second connection relationship corresponding to the plurality of on-chip components is determined.
  • the routing component represents a component that provides network routing services for the on-chip component; for example, it may be a data selector (multiplexer) or a router.
  • the data selector has multiple input terminals and one output terminal; in use, one of the multiple input terminals can be selected to transmit data with the output terminal.
  • for example, a 4-input-1-output (4to1 for short) data selector can select one of its four input terminals to transmit data with the output terminal.
  • the router has one input terminal and multiple output terminals; in use, one of the multiple output terminals can be selected to transmit data with the input terminal.
  • for example, a 1-input-4-output (1to4 for short) router can select one of its four output terminals to transmit data with the input terminal.
  • a routing component can be added according to the following steps:
  • the attribute information includes the maximum input quantity and the maximum output quantity of the routing component, and the maximum input quantity and the maximum output quantity are used to indicate the quantity of on-chip components and other routing components connected to the routing component.
  • the maximum number of inputs and the maximum number of outputs may be set by designers according to design requirements. For example, the maximum number of inputs is 5, which means that the data selector can connect up to 5 on-chip components or other routing components. For another example, the maximum number of outputs is 4, which means that the router can connect up to 4 on-chip components or other routing components.
  • S302 Based on the attribute information of the routing component and the second connection relationship, determine a type and a deployment location of the routing component.
  • any one of the data paths is a data transmission path including at least one data link, the input end of each data link belongs to a class cluster (data path), and the output end belongs to a class cluster (data path).
  • for any data path in the second connection relationship, four situations can be distinguished based on the number of on-chip components at the input end and the number of on-chip components at the output end of the data path.
  • the number of input terminals of the routing component may first be determined as the number of on-chip components that need to be connected to the input terminals.
  • for example, if the number of on-chip components that need to be connected to the input terminals is N, where N is greater than 1 and less than M and M is the maximum number of inputs, the routing component is an Nto1 data selector; its deployment location is at the output terminals of the on-chip components to be connected, and the connection method is that the input terminals of the routing component are directly connected to the output terminals of those on-chip components.
  • taking a maximum number of inputs of 5 as an example, when the number of on-chip components to be connected to the input end of a certain routing component is 4, the type of the routing component can be determined as a 4to1 data selector.
  • similarly, the number of output terminals of the routing component is the number of on-chip components that need to be connected to the output terminals; the deployment location is at the input terminals of the on-chip components to be connected, and the connection method is that the output terminals of the routing component are directly connected to the input terminals of those on-chip components.
  • for example, the routing components on the data path are a 4to1 data selector (with on-chip components connected to its 4 input terminals) and a 1to3 router (with on-chip components connected to its 3 output terminals), and the output terminal of the 4to1 data selector is connected to the input terminal of the 1to3 router.
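  • the type selection for the situation in which neither limit is exceeded can be sketched in Python. The sketch assumes this situation means that the input count and the output count both stay within the configured maxima (treated here as inclusive limits); the function name and the string labels are hypothetical.

```python
def routing_components_for_path(num_inputs, num_outputs, max_inputs, max_outputs):
    """Return routing component type labels for a data path whose counts fit the limits.

    A data selector is needed only when several inputs converge, and a router
    only when several outputs fan out.
    """
    if num_inputs > max_inputs or num_outputs > max_outputs:
        raise ValueError("this sketch only covers counts within the configured limits")
    components = []
    if num_inputs > 1:
        components.append(f"{num_inputs}to1 data selector")
    if num_outputs > 1:
        components.append(f"1to{num_outputs} router")
    return components


print(routing_components_for_path(4, 3, max_inputs=5, max_outputs=4))
# ['4to1 data selector', '1to3 router']
```

  • counts that exceed a limit are rejected here because they call for splitting the routing component into a cascade instead.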
  • Case 2 The number of on-chip components to be connected to the input of the routing component exceeds the maximum input quantity, but the number of on-chip components to be connected to the output of the routing component does not exceed the maximum output quantity.
  • in this case, a routing component that originally needs to be connected can be split into multiple routing components with a cascading relationship.
  • the splitting algorithm can be as shown in FIG. 5a. In FIG. 5a, the number of on-chip components that need to be connected to the input of the routing component is denoted as K, the maximum input quantity is denoted as N (generally, N is greater than or equal to 3), and K is greater than N. In a concrete execution, the splitting algorithm comprises the following steps:
  • Step 3022a judge whether M is greater than N. If M is less than or equal to N, execute step 3023; if M is greater than N, then use M as K, and iteratively execute the above step 3021.
  • Step 3022b judge whether M+1 is greater than N. If M+1 is less than or equal to N, execute step 3023; if M+1 is greater than N, then use M+1 as K, and execute the above step 3021 iteratively.
  • Step 3023 according to the quotient, divisor and corresponding remainder obtained in the above steps, determine the candidate route combination obtained after splitting.
  • the candidate routing combination includes the type and quantity of each routing component.
  • the divisor when step 3021 is executed for the first time can be determined as the number of input ends of the first type of routing component in the first layer of the cascading relationship (denoted here as routing component 1); the quotient when step 3021 is executed for the first time is determined as the number of routing components 1; the remainder when step 3021 is executed for the first time is determined as the number of input ends of the second type of routing component in the first layer (denoted here as routing component 2), wherein the number of routing components 2 is 1.
  • the divisor when step 3021 is executed for the second time can be determined as the number of input ends of the first type of routing component in the second layer of the cascading relationship (denoted here as routing component 3); the quotient when step 3021 is executed for the second time is determined as the number of routing components 3; the remainder when step 3021 is executed for the second time is determined as the number of input ends of the second type of routing component in the second layer (denoted here as routing component 4), wherein the number of routing components 4 is 1.
  • the above steps are performed iteratively until the types and quantities of the routing components of every layer except the last layer of the cascading relationship are determined. If the remainder of the last calculation is not 0, the number of input ends of the routing component in the last layer is the quotient of the last calculation plus 1; if the remainder of the last calculation is 0, the number of input ends of the routing component in the last layer is the quotient of the last calculation. In both cases the last layer contains one routing component. It follows that the number of layers in the cascading relationship may be the number of times step 3021 is performed plus 1.
  • the cascading relationship of the routing components can be as shown in Figure 5b.
  • the divisor 5 in the first operation determines that the first type of routing component in the first layer of the cascading relationship is a 5to1 data selector; the quotient 6 in the first operation determines that there are 6 routing components of the first type; the remainder 2 in the first operation determines that the second type of routing component is a 2to1 data selector, of which there is 1. The first layer of the cascading relationship therefore consists of six 5to1 data selectors and one 2to1 data selector.
  • judging the divisor, quotient and remainder of the second operation in the same way gives a second layer consisting of one 5to1 data selector and one 2to1 data selector, and the routing component of the last layer (third layer) is one 2to1 data selector.
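  • The layer-by-layer split described above can be sketched in Python. This is only an illustration under stated assumptions: step 3021 is not reproduced in this passage, so the sketch assumes it divides K by N to obtain a quotient and a remainder, and that M is that quotient. The function name `split_cascade` and the value K = 32 (inferred from the quotient 6 and remainder 2 for N = 5) are hypothetical and not taken from the disclosure.

```python
def split_cascade(k, n):
    """Split k inputs into cascaded selectors with at most n inputs each.

    Returns a list of layers; each layer is a list of
    (inputs_per_selector, selector_count) pairs.
    """
    if n < 2 or k <= n:
        raise ValueError("expects n >= 2 and k > n")
    layers = []
    while True:
        quotient, remainder = divmod(k, n)  # assumed content of step 3021
        layer = [(n, quotient)]             # first type: n-to-1 selectors
        if remainder:
            layer.append((remainder, 1))    # second type: one smaller selector
        layers.append(layer)
        # Every selector in this layer has one output feeding the next layer
        # (the M versus M+1 distinction of steps 3022a and 3022b).
        m = quotient + 1 if remainder else quotient
        if m <= n:
            layers.append([(m, 1)])         # last layer: a single m-to-1 selector
            return layers
        k = m


print(split_cascade(32, 5))
print(split_cascade(32, 4))
```

  • With these assumptions, `split_cascade(32, 5)` yields six 5to1 and one 2to1 selector, then one 5to1 and one 2to1 selector, then one 2to1 selector. A remainder of 1 would produce a one-input "selector" in this sketch, which a real flow would presumably replace with a direct wire.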
  • Step 3024: update N in the above steps with each integer from N-1 down to 2 in turn, until the candidate route combinations corresponding to every integer in the interval [2, N] have been obtained.
  • the cascading relationship of the routing components may be as shown in FIG. 5c.
  • the divisor 4 in the first operation determines that the first type of routing component in the first layer of the cascading relationship is a 4to1 data selector, and the quotient 8 in the first operation determines that there are 8 routing components of the first type, so the first layer of the cascading relationship consists of eight 4to1 data selectors.
  • judging the divisor, quotient and remainder of the second operation in the same way gives a second layer consisting of two 4to1 data selectors, and the routing component of the last layer (third layer) is one 2to1 data selector.
  • Step 3025 based on preset screening rules, determine a target route combination from multiple candidate route combinations. For example, the determination may be made based on at least one of a sum of remainders and a sum of quotients corresponding to candidate route combinations.
  • the remainders and quotients obtained for each candidate routing combination over the repeated executions of step 3021 can be determined, together with the corresponding sum of remainders and sum of quotients;
  • the candidate routing combinations with the smallest sum of remainders are taken as the first candidate routing combinations; among the first candidate routing combinations, the one with the smallest sum of quotients is the target routing combination.
  • the remainder in the operation determines the type of the second type of routing component used in that layer of the cascading relationship. To ensure fairness across different input ends and output ends, the smaller the remainder, the better. For example, when the remainder is 0, there is no routing component of the second type in that layer of the cascading relationship; since only routing components of the first type are used, the fairness of each input end and output end is then the highest.
  • the quotient in the operation represents the number of routing components of a certain type used in the cascading relationship. Accordingly, the sum of quotients represents the number of routing components in the whole candidate routing combination, and the smallest sum of quotients means the fewest routing components used.
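  • The screening rule can be illustrated with a short Python sketch. It assumes, as above, that step 3021 is an integer division of K by N; the names `division_trace` and `pick_target`, and the example value K = 32, are hypothetical. Ties on both sums fall back to the smaller N, which is an arbitrary choice made only for this sketch.

```python
def division_trace(k, n):
    """Return the (quotient, remainder) pairs from repeatedly dividing by n."""
    trace = []
    while True:
        quotient, remainder = divmod(k, n)
        trace.append((quotient, remainder))
        m = quotient + 1 if remainder else quotient
        if m <= n:
            return trace
        k = m


def pick_target(k, n_max):
    """Score every n in [2, n_max] and return the best
    (remainder_sum, quotient_sum, n) triple: smallest remainder sum
    first, then smallest quotient sum."""
    scored = []
    for n in range(n_max, 1, -1):
        trace = division_trace(k, n)
        remainder_sum = sum(r for _, r in trace)
        quotient_sum = sum(q for q, _ in trace)
        scored.append((remainder_sum, quotient_sum, n))
    return min(scored)


print(pick_target(32, 5))
```

  • For K = 32 and a maximum input quantity of 5, both N = 4 and N = 2 give a remainder sum of 0, and N = 4 has the smaller quotient sum (10 versus 30), so the sketch selects the 4to1-based cascade.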
  • the number of output terminals of the routing component is the number of on-chip components that need to be connected to the output terminals
  • the deployment location is the input terminal of the on-chip component that needs to be connected
  • the connection method is that the output end of the routing component is connected directly to the input end of the on-chip component.
  • the type of the routing component is a 1to3 router.
  • the routing components on the data path are the multiple cascaded data selectors shown in Figure 5c and a 1to3 router (with on-chip components connected to its 3 output ends), and the output end of the 2to1 data selector in Figure 5c is connected to the input end of the 1to3 router.
  • Case 3 The number of on-chip components to be connected to the input of the routing component does not exceed the maximum input quantity, and the number of on-chip components to be connected to the output of the routing component exceeds the maximum output quantity.
  • the number of on-chip components that need to be connected to the output end of the routing component can be substituted for K in case 2, and the maximum output quantity can be substituted for N in case 2 (generally, N is greater than or equal to 3), where K is greater than N; steps 3021 to 3025 in case 2 are then executed in sequence to obtain the types and connection modes of the multiple cascaded routing components after splitting.
  • Case 4 The number of on-chip components to be connected to the input of the routing component exceeds the maximum input quantity, and the number of on-chip components to be connected to the output of the routing component also exceeds the maximum output quantity.
  • a routing component that needs to be connected here can be split into multiple cascaded routing components. Then, according to the number of on-chip components that need to be connected to the output end of the routing component, the maximum output quantity, and the preset split algorithm, the routing component can be further split into multiple cascaded routing components.
  • the relevant content of the split algorithm can refer to the relevant descriptions in case 2 and case 3
  • the operation process for the input end can refer to the content in case 2
  • the operation process for the output end can refer to the content in case 3.
  • the cascading relationship of the routing components may be as shown in FIG. 5d.
  • the first layer is eight 4to1 data selectors
  • the second layer is two 4to1 data selectors
  • the third layer is one 2to1 data selector
  • the fourth layer is a 1to2 router
  • the fifth layer is a 1to5 router and a 1to2 router.
  • S303 Add a routing component to the SoC according to the type and deployment location of the routing component. In this way, the topology corresponding to the multiple on-chip components can be obtained.
  • the topology structures may be adjusted so that the topology structures are more in line with actual needs.
  • the topology can be simplified to reduce the number of routing components used and improve the utilization of routing components; alternatively, a target device can be added to the topology in response to a target device addition operation instruction from the designer, to improve the data transmission efficiency of the network topology.
  • the topology structure may be adjusted according to the following steps:
  • S401 For any routing component in the topology, determine a candidate data link formed by the routing component and a data end, where the data end is an on-chip component for data transmission or an on-chip component for data reception.
  • the candidate data links are A-B, B-C1, and B-C2.
  • S402 Determine multiple first routing components with the same candidate data link.
  • routing components with the same candidate data link may be routing components on data links originating from the same on-chip component (that is, the data ends at their input ends are the same)
  • or routing components on data links arriving at the same on-chip component (that is, the data ends at their output ends are the same)
  • the first routing component may be understood as a routing component connected to the same on-chip component, or a routing component connected to the same on-chip component through the same routing component.
  • the schematic diagram of the topology structure before adjustment may be as shown in FIG. 7a.
  • in FIG. 7a, there are 10 IP cores at the data sending end and 12 IP cores at the data receiving end.
  • the data path is formed by ten 1to2 routers, one 10to8 crossbar switch matrix, and one 10to4 crossbar switch matrix.
  • the 10to8 crossbar matrix and the 10to4 crossbar matrix are connected to the same 10 IP cores through the same ten 1to2 routers, so the data links of the 10to8 crossbar matrix and the 10to4 crossbar matrix toward the same on-chip components are the same candidate data link.
  • the crossbar matrix can be formed based on the data selector and router, so as to make the connection relationship in the topology more concise.
  • a 10to8 crossbar switch matrix can be composed of a 10to1 data selector and a 1to8 router.
  • the 10to8 crossbar matrix and the 10to4 crossbar matrix are generally packaged as a whole component, so they can be regarded as a routing component.
  • S403 Based on the number of inputs and outputs connected to each of the first routing components, integrate the multiple first routing components.
  • the multiple first routing components may be integrated. Specifically, since the candidate data links connected by the multiple first routing components are identical, the multiple first routing components can be merged, and the topology can be adjusted so that, with the merged first target routing component, the connection relationship between data ends remains unchanged compared with before the merge.
  • the first routing component A and the first routing component B can be merged, and data link 1 can be connected directly to the merged first target routing component, thereby reducing the number of routing components while ensuring normal data transmission between data ends.
  • the integration of the first routing component can be divided into the following two situations:
  • the divisor can be taken as the greatest common divisor, so that increasing the number of inputs and outputs of each first target routing component reduces the number of first target routing components used. Therefore, the greatest common divisor and the sum of the output quantities of the different first routing components may be determined, and the first target routing component may be determined based on the greatest common divisor and the sum of the output quantities.
  • the input quantity of the first routing component is the input quantity of the first target routing component, and the routing component connected to the input end of the first routing component needs to be deleted due to the integration of the first routing component;
  • the greatest common divisor is the output quantity of the first target routing component; the quotient of the sum of the output quantities of the first routing components divided by the greatest common divisor is the output quantity of each first target routing component that needs to be added after integration,
  • and the greatest common divisor is the quantity of first target routing components that need to be added.
  • the 10to8 crossbar matrix and the 10to4 crossbar matrix are first routing components with the same number of inputs and different numbers of outputs. From the shared candidate data link on the left and the number of inputs, it can be determined that the first target routing component obtained by integrating the 10to8 crossbar matrix and the 10to4 crossbar matrix has 10 inputs. Because the greatest common divisor of the output numbers 8 and 4 before integration is 4, the first target routing component has 4 outputs, that is, the first target routing component includes a 10to4 crossbar matrix. The sum of the outputs of the 10to8 crossbar matrix and the 10to4 crossbar matrix is 12 and the greatest common divisor is 4, so the quotient of the sum of the outputs divided by the greatest common divisor is 3; the first target routing components to be added after integration therefore each have 3 outputs, and the number of first target routing components to be added is 4.
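  • The arithmetic of this integration can be sketched in Python. The function name `merge_same_inputs` and the dictionary keys are hypothetical and used only for illustration; the sketch covers just the case of two first routing components with the same number of inputs and different numbers of outputs.

```python
from math import gcd


def merge_same_inputs(inputs, outputs_a, outputs_b):
    """Merge two crossbars that share the same inputs.

    Returns the merged crossbar shape plus the routers added behind it.
    """
    g = gcd(outputs_a, outputs_b)
    total_outputs = outputs_a + outputs_b
    return {
        "crossbar": (inputs, g),               # merged component, e.g. 10to4
        "router_outputs": total_outputs // g,  # outputs per added router
        "router_count": g,                     # one router per crossbar output
    }


print(merge_same_inputs(10, 8, 4))
```

  • For the 10to8 and 10to4 crossbar matrices this gives one 10to4 crossbar followed by four 1to3 routers, which still reaches all 12 receiving IP cores. When the greatest common divisor is 1, this formula would place a single wide router behind a one-output crossbar, which is a case this sketch does not try to handle.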
  • the number of inputs of the first target routing component is 10 (as shown in Figures 7a-7b, the ten 1to2 routers are removed after integration). With output numbers of 7 and 4, the sum of the numbers of output ends is 11. In this case, the output quantity of the first target routing component can be determined according to the above split algorithm; for example, the output quantity of the first target routing component can be determined to be 4, and the first target routing components that need to be added are three 1to3 routers and one 1to2 router.
  • the greatest common divisor and the sum of the input quantities of the different first routing components can be determined, and the first target routing component can be determined based on the greatest common divisor and the sum of the input quantities of the first routing components.
  • the output quantity of the first routing component is the output quantity of the first target routing component, and the routing component connected to the output end of the first routing component needs to be deleted due to the integration of the first routing component;
  • the greatest common divisor is the input quantity of the first target routing component;
  • the quotient of the sum of the input quantities of the first routing components divided by the greatest common divisor is the input quantity of each first target routing component that needs to be added after integration, and the greatest common divisor is the quantity of first target routing components that need to be added.
  • the adjusted topology diagram can be shown in Figure 7b.
  • in Figure 7b, there are 10 IP cores at the data sending end and 12 IP cores at the data receiving end, and the data path is formed by one 10to4 crossbar switch matrix and four 1to3 routers.
  • compared with Figure 7a, fewer routing components are used and the utilization of routing components is improved.
  • with the initial clock frequency of each routing component in the topology, data may have to cross between two different clock frequencies during transmission at a high hardware cost, so an appropriate clock domain can be allocated to each added routing component to reduce the hardware cost of data paths crossing clock domains.
  • a clock domain may be assigned to each routing component in the topology according to the following steps:
  • S501 Based on the initial clock frequency of the routing component in the topology and the bandwidth requirement of the on-chip component, determine an input bit width and an output bit width of the routing component in the topology.
  • the initial clock frequency of the routing component can be the same as that of the clock domain of any on-chip component connected to the routing component; the bit width indicates the number of bits of data transmitted within one clock cycle, and the bandwidth indicates the amount of data transmitted within a certain period of time.
  • the bandwidth may be represented by a product of a bit width and a clock frequency.
  • the input bit width and output bit width of the routing component may be determined through the following steps:
  • S5011 Determine the output bit width of the on-chip component based on the bandwidth requirement of the on-chip component and the clock domain corresponding to the on-chip component; determine one or more second routing components directly connected to the on-chip component in the topology.
  • the bandwidth requirement of the on-chip component may be divided by the clock frequency in the clock domain corresponding to the on-chip component to obtain the output bit width of the on-chip component.
  • the clock frequencies corresponding to different on-chip components may be different, and thus the output bit widths of the obtained on-chip components may also be different.
  • S5012 For each second routing component, determine the input bit width of the second routing component based on the bandwidth requirement of the first on-chip component connected to the second routing component, and determine the output bit width of the second routing component based on the initial clock frequency and the bandwidth requirement of the first on-chip component.
  • the output bit width of the first on-chip component may be the input bit width of the second routing component connected to the first on-chip component.
  • the initial clock frequency and the input bandwidth of the second routing component determine the output bit width of the second routing component.
  • the input bandwidth of the second routing component is the sum of bandwidth requirements of at least one first on-chip component connected to the second routing component.
  • the output bandwidth of the second routing component should be greater than or equal to its input bandwidth, so the input bandwidth is the minimum output bandwidth of the second routing component.
  • the output bit width of the second routing component may be obtained by dividing the input bandwidth by the initial clock frequency corresponding to the second routing component.
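  • The bit-width relations above (bandwidth equals bit width times clock frequency) can be sketched in Python. The function names, the example bandwidths and frequencies, and the choice to round up to a whole number of bits are all assumptions made for illustration; the disclosure only states the division.

```python
def ceil_div(a, b):
    """Integer division rounded up, so a partial bit still needs a wire."""
    return -(-a // b)


def router_widths(bandwidths_bps, clocks_hz, router_clock_hz):
    """Return (input_widths, output_width) for one second routing component.

    bandwidths_bps / clocks_hz describe the on-chip components wired to
    the router inputs; router_clock_hz is the router's initial clock.
    """
    # Each input width equals the output width of the attached component.
    input_widths = [ceil_div(bw, clk) for bw, clk in zip(bandwidths_bps, clocks_hz)]
    # Output bandwidth must cover the sum of the input bandwidths.
    output_width = ceil_div(sum(bandwidths_bps), router_clock_hz)
    return input_widths, output_width


# Hypothetical example: 12.8 Gbit/s at 400 MHz and 6.4 Gbit/s at 200 MHz,
# feeding a router whose initial clock is 400 MHz.
print(router_widths([12_800_000_000, 6_400_000_000],
                    [400_000_000, 200_000_000],
                    400_000_000))
```

  • In this hypothetical example both inputs are 32 bits wide and the router output is 48 bits wide, since 19.2 Gbit/s divided by 400 MHz is 48.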
  • S5013 Based on the output bit width of each second routing component, determine the input bit width and output bit width of other routing components in the topology except each second routing component.
  • for a routing component on the data link, each input bit width is the output bit width of the previous device connected to that input on the data path, and the sum of the input bit widths of the current device is the sum of the output bit widths of all previous devices connected to the current device on the data path.
  • S502 Based on the input bit width and output bit width of each routing component in the topology, allocate a clock domain to each routing component in the topology.
  • a clock domain may be assigned to each routing component in the topology based on the topology and the input bit width and output bit width of each routing component in the topology, wherein the clock domains are assigned such that the sum of the bit widths crossing clock domains is the smallest.
  • clock domains may be assigned to each routing component in the topology through the following steps:
  • S5021 Determine at least one candidate allocation combination based on the topology structure. Wherein, different candidate allocation combinations are used to allocate different clock domains to the routing components in the topology.
  • the clock domain of the on-chip component that has a connection relationship with the routing component may be used for allocation.
  • the clock domain corresponding to the IP core at the input end is clock domain 1
  • the clock domain corresponding to the IP core at the output end is clock domain 2
  • Clock domain 1 and clock domain 2 may be used for allocation to obtain at least one candidate allocation combination.
  • at least one candidate allocation combination is generated according to the clock domain that may be allocated to each routing component.
  • aggregation processing may also be performed on the on-chip components and routing components in the topology based on the topology, and at least one candidate allocation combination may be determined based on the result of the aggregation;
  • the specific content will be described in detail below, and will not be further described here.
  • S5022 Determine the sum of bit widths across clock domains under each candidate allocation combination, and determine the candidate allocation combination with the smallest sum of bit widths as a target allocation combination, so as to allocate clock domains for each routing component in the topology according to the target allocation combination.
  • the sum of bit widths across clock domains under each candidate allocation combination can be calculated, and the allocation combination with the smallest sum of bit widths across clock domains can be determined as the target allocation combination .
  • schematic diagrams of allocating clock domains for each routing component in the topology may be as shown in FIG. 11a to FIG. 11d .
  • FIG. 11a shows a schematic diagram of the data path before the allocation of clock domains in the topology.
  • NI represents the network interface of the IP core, and different shading types represent different clock domains.
  • Interfaces NI0 and NI1 are located in clock domain 1
  • interface NI2 is located in clock domain 2
  • interfaces NI3 and NI4 are located in clock domain 3
  • s0, s1, and s2 respectively represent three routing components
  • the direction of the arrow indicates the direction of data transmission
  • the number indicates the direction of data communication.
  • Figure 11b shows a schematic diagram of aggregation processing of on-chip components.
  • according to the connection relationship corresponding to the topology structure, all possibilities of performing aggregation processing on the routing components and the on-chip components can be determined, and the candidate allocation combinations can be determined according to the result of the aggregation processing.
  • routing component s0 and interfaces NI0-1 can be aggregated on the basis of Figure 11b, that is, clock domain 1 is allocated to routing component s0; according to routing component s1 and routing component
  • routing component s1, routing component s2, and interfaces NI3-4 can be aggregated on the basis of Figure 11b, that is, clock domain 3 is allocated to routing component s1 and routing component s2.
  • routing component s2 and interfaces NI3-4 are aggregated to obtain combinations s1, s2, and NI3-4.
  • routing component s0 routing component s0
  • routing component s1 routing component s2
  • other components in the topology structure four candidate allocation combinations corresponding to clustering processing can be obtained, respectively.
  • the target allocation combination with the smallest sum of cross-clock domain bit widths can be obtained as the allocation combination 1).
  • Fig. 11d shows a schematic diagram of a data path after clock domains are allocated in the topology.
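  • The selection of the target allocation combination can be sketched in Python as an exhaustive search. The topology below mirrors the shape described for FIG. 11a (NI0 and NI1 in clock domain 1, NI2 in clock domain 2, NI3 and NI4 in clock domain 3, and routing components s0, s1, s2), but the link directions and bit widths are hypothetical, since the figure values are not reproduced in the text. For simplicity the sketch tries every known clock domain for every routing component rather than only the domains of connected on-chip components.

```python
from itertools import product


def best_allocation(edges, fixed_domains, routers):
    """Pick router clock domains that minimise the cross-domain bit width.

    edges: (source, destination, bit_width) triples.
    fixed_domains: clock domain of every on-chip interface.
    """
    domains = sorted(set(fixed_domains.values()))
    best_cost, best_assign = None, None
    for combo in product(domains, repeat=len(routers)):
        assign = dict(zip(routers, combo))
        domain_of = {**fixed_domains, **assign}
        # Only links whose two ends sit in different domains cost anything.
        cost = sum(w for src, dst, w in edges if domain_of[src] != domain_of[dst])
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


# Hypothetical bit widths for a FIG. 11a-like data path.
EDGES = [("NI0", "s0", 64), ("NI1", "s0", 64), ("s0", "s1", 32),
         ("NI2", "s1", 16), ("s1", "s2", 48),
         ("s2", "NI3", 64), ("s2", "NI4", 64)]
FIXED = {"NI0": 1, "NI1": 1, "NI2": 2, "NI3": 3, "NI4": 3}

print(best_allocation(EDGES, FIXED, ["s0", "s1", "s2"]))
```

  • With these assumed widths, the cheapest combination places s0 in clock domain 1 and s1 and s2 in clock domain 3, leaving 48 bits crossing clock domains. Exhaustive search grows exponentially with the number of routing components, which is one reason to narrow the candidates by aggregation first.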
  • the input bit width and output bit width of the routing component may also change accordingly. To confirm that the sum of the bit widths across clock domains is still the smallest after the bit widths change, the allocation result needs to be verified.
  • the allocation result of the clock domain may be verified through the following steps:
  • S601 Re-determine an input bit width and an output bit width of the routing component based on a target clock frequency corresponding to a clock domain allocated to the routing component and a bandwidth requirement of the on-chip component.
  • S602 Based on the topology structure and the re-determined input bit width and output bit width of the routing component, re-determine multiple candidate allocation combinations, and determine a sum of bit widths across clock domains under each candidate allocation combination.
  • S603 From the re-determined multiple candidate allocation combinations, determine the target allocation combination with the smallest sum of bit widths; if the target allocation combination is different from the allocated clock domains, return, based on the target allocation combination, to the step of re-determining the input bit width and output bit width of the routing component.
  • the steps of determining the input bit width and output bit width of the routing component and determining the target allocation combination may be performed cyclically until the target allocation combination is the same as the allocated clock domain.
  • the above steps may be executed repeatedly, but the target allocation combination is still different from the allocated clock domain.
  • the execution of the loop process may be stopped, and a first alarm message may be sent. In this way, the designer can be prompted through the first alarm information that the current topology structure cannot meet the design requirements, and the topology structure needs to be adjusted.
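  • The verification loop of S601 to S603 can be sketched in Python as a fixed-point iteration. The two callbacks and the `max_rounds` limit are hypothetical placeholders: the text says the loop may be stopped when the combination keeps changing, but does not state a specific stopping rule.

```python
def verify_allocation(allocation, recompute_widths, choose_allocation, max_rounds=8):
    """Repeat S601-S603 until the chosen allocation stops changing.

    Returns (allocation, converged). converged == False is the situation
    in which the first alarm message would be sent.
    """
    for _ in range(max_rounds):
        widths = recompute_widths(allocation)   # S601: widths at the target clocks
        target = choose_allocation(widths)      # S602 and S603: best combination
        if target == allocation:
            return allocation, True             # allocation is self-consistent
        allocation = target
    return allocation, False


# Toy stand-ins: the "allocation" is an integer that settles at 3.
settled = verify_allocation(0, lambda a: a, lambda w: min(w + 1, 3))
print(settled)
```

  • In the toy run the allocation changes three times and then repeats itself, so the loop reports convergence; with a smaller `max_rounds` the same inputs would end with `converged` set to False.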
  • it can also be checked whether the output bandwidth of the last routing component meets the bandwidth requirement of the on-chip component connected to the last routing component.
  • if the output bandwidth of the last routing component is less than the minimum bandwidth required by the on-chip component connected to the last routing component, the bandwidth allocation at this point is unreasonable, and a second alarm message needs to be sent to remind the designer that the allocated bandwidth is insufficient and needs to be adjusted accordingly.
  • a target device may also be added to the topology structure in response to a target device addition operation instruction.
  • the target device includes a first-in-first-out storage unit and/or a network rate adapter.
  • designers can adjust the topology according to the received alarm information, so that the data transmission performance of the topology is better.
  • through a target device addition operation instruction, the designer can add a first-in-first-out storage unit and a network rate adapter to the topology structure to reduce network congestion and data transmission waiting time, thereby optimizing the topology.
  • the method for determining the topology structure of the on-chip network can make the determination of the topology structure of the network on chip more efficient by simplifying the processing of the first connection relationship;
  • a routing component for connecting the multiple on-chip components is added to the system-on-chip to obtain the topology structures corresponding to the multiple on-chip components, and the topology structure can be automatically generated, thereby effectively improving the efficiency of network topology construction.
  • the writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process.
  • the specific execution order of each step should be determined by its function and possible internal logic.
  • the embodiment of the present disclosure also provides a device for determining the topology of the network on chip corresponding to the method for determining the topology of the network on chip.
  • the device is similar in principle to the method for determining the topology of the network on chip, so the description of the device can refer to the description of the method, and repeated content is not described again.
  • FIG. 13 is a schematic structural diagram of an apparatus for determining a network-on-chip topology provided by an embodiment of the present disclosure.
  • the apparatus includes an acquisition module 1301 , a simplification module 1302 , and an addition module 1303 .
  • the acquiring module 1301 is configured to acquire the first connection relationship of multiple on-chip components of the system on chip and the attribute information of the multiple on-chip components;
  • the simplification module 1302 is configured to simplify the first connection relationship based on the attribute information of the multiple on-chip components, to obtain a second connection relationship corresponding to the multiple on-chip components;
  • the adding module 1303 is configured to add, in the system on chip and based on the second connection relationship, routing components for connecting the multiple on-chip components, to obtain the topology structures corresponding to the multiple on-chip components.
  • the attribute information of the on-chip component includes a bandwidth requirement of the on-chip component and/or an address space range that the on-chip component can access.
  • the simplification module 1302 can be specifically configured to: cluster the on-chip components whose accessible address space ranges meet the first preset condition, to obtain a first clustering result; based on the first clustering result, determine a second connection relationship corresponding to the plurality of on-chip components.
  • the simplification module 1302 may be specifically configured to: cluster the on-chip components in the first clustering result whose bandwidth requirements meet the second preset condition, to obtain a second clustering result; based on the second clustering result, determine a second connection relationship corresponding to the plurality of on-chip components.
  • the simplification module 1302 may be specifically configured to: cluster the on-chip components whose bandwidth requirements meet the second preset condition, to obtain a third clustering result; based on the third clustering result, determine a second connection relationship corresponding to the plurality of on-chip components.
  • the simplification module 1302 can be specifically configured to: cluster the on-chip components in the third clustering result whose accessible address space ranges meet the first preset condition, to obtain a fourth clustering result; based on the fourth clustering result, determine a second connection relationship corresponding to the plurality of on-chip components.
  • the adding module 1303 may be specifically configured to: acquire attribute information of the routing component, where the attribute information includes the maximum number of inputs and the maximum number of outputs of the routing component, which indicate the number of on-chip components connected to the routing component; determine the type and deployment location of the routing component based on the attribute information of the routing component and the second connection relationship; and add the routing component to the system-on-chip according to its type and deployment location.
  • the device further includes an adjustment module 1304, configured to: for any routing component in the topology, determine the candidate data links formed by the routing component and a data terminal, the data terminal being an on-chip component that sends data or an on-chip component that receives data; determine the first routing components whose candidate data links are identical; integrate the first routing components based on the number of inputs and outputs they connect; and adjust the topology based on the first target routing component obtained by the integration.
  • the device further includes an allocation module 1305, configured to: determine the input bit width and the output bit width of each routing component in the topology based on the initial clock frequency of each routing component in the topology and the bandwidth requirement of the on-chip component; and, based on the input bit width and the output bit width of each routing component in the topology, allocate a clock domain for each routing component in the topology.
  • the allocation module 1305 may be specifically configured to: determine the output bit width of the on-chip component based on the bandwidth requirement of the on-chip component and the clock domain corresponding to the on-chip component; determine one or more second routing components in the topology that are directly connected to the on-chip component; for each of the second routing components, determine the input bit width of the second routing component based on the bandwidth requirement of the first on-chip component connected to it, and determine the output bit width of the second routing component based on its initial clock frequency and the bandwidth requirement of the first on-chip component; and, based on the output bit width of each second routing component, determine the input bit width and output bit width of the other routing components in the topology.
  • the allocation module 1305 may be specifically configured to: determine the input bandwidth of the second routing component based on the bandwidth requirement of the first on-chip component; and determine the output bit width of the second routing component based on the initial clock frequency and the input bandwidth of the second routing component.
  • the allocation module 1305 may be specifically configured to: allocate clock domains to the routing components in the topology based on the topology and on the input bit width and output bit width of each routing component, such that the sum of the bit widths crossing clock domains under the allocation is the smallest.
  • the allocation module 1305 may be specifically configured to: determine at least one candidate allocation combination based on the topology, where different candidate allocation combinations allocate different clock domains to the routing components in the topology; and, based on the input bit width and output bit width of each routing component in the topology, determine the sum of bit widths across clock domains under each candidate allocation combination, and take the candidate allocation combination with the smallest sum as the target allocation combination, so as to allocate clock domains to the routing components in the topology according to the target allocation combination.
  • the allocation module 1305 may be specifically configured to: perform aggregation processing on the on-chip components and routing components in the topology based on the topology; and determine at least one candidate allocation combination based on the result of the aggregation processing.
  • the allocation module 1305 is further configured to: re-determine the input bit width and output bit width of the routing component based on the target clock frequency corresponding to the clock domain allocated to the routing component and the bandwidth requirement of the on-chip component; based on the topology and the re-determined input bit width and output bit width of the routing component, re-determine multiple candidate allocation combinations and determine the sum of the bit widths across clock domains under each of them; and, from the re-determined candidate allocation combinations, determine the target allocation combination with the smallest sum of bit widths and, if the target allocation combination differs from the allocated clock domains, return to the step of re-determining the input bit width and output bit width of the routing component based on the target allocation combination.
  • the allocation module 1305 is further configured to: stop executing the loop process and send a first alarm message when the number of return execution times exceeds a preset number of times.
  • the allocation module 1305 can also be used to: verify the input bandwidth and output bandwidth of each routing component in the topology based on the bandwidth requirements of each on-chip component, and send the second alarm message if the verification fails.
  • the adding module 1303 is further configured to: respond to a target device adding operation instruction, and add a target device to the topology.
  • the target device includes a first-in-first-out storage unit and/or a network rate adapter.
  • the device for determining the topology of the network on chip provided by the embodiments of the present disclosure makes determining the topology more efficient by simplifying the first connection relationship; and, by adding to the system on chip, based on the second connection relationship, a routing component for connecting the multiple on-chip components, it obtains the topology corresponding to the multiple on-chip components automatically, thereby effectively improving the efficiency of constructing the network topology.
  • the embodiment of the present disclosure also provides a computer device.
  • the computer device 1400 includes a processor 1401, a memory 1402, and a bus 1403.
  • the memory 1402 is used to store execution instructions, and includes an internal memory 14021 and an external memory 14022.
  • the internal memory 14021 is used for temporarily storing calculation data of the processor 1401 and data exchanged with the external memory 14022, such as a hard disk.
  • the processor 1401 exchanges data with the external memory 14022 through the internal memory 14021.
  • the processor 1401 executes the following instructions: acquire the first connection relationship of the multiple on-chip components of the system-on-chip and the attribute information of the multiple on-chip components; simplify the first connection relationship based on the attribute information of the plurality of on-chip components to obtain a second connection relationship corresponding to the plurality of on-chip components; and add, in the system-on-chip and based on the second connection relationship, a routing component that connects the multiple on-chip components, to obtain a topology corresponding to the multiple on-chip components.
  • An embodiment of the present disclosure also provides a chip, including an on-chip component and a routing component.
  • the network topology between the routing component and the on-chip component may be determined based on the method for determining the on-chip network topology described in any embodiment of the present disclosure.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is run by a processor, the steps of the method for determining the network-on-chip topology described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • An embodiment of the present disclosure also provides a computer program product. The computer program product carries program code, and the program code includes instructions that can be used to execute the steps of the method for determining the network-on-chip topology described in the above method embodiments.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • in one optional embodiment, the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some communication interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions are realized in the form of software function units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • the technical solution of the present disclosure, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.

Abstract

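The present disclosure provides a method, an apparatus, and a chip for determining a network-on-chip topology. According to one example of the method, after a first connection relationship of multiple on-chip components of a system on chip and attribute information of the multiple on-chip components are acquired, the first connection relationship may be simplified based on the attribute information to obtain a second connection relationship corresponding to the multiple on-chip components. Routing components for connecting the multiple on-chip components may then be added to the system on chip based on the second connection relationship, to obtain a topology corresponding to the multiple on-chip components.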

Description

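Method, Apparatus, and Chip for Determining a Network-on-Chip Topology
Technical Field
The present disclosure relates to the field of network-on-chip technology, and in particular to a method, an apparatus, and a chip for determining a network-on-chip topology.
Background
As systems on chip (SoC) demand ever lower interconnect latency, higher throughput, and better scalability, bus-based interconnects struggle to meet current SoC performance requirements, and the network on chip (NoC), which is based on message switching, has gradually become the interconnect architecture for communication between SoC components. In the related art, designers build the network-on-chip topology manually from the connection relationships. As network topologies grow more complex, the inefficiency of manual construction becomes increasingly apparent.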
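Summary
Embodiments of the present disclosure provide at least a method, an apparatus, and a chip for determining a network-on-chip topology.
In a first aspect, an embodiment of the present disclosure provides a method for determining a network-on-chip topology, including: acquiring a first connection relationship of multiple on-chip components of a system on chip and attribute information of the multiple on-chip components; simplifying the first connection relationship based on the attribute information of the multiple on-chip components to obtain a second connection relationship corresponding to the multiple on-chip components; and adding, based on the second connection relationship, routing components for connecting the multiple on-chip components to the system on chip, to obtain a topology corresponding to the multiple on-chip components.
In this way, simplifying the first connection relationship makes determining the network-on-chip topology more efficient, and adding the routing components based on the second connection relationship allows the topology to be generated automatically, which effectively improves the efficiency of building the network topology.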
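In a possible implementation, the attribute information of an on-chip component includes the bandwidth requirement of the on-chip component and/or the address space range that the on-chip component can access. Accordingly, where the attribute information includes the accessible address space range, simplifying the first connection relationship based on the attribute information of the multiple on-chip components to obtain the second connection relationship includes: clustering the on-chip components whose accessible address space ranges meet a first preset condition to obtain a first clustering result; and determining the second connection relationship corresponding to the multiple on-chip components based on the first clustering result. In this way, the connection relationships can be clustered along the address-space dimension, which makes the subsequent construction of the topology more efficient.
Further, where the attribute information also includes the bandwidth requirement of the on-chip component, determining the second connection relationship based on the first clustering result includes: clustering, within the first clustering result, the on-chip components whose bandwidth requirements meet a second preset condition to obtain a second clustering result; and determining the second connection relationship corresponding to the multiple on-chip components based on the second clustering result. In this way, clustering the first clustering result again along another dimension makes the clustered on-chip components similar in multiple dimensions, which improves the clustering.
In a possible implementation, where the attribute information of an on-chip component includes its bandwidth requirement, simplifying the first connection relationship based on the attribute information of the multiple on-chip components to obtain the second connection relationship includes: clustering the on-chip components whose bandwidth requirements meet the second preset condition to obtain a third clustering result; and determining the second connection relationship corresponding to the multiple on-chip components based on the third clustering result.
Further, where the attribute information also includes the accessible address space range, determining the second connection relationship based on the third clustering result includes: clustering, within the third clustering result, the on-chip components whose accessible address space ranges meet the first preset condition to obtain a fourth clustering result; and determining the second connection relationship corresponding to the multiple on-chip components based on the fourth clustering result.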
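In a possible implementation, adding, based on the second connection relationship, the routing components for connecting the multiple on-chip components to the system on chip includes: acquiring attribute information of the routing component, the attribute information including a maximum number of inputs and a maximum number of outputs of the routing component, which indicate the number of on-chip components the routing component connects to; determining the type and deployment position of the routing component based on the attribute information of the routing component and the second connection relationship; and adding the routing component to the system on chip according to its type and deployment position.
In a possible implementation, the method further includes: for any routing component in the topology, determining the candidate data links formed by that routing component and a data end, the data end being an on-chip component that sends data or an on-chip component that receives data; determining first routing components whose candidate data links are identical; integrating the first routing components based on the numbers of inputs and outputs they connect; and adjusting the topology based on the first target routing component obtained by the integration. In this way, integrating the first routing components reduces the number of routing components used and improves their utilization.
In a possible implementation, the method further includes: determining the input bit width and output bit width of each routing component in the topology based on the initial clock frequency of each routing component and the bandwidth requirements of the on-chip components; and allocating a clock domain to each routing component in the topology based on the input bit width and output bit width of each routing component. In this way, allocating clock domains based on the input and output bit widths keeps clock-domain crossings to a minimum, which reduces the loss they cause.
In a possible implementation, determining the input bit width and output bit width of each routing component in the topology based on the initial clock frequency of each routing component and the bandwidth requirements of the on-chip components includes: determining one or more second routing components in the topology that are directly connected to the on-chip components; for each second routing component, determining its input bit width based on the bandwidth requirement of the first on-chip component connected to it, and determining its output bit width based on its initial clock frequency and the bandwidth requirement of the first on-chip component; and determining, based on the output bit widths of the second routing components, the input bit widths and output bit widths of the other routing components in the topology.
In a possible implementation, determining the output bit width of the second routing component based on its initial clock frequency and the bandwidth requirement of the first on-chip component includes: determining the input bandwidth of the second routing component based on the bandwidth requirement of the first on-chip component; and determining the output bit width of the second routing component based on its initial clock frequency and its input bandwidth.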
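In a possible implementation, allocating clock domains to the routing components in the topology based on their input and output bit widths includes: allocating the clock domains based on the topology and on the input bit width and output bit width of each routing component, such that the sum of the bit widths crossing clock domains under the allocation is minimal, so that data transmission across clock domains requires the fewest hardware resources.
In a possible implementation, allocating clock domains to the routing components based on the topology and on their input and output bit widths includes: determining at least one candidate allocation combination based on the topology, where different candidate allocation combinations allocate different clock domains to the routing components in the topology; and determining, based on the input and output bit widths of the routing components, the sum of the bit widths crossing clock domains under each candidate allocation combination, and taking the candidate allocation combination with the smallest sum as the target allocation combination, so as to allocate clock domains to the routing components according to the target allocation combination.
In a possible implementation, determining at least one candidate allocation combination based on the topology includes: aggregating the on-chip components and routing components in the topology based on the topology; and determining at least one candidate allocation combination based on the result of the aggregation. In this way, aggregating the on-chip components and routing components yields fewer candidate allocation combinations, which makes clock-domain allocation more efficient.
In a possible implementation, after clock domains are allocated to the routing components in the topology, the method further includes: re-determining the input bit width and output bit width of each routing component based on the target clock frequency of the clock domain allocated to it and the bandwidth requirements of the on-chip components; re-determining multiple candidate allocation combinations based on the topology and the re-determined bit widths, and determining the sum of the bit widths crossing clock domains under each of them; and determining, from the re-determined candidate allocation combinations, the target allocation combination with the smallest sum and, if that target allocation combination differs from the clock domains already allocated, returning to the step of re-determining the input and output bit widths of the routing components based on the target allocation combination. In this way, the clock-domain allocation result is verified and, if verification fails, the allocation process is run again, which ensures that the finally determined target allocation combination has the smallest sum of cross-clock-domain bit widths.
In a possible implementation, the method further includes: when the number of returns exceeds a preset number, stopping the loop and sending first alarm information. In this way, designers can be prompted in time to adjust the topology when it is unreasonable.
In a possible implementation, after the routing components for connecting the multiple on-chip components are added to the system on chip based on the second connection relationship and the topology corresponding to the multiple on-chip components is obtained, the method further includes: verifying the input bandwidth and output bandwidth of each routing component in the topology based on the bandwidth requirements of the on-chip components, and sending second alarm information if verification fails. In this way, designers can be prompted in time to adjust the bandwidth when the data bandwidth of the topology is unreasonable.
In a possible implementation, the method further includes: in response to a target-device adding instruction, adding a target device to the topology, the target device including a first-in-first-out storage unit and/or a network rate adapter. In this way, designers' adjustments to the topology can be accommodated, which improves the data transmission performance of the topology.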
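In a second aspect, an embodiment of the present disclosure further provides an apparatus for determining a network-on-chip topology, including: an acquiring module, configured to acquire a first connection relationship of multiple on-chip components of a system on chip and attribute information of the multiple on-chip components; a simplification module, configured to simplify the first connection relationship based on the attribute information of the multiple on-chip components to obtain a second connection relationship corresponding to the multiple on-chip components; and an adding module, configured to add, based on the second connection relationship, routing components for connecting the multiple on-chip components to the system on chip, to obtain a topology corresponding to the multiple on-chip components.
In a third aspect, an embodiment of the present disclosure further provides a chip, including on-chip components and routing components. The network topology between the routing components and the on-chip components is determined by the method for determining a network-on-chip topology according to the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the computer device runs, the processor and the memory communicate over the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the first aspect or of any possible implementation of the first aspect.
In a fifth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the first aspect or of any possible implementation of the first aspect.
For the effects of the above apparatus, chip, computer device, and storage medium, refer to the description of the above method for determining a network-on-chip topology; they are not repeated here.
To make the above objectives, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.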
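Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. These drawings show embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from them without creative effort.
FIG. 1 is a flowchart of a method for determining a network-on-chip topology provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of one specific method of obtaining the second connection relationship in the method for determining a network-on-chip topology provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of another specific method of obtaining the second connection relationship in the method;
FIG. 4 is a flowchart of a specific method of adding routing components in the method;
FIG. 5a is a schematic diagram of the splitting algorithm in the method;
FIG. 5b is a schematic diagram of one cascade relationship of routing components in the method;
FIG. 5c is a schematic diagram of another cascade relationship of routing components in the method;
FIG. 5d is a schematic diagram of yet another cascade relationship of routing components in the method;
FIG. 6 is a flowchart of a specific method of adjusting the topology in the method;
FIG. 7a is a schematic diagram of the topology before adjustment in the method;
FIG. 7b is a schematic diagram of the topology after adjustment in the method;
FIG. 8 is a flowchart of one specific method of allocating clock domains to the routing components in the topology in the method;
FIG. 9 is a flowchart of a specific method of determining the input bit width and output bit width of the routing components in the method;
FIG. 10 is a flowchart of another specific method of allocating clock domains to the routing components in the topology in the method;
FIG. 11a to FIG. 11d are schematic diagrams of allocating clock domains to the routing components in the method;
FIG. 12 is a flowchart of a specific method of verifying the clock-domain allocation result in the method;
FIG. 13 is a schematic architecture diagram of an apparatus for determining a network-on-chip topology provided by an embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.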
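Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments generally described and illustrated in the drawings may be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the claimed scope of the present disclosure but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing it need not be further defined or explained in subsequent drawings. The term "and/or" herein merely describes an association and indicates that three relationships are possible; for example, "A and/or B" may mean A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of them; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
When a network-on-chip topology is built, designers have to construct it manually from the connection relationships, and as network topologies grow more complex the inefficiency of manual construction becomes increasingly apparent. Based on this research, the present disclosure provides a method, an apparatus, and a chip for determining a network-on-chip topology. Simplifying the first connection relationship based on the attribute information of the multiple on-chip components makes determining the topology more efficient, and adding routing components for connecting the multiple on-chip components to the system on chip based on the second connection relationship allows the topology to be generated automatically, which effectively improves the efficiency of building the network topology.
The method for determining a network-on-chip topology provided by the embodiments of the present disclosure is generally executed by a computer device with a certain computing capability, for example a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the method may be implemented by a processor invoking computer-readable instructions stored in a memory.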
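FIG. 1 is a flowchart of the method for determining a network-on-chip topology provided by an embodiment of the present disclosure. The method includes steps S101 to S103.
S101: Acquire a first connection relationship of multiple on-chip components of a system on chip and attribute information of the multiple on-chip components.
S102: Simplify the first connection relationship based on the attribute information of the multiple on-chip components to obtain a second connection relationship corresponding to the multiple on-chip components.
S103: Add, based on the second connection relationship, routing components for connecting the multiple on-chip components to the system on chip, to obtain a topology corresponding to the multiple on-chip components.
These steps are described in detail below. For S101, an on-chip component is a component of the system on chip. It may be a pre-designed circuit function module (that is, an IP core). An IP core contains a network interface usable for network-on-chip communication, so the IP core can be invoked through its network interface to perform the corresponding function. The first connection relationship describes the connections between the on-chip components in the system on chip.
The attribute information of an on-chip component includes its bandwidth requirement and the address space range it can access. The bandwidth requirement is the minimum data transmission bandwidth the on-chip component needs to work normally. Taking an IP core as the on-chip component, the bandwidth requirement of the IP core is the minimum bandwidth required to implement its circuit function. For example, if the circuit function of an IP core is to process 100 tasks, its bandwidth requirement is the minimum bandwidth needed to transfer those 100 tasks.
The address space range is the range of address space that the on-chip component can access (for example, the range of address space of an off-chip storage component that it can access), and consists of a base address and an offset range. The offset range gives the range of offsets. Adding the largest offset in the offset range to the base address gives the maximum address the on-chip component can access, and adding the smallest offset gives the minimum address it can access; the address space between the minimum and maximum access addresses is the address space range the on-chip component can access. For example, if an IP core has base address X and offset range Y to Z (Y < Z), its minimum access address is X+Y, its maximum access address is X+Z, and the address space range it can access is X+Y to X+Z.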
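For S102, in a possible implementation where the attribute information of the on-chip components includes the accessible address space range, the second connection relationship corresponding to the multiple on-chip components can be obtained through the following steps, as shown in FIG. 2:
S201: Cluster the on-chip components whose accessible address space ranges meet a first preset condition to obtain a first clustering result.
S202: Determine the second connection relationship corresponding to the multiple on-chip components based on the first clustering result.
When clustering by accessible address space, on-chip components whose address spaces are identical or close may be clustered together. Two address spaces are considered close when they overlap.
For example, suppose the address space of IP core 1 is A to C and that of IP core 2 is B to D, with A < B < C < D. The two address spaces overlap in B to C, so IP core 1 and IP core 2 can be clustered, and the address space of the clustered IP core is A to D.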
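The address-range computation and the overlap-based clustering described above can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function names `address_range` and `cluster_by_address`, the inclusive-range representation, and the choice to treat "overlap" as the first preset condition are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (lowest address, highest address)


def address_range(base: int, min_offset: int, max_offset: int) -> Range:
    """Accessible range X+Y .. X+Z from base X and offset range Y..Z."""
    return (base + min_offset, base + max_offset)


def cluster_by_address(ranges: Dict[str, Range]) -> List[Tuple[List[str], Range]]:
    """Merge components whose accessible ranges overlap (assumed condition).

    Returns a list of (member names, merged range) in ascending address order.
    """
    clusters: List[Tuple[List[str], Range]] = []
    for name, (lo, hi) in sorted(ranges.items(), key=lambda kv: kv[1]):
        if clusters and lo <= clusters[-1][1][1]:
            # Overlaps the previous cluster: extend it.
            names, (c_lo, c_hi) = clusters[-1]
            clusters[-1] = (names + [name], (c_lo, max(c_hi, hi)))
        else:
            clusters.append(([name], (lo, hi)))
    return clusters


# Hypothetical IP cores: ip1 and ip2 overlap, ip3 stands alone.
example = {
    "ip1": (0x1000, 0x2FFF),
    "ip2": (0x2000, 0x3FFF),
    "ip3": (0x8000, 0x8FFF),
}
clusters = cluster_by_address(example)
```

With these sample ranges, `ip1` and `ip2` end up in one cluster spanning `0x1000` to `0x3FFF`, mirroring the A to D result in the text, while `ip3` remains on its own.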
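Further, after the on-chip components are clustered, the simplified connection relationship can be determined from the clustered on-chip components.
For example, if IP core A1 and IP core A2 are each connected to IP core B, and A1 and A2 are clustered into IP core A, then the single connection between IP core A and IP core B achieves the same connectivity as before clustering, which simplifies the connection relationship.
In a possible implementation, where the attribute information of the on-chip components includes the bandwidth requirement, the second connection relationship may also be determined through the following steps, as shown in FIG. 3:
S2021: Cluster, within the first clustering result, the on-chip components whose bandwidth requirements meet a second preset condition to obtain a second clustering result.
S2022: Determine the second connection relationship corresponding to the multiple on-chip components based on the second clustering result.
When clustering by bandwidth requirement, on-chip components with similar bandwidth requirements may be clustered together. The criterion for similar bandwidth requirements includes at least one of the following: the absolute difference between the bandwidth requirements of the two on-chip components is less than a first preset value, or the quotient of the larger bandwidth requirement divided by the smaller is less than a second preset value.
For example, suppose that in the first clustering result obtained by clustering on address space range, the bandwidth requirements of IP cores A1, A2, and A3 are 1 Mbps, 5 Mbps, and 6 Mbps respectively. Because the absolute difference between the bandwidth requirements of A2 and A3 is less than the first preset value N, their bandwidth requirements are determined to be similar, so A2 and A3 can be clustered, while A1 is not clustered.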
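The bandwidth-similarity test can be sketched as below. The text does not give values for the preset thresholds or say how groups are formed from pairwise similarity, so the threshold of 2 Mbps and the greedy grouping over sorted requirements are assumptions for illustration, as are the function names.

```python
from typing import Dict, List, Optional


def bandwidth_similar(a: float, b: float,
                      max_diff: Optional[float] = None,
                      max_ratio: Optional[float] = None) -> bool:
    """True if either configured criterion holds (bandwidths must be > 0)."""
    if max_diff is not None and abs(a - b) < max_diff:
        return True
    if max_ratio is not None and max(a, b) / min(a, b) < max_ratio:
        return True
    return False


def cluster_by_bandwidth(demands: Dict[str, float], max_diff: float) -> List[List[str]]:
    """Greedy grouping: walk the requirements in ascending order and join a
    component to the current group when it is similar to the group's last
    member (an assumed grouping rule, not taken from the source)."""
    groups: List[List[str]] = []
    last_bw = 0.0
    for name, bw in sorted(demands.items(), key=lambda kv: kv[1]):
        if groups and bandwidth_similar(bw, last_bw, max_diff=max_diff):
            groups[-1].append(name)
        else:
            groups.append([name])
        last_bw = bw
    return groups


# Requirements from the example in the text (Mbps); threshold N = 2 is assumed.
groups = cluster_by_bandwidth({"A1": 1, "A2": 5, "A3": 6}, max_diff=2)
```

Under the assumed threshold, A2 and A3 fall into one group and A1 stays separate, matching the outcome described in the text.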
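In this way, clustering the first clustering result again along another dimension makes the clustered on-chip components similar in multiple dimensions, which improves the clustering.
Where the attribute information of the on-chip components includes both the bandwidth requirement and the accessible address space range, clustering may first be performed by address space range and then, on that result, by bandwidth requirement; alternatively, clustering may first be performed by bandwidth requirement and then, on that result, by address space range. In other words, the order in which the attributes are used has essentially no effect on the clustering result.
For example, where the attribute information includes the bandwidth requirement, when simplifying the first connection relationship to obtain the second connection relationship, the on-chip components whose bandwidth requirements meet the second preset condition may first be clustered to obtain a third clustering result, and the second connection relationship may be determined based on the third clustering result. Then, where the attribute information also includes the accessible address space range, the on-chip components in the third clustering result whose accessible address space ranges meet the first preset condition may be clustered to obtain a fourth clustering result, and the second connection relationship may be determined based on the fourth clustering result.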
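For S103, a routing component is a component that provides network routing services for the on-chip components, for example a data selector (multiplexer) or a router. A multiplexer has multiple inputs and one output; in use, one of the inputs is selected for data transfer with the output. For example, a 4-input 1-output (4to1) multiplexer selects one of its four inputs for data transfer with the output. A router has one input and multiple outputs; in use, one of the outputs is selected for data transfer with the input. For example, a 1-input 4-output (1to4) router selects one of its four outputs for data transfer with the input.
In a possible implementation, routing components can be added through the following steps, as shown in FIG. 4:
S301: Acquire attribute information of the routing component. The attribute information includes the maximum number of inputs and the maximum number of outputs of the routing component, which indicate the number of on-chip components and other routing components that the routing component connects to. These maximums may be set by the designer according to design requirements. For example, a maximum number of inputs of 5 means a multiplexer can connect to at most 5 on-chip components or other routing components; a maximum number of outputs of 4 means a router can connect to at most 4 on-chip components or other routing components.
S302: Determine the type and deployment position of the routing component based on the attribute information of the routing component and the second connection relationship.
When determining the type and deployment position, for any data path in the second connection relationship, the type and number of routing components to insert into the data path are determined from the number of on-chip components at its input end and the number of on-chip components at its output end. Such a data path is a data transmission channel containing at least one data link; the input ends of its data links belong to one cluster, and the output ends belong to one cluster.
For any data path in the second connection relationship, four cases arise from the number of on-chip components at its input end and at its output end.
Case 1: The number of on-chip components to be connected at the input side of the routing component does not exceed the maximum number of inputs, and the number to be connected at the output side does not exceed the maximum number of outputs.
Here, the number of inputs of the routing component is first set to the number of on-chip components to be connected at the input side. For example, if that number is N, where N is greater than 1 and does not exceed the maximum number of inputs M, the routing component is an Nto1 multiplexer; its deployment position is at the outputs of the on-chip components to be connected, with the inputs of the routing component connected directly to the outputs of those on-chip components. Taking a maximum number of inputs of 5 as an example, when 4 on-chip components are to be connected at the input side of a routing component, the routing component is a 4to1 multiplexer.
Further, the number of outputs of the routing component is set to the number of on-chip components to be connected at the output side; its deployment position is at the inputs of those on-chip components, with the outputs of the routing component connected directly to their inputs. Continuing the example with a maximum number of outputs of 5, when 3 on-chip components are to be connected at the output side, the routing component is a 1to3 router. The routing components on this data path are then a 4to1 multiplexer (connecting the 4 on-chip components at the input end) and a 1to3 router (connecting the 3 on-chip components at the output end), with the output of the 4to1 multiplexer connected to the input of the 1to3 router.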
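Case 2: The number of on-chip components to be connected at the input side of the routing component exceeds the maximum number of inputs, but the number to be connected at the output side does not exceed the maximum number of outputs.
Based on the number of on-chip components to be connected at the input side, the maximum number of inputs, and a preset splitting algorithm, the single routing component that would otherwise be connected is split into multiple cascaded routing components. The splitting algorithm may be as shown in FIG. 5a. In FIG. 5a, the number of on-chip components to be connected at the input side is denoted K and the maximum number of inputs is denoted N (N is generally at least 3), with K greater than N. The algorithm executes the following steps:
Step 3021: Determine whether K is divisible by N, starting with N itself. Let M be the quotient of K divided by N and P the remainder. If K is divisible by N (that is, P = 0), go to step 3022a; otherwise go to step 3022b.
Step 3022a: Determine whether M is greater than N. If M is less than or equal to N, go to step 3023; if M is greater than N, take M as the new K and repeat step 3021.
Step 3022b: Determine whether M+1 is greater than N. If M+1 is less than or equal to N, go to step 3023; if M+1 is greater than N, take M+1 as the new K and repeat step 3021.
Step 3023: From the quotients, divisors, and remainders obtained above, determine the candidate routing combination produced by the split. The candidate routing combination includes the type and number of each routing component.
The divisor of the first execution of step 3021 is taken as the number of input connections of the first-type routing component in the first layer of the cascade (denoted routing component 1); the quotient of the first execution is the number of routing components 1; and the remainder of the first execution is the number of input connections of the second-type routing component in the first layer (denoted routing component 2), of which there is one.
Similarly, the divisor of the second execution of step 3021 is the number of input connections of the first-type routing component in the second layer (denoted routing component 3); the quotient of the second execution is the number of routing components 3; and the remainder of the second execution is the number of input connections of the second-type routing component in the second layer (denoted routing component 4), of which there is one.
These steps are repeated until the types and numbers of routing components in every layer except the last have been determined. If the remainder of the last operation is not 0, the routing component in the last layer has a number of input connections equal to the last quotient plus 1, and there is one such component; if the remainder of the last operation is 0, the routing component in the last layer has a number of input connections equal to the last quotient, and there is one such component. The number of layers in the resulting cascade is therefore the number of executions of step 3021 plus 1.
Taking a maximum number of inputs of 5 as an example, when 32 on-chip components and other routing components are to be connected at the input side of a routing component, compute 32 ÷ 5 = 6 remainder 2; since 6 + 1 is greater than 5, compute 7 ÷ 5 = 1 remainder 2; now 1 + 1 is less than 5, and the calculation ends.
The resulting cascade may be as shown in FIG. 5b. In FIG. 5b, the divisor 5 of the first operation makes the first-type routing component in the first layer a 5to1 multiplexer; the quotient 6 of the first operation makes the number of first-type routing components 6; and the remainder 2 of the first operation makes the second-type routing component a 2to1 multiplexer, of which there is one. The first layer of the cascade therefore consists of six 5to1 multiplexers and one 2to1 multiplexer. Applying the same reasoning to the divisor, quotient, and remainder of the second operation, the second layer consists of one 5to1 multiplexer and one 2to1 multiplexer, and the last (third) layer consists of one 2to1 multiplexer.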
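Step 3024: Update N in the steps above with each positive integer from N−1 down to 2 in turn, until a candidate routing combination has been obtained for every positive integer in the interval [2, N].
Continuing the example with a maximum number of inputs of 5, after the candidate routing combination for N = 5 is obtained, N is updated to 4 and the calculation is repeated: 32 ÷ 4 = 8; since 8 is greater than 4, compute 8 ÷ 4 = 2; now 2 is less than 4, and the calculation ends.
The resulting cascade may be as shown in FIG. 5c. In FIG. 5c, the divisor 4 of the first operation makes the first-type routing component in the first layer a 4to1 multiplexer, and the quotient 8 of the first operation makes the number of first-type routing components 8, so the first layer consists of eight 4to1 multiplexers. Applying the same reasoning to the second operation, the second layer consists of two 4to1 multiplexers, and the last (third) layer consists of one 2to1 multiplexer.
Continuing in the same way for N = 3 and N = 2 gives four candidate routing combinations, for N = 5, 4, 3, and 2.
Step 3025: Determine the target routing combination from the candidate routing combinations based on a preset selection rule. For example, the selection may be based on at least one of the sum of remainders and the sum of quotients of each candidate routing combination.
In a possible implementation, for each N, the remainders and quotients obtained over the executions of step 3021 are determined for the candidate routing combination, together with the corresponding sum of remainders and sum of quotients. Among the candidate routing combinations, those with the smallest sum of remainders are taken as first candidate routing combinations; among the first candidate routing combinations, the one with the smallest sum of quotients is taken as the target routing combination.
The remainder in an operation indicates the type of the second-type routing component used in that layer of the cascade. To keep the different inputs and outputs fair, the smaller the remainder the better. For example, a remainder of 0 means the first layer of the cascade has no second-type routing component; since only first-type routing components are used, fairness among inputs and outputs is highest. The quotient in an operation indicates the number of routing components of a given type used in the cascade. Accordingly, the sum of quotients represents the number of routing components in the whole candidate routing combination, so the smallest sum of quotients means the fewest routing components.
Continuing the example: for N = 5 the quotients are 6 and 1 and the remainders are 2 and 2; for N = 4 the quotients are 8 and 2 and the remainders are 0; for N = 3 the quotients are 10, 3, and 1 and the remainders are 2, 2, and 1; for N = 2 the quotients are 16, 8, 4, and 2 and the remainders are 0. Thus for N = 5 the sum of quotients is 7 and the sum of remainders is 4; for N = 4 they are 10 and 0; for N = 3 they are 14 and 5; and for N = 2 they are 30 and 0. The smallest sum of remainders selects N = 4 and N = 2 as first candidate routing combinations, and the smallest sum of quotients then selects N = 4 as the target routing combination.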
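Steps 3021 to 3025 can be traced with the short sketch below. It is an illustrative reading of the algorithm as described, assuming K is greater than N on entry; the function names `split_levels`, `mux_layers`, and `choose_fan_in` are hypothetical.

```python
from typing import Dict, List, Tuple


def split_levels(k: int, n: int) -> List[Tuple[int, int]]:
    """(quotient, remainder) of each execution of step 3021; assumes k > n >= 2."""
    levels = []
    while True:
        m, p = divmod(k, n)
        levels.append((m, p))
        k = m if p == 0 else m + 1   # steps 3022a / 3022b
        if k <= n:
            return levels


def mux_layers(k: int, n: int) -> List[List[Tuple[int, str]]]:
    """Cascade layers as lists of (count, multiplexer type)."""
    layers = []
    for m, p in split_levels(k, n):
        layer = [(m, f"{n}to1")]
        if p:
            layer.append((1, f"{p}to1"))
        layers.append(layer)
        k = m + (1 if p else 0)
    layers.append([(1, f"{k}to1")])  # single multiplexer in the last layer
    return layers


def choose_fan_in(k: int, max_n: int) -> Tuple[int, Dict[int, List[Tuple[int, int]]]]:
    """Steps 3024-3025: try N = max_n .. 2, pick the smallest remainder sum,
    breaking ties by the smallest quotient sum."""
    candidates = {n: split_levels(k, n) for n in range(max_n, 1, -1)}

    def score(n: int) -> Tuple[int, int]:
        levels = candidates[n]
        return (sum(p for _, p in levels), sum(m for m, _ in levels))

    return min(candidates, key=score), candidates


best_n, candidates = choose_fan_in(32, 5)
```

For K = 32 and a maximum of 5 inputs this reproduces the quotients and remainders listed in the text and selects N = 4, and `mux_layers(32, 4)` yields eight 4to1 multiplexers, then two 4to1 multiplexers, then one 2to1 multiplexer.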
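Further, the number of outputs of the routing component is set to the number of on-chip components to be connected at the output side; its deployment position is at the inputs of those on-chip components, with the outputs of the routing component connected directly to their inputs.
Continuing the example with a maximum number of outputs of 5, when 3 on-chip components are to be connected at the output side, the routing component is a 1to3 router. The routing components on this data path are then the cascaded multiplexers shown in FIG. 5c and a 1to3 router (connecting the 3 on-chip components at the output end), with the output of the 2to1 multiplexer in FIG. 5c connected to the input of the 1to3 router.
Case 3: The number of on-chip components to be connected at the input side of the routing component does not exceed the maximum number of inputs, but the number to be connected at the output side exceeds the maximum number of outputs.
The number of inputs of the routing component is first set to the number of on-chip components to be connected at the input side; its deployment position is at the outputs of those on-chip components, with the inputs of the routing component connected directly to their outputs. Then, based on the number of on-chip components to be connected at the output side, the maximum number of outputs, and the preset splitting algorithm, the single routing component that would otherwise be connected there is split into multiple cascaded routing components. For example, the number of on-chip components to be connected at the output side is substituted for K in Case 2 and the maximum number of outputs for N (N is generally at least 3), with K greater than N; executing steps 3021 to 3025 of Case 2 in turn then gives the types and connections of the cascaded routing components after the split.
Taking a maximum number of outputs of 5 as an example, when 7 on-chip components and other routing components are to be connected at the output side of a routing component, compute 7 ÷ 5 = 1 remainder 2; now 1 + 1 is less than 5, and the calculation ends. The layers are read in the opposite order from Case 2, starting from the quotient of the last operation and working back to the quotient and remainder of the first operation: the first layer of the cascade is one 1to2 router, and the second layer is one 1to5 router and one 1to2 router.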
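The output-side split uses the same divisions as Case 2 but lists the layers in reverse, from the single router nearest the source to the routers nearest the receivers. The sketch below illustrates this for a fixed N; the name `router_layers` is hypothetical, and trying other values of N and selecting among them (steps 3024 and 3025) is omitted.

```python
from typing import List, Tuple


def router_layers(k: int, n: int) -> List[List[Tuple[int, str]]]:
    """Cascade of 1-to-x routers fanning out to k receivers, at most n outputs each.

    Layers are returned source side first, as lists of (count, router type).
    """
    layers = []
    while k > n:
        m, p = divmod(k, n)
        layer = [(m, f"1to{n}")]
        if p:
            layer.append((1, f"1to{p}"))
        layers.append(layer)           # layer closest to the receivers so far
        k = m + (1 if p else 0)        # routers that the next layer up must feed
    layers.append([(1, f"1to{k}")])    # single router at the source side
    return layers[::-1]                # reverse: source side first


layers = router_layers(7, 5)
```

For 7 receivers and a maximum of 5 outputs, the first layer is one 1to2 router and the second layer is one 1to5 router plus one 1to2 router, as in the example above.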
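Case 4: The number of on-chip components to be connected at the input side of the routing component exceeds the maximum number of inputs, and the number to be connected at the output side also exceeds the maximum number of outputs.
Based on the number of on-chip components to be connected at the input side, the maximum number of inputs, and the preset splitting algorithm, the single routing component that would otherwise be connected at the input side is split into multiple cascaded routing components. Then, based on the number of on-chip components to be connected at the output side, the maximum number of outputs, and the preset splitting algorithm, the single routing component that would otherwise be connected at the output side is likewise split into multiple cascaded routing components. For the splitting algorithm, see the descriptions in Case 2 and Case 3: the input-side calculation follows Case 2, and the output-side calculation follows Case 3.
Taking a maximum number of inputs and a maximum number of outputs of 5 as an example, when 32 on-chip components and other routing components are to be connected at the input side and 7 at the output side, the cascade may be as shown in FIG. 5d. In FIG. 5d, the first layer is eight 4to1 multiplexers, the second layer is two 4to1 multiplexers, the third layer is one 2to1 multiplexer, the fourth layer is one 1to2 router, and the fifth layer is one 1to5 router and one 1to2 router. The layers are determined as in the examples of Case 2 and Case 3, which are not repeated here.
S303: Add the routing components to the system on chip according to their types and deployment positions. This yields the topology corresponding to the multiple on-chip components.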
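In practice, after the topology corresponding to the multiple on-chip components is obtained, it can be adjusted to better match actual needs. For example, the topology can be simplified to reduce the number of routing components used and improve their utilization; or a target device can be added to the topology in response to a designer's target-device adding instruction, to improve the data transmission efficiency of the network topology.
In a possible implementation, after the topology corresponding to the on-chip components is obtained, it can be adjusted through the following steps, as shown in FIG. 6:
S401: For any routing component in the topology, determine the candidate data links formed by that routing component and a data end, the data end being an on-chip component that sends data or an on-chip component that receives data.
For example, in a topology where the sending on-chip component is IP core A, the routing component is routing component B, and the receiving on-chip components are IP core C1 and IP core C2, the candidate data links formed by the routing component and the data ends are A-B, B-C1, and B-C2.
S402: Determine multiple first routing components whose candidate data links are the same.
For example, routing components on data links that start from the same on-chip component (that is, the data end at the input side is the same) or that arrive at the same on-chip component (that is, the data end at the output side is the same) may be determined as first routing components. In one implementation, first routing components can be understood as routing components connected to the same on-chip components, or routing components connected to the same on-chip components through the same routing components.
For example, the topology before adjustment may be as shown in FIG. 7a. In FIG. 7a, there are 10 IP cores at the sending end and 12 IP cores at the receiving end. The data path is formed by ten 1to2 routers, one 10to8 crossbar switch matrix, and one 10to4 crossbar switch matrix. Both the 10to8 crossbar and the 10to4 crossbar are connected to the same 10 IP cores through the same ten 1to2 routers, so the data links of the 10to8 crossbar and the 10to4 crossbar toward the same on-chip component are identical candidate data links.
A crossbar switch matrix may be built from the multiplexers and routers described above, to make the connections in the topology more concise. For example, a 10to8 crossbar may consist of one 10to1 multiplexer and one 1to8 router. The 10to8 and 10to4 crossbars are generally packaged as integral components and can therefore each be regarded as one routing component.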
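S403: Integrate the multiple first routing components based on the numbers of inputs and outputs connected to each of them.
To reduce the number of routing components in the topology, the multiple first routing components can be integrated. Specifically, because the candidate data links connected to the first routing components are identical, the first routing components can be merged, and the topology adjusted so that the merged first target routing component preserves the connections between the data ends exactly as they were before merging.
For example, if the candidate data links of first routing component A and first routing component B are both data link 1, then A and B can be merged and data link 1 can connect directly to the merged first target routing component, which reduces the number of routing components while keeping normal data transmission between the data ends.
Integration according to the numbers of inputs and outputs of the first routing components falls into the following two cases:
Case 1: The first routing components have the same number of inputs but different numbers of outputs.
When first routing components with the same input-side data path and different output-side data paths are integrated, fairness among inputs and outputs requires that the first target routing components newly added after integration all be of the same type (that is, the numbers of outputs of the different first routing components are divided with remainder 0, which gives the highest fairness). A common divisor of the numbers of outputs of the different first routing components is therefore needed. To further reduce the number of first target routing components, the greatest common divisor is chosen, so that each first target routing component has more inputs and outputs and fewer of them are needed. The greatest common divisor and the sum of the numbers of outputs of the different first routing components are therefore determined, and the first target routing components are determined from them.
Specifically, the number of inputs of the first routing components is the number of inputs of the first target routing component, and the routing components connected to the inputs of the first routing components are deleted as a result of the integration. The greatest common divisor is the number of outputs of the first target routing component. The quotient of the sum of the numbers of outputs of the first routing components divided by the greatest common divisor is the number of outputs of each first target routing component newly added after integration, and the greatest common divisor is the number of such newly added components.
Continuing the example of FIG. 7a, the 10to8 crossbar and the 10to4 crossbar are first routing components with the same number of inputs and different numbers of outputs. From the identical candidate data links on the left and the number of inputs, the first target routing component obtained by integrating the two crossbars has 10 inputs. The greatest common divisor of the output counts 8 and 4 is 4, so the first target routing component has 4 outputs, that is, it includes a 10to4 crossbar. The sum of the output counts of the two crossbars is 12 and the greatest common divisor is 4, so the quotient is 3; the newly added first target routing components therefore each have 3 outputs and there are 4 of them, that is, four 1to3 routers need to be added.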
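The arithmetic of this merge can be written out as a small sketch. The function name `merge_same_inputs` and the dictionary layout are hypothetical, and the sketch covers only the case where the greatest common divisor gives a useful result; when it does not (for example, output counts 7 and 4), the text falls back to the splitting algorithm instead.

```python
from functools import reduce
from math import gcd
from typing import Dict, List, Tuple


def merge_same_inputs(input_count: int, output_counts: List[int]) -> Dict[str, Tuple[int, int]]:
    """Merge first routing components that share inputs but differ in outputs.

    Returns the merged crossbar as (inputs, outputs) and the added routers as
    (number of routers, outputs per router).
    """
    g = reduce(gcd, output_counts)      # outputs of the merged crossbar
    total = sum(output_counts)          # receivers that must still be reached
    return {
        "crossbar": (input_count, g),   # e.g. 10to4
        "routers": (g, total // g),     # e.g. four 1to3 routers
    }


merged = merge_same_inputs(10, [8, 4])
```

For the 10to8 and 10to4 crossbars of FIG. 7a this gives one 10to4 crossbar followed by four 1to3 routers, which together still reach all 12 receivers.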
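Taking a 10to7 crossbar and a 10to4 crossbar as the first routing components, the first target routing component has 10 inputs (as in FIG. 7a and FIG. 7b, integration removes ten 1to2 routers). The output counts 7 and 4 sum to 11, and the number of outputs of the first target routing component can then be determined with the splitting algorithm described above; for example, the first target routing component may have 4 outputs, with three 1to3 routers and one 1to2 router newly added.
Case 2: The first routing components have different numbers of inputs but the same number of outputs.
When first routing components with different input-side data paths and the same output-side data path are integrated, the greatest common divisor and the sum of the numbers of inputs of the different first routing components are determined, and the first target routing components are determined from the greatest common divisor and the sum of the numbers of inputs.
Specifically, the number of outputs of the first routing components is the number of outputs of the first target routing component, and the routing components connected to the outputs of the first routing components are deleted as a result of the integration. The greatest common divisor is the number of inputs of the first target routing component. The quotient of the sum of the numbers of inputs of the first routing components divided by the greatest common divisor is the number of inputs of each first target routing component newly added after integration, and the greatest common divisor is the number of such newly added components. See the example in Case 1; details are not repeated here.
S404: Adjust the topology based on the first target routing components obtained by the integration.
For example, the adjusted topology may be as shown in FIG. 7b. In FIG. 7b, there are 10 IP cores at the sending end and 12 IP cores at the receiving end, and the data path is formed by one 10to4 crossbar and four 1to3 routers. Compared with FIG. 7a, fewer routing components are used and their utilization is higher.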
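After the topology is obtained through the steps above, the initial clock frequencies of its routing components may make data transmission costly in hardware wherever it passes between two different clock frequencies. Suitable clock domains can therefore be allocated to the added routing components to reduce the hardware cost of data paths crossing clock domains.
In a possible implementation, clock domains can be allocated to the routing components in the topology through the following steps, as shown in FIG. 8:
S501: Determine the input bit width and output bit width of the routing components in the topology based on their initial clock frequencies and the bandwidth requirements of the on-chip components.
For any routing component, its initial clock frequency may be the clock frequency of the clock domain of any on-chip component connected to it. Bit width is the number of bits transferred per clock cycle, and bandwidth is the amount of data transferred per unit time. Bandwidth can be expressed as the product of bit width and clock frequency.
In a possible implementation, the input bit width and output bit width of the routing components can be determined through the following steps, as shown in FIG. 9:
S5011: Determine the output bit width of each on-chip component based on its bandwidth requirement and its clock domain, and determine one or more second routing components in the topology that are directly connected to the on-chip components. The output bit width of an on-chip component is its bandwidth requirement divided by the clock frequency of its clock domain. Different on-chip components may have different clock frequencies, so their output bit widths may also differ.
S5012: For each second routing component, determine its input bit width based on the bandwidth requirement of the first on-chip component connected to it, and determine its output bit width based on its initial clock frequency and the bandwidth requirement of the first on-chip component.
The output bit width of a first on-chip component may serve as the input bit width of the second routing component connected to it. In a possible implementation, to determine the output bit width of the second routing component, its input bandwidth is first determined from the bandwidth requirement of the first on-chip component, and its output bit width is then determined from its initial clock frequency and its input bandwidth.
Specifically, for any second routing component, its input bandwidth is the sum of the bandwidth requirements of the one or more first on-chip components connected to it. The output bandwidth of the second routing component must be greater than or equal to its input bandwidth, so its minimum output bandwidth equals the input bandwidth. Dividing the input bandwidth by the initial clock frequency of the second routing component gives its output bit width.
For example, suppose second routing component A is connected to IP core A1 and IP core A2, whose bandwidth requirements are 800 Mbps and 800 Mbps, and its initial clock frequency is 100 MHz. The input bandwidth of second routing component A is then 1600 Mbps, and by the formula bandwidth = bit width × frequency ÷ 8, its output bit width is 1600 Mbps × 8 ÷ 100 MHz = 128 bits. (The factor of 8 converts between bits and bytes, so this calculation treats the bandwidth figures as byte rates.)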
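The bit-width calculation in this example can be sketched as follows. The sketch follows the formula with the division by 8 and therefore assumes the bandwidth requirements are given in bytes per second; that unit choice, the rounding up to a whole number of bits, and the function name `output_bit_width` are assumptions made for illustration.

```python
from typing import Iterable


def output_bit_width(demands_bytes_per_s: Iterable[int], clock_hz: int) -> int:
    """Minimum output bit width of a routing component fed by several components.

    Input bandwidth is the sum of the connected components' requirements, and
    bandwidth = bit width * frequency / 8, so width = bandwidth * 8 / frequency.
    """
    input_bandwidth = sum(demands_bytes_per_s)
    bits = input_bandwidth * 8
    return -(-bits // clock_hz)  # ceiling division: round up to whole bits


# Two requirements of 800 M (bytes per second, assumed) at a 100 MHz clock.
width = output_bit_width([800_000_000, 800_000_000], 100_000_000)
```

With these inputs the result is 128 bits, the figure given in the text. If the requirements were instead bit rates, the factor of 8 would be dropped and the width would come out at 16 bits.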
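S5013: Determine, based on the output bit widths of the second routing components, the input bit widths and output bit widths of the other routing components in the topology.
Continuing from the previous step, for any data link, once the output bit width of the second routing component on that link is known, the input and output bit widths of the routing components on the link can be determined one by one, from the sending data end through to the receiving data end. The input bit width of the current device is the output bit width of the preceding device connected to it on the data path, and the total input bit width of the current device is the sum of the output bit widths of all preceding devices connected to it on the data path.
S502: Allocate clock domains to the routing components in the topology based on their input bit widths and output bit widths.
Clock domains can be allocated to the routing components based on the topology and on their input and output bit widths, such that the sum of the bit widths crossing clock domains under the allocation is minimal.
In a possible implementation, clock domains can be allocated to the routing components in the topology through the following steps, as shown in FIG. 10:
S5021: Determine at least one candidate allocation combination based on the topology. Different candidate allocation combinations allocate different clock domains to the routing components in the topology.
In a possible implementation, the clock domains of the on-chip components connected to a routing component can be used when allocating its clock domain. For example, for the data path containing routing component A, if the IP core at the input end is in clock domain 1 and the IP core at the output end is in clock domain 2, then clock domain 1 and clock domain 2 can be used when allocating a clock domain to routing component A, giving at least one candidate allocation combination. Once the clock domains that may be allocated to each routing component are known, at least one candidate allocation combination is generated from them.
In another possible implementation, when determining candidate allocation combinations, the on-chip components and routing components in the topology may be aggregated based on the topology, and at least one candidate allocation combination determined from the result of the aggregation. This is described in detail below.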
S5022:确定各候选分配组合下跨时钟域的位宽之和,并将位宽之和最小的候选分配组合确定为目标分配组合,以根据目标分配组合为拓扑结构中各路由组件分配时钟域。
可以根据拓扑结构中各路由组件的输入位宽和输出位宽,计算各候选分配组合下跨时钟域的位宽之和,将跨时钟域的位宽之和最小的分配组合确定为目标分配组合。示例性的,为拓扑结构中各路由组件分配时钟域的示意图可以如图11a~图11d所示。
图11a表示拓扑结构中分配时钟域前的数据通路的示意图。图11a中,NI表示IP 核的网络接口,不同的阴影类型表示不同的时钟域。接口NI0和NI1位于时钟域1,接口NI2位于时钟域2,接口NI3和NI4位于时钟域3,s0、s1、s2分别表示3个路由组件,箭头方向表示数据的传输方向,数字则表示数据通络上传输的数据位宽。
图11b表示对片上组件进行聚合处理的示意图,图11b中,将同处于时钟域1的接口NI0和NI1进行了聚合处理,生成了接口NI0~1,聚合处理后的输出位宽为1024+512=1536;以及,将同处于时钟域3的接口NI3和NI4进行了聚合处理,生成了接口NI3~4,聚合处理后的输入位宽为512+512=1024。
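Further, all the possible ways of aggregating routing components with on-chip components can be determined from the connection relationships of the topology, and the candidate assignment combinations can be determined from the aggregation results.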
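Specifically, according to the topology around routing component s0, routing component s0 can be further aggregated with interface NI0~1 on the basis of FIG. 11b, that is, routing component s0 is assigned clock domain 1; according to the topology around routing components s1 and s2, they can be further aggregated with interface NI3~4 on the basis of FIG. 11b, that is, routing components s1 and s2 are assigned clock domain 3.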
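As an example, the aggregation of routing components and on-chip components is illustrated in FIG. 11c. In FIG. 11c, routing component s0 and interface NI0~1 are aggregated into the combination NI0~1,s0; routing components s1 and s2 and interface NI3~4 are aggregated into the combination s1,s2,NI3~4.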
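Further, from the connection relationships between routing components s0, s1, and s2 and the other components in the topology, four candidate assignment combinations are obtained after aggregation:
1) routing components s0, s1, and s2 are assigned clock domain 1, clock domain 3, and clock domain 3 respectively;
2) routing components s0, s1, and s2 are assigned clock domain 1, clock domain 2, and clock domain 3 respectively;
3) routing components s0, s1, and s2 are assigned clock domain 1, clock domain 2, and clock domain 2 respectively;
4) routing components s0, s1, and s2 are assigned clock domain 1, clock domain 2, and clock domain 1 respectively.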
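After the cross-clock-domain bit width of each candidate assignment combination is determined in turn, the target assignment combination with the smallest sum of cross-clock-domain bit widths is found to be combination 1).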
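FIG. 11d is a schematic diagram of the data paths in the topology after clock domains are assigned. In FIG. 11d, routing components s0, s1, and s2 are assigned clock domain 1, clock domain 3, and clock domain 3 respectively; the sum of the cross-clock-domain bit widths is 512 + 128 = 640, the smallest among all candidate assignment combinations.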
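The selection in S5022 can be sketched as follows. The figures are not reproduced here, so the edge list and most bit widths below are hypothetical; they were chosen only so that combination 1) yields the total of 640 quoted in the text. The function and variable names are likewise illustrative.

```python
# (source, destination, bit width); a hypothetical stand-in for FIG. 11a.
EDGES = [
    ("NI0", "s0", 1024), ("NI1", "s0", 512),
    ("s0", "s1", 512), ("NI2", "s1", 128),
    ("s1", "s2", 640),
    ("s2", "NI3", 512), ("s2", "NI4", 512),
]
NI_DOMAIN = {"NI0": 1, "NI1": 1, "NI2": 2, "NI3": 3, "NI4": 3}
CANDIDATES = [
    {"s0": 1, "s1": 3, "s2": 3},  # combination 1)
    {"s0": 1, "s1": 2, "s2": 3},  # combination 2)
    {"s0": 1, "s1": 2, "s2": 2},  # combination 3)
    {"s0": 1, "s1": 2, "s2": 1},  # combination 4)
]


def cross_domain_width(edges, domain_of):
    """Sum the bit widths of all edges whose two ends lie in different domains."""
    return sum(w for src, dst, w in edges if domain_of[src] != domain_of[dst])


def cost(candidate):
    return cross_domain_width(EDGES, {**NI_DOMAIN, **candidate})


costs = [cost(c) for c in CANDIDATES]
target = min(CANDIDATES, key=cost)
print(costs, target)
```

Enumerating every candidate is adequate for a handful of routing components; for larger topologies the number of combinations grows quickly, which is one motivation for the aggregation step described above.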
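After the clock domain of each added routing component is obtained through the above steps, the assignment result needs to be verified. The input and output bit widths of the routing components were determined from their initial clock frequencies; once new clock frequencies are determined, those bit widths may change accordingly, so it must be checked that the sum of the cross-clock-domain bit widths is still minimal after the change.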
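In a possible implementation, after clock domains are assigned to the routing components in the topology, the assignment result can be verified through the following steps, as shown in FIG. 12: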
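S601: Re-determine the input bit width and output bit width of the routing component based on the target clock frequency corresponding to the clock domain assigned to the routing component and the bandwidth requirement of the on-chip component.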
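S602: Re-determine multiple candidate assignment combinations based on the topology and the re-determined input and output bit widths of the routing components, and determine the sum of the cross-clock-domain bit widths under each candidate assignment combination.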
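S603: From the re-determined candidate assignment combinations, determine the target assignment combination with the smallest sum of bit widths; if the target assignment combination differs from the clock domains already assigned, return, based on the target assignment combination, to the step of re-determining the input bit width and output bit width of the routing component.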
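For the determination of the input and output bit widths of the routing components, and of the candidate and target assignment combinations, refer to the related description above; details are not repeated here.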
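If the target assignment combination differs from the assigned clock domains, the steps of determining the input and output bit widths of the routing components and determining the target assignment combination can be executed in a loop until the target assignment combination is the same as the assigned clock domains.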
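In practice, the above steps may be executed in a loop many times while the target assignment combination still differs from the assigned clock domains. In a possible implementation, when the number of return executions exceeds a preset number, the loop is stopped and first alarm information is sent. The first alarm information informs the designer that the current topology cannot meet the design requirements and needs to be adjusted.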
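The verification loop of S601 to S603, together with the iteration limit, can be sketched as below. The two callables stand in for the bit-width recalculation and the target selection described above; their names, the return convention, and the alarm text are assumptions of this sketch.

```python
def verify_clock_domains(assigned, recompute_widths, best_assignment, max_rounds):
    """Repeat S601 to S603 until the target matches the assigned domains.

    Returns (assignment, alarm); alarm is None when the loop converges.
    """
    for _ in range(max_rounds):
        widths = recompute_widths(assigned)  # S601
        target = best_assignment(widths)     # S602 and S603
        if target == assigned:
            return assigned, None
        assigned = target                    # return to S601 with the new target
    return assigned, "first alarm: clock-domain assignment did not converge"


# Toy stand-ins: the "widths" are simply the assignment itself.
stable, alarm_ok = verify_clock_domains(
    {"s0": 2}, lambda a: a, lambda w: {"s0": 1}, max_rounds=5)
flipping, alarm_bad = verify_clock_domains(
    {"s0": 1}, lambda a: a, lambda w: {"s0": 3 - w["s0"]}, max_rounds=5)
print(stable, alarm_ok, alarm_bad)
```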
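In a possible implementation, after the routing components for connecting the plurality of on-chip components are added in the system on chip based on the second connection relationship and the topology corresponding to the plurality of on-chip components is obtained, the input bandwidth and output bandwidth of each routing component in the topology can be verified based on the bandwidth requirement of each on-chip component, and second alarm information is sent if the verification fails.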
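Specifically, when verifying the input and output bandwidths of the routing components in the topology, it can be determined for any data path whether the output bandwidth of the last routing component meets the bandwidth requirement of the on-chip component connected to that routing component. If the output bandwidth of the last routing component is less than the minimum bandwidth required by the on-chip component connected to it, the bandwidth allocation is unreasonable, and second alarm information needs to be sent to inform the designer that the allocated bandwidth is insufficient and needs to be adjusted.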
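A minimal sketch of this check, assuming each data path is summarized by the output bandwidth of its last routing component and the minimum bandwidth required by the connected on-chip component. The tuple layout, function name, and message format are hypothetical.

```python
def verify_bandwidth(paths):
    """paths: iterable of (path_name, last_router_output_bw, required_min_bw)."""
    alarms = []
    for name, output_bw, required_bw in paths:
        if output_bw < required_bw:
            # Insufficient bandwidth on this data path: raise the second alarm.
            alarms.append(
                f"second alarm: {name} provides {output_bw}, needs {required_bw}"
            )
    return alarms


alarms = verify_bandwidth([("path0", 1600, 1600), ("path1", 800, 1200)])
print(alarms)
```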
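In a possible implementation, after the topology corresponding to the plurality of on-chip components is obtained, a target device can also be added to the topology in response to a target-device adding operation instruction. The target device includes a first-in-first-out (FIFO) storage unit and/or a network rate adapter.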
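In addition, the designer can adjust the topology according to the received alarm information to improve its data transmission performance. For example, after receiving the second alarm information, the designer can use the target-device adding operation instruction to add a FIFO storage unit and a network rate adapter to the topology, so as to reduce network congestion and data transmission waiting time and thereby optimize the topology.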
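In the method for determining a network-on-chip topology provided by the embodiments of the present disclosure, simplifying the first connection relationship makes the determination of the network-on-chip topology more efficient; and adding, based on the second connection relationship, routing components for connecting the plurality of on-chip components in the system on chip to obtain the corresponding topology allows the topology to be generated automatically, which effectively improves the efficiency of network topology construction.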
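Those skilled in the art will understand that, in the above method of the specific implementations, the order in which the steps are written does not imply a strict execution order and does not limit the implementation in any way; the specific execution order of the steps should be determined by their functions and possible internal logic.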
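Based on the same inventive concept, the embodiments of the present disclosure further provide an apparatus for determining a network-on-chip topology that corresponds to the method for determining a network-on-chip topology. Because the apparatus solves the problem on a principle similar to that of the method, the description of the apparatus can refer to the description of the method, and repeated content is omitted.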
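Referring to FIG. 13, which is a schematic architecture diagram of an apparatus for determining a network-on-chip topology provided by an embodiment of the present disclosure, the apparatus includes an acquisition module 1301, a simplification module 1302, and an adding module 1303. The acquisition module 1301 is configured to acquire a first connection relationship of a plurality of on-chip components of a system on chip and attribute information of the plurality of on-chip components. The simplification module 1302 is configured to simplify the first connection relationship based on the attribute information of the plurality of on-chip components to obtain a second connection relationship corresponding to the plurality of on-chip components. The adding module 1303 is configured to add, based on the second connection relationship, routing components for connecting the plurality of on-chip components in the system on chip to obtain a topology corresponding to the plurality of on-chip components.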
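In a possible implementation, the attribute information of an on-chip component includes the bandwidth requirement of the on-chip component and/or the address space range that the on-chip component can access.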
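In a possible implementation, when the attribute information of the on-chip component includes the address space range that the on-chip component can access, the simplification module 1302 may be specifically configured to: cluster the on-chip components whose accessible address space ranges meet a first preset condition to obtain a first clustering result; and determine the second connection relationship corresponding to the plurality of on-chip components based on the first clustering result.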
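Further, when the attribute information of the on-chip component also includes the bandwidth requirement of the on-chip component, the simplification module 1302 may be specifically configured to: cluster, within the first clustering result, the on-chip components whose bandwidth requirements meet a second preset condition to obtain a second clustering result; and determine the second connection relationship corresponding to the plurality of on-chip components based on the second clustering result.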
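In a possible implementation, when the attribute information of the on-chip component includes the bandwidth requirement of the on-chip component, the simplification module 1302 may be specifically configured to: cluster the on-chip components whose bandwidth requirements meet the second preset condition to obtain a third clustering result; and determine the second connection relationship corresponding to the plurality of on-chip components based on the third clustering result.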
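Further, when the attribute information of the on-chip component also includes the address space range that the on-chip component can access, the simplification module 1302 may be specifically configured to: cluster, within the third clustering result, the on-chip components whose accessible address space ranges meet the first preset condition to obtain a fourth clustering result; and determine the second connection relationship corresponding to the plurality of on-chip components based on the fourth clustering result.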
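The two-stage clustering can be illustrated with a short sketch. The passage does not define the preset conditions, so the sketch assumes, purely for illustration, that the first condition is "identical address space range" and the second is "bandwidth requirement on the same side of a limit"; all names and values are hypothetical.

```python
from collections import defaultdict


def simplify_by_clustering(components, bandwidth_limit):
    """Cluster by address range, then by bandwidth class within each range."""
    clusters = defaultdict(list)
    for comp in components:
        # Assumed conditions: same address range, same bandwidth class.
        key = (comp["addr_range"], comp["bandwidth"] <= bandwidth_limit)
        clusters[key].append(comp["name"])
    return list(clusters.values())


clusters = simplify_by_clustering(
    [
        {"name": "ip0", "addr_range": (0x0000, 0x0FFF), "bandwidth": 100},
        {"name": "ip1", "addr_range": (0x0000, 0x0FFF), "bandwidth": 200},
        {"name": "ip2", "addr_range": (0x0000, 0x0FFF), "bandwidth": 900},
        {"name": "ip3", "addr_range": (0x1000, 0x1FFF), "bandwidth": 100},
    ],
    bandwidth_limit=400,
)
print(clusters)
```

With exact-match conditions like these, clustering by address range first and bandwidth second gives the same groups as the reverse order; with other preset conditions the two orders could differ.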
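In a possible implementation, the adding module 1303 may be specifically configured to: acquire attribute information of the routing component, where the attribute information includes the maximum input number and the maximum output number of the routing component, which indicate the number of on-chip components that the routing component connects; determine the type and deployment position of the routing component based on the attribute information of the routing component and the second connection relationship; and add the routing component in the system on chip according to the type and deployment position of the routing component.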
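In a possible implementation, the apparatus further includes an adjustment module 1304 configured to: for any routing component in the topology, determine the candidate data links formed by the routing component and the data ends, where a data end is an on-chip component that sends data or an on-chip component that receives data; determine the first routing components whose candidate data links are identical; integrate the first routing components based on the numbers of inputs and outputs connected to them; and adjust the topology based on the first target routing component obtained through the integration.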
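The grouping of routing components with identical candidate data links can be sketched as follows. The representation of a data link as a (sender, receiver) pair and the merge criterion (combined port counts within a maximum) are assumptions of this sketch; the passage only states that integration is based on the connected input and output numbers.

```python
from collections import defaultdict


def group_identical_routers(candidate_links):
    """candidate_links maps a routing component to its set of (sender, receiver) links."""
    groups = defaultdict(list)
    for router, links in candidate_links.items():
        groups[frozenset(links)].append(router)
    # Only groups with more than one member are candidates for integration.
    return [sorted(rs) for rs in groups.values() if len(rs) > 1]


def can_merge(routers, port_count, max_inputs, max_outputs):
    """Assumed rule: merge only if the combined port counts fit one component."""
    total_in = sum(port_count[r][0] for r in routers)
    total_out = sum(port_count[r][1] for r in routers)
    return total_in <= max_inputs and total_out <= max_outputs


groups = group_identical_routers({
    "r0": {("ipA", "mem")},
    "r1": {("ipA", "mem")},
    "r2": {("ipB", "mem")},
})
mergeable = can_merge(groups[0], {"r0": (2, 1), "r1": (1, 1)}, 4, 2)
print(groups, mergeable)
```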
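In a possible implementation, the apparatus further includes an assignment module 1305 configured to: determine the input bit width and output bit width of each routing component in the topology based on the initial clock frequency of each routing component in the topology and the bandwidth requirements of the on-chip components; and assign clock domains to the routing components in the topology based on the input bit width and output bit width of each routing component in the topology.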
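In a possible implementation, the assignment module 1305 may be specifically configured to: determine the output bit width of the on-chip component based on the bandwidth requirement of the on-chip component and the clock domain corresponding to the on-chip component; determine one or more second routing components in the topology that are directly connected to the on-chip component; for each second routing component, determine the input bit width of the second routing component based on the bandwidth requirement of the first on-chip component connected to it, and determine the output bit width of the second routing component based on its initial clock frequency and the bandwidth requirement of the first on-chip component; and determine, based on the output bit width of each second routing component, the input bit widths and output bit widths of the routing components in the topology other than the second routing components.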
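In a possible implementation, the assignment module 1305 may be specifically configured to: determine the input bandwidth of the second routing component based on the bandwidth requirement of the first on-chip component; and determine the output bit width of the second routing component based on the initial clock frequency of the second routing component and the input bandwidth of the second routing component.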
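In a possible implementation, the assignment module 1305 may be specifically configured to: assign clock domains to the routing components in the topology based on the topology and the input and output bit widths of each routing component in the topology, where the sum of the bit widths crossing clock domains under the assigned clock domains is minimal.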
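In a possible implementation, the assignment module 1305 may be specifically configured to: determine at least one candidate assignment combination based on the topology, where different candidate assignment combinations assign different clock domains to the routing components in the topology; and determine, based on the input and output bit widths of each routing component in the topology, the sum of the cross-clock-domain bit widths under each candidate assignment combination, and take the candidate assignment combination with the smallest sum as the target assignment combination, so that clock domains are assigned to the routing components in the topology according to the target assignment combination.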
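In a possible implementation, the assignment module 1305 may be specifically configured to: aggregate the on-chip components and routing components in the topology based on the topology; and determine at least one candidate assignment combination based on the aggregation result.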
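In a possible implementation, the assignment module 1305 may be further configured to: re-determine the input bit width and output bit width of the routing component based on the target clock frequency corresponding to the clock domain assigned to the routing component and the bandwidth requirement of the on-chip component; re-determine multiple candidate assignment combinations based on the topology and the re-determined input and output bit widths of the routing components, and determine the sum of the cross-clock-domain bit widths under each candidate assignment combination; and determine, from the re-determined candidate assignment combinations, the target assignment combination with the smallest sum of bit widths, and, if the target assignment combination differs from the clock domains already assigned, return, based on the target assignment combination, to the step of re-determining the input bit width and output bit width of the routing component.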
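In a possible implementation, the assignment module 1305 is further configured to: stop the loop and send first alarm information when the number of return executions exceeds a preset number.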
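In a possible implementation, the assignment module 1305 may be further configured to: verify the input bandwidth and output bandwidth of each routing component in the topology based on the bandwidth requirement of each on-chip component, and send second alarm information if the verification fails.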
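In a possible implementation, the adding module 1303 may be further configured to: add a target device to the topology in response to a target-device adding operation instruction. The target device includes a first-in-first-out (FIFO) storage unit and/or a network rate adapter.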
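In the apparatus for determining a network-on-chip topology provided by the embodiments of the present disclosure, simplifying the first connection relationship makes the determination of the network-on-chip topology more efficient; and adding, based on the second connection relationship, routing components for connecting the plurality of on-chip components in the system on chip to obtain the corresponding topology allows the topology to be generated automatically, which effectively improves the efficiency of constructing the network topology.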
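For the processing flow of each module in the apparatus and the interaction flow between the modules, refer to the related description in the above method embodiments; details are not repeated here.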
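Based on the same technical concept, the embodiments of the present disclosure further provide a computer device. Referring to FIG. 14, which is a schematic structural diagram of a computer device 1400 provided by an embodiment of the present disclosure, the computer device 1400 includes a processor 1401, a memory 1402, and a bus 1403. The memory 1402 is configured to store execution instructions and includes an internal memory 14021 and an external memory 14022. The internal memory 14021 temporarily stores operation data of the processor 1401 and data exchanged with the external memory 14022, such as a hard disk; the processor 1401 exchanges data with the external memory 14022 through the internal memory 14021. When the computer device 1400 runs, the processor 1401 communicates with the memory 1402 through the bus 1403, so that the processor 1401 executes the following instructions: acquiring a first connection relationship of a plurality of on-chip components of a system on chip and attribute information of the plurality of on-chip components; simplifying the first connection relationship based on the attribute information of the plurality of on-chip components to obtain a second connection relationship corresponding to the plurality of on-chip components; and adding, based on the second connection relationship, routing components for connecting the plurality of on-chip components in the system on chip to obtain a topology corresponding to the plurality of on-chip components.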
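The embodiments of the present disclosure further provide a chip including on-chip components and routing components. The network topology between the routing components and the on-chip components may be determined based on the method for determining a network-on-chip topology described in any embodiment of the present disclosure.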
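The embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program which, when run by a processor, executes the steps of the method for determining a network-on-chip topology described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.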
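The embodiments of the present disclosure further provide a computer program product carrying program code, where the instructions included in the program code can be used to execute the steps of the method for determining a network-on-chip topology described in the above method embodiments; for details, refer to the above method embodiments. The computer program product may be implemented by hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).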
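Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above can refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms. Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Furthermore, the functional units in the embodiments of the present disclosure may be integrated in one processing unit, may each exist physically alone, or two or more units may be integrated in one unit.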
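If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.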
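Finally, it should be noted that the above embodiments are only specific implementations of the present disclosure, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.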

Claims (22)

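  1. A method for determining a network-on-chip topology, characterized by comprising:
    acquiring a first connection relationship of a plurality of on-chip components of a system on chip and attribute information of the plurality of on-chip components;
    simplifying the first connection relationship based on the attribute information of the plurality of on-chip components to obtain a second connection relationship corresponding to the plurality of on-chip components;
    adding, based on the second connection relationship, routing components for connecting the plurality of on-chip components in the system on chip to obtain a topology corresponding to the plurality of on-chip components.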
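  2. The method according to claim 1, characterized in that the attribute information of an on-chip component includes at least one of the following:
    the bandwidth requirement of the on-chip component, or
    the address space range that the on-chip component can access.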
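  3. The method according to claim 2, characterized in that simplifying the first connection relationship based on the attribute information of the plurality of on-chip components to obtain the second connection relationship corresponding to the plurality of on-chip components comprises:
    clustering the on-chip components whose accessible address space ranges meet a first preset condition to obtain a first clustering result;
    determining the second connection relationship corresponding to the plurality of on-chip components based on the first clustering result.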
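  4. The method according to claim 3, characterized in that determining the second connection relationship corresponding to the plurality of on-chip components based on the first clustering result comprises:
    clustering, within the first clustering result, the on-chip components whose bandwidth requirements meet a second preset condition to obtain a second clustering result;
    determining the second connection relationship corresponding to the plurality of on-chip components based on the second clustering result.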
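  5. The method according to claim 2, characterized in that simplifying the first connection relationship based on the attribute information of the plurality of on-chip components to obtain the second connection relationship corresponding to the plurality of on-chip components comprises:
    clustering the on-chip components whose bandwidth requirements meet a second preset condition to obtain a third clustering result;
    determining the second connection relationship corresponding to the plurality of on-chip components based on the third clustering result.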
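  6. The method according to claim 5, characterized in that determining the second connection relationship corresponding to the plurality of on-chip components based on the third clustering result comprises:
    clustering, within the third clustering result, the on-chip components whose accessible address space ranges meet a first preset condition to obtain a fourth clustering result;
    determining the second connection relationship corresponding to the plurality of on-chip components based on the fourth clustering result.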
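  7. The method according to any one of claims 1 to 6, characterized in that adding, based on the second connection relationship, routing components for connecting the plurality of on-chip components in the system on chip comprises:
    acquiring attribute information of the routing component, wherein the attribute information of the routing component includes the maximum input number and the maximum output number of the routing component, and the maximum input number and the maximum output number indicate the number of on-chip components that the routing component connects;
    determining the type and deployment position of the routing component based on the attribute information of the routing component and the second connection relationship;
    adding the routing component in the system on chip according to the type and deployment position of the routing component.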
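  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    for any routing component in the topology, determining the candidate data links formed by the routing component and the data ends, wherein a data end is an on-chip component that sends data or an on-chip component that receives data;
    determining a plurality of first routing components whose candidate data links are identical;
    integrating the plurality of first routing components based on the numbers of inputs and outputs connected to each first routing component;
    adjusting the topology based on the first target routing component obtained through the integration.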
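  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    determining the input bit width and output bit width of each routing component in the topology based on the initial clock frequency of each routing component in the topology and the bandwidth requirements of the on-chip components;
    assigning clock domains to the routing components in the topology based on the input bit width and output bit width of each routing component in the topology.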
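  10. The method according to claim 9, characterized in that determining the input bit width and output bit width of each routing component in the topology based on the initial clock frequency of each routing component in the topology and the bandwidth requirements of the on-chip components comprises:
    determining one or more second routing components in the topology that are directly connected to the on-chip component;
    for each second routing component,
    determining the input bit width of the second routing component based on the bandwidth requirement of the first on-chip component connected to the second routing component; and
    determining the output bit width of the second routing component based on the initial clock frequency of the second routing component and the bandwidth requirement of the first on-chip component;
    determining, based on the output bit width of each second routing component, the input bit widths and output bit widths of the routing components in the topology other than the second routing components.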
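  11. The method according to claim 10, characterized in that determining the output bit width of the second routing component based on the initial clock frequency of the second routing component and the bandwidth requirement of the first on-chip component comprises:
    determining the input bandwidth of the second routing component based on the bandwidth requirement of the first on-chip component;
    determining the output bit width of the second routing component based on the initial clock frequency of the second routing component and the input bandwidth of the second routing component.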
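  12. The method according to any one of claims 9 to 11, characterized in that assigning clock domains to the routing components in the topology based on the input bit width and output bit width of each routing component in the topology comprises:
    assigning clock domains to the routing components in the topology based on the topology and the input and output bit widths of each routing component in the topology,
    wherein the sum of the bit widths crossing clock domains under the clock domains assigned to the routing components in the topology is minimal.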
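  13. The method according to claim 12, characterized in that assigning clock domains to the routing components in the topology based on the topology and the input and output bit widths of each routing component in the topology comprises:
    determining at least one candidate assignment combination based on the topology, wherein different candidate assignment combinations assign different clock domains to the routing components in the topology;
    determining, based on the input bit width and output bit width of each routing component in the topology, the sum of the cross-clock-domain bit widths under each candidate assignment combination, and taking the candidate assignment combination with the smallest sum as the target assignment combination, so as to assign clock domains to the routing components in the topology according to the target assignment combination.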
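  14. The method according to claim 13, characterized in that determining at least one candidate assignment combination based on the topology comprises:
    aggregating the on-chip components and routing components in the topology based on the topology;
    determining at least one candidate assignment combination based on the result of the aggregation.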
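  15. The method according to any one of claims 9 to 14, characterized in that, after clock domains are assigned to the routing components in the topology, the method further comprises:
    re-determining the input bit width and output bit width of the routing component based on the target clock frequency corresponding to the clock domain assigned to the routing component and the bandwidth requirement of the on-chip component;
    re-determining multiple candidate assignment combinations based on the topology and the re-determined input and output bit widths of the routing components, and determining the sum of the cross-clock-domain bit widths under each candidate assignment combination;
    determining, from the re-determined candidate assignment combinations, the target assignment combination with the smallest sum of bit widths; and
    when the target assignment combination differs from the clock domains already assigned, returning, based on the target assignment combination, to the step of re-determining the input bit width and output bit width of the routing component.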
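  16. The method according to claim 15, characterized in that the method further comprises:
    when the number of return executions exceeds a preset number, stopping the loop, and
    sending first alarm information.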
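  17. The method according to any one of claims 1 to 16, characterized in that, after the routing components for connecting the plurality of on-chip components are added in the system on chip based on the second connection relationship and the topology corresponding to the plurality of on-chip components is obtained, the method further comprises:
    verifying the input bandwidth and output bandwidth of each routing component in the topology based on the bandwidth requirement of each on-chip component, and
    sending second alarm information when the verification fails.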
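  18. The method according to any one of claims 1 to 17, characterized in that the method further comprises:
    adding a target device to the topology in response to a target-device adding operation instruction;
    wherein the target device includes at least one of the following: a first-in-first-out storage unit or a network rate adapter.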
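  19. A chip, comprising on-chip components and routing components, wherein the network topology between the routing components and the on-chip components is determined based on the method for determining a network-on-chip topology according to any one of claims 1 to 18.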
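  20. An apparatus for determining a network-on-chip topology, characterized by comprising:
    an acquisition module configured to acquire a first connection relationship of a plurality of on-chip components of a system on chip and attribute information of the plurality of on-chip components;
    a simplification module configured to simplify the first connection relationship based on the attribute information of the plurality of on-chip components to obtain a second connection relationship corresponding to the plurality of on-chip components;
    an adding module configured to add, based on the second connection relationship, routing components for connecting the plurality of on-chip components in the system on chip to obtain a topology corresponding to the plurality of on-chip components.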
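  21. A computer device, comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor, and when the computer device runs, the processor communicates with the memory through the bus,
    wherein the machine-readable instructions, when executed by the processor, perform the steps of the method for determining a network-on-chip topology according to any one of claims 1 to 18.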
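  22. A computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the method for determining a network-on-chip topology according to any one of claims 1 to 18.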
PCT/CN2022/086325 2021-08-31 2022-04-12 Method, apparatus, and chip for determining network-on-chip topology WO2023029487A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111014880.4A CN113778938B (zh) 2021-08-31 2021-08-31 Method and apparatus for determining network-on-chip topology, and chip
CN202111014880.4 2021-08-31

Publications (1)

Publication Number Publication Date
WO2023029487A1 true WO2023029487A1 (zh) 2023-03-09

Family

ID=78840279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/086325 WO2023029487A1 (zh) 2021-08-31 2022-04-12 Method, apparatus, and chip for determining network-on-chip topology

Country Status (2)

Country Link
CN (1) CN113778938B (zh)
WO (1) WO2023029487A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778938B (zh) * 2021-08-31 2024-03-12 上海阵量智能科技有限公司 Method and apparatus for determining network-on-chip topology, and chip
CN116418682A (zh) * 2021-12-30 2023-07-11 苏州盛科通信股份有限公司 Method and apparatus for generating a data-screening topology

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105844014A (zh) * 2016-03-22 2016-08-10 广东工业大学 Network-on-chip coding optimization method based on chip design flow and application design flow
US20170060809A1 (en) * 2015-05-29 2017-03-02 Netspeed Systems Automatic generation of physically aware aggregation/distribution networks
US20170063626A1 (en) * 2015-06-18 2017-03-02 Netspeed Systems System and method for grouping of network on chip (noc) elements
US20210168038A1 (en) * 2019-07-22 2021-06-03 Arm Limited Network-On-Chip Topology Generation
CN112905523A (zh) * 2019-12-04 2021-06-04 北京希姆计算科技有限公司 Chip and inter-core data transmission method
CN113778938A (zh) * 2021-08-31 2021-12-10 上海阵量智能科技有限公司 Method and apparatus for determining network-on-chip topology, and chip

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
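CN101945043B (zh) * 2010-09-06 2012-03-28 华南理工大学 IPv6-based next-generation Internet topology discovery system and implementation method
CN102457425A (zh) * 2010-10-25 2012-05-16 北京系统工程研究所 Large-scale virtual network topology generation method
CN111104775B (zh) * 2019-11-22 2023-09-15 核芯互联科技(青岛)有限公司 Network-on-chip topology and implementation method thereof
CN113055297B (zh) * 2019-12-26 2022-09-27 中国移动通信集团天津有限公司 Network topology discovery method and apparatus
CN113194004B (zh) * 2021-05-20 2023-04-07 中国工商银行股份有限公司 Network topology construction method and apparatus, and network change processing method and apparatus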

Also Published As

Publication number Publication date
CN113778938B (zh) 2024-03-12
CN113778938A (zh) 2021-12-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22862633

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE