WO2011070913A1

WO2011070913A1 - On-chip parallel processing system and communication method

Info

Publication number: WO2011070913A1
Application number: PCT/JP2010/070957
Authority: WO
Inventors: 雅規上久保
Original assignee: 日本電気株式会社
Priority date: 2009-12-07
Filing date: 2010-11-25
Publication date: 2011-06-16
Also published as: JP5673554B2; JPWO2011070913A1

Abstract

Disclosed is an on-chip parallel processing system in which a plurality of routers and a plurality of nodes are arranged on a chip, each of the plurality of nodes belongs to any of a plurality of partitions that divide the chip into a plurality of regions, each of the plurality of nodes is connected to any one of the plurality of routers by way of a communication medium and each of the plurality of routers is connected to an adjacent router by way of the communication medium, and the plurality of routers relay packets including data that is transmitted and received in communication between the plurality of nodes. First and second communication channels are set in the communication medium. Each of the plurality of routers and each of the plurality of nodes transmit and receive packets in a first communication channel in communication between nodes belonging to the same partition, and transmit and receive packets in a second communication channel in communication between nodes belonging to mutually different partitions.

Description

On-chip parallel processing system and communication method

The present invention relates to an on-chip parallel processing system and a communication method for processing a plurality of applications in parallel.

An on-chip parallel processing system that operates on a system-on-chip is close to being realized, driven by demand for further improvement in the processing capacity of electronic devices and by continued miniaturization of the semiconductor manufacturing process. An on-chip parallel processing system is a system in which several tens of processors or more are arranged on one chip.

If this is realized, the processing capability of the electronic device can be improved. Furthermore, if the operating voltage can be lowered by lowering the operating frequency of the processor, the power consumption can be reduced. Hereinafter, system-on-chip is referred to as SoC.

Currently, the bus method is used as a connection method on a chip in SoC. In the SoC, the processor is connected to a memory, a peripheral device interface, and the like by a bus.

Here, if the number of processors in the on-chip parallel processing system increases, the amount of data flowing on the chip is expected to increase accordingly.

In the bus method, data is transmitted and received after a communication path is established between the communication transmission side and the reception side. Therefore, the processing time overhead is large. That is, the bus method is not suitable as a connection method in an on-chip parallel processing system in which a large amount of data flows on a chip.

In contrast to this bus method, a network-on-chip (hereinafter referred to as NoC) is considered as a connection method for the next generation. In the NoC architecture, the communication transmission side can continuously transmit data without waiting for the establishment of a communication path with the communication reception side. Therefore, NoC is suitable as a connection method in an on-chip parallel processing system in which a large amount of data flows on a chip.

In the NoC architecture, a chip is composed of a plurality of nodes, a plurality of routers, and a link connecting them. That is, in NoC, communication is performed via a network, and this network terminates at a node via a router and a link.

Each of the plurality of nodes includes one or more processors, memories, peripheral device interfaces, and the like. Each of the plurality of nodes includes a network interface for performing communication with a router in the network.

The network interface packetizes data transmitted from each of a plurality of nodes and transmits it to the network. In addition, it receives a packet transmitted from another node, extracts data contained in the received packet, and outputs it to a block (processor, memory, peripheral device interface, etc.) in the node.

In SoC, a plurality of applications are already mounted on one chip, and parallel processing of the plurality of applications is realized.

In order to realize parallel processing of multiple applications, it is necessary to suppress interference between multiple applications. Specifically, it is necessary for each of a plurality of applications to operate while stably satisfying a predetermined performance, regardless of a combination of a plurality of applications mounted on one chip. This is a characteristic necessary for maintaining the quality of a product using a chip. For example, even if an application performs an illegal operation due to an application defect or an act of a malicious person, it is desirable that the influence does not reach other applications as much as possible.

It is effective to provide a plurality of partitions obtained by dividing a chip into a plurality of areas as a method for suppressing interference between a plurality of applications.

In the on-chip parallel processing system by NoC, each partition includes one or more nodes and a network connecting them. At this time, one application can be executed across a plurality of partitions, and a plurality of applications can be executed in one partition.

Furthermore, a requirement for partitions is that the boundaries of partitions can be changed dynamically. This provides a degree of freedom in designing a device using the SoC and a cost reduction effect.

The fact that the partition boundary can be changed dynamically means that the partition boundary can be determined regardless of the physical arrangement of the processor, memory, etc. on the chip.

This makes it possible to cope with changes in specifications and changes in the allocation of necessary hardware resources during application design. If this is applied, the same chip can be used for a plurality of different products.

For example, Patent Document 1 discloses an on-chip parallel processing system capable of realizing such a partition.

The on-chip parallel processing system disclosed in Patent Document 1 is a system in which a plurality of nodes are connected on-chip by NoC. The network is divided into a plurality of partitions, and one or more applications are executed on the one or more partitions.

Patent Document 1 describes that in defining a partition, it is sufficient that exclusive access to a unique physical memory address space is allocated to routers and IP blocks in the partition.

Patent Document 1 describes the following.

Each router along the partition boundary checks the source address and destination address of each communication packet it receives. It drops packets that originated outside the partition and are addressed to a network location within the partition. It also drops packets that originated within the partition and are addressed to network locations outside the partition.
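As a non-limiting illustration, the boundary check described for Patent Document 1 can be sketched as follows. The function name and the use of partition identifiers in place of address ranges are assumptions made for this sketch; Patent Document 1 itself defines partitions through physical memory address spaces.

```python
def boundary_router_forwards(src_partition, dst_partition, own_partition):
    """Return True if a router on the boundary of own_partition forwards the packet.

    Hypothetical helper: partitions are compared by identifier here,
    not by address range as in the prior-art description.
    """
    # Packet that originated outside the partition but is addressed into it.
    entering = src_partition != own_partition and dst_partition == own_partition
    # Packet that originated inside the partition but is addressed out of it.
    leaving = src_partition == own_partition and dst_partition != own_partition
    return not (entering or leaving)
```

Note that in this scheme a packet is only examined once it reaches a boundary router, which is the behavior the later discussion of invalid addresses relies on.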

Further, in the technique disclosed in Patent Document 1, a virtual channel is set. This virtual channel is realized by a network interface controller and a router. The communication type is recorded in one field in the network packet format by the network interface controller, and then transmitted to the router.

In the technique disclosed in Patent Document 1, the router includes a routing logic, a virtual channel control logic, and a virtual channel buffer. In the virtual channel control logic, the communication type assigned to the packet received by each router is examined. In order to transmit a received packet to a router adjacent to the router, the received packet is placed in an outgoing virtual channel buffer that is a buffer for transmitting a packet of the communication type.

In Patent Document 1, a communication command is exemplified as a communication type. Specifically, the communication commands include an inter-IP-block network-address-based message, a request message, a response message to a request message, an invalidation message directed to a cache, a memory load message, a memory store message, and a response message to a memory load message.

Here, if the router receives a large number of packets in a short time, the buffer in the router may become full, and it may not be possible to store more packets in the buffer. In this case, data transfer between routers is delayed.

When the next packet arrives at the router while data transfer between routers is delayed, one of two measures is generally taken: discarding the delayed packet, or stopping only the data transfer between routers without discarding the packet.

In NoC, if the former of the above two measures is taken, an increase in processing overhead for retransmission of discarded packets tends to be a problem. An increase in overhead is, for example, an increase in chip size or an increase in processing time. Therefore, in NoC, the latter measure tends to be adopted.

In the technique disclosed in Patent Document 1, the latter measure is taken. At this time, Patent Document 1 describes that the above virtual channel is used to prevent data transfer in other communications from being delayed.

Patent Document 1: JP 2009-129447 A

As described above, in order to suppress interference between multiple applications, it is effective to divide the chip into multiple partitions. However, even when the chip is divided into partitions, a situation in which the plurality of applications cannot operate while stably satisfying a predetermined performance must be avoided.

For this purpose, it is necessary to maintain independence in the design of each of a plurality of applications and the design between the plurality of applications.

Here, the partition boundary is a reference for the boundary between a plurality of applications. Therefore, in order to maintain the above independence, it is necessary to suppress interference between a plurality of partitions.

In other words, it is desirable that one communication hardly affects the other communication in communication between nodes belonging to the same partition and communication between nodes belonging to different partitions.

In the technique disclosed in Patent Document 1 described above, the same resource is used in the router regardless of communication between nodes belonging to the same partition or communication between nodes belonging to different partitions. For this reason, resource contention between these communications is likely to occur in the router.

As a means for avoiding this, Patent Document 1 describes a method of introducing a virtual channel. However, as long as communication between nodes belonging to the same partition and communication between nodes belonging to different partitions are not distinguished in the router, they are still likely to influence each other.

As an example, consider a NoC that supports two virtual channels. At this time, it is assumed that the buffer is filled with communication data between nodes belonging to different partitions, and packet transfer on one virtual channel is temporarily disabled. Even in this case, communication between nodes belonging to the same partition can be continued on the other virtual channel. However, there is no guarantee that the other virtual channel is not occupied by communication data between nodes belonging to different partitions.

That is, in communication between nodes belonging to the same partition and communication between nodes belonging to different partitions, one communication tends to affect the other communication. Thereby, depending on the communication status of one communication, the communication speed of the other communication may be reduced.

In the technique disclosed in Patent Document 1 described above, when a packet with an invalid address is transmitted from a node, the packet is relayed by several routers and is not discarded until it reaches the router at the partition boundary.

As a result, the transfer function of the router is occupied due to unnecessary packets. In particular, when a packet with an invalid address is continuously transmitted from a certain node, hardware resources in the router on the path until the packet is discarded are continuously used. That is, the hardware resource cannot be used for other communications, and the communication speed may be reduced.

In addition, if an unauthorized communication occurs in an application executed in one partition, the data of an application executed in another partition may be leaked or altered by the unauthorized communication. Examples of unauthorized communication include those that occur due to application defects and those that are intentionally generated by a malicious person.

As described above, when the technique disclosed in Patent Document 1 is used, there is a possibility of a decrease in communication speed, data leakage, and alteration. Accordingly, in the on-chip parallel processing system, there is a problem that each of the plurality of applications cannot operate while stably satisfying a predetermined performance.

An object of the present invention is to provide an on-chip parallel processing system and a communication method that allow each of a plurality of applications to operate while stably satisfying a predetermined performance.

In order to achieve the above object, the on-chip parallel processing system of the present invention is
an on-chip parallel processing system in which a plurality of routers and a plurality of nodes are arranged on a chip, each of the plurality of nodes belongs to one of a plurality of partitions obtained by dividing the chip into a plurality of regions, each of the plurality of nodes is connected to one of the plurality of routers via a communication medium, each of the plurality of routers is connected to a router adjacent to the router via the communication medium, and the plurality of routers relay packets including data transmitted and received in communication between the plurality of nodes, wherein
first and second communication channels are set in the communication medium, and
each of the plurality of routers and each of the plurality of nodes transmits and receives packets through the first communication channel in communication between nodes belonging to the same partition, and transmits and receives packets through the second communication channel in communication between nodes belonging to mutually different partitions.

In order to achieve the above object, the communication method of the present invention is
a communication method in an on-chip parallel processing system in which a plurality of routers and a plurality of nodes are arranged on a chip, each of the plurality of nodes belongs to one of a plurality of partitions obtained by dividing the chip into a plurality of regions, each of the plurality of nodes is connected to one of the plurality of routers via a communication medium, each of the plurality of routers is connected to a router adjacent to the router via the communication medium, and the plurality of routers relay packets including data transmitted and received in communication between the plurality of nodes,
the method comprising a transmission and reception process in which each of the plurality of routers and each of the plurality of nodes transmits and receives packets on the first communication channel, of first and second communication channels set in the communication medium, in communication between nodes belonging to the same partition, and transmits and receives packets on the second communication channel in communication between nodes belonging to mutually different partitions.

According to the present invention, in communication between nodes belonging to the same partition, packets are transmitted and received on the first communication channel, and in communication between nodes belonging to different partitions, packets are transmitted and received on the second communication channel.

This makes it possible to suppress a decrease in communication speed and data leakage and falsification. Therefore, in the on-chip parallel processing system, each of a plurality of applications can operate while stably satisfying a predetermined performance.

FIG. 1 is a block diagram showing the configuration of one embodiment of the on-chip parallel processing system of the present invention.

FIG. 2 is a block diagram showing an example of the configuration of the router shown in FIG. 1.

FIG. 3 is a block diagram showing an example of the configuration of the node shown in FIG. 1.

FIG. 4 is a block diagram showing an example of the configuration of the network interface unit shown in FIG. 3.

FIG. 5 is a flowchart for explaining the operation when the configuration register shown in FIG. 4 stores partition definition information.

FIG. 6 is a flowchart for explaining the operation when a node transmits a packet including data in the on-chip parallel processing system shown in FIGS. 1 to 4.

FIG. 7 is a flowchart for explaining the operation when a node receives a packet in the on-chip parallel processing system shown in FIGS. 1 to 4.

FIG. 8 is a flowchart for explaining the operation when a router receives a packet including data in the on-chip parallel processing system shown in FIGS. 1 to 4.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing a configuration of an embodiment of an on-chip parallel processing system of the present invention.

The on-chip parallel processing system 101 of the present embodiment includes a plurality of routers 102 and a plurality of nodes 103, as shown in FIG. 1. The plurality of routers 102 and the plurality of nodes 103 are arranged on one chip.

Further, as shown in FIG. 1, each of the plurality of nodes 103 is connected to one of the plurality of routers 102 by a link 104 which is a communication medium. Each of the plurality of routers 102 is connected to the router 102 adjacent to the router 102 by a link 104. Each of the plurality of routers 102 relays a packet including data transmitted / received by communication between the plurality of nodes 103.

Hereinafter, the router 102 to which each of the plurality of nodes 103 is connected is referred to as a connection router, and the node 103 connected to each of the plurality of routers 102 is referred to as a connection node. A router 102 adjacent to each of the plurality of routers 102 is referred to as an adjacent router.

Further, in the on-chip parallel processing system 101 of the present embodiment, as shown in FIG. 1, a plurality of partitions 10 are set by dividing the chip into a plurality of areas. Each of the plurality of nodes 103 belongs to one of the plurality of partitions 10.

In the link 104, two communication channels, a local channel that is a first communication channel and a global channel that is a second communication channel, are set. The local channel and the global channel are logical virtual channels. Although the number of communication channels set for the link 104 is two here, the number of communication channels set for the link 104 is not limited to two.

The local channel is used for relaying a packet including data transmitted / received by communication between nodes 103 belonging to the same partition 10. On the other hand, the global channel is used for relaying packets including data transmitted and received by communication between nodes 103 belonging to different partitions 10.
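As a non-limiting illustration, the rule for choosing between the two channels can be sketched as follows. The names `Channel`, `select_channel`, and the `partition_of` mapping are hypothetical and are introduced only for this sketch.

```python
from enum import Enum


class Channel(Enum):
    LOCAL = "local"    # first communication channel (same partition)
    GLOBAL = "global"  # second communication channel (different partitions)


def select_channel(partition_of, src_node, dst_node):
    """Pick the channel for a packet from src_node to dst_node.

    partition_of is an assumed mapping from node id to partition id.
    """
    if partition_of[src_node] == partition_of[dst_node]:
        return Channel.LOCAL
    return Channel.GLOBAL
```

For example, with `partition_of = {0: "A", 1: "A", 2: "B"}`, traffic between nodes 0 and 1 would use the local channel and traffic between nodes 0 and 2 would use the global channel.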

FIG. 2 is a block diagram showing an example of the configuration of the router 102 shown in FIG. 1.

As shown in FIG. 2, the router 102 shown in FIG. 1 includes a local buffer 201-1 as a first buffer, a global buffer 201-2 as a second buffer, a packet input unit 204 connected to the local buffer 201-1 and the global buffer 201-2, a packet output switch 202, and a routing control unit 203.

Here, since two communication channels, the local channel and the global channel, are set for the link 104, two buffers, the local buffer 201-1 and the global buffer 201-2, are provided. More generally, as many buffers are provided as there are communication channels set in the link 104.

Further, the packet input unit 204, the local buffer 201-1, and the global buffer 201-2 are each provided in a number equal to the total number of adjacent routers and connection nodes. Note that FIG. 2 shows a case where the total number of adjacent routers and connection nodes is five as an example.

The packet input unit 204 receives a packet transmitted from the adjacent router or connection node via the link 104. Then, it is determined whether the received packet is a packet transmitted on the local channel or a packet transmitted on the global channel. As a result of the determination, if the received packet is a packet transmitted through the local channel, the packet input unit 204 outputs the received packet to the local buffer 201-1. On the other hand, if the result of determination is that the received packet is a packet transmitted on the global channel, the packet input unit 204 outputs the received packet to the global buffer 201-2.

The local buffer 201-1 accepts the packet output from the packet input unit 204, and temporarily stores the accepted packet.

The global buffer 201-2 receives the packet output from the packet input unit 204, and temporarily stores the received packet.

The routing control unit 203 outputs a packet transmission instruction for transmitting a packet to the packet output switch 202 based on a predetermined timing and a switching pattern.

The packet output switch 202 includes a plurality of output units (not shown), equal in number to the total number of adjacent routers and connection nodes. In accordance with the packet transmission instruction output from the routing control unit 203, the packet output switch 202 transmits the packets temporarily stored in the local buffer 201-1 and the global buffer 201-2 to the adjacent router or connection node corresponding to the destination of each packet. Specifically, the packet output switch 202 transmits each packet from the output unit corresponding to the destination of the packet. At this time, the packet temporarily stored in the local buffer 201-1 is transmitted through the local channel, and the packet temporarily stored in the global buffer 201-2 is transmitted through the global channel.
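As a non-limiting illustration, the separation of buffers per channel in the router 102 can be modeled as follows. The class name, the dictionary packet format, the buffer capacity, and the behavior of refusing a packet when a buffer is full are all assumptions of this sketch; the routing function is passed in by the caller because the switching pattern is not specified here.

```python
from collections import deque


class Router:
    """Toy model: one local and one global buffer per input port."""

    def __init__(self, ports, capacity=4):
        # capacity is an assumed buffer depth, not a value from the text.
        self.capacity = capacity
        self.buffers = {p: {"local": deque(), "global": deque()} for p in ports}

    def receive(self, in_port, packet):
        """Packet input unit: sort the packet into the buffer of its channel."""
        queue = self.buffers[in_port][packet["channel"]]
        if len(queue) >= self.capacity:
            return False  # assumed back-pressure: the sender must wait
        queue.append(packet)
        return True

    def forward(self, in_port, channel, route):
        """Packet output switch: send the oldest packet on the same channel."""
        queue = self.buffers[in_port][channel]
        if not queue:
            return None
        packet = queue.popleft()
        return route(packet["dst"]), packet
```

In this model, a full global buffer on a port refuses further global packets but leaves the local buffer on the same port available, which is the isolation between intra-partition and inter-partition traffic that the embodiment aims at.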

Referring to FIG. 1 again, each of the plurality of nodes 103 becomes a communication start point or end point, and communicates with a node 103 other than its own node. At that time, each of the plurality of nodes 103 transmits and receives a packet including data to and from the connection router.

FIG. 3 is a block diagram showing an example of the configuration of the node 103 shown in FIG. 1.

As shown in FIG. 3, the node 103 shown in FIG. 1 includes a network interface unit 301, processors 302-1 to 302-n as block units, a local memory 303, and a peripheral device interface 304. Note that these components are not necessarily common to the plurality of nodes 103. For example, a node 103 that does not include the peripheral device interface 304 and a node 103 that does not include the local memory 303 may exist.

In addition, the network interface unit 301, the processors 302-1 to 302-n, the local memory 303, and the peripheral device interface 304 are connected via a shared bus 305.

The processors 302-1 to 302-n, the local memory 303, and the peripheral device interface 304 generate data, and output the generated data to the network interface unit 301 via the shared bus 305. Further, the processors 302-1 to 302-n, the local memory 303, and the peripheral device interface 304 receive data output from the network interface unit 301 via the shared bus 305.

The network interface unit 301 receives data output from the processors 302-1 to 302-n constituting the block unit via the shared bus 305. Then, a packet including the received data is generated and transmitted to the connecting router. Further, the network interface unit 301 receives a packet transmitted from the connection router and extracts data included in the received packet. Then, the extracted data is output to the processors 302-1 to 302-n via the shared bus 305 according to the destination of the extracted data.

FIG. 4 is a block diagram showing an example of the configuration of the network interface unit 301 shown in FIG. 3.

As shown in FIG. 4, the network interface unit 301 shown in FIG. 3 includes a partition identification unit 401, a configuration register 402, a local packet generation unit 403, a global packet generation unit 404, a global packet analysis unit 405, a local packet analysis unit 406, switches 407 and 408, a data input / output unit 409, and a packet input / output unit 410.

The configuration register 402 receives partition definition information transmitted from a predetermined processor among the plurality of processors 302-1 to 302-n included in the predetermined node 103. Then, the received partition definition information is stored. Hereinafter, a processor that transmits partition definition information is referred to as a main processor.

Note that the partition definition information includes information for generating partition setting information, transmission permission information, and reception permission information, which will be described later.

Here, the operation when the configuration register 402 shown in FIG. 4 stores partition definition information will be described.

FIG. 5 is a flowchart for explaining the operation when the configuration register 402 shown in FIG. 4 stores partition definition information. The operation shown in the flowchart of FIG. 5 is executed when the on-chip parallel processing system shown in FIGS. 1 to 4 is started.

When the on-chip parallel processing system shown in FIGS. 1 to 4 is activated, first, the main processor executes the activation code of the system (step S1).

Next, the main processor changes the bit value for permitting reading from the configuration register 402 and writing to the configuration register 402 by transmitting a signal to each of the plurality of nodes 103. As a result, reading from the configuration register 402 and writing to the configuration register 402 from other than the main processor are prohibited (step S2).

Next, the main processor transmits partition definition information set for each of the plurality of nodes 103 and stored in advance in a nonvolatile memory or the like to the configuration registers 402 of the plurality of nodes 103.

The configuration register 402 that has received the partition definition information transmitted from the main processor stores the received partition definition information (step S3).

Then, the main processor returns the bit value changed in step S2 to the original value by transmitting a signal to each of the plurality of nodes 103. As a result, the prohibition of reading from the configuration register 402 from other than the main processor and writing to the configuration register 402 is released (step S4).

The partition definition information is stored in the configuration register 402 by the operation described above. As a result, even if a program executed in any of the plurality of nodes 103 is illegal, rewriting of the partition definition information with incorrect contents can be avoided. One method for recognizing the main processor is, for example, to include in the header of each packet transmitted from the main processor a predetermined authentication code indicating that the packet was transmitted from the main processor.
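As a non-limiting illustration, steps S2 to S4 can be sketched as follows. The class and function names are hypothetical, and the boolean `from_main` argument stands in for whatever mechanism, such as the authentication code mentioned above, identifies the main processor.

```python
class ConfigRegister:
    """Toy model of the configuration register 402."""

    def __init__(self):
        self.locked = False     # models the access-permission bit
        self.definition = None  # stored partition definition information

    def set_lock(self, locked, from_main):
        # Assumption: only the main processor may change the permission bit.
        if not from_main:
            raise PermissionError("only the main processor may change the lock")
        self.locked = locked

    def write(self, definition, from_main):
        if self.locked and not from_main:
            raise PermissionError("register is locked")
        self.definition = definition


def load_partition_definitions(registers, definitions):
    """Run steps S2 to S4 over every node's register as the main processor."""
    for reg in registers.values():               # step S2: prohibit other access
        reg.set_lock(True, from_main=True)
    for node, reg in registers.items():          # step S3: store the definitions
        reg.write(definitions[node], from_main=True)
    for reg in registers.values():               # step S4: release the prohibition
        reg.set_lock(False, from_main=True)
```

While the lock is set, a write attempted with `from_main=False` raises `PermissionError` in this model, which corresponds to the protection against rewriting by an illegal program.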

Further, when the partition definition information is changed during the operation of the on-chip parallel processing system 101, the partition definition information can be rewritten by performing the operations in steps S2 to S4 described above. That is, the partition definition information stored in the configuration register 402 can be changed even when the on-chip parallel processing system 101 is operating.

At this time, the time required for changing the partition definition information is made shorter than the time during which the operation of the on-chip parallel processing system 101 can be interrupted. One method for realizing this is, for example, to allow two pieces of partition definition information to be stored in the configuration register 402. In this method, the changed partition definition information is acquired while the other piece of partition definition information is still in use, so that the partition definition information can be switched in a very short time.
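As a non-limiting illustration, storing two pieces of partition definition information can be sketched as a two-bank register in which the new definition is staged in the idle bank and then activated in a single step. The class and method names are hypothetical.

```python
class DoubleBankedRegister:
    """Toy model of a configuration register holding two definitions."""

    def __init__(self, initial):
        self.banks = [initial, None]
        self.active = 0  # index of the bank currently in use

    def current(self):
        return self.banks[self.active]

    def stage(self, new_definition):
        # Written while the active bank is still in use.
        self.banks[1 - self.active] = new_definition

    def commit(self):
        # The switch itself is a single index flip.
        if self.banks[1 - self.active] is None:
            raise ValueError("no staged definition to activate")
        self.active = 1 - self.active
```

In this model the slow part, transferring the new definition, happens in `stage`, and only the short `commit` step needs to fall within the time for which operation can be interrupted.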

Referring to FIG. 4 again, the partition identification unit 401 generates partition setting information, transmission permission information, and reception permission information based on the partition definition information stored in the configuration register 402. Then, the partition identifying unit 401 outputs the generated partition setting information to the switch 407. Further, the partition identification unit 401 outputs the generated transmission permission information to the global packet generation unit 404. In addition, the partition identification unit 401 outputs the generated reception permission information to the global packet analysis unit 405.

Here, the partition setting information is information indicating a node in the partition which is the node 103 belonging to the same partition as the own node. The transmission permission information is information indicating a transmission permission node that is the node 103 that is permitted to transmit data from the own node. The reception permission information is information indicating a reception permitted node that is the node 103 permitted as a transmission source of data received by the own node.
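As a non-limiting illustration, the generation of the three kinds of information by the partition identification unit 401 can be sketched as follows. The format of the partition definition information is not specified in this description, so the dictionary layout used here, with the hypothetical keys `partition_of` and `allowed_pairs`, is purely an assumption.

```python
def derive_node_info(definition, own_node):
    """Derive per-node information from an assumed definition format.

    definition["partition_of"]: node id -> partition id (hypothetical key)
    definition["allowed_pairs"]: set of (sender, receiver) node pairs that
        may communicate across partitions (hypothetical key)
    """
    partition_of = definition["partition_of"]
    own_partition = partition_of[own_node]

    # Partition setting information: other nodes in the same partition.
    # Excluding the own node is a choice made for this sketch.
    partition_setting = {
        n for n, p in partition_of.items() if p == own_partition and n != own_node
    }
    # Transmission permission information: nodes the own node may send to.
    transmission_permission = {
        dst for src, dst in definition["allowed_pairs"] if src == own_node
    }
    # Reception permission information: nodes the own node may receive from.
    reception_permission = {
        src for src, dst in definition["allowed_pairs"] if dst == own_node
    }
    return partition_setting, transmission_permission, reception_permission
```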

The data input / output unit 409 receives data output from the processors 302-1 to 302-n constituting the block unit via the shared bus 305, and outputs the received data to the switch 407.

The switch 407 receives data output from the data input / output unit 409. Then, the switch 407 determines whether the destination node 103 of the received data is an intra-partition node, based on the destination information that is included in the received data and indicates the destination node 103 of the data, and on the partition setting information output from the partition identification unit 401. If the destination node 103 of the received data is an intra-partition node as a result of the determination, the switch 407 outputs the received data to the local packet generation unit 403. On the other hand, as a result of the determination, when the destination node 103 of the received data is not an intra-partition node, the switch 407 outputs the received data to the global packet generation unit 404.

The local packet generator 403 receives data output from the switch 407 and generates a packet including the received data. Then, the generated packet is transmitted from the packet input / output unit 410 to the connected router through the local channel.

The global packet generation unit 404 receives the data output from the switch 407. Then, the global packet generation unit 404 determines whether the destination node 103 of the received data is a transmission-permitted node from the destination information included in the received data and the transmission permission information output from the partition identification unit 401. judge. If the destination node 103 of the received data is a transmission-permitted node as a result of the determination, the global packet generation unit 404 generates a packet including the received data. Then, the generated packet is transmitted from the packet input / output unit 410 to the connection router through the global channel. On the other hand, if it is determined that the destination node 103 of the received data is not a transmission-permitted node, the global packet generation unit 404 discards the received data.
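As a non-limiting illustration, the transmit-side decisions of the switch 407 and the global packet generation unit 404 can be sketched as a single function. The function name and the dictionary format of data and packets are assumptions of this sketch.

```python
def build_outgoing_packet(data, intra_partition_nodes, transmission_permitted_nodes):
    """Return the packet to send, or None if the data is discarded.

    data is an assumed dict with at least a "dst" key (destination node id).
    """
    dst = data["dst"]
    if dst in intra_partition_nodes:
        # Switch 407 -> local packet generation unit 403 -> local channel.
        return {"channel": "local", **data}
    if dst in transmission_permitted_nodes:
        # Switch 407 -> global packet generation unit 404 -> global channel.
        return {"channel": "global", **data}
    # Destination outside the partition and not permitted: discard at the source.
    return None
```

Because the check runs inside the sending node, data addressed to a non-permitted node never enters the network in this model, in contrast to a check performed only at a partition-boundary router.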

The packet input / output unit 410 receives a packet transmitted from the connected router via the link 104. Then, it determines whether the received packet was transmitted on the local channel or on the global channel. As a result of the determination, if the received packet is a packet transmitted through the local channel, the packet input / output unit 410 outputs the received packet to the local packet analysis unit 406. On the other hand, if the received packet is a packet transmitted through the global channel, the packet input / output unit 410 outputs the received packet to the global packet analysis unit 405.

The local packet analysis unit 406 receives the packet output from the packet input / output unit 410. Then, the data included in the received packet is extracted, and the extracted data is output to the switch 408. As a result, the extracted data is output to the processors 302-1 to 302-n constituting the block unit.

The global packet analysis unit 405 receives the packet output from the packet input / output unit 410. Then, based on the transmission source information included in the received packet, which indicates the node 103 that is the transmission source of the data, and the reception permission information output from the partition identification unit 401, the global packet analysis unit 405 determines whether the node 103 that is the transmission source of the data is a reception-permitted node. As a result of the determination, if the node 103 that is the transmission source of the data is a reception-permitted node, the global packet analysis unit 405 extracts the data included in the received packet. Then, the extracted data is output to the switch 408. As a result, the extracted data is output to the processors 302-1 to 302-n constituting the block unit. On the other hand, if the node 103 that is the transmission source of the data is not a reception-permitted node, the global packet analysis unit 405 discards the data.
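The receive-side handling by the packet input / output unit 410 and the two packet analysis units can be sketched in the same spirit. The `Packet` fields, the string channel tags, and the function name `receive` are hypothetical and chosen only for illustration; the description does not specify a packet layout.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Packet:
    source: int    # transmission source node ID (hypothetical field)
    channel: str   # "local" or "global" (hypothetical encoding)
    payload: bytes


def receive(packet: Packet, rx_permitted: Set[int]) -> Optional[bytes]:
    """Return the data handed to the block unit, or None if it is discarded."""
    if packet.channel == "local":
        # Local packet analysis unit 406: the data is extracted as-is.
        return packet.payload
    # Global packet analysis unit 405: check reception permission first.
    if packet.source in rx_permitted:
        return packet.payload
    return None  # source is not a reception-permitted node


rx_permitted = {5}  # hypothetical: only node 5 may send to us across partitions
print(receive(Packet(1, "local", b"a"), rx_permitted))   # b'a'
print(receive(Packet(5, "global", b"b"), rx_permitted))  # b'b'
print(receive(Packet(7, "global", b"c"), rx_permitted))  # None
```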

If more detailed control is required for permission of data transmission / reception, the partition definition information may include information for determining whether data can be transmitted / received for each of the processors 302-1 to 302-n included in the node 103.

In addition, information for determining whether data can be transmitted / received for each task in the processors 302-1 to 302-n may be included in the partition definition information. The information for determining whether data can be transmitted / received for each task may be provided, for example, to the operating system (hereinafter referred to as an OS) on which the task is executed, so that transmission / reception can also be rejected by the OS.
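One way to picture the finer-grained control described above is a permission table that holds node-level entries and optional per-processor entries. The following is a sketch under assumptions: the class `PartitionDefinition`, its fields, and the fallback rule (no per-processor entry means the node-level permission applies) are not taken from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PartitionDefinition:
    # Destination nodes this node may transmit to on the global channel.
    tx_permitted_nodes: Set[int] = field(default_factory=set)
    # Optional refinement: destination node -> indices of the processors
    # (302-1 to 302-n) within this node that may transmit to it.
    tx_permitted_processors: Dict[int, Set[int]] = field(default_factory=dict)

    def may_transmit(self, processor: int, dst_node: int) -> bool:
        if dst_node not in self.tx_permitted_nodes:
            return False
        allowed = self.tx_permitted_processors.get(dst_node)
        # Assumed fallback: without a per-processor entry, node-level permission applies.
        return allowed is None or processor in allowed


definition = PartitionDefinition(
    tx_permitted_nodes={2, 3},
    tx_permitted_processors={3: {1}},  # only processor 1 may send to node 3
)
print(definition.may_transmit(processor=2, dst_node=2))  # True
print(definition.may_transmit(processor=2, dst_node=3))  # False
print(definition.may_transmit(processor=1, dst_node=3))  # True
```

A per-task table could be layered on top in the same way, with the OS consulting it before data reaches the network interface unit.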

The operation of the on-chip parallel processing system 101 configured as described above will be described below.

First, an operation when the node 103 transmits a packet including data in the on-chip parallel processing system 101 shown in FIGS. 1 to 4 will be described.

FIG. 6 is a flowchart for explaining an operation when the node 103 transmits a packet including data in the on-chip parallel processing system 101 shown in FIGS. 1 to 4.

First, the processors 302-1 to 302-n constituting the block unit generate data to be transmitted to the other nodes 103 (step S21).

Next, the processors 302-1 to 302-n and the like that generated the data output the generated data to the network interface unit 301 via the shared bus 305.

The data input / output unit 409 of the network interface unit 301 that has received the data output from the processors 302-1 to 302-n or the like via the shared bus 305 outputs the received data to the switch 407.

The switch 407 receives data output from the data input / output unit 409. Then, based on the destination information included in the received data and the partition setting information output from the partition identification unit 401, it is determined whether the destination node 103 of the received data is an intra-partition node (step S22).

As a result of the determination in step S22, if the destination node 103 of the received data is an intra-partition node, the switch 407 outputs the received data to the local packet generation unit 403.

The local packet generation unit 403 that has received the data output from the switch 407 generates a packet including the received data (step S23).

Then, the local packet generation unit 403 transmits the generated packet from the packet input / output unit 410 to the connection router through the local channel (step S24).

On the other hand, as a result of the determination in step S22, if the destination node 103 of the received data is not an intra-partition node, the switch 407 outputs the received data to the global packet generation unit 404.

The global packet generation unit 404 receives the data output from the switch 407. Then, based on the destination information included in the received data and the transmission permission information output from the partition identification unit 401, the global packet generation unit 404 determines whether the destination node 103 of the received data is a transmission-permitted node (step S25).

As a result of the determination in step S25, if the destination node 103 of the received data is a transmission-permitted node, the global packet generation unit 404 generates a packet including the received data (step S26).

Then, the global packet generation unit 404 transmits the generated packet from the packet input / output unit 410 to the connection router through the global channel (step S27).

On the other hand, as a result of the determination in step S25, if the destination node 103 of the received data is not a transmission-permitted node, the global packet generation unit 404 discards the received data (step S28).

Then, the global packet generation unit 404 notifies a predetermined node among the plurality of nodes 103 of discard information indicating that the received data has been discarded (step S29). Here, the predetermined node is a node having a processor that manages data discard information for the entire on-chip parallel processing system 101. This processor may be the same as or different from the main processor described above. As another form, a processor that manages the discard information may be determined among the processors 302-1 to 302-n in each of the plurality of nodes 103, so that the discard information is managed separately for each of the plurality of nodes 103.

Next, the operation when the node 103 receives a packet in the on-chip parallel processing system 101 shown in FIGS. 1 to 4 will be described.

FIG. 7 is a flowchart for explaining the operation when the node 103 receives a packet in the on-chip parallel processing system 101 shown in FIGS. 1 to 4.

First, the packet input / output unit 410 of the network interface unit 301 receives a packet transmitted from the connected router (step S41).

Then, the packet input / output unit 410 determines whether the received packet was transmitted on the local channel or on the global channel (step S42).

As a result of the determination in step S42, when the received packet is a packet transmitted through the local channel, the packet input / output unit 410 outputs the received packet to the local packet analysis unit 406.

The local packet analysis unit 406 that has received the packet output from the packet input / output unit 410 extracts data included in the received packet (step S43).

Then, the local packet analysis unit 406 outputs the extracted data to the switch 408. As a result, the extracted data is output to the processors 302-1 to 302-n constituting the block unit (step S44).

On the other hand, as a result of the determination in step S42, when the received packet is transmitted through the global channel, the packet input / output unit 410 outputs the received packet to the global packet analysis unit 405.

The global packet analysis unit 405 receives the packet output from the packet input / output unit 410. Then, based on the transmission source information of the data included in the received packet and the reception permission information output from the partition identification unit 401, it determines whether the node 103 that is the transmission source of the data included in the received packet is a reception-permitted node (step S45).

If the result of determination in step S45 is that the node 103 that is the transmission source of data contained in the received packet is a reception-permitted node, the global packet analysis unit 405 extracts data from the received packet (step S46).

Then, the global packet analysis unit 405 outputs the extracted data to the switch 408. Thereby, the extracted data is output to the processors 302-1 to 302-n constituting the block unit (step S47).

On the other hand, as a result of the determination in step S45, if the node 103 that is the transmission source of the data included in the received packet is not a reception permitted node, the global packet analysis unit 405 discards the received packet (step S48).

Then, the global packet analysis unit 405 notifies a predetermined node among the plurality of nodes 103 of discard information indicating that the data included in the received packet has been discarded (step S49). In this case as well, as another form, a processor that manages the discard information may be determined among the processors 302-1 to 302-n in each of the plurality of nodes 103, so that the discard information is managed separately for each of the plurality of nodes 103.
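The discard notifications of steps S29 and S49 can be pictured as records collected by the managing processor. The description does not specify what the discard information contains, so the fields of `DiscardInfo` and the `DiscardManager` class below are assumptions made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiscardInfo:
    reporter: int     # node 103 whose network interface unit discarded the data
    source: int       # transmission source node of the discarded data
    destination: int  # destination node of the discarded data
    step: str         # "S28" (transmit side) or "S48" (receive side)


class DiscardManager:
    """Stands in for the processor that manages discard information."""

    def __init__(self) -> None:
        self.records: List[DiscardInfo] = []

    def notify(self, info: DiscardInfo) -> None:
        # Steps S29 / S49: a node reports that it discarded data.
        self.records.append(info)

    def discards_per_reporter(self) -> Counter:
        return Counter(r.reporter for r in self.records)


manager = DiscardManager()
manager.notify(DiscardInfo(reporter=0, source=0, destination=3, step="S28"))
manager.notify(DiscardInfo(reporter=2, source=1, destination=2, step="S48"))
manager.notify(DiscardInfo(reporter=0, source=0, destination=5, step="S28"))
print(manager.discards_per_reporter()[0])  # 2
```

A single `DiscardManager` corresponds to system-wide management by the predetermined node; one instance per node would correspond to the alternative form in which each node manages its own discard information.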

Next, the operation when the router 102 receives a packet containing data in the on-chip parallel processing system 101 shown in FIGS. 1 to 4 will be described.

FIG. 8 is a flowchart for explaining the operation when the router 102 receives a packet containing data in the on-chip parallel processing system 101 shown in FIGS. 1 to 4.

First, the packet input unit 204 of the router 102 receives a packet transmitted from the adjacent router or connection node via the link 104 (step S61).

Then, the packet input unit 204 determines whether the received packet has been transmitted through the local channel or the global channel (step S62).

If the result of determination in step S62 is that the received packet is a packet transmitted on the local channel, the packet input unit 204 outputs the received packet to the local buffer 201-1.

The local buffer 201-1 that has received the packet output from the packet input unit 204 temporarily stores the received packet (step S63).

On the other hand, if the result of the determination in step S62 is that the received packet is a packet transmitted on the global channel, the packet input unit 204 outputs the received packet to the global buffer 201-2.

The global buffer 201-2 that has received the packet output from the packet input unit 204 temporarily stores the received packet (step S64).

Next, in accordance with the packet transmission instruction output from the routing control unit 203, the packet output switch 202 transmits the packets temporarily stored in the local buffer 201-1 and the global buffer 201-2 to the adjacent router or connection node from the output unit selected according to each packet's destination.

At this time, the packet output switch 202 transmits the packet temporarily stored in the local buffer 201-1 through the local channel (step S65).

Further, the packet output switch 202 transmits the packet temporarily stored in the global buffer 201-2 through the global channel (step S66).
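Steps S61 to S66 can be summarized in a small model of the router 102 with its two buffers. This is a sketch under assumptions: the tuple packet layout, the static `route` table mapping destinations to output port names, and the order in which the buffers are drained are all hypothetical, and the routing control unit 203 is not modeled.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Packet = Tuple[str, int, bytes]  # (channel, destination node ID, payload)


class Router:
    def __init__(self, route: Dict[int, str]) -> None:
        self.route = route  # destination node ID -> output port (hypothetical table)
        self.local_buffer: Deque[Packet] = deque()   # local buffer 201-1
        self.global_buffer: Deque[Packet] = deque()  # global buffer 201-2

    def receive(self, packet: Packet) -> None:
        # Steps S62 to S64: sort the packet into a buffer by its channel.
        if packet[0] == "local":
            self.local_buffer.append(packet)
        else:
            self.global_buffer.append(packet)

    def forward(self) -> List[Tuple[str, Packet]]:
        """Steps S65 and S66: drain both buffers, returning (port, packet) pairs."""
        sent: List[Tuple[str, Packet]] = []
        for buffer in (self.local_buffer, self.global_buffer):
            while buffer:
                packet = buffer.popleft()
                # The packet keeps its channel tag, so it leaves on the same channel.
                sent.append((self.route[packet[1]], packet))
        return sent


router = Router(route={1: "east", 2: "node"})
router.receive(("global", 1, b"g"))
router.receive(("local", 2, b"l"))
for port, packet in router.forward():
    print(port, packet[0])
```

Because each channel has its own buffer, packets queued on one channel never sit behind packets of the other channel inside the router.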

As described above, in the present embodiment, packets are transmitted / received on the local channel in communication between nodes belonging to the same partition 10, and packets are transmitted / received on the global channel in communication between nodes belonging to different partitions 10.

This makes it possible to suppress both a decrease in communication speed and the leakage and falsification of data. Therefore, in the on-chip parallel processing system 101, each of a plurality of applications can operate while stably satisfying a predetermined level of performance.

Also, each of the plurality of nodes 103 discards the data without transmitting it when the destination of the data is not a transmission-permitted node. As a result, unnecessary packets are not relayed by the plurality of routers 102, and a decrease in the communication speed of the on-chip parallel processing system 101 can be suppressed. This makes the stable operation of the applications still more reliable.

Note that the on-chip parallel processing system of the present invention can be applied to a semiconductor chip that mainly controls the entire system in applications such as a mobile phone and a portable multimedia playback device. It can also be applied, in applications such as a personal computer and a server in which a separate semiconductor chip performs the main control, as an accelerator that executes part of the processing in a subordinate role. Of course, the main semiconductor chip and the subordinate semiconductor chip do not need to be physically separated, and the on-chip parallel processing system of the present invention can be constituted by a part of a semiconductor chip.

Although the present invention has been described with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.

This application claims priority based on Japanese Patent Application No. 2009-277406 filed on Dec. 7, 2009, the entire disclosure of which is incorporated herein.

Claims

An on-chip parallel processing system in which a plurality of routers and a plurality of nodes are arranged on a chip, each of the plurality of nodes belongs to one of a plurality of partitions obtained by dividing the chip into a plurality of regions, each of the plurality of nodes is connected to one of the plurality of routers via a communication medium, each of the plurality of routers is connected to a router adjacent to the router via the communication medium, and the plurality of routers relay packets including data transmitted and received in communication between the plurality of nodes, wherein
First and second communication channels are set in the communication medium, and
Each of the plurality of routers and each of the plurality of nodes transmits / receives a packet through the first communication channel in communication between nodes belonging to the same partition, and transmits / receives a packet through the second communication channel in communication between nodes belonging to mutually different partitions.
The on-chip parallel processing system according to claim 1,
Each of the plurality of nodes has
A block unit for generating data and outputting the generated data; and
A network interface unit that receives the data output from the block unit, performs a first determination to determine whether the destination of the received data is an intra-partition node belonging to the same partition as the node, transmits, when the result of the first determination is that the destination of the data is the intra-partition node, a packet including the data to the router connected to the node through the first communication channel, performs, when the result of the first determination is that the destination of the data is not the intra-partition node, a second determination to determine whether the destination of the data is a transmission-permitted node to which transmission of data from the node is permitted, and transmits, when the result of the second determination is that the destination of the data is the transmission-permitted node, a packet including the data to the connected router through the second communication channel, and
Each of the plurality of routers has
A packet input unit for receiving a packet transmitted from the router or the node connected to the router;
A first buffer for storing a packet transmitted on the first communication channel among packets received by the packet input unit;
A second buffer for storing a packet transmitted on the second communication channel among the packets received by the packet input unit; and
A packet output switch that transmits the packet stored in the first buffer to the router or the node connected to the router through the first communication channel according to the destination of the packet, and transmits the packet stored in the second buffer to the router or the node connected to the router through the second communication channel according to the destination of the packet.
The on-chip parallel processing system according to claim 2,
The network interface unit discards the data if the destination of the data is not the transmission-permitted node as a result of the second determination.
The on-chip parallel processing system according to claim 3,
The network interface unit notifies a predetermined node of the plurality of nodes when the data is discarded.
The on-chip parallel processing system according to any one of claims 2 to 4,
The network interface unit receives partition definition information that is transmitted from a predetermined node of the plurality of nodes and includes information for performing the first and second determinations, and performs the first and second determinations based on the received partition definition information.
The on-chip parallel processing system according to any one of claims 1 to 5,
The first and second communication channels are logical virtual channels.
A communication method in an on-chip parallel processing system in which a plurality of routers and a plurality of nodes are arranged on a chip, each of the plurality of nodes belongs to one of a plurality of partitions obtained by dividing the chip into a plurality of regions, each of the plurality of nodes is connected to one of the plurality of routers via a communication medium, each of the plurality of routers is connected to a router adjacent to the router via the communication medium, and the plurality of routers relay packets including data transmitted and received in communication between the plurality of nodes,
The communication method including a transmission / reception process in which each of the plurality of routers and each of the plurality of nodes transmits and receives a packet on the first communication channel, of first and second communication channels set in the communication medium, in communication between nodes belonging to the same partition, and transmits and receives a packet on the second communication channel in communication between nodes belonging to mutually different partitions.
The communication method according to claim 7,
The transmission / reception process includes
A process in which each of the plurality of nodes generates data;
A first determination process in which each of the plurality of nodes determines whether a destination of the generated data is an intra-partition node belonging to the same partition as the node;
A process in which each of the plurality of nodes transmits, when the result of the first determination process is that the destination of the data is the intra-partition node, a packet including the data to the router connected to the node on the first communication channel;
A second determination process in which each of the plurality of nodes determines, when the result of the first determination process is that the destination of the data is not the intra-partition node, whether the destination of the data is a transmission-permitted node to which transmission of data from the node is permitted;
A process in which each of the plurality of nodes transmits, when the result of the second determination process is that the destination of the data is the transmission-permitted node, a packet including the data to the router connected to the node on the second communication channel;
A process in which each of the plurality of routers receives a packet transmitted from the router or the node connected to the router;
A process in which each of the plurality of routers stores a packet transmitted on the first communication channel among the received packets in the first buffer of first and second buffers provided in the router;
A process in which each of the plurality of routers stores a packet transmitted on the second communication channel among the received packets in the second buffer;
A process in which each of the plurality of routers transmits the packet stored in the first buffer to the router or the node connected to the router through the first communication channel according to the destination of the packet; and
A process in which each of the plurality of routers transmits the packet stored in the second buffer to the router or the node connected to the router through the second communication channel according to the destination of the packet.