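WO2012137339A1 - Information processing apparatus, parallel computer system, and control method for arithmetic processing apparatus - Google Patents
Information processing apparatus, parallel computer system, and control method for arithmetic processing apparatus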
- Publication number
- WO2012137339A1 PCT/JP2011/058832 JP2011058832W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- cache memory
- stored
- information processing
- memory
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0813—Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
- G06F12/0817—Cache consistency protocols using directory methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0888—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
Definitions
- The present invention relates to an information processing apparatus, a parallel computer system, and a control method for an arithmetic processing apparatus.
- A parallel computer system is known in which a plurality of information processing apparatuses exchange data with one another and execute arithmetic processing.
- One example is a parallel computer system in which a plurality of information processing apparatuses that do not share a memory space are connected to each other via an interconnection network.
- An information processing apparatus included in such a parallel computer system has a main memory, which is a main storage device that stores data used for computation; an arithmetic processing apparatus that performs computation; and a communication device that transmits and receives data used for computation to and from other information processing apparatuses.
- The communication apparatus included in such an information processing apparatus transmits and receives data related to computation to and from other information processing apparatuses via the interconnection network, and stores the received data in the main memory.
- The arithmetic processing unit operates at a higher frequency than that used when reading data from the main memory outside the arithmetic processing unit. Therefore, when the data used for computation is stored in the main memory, the arithmetic processing unit cannot execute arithmetic processing as efficiently as when the data is stored in a cache memory inside the unit. For this reason, the arithmetic processing unit has a cache memory that can be read and written faster than the main memory, and stores the data used for computation in the cache memory, which speeds up data reads during computation and lets arithmetic processing be executed efficiently.
- When a general communication apparatus receives data from another information processing apparatus, it causes the arithmetic processing apparatus to execute a series of processes related to data reception as an interrupt to the arithmetic processing.
- When the arithmetic processing unit executes a series of processes related to data reception as an interrupt process, it saves the data stored in a large number of arithmetic registers, setting registers, and the like when the process is switched, and later restores the saved data, which increases the communication delay.
- In a parallel computer system, a plurality of information processing apparatuses are interconnected so that the communication delay between the information processing apparatuses falls within a predetermined delay time.
- The arithmetic processing unit included in the parallel computer system waits for data transmitted from another information processing device, executes arithmetic processing, and transmits the result of the arithmetic processing to the other information processing device, repeating this sequence.
- Consequently, when the arithmetic processing unit executes a series of processes related to data reception as an interrupt process and the communication delay accompanying the process switching increases, the efficiency of computation in the parallel computer system deteriorates.
- For this reason, the arithmetic processing device performs a polling process that repeatedly reads the memory address in which the data is stored. Because an arithmetic processing device that performs such polling does not switch between data-reception processing and arithmetic processing, the communication delay is reduced and computation efficiency is maintained.
- When the arithmetic processing unit directly acquires the data received by the communication device without going through a data reception buffer, the communication delay can be reduced compared with acquiring the data through the data reception buffer.
- When the amount of data transmitted and received between the information processing apparatuses is large, it is not realistic to newly provide the arithmetic processing apparatus with a data reception buffer. For this reason, a technique for storing data received by a communication device in a cache memory of an arithmetic processing device is known.
- An information processing apparatus to which such a technique is applied directly stores data received by the communication apparatus from another information processing apparatus in a cache memory included in the arithmetic processing unit. For this reason, the arithmetic processing unit can read out the data used for the operation from the cache memory at a high speed, thereby reducing the communication delay.
- However, when such an information processing apparatus receives new data while data used for the calculation is stored in the cache memory, it stores the received data in the cache memory, so the data used for the calculation may be discharged (evicted) from the cache memory. In such a case, the information processing apparatus must read the discharged data back from the main memory in order to execute the calculation, so the calculation cannot be executed efficiently and the calculation processing speed decreases.
- the technology disclosed in the present application has been made in view of the above-described problems, and suppresses a decrease in calculation processing speed.
- the information processing apparatus constitutes a parallel computer system including a plurality of information processing apparatuses.
- The information processing apparatus has a main storage device that holds data, and an arithmetic processing apparatus that includes a cache memory unit holding a part of the data held in the main storage device and an arithmetic processing unit performing arithmetic operations on data held in the main storage device or the cache memory unit.
- The information processing apparatus further has a communication device that determines whether data received from another information processing apparatus is data that the arithmetic processing apparatus is waiting for. When the communication device determines that the received data is data that the arithmetic processing apparatus is waiting for, it stores the received data in the cache memory unit. When the communication device determines that the received data is not data that the arithmetic processing apparatus is waiting for, it stores the received data in the main storage device.
- the technology disclosed in the present application suppresses a decrease in calculation processing speed.
- FIG. 1 is a diagram for explaining the parallel computer system according to the first embodiment.
- FIG. 2 is a diagram for explaining an example of a memory address.
- FIG. 3 is a schematic diagram illustrating an example of a cache memory according to the first embodiment.
- FIG. 4 is a schematic diagram illustrating an example of a communication apparatus according to the first embodiment.
- FIG. 5 is a schematic diagram illustrating an example of a packet generated by the packet generation unit according to the first embodiment.
- FIG. 6 is a flowchart for explaining the flow of processing executed by the communication apparatus according to the first embodiment.
- FIG. 7 is a diagram for explaining an example of a parallel computer system according to the second embodiment.
- FIG. 8 is a diagram for explaining the communication apparatus according to the second embodiment.
- FIG. 9 is a diagram for explaining an example of a parallel computer system according to the third embodiment.
- FIG. 10 is a diagram for explaining the parallel computer system according to the fourth embodiment.
- FIG. 11 is a schematic diagram illustrating an example of a communication apparatus according to the fourth embodiment.
- FIG. 12 is a diagram for explaining an example of the write destination address table.
- FIG. 13 is a flowchart for explaining the flow of processing executed by the communication apparatus according to the fourth embodiment.
- FIG. 14 is a schematic diagram illustrating an example of a parallel computer system according to the fifth embodiment.
- FIG. 15 is a schematic diagram illustrating an example of a communication apparatus according to the fifth embodiment.
- FIG. 16 is a diagram for explaining the parallel computer system according to the third embodiment.
- FIG. 1 is a diagram for explaining the parallel computer system according to the first embodiment.
- the parallel computer system 1 has a plurality of information processing apparatuses 2, 2a and a bus 8 for connecting the information processing apparatuses 2, 2a.
- Although only the information processing apparatuses 2 and 2a are described here, the parallel computer system 1 may include a larger number of information processing apparatuses.
- the parallel computer system 1 may have an arbitrary number of information processing apparatuses.
- the information processing apparatus 2a performs the same processing as the information processing apparatus 2, and the description thereof is omitted.
- the information processing device 2 includes a processor 3, a memory 6, and a communication device 10.
- the processor 3, the memory 6, and the communication device 10 are each connected by a bus included in the information processing device 2.
- the processor 3 is an arithmetic processing device that executes arithmetic processing.
- the processor 3 has a processor core 4 that performs operations.
- the processor core 4 has a cache memory 5.
- the memory 6 is a main memory included in the information processing apparatus 2 and holds data used by the processor core 4 for calculation.
- FIG. 2 is a diagram for explaining an example of a memory address.
- The memory address indicating a storage area of the memory 6 is, for example, a 40-bit memory address when the processor 3 has a 40-bit memory address space.
- In this case, the upper “34-N” bits, in the range shown in (A) of FIG. 2, can be used as a tag, the next “N” bits as an index, and the lower “6” bits, in the range shown in (C) of FIG. 2, as an offset.
- An arbitrary number of bits can be set as the size of the index; in the following description, “N” bits are used as the index.
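The tag, index, and offset split described above can be illustrated with a short sketch. It is only an illustration under the stated 40-bit layout; the function name `split_address` and its parameters are hypothetical and do not appear in the patent.

```python
ADDRESS_BITS = 40  # width of the memory address space described in the text
OFFSET_BITS = 6    # lower 6 bits: byte offset inside a 64-byte line


def split_address(address: int, index_bits: int) -> tuple:
    """Split a 40-bit address into (tag, index, offset).

    index_bits plays the role of "N"; the tag then has 34 - N bits.
    """
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 40 bits")
    if not 0 <= index_bits <= ADDRESS_BITS - OFFSET_BITS:
        raise ValueError("index_bits must be between 0 and 34")
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << index_bits) - 1)
    tag = address >> (OFFSET_BITS + index_bits)
    return tag, index, offset


# With N = 4, address 0x12345 falls in cache line 13 at byte offset 5.
print(split_address(0x12345, 4))  # (72, 13, 5)
```

Recombining the parts as `(tag << (N + 6)) | (index << 6) | offset` gives back the original address, which is a quick way to check the field widths.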
- the cache memory 5 is a storage device that stores data used by the processor core for calculation, and can input and output data at a higher speed than the memory 6.
- FIG. 3 is a schematic diagram illustrating an example of a cache memory according to the first embodiment.
- The cache memory 5 has 2^N cache lines with a line size of 64 bytes, for an overall storage capacity of 2^(N+6) bytes.
- Cache line numbers “0” to “2^N - 1” are assigned to the respective cache lines.
- Each cache line stores 2-bit status data indicated by (A) in FIG. 3, “34-N”-bit tag data indicated by (B) in FIG. 3, and 64 bytes of data indicated by (C) in FIG. 3.
- One bit of the status data is a Valid bit indicating whether the data stored in the corresponding cache line is valid. For example, “1” in the Valid bit indicates that the data stored in the corresponding cache line is valid, and “0” indicates that it is invalid.
- The other bit of the status data is a Dirty bit, which is information for maintaining the identity between the data stored in the corresponding cache line and the data stored in the memory 6. For example, “1” in the Dirty bit indicates that the data stored in the corresponding cache line has been updated by the processor core 4 and therefore needs to be written back to the memory 6. “0” in the Dirty bit indicates that the data stored in the corresponding cache line has not been updated by the processor core 4 and is the same as the data stored in the memory 6. For example, data stored in a cache line whose Valid bit is “1” and whose Dirty bit is “1” is valid data that, having been rewritten by the processor core 4, is no longer the same as the cache source data stored in the memory 6.
- The cache memory 5 employs a direct map method: when data in the memory 6 is cached, the data is stored in the cache line corresponding to the index of the memory address where the cache source data is stored. For example, if the index of that memory address is “i”, the cache memory 5 stores the data in the cache line with cache line number “i”.
- the cache memory 5 may employ a set associative method having a plurality of cache ways.
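A minimal model of the direct-mapped organization described above may help make the roles of the Valid bit, Dirty bit, and tag concrete. This is a sketch under simplifying assumptions, not the patent's hardware; the class and method names (`CacheLine`, `DirectMappedCache`, `fill`, `lookup`) are hypothetical.

```python
from dataclasses import dataclass

LINE_SIZE = 64   # bytes per cache line
OFFSET_BITS = 6  # log2(LINE_SIZE)


@dataclass
class CacheLine:
    valid: bool = False            # Valid bit
    dirty: bool = False            # Dirty bit
    tag: int = 0                   # (34 - N)-bit tag
    data: bytes = bytes(LINE_SIZE)


class DirectMappedCache:
    def __init__(self, index_bits: int):
        self.index_bits = index_bits  # "N" in the text
        self.lines = [CacheLine() for _ in range(1 << index_bits)]

    @property
    def capacity(self) -> int:
        # 2^N lines of 64 bytes each, i.e. 2^(N+6) bytes
        return len(self.lines) * LINE_SIZE

    def _locate(self, address: int):
        index = (address >> OFFSET_BITS) & ((1 << self.index_bits) - 1)
        tag = address >> (OFFSET_BITS + self.index_bits)
        return self.lines[index], tag

    def fill(self, address: int, data: bytes) -> None:
        """Cache a line from memory: line number equals the address index."""
        line, tag = self._locate(address)
        line.valid, line.dirty, line.tag, line.data = True, False, tag, data

    def lookup(self, address: int):
        """Return the cached line data on a hit, or None on a miss."""
        line, tag = self._locate(address)
        return line.data if line.valid and line.tag == tag else None
```

Two addresses that share an index but differ in tag compete for the same line, which is exactly why newly received data can discharge data the processor core still needs.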
- the processor core 4 is an arithmetic processing unit that performs an operation using data. Specifically, the processor core 4 executes arithmetic processing using data stored in the memory 6 or the cache memory 5. Further, the processor core 4 acquires data stored in the memory 6 and stores the acquired data in the cache memory 5. That is, the processor core 4 holds the data stored in the memory 6 in the cache memory 5. Then, the processor core 4 executes arithmetic processing using the data stored in the cache memory 5.
- The processor core 4 waits until the communication device 10 receives data transmitted from another arithmetic processing device. That is, the processor core 4 executes a polling process that waits for calculation-result data from other information processing apparatuses. Then, when the communication device 10 receives the data subject to the polling process, the processor core 4 acquires the received data and stores it in the cache memory 5 and the memory 6.
- the processor core 4 executes the following processing. That is, the processor core 4 stores the received data in the memory 6 and stores the received data in the cache memory 5. That is, when the processor core 4 receives data from the communication device 10 as data to be stored in the cache memory 5, the processor core 4 stores the received data in the cache memory 5 and the memory 6.
- When the processor core 4 stores the received data in the cache memory 5, it refers to the information for maintaining the identity between the data stored in the cache memory 5 and the data stored in the memory 6. Based on that information, the processor core 4 discharges the data stored in the cache memory 5 to the memory 6 and then stores the received data in the cache memory 5. Thereafter, the processor core 4 executes arithmetic processing using the data stored in the cache memory 5, that is, the data subject to polling.
- When the processor core 4 transmits calculated data to another information processing apparatus, it sends information indicating the destination information processing apparatus and the calculated data to the communication apparatus 10. At this time, the processor core 4 determines whether the calculated data is data that a processor core of the other information processing apparatus is waiting for. If so, the processor core 4 notifies the communication apparatus 10 that the processor core is waiting for the data.
- When the processor core 4 receives data from the communication device 10 as data to be stored in the memory 6, it executes the following processing. The processor core 4 refers to the Valid bit and the Dirty bit, which are the status data of the cache line corresponding to the index of the memory address at which the data is to be stored. If the referenced Valid bit is “1” and the Dirty bit is “1”, the processor core 4 proceeds as follows.
- The processor core 4 uses the cached data stored in the cache memory 5 to update the cache source data stored in the memory 6 to the latest data. It then updates the referenced Valid bit from “1” to “0”, and stores the data received from the communication device 10 at the memory address of the memory 6 received together with the data.
- When the referenced Valid bit is “0”, or when the Valid bit is “1” and the Dirty bit is “0”, the processor core 4 sets the Valid bit to “0” and stores the received data in the memory 6.
- When the processor core 4 receives data from the communication device 10 as data to be stored in the cache memory 5, it executes the following processing.
- The processor core 4 refers to the Valid bit, Dirty bit, and tag data of the cache line in which the received data would be stored, that is, the cache line corresponding to the index of the memory address at which the received data is to be stored.
- When the referenced Valid bit is “0”, or the referenced tag data does not match the tag of the memory address of the received data, the processor core 4 stores the received data in the memory 6.
- When the referenced Valid bit is “1” and the referenced tag data matches the tag of the memory address at which the received data is to be stored, the processor core 4 stores the data received from the communication device 10 in the cache line whose status data and tag data it referred to.
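The two receive paths described above (data destined for the memory 6 and data destined for the cache memory 5) can be sketched as follows. This is a simplified model, not the patent's implementation: all names are hypothetical, N is fixed at 4, addresses are assumed to be line-aligned, and on a cache hit the sketch also writes the memory 6, on the assumption that this reflects the statement that awaited data is stored in both the cache memory 5 and the memory 6.

```python
from dataclasses import dataclass, field

INDEX_BITS = 4   # "N"; chosen arbitrarily for this sketch
OFFSET_BITS = 6


def split(address: int):
    """Return (tag, index) of a line-aligned address."""
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    return address >> (OFFSET_BITS + INDEX_BITS), index


def line_address(tag: int, index: int) -> int:
    """Rebuild the memory address that a cache line was loaded from."""
    return ((tag << INDEX_BITS) | index) << OFFSET_BITS


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    data: bytes = b""


@dataclass
class CoreModel:
    lines: list = field(
        default_factory=lambda: [Line() for _ in range(1 << INDEX_BITS)])
    memory: dict = field(default_factory=dict)  # stands in for memory 6

    def store_as_memory_data(self, address: int, data: bytes) -> None:
        _, index = split(address)
        line = self.lines[index]
        if line.valid and line.dirty:
            # Write the updated cached data back before invalidating it.
            self.memory[line_address(line.tag, index)] = line.data
        line.valid = False
        self.memory[address] = data

    def store_as_cache_data(self, address: int, data: bytes) -> str:
        tag, index = split(address)
        line = self.lines[index]
        if line.valid and line.tag == tag:
            line.data = data             # hit: update the cache line
            self.memory[address] = data  # assumption: memory 6 is updated too
            return "cache"
        self.memory[address] = data      # invalid line or tag mismatch
        return "memory"
```

Note that even awaited data goes to the memory 6 when the target line is invalid or holds a different tag, so the receive path never displaces a line that belongs to another address.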
- the communication device 10 receives the packetized data from another information processing device such as the information processing device 2 a via the bus 8.
- the packetized data stores data and the memory address of the memory 6 that stores the data.
- the communication apparatus 10 determines whether the received data is data that the processor core 4 is waiting for.
- When the communication device 10 determines that the received data is data that the processor core 4 is waiting for, it transmits the data to the processor core 4 as data to be stored in the cache memory 5. In this way, the communication device 10 stores the received data in the cache memory 5 and the memory 6.
- When the communication device 10 determines that the received data is not data that the processor core 4 is waiting for, it transmits the data to the processor core 4 as data to be stored in the memory 6. In this way, the communication device 10 stores the received data in the memory 6.
- When the communication device 10 receives data and information indicating the destination information processing device 2a from the processor core 4, it packetizes the received data and transmits the packet via the bus 8 to the destination information processing device 2a.
- When the communication device 10 is notified by the processor core 4 that the processor core of the other information processing device 2a is waiting for the data, it packetizes the received data and adds to the packet control information indicating that the data is subject to polling processing. The communication device 10 then transmits the packet with the added control information to the destination information processing device 2a.
- FIG. 4 is a schematic diagram illustrating an example of a communication apparatus according to the first embodiment.
- the communication device 10 includes a packet generation unit 11, a packet transmission unit 12, a packet reception unit 13, a determination unit 14, and a storage unit 15.
- The packet generation unit 11 executes the following processing when transmitting data that the processor core of another information processing apparatus 2a is waiting for. The packet generation unit 11 packetizes the data to be transmitted, and stores in the packet the address of the destination information processing apparatus 2a and the memory address of the memory included in that apparatus. Further, the packet generator 11 adds to the packet control information indicating that the data is to be written to the cache memory of the processor core of the other information processing apparatus 2a. Then, the packet generation unit 11 transmits the generated packet to the packet transmission unit 12.
- FIG. 5 is a diagram for explaining an example of a packet generated by the packet generation unit according to the first embodiment.
- A conventional packet has a header portion, which stores an address indicating the destination information processing apparatus 2a, and a data portion, which stores the data.
- The packet generator 11 instead generates the packet shown in FIG. 5. Specifically, as shown in (C) of FIG. 5, the packet generator 11 adds a 1-bit flag area for storing control information between the header and the data of the packet.
- The packet generator 11 stores “1” in the flag area as control information when the data to be transmitted is data that the processor core of the destination information processing apparatus 2a is waiting for. When the data to be transmitted is not data that the processor core of the destination information processing apparatus is waiting for, the packet generator 11 stores “0” in the flag area as control information. Packets storing this control information are generated not only by the information processing apparatus 2 but also by other information processing apparatuses such as the information processing apparatus 2a.
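The packet layout with the added flag area can be modeled as a small record. The sketch below only captures the logical fields (header, 1-bit flag, data); the field names, the `generate_packet` helper, and the use of a string for the destination are assumptions made for illustration, and no wire encoding is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    destination: str     # header: address of the destination apparatus
    memory_address: int  # memory address at the destination for the data
    flag: int            # 1-bit control information (1 = awaited data)
    data: bytes          # data portion


def generate_packet(destination: str, memory_address: int,
                    data: bytes, awaited: bool) -> Packet:
    """Build a packet, setting the flag to 1 only for awaited data."""
    return Packet(destination, memory_address, 1 if awaited else 0, data)


# Data the destination core is polling for is marked with flag 1.
result = generate_packet("2a", 0x12340, b"partial sum", awaited=True)
bulk = generate_packet("2a", 0x20000, b"bulk block", awaited=False)
print(result.flag, bulk.flag)  # 1 0
```

Because the sender sets the flag, the receiving communication device needs no knowledge of what its processor core is polling for; it only inspects one bit.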
- When the packet transmitter 12 receives the packet generated by the packet generator 11, it transmits the packet via the bus 8 to the destination information processing apparatus 2a, as shown in FIG. 4.
- When the packet receiving unit 13 receives a packet via the bus 8, as shown in (C) of FIG. 4, it transfers the received packet to the determining unit 14. The determination unit 14 determines whether “1” is stored in the flag area of the received packet.
- When “1” is stored in the flag area of the packet, the determination unit 14 determines that the data stored in the packet is data to be stored in the cache memory 5. When “0” is stored in the flag area of the packet, the determination unit 14 determines that the data stored in the packet is data to be stored in the memory 6. Thereafter, the determination unit 14 transmits the result of the determination and the data stored in the packet to the storage unit 15.
- When the determination unit 14 determines that the data stored in the packet is data to be stored in the cache memory 5, the storage unit 15 transmits the data to the processor core 4 as data to be stored in the cache memory 5 and the memory 6, as shown in FIG. 4. When the determination unit 14 determines that the data stored in the packet is data to be stored in the memory 6, the storage unit 15 transmits the data to the processor core 4 as data to be stored in the memory 6.
- In other words, when “1” is stored in the flag area of the packet, the storage unit 15 sends the processor core 4 the data received from the determination unit 14 together with a notification that the data is to be stored in the cache memory 5.
- When “0” is stored in the flag area of the packet, the storage unit 15 transmits the received data to the processor core 4 as data to be stored in the memory 6.
- the communication device 10 executes the following processing when transmitting data that is waiting for a processor core of another information processing device 2a, that is, data to be polled. That is, the communication device 10 stores “1” as control information in the flag area of the packet to be transmitted, and transmits the control information to the other information processing device 2a of the transmission destination.
- a communication apparatus included in another information processing apparatus such as the information processing apparatus 2a also transmits a packet storing “1” in the flag area when transmitting data that the processor core 4 is waiting for.
- On receiving a packet whose flag area stores “1”, the communication device 10 transmits the data stored in the received packet to the processor core 4 as data to be written in the cache memory 5.
- If the data at the memory address of the memory 6 to which the received data is to be written is already cached in the cache memory 5, the processor core 4 caches the data received from the communication device 10 in the cache memory 5. The processor core 4 can therefore read the awaited data from the cache memory 5 instead of the memory 6, and can execute arithmetic processing efficiently.
- On receiving a packet whose flag area stores “0”, the communication device 10 transmits the data stored in the received packet to the processor core 4 as data to be written in the memory 6.
- In this case, the processor core 4 writes the data into the memory 6. That is, the information processing apparatus 2 stores data received from the other information processing apparatus 2a in the cache memory 5 of the processor core 4 only when it is data that the processor core 4 is waiting for, and stores other data in the memory 6. Because data used for calculation in the cache memory 5 is thus not discharged by data that is not known to be used for calculation, the parallel computer system 1 can prevent the calculation processing speed from being lowered.
- The processor 3, the processor core 4, the packet generation unit 11, the packet transmission unit 12, the packet reception unit 13, the determination unit 14, and the storage unit 15 are realized by an integrated circuit such as an LSI (Large Scale Integration) circuit.
- The memory 6 is a semiconductor memory device such as a RAM (Random Access Memory), a ROM (Read Only Memory), or a flash memory.
- the cache memory 5 is an internal memory of the processor core 4.
- FIG. 6 is a flowchart for explaining the flow of processing executed by the communication apparatus according to the first embodiment.
- The communication device 10 starts the processing when it receives, via the bus 8, a packet whose destination is the information processing device 2.
- The communication apparatus 10 determines whether “1” is stored in the flag area of the received packet (step S101). When “1” is stored in the flag area of the received packet (Yes in step S101), the communication device 10 transmits the data stored in the received packet to the processor core 4 as data to be stored in the cache memory 5 (step S102). On the other hand, when “0” is stored in the flag area of the received packet (No in step S101), the communication device 10 transmits the data stored in the received packet to the processor core 4 as data to be stored in the memory 6 (step S103). Thereafter, the communication device 10 ends the process.
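The flow of steps S101 to S103 amounts to a single branch on the flag. The sketch below expresses it with two callbacks that stand in for handing the data to the processor core 4; `ReceivedPacket`, `handle_received_packet`, and the callback parameters are hypothetical names, not part of the patent.

```python
from typing import Callable, NamedTuple


class ReceivedPacket(NamedTuple):
    flag: int            # control information from the flag area
    memory_address: int  # memory address carried with the data
    data: bytes


def handle_received_packet(
    packet: ReceivedPacket,
    send_as_cache_data: Callable[[int, bytes], None],
    send_as_memory_data: Callable[[int, bytes], None],
) -> str:
    """Dispatch a received packet and return the step that was taken."""
    if packet.flag == 1:                                          # S101: Yes
        send_as_cache_data(packet.memory_address, packet.data)    # S102
        return "S102"
    send_as_memory_data(packet.memory_address, packet.data)       # S103
    return "S103"


# Usage: record where each packet's data is sent.
to_cache, to_memory = [], []
handle_received_packet(ReceivedPacket(1, 0x12340, b"awaited"),
                       lambda a, d: to_cache.append((a, d)),
                       lambda a, d: to_memory.append((a, d)))
handle_received_packet(ReceivedPacket(0, 0x20000, b"other"),
                       lambda a, d: to_cache.append((a, d)),
                       lambda a, d: to_memory.append((a, d)))
```

After these two calls, `to_cache` holds only the awaited data and `to_memory` holds the rest, mirroring the two exits of the flowchart.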
- The information processing apparatus 2 prevents the data stored in the cache memory 5 from being evicted when the processor core 4 receives data that is not used for calculation.
- The processor core of each information processing device 2, 2a can therefore execute efficient processing using the data stored in the cache memory, which prevents a reduction in calculation processing speed.
- When transmitting data that the processor core of another information processing device is waiting for, each information processing device 2, 2a stores, in the packet to be transmitted, control information indicating that the processor core is waiting for the data.
- Each information processing device 2, 2 a stores the data of the received packet in the cache memory 5 when the control information is stored in the received packet.
- Each information processing device 2, 2 a stores the data of the received packet in the memory 6 when the control information is not stored in the received packet. Therefore, each of the information processing apparatuses 2 and 2a can easily determine whether or not the received data should be stored in the cache memory.
- FIG. 7 is a diagram for explaining an example of a parallel computer system according to the second embodiment.
- The parallel computer system 1a includes a plurality of information processing devices 2b and 2c. Note that the information processing apparatus 2c performs the same processing as the information processing apparatus 2b, and a description thereof will be omitted.
- The information processing apparatus 2b has a processor 3a. Further, the processor 3a has a plurality of processor cores 4a to 4c. The processor 3a may have any number of processor cores.
- the processor cores 4a to 4c have cache memories 5a to 5c, respectively.
- the processor cores 4b and 4c are assumed to perform the same functions as the processor core 4a, and the description thereof is omitted.
- the cache memories 5b and 5c are assumed to perform the same functions as the cache memory 5a, and the description thereof is omitted.
- the cache memory 5a has a plurality of cache lines for storing state data, tag data, and data.
- The state data of each cache line of the cache memory 5a holds identity information indicating the relationship between the data stored in the same cache line, the data stored in the other cache memories 5b and 5c, and the data stored in the memory 6.
- The cache memory 5a stores, as the state data, information indicating the state of the cache line based on the MESI protocol (Illinois protocol). Specifically, “M: Modify” stored in the state data indicates that the data stored in the same cache line is cached exclusively and that the cached data has been updated by the processor core 4a.
- The processor core 4a stores the received data in the memory 6 when the communication device 10a receives data that none of the processor cores 4a to 4c is waiting for. In addition, when the communication device 10a receives data that the processor core 4a itself is waiting for, and the cache memory 5a holds the data stored at the memory address of the memory 6 where the received data is to be stored, the processor core 4a executes the following processing.
- The processor core 4a determines whether the data held in the cache memory 5a is also cached by the other processor cores 4b and 4c. If it determines that the data is also cached by them, the processor core 4a stores the received data in the memory 6. If the processor core 4a determines that the data held in the cache memory 5a is not cached by the other processor cores 4b and 4c, the processor core 4a stores the received data only in the cache memory 5a.
- When the processor core 4a receives data from the communication device 10a as data to be stored in the memory 6, the processor core 4a executes the following processing. That is, the processor core 4a refers, in the cache memory 5a, to the state data of the cache line corresponding to the index of the memory address of the memory 6 for storing the received data.
- When the referenced state data is “M”, the processor core 4a writes the data stored in the same cache line as the referenced state data to the memory 6.
- The processor core 4a then updates the referenced state data from “M” to “I”. Thereafter, the processor core 4a stores the received data in the memory 6. If the referenced state data is other than “M”, the processor core 4a updates the referenced state data to “I” and stores the received data in the memory 6.
- When the processor core 4a receives data from the communication device 10a as data to be stored in the cache memory, the processor core 4a executes the following processing. That is, the processor core 4a refers, in the cache memory 5a, to the state data and the tag data of the cache line corresponding to the index of the memory address of the memory 6 for storing the received data.
- The processor core 4a stores the received data in the memory 6 when “I” is stored in the referenced state data, or when the tag of the memory address for storing the received data does not match the referenced tag data.
- When “S” is stored in the referenced state data and the tag of the memory address for storing the received data matches the referenced tag data, the processor core 4a stores the received data in the memory 6 and updates the referenced state data from “S” to “I”.
- When the referenced state data is “M” or “E” and the tag of the memory address for storing the received data matches the referenced tag data, the processor core 4a executes the following processing. That is, the processor core 4a stores the received data in the cache memory 5a and updates the referenced state data to “M”.
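The two handling paths described above (data flagged for the memory 6 and data flagged for the cache memory 5a) can be modelled in a few lines. This is a sketch under stated assumptions: the cache is modelled as a direct-mapped list of lines, the memory as a dictionary keyed by line-aligned address, and the names `CacheLine`, `split`, `line_address`, `receive_for_memory`, and `receive_for_cache` as well as the 64-byte line and 16-line geometry are hypothetical and not taken from the text.

```python
from dataclasses import dataclass

OFFSET_BITS = 6   # assumption: 64-byte cache lines
INDEX_BITS = 4    # assumption: 16 lines, chosen only to keep the example small


@dataclass
class CacheLine:
    state: str = "I"   # MESI state: "M", "E", "S" or "I"
    tag: int = 0
    data: bytes = b""


def split(address):
    """Return (tag, index) of a line-aligned memory address."""
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index


def line_address(tag, index):
    """Rebuild the memory address that a cache line currently maps."""
    return (tag << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS)


def receive_for_memory(cache, memory, address, received):
    """Data flagged for the memory: write back a modified line, invalidate, store."""
    _, index = split(address)
    line = cache[index]
    if line.state == "M":
        # Write the modified line back before it is invalidated.
        memory[line_address(line.tag, index)] = line.data
    line.state = "I"
    memory[address] = received


def receive_for_cache(cache, memory, address, received):
    """Data flagged for the cache: keep it in the cache only if held exclusively."""
    tag, index = split(address)
    line = cache[index]
    if line.state in ("M", "E") and line.tag == tag:
        line.data = received
        line.state = "M"
        return "cache"
    if line.state == "S" and line.tag == tag:
        # Another core may share the line, so invalidate the local copy.
        line.state = "I"
    # "I", a tag mismatch, or "S": the received data goes to the memory.
    memory[address] = received
    return "memory"
```

The sketch stores received data in the cache only when the line is in state “M” or “E” with a matching tag, which corresponds to the case where no other core caches the same data.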
- When each of the processor cores 4a to 4c receives data from the communication device 10a as data to be stored in the cache memory, it refers to the state data and the tag data of the cache line corresponding to the index of the memory address for storing the received data.
- From the referenced state data and tag data, each of the processor cores 4a to 4c determines whether the data stored at the memory address for storing the received data is held in its own cache memory. That is, each of the processor cores 4a to 4c determines whether or not polling processing is being performed on its own cache memory.
- When the communication device 10a receives a packet from another information processing device such as the information processing device 2c, the communication device 10a determines whether “1” is stored in the flag area of the received packet. When “1” is stored in the flag area of the received packet, the communication device 10a executes the following processing. That is, the communication device 10a transmits the received data to the processor cores 4a to 4c as data to be held in the cache memory.
- When “0” is stored in the flag area of the received packet, the communication device 10a transmits the received data to the processor cores 4a to 4c as data to be stored in the memory 6.
- FIG. 8 is a diagram for explaining the communication apparatus according to the second embodiment.
- the communication device 10a includes a packet generation unit 11, a packet transmission unit 12, a packet reception unit 13, a determination unit 14, and a storage unit 15a.
- When the determination unit 14 determines that “1” is stored in the flag area of the packet, the storage unit 15a transmits the received data to each of the processor cores 4a to 4c as data to be cached in the cache memory, as illustrated in the figure. That is, the storage unit 15a stores the received data in the cache memory that holds the data stored at the memory address of the memory 6 for storing the received data.
- When the determination unit 14 determines that “0” is stored in the flag area of the packet, the storage unit 15a transmits the received data to each of the processor cores 4a to 4c as data to be stored in the memory 6, as illustrated in the figure. That is, the storage unit 15a stores the received data in the memory 6.
- the information processing apparatus 2b includes the processor 3a having the plurality of processor cores 4a to 4c. Each of the processor cores 4a to 4c has cache memories 5a to 5c, respectively.
- The information processing device 2b determines whether “1” is stored as control information in the flag area of the received packet.
- When “1” is stored as the control information, the information processing device 2b stores the received packet data in the cache memory that caches the data stored at the memory address of the memory 6 for storing the received packet data. Further, when “0” is stored as the control information, the information processing apparatus 2b stores the received packet data in the memory 6.
- When the information processing apparatus 2b receives data to be polled by the processor cores 4a to 4c, the information processing apparatus 2b can directly store the received data in the cache memories 5a to 5c. Therefore, the information processing apparatus 2b can efficiently perform the arithmetic processing executed by the processor 3a even when the processor 3a is a multi-core processor having a plurality of processor cores 4a to 4c.
- When the information processing apparatus 2b receives data that is not subject to polling processing by the processor cores 4a to 4c, the information processing apparatus 2b stores the received data in the memory 6. For this reason, the information processing device 2b can prevent the data used by the processor cores 4a to 4c from being evicted from the cache memories 5a to 5c by the received data. Therefore, the parallel computer system 1a can cause the processor 3a to execute efficient arithmetic processing without reducing the calculation processing speed.
- each cache memory 5a to 5c stores identity information indicating the relationship between the data held in itself, the data held in the other cache memories 5a to 5c, and the data stored in the memory 6.
- When each of the processor cores 4a to 4c receives data as data to be stored in the cache memory, it maintains the identity between the data cached in its own cache memory 5a to 5c and the data at the memory address for storing the received data, in accordance with the respective identity information.
- Each of the processor cores 4a to 4c maintains the identity of the cached data and the data stored in the memory 6 based on the identity information stored in its own cache memory 5a to 5c. Thereafter, each of the processor cores 4a to 4c stores the data received from the communication device 10 in the cache memories 5a to 5c. For this reason, the parallel computer system 1a retains data coherency even when each of the processor cores 4a to 4c has its own cache memory 5a to 5c, and can cause each of the information processing devices 2b and 2c to execute appropriate arithmetic processing.
- FIG. 9 is a diagram for explaining an example of a parallel computer system according to the third embodiment.
- the parallel computer system 1b includes a plurality of information processing apparatuses 2e and 2f. Note that the information processing device 2f performs the same processing as the information processing device 2e, and a description thereof will be omitted.
- the information processing apparatus 2e has a processor 3b.
- the processor 3b includes a plurality of processor cores 4d to 4f and a level 2 cache memory 7 shared by the processor cores 4d to 4f.
- Each of the processor cores 4d to 4f has level 1 cache memories 5d to 5f, respectively. Note that the processor cores 4e and 4f exhibit the same functions as the processor core 4d, and a description thereof will be omitted.
- Each level 1 cache memory 5d to 5f has a plurality of cache lines having a line size of 64 bytes.
- Each level 1 cache memory 5d to 5f has $2^{N_1}$ cache lines, each of which stores 2 bits of state data, $34 - N_1$ bits of tag data, and 64 bytes of data, and is therefore a cache memory of $2^{N_1 + 6}$ bytes.
- $N_1$ is the size of the index that each level 1 cache memory 5d to 5f associates with its own cache lines.
- the information stored in the cache lines of the level 1 cache memories 5d to 5f is the same information as the information stored in the cache lines of the cache memories 5a to 5c.
- The state data stored in each cache line of each level 1 cache memory 5d to 5f is identity information indicating the following identity. That is, the state data indicates the identity of the data stored in the same cache line, the data stored in the other level 1 cache memories 5d to 5f, and the data stored in the level 2 cache memory 7.
- The level 2 cache memory 7 has $2^{N_2}$ cache lines, each storing 64 bytes of data.
- The level 2 cache memory 7 is a $2^{N_2 + 6}$-byte cache memory that stores 2 bits of state data, $34 - N_2$ bits of tag data, and 64 bytes of data in each cache line.
- $N_2$ is the size of the index that the level 2 cache memory 7 associates with its own cache lines.
- the state data stored in each cache line of the level 2 cache memory 7 indicates the identity between the data stored in the same cache line and the data stored in the memory 6.
- The level 1 cache memories 5d to 5f and the level 2 cache memory 7 are direct-mapped cache memories. For example, when each of the level 1 cache memories 5d to 5f and the level 2 cache memory 7 holds the data stored at the memory address “i” of the memory 6, the data is held in the cache line whose cache line number is “i”.
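The cache geometry above implies a fixed split of a memory address into tag, index, and byte offset. The following sketch works that arithmetic through. It assumes a 40-bit memory address, which is inferred here only from the field widths (a tag of 34 - N bits, an index of N bits, and 6 offset bits for 64-byte lines); the function names are hypothetical.

```python
LINE_BYTES = 64          # line size stated for both cache levels
OFFSET_BITS = 6          # log2(64)
ADDRESS_BITS = 40        # assumption: (34 - N) tag bits + N index bits + 6 offset bits


def split_address(address, index_bits):
    """Split an address into (tag, index, offset) for a direct-mapped cache."""
    offset = address & (LINE_BYTES - 1)
    index = (address >> OFFSET_BITS) & ((1 << index_bits) - 1)
    tag = address >> (OFFSET_BITS + index_bits)
    return tag, index, offset


def tag_bits(index_bits):
    """Width of the tag field: 34 - N bits for an N-bit index."""
    return ADDRESS_BITS - OFFSET_BITS - index_bits


def cache_data_bytes(index_bits):
    """Data capacity of the cache: 2^N lines of 64 bytes, i.e. 2^(N + 6) bytes."""
    return (1 << index_bits) * LINE_BYTES
```

For instance, with an index of 10 bits the sketch gives a 24-bit tag and a data capacity of 65536 bytes, consistent with the $34 - N$ and $2^{N + 6}$ expressions in the text.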
- the processor core 4d stores the received data in the memory 6 when the communication device 10a receives data that the processor cores 4d to 4f are not waiting for. Further, the processor core 4d executes the following process when the communication device 10a receives the data that the processor core 4d is waiting for. That is, the processor core 4d determines whether the data stored in the storage area indicated by the memory address storing the received data in the memory 6 is cached in the level 1 cache memory 5d as the primary cache memory.
- When the processor core 4d determines that the data stored in the storage area of the memory 6 indicated by the memory address is held in the level 1 cache memory 5d, it caches the received data in the level 1 cache memory 5d. If the processor core 4d determines that the data stored in the storage area of the memory 6 indicated by the memory address is not held in the level 1 cache memory 5d, the processor core 4d stores the received data in the memory 6.
- When the processor core 4d receives data from the communication device 10a as data to be stored in the memory 6, the processor core 4d performs the following processing. That is, the processor core 4d refers, in the level 1 cache memory 5d, to the state data stored in the cache line corresponding to the index of the memory address for storing the received data. If the referenced state data is “M”, the processor core 4d writes the data in the same cache line as the referenced state data to the memory 6 and updates the referenced state data from “M” to “I”. Then, the processor core 4d stores the received data in the memory 6.
- When the processor core 4d receives data from the communication device 10a as data to be stored in the level 1 cache memory 5d, the processor core 4d executes the following processing. That is, the processor core 4d refers to the state data and the tag data of the cache line corresponding to the index of the memory address for storing the received data. Then, the processor core 4d stores the received data in the memory 6 when “I” is stored in the referenced state data or when the referenced tag data differs from the tag of the memory address for storing the received data.
- When “S” is stored in the referenced state data and the referenced tag data matches the tag of the memory address for storing the received data, the processor core 4d stores the received data in the memory 6 and updates the referenced state data from “S” to “I”.
- When “M” or “E” is stored in the referenced state data and the referenced tag data matches the tag of the memory address for storing the received data, the processor core 4d executes the following processing. That is, the processor core 4d stores the received data in its own cache memory, that is, the level 1 cache memory 5d, and updates the referenced state data to “M”. At this time, the processor core 4d does not store the received data in the memory 6.
- When the received data is to be stored in the memory 6, the processor core 4d executes, before writing, a process of maintaining data coherency, that is, data consistency between the cache memories 5a to 5c and the main memory 6. That is, the processor core 4d refers to the state data and the tag data of the cache line corresponding to the index of the memory address for storing the received data in the level 2 cache memory 7. If the referenced state data is “M” and the tag data matches the tag of the memory address for storing the received data, the processor core 4d writes the data stored in the same cache line as the referenced state data to the memory 6. Thereafter, the processor core 4d updates the referenced state data from “M” to “I”, and further stores the received data in the memory 6.
- When the communication device 10a receives data that is subject to polling processing by any of the processor cores 4d to 4f, the processor core 4d receives the received data as data to be stored in the level 1 cache memory 5d.
- The processor core 4d determines whether or not the data stored in the storage area of the memory 6 for storing the received data is cached exclusively in the level 1 cache memory 5d. If the data stored in the storage area of the memory 6 for storing the received data is cached exclusively in the level 1 cache memory 5d, the processor core 4d caches the received data in the level 1 cache memory 5d.
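The level 2 check performed before writing to the memory 6 can be sketched as below. This is an illustrative model only: the `L2Line` class, the function name, the dictionary used as memory, and the boolean return value are hypothetical, and the address is assumed to be aligned to a 64-byte line.

```python
from dataclasses import dataclass

OFFSET_BITS = 6  # 64-byte lines, as stated for the level 2 cache memory 7


@dataclass
class L2Line:
    state: str = "I"   # MESI-style state data
    tag: int = 0
    data: bytes = b""


def store_to_memory_via_l2(l2, index_bits, memory, address, received):
    """Store received data in memory after reconciling the level 2 cache line.

    Returns True if a modified line was written back first.
    """
    index = (address >> OFFSET_BITS) & ((1 << index_bits) - 1)
    tag = address >> (OFFSET_BITS + index_bits)
    line = l2[index]
    wrote_back = False
    if line.state == "M" and line.tag == tag:
        # The line holds a newer copy of this address: write it back, then invalidate.
        memory[address] = line.data
        line.state = "I"
        wrote_back = True
    # The received data is written last, so it is what the memory finally holds.
    memory[address] = received
    return wrote_back
```

In this sketch a line whose tag does not match is left untouched, because it caches a different memory address.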
- the processor core 4d has cached data at a memory address where data to be polled is stored. For this reason, when the information processing device 2e receives the data that the processor core 4d is waiting for by the communication device 10a, the information processing device 2e stores the received data not in the memory 6 but in the level 1 cache memory 5d. As a result, the parallel computer system 1b can cause the information processing apparatuses 2e and 2f to execute processing efficiently.
- the information processing apparatus 2e includes the processor 3b having the plurality of processor cores 4d to 4f. Each of the processor cores 4d to 4f has level 1 cache memories 5d to 5f, respectively.
- When the information processing device 2e receives a packet from another information processing device such as the information processing device 2f, the information processing device 2e determines whether or not “1” is stored as control information in the flag area of the received packet.
- When “1” is stored as the control information, the information processing device 2e stores the received data in the level 1 cache memory 5d to 5f that caches the data at the memory address for storing the received data. Further, when “0” is stored as the control information, the information processing device 2e stores the received packet data in the memory 6.
- When the information processing apparatus 2e receives data to be polled by the processor cores 4d to 4f, the information processing apparatus 2e can directly store the received data in the level 1 cache memories 5d to 5f. For this reason, the information processing device 2e can efficiently perform the arithmetic processing executed by the processor 3b.
- When the information processing apparatus 2e receives data that is not subject to polling by the processor cores 4d to 4f, the information processing apparatus 2e stores the received data in the memory 6. For this reason, the information processing device 2e can prevent the data used by the processor cores 4d to 4f from being evicted from the level 1 cache memories 5d to 5f by the received data. As a result, the parallel computer system 1b can perform efficient arithmetic processing without reducing the calculation processing speed.
- Each of the level 1 cache memories 5d to 5f stores first identity information, which is state data indicating the relationship between the data cached therein, the data stored in the other level 1 cache memories 5d to 5f, and the data stored in the level 2 cache memory 7.
- The level 2 cache memory 7 stores second identity information, which is state data indicating the relationship between the data cached in itself and the data stored in the memory 6.
- Based on the first identity information and the second identity information, each of the processor cores 4d to 4f maintains the identity among the data stored in the level 1 cache memories 5d to 5f, the data stored in the level 2 cache memory 7, and the data stored in the memory 6.
- the parallel computer system 1b appropriately maintains the data identity even when the processor cores 4d to 4f have their own level 1 cache memories 5d to 5f and share the level 2 cache memory 7. Therefore, appropriate arithmetic processing can be executed.
- FIG. 10 is a diagram for explaining the parallel computer system according to the fourth embodiment.
- the parallel computer system 1c includes a plurality of information processing apparatuses 2f and 2g. Although omitted in FIG. 10, the parallel computer system 1c may include a larger number of information processing apparatuses.
- the information processing device 2g executes the same processing as the information processing device 2f, and the following description is omitted. Also, components that execute the same processes as those in the first to fourth embodiments are denoted by the same reference numerals, and the following description is omitted.
- the information processing device 2f includes a processor 3c, a memory 6, and a communication device 10b.
- the processor 3c has a processor core 4g.
- The processor core 4g has a cache memory 5.
- The communication device 10b has a write destination address table 16.
- the processor core 4g executes the same processing as the processor core 4 according to the first embodiment. That is, when the processor core 4 g receives data from the communication device 10 b as data to be stored in the memory 6, the processor core 4 g performs the same processing as the processor core 4 and stores the received data in the memory 6. When the processor core 4 g receives data from the communication device 10 b as data to be stored in the cache memory 5, the processor core 4 g performs the same processing as the processor core 4 and stores the received data in the cache memory 5. Therefore, the description of the process in which the processor core 4g stores data in the cache memory 5 or the memory 6 is omitted.
- When executing the polling process of waiting for received data, the processor core 4g immediately registers the memory address of the memory 6 for storing the data to be polled in the write destination address table 16 of the communication device 10b. Specifically, the processor core 4g transmits the memory address of the memory 6 for storing the data to be polled to the communication device 10b, and causes the update unit 17 of the communication device 10b, described later, to store the transmitted memory address in the write destination address table 16.
- When the processor core 4g receives data to be polled from the communication device 10b and stores the received data in the cache memory 5, it notifies the communication device 10b that the data to be polled has been stored in the cache memory 5. For example, the processor core 4g transmits to the communication device 10b the memory address of the memory 6 for the data stored in the cache memory 5, together with a notification that the data to be polled has been stored in the cache memory 5.
- the communication device 10 b includes a write destination address table 16 that holds a control address for controlling writing of data to the cache memory 5. Then, when the write destination address of the data received from another information processing apparatus such as the information processing apparatus 2g matches the control address held in the write destination address table 16, the communication apparatus 10b performs the following processing. Execute. In other words, the communication device 10b transmits the received data to the processor core 4g as data to be stored in the cache memory 5.
- When the write destination address of the received data does not match any control address held in the write destination address table 16, the communication device 10b transmits the received data to the processor core 4g as data to be stored in the memory 6.
- When notified by the processor core 4g that the data to be polled has been stored in the cache memory 5, the communication device 10b executes the following processing. That is, the communication device 10b deletes, from the write destination address table 16, the memory address of the memory 6 for the data stored in the cache memory 5.
- For example, the communication device 10b receives from the processor core 4g a notification that the data to be polled has been stored in the cache memory 5, together with the memory address of the memory 6 for the data stored in the cache memory 5. In such a case, the communication device 10b searches the write destination address table 16 for the received memory address and deletes the found memory address from the write destination address table 16.
- FIG. 11 is a schematic diagram illustrating an example of a communication apparatus according to the fourth embodiment.
- the communication device 10 b includes a packet generation unit 11, a packet transmission unit 12, a packet reception unit 13, a determination unit 14 a, a storage unit 15, a write destination address table 16, and an update unit 17.
- the write destination address table 16 holds a memory address in which data targeted for polling processing of the processor core 4g is stored.
- FIG. 12 is a diagram for explaining an example of the write destination address table.
- The write destination address table 16 has N line memories with line numbers “0” to “N-1” for storing memory addresses.
- a valid bit is stored in a 1-bit area shown in FIG. 12A, and a memory address is stored in a 64-bit range shown in FIG.
- the valid bit is a bit indicating whether or not the memory address stored in the corresponding line memory is valid data. For example, when “0” is stored in the valid bit, it indicates that the data at the memory address stored in the corresponding line memory is invalid. For example, when “1” is stored in the valid bit, it indicates that the data at the memory address stored in the corresponding line memory is valid.
- The write destination address table 16 is a semiconductor memory element such as a RAM (Random Access Memory), a ROM (Read Only Memory), or a flash memory.
- When the determination unit 14a receives a packet from the packet reception unit 13, the determination unit 14a acquires the memory address of the memory 6 for storing the data stored in the received packet. The determination unit 14a then determines whether the same memory address as the acquired memory address is stored in the write destination address table 16.
- The determination unit 14a compares the acquired memory address with the memory address stored in each line memory of the write destination address table 16 whose valid bit is “1”. When the memory address stored in a line memory whose valid bit is “1” matches the acquired memory address, the determination unit 14a determines that the data stored in the received packet is data to be stored in the cache memory 5.
- When no memory address stored in a line memory whose valid bit is “1” matches the acquired memory address, the determination unit 14a determines that the data stored in the received packet is data to be stored in the memory 6.
- When the update unit 17 receives the memory address of the memory 6 for storing the data to be polled from the processor core 4g, the update unit 17 adds the received memory address to the write destination address table 16.
- When the update unit 17 receives from the processor core 4g a notification that the data to be polled has been stored in the cache memory 5, the update unit 17 deletes the memory address of the memory 6 for storing the data to be polled from the write destination address table 16.
- For example, when the update unit 17 receives a memory address of the memory 6 for storing data to be polled from the processor core 4g, the update unit 17 selects, from among the line memories included in the write destination address table 16, a line memory whose valid bit is “0”. Then, the update unit 17 stores the memory address received from the processor core 4g in the selected line memory and updates the valid bit of the selected line memory to “1”.
- When the update unit 17 receives from the processor core 4g a notification that the data to be polled has been stored in the cache memory 5, together with the memory address of the memory 6 for the data stored in the cache memory 5, the update unit 17 executes the following processing. That is, the update unit 17 searches the line memories included in the write destination address table 16 for a line memory whose valid bit is “1” and which stores the address received from the processor core 4g. Then, the update unit 17 updates the valid bit of the found line memory to “0”.
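The registration and deletion steps performed by the update unit 17 can be sketched as a small table of line memories with valid bits. The class and method names are hypothetical, and the behaviour when no free line memory exists (returning `False`) is an assumption, since the text does not describe that case.

```python
class WriteDestinationAddressTable:
    """Model of N line memories, each holding a valid bit and a memory address."""

    def __init__(self, num_lines):
        self.valid = [0] * num_lines
        self.address = [0] * num_lines

    def register(self, addr):
        """Store addr in the first line memory whose valid bit is 0."""
        for i, bit in enumerate(self.valid):
            if bit == 0:
                self.address[i] = addr
                self.valid[i] = 1
                return True
        return False  # assumption: a full table simply rejects the registration

    def deregister(self, addr):
        """Clear the valid bit of the valid line memory that stores addr."""
        for i, bit in enumerate(self.valid):
            if bit == 1 and self.address[i] == addr:
                self.valid[i] = 0
                return True
        return False

    def contains(self, addr):
        """Check used by the determination unit: is addr stored in a valid line?"""
        return any(bit == 1 and a == addr
                   for bit, a in zip(self.valid, self.address))
```

Deletion only clears the valid bit, so the stale address remains in the line memory until the line is reused, which mirrors the valid-bit update described in the text.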
- The update unit 17 is an electronic circuit. Examples of the electronic circuit include an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array), a CPU (Central Processing Unit), and an MPU (Micro Processing Unit).
- FIG. 13 is a flowchart for explaining the flow of processing executed by the communication apparatus according to the fourth embodiment.
- The communication device 10b starts processing when triggered by reception, via the bus 8, of a packet addressed to the information processing device 2f.
- The communication device 10b determines whether or not the write destination address of the data stored in the received packet matches a memory address stored in the write destination address table 16 (step S201). That is, the communication device 10b determines whether or not the memory address of the memory 6 for storing the received data is registered in the write destination address table 16.
- When the write destination address matches a memory address stored in the write destination address table 16 (Yes in step S201), the communication device 10b executes the following processing. That is, the communication device 10b transmits the received packet data to the processor core 4g as data to be stored in the cache memory 5 (step S202).
- When the write destination address does not match any memory address stored in the write destination address table 16 (No in step S201), the communication device 10b executes the following processing. That is, the communication device 10b transmits the received packet data to the processor core 4g as data to be stored in the memory 6 (step S203). Thereafter, the communication device 10b ends the process.
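Steps S201 to S203 reduce to a membership test on the registered polling addresses. The sketch below is a simplification that stands in a Python set for the write destination address table 16; the function name and the string return values are hypothetical.

```python
def route_by_address(write_address, registered_addresses):
    """Choose the store target from the write destination address of a packet."""
    if write_address in registered_addresses:   # step S201: Yes
        return "cache"                          # step S202
    return "memory"                             # step S203


# Usage: a core registers an address before polling and removes it afterwards.
registered = set()
registered.add(0x1000)                          # core starts polling on 0x1000
first = route_by_address(0x1000, registered)    # polled data goes to the cache
registered.discard(0x1000)                      # core reports the data was cached
second = route_by_address(0x1000, registered)   # later data goes to the memory
```

Unlike the flag-based scheme of the first embodiment, the sender needs no knowledge of the receiver's polling state here, because the decision is made entirely from the receiver-side table.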
- As described above, the information processing apparatus 2f includes the write destination address table 16, which holds the memory address at which the data polled by the processor core 4g is stored. When the processor core 4g executes the polling process, the information processing apparatus 2f stores in the write destination address table 16 the memory address at which the data targeted by the polling process is to be stored.
- The information processing apparatus 2f determines whether the memory address for storing the received data is stored in the write destination address table 16. If it is, the information processing apparatus 2f stores the received data in the cache memory 5. If it is not, the information processing apparatus 2f stores the received data in the memory 6.
- The information processing apparatus 2f thereby prevents the data stored in the cache memory 5 from being evicted when the processor core 4g receives data that is not used for calculation.
- As a result, the processor cores of the information processing apparatuses 2f and 2g can process efficiently using the data stored in the cache memory, which prevents a drop in calculation speed.
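The eviction effect described above can be illustrated with a toy simulation. The text does not state a replacement policy or cache size; the two-line LRU cache, the class `TinyCache`, and the functions `receive` and `run` below are assumptions chosen only to make the effect visible.

```python
from collections import OrderedDict


class TinyCache:
    """Hypothetical LRU cache with a fixed number of lines."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()

    def store(self, address: int, data: str) -> None:
        self.lines[address] = data
        self.lines.move_to_end(address)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line


def receive(cache, memory, polled, address, data, selective):
    # Selective mode: only polled addresses go to the cache.
    if not selective or address in polled:
        cache.store(address, data)
    else:
        memory[address] = data


def run(selective: bool):
    cache, memory = TinyCache(2), {}
    cache.store(0x10, "operand A")  # data the core is computing with
    cache.store(0x20, "operand B")
    for address in (0x100, 0x200):  # received data that nobody is polling
        receive(cache, memory, set(), address, "bulk", selective)
    return sorted(cache.lines)


print(run(selective=False))  # operands were evicted by the received data
print(run(selective=True))   # operands are still cached
```

With selective storing, the two operand lines survive because the unpolled data bypasses the cache and lands in the dictionary that stands in for the memory 6.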
- FIG. 14 is a schematic diagram illustrating an example of a parallel computer system according to the fifth embodiment.
- The parallel computer system 1d includes a plurality of information processing apparatuses, including the information processing apparatuses 2h and 2i.
- Each of the other information processing apparatuses, such as the information processing apparatus 2i, executes the same processing as the information processing apparatus 2h, so its description is omitted.
- Units that perform the same processing as in the preceding embodiments are denoted by the same reference signs.
- The information processing apparatus 2h includes a processor 3d having a plurality of processor cores 4h to 4j, a memory 6, and a communication device 10c. The processor cores 4h to 4j have cache memories 5a to 5c, respectively.
- The communication device 10c has a write destination address table 16a.
- When the processor core 4h receives from the communication device 10c data to be stored in the memory 6, it stores the received data in the memory 6. When it receives from the communication device 10c data to be stored in the cache memory 5a, it stores the received data in the cache memory 5a.
- When storing the received data in the cache memory 5a or the memory 6, the processor core 4h, like the processor core 4a, maintains coherency between the data stored in the cache memory 5a and the data stored in the memory 6 based on the identity information stored in the cache memory 5a.
- Specifically, the processor core 4h maintains data coherency, that is, the consistency of the data stored in the cache memory 5a and the memory 6, according to the state data stored in each cache line of the cache memory 5a. The processor core 4h then stores the received data in the cache memory 5a or the memory 6.
- The detailed processing by which the processor core 4h maintains the coherency of the data stored in the cache memory 5a and the memory 6 is the same as the processing executed by the processor core 4a, so its description is omitted.
- Immediately before executing the polling process, the processor core 4h registers the write destination address of the data to be polled in the write destination address table 16a of the communication device 10c. At this time, the processor core 4h also registers write destination cache memory identification information indicating its own cache memory 5a in association with the write destination address.
- The communication device 10c has a write destination address table 16a that holds, in association with each other, a memory address for storing received data and write destination cache memory identification information identifying one of the cache memories 5a to 5c. Like the communication device 10b according to the fourth embodiment, the communication device 10c executes the following processing when it receives a packet from another information processing apparatus such as the information processing apparatus 2i. That is, the communication device 10c determines whether the same memory address as the memory address for storing the data of the received packet is stored in the write destination address table 16a.
- If the same memory address is stored in the write destination address table 16a, the communication device 10c executes the following processing. That is, the communication device 10c identifies the processor core having the cache memory indicated by the write destination cache memory identification information associated with that memory address. The communication device 10c then transmits the received data to the identified processor core as data to be stored in the cache memory.
- If the same memory address is not stored in the write destination address table 16a, the communication device 10c transmits the received data to the processor cores 4h to 4j as data to be stored in the memory 6.
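The multi-core routing just described can be sketched as a lookup followed by a dispatch. The following Python model is a sketch under assumptions: the dictionary `CACHE_OWNER` assumes that the cache memories 5a, 5b, and 5c belong to the processor cores 4h, 4i, and 4j in that order (the text states only the 4h/5a pairing explicitly), and `route_packet` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Assumed pairing of cache memories with processor cores.
CACHE_OWNER: Dict[str, str] = {"5a": "4h", "5b": "4i", "5c": "4j"}
ALL_CORES: List[str] = ["4h", "4i", "4j"]


def route_packet(table: Dict[int, str], write_address: int) -> Tuple[str, List[str]]:
    """Return (destination, receiving cores) for a received packet.

    `table` stands in for the write destination address table 16a and maps a
    memory address to write destination cache memory identification information.
    """
    cache_id = table.get(write_address)
    if cache_id is not None:
        # Registered address: send only to the core that owns the named cache.
        return ("cache " + cache_id, [CACHE_OWNER[cache_id]])
    # Unregistered address: send to the cores as data bound for the memory 6.
    return ("memory 6", list(ALL_CORES))


table_16a = {0x4000: "5a"}  # core 4h is polling address 0x4000
print(route_packet(table_16a, 0x4000))
print(route_packet(table_16a, 0x8000))
```

Keeping the cache identifier in the table, rather than a core identifier, mirrors the wording of the text, where the core is derived from the cache memory that the identification information names.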
- FIG. 15 is a schematic diagram illustrating an example of a communication apparatus according to the fifth embodiment.
- Among the units included in the communication device 10c, those that perform the same processing as the units illustrated in the first to fourth embodiments are denoted by the same reference numerals, and their description is omitted.
- The communication device 10c includes a storage unit 15b, a determination unit 14b, a write destination address table 16a, and an update unit 17a. Like the write destination address table 16 according to the fourth embodiment, the write destination address table 16a holds memory addresses for storing data targeted by the polling process. In addition, the write destination address table 16a holds each memory address in association with write destination cache memory identification information that identifies the cache memory of the processor core performing the polling process.
- Specifically, the write destination address table 16a includes a plurality of line memories, each formed by adding, to a line memory of the write destination address table 16, a storage area that holds write destination cache memory identification information.
- The write destination address table 16a stores each memory address and its associated write destination cache memory identification information in the same line memory.
- The determination unit 14b acquires the memory address of the memory 6 that is to store the data contained in the packet received by the packet reception unit 13. The determination unit 14b then determines whether the same memory address as the acquired memory address is stored in the write destination address table 16a.
- If the same memory address is stored in the write destination address table 16a, the determination unit 14b executes the following process. That is, the determination unit 14b acquires the write destination cache memory identification information stored in association with that memory address in the write destination address table 16a. The determination unit 14b then determines that the data contained in the received packet is data to be stored in the cache memory indicated by the acquired write destination cache memory identification information.
- If the same memory address is not stored in the write destination address table 16a, the determination unit 14b determines that the data contained in the received packet is data to be stored in the memory 6.
- When the determination unit 14b determines that the received data is data to be stored in a cache memory, the storage unit 15b performs the following processing. That is, the storage unit 15b transmits the received data, as data to be stored in the cache memory, to the processor core having the cache memory indicated by the write destination cache memory identification information acquired by the determination unit 14b. For example, if the determination unit 14b determines that the received data is data to be stored in the cache memory 5a, the storage unit 15b transmits the received data to the processor core 4h.
- When the determination unit 14b determines that the data contained in the received packet is data to be stored in the memory 6, the storage unit 15b transmits the received data to the processor cores 4h to 4j as data to be stored in the memory 6.
- The update unit 17a receives, from each of the processor cores 4h to 4j, a memory address for storing data to be polled. At the same time, the update unit 17a receives write destination cache memory identification information indicating the cache memory 5a to 5c of that processor core. In such a case, the update unit 17a stores the received memory address and the write destination cache memory identification information in one line memory of the write destination address table 16a.
- When the update unit 17a receives, from one of the processor cores 4h to 4j, a notification that the data to be polled has been stored in the cache memory together with the corresponding memory address, it executes the following processing. That is, the update unit 17a searches the write destination address table 16a for the line memory in which the received memory address is stored, and updates the valid bit of that line memory to “0”.
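The registration and invalidation performed by the update unit 17a can be modeled as operations on a small array of line memories. The sketch below rests on assumptions that this passage does not confirm: that registration sets the valid bit to 1, that a free line is chosen by scanning for the first invalid one, and that the table has a fixed number of lines. The names `LineMemory`, `WriteDestinationAddressTable`, `register`, `invalidate`, and `lookup` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LineMemory:
    """One line: valid bit, memory address, and cache identification information."""

    valid: int = 0
    address: int = 0
    cache_id: str = ""


class WriteDestinationAddressTable:
    def __init__(self, num_lines: int = 4):  # table size is an assumption
        self.lines: List[LineMemory] = [LineMemory() for _ in range(num_lines)]

    def register(self, address: int, cache_id: str) -> bool:
        # Assumed policy: reuse the first line whose valid bit is 0.
        for line in self.lines:
            if line.valid == 0:
                line.valid, line.address, line.cache_id = 1, address, cache_id
                return True
        return False  # no free line memory

    def invalidate(self, address: int) -> None:
        # Called once the polled data has reached the cache memory.
        for line in self.lines:
            if line.valid == 1 and line.address == address:
                line.valid = 0

    def lookup(self, address: int) -> Optional[str]:
        for line in self.lines:
            if line.valid == 1 and line.address == address:
                return line.cache_id
        return None


table = WriteDestinationAddressTable()
table.register(0x4000, "5a")   # core 4h starts polling
print(table.lookup(0x4000))
table.invalidate(0x4000)       # polled data has arrived
print(table.lookup(0x4000))
```

Clearing only the valid bit, instead of erasing the line, matches the behavior described for the update unit and leaves the line free for the next registration.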
- When the processor core 4h executes the polling process, the information processing apparatus 2h having these units stores the memory address for storing the data to be polled in the write destination address table 16a in association with the write destination cache memory identification information indicating the cache memory 5a.
- When the information processing apparatus 2h receives a packet from the information processing apparatus 2i or the like, it determines whether the memory address for storing the data contained in the received packet is stored in the write destination address table 16a.
- If the memory address is stored, the information processing apparatus 2h determines, from the write destination cache memory identification information indicating the cache memory 5a that is stored in association with that memory address, that the processor core performing the polling process is the processor core 4h. The information processing apparatus 2h then transmits the received data to the processor core 4h, which stores it in the cache memory 5a.
- As described above, the information processing apparatus 2h stores, in the write destination address table 16a, the memory address for storing the data to be polled in association with the write destination cache memory identification information indicating the cache memory. The information processing apparatus 2h then determines whether the memory address for storing the data of a received packet is stored in the write destination address table 16a and, if it is, stores the received data in the cache memory indicated by the associated write destination cache memory identification information.
- Consequently, when the information processing apparatus 2h receives data polled by any of the processor cores 4h to 4j, it can store the received data directly in the corresponding cache memory 5a to 5c. The parallel computer system 1d can therefore perform arithmetic processing efficiently even when the processors of the information processing apparatuses 2h and 2i are multi-core processors having a plurality of processor cores.
- The information processing apparatus 2h also prevents the data stored in the cache memories 5a to 5c from being evicted when the processor cores 4h to 4j receive data that is not used for calculation. As a result, the parallel computer system 1d can prevent a decrease in calculation speed.
- Furthermore, the information processing apparatus 2h uses the identity information stored in the cache memory 5a to maintain the identity of the data stored in the cache memory 5a and the data stored in the memory 6. The parallel computer system 1d can therefore have each of the information processing apparatuses 2h and 2i execute appropriate processing.
- FIG. 16 is a diagram for explaining the parallel computer system according to the sixth embodiment.
- The parallel computer system 1e has a plurality of information processing apparatuses such as the information processing apparatuses 2j and 2k. The information processing apparatus 2k and the others perform the same processing as the information processing apparatus 2j, so their description is omitted.
- The information processing apparatus 2j includes a processor 3e having a plurality of processor cores 4k to 4m and a level 2 cache memory 7 shared by the processor cores 4k to 4m. The processor cores 4k to 4m have level 1 cache memories 5d to 5f, respectively. The processor cores 4l and 4m execute the same processing as the processor core 4k, so their description is omitted below.
- When the processor core 4k receives data received by the communication device 10d as data to be stored in the level 1 cache memory 5d, it stores the received data in the level 1 cache memory 5d. When the processor core 4k receives data received by the communication device 10d as data to be stored in the memory 6, it stores the received data in the memory 6.
- When storing data in the level 1 cache memory 5d or the level 2 cache memory 7, the processor core 4k executes the same processing as the processor core 4d according to the third embodiment. In other words, the processor core 4k maintains the identity of the data stored in the level 1 cache memory 5d, the level 2 cache memory 7, and the memory 6 based on the first identity information and the second identity information.
- When executing the polling process, the processor core 4k transmits to the communication device 10d the memory address for storing the data to be polled and the write destination cache memory identification information indicating the level 1 cache memory 5d. That is, the processor core 4k associates the memory address for storing the data to be polled with the write destination cache memory identification information indicating the level 1 cache memory 5d, and stores them in the write destination address table 16b.
- When the data to be polled has been stored in the level 1 cache memory 5d, the processor core 4k transmits to the communication device 10d a notification that the data has been stored in the level 1 cache memory 5d, together with the memory address for storing the data. That is, the processor core 4k deletes from the write destination address table 16b the memory address and associated information for the data to be polled.
- The communication device 10d has a write destination address table 16b. Like the write destination address table 16a, the write destination address table 16b stores the following information in association with each other: the memory address for storing the data to be polled, and the write destination cache memory identification information identifying one of the level 1 cache memories 5d to 5f.
- The communication device 10d executes the following processing when it receives a packet from another information processing apparatus such as the information processing apparatus 2k. That is, the communication device 10d determines whether the same memory address as the memory address for storing the data of the received packet is stored in the write destination address table 16b.
- If the same memory address is stored in the write destination address table 16b, the communication device 10d executes the following processing. That is, the communication device 10d identifies the processor core having the level 1 cache memory indicated by the write destination cache memory identification information associated with that memory address. The communication device 10d then transmits the received data to the identified processor core as data to be stored in the cache memory.
- If the same memory address is not stored in the write destination address table 16b, the communication device 10d transmits the received data to the processor cores 4k to 4m as data to be stored in the memory 6.
- When the processor core 4k executes the polling process, the information processing apparatus 2j having these units associates the memory address for storing the data to be polled with the write destination cache memory identification information indicating the level 1 cache memory 5d, and stores them in the write destination address table 16b.
- When the information processing apparatus 2j receives a packet from another information processing apparatus such as the information processing apparatus 2k, it determines whether the memory address for storing the data contained in the received packet is stored in the write destination address table 16b.
- If the memory address is stored, the information processing apparatus 2j determines, from the write destination cache memory identification information indicating the level 1 cache memory 5d that is stored in association with that memory address, that the processor core performing the polling process is the processor core 4k. The information processing apparatus 2j then transmits the received data to the processor core 4k, which stores it in the level 1 cache memory 5d.
- The information processing apparatus 2j also uses the first identity information and the second identity information stored in the level 1 cache memories 5d to 5f and the level 2 cache memory 7. That is, the information processing apparatus 2j maintains the identity of the data stored in each of the level 1 cache memories 5d to 5f, the level 2 cache memory 7, and the memory 6, and then stores the received data in one of the level 1 cache memories 5d to 5f or in the memory 6.
- As described above, the information processing apparatus 2j stores the memory address for storing the data to be polled in association with the write destination cache memory identification information indicating the level 1 cache memory of the processor core that executes the polling process. When the memory address for storing received data matches a stored memory address, the information processing apparatus 2j stores the received data in the level 1 cache memory indicated by the associated write destination cache memory identification information. The parallel computer system 1e can therefore have each of the processor cores 4k to 4m perform arithmetic processing efficiently.
- When the information processing apparatus 2j receives data that is not the target of the polling process, it stores the received data in the memory 6. The parallel computer system 1e can therefore perform efficient arithmetic processing without reducing calculation speed.
- The information processing apparatus 2j also uses the first identity information stored in the level 1 cache memories 5d to 5f and the second identity information stored in the level 2 cache memory 7 to appropriately maintain the identity of the data stored in the level 1 cache memories 5d to 5f, the level 2 cache memory 7, and the memory 6. As a result, the parallel computer system 1e can execute appropriate arithmetic processing.
- Each of the parallel computer systems 1 and 1c in the above description has a processor core 4 or 4g having a cache memory 5.
- However, the embodiments are not limited to this. For example, each of the parallel computer systems 1 and 1c may include a processor that has processor cores with level 1 cache memories and a level 2 cache memory.
- The processor cores of the parallel computer systems 1 and 1c may then maintain data coherency by using the first identity information and the second identity information stored in the level 1 cache memory and the level 2 cache memory.
- That is, the information processing apparatus determines whether the received data is data targeted by the polling process and, if so, stores the received data in the cache memory. If the information processing apparatus determines that the received data is not data targeted by the polling process, it stores the received data in the main memory. The information processing apparatus can have any configuration as long as it performs such processing.
- Each of the parallel computer systems 1 to 1e described above has a plurality of information processing apparatuses that provide the same function.
- However, the embodiments are not limited to this, and the parallel computer system may include any of the information processing apparatuses of the embodiments. That is, the information processing apparatus may determine whether the data of a received packet is data targeted by the polling process based on whether “1” is stored as control information in the received packet, and may also determine whether the memory address is stored in the write destination address table. The information processing apparatus may then determine that the data is targeted by the polling process when either condition is satisfied.
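The combined criterion in the last variation, where either the control information or a table hit is sufficient, reduces to a logical OR. The function below is a minimal sketch; the name `is_polling_target` and the representation of the table as a set of addresses are assumptions for illustration.

```python
from typing import Set


def is_polling_target(control_flag: int, write_address: int,
                      registered_addresses: Set[int]) -> bool:
    """Return True if the received data should be stored in the cache memory.

    control_flag: control information carried in the packet (1 or 0).
    registered_addresses: stands in for the write destination address table.
    """
    flagged = control_flag == 1
    registered = write_address in registered_addresses
    # Either condition alone marks the data as targeted by the polling process.
    return flagged or registered


registered = {0x1000}
print(is_polling_target(1, 0x2000, registered))  # flag only
print(is_polling_target(0, 0x1000, registered))  # table hit only
print(is_polling_target(0, 0x2000, registered))  # neither
```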
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
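Next, the flow of processing executed by the communication device 10 is described with reference to FIG. 6. FIG. 6 is a flowchart for explaining the flow of processing executed by the communication device according to the first embodiment. In the example shown in FIG. 6, the communication device 10 starts the processing when triggered by receiving, via the bus 8, a packet addressed to the information processing apparatus 2.
As described above, when the information processing apparatus 2 receives data from another information processing apparatus 2a, it determines whether the processor core 4 is waiting for the received data. If the information processing apparatus 2 determines that the processor core 4 is waiting for the received data, it stores the received data in the cache memory 5. If the information processing apparatus 2 determines that the processor core 4 is not waiting for the received data, it stores the received data in the memory 6 rather than in the cache memory 5.
As described above, the information processing apparatus 2b has a processor 3a having a plurality of processor cores 4a to 4c. The processor cores 4a to 4c have cache memories 5a to 5c, respectively. When the information processing apparatus 2b receives a packet from another information processing apparatus such as the information processing apparatus 2c, it determines whether “1” is stored as control information in the flag area of the received packet. If “1” is stored as the control information, the information processing apparatus 2b stores the data of the received packet in the cache memory that caches the data held at the memory address of the memory 6 at which the data of the received packet is to be stored. If “0” is stored as the control information, the information processing apparatus 2b stores the data of the received packet in the memory 6.
As described above, the information processing apparatus 2e has a processor 3b having a plurality of processor cores 4d to 4f. The processor cores 4d to 4f have level 1 cache memories 5d to 5f, respectively. When the information processing apparatus 2e receives a packet from another information processing apparatus such as the information processing apparatus 2f, it determines whether “1” is stored as control information in the flag area of the received packet.
Next, the flow of processing executed by the communication device 10b is described with reference to FIG. 13. FIG. 13 is a flowchart for explaining the flow of processing executed by the communication device according to the fourth embodiment. In the example shown in FIG. 13, the communication device 10b starts the processing when triggered by receiving, via the bus 8, a packet addressed to the information processing apparatus 2f.
As described above, the information processing apparatus 2f has the write destination address table 16, which holds the memory address at which the data polled by the processor core 4g is stored. When the processor core 4g executes the polling process, the information processing apparatus 2f stores in the write destination address table 16 the memory address at which the data targeted by the polling process is to be stored.
As described above, the information processing apparatus 2h stores, in the write destination address table 16a, the memory address for storing the data to be polled in association with the write destination cache memory identification information indicating the cache memory. The information processing apparatus 2h then determines whether the memory address for storing the data of a received packet is stored in the write destination address table 16a and, if it is, executes the following processing. That is, the information processing apparatus 2h stores the received data in the cache memory indicated by the associated write destination cache memory identification information.
As described above, the information processing apparatus 2j stores the memory address for storing the data to be polled in association with the write destination cache memory identification information indicating the level 1 cache memory of the processor core that executes the polling process. When the memory address for storing received data matches a stored memory address, the information processing apparatus 2j stores the received data in the level 1 cache memory indicated by the associated write destination cache memory identification information. The parallel computer system 1e can therefore have each of the processor cores 4k to 4m perform arithmetic processing efficiently.
Each of the parallel computer systems 1 and 1c in the above description has a processor core 4 or 4g having a cache memory 5. However, the embodiments are not limited to this. For example, each of the parallel computer systems 1 and 1c may include a processor that has processor cores with level 1 cache memories and a level 2 cache memory. The processor cores of the parallel computer systems 1 and 1c may then maintain data coherency by using the first identity information and the second identity information stored in the level 1 cache memory and the level 2 cache memory.
Each of the parallel computer systems 1 to 1e described above has a plurality of information processing apparatuses that provide the same function. However, the embodiments are not limited to this, and the parallel computer system may include any of the information processing apparatuses of the embodiments. That is, the information processing apparatus may determine whether the data of a received packet is data targeted by the polling process based on whether “1” is stored as control information in the received packet, and may also determine whether the memory address is stored in the write destination address table. The information processing apparatus may then determine that the data is targeted by the polling process when either condition is satisfied.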
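2 to 2k Information processing apparatus
3 to 3e Processor
4 to 4m Processor core
5 to 5c Cache memory
5d to 5f Level 1 cache memory
6 Memory
7 Level 2 cache memory
10 to 10d Communication device
11 Packet generation unit
12 Packet transmission unit
13 Packet reception unit
14 to 14b Determination unit
15 to 15b Storage unit
16, 16a Write destination address table
17, 17a Update unit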
Claims (12)
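- An information processing apparatus constituting a parallel computer system that includes a plurality of information processing apparatuses, the information processing apparatus comprising:
a main storage device that holds data;
an arithmetic processing device including a cache memory unit that holds part of the data held in the main storage device, and an arithmetic processing unit that performs computation using the data held in the main storage device or the cache memory unit; and
a communication device that determines whether data received from another information processing apparatus is data that the arithmetic processing device is waiting for, stores the received data in the cache memory unit when it determines that the received data is data that the arithmetic processing device is waiting for, and stores the received data in the main storage device when it determines that the received data is not data that the arithmetic processing device is waiting for.
- The information processing apparatus according to claim 1, wherein
the communication device includes:
a transmission unit that transmits, to another information processing apparatus, data obtained by adding, to data that the arithmetic processing device of the other information processing apparatus is waiting for, control information indicating that the data is to be written to the cache memory unit of the arithmetic processing device of the other information processing apparatus;
a determination unit that determines whether control information is added to data received from the other information processing apparatus; and
a storage unit that stores the received data in the cache memory unit when the determination unit determines that control information is added to the received data, and stores the received data in the main storage device when the determination unit determines that control information is not added to the received data.
- The information processing apparatus according to claim 2, wherein
the arithmetic processing device includes a plurality of arithmetic processing units each having a cache memory unit, and
when the determination unit determines that the control information is added to the received data, the storage unit stores the received data in the cache memory unit that, among the cache memory units of the plurality of arithmetic processing units, holds the data stored in the storage area of the main storage device indicated by the write destination address of the received data.
- The information processing apparatus according to claim 3, wherein
the cache memory unit stores, in association with the data, identity information indicating the relationship among the data held by itself, the data held by the other cache memory units, and the data held by the main storage device, and
when storing the data in the main storage device or in its own cache memory unit, the arithmetic processing unit maintains, based on the identity information stored in its own cache memory unit, the identity of the data stored in its own cache memory unit and the data stored in the main storage device, and stores the received data in the main storage device or in its own cache memory unit.
- The information processing apparatus according to claim 3, wherein
the arithmetic processing device further includes a shared cache memory unit shared by the plurality of arithmetic processing units,
the cache memory unit stores, in association with the data, first identity information indicating the relationship among the data held by itself, the data held by the other cache memory units, and the data held by the shared cache memory unit,
the shared cache memory unit stores, in association with the data, second identity information indicating the relationship between the data held by itself and the data held by the main storage device, and
when storing the data in the main storage device or in its own cache memory unit, the arithmetic processing unit maintains, based on the first identity information stored in its own cache memory unit and the second identity information stored in the shared cache memory unit, the identity of the data stored in its own cache memory unit, the data stored in the shared cache memory unit, and the data stored in the main storage device, and then stores the received data in the main storage device or in its own cache memory unit.
- The information processing apparatus according to claim 1, wherein
the communication device further includes an address holding unit that holds a control address for controlling writing to the cache memory unit, and
when the write destination address of data received from another information processing apparatus matches the control address held in the address holding unit, the communication device writes the received data to the cache memory unit.
- The information processing apparatus according to claim 6, wherein
the arithmetic processing unit causes the address holding unit to hold, as the control address, the write destination address of the data that the arithmetic processing unit is waiting for.
- The information processing apparatus according to claim 6 or 7, wherein
the arithmetic processing device includes a plurality of arithmetic processing units each having a cache memory unit, and
each of the plurality of arithmetic processing units causes the address holding unit to hold, as the control address, the write destination address of the data it is waiting for, and also causes the address holding unit to hold write destination cache memory identification information that identifies, among the cache memory units of the plurality of arithmetic processing units, the cache memory unit to which the awaited data is to be written.
- The information processing apparatus according to claim 8, wherein
the cache memory unit stores, in association with the data, identity information indicating the relationship among the data held by itself, the data held by the other cache memory units, and the data held by the main storage device, and
when storing the data in the main storage device or in its own cache memory unit, the arithmetic processing unit maintains, based on the identity information stored in its own cache memory unit, the identity of the data stored in its own cache memory unit and the data stored in the main storage device, and then stores the received data in the main storage device or in its own cache memory unit.
- The information processing apparatus according to claim 8, wherein
the arithmetic processing device further includes a shared cache memory unit shared by the plurality of arithmetic processing units,
the cache memory unit stores, in association with the data, first identity information indicating the relationship among the data held by itself, the data held by the other cache memory units, and the data held by the shared cache memory unit,
the shared cache memory unit stores, in association with the data, second identity information indicating the relationship between the data held by itself and the data held by the main storage device, and
when storing the data in the main storage device or in its own cache memory unit, the arithmetic processing unit maintains, based on the first identity information stored in its own cache memory unit and the second identity information stored in the shared cache memory unit, the identity of the data stored in its own cache memory unit, the data stored in the shared cache memory unit, and the data stored in the main storage device, and stores the received data in the main storage device or in its own cache memory unit.
- A parallel computer system having a plurality of information processing apparatuses, wherein
each information processing apparatus includes:
a main storage device that holds data;
an arithmetic processing device including a cache memory unit that holds part of the data held in the main storage device, and an arithmetic processing unit that performs computation using the data held in the main storage device or the cache memory unit; and
a communication device that determines whether data received from another information processing apparatus is data that the arithmetic processing device is waiting for, and stores the received data in the cache memory unit when it determines that the received data is data that the arithmetic processing device is waiting for.
- A control method for an arithmetic processing device included in a parallel computer system, the arithmetic processing device being provided in an information processing apparatus that has a main storage device holding data, and having a cache memory unit that holds part of the data held in the main storage device and an arithmetic processing unit that performs computation using the data held in the main storage device or the cache memory unit, the control method comprising:
determining, by a communication device of the information processing apparatus, whether data received from another information processing apparatus of the parallel computer system is data that the arithmetic processing device is waiting for; and
storing, by the communication device, the received data in the cache memory unit when the communication device determines that the received data is data that the arithmetic processing device is waiting for.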
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2011/058832 WO2012137339A1 (ja) | 2011-04-07 | 2011-04-07 | 情報処理装置、並列計算機システムおよび演算処理装置の制御方法 |
EP11863039.1A EP2696289B1 (en) | 2011-04-07 | 2011-04-07 | Information processing device, parallel computer system, and computation processing device control method |
CN201180070697.4A CN103502959B (zh) | 2011-04-07 | 2011-04-07 | 信息处理装置以及并行计算机系统 |
JP2013508690A JP5621918B2 (ja) | 2011-04-07 | 2011-04-07 | 情報処理装置、並列計算機システムおよび演算処理装置の制御方法 |
US14/047,059 US9164907B2 (en) | 2011-04-07 | 2013-10-07 | Information processing apparatus, parallel computer system, and control method for selectively caching data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2011/058832 WO2012137339A1 (ja) | 2011-04-07 | 2011-04-07 | 情報処理装置、並列計算機システムおよび演算処理装置の制御方法 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/047,059 Continuation US9164907B2 (en) | 2011-04-07 | 2013-10-07 | Information processing apparatus, parallel computer system, and control method for selectively caching data |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012137339A1 true WO2012137339A1 (ja) | 2012-10-11 |
Family
ID=46968774
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/058832 WO2012137339A1 (ja) | 2011-04-07 | 2011-04-07 | 情報処理装置、並列計算機システムおよび演算処理装置の制御方法 |
Country Status (5)
Country | Link |
---|---|
US (1) | US9164907B2 (ja) |
EP (1) | EP2696289B1 (ja) |
JP (1) | JP5621918B2 (ja) |
CN (1) | CN103502959B (ja) |
WO (1) | WO2012137339A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2879058A1 (en) * | 2013-11-29 | 2015-06-03 | Fujitsu Limited | Parallel computer system, control method of parallel computer system, information processing device, arithmetic processing device, and communication control device |
JP2022550686A (ja) * | 2019-09-27 | 2022-12-05 | アドバンスト・マイクロ・ディバイシズ・インコーポレイテッド | 統合キャッシュを有するアクティブブリッジチップレット |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5664039B2 (ja) * | 2010-09-08 | 2015-02-04 | 富士通株式会社 | リダクション演算装置、処理装置及びコンピュータシステム |
DE102013219543A1 (de) * | 2013-09-27 | 2015-04-02 | Siemens Aktiengesellschaft | Kommunikationsgerät und Verfahren zur Kommunikation zwischen einem Kommunikationsgerät und einer zentralen Einrichtung |
US9697126B2 (en) * | 2014-11-25 | 2017-07-04 | Qualcomm Incorporated | Generating approximate usage measurements for shared cache memory systems |
US20170116154A1 (en) * | 2015-10-23 | 2017-04-27 | The Intellisis Corporation | Register communication in a network-on-a-chip architecture |
JP7139719B2 (ja) * | 2018-06-26 | 2022-09-21 | 富士通株式会社 | 情報処理装置、演算処理装置及び情報処理装置の制御方法 |
JP2023085819A (ja) * | 2021-12-09 | 2023-06-21 | 富士通株式会社 | パケット制御装置及びパケット制御方法 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06314239A (ja) * | 1993-04-28 | 1994-11-08 | Hitachi Ltd | プロセッサシステム |
JPH1139214A (ja) | 1997-07-24 | 1999-02-12 | Toshiba Corp | マルチプロセッサシステムの共有メモリ制御方式 |
JP2002185470A (ja) * | 2000-12-19 | 2002-06-28 | Nec Corp | Lan接続システム |
JP2002278834A (ja) * | 2001-03-21 | 2002-09-27 | Nec Corp | キャッシュメモリ装置およびそれを含むデータ処理装置 |
WO2007110898A1 (ja) | 2006-03-24 | 2007-10-04 | Fujitsu Limited | マルチプロセッサシステムおよびマルチプロセッサシステムの動作方法 |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4349871A (en) * | 1980-01-28 | 1982-09-14 | Digital Equipment Corporation | Duplicate tag store for cached multiprocessor system |
US4847804A (en) * | 1985-02-05 | 1989-07-11 | Digital Equipment Corporation | Apparatus and method for data copy consistency in a multi-cache data processing unit |
JP3200757B2 (ja) * | 1993-10-22 | 2001-08-20 | 株式会社日立製作所 | 並列計算機の記憶制御方法および並列計算機 |
US5745728A (en) * | 1995-12-13 | 1998-04-28 | International Business Machines Corporation | Process or renders repeat operation instructions non-cacheable |
JP3288261B2 (ja) * | 1997-06-19 | 2002-06-04 | 甲府日本電気株式会社 | キャッシュシステム |
JP2000010860A (ja) * | 1998-06-16 | 2000-01-14 | Hitachi Ltd | キャッシュメモリ制御回路及びプロセッサ及びプロセッサシステム及び並列プロセッサシステム |
JP2002197073A (ja) * | 2000-12-25 | 2002-07-12 | Hitachi Ltd | キャッシュ一致制御装置 |
US6757785B2 (en) * | 2001-11-27 | 2004-06-29 | International Business Machines Corporation | Method and system for improving cache performance in a multiprocessor computer |
US20040117590A1 (en) * | 2002-12-12 | 2004-06-17 | International Business Machines Corp. | Aliasing support for a data processing system having no system memory |
US20040117587A1 (en) * | 2002-12-12 | 2004-06-17 | International Business Machines Corp. | Hardware managed virtual-to-physical address translation mechanism |
CN101689141B (zh) * | 2007-06-20 | 2012-10-17 | 富士通株式会社 | 高速缓存装置、运算处理装置及其控制方法 |
US8266386B2 (en) * | 2007-10-30 | 2012-09-11 | International Business Machines Corporation | Structure for maintaining memory data integrity in a processor integrated circuit using cache coherency protocols |
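JP5482197B2 (ja) * | 2009-12-25 | 2014-04-23 | Fujitsu Limited | Arithmetic processing device, information processing device, and cache memory control method |
JP2011198091A (ja) * | 2010-03-19 | 2011-10-06 | Toshiba Corp | Virtual address cache memory, processor, and multiprocessor system |
JP6040840B2 (ja) * | 2013-03-29 | 2016-12-07 | Fujitsu Limited | Arithmetic processing device, information processing device, and control method for information processing device |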
-
2011
- 2011-04-07 EP EP11863039.1A patent/EP2696289B1/en active Active
- 2011-04-07 WO PCT/JP2011/058832 patent/WO2012137339A1/ja active Application Filing
- 2011-04-07 JP JP2013508690A patent/JP5621918B2/ja active Active
- 2011-04-07 CN CN201180070697.4A patent/CN103502959B/zh active Active
-
2013
- 2013-10-07 US US14/047,059 patent/US9164907B2/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06314239A (ja) * | 1993-04-28 | 1994-11-08 | Hitachi Ltd | Processor system |
JPH1139214A (ja) | 1997-07-24 | 1999-02-12 | Toshiba Corp | Shared memory control system for multiprocessor system |
JP2002185470A (ja) * | 2000-12-19 | 2002-06-28 | Nec Corp | LAN connection system |
JP2002278834A (ja) * | 2001-03-21 | 2002-09-27 | Nec Corp | Cache memory device and data processing device including the same |
WO2007110898A1 (ja) | 2006-03-24 | 2007-10-04 | Fujitsu Limited | Multiprocessor system and method of operating a multiprocessor system |
Non-Patent Citations (2)
Title |
---|
Ram Huggahalli; Ravi Iyer; Scott Tetrick: "Direct Cache Access for High Bandwidth Network I/O", ISCA '05: Proceedings of the 32nd Annual International Symposium on Computer Architecture |
See also references of EP2696289A4 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2879058A1 (en) * | 2013-11-29 | 2015-06-03 | Fujitsu Limited | Parallel computer system, control method of parallel computer system, information processing device, arithmetic processing device, and communication control device |
US9542313B2 (en) | 2013-11-29 | 2017-01-10 | Fujitsu Limited | Parallel computer system, control method of parallel computer system, information processing device, arithmetic processing device, and communication control device |
JP2022550686A (ja) * | 2019-09-27 | 2022-12-05 | Advanced Micro Devices, Inc. | Active bridge chiplet with integrated cache |
JP7478229B2 (ja) | 2019-09-27 | 2024-05-02 | Advanced Micro Devices, Inc. | Active bridge chiplet with integrated cache |
Also Published As
Publication number | Publication date |
---|---|
US9164907B2 (en) | 2015-10-20 |
JP5621918B2 (ja) | 2014-11-12 |
EP2696289A4 (en) | 2014-02-19 |
CN103502959B (zh) | 2016-01-27 |
JPWO2012137339A1 (ja) | 2014-07-28 |
US20140040558A1 (en) | 2014-02-06 |
EP2696289A1 (en) | 2014-02-12 |
CN103502959A (zh) | 2014-01-08 |
EP2696289B1 (en) | 2016-12-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
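JP5621918B2 (ja) | Information processing device, parallel computer system, and control method for arithmetic processing device | |
JP5939305B2 (ja) | Information processing device, parallel computer system, and control method for information processing device | |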
US7194517B2 (en) | System and method for low overhead message passing between domains in a partitioned server | |
CN109726163B (zh) | SPI-based communication system, method, device, and storage medium | |
US7941613B2 (en) | Shared memory architecture | |
US9977750B2 (en) | Coherent memory interleaving with uniform latency | |
US8055805B2 (en) | Opportunistic improvement of MMIO request handling based on target reporting of space requirements | |
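TW201339836A (zh) | Information processing apparatus, arithmetic device, and information transfer method | |
KR20010101193A (ko) | Non-uniform memory access data processing system that speculatively transmits read requests to a remote processing node | |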
US9753872B2 (en) | Information processing apparatus, input and output control device, and method of controlling information processing apparatus | |
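CN107360268B (zh) | Data packet processing method, apparatus, and device | |
WO2021114768A1 (zh) | Data processing apparatus, method, chip, processor, device, and storage medium | |
CN115964319A (zh) | Data processing method for remote direct memory access and related products | |
CN103412829A (zh) | Method and apparatus for expanding MCU program address space | |
JP2018045438A (ja) | Parallel processing device, transmission program, reception program, and data transfer method | |
CN118069387A (zh) | RDMA data send queue management method and apparatus based on hardware multithreading | |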
US8447934B2 (en) | Reducing cache probe traffic resulting from false data sharing | |
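WO2011148925A1 (ja) | Semiconductor device, network routing method, and system | |
CN114003525B (zh) | Data transmission method, module, apparatus, device, and storage medium | |
JP4658064B2 (ja) | Method and apparatus for efficient ordering preservation in an interconnection network | |
CN113157610B (zh) | Data storage method and apparatus, storage medium, and electronic device | |
CN115729649A (zh) | Data caching method, finite state machine, processor, and storage system | |
CN118113130A (zh) | Chip power consumption management method, apparatus, system, and computer-readable storage medium | |
JP5958192B2 (ja) | Arithmetic processing device, information processing device, and control method for arithmetic processing device | |
CN115114041A (zh) | Method and apparatus for processing data in a many-core system |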
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 11863039; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2013508690; Country of ref document: JP; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | REEP | Request for entry into the European phase | Ref document number: 2011863039; Country of ref document: EP |
 | WWE | WIPO information: entry into national phase | Ref document number: 2011863039; Country of ref document: EP |