EP2579160A1 - Information processing system and system controller - Google Patents
- Publication number
- EP2579160A1 (application EP10852152.7A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- unit
- cache
- request
- busy
- cpu
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
- G06F12/0817—Cache consistency protocols using directory methods
- G06F12/0828—Cache consistency protocols using directory methods with concurrent directory accessing, i.e. handling multiple concurrent coherency transactions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0844—Multiple simultaneous or quasi-simultaneous cache accessing
- G06F12/0855—Overlapped cache accessing, e.g. pipeline
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0844—Multiple simultaneous or quasi-simultaneous cache accessing
- G06F12/0846—Cache with multiple tag or data arrays being simultaneously accessible
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1668—Details of memory controller
- G06F13/1689—Synchronisation and timing concerns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1668—Details of memory controller
- G06F13/1694—Configuration of memory controller to different memory types
Definitions
- The embodiments discussed herein relate to an information processing system and a system controller.
- Parallel processing is an effective way to improve the processing speed of an information processing system.
- a plurality of processing units (CPU: Central Processing Unit)
- CPU: Central Processing Unit
- A cache memory is provided between the CPU and a main memory.
- The cache memory is a small-capacity, high-speed memory that holds, from among the data stored in the main memory, the data the CPU is expected to access, together with its address and status.
- The cache memory performs data input and output on behalf of the main memory that the CPU would otherwise access.
- Because the cache memory stores data automatically and stands in for the main memory, programs running on the CPU do not need to be aware of the cache memory.
- The cache memory is provided in the CPU chip.
- SMP: Symmetric Multi-Processing
- One CPU performs a snoop to search the contents registered in the cache memory of another CPU.
- A synchronization mechanism for the cache memories of the CPUs is provided.
- When processing consecutive requests to the same cache address, the synchronization mechanism of the cache memory retries a subsequent request that arrives before the update of the cache management information by the preceding request has completed. This control is referred to as busy control.
- The SMP system sets a uniform busy monitoring range for all CPU chips.
- Patent Document 1: Japanese Laid-Open Patent Publication No. 2008-123333.
- CPU chips with increased cache memory capacity are provided to improve CPU performance, and adding a new CPU chip is an effective way to improve the performance of an existing information processing system. For example, a CPU chip with a larger cache memory may be added to an existing system built from CPU chips with smaller cache memories, or a CPU chip with a smaller cache memory may be added to an existing system built from CPU chips with larger cache memories. There is thus a strong need in system operation for technology that reduces unnecessary cost by adding only the CPU chips required for processing.
- A system controller that is connected to the plurality of CPU chips has the cache synchronization mechanism.
- When a plurality of CPU chips whose cache memories differ in capacity are connected to one system controller, the system controller sets the same busy monitoring range for all of the CPU chips in order to keep the TAGs consistent among the CPUs.
- An information processing system includes: a first CPU unit having a first CPU and a first cache memory that stores cache tag information and cache data; a second CPU unit having a second CPU and a second cache memory that stores cache tag information and cache data and whose capacity differs from that of the first cache memory; and a system controller that is connected to the first CPU unit and the second CPU unit and that, in response to requests for the first cache memory and the second cache memory from the first CPU unit and the second CPU unit, searches a third cache memory that stores a copy of the cache tag information in the first cache memory and a fourth cache memory that stores a copy of the cache tag information in the second cache memory.
- The system controller includes: a cache synchronization unit that monitors, within a set busy monitoring range, whether a preceding request and a subsequent request target the same cache address and, on receiving such a subsequent request before the update of the copy of the cache tag information by the preceding request has completed, returns a retry of the subsequent request to the requesting CPU; and a setting unit that sets, in the cache synchronization unit, different busy monitoring ranges for the third cache tag memory and the fourth cache tag memory.
- A system controller that is connected to a first CPU unit having a first CPU and a first cache memory that stores cache tag information and cache data, and to a second CPU unit having a second CPU and a second cache memory that stores cache tag information and cache data and whose capacity differs from that of the first cache memory, includes a cache tag search unit that, in response to requests for the first cache memory and the second cache memory from the first CPU unit and the second CPU unit, searches a third cache memory that stores a copy of the cache tag information in the first cache memory and a fourth cache memory that stores a copy of the cache tag information in the second cache memory.
- The system controller further includes: a cache synchronization unit that monitors, within a set busy monitoring range, whether a preceding request and a subsequent request target the same cache address and, on receiving such a subsequent request before the update of the copy of the cache tag information by the preceding request has completed, returns a retry of the subsequent request to the requesting CPU; and a setting unit that sets, in the cache synchronization unit, different busy monitoring ranges for the third cache tag memory and the fourth cache tag memory.
- Because the system controller is connected to a plurality of CPU units whose cache memories differ in capacity and controls cache synchronization with the contention monitoring range between the preceding request and the subsequent request set according to the cache memory capacity of each CPU unit, the throughput of the CPU unit having the larger cache memory can be improved.
- FIG. 1 is a block diagram of an information processing system according to an embodiment.
- FIG. 2 is a block diagram of a system board in FIG. 1.
- The information processing system in FIG. 1 is illustrated as a server system connected to a computer network.
- The server system 1 includes a plurality of system boards (SB) 1A to 1P as processing devices, a management board (MMB) 2 as a system controller (SVP: Service Processor), and crossbar switches (or switches) 30A to 30D.
- SB: System Board
- MMB: management board
- SVP: Service Processor
- A first switch board 3A carries a first crossbar switch 30A and a second crossbar switch 30B, and the first crossbar switch 30A connects to the second crossbar switch 30B via buses L1 and L2. Further, a second switch board 3B carries a third crossbar switch 30C and a fourth crossbar switch 30D, and the third crossbar switch 30C connects to the fourth crossbar switch 30D via buses L3 and L4.
- The first crossbar switch 30A connects to the fourth crossbar switch 30D via buses L9 and L10.
- The second crossbar switch 30B connects to the third crossbar switch 30C via buses L11 and L12.
- The first, second, third and fourth system boards 1A, 1B, 1C and 1D connect to the first crossbar switch 30A via buses L20, L21, L22 and L23, respectively.
- The fifth, sixth, seventh and eighth system boards 1E, 1F, 1G and 1H connect to the second crossbar switch 30B via buses L24, L25, L26 and L27, respectively.
- The ninth, tenth, eleventh and twelfth system boards 1I, 1J, 1K and 1L connect to the third crossbar switch 30C via buses L28, L29, L30 and L31, respectively.
- The thirteenth, fourteenth, fifteenth and sixteenth system boards 1M, 1N, 1O and 1P connect to the fourth crossbar switch 30D via buses L32, L33, L34 and L35, respectively.
- The management board (hereinafter referred to as the MMB) 2 connects to each of the system boards 1A to 1P via internal buses L40 and L42.
- The MMB 2 monitors and sets the status of each of the system boards 1A to 1P and controls their start and stop.
- The server system is illustrated with an example configuration of sixteen system boards 1A to 1P and four crossbar switches 30A to 30D.
- The numbers of system boards and crossbar switches are not limited to sixteen and four.
- Each of the system boards 1A (1B to 1P) includes a plurality of CPU chips 10-0 to 10-3 (labeled "CPU" in FIG. 2), the system controller 12 (labeled "SC" in FIG. 2) and a main storage unit (labeled "memory" in FIG. 2).
- Each of the CPU chips 10-0 to 10-3 includes a CPU (not shown in FIG. 2), a cache memory (labeled "CM" in FIG. 2) 16-0 to 16-3 and a cache tag memory (labeled "TAG" in FIG. 2) 18-0 to 18-3.
- The cache tag memories (called CPU cache tag memories) 18-0 to 18-3 hold cache tag information indicating the status of the cache memory mounted on the same CPU chip.
- The system controller 12 connects to each of the CPU chips 10-0 to 10-3 via buses LA0 to LA3 and to the main storage unit 14 via a memory bus LM. The system controller 12 also connects to the crossbar switch 30A (30B to 30D). The system controller 12 controls the communication interface between the CPU chips 10-0 to 10-3, controls access to the main storage unit 14, and controls the communication interface with the other system boards via the crossbar switch.
- The system controller 12 is implemented as an LSI circuit.
- The system controller 12 includes cache tag memories (called SC cache tag memories) 20-0 to 20-3 that store a copy of the cache tag information held in the cache tag memories 18-0 to 18-3 of the CPU chips 10-0 to 10-3. Because the system controller 12 holds a copy of the tags of the CPU cache memories, high-speed cache access by the snoop operation can be realized in the SMP system.
- a cache tag memory (called an SC cache tag memory)
- The CPU cache tag memories 18-0 to 18-3 store a registration address (also referred to as a frame address or tag) and a cache status.
- The cache status is one of four states: "M" (Modify), "O" (Owner), "S" (Share) and "I" (Invalid).
- The cache tag memory uses a set-associative data structure.
- The set-associative scheme is a data storage structure that consists of a plurality of tags and can store data from different addresses in the same entry.
- A part of the physical address (PA) is used as the address of the cache tag memories 18-0 to 18-3.
- In this example the system controller 12 can connect to up to four CPU chips, but the system may be configured so that the system controller 12 connects to at least two CPU chips.
- Although the system board 1A (1B to 1P) is equipped with four CPU chips, it may be configured with at least one CPU chip mounted on the system board 1A.
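The set-associative tag structure and the four cache statuses described above can be illustrated with a minimal sketch. It is not the patent's circuit: the class names, the number of ways (2), and the replacement choice are assumptions made only for illustration, while the 256-byte block size and the index/frame split follow the bit ranges given later in the description.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    M = "Modify"
    O = "Owner"
    S = "Share"
    I = "Invalid"


@dataclass
class TagEntry:
    frame_address: int = 0
    status: Status = Status.I


class TagMemory:
    """Toy set-associative tag memory; `ways` and the victim choice are assumptions."""

    def __init__(self, index_bits: int = 11, ways: int = 2) -> None:
        self.index_bits = index_bits
        self.sets = [[TagEntry() for _ in range(ways)] for _ in range(1 << index_bits)]

    def split(self, pa: int) -> tuple[int, int]:
        # Bits [7:0] address a byte inside a 256-byte block and are ignored.
        index = (pa >> 8) & ((1 << self.index_bits) - 1)
        frame = pa >> (8 + self.index_bits)
        return frame, index

    def register(self, pa: int, status: Status) -> None:
        frame, index = self.split(pa)
        ways = self.sets[index]
        for entry in ways:
            if entry.status is not Status.I and entry.frame_address == frame:
                entry.status = status  # already registered: update the status only
                return
        # Otherwise take the first invalid way, or way 0 if the set is full.
        victim = next((e for e in ways if e.status is Status.I), ways[0])
        victim.frame_address, victim.status = frame, status

    def lookup(self, pa: int) -> Status:
        frame, index = self.split(pa)
        for entry in self.sets[index]:
            if entry.status is not Status.I and entry.frame_address == frame:
                return entry.status
        return Status.I
```

Two addresses that share an index but differ in frame address land in different ways of the same set, which is the "different addresses in the same entry" property mentioned above.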
- FIG. 3 is a block diagram illustrating an example of the configuration of the system board in FIG. 2.
- The system board 1A includes a plurality of CPU chips 10-0 to 10-3 and the system controller 12A.
- The plurality of CPU chips 10-0 to 10-3, which are connected to the system controller 12A via the buses LA0 to LA3, form a computer group (called a domain) that shares the same database.
- The system board 1B includes a plurality of CPU chips 10-4 to 10-7 and a system controller 12B.
- The plurality of CPU chips 10-4 to 10-7, which are connected to the system controller 12B via the buses LA4 to LA7, form another domain that is different from the domain including the CPU chips 10-0 to 10-3.
- The main storage unit 14 described in FIG. 2 is mounted on each of the system boards 1A and 1B.
- CPU chips 10-0 to 10-3 whose cache memories differ in capacity are mounted on the system board 1A.
- CPU chips 10-4 to 10-7 whose cache memories have the same capacity are mounted on the system board 1B.
- The CPU chip 10-0 on the system board 1A has a cache memory of a first capacity.
- The CPU chips 10-1 to 10-3 on the system board 1A have cache memories of a second capacity, which is twice the first capacity.
- The CPU chip 10-0 includes a CPU cache tag memory 18-0 of a first capacity (for example, 2K-LINE (2000 lines)), and the CPU chips 10-1 to 10-3 include CPU cache tag memories 18-1 to 18-3 of a second capacity (for example, 4K-LINE (4000 lines)), which is twice the first capacity.
- The system controller 12 includes the SC cache tag memories 20-0 to 20-3, one for each of the CPU chips 10-0 to 10-3, and a cache synchronization control unit 22.
- Each of the SC cache tag memories 20-0 to 20-3 stores a copy of the corresponding CPU cache tag memory 18-0 to 18-3.
- The memory capacity of each of the SC cache tag memories 20-0 to 20-3 is 2K + 2K (shown as "2K, 2K" in FIG. 3).
- The SC cache tag memory 20-0, corresponding to the CPU chip 10-0, uses half of its area, 2K (for example, 2K-LINE), as the copy area for the CPU cache tag memory 18-0.
- The SC cache tag memories 20-1 to 20-3, corresponding to the CPU chips 10-1 to 10-3, use their whole area, 2K + 2K (for example, 4K-LINE), as the copy area for the CPU cache tag memories 18-1 to 18-3.
- When processing consecutive requests for the same cache address, the cache synchronization control units 22A and 22B perform busy control, which retries a subsequent request that arrives before the preceding request has finished updating the cache management information (TAG).
- TAG: cache management information
- The cache synchronization control unit 22A monitors the busy range of the SC cache tag memory 20-0 using the setting value 22-0 of the first range (BUSY2K) and monitors the busy range of the SC cache tag memories 20-1 to 20-3 using the setting values 22-1 to 22-3 of the second range (BUSY4K).
- Because the system controller 12 sets the busy monitoring range individually for each CPU chip, the performance of the CPU chips can be improved in an information processing system that mixes CPU chips having cache memories of different capacities. This is particularly effective when the CPU 10-0 is separated from the CPUs 10-1 to 10-3 into a different domain.
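The busy control flow with a per-chip monitoring range can be sketched roughly as follows. This is a simplified software model, not the hardware described here: the `BusyControl` class, its method names, and the idea of keeping one lock set per CPU chip are assumptions for illustration. Only the retry-until-tag-update-completes behaviour and the per-chip range (11 index bits for BUSY2K, 12 for BUSY4K) come from the text.

```python
class BusyControl:
    """Toy model: one lock set per CPU chip, keyed by the index bits monitored for that chip."""

    def __init__(self, index_bits_per_chip: dict[int, int]) -> None:
        self.index_bits = index_bits_per_chip
        self.locked: dict[int, set[int]] = {chip: set() for chip in index_bits_per_chip}

    def _index(self, chip: int, pa: int) -> int:
        # The index starts at PA bit 8; its width is the chip's busy monitoring range.
        return (pa >> 8) & ((1 << self.index_bits[chip]) - 1)

    def request(self, chip: int, pa: int) -> str:
        idx = self._index(chip, pa)
        if idx in self.locked[chip]:
            return "retry"      # a preceding request has not finished its tag update
        self.locked[chip].add(idx)
        return "accepted"

    def tag_update_done(self, chip: int, pa: int) -> None:
        # Called when the preceding request has updated the tag copy.
        self.locked[chip].discard(self._index(chip, pa))


control = BusyControl({0: 11, 1: 12})  # chip 0: BUSY2K, chip 1: BUSY4K (illustrative)
print(control.request(0, 0x500))       # first request to this entry
print(control.request(0, 0x500))       # arrives before the tag update completes
control.tag_update_done(0, 0x500)
print(control.request(0, 0x500))       # accepted again after the update
```

A wider monitoring range distinguishes more entries, so fewer unrelated requests are treated as contending with a locked one.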
- FIG. 4 is an explanatory diagram of the address map of the main storage unit in FIG. 2.
- The storage area of the main storage unit 14 is allocated, for example, to the physical addresses A0 to A2047 and B0 to B2047 in units of 256 bytes.
- The physical address PA from the CPU is 39 bits, PA [41:3].
- The frame address (or tag) is defined by the upper 23 bits [41:19] of the physical address PA [41:8], and the entry address is defined by the lower 11 bits [18:8] of the physical address PA [41:8].
- The cache tag memories 20-0 to 20-3 store a 23-bit registration address and an 8-bit cache status (STS [7:0]).
- The registration address is the upper 23 bits [41:19] of the physical address PA.
- Because the cache tag memories 20-0 to 20-3 adopt the set-associative scheme described above, a part of the physical address (PA) is used as the address of the cache tag memories 20-0 to 20-3, which store the registration address and the cache status.
- PA: physical address
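As a small worked illustration of the bit fields above, the following sketch extracts the frame address PA [41:19] and the entry address PA [18:8] from a physical address. The function name and the dictionary return type are assumptions; the bit positions and widths are the ones stated in the text.

```python
def split_physical_address(pa: int) -> dict[str, int]:
    """Split PA [41:8] into the frame address [41:19] and the entry address [18:8]."""
    return {
        "frame_address": (pa >> 19) & ((1 << 23) - 1),  # upper 23 bits
        "entry_address": (pa >> 8) & ((1 << 11) - 1),   # lower 11 bits
    }


example = (0x5 << 19) | (0x7FF << 8) | 0xAB  # low 8 bits are the offset in a 256-byte block
print(split_physical_address(example))
```

The 23-bit and 11-bit widths add up to the 34 bits of PA [41:8]; the remaining low bits select a position inside the 256-byte block.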
- FIG. 5 is an explanatory diagram of the SC cache tag memory 20-0 for the CPU chip 10-0, whose cache memory has the base (1x) capacity, according to the embodiment.
- FIG. 5 depicts the relationship among the indexes, the status, the registration address and the physical address of the main memory.
- The SC cache tag memory 20-0 is set to 2K-LINE (that is, 2000 registered lines), so the 11-bit physical address PA [18:8] is used as the index.
- The busy monitoring range is set to the 11-bit physical address PA [18:8], which corresponds to 2K-LINE.
- The registration addresses of the physical addresses A0 and B0 in the main memory are the same.
- The registration addresses of the physical addresses A1 and B1 in the main memory are likewise the same. Therefore, in the cache memory 16-0, when the data at the physical address A0 is locked, the data at the physical address B0 is also locked.
- Here the index for busy control uses the same physical address bits PA [18:8] as when the cache capacity is the base capacity.
- Because the registration addresses PA [41:19] of the physical addresses A0 and B0 are then the same even though the cache capacity is doubled, locking the data at the physical address A0, for example, also locks the data at the physical address B0. The physical address B0, which did not need to be locked, thus becomes a busy monitoring target, so retries caused by unnecessary busy states occur and processing throughput decreases.
- FIG. 7 is a diagram of the relationship among the indexes, the status, the registration address and the physical address of the main memory in the SC cache tag memories 20-1 to 20-3, for which the cache memory capacity is doubled, according to the embodiment.
- The SC cache tag memories 20-1 to 20-3 are set to 4K-LINE (that is, 4000 registered lines), so the 12-bit physical address PA [19:8] is used as the index.
- The busy monitoring range is set to the 12-bit physical address PA [19:8], which corresponds to 4K-LINE.
- The physical address A0 and the physical address B0 are recognized as separate entries by the top bit, bit 19, of the index used for busy control. Therefore, when the physical address A0 is locked, only the physical address A0 is busy, and the physical address B0 is not determined to be busy. Compared with FIG. 6, the busy range setting illustrated in FIG. 7 therefore decreases the occurrence of busy states, reduces the retry frequency, and improves processing throughput.
- Firmware running on the MMB 2 depicted in FIG. 1 sets the busy range.
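The effect of the busy range on A0 and B0 can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that region A starts at byte address 0 and region B follows it directly, so that B0 sits 2048 blocks of 256 bytes above A0 and differs from A0 only in bit 19; the actual address map is the one shown in FIG. 4. The function names are hypothetical.

```python
BLOCK = 256  # bytes per block, as in the address map


def address(region: str, n: int) -> int:
    # Assumption: region A starts at 0 and region B follows it directly.
    base = 0 if region == "A" else 2048 * BLOCK
    return base + n * BLOCK


def busy_index(pa: int, index_bits: int) -> int:
    # 11 bits gives PA [18:8] (BUSY2K); 12 bits gives PA [19:8] (BUSY4K).
    return (pa >> 8) & ((1 << index_bits) - 1)


for bits in (11, 12):
    a0, b0 = address("A", 0), address("B", 0)
    same = busy_index(a0, bits) == busy_index(b0, bits)
    print(f"{bits}-bit range: A0 and B0 share a busy entry: {same}")
```

Under these assumptions the 11-bit range maps A0 and B0 to the same busy entry, while the 12-bit range separates them through bit 19.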
- FIG. 8 is a block diagram illustrating another example of the configuration of the system board in FIG. 2.
- The cache synchronization control unit 22A monitors the busy range of the SC cache tag memory 20-0 using the set value 24A of a first range (BUSY 2K) and monitors the busy range of the SC cache tag memories 20-1 to 20-3 using the set value 24B of a second range (BUSY 4K).
- Instead of providing the setting registers 22-0 to 22-3 for each of the cache tag memories 20-0 to 20-3, the cache synchronization control unit 22A is provided with two setting registers 24A and 24B, a selection circuit 28, and a selection instruction register 26 that holds a selection instruction from the MMB 2.
- The selection circuit 28 selects one of the setting registers 24A and 24B according to the selection instruction from the selection instruction register 26, and the selected register is used to monitor the busy range.
- FIG. 9 is a block diagram illustrating another example of the system configuration in FIG. 3.
- Each of the system boards 1A to 1H is equipped with a single CPU chip 10-0 to 10-7.
- The CPU chip 10-0 on the system board 1A is equipped with a cache memory of the first capacity, and the CPU chips 10-1 to 10-7 on the system boards 1B to 1H are equipped with cache memories of the second capacity, which is double the first capacity.
- The CPU chip 10-0 includes the CPU cache tag memory 18-0 having a first capacity (for example, 2K-LINE (2000 lines)), and the CPU chips 10-1 to 10-7 include the CPU cache tag memories 18-1 to 18-7 having a second capacity (for example, 4K-LINE (4000 lines)), which is double the first capacity.
- A pair of system controllers 12A and 12B is provided on another board (called a system board) that is separate from the system boards 1A to 1H.
- The main storage unit is provided on the system boards 12A and 12B.
- The system controllers 12A and 12B include SC cache tag memories 20-0 to 20-7, one for each of the CPU chips 10-0 to 10-7, and the cache synchronization control unit 22.
- Each of the SC cache tag memories 20-0 to 20-7 stores a copy of the corresponding CPU cache tag memory 18-0 to 18-7.
- The SC cache tag memories 20-0 to 20-7 all have the same memory capacity.
- The symbol "2K" in FIG. 9 indicates the memory capacity.
- The SC cache tag memory 20-0, corresponding to the CPU chip 10-0, uses half of its area, 2K (for example, 2K-LINE), as the copy area for the CPU cache tag memory 18-0.
- The SC cache tag memories 20-1 to 20-7, corresponding to the CPU chips 10-1 to 10-7, use their whole area, 2K + 2K (for example, 4K-LINE), as the copy area for the CPU cache tag memories 18-1 to 18-7.
- The cache synchronization control unit 22A monitors the busy range of the SC cache tag memory 20-0 using the set value 22-0 of a first range (BUSY 2K) and monitors the busy range of the SC cache tag memories 20-1 to 20-3 using the set values 22-1 to 22-3 of a second range (BUSY 4K).
- The cache synchronization control unit 22B monitors the busy range of the SC cache tag memories 20-4 to 20-7 using the set values of the second range (BUSY4K).
- FIG. 10 is a block diagram illustrating a configuration of a system controller according to the embodiment.
- FIG. 11 is an explanatory diagram of a busy setting register in a register unit in FIG. 10.
- The same elements as those described in FIG. 1 to FIG. 4 are indicated with the same symbols.
- The system controller 12 includes a command control unit 40, a pipeline unit 42 including a result decision unit 52, a CPU interface unit 44 for each of the CPU chips 10-0 to 10-3, a memory interface unit 46 for the main storage unit (memory) 14, a cache synchronization mechanism 22, and cache tag memory control units (labeled "cache tag memory cont" in FIG. 10) 21-0 to 21-3.
- The cache synchronization mechanism 22 includes a register unit 58, an address lock register unit 54 (labeled "address lock register" in FIG. 10) and a busy control unit 56.
- After storing a command transferred from the CPU of one of the CPU chips 10-0 to 10-3 in the command queue, the command control unit 40 analyzes the destination of the command and outputs the command to the crossbar switch 30A (see FIG. 1) or to the pipeline unit 42 according to the analyzed destination.
- The command takes the form of a request packet.
- The request packet includes a VAL bit (a valid signal indicating whether the request is effective), the 39-bit physical address PA [41:3] of the request and a 4-bit CPU number of the request source.
- The index is the 12-bit field [19:8] of the physical address.
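The request packet fields listed above can be modelled as a small record. The text gives the fields and their widths (VAL bit, 39-bit PA [41:3], 4-bit CPU number) but not their order inside the packet, so the 44-bit layout used by `pack` and `unpack` below is an assumption for illustration only, as are the class and attribute names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestPacket:
    valid: bool   # VAL bit
    pa: int       # physical address; only bits [41:3] are carried
    cpu: int      # 4-bit number of the requesting CPU

    def pack(self) -> int:
        # Assumed layout: [43] VAL, [42:4] PA[41:3], [3:0] CPU number.
        pa_field = (self.pa >> 3) & ((1 << 39) - 1)
        return (int(self.valid) << 43) | (pa_field << 4) | (self.cpu & 0xF)

    @staticmethod
    def unpack(word: int) -> "RequestPacket":
        return RequestPacket(
            valid=bool((word >> 43) & 1),
            pa=((word >> 4) & ((1 << 39) - 1)) << 3,
            cpu=word & 0xF,
        )

    @property
    def index(self) -> int:
        # 12-bit index PA [19:8]
        return (self.pa >> 8) & 0xFFF


packet = RequestPacket(valid=True, pa=(1 << 19) | (5 << 8) | 0x8, cpu=3)
assert RequestPacket.unpack(packet.pack()) == packet
print(hex(packet.index))
```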
- The pipeline unit 42 includes a plurality of series-connected FF (flip-flop) circuits 50-0 to 50-n and adjusts timing to wait for the search processing of the cache tag memory control units 21-0 to 21-3. That is, each of the FF circuits 50-0 to 50-n shifts the command to the subsequent FF circuit 50-1 to 50-n at every period TP.
- The command control unit 40 transfers the command to the pipeline unit 42.
- The FF circuit 50-0 in the first stage of the pipeline unit 42 receives the command from the command control unit 40.
- The command in the first-stage FF circuit 50-0 of the pipeline unit 42 is transferred via a signal line S1 to the FF circuit 48-1 in the cache tag memory control units 21-0 to 21-3 and to the FF circuit 48-2 in the address lock register unit 54.
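The timing role of the series-connected FF circuits can be pictured as a shift register that moves each command one stage per period TP. The sketch below is a behavioural model only; the `Pipeline` class, its `tick` method, and the three-stage depth are assumptions chosen for illustration.

```python
from typing import Optional


class Pipeline:
    """Toy shift register: each tick moves every command one stage forward."""

    def __init__(self, stages: int) -> None:
        self.stages: list[Optional[str]] = [None] * stages

    def tick(self, incoming: Optional[str] = None) -> Optional[str]:
        # One period TP: the last stage's content leaves, the rest shift by one,
        # and the incoming command (if any) is latched into the first stage.
        leaving = self.stages[-1]
        self.stages = [incoming] + self.stages[:-1]
        return leaving


pipe = Pipeline(stages=3)
outputs = [pipe.tick("cmd0"), pipe.tick("cmd1"), pipe.tick(), pipe.tick(), pipe.tick()]
print(outputs)
```

A command latched into the first stage reaches the output only after it has passed through every stage, which gives the tag search the time it needs to produce a result for that command.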
- The cache tag memory control units 21-0 ~ 21-3 include the SC cache tag memories 20-0 ~ 20-3, one for each of the CPU chips 10-0 ~ 10-3.
- Each of the cache tag memory control units 21-0 ~ 21-3 includes a TAG updating gate 210, a search unit 212 for the cache tag memory 20-0, an FF circuit 213, an FF circuit 214 which holds the upper bits [41:19] of the physical address PA in the command, an FF circuit 215 for timing adjustment, a comparison circuit 216 (labeled COMP in FIG. 10), an output FF circuit 217, and an FF circuit 218 for updating the TAG.
- The search unit 212 receives the command from the FF circuit 48-1 via the TAG updating gate 210.
- The search unit 212 extracts the index contained in the command and searches the cache tag memory 20-0 ~ 20-3 by the index.
- The search result of the search unit 212 is held in the FF circuit 213.
- The FF circuit 214 holds the upper bits [41:19] of the physical address PA from the FF circuit 48-1 and outputs them to the comparison circuit 216 via the timing adjustment FF circuit 215.
- The comparison circuit 216 compares the search result in the FF circuit 213 with the upper bits PA [41:19] in the FF circuit 215 and outputs a cache hit or miss determination to the output FF circuit 217.
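The search-then-compare flow above can be sketched as follows. This is a simplified model under stated assumptions: the tag memory is represented as a dictionary with a single entry per index, and the function names are hypothetical. An actual tag memory may hold several ways and status bits per index, which this sketch omits.

```python
from typing import Dict


def upper_bits(pa: int) -> int:
    """Return PA[41:19], the bits held by FF circuit 214."""
    return (pa >> 19) & ((1 << 23) - 1)


def index_bits(pa: int) -> int:
    """Return the index PA[19:8] used to search the tag memory."""
    return (pa >> 8) & 0xFFF


def is_cache_hit(tag_memory: Dict[int, int], pa: int) -> bool:
    """Search by index, then compare the stored upper bits with PA[41:19]."""
    stored = tag_memory.get(index_bits(pa))                  # search unit 212 -> FF 213
    return stored is not None and stored == upper_bits(pa)   # comparison circuit 216
```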
- The address lock register unit 54 includes an address lock unit and an address contention check unit 7 (labeled ADDRESS in FIG. 10).
- The address lock register unit 54 includes a lock register set gate 512, an FF circuit 514, a lock register 516 which holds the physical address of the command, an output FF circuit 518, and an FF circuit 500 which holds a lock register set/reset signal.
- The address contention check unit 7, detailed in FIG. 13, extracts the index [19:8] and the full address [41:3] from the physical address in the command and performs the address contention check within the busy range notified from the register unit 58 through the signal line S5. When the address contention check unit 7 determines an index busy or a full address busy, it outputs a retry request to the command control unit 40 via a signal line S6.
- The result decision unit 52 in the pipeline unit 42 receives the cache search result from each of the cache tag memory control units 21-0 ~ 21-3 and the status of the other system boards through a signal line S2 and the FF circuits 51-0 ~ 51-m for timing adjustment. From these inputs, the result decision unit 52 decides the transfer destination of the command in the pipeline unit 42 (in the last-stage FF circuit 50-n) and transfers the command to the determined destination via the signal line S3.
- When the data of the command is present in one of the cache memories 16-0 to 16-1 of the CPU chips 10-0 ~ 10-3, the result decision unit 52 transfers the command through the signal line S3 and the CPU interface unit 44 to the CPU chip in which the data is present.
- When the data of the command does not exist in any cache memory 16-0 to 16-1 of the CPU chips 10-0 ~ 10-3, the result decision unit 52 transfers the command to the memory 14 through the signal line S3 and the memory interface unit 46.
- When the result decision unit 52 has determined the destination of the command, it outputs a TAG updating signal to the TAG updating FF circuit 218 in each of the cache tag memory control units 21-0 ~ 21-3 via the signal line S3 and outputs the lock register reset signal to the FF circuit 500 in the address lock register unit 54 through the signal line S3.
- Each of the cache tag memory control units 21-0 ~ 21-3 allows the command in the FF circuit 48-1 to be input to the search unit 212 through the TAG updating gate 210.
- The address lock register unit 54 allows the command in the FF circuit 48-2 to be input to the lock register 516 through the lock register set gate 512.
- The register unit 58 includes a busy setting changing unit 23 and busy setting registers 22-0 ~ 22-3.
- The busy setting changing unit 23 writes the setting value of the busy range transferred from the MMB 2 to the busy setting registers 22-0 ~ 22-3.
- FIG. 11 illustrates the setting value of the busy range (4K_LINE_MODE) and the busy status for each of the CPUs 0 ~ 3 (10-0 ~ 10-3) in the busy setting registers 22-0 ~ 22-3.
- The busy mode is "2K_LINE BUSY" when 4K_LINE_MODE in the setting value of the busy range is "0".
- The busy mode is "4K_LINE BUSY" when 4K_LINE_MODE in the setting value of the busy range is "1".
- The busy control unit 56 includes an address contention check unit 6 (labeled ADDRESS in FIG. 10). As described below, the address contention check unit 6 receives, via the signal line S4, the commands held by the FF circuits 50-4 ~ 50-n-1 in the pipeline unit 42 and performs a contention check of the index PA [19:8] within the busy range received from the register unit 58 via a signal line S5. When the busy control unit 56 determines an index busy, it outputs a command retry request to the command control unit 40 via the signal line S8 and suppresses input of the commands stored in the queue of the command control unit 40 to the pipeline unit 42.
- The address lock status of the address lock register unit 54 is notified to the busy control unit 56 via a signal line S7.
- The busy control unit 56 notifies the register unit 58 via a signal line S9 that setting of the busy range is enabled.
- FIG. 12 is a block diagram of the address contention check unit 6 in the busy control unit 56.
- The address contention check unit 6 checks for index address contention in the pipeline unit 42.
- The address contention check unit 6 treats every address that matches the index as a check target and checks addresses that may be updated during the period before the update destination address in the cache tag memory 20-0 ~ 20-3 is determined. This prevents a subsequent request from accessing an address in the cache tag memory 20-0 ~ 20-3 that may be updated later.
- The address contention check unit 6 includes two comparison circuits 60 and 62, two AND circuits 63 and 64, one OR circuit 65, and a selection circuit 66.
- The first comparison circuit 60 compares TP04_index PA [19:8] (that is, the second busy range), held by the FF circuit 50-4 (timing TP04), with TPxx_index PA [19:8], held by the FF circuits 50-5 ~ 50-n-1 connected at stages later than the FF circuit 50-4.
- The second comparison circuit 62 compares TP04_index PA [18:8] (that is, the first busy range), held by the FF circuit 50-4 (timing TP04), with TPxx_index PA [18:8], held by the FF circuits 50-5 ~ 50-n-1 connected at stages later than the FF circuit 50-4.
- The check timing is TP04, and the check targets are the requests present in the stages subsequent to TP04 (the FF circuit 50-4).
- The first AND circuit 63 calculates the AND (logical product) of the 4-bit 4K_LINE_MODE signal (the setting value of the busy range) depicted in FIG. 11 and the comparison result (match / mismatch) of the comparison circuit 60.
- The second AND circuit 64 calculates the AND (logical product) of the inverted 4-bit 4K_LINE_MODE signal (the setting value of the busy range) depicted in FIG. 11 and the comparison result (match / mismatch) of the comparison circuit 62.
- The OR circuit 65 calculates the logical sum of the outputs of the AND circuits 63 and 64.
- The selection circuit 66 selects one bit of the 4-bit result of the OR circuit 65 according to the CPU number TP04_CPU [3:0] from the FF circuit 50-4.
- That is, the first comparison circuit 60 checks for contention in the second busy range, and the second comparison circuit 62 checks for contention in the first busy range.
- The first AND circuit 63 extracts the contention result of the second busy range, and the second AND circuit 64 extracts the contention result of the first busy range.
- Through the OR circuit 65, the selection circuit 66 selects, from the contention results of the first and second busy ranges, the result for the CPU corresponding to the request.
- The selection circuit 66 outputs an index busy signal to the command control unit 40 through the signal line S8 in FIG. 10.
- An index busy signal of "1" indicates index busy.
- An index busy signal of "0" indicates not index busy.
- In this way, the address contention check unit 6 checks for address contention with the preceding requests in the pipeline unit 42 and controls the command control unit 40.
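The gate-level description of the address contention check unit 6 can be restated as a small Python sketch. The function name `index_busy` is hypothetical, and two points are assumptions rather than statements from the description: bit i of the 4-bit 4K_LINE_MODE value is taken to belong to CPU i, and the CPU number is treated as a plain index 0 ~ 3.

```python
from typing import Iterable


def index_busy(tp04_pa: int, tp04_cpu: int,
               later_stage_pas: Iterable[int], mode_4k: int) -> bool:
    """Model of comparison circuits 60/62, AND circuits 63/64, OR 65, selector 66.

    mode_4k is the 4-bit 4K_LINE_MODE value; bit i belongs to CPU i (assumed).
    """
    pas = list(later_stage_pas)
    # Comparison circuit 60: 12-bit index PA[19:8] (second busy range).
    match_12 = any((((pa ^ tp04_pa) >> 8) & 0xFFF) == 0 for pa in pas)
    # Comparison circuit 62: 11-bit index PA[18:8] (first busy range).
    match_11 = any((((pa ^ tp04_pa) >> 8) & 0x7FF) == 0 for pa in pas)

    per_cpu = []
    for cpu in range(4):
        mode = (mode_4k >> cpu) & 1
        and_63 = bool(mode) and match_12     # 4K_LINE_MODE AND circuit-60 result
        and_64 = (not mode) and match_11     # inverted mode AND circuit-62 result
        per_cpu.append(and_63 or and_64)     # OR circuit 65
    return per_cpu[tp04_cpu]                 # selection circuit 66
```

Two addresses that differ only in bit 19 illustrate the effect of the mode bit: they collide for a CPU in the 2K_LINE busy range, which ignores bit 19, but not for a CPU in the 4K_LINE busy range.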
- FIG. 13 is a block diagram of the address contention check unit 7 in the address lock register unit 54. As depicted in FIG. 13, the address contention check unit 7 includes an index contention check unit 7A and a full address contention check unit 7B.
- The index contention check unit 7A treats the request set in the address lock register 516 as the check target at check timing TP01 and performs the contention check for the address of the request source during the period from the determination of the update destination address in the cache management information (TAG) until the completion of processing by the CPU.
- When data requested from another CPU is transferred to the requesting CPU and stored in its own cache memory, the storing destination address is specified by the index. The check therefore prevents a subsequent request from accessing the address in which the requested data is stored.
- The index contention check unit 7A includes two comparison circuits 70 and 72, two AND circuits 73 and 74, one OR circuit 75, and a selection circuit 76.
- The first comparison circuit 70 compares TP01_index PA [19:8] (the second busy range), held by the FF circuit 50-1 (timing TP01), with REG_ADRS [19:8], held by the address lock register 516.
- The second comparison circuit 72 compares TP01_index PA [18:8] (the first busy range), held by the FF circuit 50-1 (timing TP01), with REG_ADRS [18:8], held by the address lock register 516.
- The check timing is TP01.
- The check target is a request after timing TP01 (the FF circuit 50-1).
- The first AND circuit 73 calculates the AND (logical product) of the 4-bit 4K_LINE_MODE signal (the setting value of the busy range) depicted in FIG. 11 and the comparison result (match / mismatch) of the comparison circuit 70.
- The second AND circuit 74 calculates the AND (logical product) of the inverted 4-bit 4K_LINE_MODE signal (the setting value of the busy range) depicted in FIG. 11 and the comparison result (match / mismatch) of the comparison circuit 72.
- The OR circuit 75 calculates the logical sum of the outputs of the AND circuits 73 and 74.
- The selection circuit 76 selects one bit of the 4-bit result of the OR circuit 75 according to the CPU number TP01_CPU [3:0] from the FF circuit 50-1.
- That is, the first comparison circuit 70 checks for contention in the second busy range, and the second comparison circuit 72 checks for contention in the first busy range. The first AND circuit 73 extracts the contention result of the second busy range, and the second AND circuit 74 extracts the contention result of the first busy range.
- Through the OR circuit 75, the selection circuit 76 selects, from the contention results of the first and second busy ranges, the result for the CPU corresponding to the request.
- The selection circuit 76 outputs an index busy signal to the command control unit 40 through the signal line S6 in FIG. 10.
- An index busy signal of "1" indicates index busy.
- An index busy signal of "0" indicates not index busy.
- The full address contention check unit 7B checks for address contention between the subsequent request and the address in the address lock register 516.
- The full address contention check unit 7B checks addresses that may be updated during the period from the determination of the update destination address in the cache management information (TAG) of the cache tag memory 20-0 ~ 20-3 until the completion of processing by the CPU. Because the update destination address has been specified and is stored in the address lock register, a subsequent request is prevented from accessing an address that is being processed in the cache tag memory 20-0 ~ 20-3.
- The full address contention check unit 7B includes a comparison circuit 78 that compares the full address TP01_PA [41:0] held by the FF circuit 50-1 (timing TP01) with the full address REG_ADRS [41:0] held by the address lock register 516.
- The comparison circuit 78 outputs a full address busy signal to the command control unit 40 through the signal line S6 in FIG. 10.
- The address lock register unit 54 requests the command control unit 40 to re-enter the request.
- In this way, the contention check is performed between the request address set in the address lock register 516 and the subsequent requests.
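The two checks of the address contention check unit 7 can be summarized in a short Python sketch. The function name `lock_register_check` is hypothetical. Representing a reset lock register as `None` is an assumption, as is comparing the full address on bits [41:3], which are the bits the request packet is described as carrying.

```python
from typing import Optional, Tuple


def lock_register_check(tp01_pa: int, lock_pa: Optional[int],
                        mode_4k_bit: int) -> Tuple[bool, bool]:
    """Return (index_busy, full_address_busy) for the request at stage TP01.

    lock_pa is the address held in the lock register, or None when the
    register is reset (an assumed representation of the reset state).
    """
    if lock_pa is None:
        return (False, False)
    diff = tp01_pa ^ lock_pa
    index_mask = 0xFFF if mode_4k_bit else 0x7FF    # PA[19:8] or PA[18:8]
    index_busy = ((diff >> 8) & index_mask) == 0    # index contention check unit 7A
    full_busy = (diff >> 3) == 0                    # unit 7B, compared here on PA[41:3]
    return (index_busy, full_busy)
```

Either flag being true corresponds to the case in which the subsequent request is sent back to the command control unit 40 for retry.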
- FIG. 14 is a diagram for explaining the operation of the address contention check in the busy control unit 56 described in FIG. 10 and FIG. 12.
- FIG. 15 is a diagram for explaining the operation of the address contention check in the address lock register unit 54 described in FIG. 10 and FIG. 13.
- In these figures, the horizontal axis indicates time and the vertical axis indicates the operations of the registers to be checked and of the FF circuits (indicated by "TP01" ~ "TPnn" in the figures).
- The MMB 2 sets the busy mode of each CPU in the busy setting registers 22-0 ~ 22-3 in the register unit 58 (see FIG. 11).
- The 4K_LINE_MODE is set to "0" in 2K_LINE BUSY mode.
- The 4K_LINE_MODE is set to "1" in 4K_LINE BUSY mode.
- The command control unit 40 enters the request received from the CPUs 10-0 ~ 10-3 into the pipeline unit 42.
- The entered requests are also input to the cache tag memory control units 21-0 ~ 21-3 and the address lock register unit 54.
- The requests in the pipeline unit 42 reach the result decision unit 52.
- The request includes a VAL bit (a valid signal indicating that the request is valid).
- The index is defined as the 12-bit field [19:8], which is one part of the physical address.
- The address contention check unit 6 in the busy control unit 56 performs the address contention check between a request and the requests that precede it in the pipeline unit 42.
- The busy check is performed with the busy range changed according to the cache tag memory capacity of each CPU.
- When the address contention check in the busy control unit 56 determines that there is an address contention with a preceding request in the pipeline unit 42, the busy control unit 56 requests the command control unit 40 to re-enter the request. When it is determined that there is no address contention for the request and the result decision unit 52 determines that an update of the TAG (cache management information) is necessary, the busy control unit 56 updates the TAG and sets the full address of the request being processed in the lock register 516.
- FIG. 14 illustrates an example of address contention in which the subsequent request (dotted line) reaches TP04 (FF 50-4) while the preceding request (solid line) is present at TP05 (FF 50-5) or later.
- In the example, the indexes of the preceding request and the subsequent request match.
- The pipeline index address contention check in the contention check unit 6 determines index busy (depicted by "CHK" in the dotted circle in FIG. 14).
- After the index busy determination, the subsequent request (dotted line) is retried by the command control unit 40.
- The subsequent request determined to be index busy is erased in the pipeline unit 42. Alternatively, a flag indicating busy is added to the subsequent request, the request is transferred to the result decision unit 52, and the result decision unit 52 erases it.
- FIG. 15 is a timing chart of the full address / index contention check in the address lock register unit 54.
- A solid line indicates the preceding request.
- A dotted line indicates the subsequent request.
- A thick line indicates the CPU processing completion request.
- For the preceding request (solid line), the contention result (here, no contention) is determined at stage TPnn, and the update address of the cache tag information TAG is determined. Then, at time TPnn+2, the TAG in the cache tag memory is updated and the updated address is set in the address lock register unit 54 by the preceding request. The TAG update and the setting of the updated address in the address lock register unit 54 are performed by extracting the required information from the preceding request packet.
- The contention check unit 7 in the address lock register unit 54 described above performs the full address contention check.
- The contention check unit 7 determines full address busy. The subsequent request is then retried by the command control unit 40.
- The contention check unit 7 performs the index contention check between the request in the address lock register 516 and the subsequent request at stage TP01.
- The contention check unit 7 uses the index or the full address for the contention check depending on the command of the request. In other words, either one of the index and the full address, or both, may be used for the contention check.
- The result decision unit 52 determines that the request resets the address lock register, and the address lock register 516 is reset according to this determination at stage TPnn+2.
- FIG. 16 is a flow diagram of a cache synchronization process according to the embodiment.
- (S10) A firmware program installed in the MMB 2, upon receiving the start instruction of the information processing system, turns the power of the system off and then on.
- The firmware program in the MMB 2 initializes the system board 1A (1B ~ 1P) after power-on.
- The firmware program in the MMB 2 obtains specification information for the CPUs 10-0 ~ 10-3 mounted on the system board 1A (1B ~ 1P) and, based on the specification information, sets the busy range values in the busy range setting registers 22-0 ~ 22-3 in the system controller 12.
- The firmware program in the MMB 2 permits operation of the system board 1A (1B ~ 1P) (depicted as mode 1 in FIG. 16).
- The firmware program of the MMB 2 performs an initialization process after a reboot of the system board 1A (1B ~ 1P).
- The firmware program in the MMB 2 obtains specification information for the CPUs 10-0 ~ 10-3 mounted on the system board 1A (1B ~ 1P) and, based on the specification information, sets the busy range values in the busy range setting registers 22-0 ~ 22-3 in the system controller 12.
- The firmware program in the MMB 2 permits operation of the system board 1A (1B ~ 1P) (depicted as mode 2 in FIG. 16).
- In this way, the firmware program of the MMB 2 sets and changes the busy range values in the busy range setting registers during the initialization process of the system board, both after power-on and after a reboot.
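How firmware might derive the register values from the CPU specification information can be sketched as below. This is an assumption-laden illustration: the description does not state the format of the specification information, so the sketch supposes it yields the number of tag index lines per CPU chip (2048 for an 11-bit index, 4096 for a 12-bit index), and `busy_range_values` is a hypothetical name.

```python
from typing import List


def busy_range_values(tag_lines_per_cpu: List[int]) -> List[int]:
    """Return one 4K_LINE_MODE bit per CPU chip.

    Assumption: 2048 index lines select the 2K_LINE busy range (0) and
    4096 index lines select the 4K_LINE busy range (1).
    """
    values = []
    for lines in tag_lines_per_cpu:
        if lines == 2048:
            values.append(0)
        elif lines == 4096:
            values.append(1)
        else:
            raise ValueError(f"unsupported number of index lines: {lines}")
    return values
```

Rejecting any other line count keeps the sketch from silently choosing a busy range the two-mode register cannot express.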
- The busy control unit 56 confirms that the following conditions (1) and (2) are established and then sends a changeable notification for the busy range setting to the register unit 58 via the signal line S9.
- After receiving the changeable notification for the busy range setting from the busy control unit 56, the busy setting changing unit 23 in the register unit 58 applies the busy range setting change requested by the MMB 2 to the busy setting registers 22-0 ~ 22-3. In other words, the busy setting changing unit 23 changes the busy monitoring range setting in a state in which no command is being processed in the address lock register unit 54.
- FIG. 17 is a flow diagram of setting change processing of the busy range according to another embodiment.
- The firmware program in the MMB 2 initializes the system board 1A (1B ~ 1P) after power-on.
- The firmware program in the MMB 2 obtains specification information for the CPUs 10-0 ~ 10-3 mounted on the system board 1A (1B ~ 1P) and, based on the specification information, sets the busy range values in the busy range setting registers 22-0 ~ 22-3 in the system controller 12.
- The firmware program in the MMB 2 permits operation of the system board 1A (1B ~ 1P).
- The MMB 2 receives a request to change the busy range setting during operation.
- The busy control unit 56 confirms that all commands being processed have completed and that there is no command in the stages TP00 ~ TP03 of the pipeline unit 42, and then sends the changeable notification for the busy range setting to the register unit 58.
- The busy setting changing unit 23 in the register unit 58 applies the busy range setting change requested by the MMB 2 to the busy setting registers 22-0 ~ 22-3.
- The busy setting changing unit 23 notifies the busy control unit 56 and the address lock register unit 54 of the setting value of the busy range, and the newly set value is used in the busy decision logic.
- After the busy range setting has been changed, the busy control unit 56 instructs the command control unit 40 to resume entering commands into the pipeline unit 42, and the suspend is released. Then, upon completion of the dynamic reconfiguration processing of the firmware program in the MMB 2, the suspend state is released and the dynamic configuration change process is terminated.
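The guard on the dynamic setting change can be sketched as follows. The function `try_change_busy_range` and its parameters are hypothetical names. The sketch models only the two conditions mentioned in the text, namely that the early pipeline stages hold no command and that the address lock register is not in use, and leaves out the suspend and resume signalling.

```python
from typing import List, Optional


def try_change_busy_range(registers: List[int], requested: List[int],
                          early_stages: List[Optional[object]],
                          lock_register_in_use: bool) -> bool:
    """Apply the requested busy range values only when it is safe.

    early_stages models stages TP00 ~ TP03 of the pipeline unit 42
    (None means the stage holds no command).
    Returns True if the registers were updated.
    """
    pipeline_empty = all(stage is None for stage in early_stages)
    if not pipeline_empty or lock_register_in_use:
        return False            # not changeable yet: keep the current values
    registers[:] = requested    # busy setting changing unit 23 writes 22-0 ~ 22-3
    return True
```

A caller would retry the change until the function returns True, which corresponds to waiting for the changeable notification.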
- Although a combination of cache memories of one-times and double capacity has been described as an example, the embodiment can also be applied to a mixed configuration of CPUs having cache memories of one-times capacity and n-times (n > 2) capacity. In that case, because the occurrence rate of busy falls to 1/n, throughput can be improved further.
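The 1/n figure can be checked with a short calculation. The sketch below assumes that request indices are independent and uniformly distributed, and that n is a power of two so that an n-times larger cache adds log2(n) index bits; neither assumption is stated in the description, and the function names are hypothetical.

```python
from fractions import Fraction


def index_match_probability(index_bits: int) -> Fraction:
    """Chance that two independent, uniformly distributed indices are equal."""
    return Fraction(1, 2 ** index_bits)


def busy_rate_ratio(base_index_bits: int, capacity_multiple: int) -> Fraction:
    """Ratio of busy rates: n-times capacity versus one-times capacity.

    Assumes n is a power of two so the larger cache adds log2(n) index bits.
    """
    extra_bits = capacity_multiple.bit_length() - 1
    if 2 ** extra_bits != capacity_multiple:
        raise ValueError("capacity_multiple must be a power of two")
    return (index_match_probability(base_index_bits + extra_bits)
            / index_match_probability(base_index_bits))
```

With an 11-bit base index, doubling the capacity gives a ratio of 1/2 and quadrupling gives 1/4, in line with the 1/n statement.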
- In the example, the system controller has been described as mounted on a system board which mounts a plurality of CPU chips, but the system controller may instead be mounted on a controller board which is connected to a plurality of system boards.
- When a system controller connects to a plurality of CPU units having cache memories of mutually different capacities and controls cache synchronization, the monitoring range of contention between a preceding request and a subsequent request is set for each capacity of the cache memory, so it is possible to improve the throughput of the CPU unit which has a large cache capacity.
Abstract
In a system including a plurality of CPU units (10-0 ~ 10-3) having cache memories (16-0 ~ 16-3) of mutually different capacities and a system controller (12) that connects to the plurality of CPU units and controls cache synchronization, the system controller (12) includes a cache synchronization unit (54) which monitors address contention between a preceding request and a subsequent request, and a setting unit (58) which sets a different monitoring range of the contention between the preceding request and the subsequent request for each capacity of the cache memory in each of the CPU units.
Description
- The embodiments discussed herein are related to an information processing system and a system controller.
- Parallel processing is effective for improving the processing speed of an information processing system. In a parallel processing system, a plurality of processing units (CPU: Central Processing Unit) share the processing. In addition, to improve the processing speed of the CPU, a cache memory is provided between the CPU and a main memory. The cache memory is a high-speed, small-capacity memory that holds, from among the data stored in the main memory, the data the CPU is likely to access, together with its address and status. The cache memory performs data input and output on behalf of the main memory, which the CPU would otherwise access.
- Because the cache memory stores data automatically and operates in place of the main memory, a program running on the CPU does not need to be aware of the cache memory. In recent years, with higher LSI (Large Scale Integration) density and growing demand for faster devices, the cache memory has come to be provided in the CPU chip.
- In an SMP (Symmetric Multi-Processing) system, which is one kind of parallel processing system, one CPU performs a snoop to search the contents registered in the cache memory of another CPU. To prevent interference in the cache memory caused by snoops between the CPUs, a synchronization mechanism for the cache memories of the CPUs is provided.
- When processing consecutive requests to the same cache address, the synchronization mechanism of the cache memory retries a subsequent request which arrives before the update of the cache management information for a preceding request is completed. This control is referred to as busy control. The SMP system sets a uniform busy monitoring range for all CPU chips.
- [Patent Document 1] Japanese Laid-Open Patent Publication No. 2008-123333
- CPU chips with increased cache memory capacity are provided to improve CPU performance. It is also effective to add a new CPU chip to improve the performance of an existing information processing system. For example, a CPU chip having a larger cache memory capacity is added to an existing system composed of CPU chips having a smaller cache memory capacity, or a CPU chip having a smaller cache memory capacity is added to an existing system composed of CPU chips having a larger cache memory capacity. In system operation, there is thus a strong need for technology that reduces unnecessary cost by adding only the CPU chips required for processing.
- When a CPU chip is added later to an existing information processing system, CPU chips with different cache memory capacities may be mixed. In an SMP environment, a system controller connected to the plurality of CPU chips has the cache synchronization mechanism. When a plurality of CPU chips having cache memories of different capacities are connected to one system controller, the system controller sets the same busy monitoring range for all of the CPU chips in order to maintain the consistency of the TAG among the CPUs.
- However, when CPU chips with different cache sizes are mixed in the system, it is difficult to draw out the full performance of the CPU having the larger cache memory, because the busy monitoring range of all CPUs is set to the value suited to the CPU chip having the smallest cache memory.
- Accordingly, it is an object in one aspect of the invention to provide an information processing system and a system controller that improve the performance of the CPUs in an information processing system in which CPU chips including cache memories of mutually different capacities are mixed.
- According to an aspect of the embodiments, an information processing system includes a first CPU unit having a first CPU and a first cache memory that stores cache tag information and cache data, a second CPU unit having a second CPU and a second cache memory that stores cache tag information and cache data and has a capacity different from that of the first cache memory, and a system controller that is connected to the first CPU unit and the second CPU unit and that, in response to requests for the first cache memory and the second cache memory from the first CPU unit and the second CPU unit, searches a third cache memory that stores a copy of the cache tag information in the first cache memory and a fourth cache memory that stores a copy of the cache tag information in the second cache memory. The system controller includes a cache synchronization unit that monitors, within a set busy monitoring range, whether a preceding request and a subsequent request require the same cache address and that, when a subsequent request requiring the same cache address as the preceding request is received before the update of the copy of the cache tag information by the preceding request is completed, issues a retry of the subsequent request to the CPU that issued it, and a setting unit that sets, in the cache synchronization unit, different busy monitoring ranges for the third cache tag memory and the fourth cache tag memory.
- Further, a system controller that is connected to a first CPU unit having a first CPU and a first cache memory that stores cache tag information and cache data and to a second CPU unit having a second CPU and a second cache memory that stores cache tag information and cache data and has a capacity different from that of the first cache memory includes: a cache tag search unit that, in response to requests for the first cache memory and the second cache memory from the first CPU unit and the second CPU unit, searches a third cache memory that stores a copy of the cache tag information in the first cache memory and a fourth cache memory that stores a copy of the cache tag information in the second cache memory; a cache synchronization unit that monitors, within a set busy monitoring range, whether a preceding request and a subsequent request require the same cache address and that, when a subsequent request requiring the same cache address as the preceding request is received before the update of the copy of the cache tag information by the preceding request is completed, issues a retry of the subsequent request to the CPU that issued it; and a setting unit that sets, in the cache synchronization unit, different busy monitoring ranges for the third cache tag memory and the fourth cache tag memory.
- When the system controller connects to a plurality of CPU units having cache memories of mutually different capacities and controls cache synchronization, the monitoring range of contention between the preceding request and the subsequent request is set for each capacity of the cache memory of each CPU unit, so it is possible to improve the throughput of the CPU unit having the cache memory of larger capacity.
-
FIG. 1 is a block diagram of an information processing system according to an embodiment;
- FIG. 2 is a block diagram of a system board in FIG. 1;
- FIG. 3 is a block diagram illustrating an example of the configuration of the system board in FIG. 2;
- FIG. 4 is an explanatory diagram of the memory addresses of the main memory in FIG. 2;
- FIG. 5 is an explanatory diagram of the busy monitoring range of the cache memory having a first capacity according to the embodiment in FIG. 3;
- FIG. 6 is an explanatory diagram of a comparative example in which the busy monitoring range in FIG. 5 is applied to a cache memory having a second capacity;
- FIG. 7 is an explanatory diagram of the busy monitoring operation for the cache memory having the second capacity according to the embodiment in FIG. 3;
- FIG. 8 is a block diagram of another example of the configuration of the system board in FIG. 2;
- FIG. 9 is a block diagram illustrating another example of the system configuration of FIG. 3;
- FIG. 10 is a block diagram illustrating a configuration of a system controller according to the embodiment;
- FIG. 11 is an explanatory diagram of a busy setting register in a register unit in FIG. 10;
- FIG. 12 is a block diagram of an address contention checking unit in a busy control unit in FIG. 10;
- FIG. 13 is a block diagram of an address contention checking unit in an address lock register unit in FIG. 10;
- FIG. 14 is an explanatory diagram of the operation of the address contention check in the busy control unit in FIG. 10 and FIG. 12;
- FIG. 15 is an explanatory diagram of the operation of the address contention check in the address lock register unit in FIG. 10 and FIG. 13;
- FIG. 16 is a flow diagram of a cache synchronization process according to the embodiment; and
- FIG. 17 is a flow diagram of a dynamic cache synchronization process according to another embodiment.
- Hereinafter, the embodiments will be described in the following order: the information processing system, the cache synchronization control of the information processing system, the system controller, the address contention check unit, the cache synchronization process, the dynamic cache synchronization process, and other embodiments. However, the information processing system and the system controller are not limited to the configurations in these embodiments.
-
FIG. 1 is a block diagram of an information processing system according to an embodiment. FIG. 2 is a block diagram of a system board in FIG. 1. The information processing system in FIG. 1 is illustrated as a server system connected to a computer network. In FIG. 1, the server system 1 includes a plurality of system boards (SB: System Boards) 1A ~ 1P as processing devices, a management board (MMB) 2 as a system controller (SVP: Service Processor), and crossbar switches (or switches) 30A ~ 30D.
- In the embodiment of FIG. 1, a first switch board 3A mounts a first crossbar switch 30A and a second crossbar switch 30B, and the first crossbar switch 30A connects to the second crossbar switch 30B via buses L1 and L2. Further, a second switch board 3B mounts a third crossbar switch 30C and a fourth crossbar switch 30D, and the third crossbar switch 30C connects to the fourth crossbar switch 30D via buses L3 and L4.
- In addition, the first crossbar switch 30A connects to the fourth crossbar switch 30D via buses L9 and L10, and the second crossbar switch 30B connects to the third crossbar switch 30C via buses L11 and L12.
- Each of the first, second, third and fourth system boards connects to the first crossbar switch 30A via buses L20, L21, L22 and L23. Each of the fifth, sixth, seventh and eighth system boards connects to the second crossbar switch 30B via buses L24, L25, L26 and L27.
- Each of the ninth, tenth, eleventh and twelfth system boards connects to the third crossbar switch 30C via buses L28, L29, L30 and L31. Each of the thirteenth, fourteenth, fifteenth and sixteenth system boards connects to the fourth crossbar switch 30D via buses L32, L33, L34 and L35.
- The management board (hereinafter referred to as the MMB) 2 connects to each of the system boards 1A ~ 1P via internal buses L40 and L42. The MMB 2 monitors the status of, sets the status of, and controls the start and stop of each of the system boards 1A ~ 1P.
- In FIG. 1, the server system is illustrated with an example configuration of sixteen system boards 1A ~ 1P and four crossbar switches 30A ~ 30D. However, the number of system boards and the number of crossbar switches are not limited to 16 and 4.
- The configuration of the system boards 1A ~ 1P will be explained using FIG. 2. As illustrated in FIG. 2, each of the system boards 1A (1B ~ 1P) includes a plurality of CPU chips 10-0 ~ 10-3 (denoted "CPU" in FIG. 2), the system controller 12 (denoted "SC" in FIG. 2) and a main storage unit (denoted "memory" in FIG. 2). Each of the CPU chips 10-0 ~ 10-3 includes a CPU (not shown in FIG. 2), a cache memory (denoted "CM" in FIG. 2) 16-0 ~ 16-3 and a cache tag memory (denoted "TAG" in FIG. 2) 18-0 ~ 18-3. The cache tag memory (called the CPU cache tag memory) 18-0 ~ 18-3 holds cache tag information indicating the status of the cache memory mounted on the same CPU chip.
- The system controller 12 connects to each of the CPU chips 10-0 ~ 10-3 via buses LA0 ~ LA3 and connects to the main storage unit 14 via a memory bus LM. Further, the system controller 12 connects to the crossbar switch 30A (30B ~ 30D). The system controller 12 controls the communication interface between the CPU chips 10-0 ~ 10-3, controls access to the main storage unit 14, and controls the communication interface with the other system boards via the crossbar switch. For example, the system controller 12 is composed of an LSI circuit.
- The system controller 12 includes cache tag memories (called SC cache tag memories) 20-0 ~ 20-3 that store a copy of the cache tag information stored in the cache tag memories 18-0 ~ 18-3 of the CPU chips 10-0 ~ 10-3. Because the system controller 12 holds a copy of the tags of the CPU cache memories, high-speed cache access by the snoop operation can be realized in the SMP system.
- The CPU cache tag memory 18-0 ~ 18-3 stores a registration address (also referred to as a frame address or tag) and a cache status. The cache status is one of four states: "M" (Modify), "O" (Owner), "S" (Share) and "I" (Invalid). The cache tag memory is structured according to a set associative scheme, a data storage structure that consists of a plurality of tags and is able to store data of different addresses in the same entry. In addition, a part of the physical address (PA) is used as the address of the cache tag memory 18-0 ~ 18-3.
- In the embodiment, the system controller 12 is described as being able to connect to up to four CPU chips, but the system may be configured so that the system controller 12 connects to at least two CPU chips. Although the system board 1A (1B ~ 1P) is described as being equipped with four CPU chips, at least one CPU chip may be mounted on the system board 1A.
- FIG. 3 is a block diagram illustrating an example of the configuration of the system board in FIG. 2. In FIG. 3, the same elements as those illustrated in FIG. 1 and FIG. 2 are indicated by the same symbols. In FIG. 3, the system board 1A includes a plurality of CPU chips 10-0 ~ 10-3 and the system controller 12A. The CPU chips 10-0 ~ 10-3, which are connected to the system controller 12A via the buses LA0 ~ LA3, constitute a computer group (called a domain) that shares the same database.
- The system board 1B includes a plurality of CPU chips 10-4 ~ 10-7 and a system controller 12B. The CPU chips 10-4 ~ 10-7, which are connected to the system controller 12B via the buses LA4 ~ LA7, constitute another domain that is different from the domain including the CPU chips 10-0 ~ 10-3. In addition, the main storage unit 14 described in FIG. 2 is mounted on each of the system boards.
- The CPU chips 10-0 ~ 10-3, whose cache memory capacities differ from each other, are mounted on the system board 1A. The CPU chips 10-4 ~ 10-7, whose cache memory capacities are the same, are mounted on the system board 1B. In the embodiment, the CPU chip 10-0 on the system board 1A mounts a cache memory of a first capacity and the CPU chips 10-1 ~ 10-3 on the system board 1A mount cache memories of a second capacity, which is twice the first capacity.
- In accordance with the difference in cache memory capacity, the CPU chip 10-0 includes a CPU cache tag memory 18-0 of a first capacity (for example, 2K-LINE (2000 lines)) and the CPU chips 10-1 ~ 10-3 include CPU cache tag memories 18-1 ~ 18-3 of a second capacity (for example, 4K-LINE (4000 lines)), which is twice the first capacity.
- The system controller 12 includes the SC cache tag memories 20-0 ~ 20-3, one for each of the CPU chips 10-0 ~ 10-3, and a cache synchronization control unit 22.
- Each of the SC cache tag memories 20-0 ~ 20-3 stores a copy of the corresponding CPU cache tag memory 18-0 ~ 18-3. The memory capacity of each of the SC cache tag memories 20-0 ~ 20-3 is equal to 2K + 2K (depicted as "2K, 2K" in FIG. 3).
- The SC cache tag memory 20-0 corresponding to the CPU chip 10-0 uses a half area 2K (for example, 2K-LINE) of the SC cache tag memory 20-0 as the copy area of the CPU cache tag memory 18-0. The SC cache tag memories 20-1 ~ 20-3 corresponding to the CPU chips 10-1 ~ 10-3 use the whole area 2K+2K (for example, 4K-LINE) of the SC cache tag memories 20-1 ~ 20-3 as the copy area of the CPU cache tag memories 18-1 ~ 18-3.
- The cache synchronization control unit
- The cache synchronization control unit 22A monitors the busy range of the SC cache tag memory 20-0 within the setting value 22-0 of the first range (BUSY2K) and monitors the busy range of the SC cache tag memories 20-1 ~ 20-3 within the setting values 22-1 ~ 22-3 of the second range (BUSY4K).
- In this way, because the system controller 12 sets the busy monitoring range individually for each CPU chip, the performance of the CPU chips can be improved in an information processing system in which CPU chips having cache memories of different capacities are mixed. This is particularly effective when the CPU 10-0 is separated from the CPUs 10-1 ~ 10-3 by domain.
- The cache busy control in the system controller according to the embodiment will now be explained in more detail. FIG. 4 is an explanatory diagram of the address map of the main storage unit in FIG. 2. The storage area of the main storage unit 14 is allocated, for example, to the physical addresses A0 ~ A2047 and B0 ~ B2047 in units of 256 bytes. In addition, it is assumed that the physical address PA from the CPU is 39 bits [41:3]. The frame address (or tag) is defined by the upper 23 bits [41:19] of the physical address PA [41:8], and the entry address is defined by the lower 11 bits [18:8] of the physical address PA [41:8].
- As illustrated in FIG. 5 and FIG. 7, the cache tag memories 20-0 ~ 20-3 store a 23-bit registration address and an 8-bit cache status (STS [7:0]). The registration address is the upper 23 bits [41:19] of the physical address PA.
- Because the cache tag memories 20-0 ~ 20-3 adopt the set associative scheme described above, a part of the physical address (PA) is used as the address of the cache tag memory 20-0 ~ 20-3 that stores the registration address and the cache status.
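The address split and the tag lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the bit positions (registration address PA [41:19], entry address PA [18:8]) come from the text, while the names `bits`, `split_pa` and `TagMemory`, the number of ways, and the replacement behaviour are hypothetical and are not taken from the embodiment.

```python
# Sketch only. Bit positions follow the text; everything else is assumed.
PA_MSB = 41       # top bit of the physical address
TAG_LSB = 19      # registration address = PA[41:19] (23 bits)
INDEX_LSB = 8     # entry address        = PA[18:8]  (11 bits)


def bits(value, msb, lsb):
    """Return value[msb:lsb] (inclusive bit range) as an integer."""
    width = msb - lsb + 1
    return (value >> lsb) & ((1 << width) - 1)


def split_pa(pa):
    """Split a physical address into (registration address, index)."""
    return bits(pa, PA_MSB, TAG_LSB), bits(pa, TAG_LSB - 1, INDEX_LSB)


class TagMemory:
    """Set-associative tag store: each index holds up to `ways` (tag, status) pairs."""

    def __init__(self, ways=2):          # the way count is an assumption
        self.ways = ways
        self.sets = {}                   # index -> list of (tag, status)

    def register(self, pa, status):
        """Store the registration address and status at the entry for `pa`."""
        tag, index = split_pa(pa)
        entries = [e for e in self.sets.get(index, []) if e[0] != tag]
        entries.append((tag, status))
        self.sets[index] = entries[-self.ways:]   # keep only the newest `ways` entries

    def lookup(self, pa):
        """Return the stored status on a hit, or "I" (Invalid) on a miss."""
        tag, index = split_pa(pa)
        for stored_tag, status in self.sets.get(index, []):
            if stored_tag == tag and status != "I":
                return status
        return "I"
```

Two addresses that share PA [18:8] but differ in PA [41:19] land in the same entry with different registration addresses, which is the situation the set associative scheme is meant to handle.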
-
FIG. 5 is an explanatory diagram of the SC cache tag memory 20-0 for the CPU chip 10-0, whose cache memory has the first (1x) capacity, according to the embodiment. FIG. 5 depicts the relationship among the indexes, the status, the registration address and the physical address of the main memory. The SC cache tag memory 20-0 is set to 2K-LINE (that is, 2000 registered lines), so the 11-bit physical address PA [18:8] is used as the index. The busy monitoring range is therefore set to the 11-bit physical address PA [18:8], which corresponds to 2K-LINE.
- In this case, because the upper 23 bits of the physical address PA [41:19] are used as the registration address, the registration addresses of the physical addresses A0 and B0 in the main memory are the same. Similarly, the registration addresses of the physical addresses A1 and B1 in the main memory are the same. Therefore, in the cache memory 16-0, when the data of the physical address A0 is locked, the data of the physical address B0 is also locked.
-
FIG. 6 is an explanatory diagram of the SC cache tag memories 20-1 ~ 20-3 for the CPU chips 10-1 ~ 10-3, whose cache memory capacity is doubled, when the busy range is set to 2K-LINE (2000 registration lines) in the same way as in FIG. 5.
- That is, as depicted in FIG. 6, even though the cache capacity has been doubled, the busy control uses the same index, the physical address PA [18:8], as when the cache capacity is 1x. In this case, the registration addresses PA [41:19] of the physical addresses A0 and B0 are the same even though the cache capacity is doubled; for example, when the data of the physical address A0 has been locked, the data of the physical address B0 is also locked. The physical address B0, which originally does not need to be locked, therefore also becomes a busy monitoring target, so retry processing occurs due to unnecessary busy determinations and the processing throughput decreases.
- FIG. 7 is a diagram of the relationship among the indexes, the status, the registration address and the physical address of the main memory in the SC cache tag memories 20-1 ~ 20-3, for which the cache memory capacity is doubled, according to the embodiment. The SC cache tag memories 20-1 ~ 20-3 are set to 4K-LINE (that is, 4000 registration lines), so the 12-bit physical address PA [19:8] is used as the index. The busy monitoring range is therefore set to the 12-bit physical address PA [19:8], which corresponds to 4K-LINE.
- Accordingly, the physical address A0 and the physical address B0 are recognized as separate entries depending on the top bit "19" of the index used for busy control. Therefore, when the physical address A0 is locked, only the physical address A0 is busy, and the physical address B0 is not determined to be busy. The busy range setting illustrated in FIG. 7 thus decreases the occurrence of busy, reduces the retry frequency, and improves processing throughput compared to FIG. 6. The firmware implemented in the MMB 2 depicted in FIG. 1 sets the busy range.
-
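The difference between the two busy ranges can be illustrated with a short sketch. The index widths (PA [18:8] for 2K-LINE, PA [19:8] for 4K-LINE) come from the text; the function names and the concrete values chosen for A0 and B0 are assumptions, with B0 taken to differ from A0 only in bit 19.

```python
def busy_index(pa, four_k_line_mode):
    """Index used for busy monitoring: PA[19:8] in 4K-LINE mode, PA[18:8] otherwise."""
    msb = 19 if four_k_line_mode else 18
    return (pa >> 8) & ((1 << (msb - 8 + 1)) - 1)


def is_busy(locked_pa, request_pa, four_k_line_mode):
    """A request is busy when its index matches the index of the locked address."""
    return busy_index(locked_pa, four_k_line_mode) == busy_index(request_pa, four_k_line_mode)


# Hypothetical addresses: B0 differs from A0 only in bit 19.
A0 = 0x0
B0 = A0 | (1 << 19)

print(is_busy(A0, B0, four_k_line_mode=False))  # True: 2K-LINE range, B0 is retried needlessly
print(is_busy(A0, B0, four_k_line_mode=True))   # False: 4K-LINE range, B0 is not busy
```

Under these assumptions, widening the monitored index by one bit removes the false contention between A0 and B0 while a request to A0 itself is still detected as busy.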
FIG. 8 is a block diagram illustrating another example of the configuration of the system board in FIG. 2. In FIG. 8, the same elements as those described in FIG. 3 are indicated by the same symbols. As illustrated in FIG. 8, a cache synchronization control unit 22A monitors the busy range of the SC cache tag memory 20-0 by the set value 24A of a first range (BUSY 2K) and monitors the busy range of the SC cache tag memories 20-1 ~ 20-3 by the set value 24B of a second range (BUSY 4K).
- In this case, the cache synchronization control unit 22A is provided with two setting registers 24A and 24B, a selection circuit 28 and a selection instruction register 26 that holds a selection instruction from the MMB 2, instead of the setting registers 22-0 ~ 22-3 provided for each of the cache tag memories 20-0 ~ 20-3. The selection circuit 28 selects one of the setting registers 24A and 24B according to the selection instruction from the selection instruction register 26, and the selected one is used to monitor the busy range.
- FIG. 9 is a block diagram illustrating another example of the system configuration in FIG. 3. In FIG. 9, the same elements as those described in FIG. 3 are indicated by the same symbols. As depicted in FIG. 9, each of the system boards 1A ~ 1H is equipped with a single CPU chip 10-0 ~ 10-7. Also in this embodiment, the CPU chip 10-0 on the system board 1A is equipped with the cache memory of the first capacity and the CPU chips 10-1 ~ 10-7 on the system boards 1B ~ 1H are equipped with the cache memory of the second capacity, which is double the first capacity.
- In accordance with the difference in cache memory capacity, the CPU chip 10-0 includes the CPU cache tag memory 18-0 having a first capacity (for example, 2K-LINE (2000 lines)) and the CPU chips 10-1 ~ 10-7 include the CPU cache tag memories 18-1 ~ 18-7 having a second capacity (for example, 4K-LINE (4000 lines)), which is double the first capacity.
- A pair of the system controller system board 1A ~ 1H. The main storage unit is provided to the system boards system controllers synchronization control unit 22.
- Each of the SC cache tag memories 20-0 ~ 20-7 stores a copy of the corresponding CPU cache tag memory 18-0 ~ 18-7. The SC cache tag memories 20-0 ~ 20-7 all have the same memory capacity. The symbol "2K" in FIG. 9 indicates the memory capacity.
- The SC cache tag memory 20-0 corresponding to the CPU chip 10-0 uses a half area 2K (for example, 2K-LINE) of the SC cache tag memory 20-0 as the copy area of the CPU cache tag memory 18-0. The SC cache tag memories 20-1 ~ 20-7 corresponding to the CPU chips 10-1 ~ 10-7 use the whole area 2K+2K (for example, 4K-LINE) of the SC cache tag memories 20-1 ~ 20-7 as the copy area of the CPU cache tag memories 18-1 ~ 18-7.
- The cache synchronization control unit 22A monitors the busy range of the SC cache tag memory 20-0 by the set value 22-0 of a first range (BUSY 2K) and monitors the busy range of the SC cache tag memories 20-1 ~ 20-3 by the set values 22-1 ~ 22-3 of a second range (BUSY 4K).
- The cache synchronization control unit 22B monitors the busy range of the SC cache tag memories 20-4 ~ 20-7 by the set values of the second range (BUSY4K).
- In this way, even in a configuration that the system controllers system boards 1A ~ 1H and each of the system boards 1A ~ 1H connects to the system controllers
-
FIG. 10 is a block diagram illustrating a configuration of a system controller according to the embodiment. FIG. 11 is an explanatory diagram of a busy setting register in a register unit in FIG. 10. In FIG. 10, the same elements as those described in FIG. 1 to FIG. 4 are indicated by the same symbols.
- As depicted in FIG. 10, the system controller 12 includes a command control unit 40, a pipeline unit 42 including a result decision unit 52, a CPU interface unit 44 for each of the CPU chips 10-0 ~ 10-3, a memory interface unit 46 for the main storage unit (memory) 14, a cache synchronization mechanism 22, and cache tag memory control units (denoted "cache tag memory cont" in FIG. 10) 21-0 ~ 21-3. The cache synchronization mechanism 22 includes a register unit 58, an address lock register unit 54 (denoted "address lock register" in FIG. 10) and a busy control unit 56.
- The command control unit 40 stores the command transferred from the CPU of the CPU chips 10-0 ~ 10-3 in a command queue, analyzes the destination of the command, and outputs the command to the crossbar switch 30A (see FIG. 1) or the pipeline unit 42 according to the analyzed destination. In the embodiment, the command takes the form of a request packet.
- The request packet includes a VAL bit (a valid signal indicating the validity of the request), the 39-bit physical address PA [41:3] of the request and a 4-bit CPU number of the request source. The index is the 12 bits [19:8] forming one portion of the physical address. The 4-bit number of the requesting CPU (bits 0 ~ 3) is encoded as follows: bit 0 = 1 indicates a request from the CPU0 (10-0), bit 1 = 1 indicates a request from the CPU1 (10-1), bit 2 = 1 indicates a request from the CPU2 (10-2), and bit 3 = 1 indicates a request from the CPU3 (10-3).
- The pipeline unit 42 includes a plurality of series-connected FF (flip-flop) circuits 50-0 ~ 50-n, and performs timing adjustment to wait for the search processing of the cache tag memory control units 21-0 ~ 21-3. That is, the FF circuits 50-0 ~ 50-n shift the command to the subsequent FF circuits 50-1 ~ 50-n at each time TP. The command control unit 40 transfers the command to the pipeline unit 42. The FF circuit 50-0 in the first stage of the pipeline unit 42 receives the command from the command control unit 40. The command in the FF circuit 50-0 in the first stage of the pipeline unit 42 is transferred to the FF circuit 48-1 in the cache tag memory control units 21-0 ~ 21-3 and the FF circuit 48-2 in the address lock register unit 54 via a signal line S1.
- The cache tag memory control units 21-0 ~ 21-3 include the SC cache tag memories 20-0 ~ 20-3 corresponding to each of the CPU chips 10-0 ~ 10-3. In addition, each of the cache tag memory control units 21-0 ~ 21-3 includes a
TAG updating gate 210, a search unit 212 for the cache tag memory 20-0, an FF circuit 213, an FF circuit 214 that holds the upper bits [41:19] of the physical address PA in the command, an FF circuit 215 for timing adjustment, a comparison circuit (denoted "COMP" in FIG. 10) 216, an output FF circuit 217, and an FF circuit 218 for updating the TAG.
- In each of the cache tag memory control units 21-0 ~ 21-3, the search unit 212 receives the command from the FF circuit 48-1 via the TAG updating gate 210. The search unit 212 extracts the index contained in the command and searches the cache tag memory 20-0 ~ 20-3 by the index. The search result of the search unit 212 is held in the FF circuit 213.
- Also, in the cache tag memory control units 21-0 ~ 21-3, the FF circuit 214 holds the upper bits [41:19] of the physical address PA from the FF circuit 48-1 and outputs them to the comparison circuit 216 via the timing adjustment FF circuit 215. The comparison circuit 216 compares the search result in the FF circuit 213 with the upper bits PA [41:19] in the FF circuit 215, and outputs a determination result of a cache hit or miss to the output FF circuit 217.
- The address
lock register unit 54 includes an address lock unit and an address contention check unit (denoted "ADDRESS" in FIG. 10) 7. The address lock register unit 54 includes a lock register set gate 512, an FF circuit 514, a lock register 516 that holds the physical address of the command, an output FF circuit 518, and an FF circuit 500 that holds a lock register set/reset signal. In addition, the address contention check unit 7, as detailed in FIG. 13, extracts the index [19:8] and the full address [41:3] from the physical address in the command, and performs the address contention check within the busy range notified from the register unit 58 through the signal line S5. When the address contention check unit 7 determines an index busy or a full address busy, it outputs a retry request to the command control unit 40 via a signal line S6.
- The result decision unit 52 in the pipeline unit 42 receives the cache search result from each of the cache tag control units 21-0 ~ 21-3 and the status of the other system boards through a signal line S2 and the FF circuits 51-0 ~ 51-m for timing adjustment. The result decision unit 52 decides the transfer destination of the command in the pipeline unit 42 (the FF circuit 50-n of the last stage) from the cache search results and the status of the other system boards, and transfers the command to the determined destination via the signal line S3.
- For example, when the data of the command is present in any one of the cache memories 16-0 ~ 16-3 of the CPU chips 10-0 ~ 10-3, the result decision unit 52 transfers the command to the CPU chip 10-0 ~ 10-3 in which the data is present through the signal line S3 and the CPU interface unit 44. When the data of the command does not exist in any of the cache memories 16-0 ~ 16-3 of the CPU chips 10-0 ~ 10-3, the result decision unit 52 transfers the command to the memory 14 through the signal line S3 and the memory interface unit 46.
- Further, when the destination of the command is determined, the result decision unit 52 outputs a TAG updating signal to the TAG updating FF circuit 218 in each of the cache tag control units 21-0 ~ 21-3 via the signal line S3 and outputs the lock register reset signal to the FF circuit 500 in the address lock register unit 54 through the signal line S3. Each of the cache tag control units 21-0 ~ 21-3 allows the command in the FF circuit 48-1 to be input to the search unit 212 through the TAG updating gate 210. In addition, the address lock register unit 54 allows the command in the FF circuit 48-2 to be input to the lock register 516 through the lock register set gate 512.
- The
register unit 58 includes a busy setting changing unit 23 and busy setting registers 22-0 ~ 22-3. The busy setting changing unit 23 sets the setting value of the busy range transferred from the MMB 2 in the busy setting registers 22-0 ~ 22-3. FIG. 11 illustrates the setting value of the busy range (4K_LINE_MODE) and the busy status for each of the CPUs 0 ~ 3 (10-0 ~ 10-3) in the busy setting registers 22-0 ~ 22-3. In this example, the busy mode is "2K_LINE_BUSY" when 4K_LINE_MODE in the setting value of the busy range is "0", and "4K_LINE_BUSY" when 4K_LINE_MODE in the setting value of the busy range is "1".
- The busy control unit 56 includes an address contention check unit 6 (denoted "ADDRESS" in FIG. 10). As described below, the address contention check unit 6 performs a contention check on the index PA [19:8] of the commands held by each of the FF circuits 50-4 ~ 50-n-1 in the pipeline unit 42, received via the signal line S4, within the busy range received from the register unit 58 via a signal line S5. When the busy control unit 56 determines an index busy, it outputs a command retry request to the command control unit 40 via the signal line S8 and suppresses the input of the commands stored in the queue of the command control unit 40 to the pipeline unit 42.
- In the embodiment, in order to perform the dynamic change control of the busy range described later, the address lock status in the address lock register unit 54 is notified to the busy control unit 56 via a signal line S7. In addition, a setting enable notification for the busy range is sent from the busy control unit 56 to the register unit 58 via a signal line S9.
- Next, the address
contention check unit 6 in the busy control unit 56 and the address contention check unit 7 in the address lock register unit 54 will be explained. FIG. 12 is a block diagram of the address contention check unit in the busy control unit 56.
- The address contention check unit 6 checks for address contention of the index in the pipeline unit 42. The address contention check unit 6 treats all addresses that match the index as check targets, and performs the address check on addresses that may be updated during the period before the address of the update destination in the cache tag memory 20-0 ~ 20-3 is determined. This prevents a subsequent request from accessing an address in the cache tag memory 20-0 ~ 20-3 that may be updated thereafter.
- As depicted in FIG. 12, the address contention check unit 6 includes two comparison circuits 60 and 62, two AND circuits 63 and 64, an OR circuit 65 and a selection circuit 66. The first comparison circuit 60 compares TP04_index PA [19:8] (that is, the second busy range), held by the FF circuit 50-4 (timing TP04), with TPxx_index PA [19:8], held by the FF circuits 50-5 ~ 50-n-1 connected in the stages after the FF circuit 50-4.
- The second comparison circuit 62 compares TP04_index PA [18:8] (that is, the first busy range), held by the FF circuit 50-4 (timing TP04), with TPxx_index PA [18:8], held by the FF circuits 50-5 ~ 50-n-1 connected in the stages after the FF circuit 50-4. In other words, the check timing is TP04 and the check targets are requests that are present in the stages after TP04 (the FF circuit 50-4).
- The first AND circuit 63 calculates the AND (logical product) of the 4-bit 4K_LINE_MODE signal (the setting value of the busy range) depicted in FIG. 11 and the comparison result (match / mismatch) of the comparison circuit 60. The second AND circuit 64 calculates the AND (logical product) of the inverted 4-bit 4K_LINE_MODE signal (the setting value of the busy range) depicted in FIG. 11 and the comparison result (match / mismatch) of the comparison circuit 62.
- The OR circuit 65 calculates the logical sum of the results of both of the AND circuits 63 and 64. The selection circuit 66 selects one of the 4 bits of the result of the OR circuit 65 according to the CPU number TP04_CPU [3:0] from the FF circuit 50-4.
- That is, the first comparison circuit 60 checks for contention in the second busy range and the second comparison circuit 62 checks for contention in the first busy range. The first AND circuit 63 extracts the contention result of the second busy range and the second AND circuit 64 extracts the contention result of the first busy range. The selection circuit 66 selects, from the contention results of the first busy range and the second busy range passed through the OR circuit 65, the contention result of the CPU corresponding to the request.
- The selection circuit 66 outputs an index busy signal to the command control unit 40 through the signal line S8 in FIG. 10. For example, an index busy signal of "1" indicates index busy, and an index busy signal of "0" indicates not index busy. When the address contention check of the busy control unit 56 determines that a subsequent request contends with the address of a preceding request in the pipeline unit 42 (index busy), the busy control unit 56 requests the command control unit 40 to re-enter the request.
- In this way, the address contention check unit 6 checks for address contention with the preceding request in the pipeline unit 42 and controls the command control unit 40.
-
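The gate-level behaviour of the address contention check unit 6 can be modelled in a few lines. This is a sketch under assumptions rather than the circuit itself: the function name `index_busy`, its parameter names, and the way the comparisons against several later pipeline stages are combined (a match in any stage counts) are hypothetical; the comparison widths, the AND with 4K_LINE_MODE and its inverse, the OR, and the selection by the one-hot CPU number follow the description.

```python
def index_busy(tp04_pa, tp04_cpu, later_pas, mode_4k):
    """Model of comparators 60/62, AND circuits 63/64, OR circuit 65 and selector 66.

    tp04_pa   : physical address of the request at timing TP04
    tp04_cpu  : one-hot 4-bit CPU number of that request (bit n set = CPUn)
    later_pas : physical addresses held in the later pipeline stages
    mode_4k   : 4-bit 4K_LINE_MODE value (bit n set = CPUn uses the 4K-LINE range)
    """
    def field(pa, msb):
        # Extract PA[msb:8]; 12 bits for msb=19, 11 bits for msb=18.
        return (pa >> 8) & ((1 << (msb - 7)) - 1)

    # Assumption: a match against any later stage counts as a match.
    match_4k = any(field(tp04_pa, 19) == field(pa, 19) for pa in later_pas)  # comparator 60
    match_2k = any(field(tp04_pa, 18) == field(pa, 18) for pa in later_pas)  # comparator 62

    and_63 = mode_4k & (0xF if match_4k else 0x0)            # 4K_LINE_MODE AND match
    and_64 = (~mode_4k & 0xF) & (0xF if match_2k else 0x0)   # inverted mode AND match
    or_65 = and_63 | and_64                                  # per-CPU contention bits
    return 1 if (or_65 & tp04_cpu) else 0                    # selector 66 picks the CPU's bit
```

With `mode_4k = 0b1110` (CPU0 in 2K-LINE mode, CPU1 ~ CPU3 in 4K-LINE mode) and a later stage holding an address that differs from the TP04 address only in bit 19, a request from CPU0 is reported as index busy while a request from CPU1 is not.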
FIG. 13 is a block diagram of the addresscontention check unit 7 in the addresslock register unit 54. As depicted inFIG. 13 , the addresscontention check unit 7 includes an indexcontention check unit 7A and a full addresscontention check unit 7B. - The index
contention check unit 7A determines that the request which is set to theaddress lock register 516 is a target of the check in a check timing TP01, and performs the contention check for the address of the request source, from the determination of the address in the update destination in the address cache management information (TAG) until completion of processing of the CPU. The address of the storing destination is specified by the index, when the data requested by another CPU is transferred to the CPU itself and is stored in the cache memory of the own CPU. Therefore, it is possible to prevent that the subsequent request accesses the address which is stored the requested data. - The index
contention check unit 7A includes twocomparison circuits circuits circuit 75 and aselection circuit 76. - The
first comparison circuit 70 compares TP01_ index PA [19:8] (a second busy range), in which the FF circuit 50-1 (timing TP01) holds, with REG_ADRS [19:8] in which theaddress lock register 516 holds. Thesecond comparison circuit 72 compares TP01_ index PA [18:8] (a first busy range), in which the FF circuit 50-1 (timing TP01) holds, with REG_ADRS [18:8] in which theaddress lock register 516 holds. - In other words, the check timing is TP01, and the check target is a request after timing TP01 (the FF circuit 50-1).
- The first AND circuit 73 calculates the AND (logical product) of the 4-bit 4K_LINE_MODE signal (the setting value of the busy range) depicted in FIG. 11 and the comparison result (match / mismatch) of the comparison circuit 70. The second AND circuit 74 calculates the AND (logical product) of the inverted 4-bit 4K_LINE_MODE signal (the setting value of the busy range) depicted in FIG. 11 and the comparison result (match / mismatch) of the comparison circuit 72.
- The OR circuit 75 calculates the logical sum of the results of both AND circuits 73 and 74. The selection circuit 76 selects one bit of the 4-bit result of the OR circuit 75 according to the CPU number TP04_CPU [3:0] from the FF circuit 50-1.
- That is, the first comparison circuit 70 checks for contention in the second busy range and the second comparison circuit 72 checks for contention in the first busy range. The first AND circuit 73 extracts the contention result of the second busy range, and the second AND circuit 74 extracts the contention result of the first busy range. From the contention results of the first and second busy ranges merged by the OR circuit 75, the selection circuit 76 selects the contention result for the CPU corresponding to the request.
- The selection circuit 76 outputs an index busy signal to the command control unit 40 through the signal line S6 in FIG. 10. For example, an index busy signal of "1" indicates index busy, and "0" indicates not index busy. When the address contention check of the address lock register unit 54 determines that there is a subsequent request whose address contends with the address of the preceding request in the pipeline unit 42 (index busy), the address lock register unit 54 requests the command control unit 40 to re-enter the request.
- Next, the full address contention check unit 7B checks for address contention between the subsequent request and the address in the address lock register 516. The full address contention check unit 7B performs the address check on the address that may be updated in the period from the determination of the update-destination address in the cache tag memories 20-0 ~ 20-3 (cache management information, TAG) until the completion of the processing of the CPU. Because the update-destination address is specified and stored in the address lock register, a subsequent request is prevented from accessing an address that is being processed in the cache tag memories 20-0 ~ 20-3.
- The full address contention check unit 7B includes a comparison circuit 78 that compares the full address TP01_PA [41:0] held by the FF circuit 50-1 (timing TP01) with the full address REG_ADRS [41:0] held by the address lock register 516.
- The comparison circuit 78 outputs a full address busy signal to the command control unit 40 through the signal line S6 in FIG. 10. When the full address contention check of the address lock register unit 54 determines that there is a subsequent request whose address contends with the address of the preceding request in the pipeline unit 42 (full address busy), the address lock register unit 54 requests the command control unit 40 to re-enter the request.
- In this way, the contention check is performed between the request address set in the address lock register 516 and the subsequent requests.
-
FIG. 14 is a diagram for explaining the operation of the address contention check in the busy control unit 56 described in FIG. 10 and FIG. 12. FIG. 15 is a diagram for explaining the operation of the address contention check in the address lock register unit 54 described in FIG. 10 and FIG. 13. In FIG. 14 and FIG. 15, the horizontal axis indicates time and the vertical axis indicates the operations of the registers to be checked and the FF circuits (indicated by "TP01" ~ "TPnn" in the figures).
- The operation of the configuration of FIG. 10 to FIG. 13 will be described below with reference to FIG. 14 and FIG. 15. First, the MMB 2 sets the busy mode of each CPU in the busy setting registers 22-0 to 22-3 in the register unit 58 (see FIG. 11). As described for FIG. 11, 4K_LINE_MODE is set to "0" in 2K_LINE BUSY mode and to "1" in 4K_LINE BUSY mode.
- The command control unit 40 enters the requests received from the CPUs 10-0 ~ 10-3 into the pipeline unit 42. The entered requests are also input to the cache tag memory control units 21-0 ~ 21-3 and the address lock register unit 54. The requests in the pipeline unit 42 reach the result decision unit 52. Each request includes a VAL bit (a valid signal indicating that the request is effective), a 39-bit physical address PA [41:3] and a 4-bit CPU number of the request source. The index is defined as the 12 bits [19:8] that form part of the physical address.
- The address contention check unit 6 in the busy control unit 56 performs the address contention check between a request and a preceding request that precedes it in the pipeline unit 42. In this case, as described above, the busy check is performed by changing the busy range for each cache tag memory capacity (CPU).
- When the address contention check in the busy control unit 56 determines that there is an address contention with the preceding request in the pipeline unit 42, the busy control unit 56 requests the command control unit 40 to re-enter the request. When it is determined that there is no address contention for the request and the result decision unit 52 determines that an update of the TAG (cache management information) is necessary, the busy control unit 56 updates the TAG and sets the full address of the processing target request in the lock register 516.
- The operation will be explained using the timing chart of the index address contention check in the pipeline unit 42 in FIG. 14. In FIG. 14, the solid line indicates the preceding request and the dotted line indicates the subsequent request. FIG. 14 illustrates an example of address contention in which the subsequent request (dotted line) reaches TP04 (FF 50-4) while the preceding request (solid line) is present at TP05 (FF 50-5) or later. In other words, the example depicts a case in which the indexes of the preceding request and the subsequent request match.
- Because the index of the subsequent request (dotted line) matches the index of the preceding request (solid line), the index address contention check of the pipeline in the contention check unit 6 determines index busy (depicted by "CHK" in the dotted circle in FIG. 14). After index busy is determined, the subsequent request (dotted line) is retried by the command control unit 40.
- Further, the subsequent request that has been determined to be index busy is erased in the pipeline unit 42. Alternatively, a flag indicating busy is added to the subsequent request, the request is transferred to the result decision unit 52, and the result decision unit 52 erases it.
- Next, the address contention check in the address
lock register unit 54 will be explained. FIG. 15 is a timing chart of the full address/index contention check in the address lock register unit 54. In FIG. 15, a solid line indicates the preceding request, a dotted line indicates the subsequent request and a thick line indicates the completion request of CPU processing.
- For the preceding request (solid line), the contention result (here, no contention) is determined at the TPnn stage, and the update address of the cache tag information TAG is determined. Then, the TAG in the cache tag memory is updated and the updated address is set in the address lock register unit 54 by the preceding request at time TPnn+2. The TAG update and the setting of the updated address in the address lock register unit 54 are performed by extracting the required information from the information contained in the preceding request packet.
- Then, when the subsequent request reaches the stage TP01, the contention check unit 7 in the above-mentioned address lock register unit 54 performs the full address contention check. At this time, when the address stored in the address lock register 516 matches the full address of the subsequent request, the contention check unit 7 determines full address busy. After this, the subsequent request is retried by the command control unit 40.
- Similarly, the contention check unit 7 performs the index contention check between the request in the address lock register 516 and the subsequent request at the stage TP01. When the indexes match, the contention check unit 7 determines index busy and performs the retry process for the subsequent request.
- In addition, whether the contention check unit 7 uses the index or the full address for the contention check depends on the command of the request. In other words, either the index or the full address may be used, or both may be used, for the contention check.
- Thereafter, when the processing of the CPU of the request source is completed and the request for the completion notification of the CPU processing has been entered into the pipeline unit 42, the result decision unit 52 determines that the request is a reset of the address lock register, and the address lock register 516 is reset according to the result of the determination at the stage TPnn+2.
-
FIG. 16 is a flow diagram of a cache synchronization process according to the embodiment.
- (S10) A firmware program installed in the MMB 2, upon receiving the start instruction of the information processing system, turns off the power of the system and then turns it on.
- (S12) The firmware program in the MMB 2 initializes the system board 1A (1B ~ 1P) after power-on. In the initialization process, the firmware program in the MMB 2 obtains specification information for the CPUs 10-0 ~ 10-3 mounted on the system board 1A (1B ~ 1P), and sets the busy range values in the busy range setting registers 22-0 ~ 22-3 in the system controller 12 based on the specification information. Then, the firmware program in the MMB 2 allows the system board 1A (1B ~ 1P) to operate (depicted as mode 1 in FIG. 16).
- (S14) A reboot occurs for some reason during this operation.
- (S16) When the reboot has occurred, the firmware program of the MMB 2 performs an initialization process of the system board 1A (1B ~ 1P) after the reboot. In the initialization process, the firmware program in the MMB 2 obtains specification information for the CPUs 10-0 ~ 10-3 mounted on the system board 1A (1B ~ 1P), and sets the busy range values in the busy range setting registers 22-0 ~ 22-3 in the system controller 12 based on the specification information. Then, the firmware program in the MMB 2 allows the system board 1A (1B ~ 1P) to operate (depicted as mode 2 in FIG. 16).
- (S18) When the operation ends, the operation of the system is finished.
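The register setting performed in steps S12 and S16 can be pictured with a short sketch. The mapping below is an assumption for illustration: it supposes that the specification information yields the number of cache tag index lines per CPU (2048 or 4096), and that bit n of the resulting 4K_LINE_MODE value corresponds to CPU n. The names `LINES_2K`, `LINES_4K` and `busy_mode_from_specs` are hypothetical and do not appear in the embodiment.

```python
LINES_2K = 2048  # 2K_LINE BUSY mode -> 4K_LINE_MODE bit = 0
LINES_4K = 4096  # 4K_LINE BUSY mode -> 4K_LINE_MODE bit = 1


def busy_mode_from_specs(index_lines_per_cpu: list[int]) -> int:
    """Build a 4K_LINE_MODE value, one bit per CPU, from (assumed) spec data."""
    mode = 0
    for cpu, lines in enumerate(index_lines_per_cpu):
        if lines == LINES_4K:
            mode |= 1 << cpu          # this CPU uses the wider busy range
        elif lines != LINES_2K:
            raise ValueError(f"unsupported index line count for CPU {cpu}: {lines}")
    return mode


# Hypothetical CPU mixes: one at power-on (mode 1), another after a reboot (mode 2).
mode_1 = busy_mode_from_specs([2048, 2048, 4096, 4096])
mode_2 = busy_mode_from_specs([4096, 2048, 4096, 4096])
print(f"{mode_1:04b} {mode_2:04b}")
```

The point of the sketch is only that the value written to the busy range setting registers is derived from the mounted CPUs each time the board is initialized, so a different CPU mix after a reboot yields a different setting.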
- In this way, the firmware program of the MMB 2 can set and change the busy range values in the busy range setting registers during the initialization process of the system board after power-on and after a reboot.
- Next, a dynamic setting process of the busy range according to another embodiment will be described.
- In the setting process of FIG. 16, the busy range values in the busy range setting registers 22-0 ~ 22-3 cannot be changed unless a reboot occurs. In other words, while the system is operating, the busy range values are not changed until the next setting change.
- On the other hand, in active maintenance and the like, a new system board, or a new CPU chip on an existing system board, may be added and connected to the information processing system while the system is operating. A failed system board or a failed CPU chip may also be replaced with a new one. Such dynamic configuration changes make it necessary to change the busy range settings during system operation.
- As described in FIG. 10, in order to change the busy monitoring range in response to dynamic reconfiguration (referred to as "DR"), the busy control unit 56 confirms that the following conditions (1) and (2) are satisfied, and then notifies the register unit 58 via the signal line S9 that the busy range setting may be changed.
- (1) All commands in the address lock register unit 54 have been processed.
- (2) The commands in the pipeline unit 42 have been completed. In other words, there is no command in the stages TP00 ~ TP03 (the FF circuits 50-0 ~ 50-3) that will access the lock register unit 58 in the future, and there is no command in any stage.
- After receiving the notification from the busy control unit 56 that the busy range setting may be changed, the busy setting changing unit 23 in the register unit 58 applies the busy range setting change requested by the MMB 2 to the busy setting registers 22-0 ~ 22-3. In other words, the busy setting changing unit 23 changes the busy monitoring range setting while no command is being processed in the address lock register unit 54.
-
FIG. 17 is a flow diagram of setting change processing of the busy range according to another embodiment.
- (S20) The firmware program installed in the MMB 2, upon receiving the start instruction of the information processing system, turns off the power of the system once and then turns it on.
- (S22) The firmware program in the MMB 2 initializes the system board 1A (1B ~ 1P) after power-on. In the initialization process, the firmware program in the MMB 2 obtains specification information for the CPUs 10-0 ~ 10-3 mounted on the system board 1A (1B ~ 1P), and sets the busy range values in the busy range setting registers 22-0 ~ 22-3 in the system controller 12 based on the specification information. Then, the firmware program in the MMB 2 allows the system board 1A (1B ~ 1P) to operate.
- (S24) The MMB 2 receives a request for a setting change of the busy range during operation.
- (S26) When the change request is a dynamic configuration change, the firmware program in the MMB 2 modifies the configuration dynamically. The busy control unit 56 in the system controller 12 outputs, to the command control unit 40, a request to stop entering commands into the pipeline unit 42. The system thereby enters a suspend state.
- (S28) In the meantime, the busy control unit 56 in the system controller 12 monitors, via the signal line S7, whether a command is being processed in the address lock register unit 54.
- (S30) The busy control unit 56 confirms that all commands being processed have completed and that there is no command in the stages TP00 ~ TP03 in the pipeline unit 42, and then notifies the register unit 58 that the busy range setting may be changed. The busy setting changing unit 23 in the register unit 58 applies the busy range setting change requested by the MMB 2 to the busy setting registers 22-0 ~ 22-3.
- The busy setting changing unit 23 notifies the busy control unit 56 and the address lock register unit 54 of the setting value of the busy range, and the newly set value is used by the busy decision logic. After the busy range setting is changed, the busy control unit 56 controls the command control unit 40 to resume entering commands into the pipeline unit 42. Then, when the firmware program in the MMB 2 completes the dynamic reconfiguration processing, the suspend state is released and the dynamic configuration change process ends.
- That is, the busy range values set in the initialization process of the system board 1A (mode 1) can be changed to different setting values (mode 2) during the operation of the system.
- (S32) When dynamic configuration change processing is not performed by the firmware in the
MMB 2, the busy control unit 56 of the system controller 12 outputs, to the command control unit 40, a request to stop entering commands into the pipeline unit 42. The system 1 thereby enters a suspend state.
- (S34) In the meantime, the busy control unit 56 in the system controller 12 monitors, via the signal line S7, whether a command is being processed in the address lock register unit 54.
- (S36) The busy control unit 56 confirms that all commands being processed have completed and that there is no command in the stages TP00 ~ TP03 in the pipeline unit 42, and then notifies the register unit 58 that the busy range setting may be changed. The busy setting changing unit 23 in the register unit 58 applies the busy range setting change requested by the MMB 2 to the busy setting registers 22-0 ~ 22-3.
- The busy setting changing unit 23 notifies the busy control unit 56 and the address lock register unit 54 of the setting value of the busy range, and the newly set value is used by the busy decision logic. After the busy range setting is changed, the busy control unit 56 controls the command control unit 40 to resume entering commands into the pipeline unit 42, and the suspend state is released.
- That is, the busy range values set in the initialization process of the system board 1A (mode 1) can be changed to different setting values (mode 2) during the operation of the system.
- (S38) When neither the dynamic reconfiguration change process nor the suspend is performed by the firmware in the MMB 2, the busy control unit 56 of the system controller 12 monitors whether a command is being processed in the address lock register unit 54.
- (S40) The busy control unit 56 confirms that all commands being processed have completed and that there is no command in the stages TP00 ~ TP03 in the pipeline unit 42, and then notifies the register unit 58 that the busy range setting may be changed. The busy setting changing unit 23 in the register unit 58 applies the busy range setting change requested by the MMB 2 to the busy setting registers 22-0 ~ 22-3.
- The busy setting changing unit 23 notifies the busy control unit 56 and the address lock register unit 54 of the setting value of the busy range, and the newly set value is used by the busy decision logic.
- In this way, in the busy monitoring, the setting values in the registers 22-0 ~ 22-3 may be changed when the condition is satisfied that no index-managed command is being processed in the system controller 12.
- In the above embodiments, a combination of cache memories of single (1x) and double capacity has been described as an example; however, the embodiments are also applicable to a mixed configuration of CPUs whose cache memories have 1x capacity and n times (n > 2) capacity. In that case, because the occurrence rate of busy falls to 1/n, the throughput can be further improved.
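The 1/n figure can be checked with a small calculation under a simplifying assumption that is not stated in the embodiment: if the index bits of two requests are independent and uniformly distributed, the probability that they fall on the same index is the reciprocal of the number of index values. The helper name `index_match_probability` is hypothetical.

```python
from fractions import Fraction


def index_match_probability(index_bits: int) -> Fraction:
    """Probability that two independent, uniformly random indexes are equal."""
    return Fraction(1, 2 ** index_bits)


p_2k = index_match_probability(11)  # index [18:8], 2K_LINE BUSY mode
p_4k = index_match_probability(12)  # index [19:8], 4K_LINE BUSY mode
print(p_2k, p_4k, p_4k / p_2k)
```

Under this assumption each additional index bit doubles the number of lines and halves the match probability, so a cache tag with n times as many lines gives 1/n of the index-busy rate. Real request streams are not uniform, so the ratio is an idealized estimate.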
- In the example, the system controller has been described as mounted on a system board that carries a plurality of CPU chips, but the system controller may instead be mounted on a controller board connected to a plurality of system boards.
- The foregoing has described the embodiments of the present invention; various modifications are possible within the scope of the spirit of the present invention, and they are not excluded from the scope of the present invention.
- In a system in which the system controller is connected to a plurality of CPU units having cache memories of mutually different capacities and controls cache synchronization, the monitoring range for contention between a preceding request and a subsequent request is set for each cache memory capacity, which makes it possible to improve the throughput of the CPU unit that has a large cache capacity.
-
- 1A ~ 1P: system board (processing device)
- 2: system management device
- 3A, 3B: crossbar switch board
- 10-0 ~ 10-7: CPU unit (chip)
- 12, 12A, 12B: system controller
- 14: main storage unit
- 16-0 ~ 16-3: cache memory
- 18-0 ~ 18-3: cache tag memory
- 20-0 ~ 20-7: SC cache tag memory
- 22: cache synchronization unit
- 22-0 ~ 22-7: busy setting register
Claims (18)
- Information processing system (1) comprising: a first CPU unit (10-0) having a first CPU and a first cache memory (16-0) that stores cache tag information and cache data; a second CPU unit (10-1) having a second CPU and a second cache memory (16-1) that stores cache tag information and cache data and has a different capacity from a capacity of the first cache memory (16-0); and a system controller (22) that is connected to the first CPU unit (10-0) and the second CPU unit (10-1) and searches a third cache memory (20-0) that stores a copy of the cache tag information in the first cache memory (16-0) and a fourth cache memory (20-1) that stores a copy of the cache tag information in the second cache memory (16-1) according to a request to the first cache memory (16-0) and the second cache memory (16-1) from the first CPU unit (10-0) and the second CPU unit (10-1), wherein the system controller (12) comprising: a cache synchronization unit (22) that monitors whether or not preceding request and subsequent request require same cache address by monitoring range of busy that is set and make a retry of the subsequent request that requires the same cache address as the preceding request to a CPU (10-0, 10-1) that has required when receiving the subsequent request before completing update of the copy of the cache tag information by the preceding request; and a setting unit (58) that sets different monitoring ranges of the busy between the third cache tag memory (20-0) and the fourth cache tag memory (20-1) to the cache synchronization unit (22).
- The information processing system (1) according to claim 1, wherein the system controller (12) further comprises a pipeline unit (42) that holds a plurality of requests inputted from the first CPU unit (10-0) and the second CPU unit (10-1) in order of the input,
and wherein the cache synchronization unit (22) monitors whether or not the preceding request and the subsequent request in the pipeline unit (42) require same cache address by the monitoring range of busy which is set. - The information processing system (1) according to claim 1, wherein the cache synchronization unit (22) compares one portion of physical address in the preceding request with one portion of physical address in the subsequent request in the monitoring range of the busy to monitor whether or not the preceding request and the subsequent request require same cache address.
- The information processing system (1) according to claim 1, wherein the cache synchronization unit (22) compares one portion of physical address in the preceding request with one portion of physical address in the subsequent request in a first monitoring range of the busy and compares another one portion of the physical address in the preceding request with another one portion of the physical address in the subsequent request in a second monitoring range of the busy.
- The information processing system (1) according to claim 1, wherein the information processing system (1) further comprises a system management device (2) that monitors status of the first CPU unit (10-1), the second CPU unit (10-2) and the system controller (12) and sets the monitoring range of busy to the setting unit (58) in the system controller (12).
- The information processing system (1) according to claim 1, wherein the information processing (1) further comprises a system management device (2) that monitors status of the first CPU unit (10-1), the second CPU unit (10-2) and the system controller (12), receives a change request of the monitoring range of busy and sets the monitoring range of busy to the setting unit (58) in the system controller (12) after the cache synchronization unit (22) detects that the request is not present in the system controller (12).
- The information processing system (1) according to claim 2, wherein the cache synchronization unit (22) further comprises a lock register unit (54) that holds a physical address of the request until a completion of processing of the request by the CPU unit (10-1,10-2).
- The information processing system (1) according to claim 2, wherein the system controller (12) further comprises a command control unit (40) that accepts the requests from the first CPU unit (10-0) and the second CPU unit (10-1), enters accepted request to the pipeline unit (42), and sends a retry request to the CPU (10-0, 10-1) of request source according to a reception of a result of the monitor from the cache synchronization unit (22).
- The information processing system (1) according to claim 7, wherein the cache synchronization unit (22) comprises: a first monitoring unit (56) that monitors whether or not the preceding request and the subsequent request that are held in the pipeline unit (42) require same cache address by the monitoring range of busy; and a second monitoring unit (7) that monitors whether or not the preceding request held in the lock register unit (54) and the subsequent request held in the pipeline unit (42) require same cache address by the monitoring range of busy.
- A system controller (12) that is connected to a first CPU unit (10-0) having a first CPU and a first cache memory (16-0) that stores cache tag information and cache data and a second CPU unit (10-1) having a second CPU and a second cache memory (16-1) that stores cache tag information and cache data and has a different capacity from the capacity of the first cache memory (16-0), the system controller (12) comprises: a cache tag search unit (21-0 ~ 21-3) that searches a third cache memory (20-0) that stores a copy of the cache tag information in the first cache memory (16-0) and a fourth cache memory (20-1) that stores a copy of the cache tag information in the second cache memory (16-1) according to a request to the first cache memory (16-0) and the second cache memory (16-1) from the first CPU unit (10-0) and the second CPU unit (10-1); a cache synchronization unit (22) that monitors whether or not preceding request and subsequent request require same cache address by monitoring range of busy that is set and make a retry of the subsequent request that requires the same cache address as the preceding request to a CPU (10-0, 10-1) that has required when receiving the subsequent request before completing update of the copy of the cache tag information by the preceding request; and a setting unit (58) that sets different monitoring ranges of the busy between the third cache tag memory (20-0) and the fourth cache tag memory (20-1) to the cache synchronization unit (22).
- The system controller (12) according to claim 10, wherein the system controller (12) further comprises a pipeline unit (42) that holds a plurality of requests inputted from the first CPU unit (10-0) and the second CPU unit (10-1) in order of the input,
and wherein the cache synchronization unit (22) monitors whether or not the preceding request and the subsequent request in the pipeline unit (42) require same cache address by the monitoring range of busy which is set. - The system controller (12) according to claim 10, wherein the cache synchronization unit (22) compares one portion of physical address in the preceding request with one portion of physical address in the subsequent request in the monitoring range of the busy to monitor whether or not the preceding request and the subsequent request require same cache address.
- The system controller (12) according to claim 10, wherein the cache synchronization unit (22) compares one portion of physical address in the preceding request with one portion of physical address in the subsequent request in a first monitoring range of the busy and compares another one portion of the physical address in the preceding request with another one portion of the physical address in the subsequent request in a second monitoring range of the busy.
- The system controller (12) according to claim 10, wherein the setting unit (58) sets the monitoring range of busy from a system management device (2) that monitors status of the first CPU unit (10-0), the second CPU unit (10-1) and the system controller (12).
- The system controller (12) according to claim 10, wherein the setting unit (58) sets the monitoring range of busy from a system management device (2) that monitors status of the first CPU unit (10-0), the second CPU unit (10-1) and the system controller (12) after the cache synchronization unit (22) detects that the request is not present in the system controller (12).
- The system controller (12) according to claim 11, wherein the cache synchronization unit (22) further comprises a lock register unit (54) that holds a physical address of the request until a completion of processing of the request by the CPU unit (10-0,10-1).
- The system controller (12) according to claim 11, wherein the system controller (12) further comprises a command control unit (40) that accepts the requests from the first CPU unit (10-0) and the second CPU unit (10-1), enters accepted request to the pipeline unit (42), and sends a retry request to the CPU (10-0, 10-1) of request source according to a reception of a result of the monitor from the cache synchronization unit (22).
- The system controller (12) according to claim 16, wherein the cache synchronization unit (12) comprises: a first monitoring unit (56) that monitors whether or not the preceding request and the subsequent request that are held in the pipeline unit (42) require same cache address by the monitoring range of busy; and a second monitoring unit (7) that monitors whether or not the preceding request held in the lock register unit (54) and the subsequent request held in the pipeline unit (42) require same cache address by the monitoring range of busy.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2010/058971 WO2011148482A1 (en) | 2010-05-27 | 2010-05-27 | Information processing system and system controller |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2579160A1 true EP2579160A1 (en) | 2013-04-10 |
Family
ID=45003489
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP10852152.7A Withdrawn EP2579160A1 (en) | 2010-05-27 | 2010-05-27 | Information processing system and system controller |
Country Status (6)
Country | Link |
---|---|
US (1) | US8856457B2 (en) |
EP (1) | EP2579160A1 (en) |
JP (1) | JP5348320B2 (en) |
KR (1) | KR101413787B1 (en) |
CN (1) | CN102906713A (en) |
WO (1) | WO2011148482A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012127629A1 (en) * | 2011-03-22 | 2012-09-27 | 富士通株式会社 | Server system and method of executing maintenance of crossbar board in hot-line state |
WO2013084314A1 (en) * | 2011-12-07 | 2013-06-13 | 富士通株式会社 | Processing unit and method for controlling processing unit |
EP2790107A1 (en) * | 2011-12-07 | 2014-10-15 | Fujitsu Limited | Processing unit and method for controlling processing unit |
US20170329711A1 (en) | 2016-05-13 | 2017-11-16 | Intel Corporation | Interleaved cache controllers with shared metadata and related devices and systems |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04335298A (en) | 1991-05-09 | 1992-11-24 | Nec Corp | Static ram circuit |
JPH1173370A (en) * | 1997-08-29 | 1999-03-16 | Fujitsu Ltd | Information processor |
JP2000132531A (en) * | 1998-10-23 | 2000-05-12 | Pfu Ltd | Multiprocessor |
JP2005293357A (en) * | 2004-04-01 | 2005-10-20 | Toshiba Corp | Log-in system and method |
JP2006079218A (en) * | 2004-09-08 | 2006-03-23 | Fujitsu Ltd | Memory control device and control method |
US7287122B2 (en) * | 2004-10-07 | 2007-10-23 | International Business Machines Corporation | Data replication in multiprocessor NUCA systems to reduce horizontal cache thrashing |
US7370155B2 (en) * | 2005-10-06 | 2008-05-06 | International Business Machines Corporation | Chained cache coherency states for sequential homogeneous access to a cache line with outstanding data response |
EP1988464B1 (en) | 2006-02-24 | 2018-11-21 | Fujitsu Ltd. | Snoop control method and information processing device |
WO2007099583A1 (en) * | 2006-02-28 | 2007-09-07 | Fujitsu Limited | System controller and cache control method |
JP4965974B2 (en) * | 2006-11-14 | 2012-07-04 | ルネサスエレクトロニクス株式会社 | Semiconductor integrated circuit device |
EP2343655A4 (en) * | 2008-10-02 | 2012-08-22 | Fujitsu Ltd | Memory access method and information processing apparatus |
2010
- 2010-05-27: EP application EP10852152.7A, published as EP2579160A1 (not active, Withdrawn)
- 2010-05-27: JP application JP2012517051A, published as JP5348320B2 (not active, Expired - Fee Related)
- 2010-05-27: WO application PCT/JP2010/058971, published as WO2011148482A1 (active, Application Filing)
- 2010-05-27: KR application KR1020127030589A, published as KR101413787B1 (not active, IP Right Cessation)
- 2010-05-27: CN application CN2010800670338A, published as CN102906713A (active, Pending)

2012
- 2012-11-27: US application US13/686,171, published as US8856457B2 (not active, Expired - Fee Related)
Non-Patent Citations (1)
Title |
---|
See references of WO2011148482A1 * |
Also Published As
Publication number | Publication date |
---|---|
US20130086331A1 (en) | 2013-04-04 |
CN102906713A (en) | 2013-01-30 |
US8856457B2 (en) | 2014-10-07 |
KR101413787B1 (en) | 2014-06-30 |
WO2011148482A1 (en) | 2011-12-01 |
JP5348320B2 (en) | 2013-11-20 |
KR20130014573A (en) | 2013-02-07 |
JPWO2011148482A1 (en) | 2013-07-25 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11907528B2 (en) | Multi-processor bridge with cache allocate awareness | |
US5398325A (en) | Methods and apparatus for improving cache consistency using a single copy of a cache tag memory in multiple processor computer systems | |
US7085897B2 (en) | Memory management for a symmetric multiprocessor computer system | |
US7395379B2 (en) | Methods and apparatus for responding to a request cluster | |
EP1938190B1 (en) | Method and apparatus to clear semaphore reservation | |
US8015366B2 (en) | Accessing memory and processor caches of nodes in multi-node configurations | |
EP3788495B1 (en) | High-performance streaming of ordered write stashes to enable optimized data sharing between i/o masters and cpus | |
US9009372B2 (en) | Processor and control method for processor | |
JP2002182976A (en) | Dynamic serial conversion for memory access in multi-processor system | |
KR20180063820A (en) | An apparatus and method for transferring data between address ranges in memory | |
CN114860329B (en) | Dynamic consistency bias configuration engine and method | |
US20130262553A1 (en) | Information processing system and information transmitting method | |
CN108874687A (en) | For the non-unibus of tiled last level cache(NUB)Interconnection agreement | |
CN102834813A (en) | Update handler for multi-channel cache | |
US20170228164A1 (en) | User-level instruction for memory locality determination | |
US6973547B2 (en) | Coherence message prediction mechanism and multiprocessing computer system employing the same | |
US6546465B1 (en) | Chaining directory reads and writes to reduce DRAM bandwidth in a directory based CC-NUMA protocol | |
US7159079B2 (en) | Multiprocessor system | |
EP2579160A1 (en) | Information processing system and system controller | |
JP2006048406A (en) | Memory system controller and memory system control method | |
US7174430B1 (en) | Bandwidth reduction technique using cache-to-cache transfer prediction in a snooping-based cache-coherent cluster of multiprocessing nodes | |
US5987544A (en) | System interface protocol with optional module cache | |
US7653790B2 (en) | Methods and apparatus for responding to a request cluster | |
US20090240893A1 (en) | Information processing device, memory control method, and memory control device | |
US9983994B2 (en) | Arithmetic processing device and method for controlling arithmetic processing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
 | 17P | Request for examination filed | Effective date: 20121123 |
 | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR |
 | DAX | Request for extension of the european patent (deleted) | |
 | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
 | 18W | Application withdrawn | Effective date: 20140811 |