WO2003048955A1 - Multiprocessor system - Google Patents

Multiprocessor system

Info

Publication number
WO2003048955A1
WO2003048955A1 (WO 03/048955 A1), PCT/JP2002/012523
Authority
WO
WIPO (PCT)
Prior art keywords
processor
data
data cache
cache
multiprocessor system
Prior art date
Application number
PCT/JP2002/012523
Other languages
English (en)
Japanese (ja)
Inventor
Koji Hosogi
Kiyokazu Nishioka
Toru Nojiri
Kazuhiko Tanaka
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd. filed Critical Hitachi, Ltd.
Priority to JP2003550079A priority Critical patent/JPWO2003048955A1/ja
Publication of WO2003048955A1 publication Critical patent/WO2003048955A1/fr


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0813Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1027Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G06F12/1045Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846Cache with multiple tag or data arrays being simultaneously accessible
    • G06F12/0851Cache with interleaved addressing

Definitions

  • the present invention relates to a multiprocessor system, and more particularly to a technique for performing high-speed communication between processors.
  • a method of sharing a data cache among multiple processors with broadcast access (Japanese Patent Laid-Open No. 10-254779)
  • a method of maintaining data consistency by means of store-through cache memories and snoop control (Japanese Patent Application Laid-Open No. 8-2976464)
  • a method of using an interleaved cache with fixed addressing as the shared data cache (Japanese Patent Laid-Open No. 3-172690).
  • a first object of the present invention is to eliminate useless data transfer between processors in data communication between processors and prevent performance degradation.
  • a second object of the present invention is to prevent a decrease in the use efficiency of a cache memory due to a fixed interleave configuration in a system that uses a shared interleave cache between processors.
  • a multiprocessor system in which a plurality of processors each having a data cache and a main memory are connected by a bus. The system is provided with a data transfer engine that has an area for recording information specifying the data cache or main memory to be accessed, information specifying the addresses to be accessed, and information specifying the access type, and that issues load instructions and store instructions to the data cache or the main memory in accordance with the information recorded in that area.
  • a multiprocessor system in which a plurality of processors each having a data cache are connected by a bus, each processor having an area for setting with which processors the data cache is shared (including the case where it is not shared) and an area for setting the size of the data cache to be shared.
  • a multiprocessor system is provided which includes a determination unit that refers to these two areas to determine which processor should be accessed.
  • FIG. 1 is a block diagram for explaining the configuration of the first exemplary embodiment of the present invention.
  • FIG. 2 is a block diagram for explaining the configuration of the second embodiment of the present invention.
  • FIG. 3 is a block diagram for explaining the mapping control unit 22 according to the second embodiment of the present invention.
  • FIG. 4 is a diagram for explaining the boundary register 41 and the shared processor register 33 according to the second embodiment of the present invention.
  • FIG. 5 is a diagram for explaining the sharing of the data caches of the processors in the second embodiment of the present invention.
  • FIG. 6 is a block diagram for explaining the configuration of the third exemplary embodiment of the present invention.
  • FIG. 1 is a block diagram for explaining a configuration of a multiprocessor system according to the present embodiment.
  • NP processors 1 each having a data cache 2 therein are connected to an internal bus 10 and share a main memory 13.
  • the main memory 13 is connected to the internal bus 10 via a main memory control unit 12 including a main memory control circuit and an interface.
  • a data transfer engine 11 which is a characteristic part of the present embodiment is connected to the internal bus 10.
  • the data transfer engine 11 issues load instructions and store instructions to the data caches 2 in the processors 1 or to the main memory 13 connected to the internal bus 10, and has a function of controlling data transfer between a processor 1a and another processor 1b, or between a processor 1 and the main memory 13.
  • each processor 1 includes a data cache 2, a load / store control unit 3, a CPU 4, and an internal bus control unit 5.
  • the data cache 2 can be a general data cache including a data memory for storing a part of the data in the main memory 13 and a tag memory for storing address data.
  • the load / store control unit 3 is a control circuit for accessing the data cache 2.
  • the exchange of control signals, memory addresses, store data, load data, and the like between the load / store control unit 3 and the data cache 2 is performed via a path 6.
  • the CPU 4 can be, for example, a general-purpose CPU, a dedicated coprocessor for a specific use, or the like.
  • there are two routes by which access requests to the data cache 2 are issued to the load/store control unit 3. One is a normal load/store request issued by the CPU 4 to the load/store control unit 3 via a path 7; the other is a load/store request issued by the internal bus control unit 5 to the load/store control unit 3 via a path 8.
  • the load/store request issued by the internal bus control unit 5 does not originate from the internal bus control unit 5 as the request master; rather, the data transfer engine 11, acting as the request master, issues it to the internal bus control unit 5 of each processor 1 via the internal bus 10. At this time, the internal bus control unit 5 operates as a slave module on the internal bus 10.
  • the load/store control unit 3 arbitrates these two types of load/store requests and accesses the data cache 2 via the path 6. That is, in the present embodiment, when communication between the processor 1a and the processor 1b is required (for example, when transferring the contents of the data cache 2a of one processor 1a to the data cache 2b of another processor 1b), or when data communication between the data cache 2 of a processor 1 and the main memory 13 is required (for example, when prefetching data), the data transfer engine 11 controls the process of reading data from the source and writing it to the destination.
  • the data transfer engine 11 is an engine that can issue, via the internal bus 10, load instructions that read data from the data cache 2 of a processor 1 in the multiprocessor system and store instructions that write data to the data cache 2. Similarly, it can issue load instructions and store instructions to the main memory 13 via the main memory control unit 12.
  • the processor 1 or the main memory control unit 12 that becomes the slave for this access has identification information, and the data transfer engine 11 uses the identification information to access the desired processor 1 or main memory control unit 12.
  • the data transfer engine 11 can be configured to include, for example, an internal bus interface 111, an address generator 112, and a buffer 113.
  • the address generator 112 generates the addresses used to read/write data to/from the modules connected to the internal bus 10 (the data caches 2 in the processors 1 and the main memory 13).
  • the address generator 112 also generates a selection signal for specifying which module is to be accessed.
  • in order to perform these processes, the address generation unit 112 is provided with a group of registers indicating the start address, width, pitch, number of repetitions, module identification information, buffer 113 entry number (information specifying the storage location), and access type (read/write).
  • the address generation unit 112 can hold a plurality of sets, with these registers forming one set.
  • the value of each register can be set by software, for example, via the operating system.
  • the address generator 112 generates addresses based on the start address, width, pitch, and number of repetitions set in the registers (these are referred to as "address generation information"); for example, it can generate the addresses that specify a two-dimensional area 121. Based on the identification information, it determines which processor 1 or main memory 13 is to be accessed. The generated address and the selection signal are transmitted to the internal bus interface 111.
  • the address generation information generated by the address generator is not limited to this form.
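  • As an illustration only (not part of the patent text), the following C sketch shows how an address generator of this kind might walk a two-dimensional area defined by a start address, width, pitch, and repetition count. The descriptor field names and the 4-byte access unit are assumptions made for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical descriptor mirroring the register set described above:
   start address, width, pitch, repeat count, module id, buffer entry, r/w. */
struct xfer_desc {
    uint32_t start;   /* start address of the 2-D region            */
    uint32_t width;   /* bytes read/written per row                  */
    uint32_t pitch;   /* distance in bytes between consecutive rows  */
    uint32_t repeat;  /* number of rows                              */
    int      module;  /* which processor's cache / the main memory   */
    int      entry;   /* buffer 113 entry number                     */
    int      write;   /* 0 = read, 1 = write                         */
};

/* Walk the 2-D region row by row, emitting one address per access unit.
   The 4-byte access unit is an assumption for illustration only.      */
static void generate_addresses(const struct xfer_desc *d)
{
    for (uint32_t r = 0; r < d->repeat; r++) {
        uint32_t row = d->start + r * d->pitch;
        for (uint32_t off = 0; off < d->width; off += 4)
            printf("module %d %s 0x%08x\n",
                   d->module, d->write ? "store" : "load", row + off);
    }
}

int main(void)
{
    struct xfer_desc d = { 0x1000, 16, 64, 4, /*module=*/0, /*entry=*/0, 0 };
    generate_addresses(&d);   /* 4 rows of 16 bytes, 64 bytes apart */
    return 0;
}
```

  • For the descriptor in main(), this emits four rows of four 4-byte accesses spaced 64 bytes apart, which is the kind of rectangular region 121 referred to above.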
  • at the time of a write access (when the register indicates a write), the data transfer engine 11 reads data from the buffer 113 according to the entry number and transfers it to the internal bus interface 111.
  • the internal bus interface 111 determines the output destination based on the input address and selection signal, and outputs the data read from the buffer 113 via the internal bus 10.
  • at the time of a read access, the data read via the internal bus 10 is transferred to the buffer 113, and the buffer 113 stores the data in the entry number set by the register.
  • For example, suppose the following two register sets are set:
  • Address generation information: A0; identification information: processor 0; entry number: B0; read/write: Read
  • Address generation information: A1; identification information: processor 1; entry number: B0; read/write: Write
  • This means that the data in the address area specified by the address generation information A0 is read from the processor 0 and stored in the buffer entry number B0, and the data stored in the entry number B0 of the buffer is written to the address area of the processor 1 specified by the address generation information A1; that is, data is transferred from the processor 0 to the processor 1 (see the sketch below).
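  • A minimal sketch of the two-phase transfer just described (read from processor 0 into buffer entry B0, then write the same entry to processor 1), with plain arrays standing in for the data caches and the buffer 113; the entry size and function names are assumptions for illustration, not the patent's interfaces.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define ENTRY_BYTES 256

static uint8_t cache0[ENTRY_BYTES];          /* stand-in for processor 0's data cache */
static uint8_t cache1[ENTRY_BYTES];          /* stand-in for processor 1's data cache */
static uint8_t buffer113[4][ENTRY_BYTES];    /* buffer 113 with 4 entries             */

/* Read phase (A0): loads from the source module into buffer entry B0. */
static void engine_read(const uint8_t *src, int entry)
{
    memcpy(buffer113[entry], src, ENTRY_BYTES);
}

/* Write phase (A1): stores from buffer entry B0 into the destination module. */
static void engine_write(uint8_t *dst, int entry)
{
    memcpy(dst, buffer113[entry], ENTRY_BYTES);
}

int main(void)
{
    memset(cache0, 0xAB, sizeof cache0);
    engine_read(cache0, 0);     /* address generation info A0, entry B0, Read  */
    engine_write(cache1, 0);    /* address generation info A1, entry B0, Write */
    printf("transferred: %s\n", memcmp(cache0, cache1, ENTRY_BYTES) ? "no" : "yes");
    return 0;
}
```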
  • the data transfer engine 11 can carry out the data transfer between the processor 1a and the processor 1b, or between a processor 1 and the main memory 13, in parallel with the processing performed by the CPU 4 of each processor 1. Moreover, since the data transfer engine 11 transfers only the necessary data from a specific source to a specific destination, no traffic due to unnecessary data transfer occurs.
  • the data transfer engine 11 is started by an interrupt or by polling, and synchronization with the processors 1 is thereby achieved.
  • in the case of a store, the load/store control unit 3 performs a write process on the data cache 2 via the path 6.
  • in the case of a load, if the data cache 2 hits, the data is read from the data cache 2 and the load data is returned via the path 7 or 8 to the CPU 4 or the internal bus control unit 5 that requested the access.
  • in the case of a data cache miss, the same sequence as for a cache miss at the time of a store is performed, and the load data is then returned to the CPU 4 or the internal bus control unit 5.
  • in this way, the target data can be reliably placed in the data cache 2; that is, the mechanism can operate, for example, as a prefetch into the data cache 2.
  • the multiprocessor system of the second embodiment includes an interface for sharing the data caches among a plurality of processors each having a data cache.
  • it is possible to set with which processors in the system the data cache is shared and what size of the data cache is allocated.
  • FIG. 2 is a block diagram illustrating a configuration of a multiprocessor system according to the second embodiment.
  • the multiprocessor system is composed of NP processors 20, each having a data cache 26, connected by a global bus 28.
  • each processor 20 includes a data cache 26, a CPU 4, a load/store control unit 21, and a mapping control unit 22.
  • the global bus 28 has an arbiter 25 for bus arbitration.
  • the load/store control unit 21 processes load instructions and store instructions to the data cache 26 issued by the local CPU 4 (the CPU of its own processor) or by the CPU 4 of another processor 20 that shares the data cache 26.
  • when the data cache 26 is to be loaded or stored, the mapping control unit 22 determines the data cache 26 of which processor 20 on the global bus 28 should be accessed.
  • when the mapping control unit 22 determines that the data cache 26 in its own processor 20 (the local processor 20) is to be accessed, the load/store control unit 21 accesses the local data cache 26 via a local bus 27.
  • when the mapping control unit 22 determines that the data cache 26 in another processor 20 (another processor 20 connected by the global bus 28) is to be accessed, the load/store control unit 21 accesses the data cache 26 in the target processor 20 via the global bus 28.
  • the mapping control unit 22 will be described in more detail with reference to FIG. 3.
  • the mapping control unit 22 includes a shared processor register 33 indicating with which processors 20 the data cache is to be shared, a boundary register 30 indicating the size of the data cache to be allocated, and a shifter 31 that shifts an address 23 input from the load/store control unit 21 in accordance with the value of the boundary register 30.
  • each processor 20 in the multiprocessor is provided with a processor ID for identifying the processor, and the mapping control unit 22 holds a processor ID 36 indicating the processor 20 itself.
  • the mapping control unit 22 also has a selector 34 that outputs a processor selection signal 24 to the arbiter 25 based on the output of the shifter 31, the processor ID 36, and the value of the shared processor register 33.
  • the values of the shared processor register 33 and the boundary register 30 can be set, for example, via the operating system that controls the multiprocessor system.
  • the application software executed on the multiprocessor system sets the values of the shared processor register 33 and the boundary register 30 so that data cache sharing suited to its execution can be achieved.
  • the shared processor register 33 is a register that specifies with which of the plurality of processors 20 connected to the global bus 28 the data cache is to be shared.
  • when the lower m bits of the shared processor register 33 are set to "1", data cache sharing is performed among 2^m processors 20; that is, sharing takes place among 2, 4, 8, ... processors.
  • when a processor 20 shares its data cache with all other processors 20, all bits of the shared processor register 33 are set to "1".
  • processors 20 that share their data caches with each other have the same value in their shared processor registers 33.
  • next, the boundary register 30, which is a register indicating the allocated size of the data cache 26, will be described.
  • a minimum address boundary C (in bytes), which is the minimum unit of the data cache allocation size, is determined in advance. The minimum address boundary C can, for example, be set via the operating system. When an interleave configuration at the minimum address boundary is adopted for data cache sharing, all bits of the boundary register 30 are set to "0". Each time the allocated size of the data cache 26 is doubled relative to the minimum address boundary C, a "1" is set sequentially from the lower bits of the boundary register 30. For example, the size allocated to the data cache 26 of a processor 20 in which only the lower 2 bits of the boundary register 30 are "1" is four times the minimum address boundary C.
  • the format of the boundary register 30 and the shared processor register 33 is not limited to the above example.
  • next, the processing of the mapping control unit 22 will be described.
  • the address 23 input to the mapping control unit 22 is shifted by the shifter 31 based on the value of the boundary register 30, and becomes a shift address 35.
  • the shift address 35 consists of the address bits <log2(minimum address boundary C) + Σ{boundary register 30} + log2(NP) - 1 : log2(minimum address boundary C) + Σ{boundary register 30}>.
  • the boundary register 30 thus specifies the address space to be interleaved.
  • Σ{boundary register 30} denotes the sum of the bits of the boundary register; for example, when the boundary register is represented by 3 bits, Σ{boundary register 30} = Σ{bit 0, bit 1, bit 2}.
  • the shift address 35 when the minimum address boundary C is 1 KB (and NP = 8) is as follows:
  • boundary register 000: <12:10>
  • boundary register 001: <13:11>
  • boundary register 011: <14:12>
  • boundary register 111: <15:13>
  • for example, when the value of the boundary register 30 is 011, the minimum address boundary C is 1 KB, and the input address is <31:0>,
  • the shift address 35 is <log2(1 KB) + Σ{0,1,1} + log2(8) - 1 : log2(1 KB) + Σ{0,1,1}>, that is, address bits <14:12> (see the sketch below).
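  • Assuming the parameters above (minimum address boundary C = 1 KB, NP = 8), the bit-field extraction can be sketched in C as follows; the helper names are hypothetical and not taken from the patent.

```c
#include <stdint.h>
#include <stdio.h>

/* Number of '1' bits in the boundary register, i.e. how many doublings
   above the minimum address boundary C are allocated.                  */
static int ones(uint32_t v) { int n = 0; while (v) { n += v & 1; v >>= 1; } return n; }

/* Integer log2 for power-of-two inputs. */
static int log2u(uint32_t v) { int n = 0; while (v > 1) { v >>= 1; n++; } return n; }

/* Extract shift address 35: address bits
   < log2(C) + sum(boundary) + log2(NP) - 1 : log2(C) + sum(boundary) > */
static uint32_t shift_address(uint32_t addr, uint32_t c_bytes,
                              uint32_t boundary_reg, uint32_t np)
{
    int lo = log2u(c_bytes) + ones(boundary_reg);
    int width = log2u(np);
    return (addr >> lo) & ((1u << width) - 1);
}

int main(void)
{
    /* Example from the text: C = 1 KB, boundary register = 011, NP = 8
       -> shift address 35 is address bits <14:12>.                     */
    uint32_t addr = 0x00005000;                        /* bits <14:12> = 101 */
    printf("%u\n", shift_address(addr, 1024, 0x3, 8)); /* prints 5          */
    return 0;
}
```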
  • the shift address 35 and the processor ID 36 are input to the selector 34.
  • the selector 34 generates a processor selection signal 24 indicating the ID of the processor to be accessed, based on the value of the shared processor register 33, and transmits it to the arbiter 25.
  • the selection is made bit by bit according to the value of the shared processor register 33: where a bit of the shared processor register 33 is "1", the corresponding bit of the shift address 35 is selected, and where it is "0", the corresponding bit of the processor ID 36 is selected; the processor selection signal 24 is generated in this way (see the sketch below).
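  • The bit-by-bit selection amounts to a simple mask-and-merge operation, sketched below under the same assumptions; the function name is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Selector 34: for each bit, take the shift address 35 where the shared
   processor register 33 has a '1', and the local processor ID 36 where it
   has a '0'.  The result is the processor selection signal 24.           */
static uint32_t select_processor(uint32_t shift_addr, uint32_t proc_id,
                                 uint32_t shared_reg)
{
    return (shift_addr & shared_reg) | (proc_id & ~shared_reg);
}

int main(void)
{
    /* Processor 5 (binary 101) with shared processor register 011: the two
       low bits come from the shift address, the top bit stays from the
       processor ID, so the access is routed to processor 6 (binary 110). */
    printf("%u\n", select_processor(/*shift_addr=*/2, /*proc_id=*/5, /*shared=*/3));
    return 0;
}
```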
  • when the mapping control unit 22 determines that the local data cache 26 is to be accessed, the load/store control unit 21 uses the local bus 27 to execute the load instruction or store instruction on the local data cache 26.
  • when the mapping control unit 22 determines that the data cache 26 in another processor 20 is to be accessed, the load/store control unit 21 requests the bus right from the arbiter 25, and once the bus right is obtained, uses the global bus 28 to execute the load instruction or store instruction on the data cache 26 in the other processor 20.
  • the data cache 26 being accessed arbitrates between accesses from the local bus 27 and accesses from the global bus 28, and then actually accesses the data memory and tag memory within the data cache 26.
  • a further specific example of a multiprocessor system using this method will be described with reference to FIGS. 4 and 5.
  • the multiprocessor system is composed of eight processors (with processor IDs assumed to run from 000 to 111 in binary), and the minimum address boundary C is set to 1 KB.
  • FIG. 5 is a diagram showing an image of data cache sharing of the multiprocessor system shown in FIG.
  • processor 0 (000) and processor 1 (001) operate with individual, non-shared data caches (distributed caches; reference numerals 45, 46).
  • processor 2 (010) and processor 3 (011) operate as a multiprocessor connected by a shared data cache, in an interleave configuration in which addresses 0 to (4 KB - 1) are allocated to processor 2 (010) and addresses 4 KB to (8 KB - 1) to processor 3 (011) (reference numeral 47).
  • processor 4 (100) through processor 7 (111) operate as a multiprocessor connected by a shared data cache, with an interleave address boundary of 1 KB (reference numeral 48); one possible register assignment for this configuration is sketched below.
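  • The values below would reproduce this configuration under the encodings described above (3-bit registers, C = 1 KB); they are an illustration and are not taken from the patent text.

```c
#include <stdint.h>

/* Shared processor register 33 and boundary register 30 per processor,
   chosen to match the FIG. 5 example (an assumption for illustration). */
struct map_cfg { uint32_t shared_reg33; uint32_t boundary_reg30; };

static const struct map_cfg cfg[8] = {
    [0] = { 0x0, 0x0 },  /* processor 0: cache not shared                      */
    [1] = { 0x0, 0x0 },  /* processor 1: cache not shared                      */
    [2] = { 0x1, 0x3 },  /* processors 2-3: shared in a pair, 4 KB (4*C) each  */
    [3] = { 0x1, 0x3 },
    [4] = { 0x3, 0x0 },  /* processors 4-7: shared four ways, interleaved at C */
    [5] = { 0x3, 0x0 },
    [6] = { 0x3, 0x0 },
    [7] = { 0x3, 0x0 },
};
```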
  • in this way, the application software executed on this multiprocessor system can obtain data cache sharing suited to its execution, and a decrease in cache memory usage efficiency caused by a fixed interleave configuration can be prevented.
  • since each processor 20 may be located at a physically remote position and bus arbitration is also involved, the latency of accessing the data cache 26 of another processor is generally larger than the latency of accessing the local data cache 26. Moreover, the main cause of an increase in the processor 20's stall cycles is load instructions. Therefore, by imposing the constraint that only store instructions may be executed for non-local accesses, the logic scale of the arbiter 25 and the global bus 28 can be reduced.
  • FIG. 6 shows a third embodiment of the present invention.
  • the basic configuration is a combination of the two embodiments described above: a multiprocessor system in which NP processors 50 are connected by both an internal bus 10 and a global bus 28. The same parts as those in the first and second embodiments are denoted by the same reference numerals.
  • the load/store control unit 51 has the functions of both the load/store control unit 3 of the first embodiment and the load/store control unit 21 of the second embodiment.
  • since the data transfer by the data transfer engine 11 of the first embodiment is started by an interrupt or by polling, overhead is incurred before the transfer starts.
  • however, since data can be transferred independently of the processors 50, large amounts of data can be transferred at high speed.
  • in the shared data cache system of the second embodiment, connected by the global bus 28 and the arbiter 25, both large-capacity and small-capacity data transfers are possible, but the latency of load instructions to the data cache 26 of another processor is high.
  • the present embodiment therefore adopts a configuration that takes these two characteristics into account and allows the application software to use whichever is appropriate.
  • the present invention can be applied to data communication in a multiprocessor system, for the purposes of eliminating useless data transfers between a plurality of processors each provided with a data cache, and of preventing, in a multiprocessor that uses an interleaved cache, a decrease in cache memory usage efficiency caused by a fixed interleave configuration.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Multi Processors (AREA)

Abstract

According to the present invention, in data communication between processors, useless data transfers between the processors can be eliminated, which prevents performance degradation. Furthermore, in a multiprocessor using an interleaved cache memory, it is possible to prevent the cache memory usage efficiency from decreasing due to the interleave configuration being fixed. The multiprocessor system of the present invention comprises a plurality of processors (50) having a data cache (26) and a main memory (13), which are connected to one another via a bus (10). The system comprises a data transfer engine (11) having an area for storing information relating to data access and issuing load instructions and store instructions to the data cache (26) in accordance with that information. Each of the processors (50) comprises determination means (22) for storing information relating to the sharing of the data cache and for referencing this information upon receiving an address to be accessed, so as to determine which processor should be accessed.
PCT/JP2002/012523 2001-12-03 2002-11-29 Systeme multiprocesseur WO2003048955A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003550079A JPWO2003048955A1 (ja) 2001-12-03 2002-11-29 マルチプロセッサシステム

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2001/369009 2001-12-03
JP2001369009 2001-12-03

Publications (1)

Publication Number Publication Date
WO2003048955A1 true WO2003048955A1 (fr) 2003-06-12

Family

ID=19178486

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2002/012523 WO2003048955A1 (fr) 2001-12-03 2002-11-29 Systeme multiprocesseur

Country Status (2)

Country Link
JP (1) JPWO2003048955A1 (fr)
WO (1) WO2003048955A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5710907A (en) * 1995-12-22 1998-01-20 Sun Microsystems, Inc. Hybrid NUMA COMA caching system and methods for selecting between the caching modes
WO2000022538A1 (fr) * 1998-10-14 2000-04-20 Hitachi, Ltd. Dispositif de transfert de donnees
US6314491B1 (en) * 1999-03-01 2001-11-06 International Business Machines Corporation Peer-to-peer cache moves in a multiprocessor data processing system
US20020174301A1 (en) * 2001-05-17 2002-11-21 Conway Patrick N. Method and system for logical partitioning of cache memory structures in a partitioned computer system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HIKARU SAMUKAWA: "Heiretsusei no tsuikyu -programmer kara mita RISC keisanki no uchigawa-", TUTORIALS SHIRYO, 5 June 2001 (2001-06-05), pages 1 - 32, XP002967002 *
KEIJI KOJIMA, KIYOKAZU NISHIOKA, TORU NOJIRI: "Media processor(MAP) no vision to architecture", INFORMATION PROCESSING SOCIETY OF JAPAN DAI 59 KAI (HEISEI 11 NEN KOKI) ZENKOKU TAIKAI TOKUBETSU SESSION (1) KOEN RONBUNSHU, 28 September 1999 (1999-09-28), pages 87 - 92, XP002967001 *
KOJI HOSOKI ET AL.: "Media processor MAPCA no data tenso hoshiki", vol. 2002, no. 9, 1 February 2002 (2002-02-01), pages 91 - 95, XP002965888 *
MASAO ISHIGURO ET AL.: "VLIW-gata media processor o mochiita MPEG-4 video decoder", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS KENKYU HOKOKU, vol. 101, no. 456, 15 November 2001 (2001-11-15), pages 31 - 36, XP002965887 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012008774A (ja) * 2010-06-24 2012-01-12 Fujitsu Ltd キャッシュ装置、及び情報処理装置
CN113222115A (zh) * 2021-04-30 2021-08-06 西安邮电大学 面向卷积神经网络的共享缓存阵列
CN113222115B (zh) * 2021-04-30 2024-03-01 西安邮电大学 面向卷积神经网络的共享缓存阵列

Also Published As

Publication number Publication date
JPWO2003048955A1 (ja) 2005-08-11

Similar Documents

Publication Publication Date Title
US7120755B2 (en) Transfer of cache lines on-chip between processing cores in a multi-core system
US5754800A (en) Multi processor system having dynamic priority based on row match of previously serviced address, number of times denied service and number of times serviced without interruption
JP2512651B2 (ja) メモリ共有マルチプロセッサ
JP3722415B2 (ja) 効率的なバス機構及びコヒーレンス制御を有する繰り返しチップ構造を有するスケーラブル共用メモリ・マルチプロセッサ・コンピュータ・システム
US7668997B2 (en) High speed bus system that incorporates uni-directional point-to-point buses
JP2010191638A (ja) キャッシュ装置
US6996693B2 (en) High speed memory cloning facility via a source/destination switching mechanism
US7069394B2 (en) Dynamic data routing mechanism for a high speed memory cloner
US6892283B2 (en) High speed memory cloner with extended cache coherency protocols and responses
JPH0810447B2 (ja) メモリ共有マルチプロセッサが使用する全ての物理的アドレスのデータ両立性を保持する方法
JP5439808B2 (ja) 複数バスを有するシステムlsi
US6898677B2 (en) Dynamic software accessibility to a microprocessor system with a high speed memory cloner
US7502917B2 (en) High speed memory cloning facility via a lockless multiprocessor mechanism
WO2003048955A1 (fr) Systeme multiprocesseur
JPS6237752A (ja) 別々の命令及びデ−タインタ−フエ−ス及びキヤツシユを持つたマイクロプロセサを有するマルチプルバスシステム
JP2005128963A (ja) 記憶制御装置及びdma転送が可能な制御システム
US7356649B2 (en) Semiconductor data processor
US6928524B2 (en) Data processing system with naked cache line write operations
US8015326B2 (en) Central processing apparatus, control method therefor and information processing system
WO2001050267A2 (fr) Antememoire double a fonctionnement d'interconnexion multiple
Dixon Page associative caches on Futurebus
JP2003216566A (ja) マイクロコンピュータ
JPH06195263A (ja) キャッシュ・メモリ・システム
JP2000057053A (ja) キャッシュメモリ装置
JP2009042992A (ja) バス制御装置

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2003550079

Country of ref document: JP

122 Ep: pct application non-entry in european phase