WO2007017376A1 - Method and device for storing data and/or commands in a computer system having at least two processing units and at least one first memory or memory area for data and/or commands - Google Patents

Method and device for storing data and/or commands in a computer system having at least two processing units and at least one first memory or memory area for data and/or commands

Publication number
WO2007017376A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
data
access
commands
cache
Prior art date
Application number
PCT/EP2006/064661
Other languages
German (de)
English (en)
Inventor
Reinhard Weiberle
Bernd Mueller
Eberhard Boehl
Yorck Collani
Rainer Gmehlich
Original Assignee
Robert Bosch Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch Gmbh
Priority to JP2008525519A priority Critical patent/JP2009505181A/ja
Priority to EP06777976A priority patent/EP1915695A1/fr
Publication of WO2007017376A1 publication Critical patent/WO2007017376A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/084 Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844 Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846 Cache with multiple tag or data arrays being simultaneously accessible
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844 Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0853 Cache with multiport tag or data arrays

Definitions

  • Computer system with at least two processing units and at least one first memory or memory area for data and / or commands
  • the present invention relates to microprocessor systems with fast cache memory and in this context describes a dual port cache.
  • Microcontrollers with at least two integrated cores are known from the state of the art. These microcontrollers are also called dual-core or multi-core architectures.
  • The at least two cores execute the same programs, program segments or commands redundantly and clock-synchronously; the results of the two cores are compared, and an error is detected if the comparison for agreement fails.
  • this configuration of a multi-core system is referred to as a comparison mode.
  • Dual-core or multi-core architectures are also used in other applications to increase performance. Both cores execute different programs, program segments and commands, which can improve performance; this configuration of a multi-core system is therefore called a performance mode.
  • This system is also referred to as a symmetric multiprocessor system (SMP).
  • These systems can be extended by means of switching, i.e., depending on its purpose, the multiprocessor system can be operated in a comparison mode or in a performance mode. In comparison mode, the output signals of the cores are compared with each other. If there is a difference, an error signal is output. In performance mode, the two cores work as a symmetric multiprocessor system.
  • Starting from a certain clock frequency, microprocessors are equipped with fast buffer memories (caches) to speed up access to instructions and data.
  • Caches are implemented in memory technologies that provide fast access to memory contents and are physically located close to the processor.
  • A subset of the data and/or instructions from main memory is held in a cache.
  • The use of a cache memory partly avoids or reduces slow accesses to a large (main) memory, so the processor does not have to wait for the delayed provision of the data by the main memory.
  • Caches for commands only and for data only are known, as well as "unified caches" in which both data and instructions are stored in the same cache. Systems with several levels (hierarchy levels) of caches are also known.
  • Such multi-level caches are used to optimally adjust the speeds between the processor and the (main) memory using graduated memory sizes and various addressing strategies of the caches at the different levels.
  • it is customary to equip each processor with a cache or, in the case of multi-level caches, with correspondingly more hierarchically structured caches.
  • If the processors each have a fixed cache and can also be switched between different operating modes of the processor system, in which they either execute different programs, program segments or commands (performance mode) or execute the same programs, program segments or commands and compare the results (comparison mode), then the data or commands in the parallel caches of each controller must either be deleted when switching between the operating modes, or they must be provided, when the caches are loaded, with the corresponding information on the respective operating mode, preferably together with the
  • the object of the invention is to provide means and methods to optimize the size of the cache and to accelerate the process of switching between a performance mode and a comparison mode.
  • the data / instructions do not have to be fetched several times into the cache for the different execution units and / or the different operating modes and possibly maintained.
  • the data / instructions in the cache do not need to be differentiated as to whether they are used in different operating modes of the multiprocessor system.
  • the data / instructions in the cache do not need to be distinguished in which operating modes of the multiprocessor system they are processed and / or in which modes of operation they have been cached or written and / or from which core they were requested or written.
  • the cache does not have to be deleted when changing the operating mode, so the cache contents have to be loaded less often and the access to the main memory is reduced. Switching can be faster than in systems that have multiple caches.
  • Two processors can read the same data / instructions in the cache simultaneously.
  • a "write back" mode can be used for the cache, which saves time, in particular because the (main) memory does not always have to be updated but only when the data in the cache are overwritten; there are no consistency issues because the cache provides the data for both processors from the same source.
  • a method for storing data and / or commands in a computer system having at least two processing units and at least one first memory or memory area for data and / or commands, wherein switching means are provided that switch between at least two operating modes, and comparison means are provided, a first operating mode corresponding to a comparison mode and a second operating mode to a performance mode, characterized in that a second memory or memory area is included in the device, the device being designed as a cache memory system equipped with at least two separate ports, via which the at least two processing units access the same or different memory cells of the second memory or memory area, wherein the data and / or commands from the first memory system are buffered in blocks and when switching.
  • a method is described, characterized in that the second memory or memory area is subdivided into at least two address areas which can be read or written independently of each other.
  • an address decoder is provided which generates enable signals which, in the case of simultaneous access to an address range through multiple ports, allow access by only one port and prevent or delay the access of the at least one other port, in particular by wait signals.
  • a method is described, characterized in that more than two ports are provided, selection devices being present and the access to the independent address areas being effected via the selection devices having a plurality of stages and for this purpose the enable signals being forwarded via these stages.
  • a method is described, characterized in that there is at least one mode signal which switches over the access possibilities of the different ports.
  • a method is described, characterized in that there is at least one configuration signal which switches the access possibilities of the different ports.
  • a method is described, characterized in that both processing units specify a read access.
  • a method is described, characterized in that, when a read access is specified by both processing units, the identifiers or access addresses associated with the data and / or commands are compared, and a read access to the cache is switched through only if they match.
  • a method is described, characterized in that, when a read access is specified, both processing units access the cache system.
  • a method is described, characterized in that the data and / or commands read by both processing units during a read access are compared and, in case of deviation, a signal, in particular an error signal, is generated.
  • a method is described, characterized in that, in a write access to the second memory or memory area, the data and / or commands to be written are compared and are written only if they match.
  • a method is described, characterized in that, in a write access, the processing units are stopped until the data and / or commands have been compared.
  • a method is described, characterized in that the data and / or commands are written to a buffer memory or buffer memory area and are taken over only after successful comparison in the second memory or memory area.
  • a method is described, characterized in that, in a write access to the second memory or memory area, the data and / or commands to be written are written by only one processing unit and, depending on a parallel comparison, the written data and / or commands are blocked or invalidated in the event of a discrepancy.
  • a method is described, characterized in that the blocking or invalidation is effected by setting or resetting at least one bit.
  • a method is described, characterized in that the blocking or invalidation is effected by overwriting with another, in particular old, datum and / or command.
  • a method is described, characterized in that the blocking or invalidation is effected by deleting a corresponding entry of a content table of the second memory or memory area.
  • Memory or memory area is included, wherein the device is designed as a cache memory system and is equipped with at least two separate ports, and via these ports the at least two processing units access the same or different memory cells of the second memory or memory area, wherein the data and / or commands from the first memory system are cached blockwise.
  • a device characterized in that data and commands are stored separately in the cache memory system, so that a data memory or data memory area and a command memory or command memory area are provided. Also advantageous is such a device, characterized in that the second memory or memory area is divided into at least two address areas that can be read or written independently.
  • an apparatus is advantageous, characterized in that an address decoder is provided which generates enable signals which, in the case of simultaneous access to one address range through multiple ports, allow access by only one port and prevent or delay the access of the at least one other port, in particular by wait signals.
  • such a device is advantageous, characterized in that more than two ports are provided, with selection devices being present and the access to the independent address areas being effected via the selection devices with multiple stages and for this purpose the enable signals being forwarded via these stages.
  • FIG. 1 shows a multiprocessor system with two execution units, with means for comparing data of the execution units and means for switching between at least two operating modes of the multiprocessor system.
  • FIG. 2 shows a dual-port cache memory for data and / or commands, which has an internal dual-port memory, two ports for connecting one execution unit each and an interface to the data / address bus of the multiprocessor system.
  • FIG. 3 shows a dual port cache with further details.
  • FIG. 4 shows a device and a method for address transformation in a dual port cache.
  • FIG. 5 shows a division of the internal dual-port memory into two subareas which can be operated independently of one another and are accessed by two separate enable signals from each port.
  • FIG. 6 shows a realization of a dual-port memory area by a single
  • FIG. 7 shows the division of a multiple port memory with p ports into a plurality of sub-ports.
  • FIG. 7 a shows a realization of a multi-port memory area by a single-port memory by means of a port switchover.
  • FIG. 8 shows a breakdown of the RAM areas for the ports as a function of a system state or a configuration.
  • FIG. 9 shows a division of a multi-port RAM into areas as a function of one
  • FIG. 10 shows the division of a multi-port RAM into areas with multiple associative access.
  • FIG. 11/11a shows a multiprocessor system with two execution units and a dual port cache, which is preferably constructed symmetrically.
  • FIG. 12 shows a general case of a switching and comparing unit, also for use for more than two execution units.
  • FIG. 13 shows the internal switching state of the switching and comparison unit for write and read accesses to the cache memory in the performance mode.
  • FIG. 14 shows the internal switching state of the switching and comparison unit for read accesses to the cache memory in a first embodiment of the comparison mode.
  • FIG. 15 shows the internal switching state of the switching and comparison unit for read accesses to the cache memory in a second embodiment of the comparison mode.
  • FIG. 16 shows a configuration of a switching and comparison unit with write access interrupting device to the cache memory in comparison mode.
  • FIG. 17 shows a configuration of a switching and comparison unit with buffering means for write access to the cache memory in the comparison mode.
  • Figure 18 shows a multiprocessor system with two execution units and with separate dual port cache memories for instructions and data.
  • In the following, an execution unit may refer to a processor, a core, a CPU, as well as an FPU (floating point unit), a DSP (digital signal processor), a coprocessor or an ALU (arithmetic logic unit).
  • the invention relates to a multiprocessor system (W100) shown in FIG. 1 having at least two execution units (W110a, W110b), a comparison unit (W120) and a switching unit (W150).
  • the execution units are each connected via signal lines or buses (W112a, W112b) to the comparison unit (W120) and the switching unit
  • the switching unit (W150) has at least two outputs to two system interfaces (W130a, W130b). Memory or peripherals such as digital outputs, D / A converters and communication controllers can be controlled via these interfaces.
  • An execution unit can be implemented as a processor / core / CPU, as well as an FPU (floating point unit), DSP (digital signal processor), coprocessor or ALU (arithmetic logic unit).
  • This multiprocessor system can be operated in at least two modes of operation, a compare mode VM and a performance mode PM.
  • In the performance mode PM, different commands, program segments or programs are executed in parallel in the different execution units.
  • The comparison unit is deactivated.
  • the switching unit (W150) is configured in this operating mode so that each execution unit is exclusively connected to one of the system interfaces (W130a, W130b).
  • the system interfaces can be used to write a result of an execution unit into a memory (W170) or to output it to a peripheral block (W180, W190).
  • a peripheral module may be, for example, an analog-digital converter or a communication controller of a communication system (eg SPI, LIN, CAN, FlexRay).
  • The comparator can be ignored. Furthermore, the error signal itself can also be interrupted. All options have in common that they create a state in the system in which it does not matter whether two or more data items that could potentially be compared differ. If this state is reached by a measure in the comparator or on its input or output signals, the comparator is referred to as passive or deactivated.
  • In comparison mode VM, the same or similar commands, program segments or programs are executed in both execution units (W110a, W110b). Via the signal lines or buses (W112a, W112b), the output signals of the execution units are fed to the comparison unit (W120) and to the switching unit (W150). In the comparison unit, the two data are checked for agreement. After the comparison has been made, the switching unit is informed via a status signal (W125) whether it is allowed to output one of the matching results to one of the system interfaces or whether it has to block the signal because a discrepancy between the results has been detected. In the latter case, an optional error signal (W155) can be output by the comparison unit. This error signal can also be output by the switching unit (W156) instead of the comparison unit.
  • the comparison unit (W120) and the switching unit (W150) can also be combined into a combined switching and comparison unit (W520).
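The behaviour of the two operating modes for two execution units can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the circuit of the application: the function name `switch_and_compare`, the `SwitchResult` type and the convention that a blocked interface carries `None` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SwitchResult:
    # Values driven on the two system interfaces (None = nothing output / blocked).
    outputs: Tuple[Optional[int], Optional[int]]
    # Models the optional error signal raised on a mismatch.
    error: bool


def switch_and_compare(out_a: int, out_b: int, mode: str) -> SwitchResult:
    """Hypothetical model of a combined switching and comparison unit."""
    if mode == "PM":
        # Performance mode: comparator inactive, each execution unit
        # is connected exclusively to its own system interface.
        return SwitchResult((out_a, out_b), False)
    if mode == "VM":
        # Comparison mode: output one of the matching results,
        # or block the output and flag an error on a discrepancy.
        if out_a == out_b:
            return SwitchResult((out_a, None), False)
        return SwitchResult((None, None), True)
    raise ValueError("mode must be 'PM' or 'VM'")
```

In this sketch the matching result is arbitrarily routed to the first interface; the text only requires that it go to one of the system interfaces.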
  • n signals N140, ..., N14n go to the switching and comparison component W520. This can generate up to n output signals N160, ..., N16n from these input signals.
  • In the "pure performance mode", all signals N14i are directed to the corresponding output signals N16i.
  • In the "pure comparison mode", all signals N140, ..., N14n are routed to exactly one of the output signals N16i.
  • With n execution units and n > 2, more than just two operating modes are conceivable.
  • the logical component of a switching logic N110 is included in this figure. This first determines how many output signals there are. Furthermore, the switching logic N110 determines which of the input signals contribute to which of the output signals. An input signal can contribute to exactly one output signal. In mathematical terms, the switching logic thus defines a function that assigns an element of the set {N160, ..., N16n} to each element of the set {N140, ..., N14n}.
  • the processing logic N120 determines for each of the outputs N16i how the inputs contribute to that output signal.
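The split between switching logic (a function from inputs to outputs) and processing logic (how the assigned inputs form each output) can be illustrated as follows. This is a minimal sketch: the function name `switch_and_process` and its parameters are hypothetical, and plain equality comparison is assumed as the processing operation, although the text leaves the operation per output open.

```python
from typing import List, Optional, Tuple


def switch_and_process(
    inputs: List[int], assignment: List[int], n_outputs: int
) -> Tuple[List[Optional[int]], List[bool]]:
    """assignment[i] is the output index that input i contributes to
    (the function defined by the switching logic)."""
    groups: List[List[int]] = [[] for _ in range(n_outputs)]
    for i, value in enumerate(inputs):
        groups[assignment[i]].append(value)

    outputs: List[Optional[int]] = []
    errors: List[bool] = []
    for members in groups:
        if not members:
            # No input routed to this output.
            outputs.append(None)
            errors.append(False)
        elif all(v == members[0] for v in members):
            # Assumed processing logic: all contributing inputs must agree.
            outputs.append(members[0])
            errors.append(False)
        else:
            # Disagreement: block the output and raise its error flag.
            outputs.append(None)
            errors.append(True)
    return outputs, errors
```

With `assignment = [0, 1, 2]` this reproduces the pure performance mode, with `assignment = [0, 0, 0]` the pure comparison mode, and mixed assignments such as `[0, 0, 1]` give the additional modes that become possible for more than two execution units.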
  • the configuration of the switchover and comparison unit depends on or defines the operating mode of the multiprocessor system. In order to ensure consistent information about the operating mode within the system and, if necessary, to communicate this to external units, it is advantageous to identify the information about the operating mode in one of the system components and to make it available in one or more signals.
  • this signal can be generated in the switching and comparison unit and made available as a mode signal N150 to other parts of the system.
  • an error signal N170 is shown in this figure.
  • the optional error signal is generated by fault circuit logic N130, which collects the error signals, and is either a direct forwarding of the single error signals or a bundling of the error information contained therein.
  • the mode signal N150 is optional, but its use outside of this component is advantageous in many places.
  • The combination of the information of the switching logic N110 (i.e., the above-mentioned function) and the processing logic (i.e., the determination of the comparison operation per output signal, i.e., per function value) constitutes the mode information, which defines or reflects the operating mode of the multiprocessor system.
  • this information is of course multi-valued, i.e., it cannot be represented by a single logical bit. Not all theoretically conceivable modes are useful in a given implementation; it is preferable to restrict the number of permitted modes.
  • the mode signal then brings the relevant mode information to the outside.
  • A hardware implementation is preferably designed so that the externally visible mode signal can be configured.
  • the processing logic and switching logic are preferably also designed to be configurable. Preferably, these configurations are coordinated with each other. Alternatively, only changes of the mode signal may be output, or these may be output in addition. This is advantageous especially in a configuration with two execution units.
  • the present invention proposes the use of a dual port cache memory in a multiprocessor system having at least two execution units. Such a configuration is particularly advantageous if the multiprocessor system can switch between at least two operating modes, for example the comparison mode described above and the performance mode.
  • FIG. 2 shows a dual port cache 200, which essentially consists of a dual port
  • the dpRAM 230 is preferably provided with two independent address decoders, two data read / write stages and, in contrast to a single memory cell matrix, also with duplicated word and bit lines, so that at least the read operation for any memory cells of the dpRAMs from both ports can be done simultaneously.
  • a dual port RAM is therefore understood to mean any RAM that has two ports 231 and 232 which can be used independently of one another, regardless of how much time is needed to complete a read or write request from a port, i.e., how long it takes for the requested read or write operation to complete when it interacts with requests from the other port.
  • the two ports of the dpRAM are connected via the signals 201 and 202 to the devices 210 and 220, respectively, which examine the incoming addresses, data and control signals 211 and 221, respectively, of the independent execution units 215 and 225 and optionally transform the addresses.
  • the execution units 215 and 225 correspond to the execution units W110a and W110b from FIG. 1.
  • When reading, the data are output from the cache via 201 through 210 to 211, or via 202 through 220 to 221; when writing, they travel in the reverse direction from the execution units into the cache memory.
  • Both ports of the dpRAM are connected via the signals 201 and 202 to a bus access controller 240, which is connected to signals 241 leading to a main memory (not shown) or to a next-level cache.
  • the cache may be partially or fully associative, i.e., the data can be stored in several or even arbitrary locations of the cache.
  • In order to enable access to the dpRAM, the address by means of which the desired data / commands can be accessed must first be determined. Depending on the addressing mode, one or more block addresses are selected at which the data addressed by the execution unit are searched for in the cache. All these blocks are read, and the identifier stored in the cache with the data is compared with the index address (part of the original address). In the case of a match, and after additional validation with the help of the control bits likewise stored in the cache for each block (for example valid bits, dirty bits and process bits), a cache hit signal is generated indicating the validity of the datum.
  • a table is preferably used which is arranged in a memory unit 214 or 224 (register or RAM, also referred to as TAGRAM) shown in FIG. 2 and is located in the units 210 and 220, respectively.
  • the table is an address transformation unit that both converts the virtual address to a physical address and, in the case of a direct-mapped cache, provides the exact (unique) cache access address; in a multi-associative cache organization, multiple blocks are addressed, and in a fully associative cache, all blocks of the cache must be read and compared.
  • Such an address transformation unit is described, for example, in US Pat. No. 4,669,043.
  • The table stores the access address of the dpRAM for each address or address group of a block. In FIG. 4, the significant address bits (index address) are used as the address for the table, and the content is the access address of the dpRAM.
  • a block is the number of bytes which, in the case of a cache miss (lack of required data in the cache), are jointly fetched from the (main) memory into the cache when an address from this area is read-accessed.
  • the address bits significant for the block are transformed with the table and the remaining (low-order) address bits are adopted unchanged.
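The address transformation described above can be sketched as a table lookup on the block-significant bits with the low-order bits passed through. The block size (`BLOCK_BITS = 4`, i.e. 16-byte blocks), the dictionary used as the table and the function name `transform_address` are assumptions made only for this illustration.

```python
from typing import Dict, Optional

BLOCK_BITS = 4  # assumption: 16-byte blocks, so 4 low-order offset bits


def transform_address(address: int, table: Dict[int, int]) -> Optional[int]:
    """Return the dpRAM access address, or None if the block is not cached."""
    block_part = address >> BLOCK_BITS            # bits significant for the block
    offset = address & ((1 << BLOCK_BITS) - 1)    # low-order bits, kept unchanged
    cache_block = table.get(block_part)           # table content: dpRAM block address
    if cache_block is None:
        return None                               # no entry: treated as a cache miss
    return (cache_block << BLOCK_BITS) | offset
```

For example, with a table entry mapping block `0x12` to cache block `0x3`, the address `0x125` is transformed to `0x35`: only the block part is replaced, and the offset `5` is carried over.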
  • one of the two ports is assigned a higher priority, i.e., simultaneous writing via both ports is prevented.
  • This prioritization can also be performed dynamically, for example by the switching and comparison unit of the multiprocessor system or depending on the addressed memory area. Only when the preferred port has performed the write operation may the other port write; if necessary, only one processor has write access to correspondingly allocated memory areas. Similarly, during any write operation to a memory cell, the same memory cell can be prevented from being read via the other port, or the read operation can be delayed by pausing the reading processor until the write operation is complete. For this purpose, an address comparator of all address bits (251) shown in FIG.
  • the output signals 213 and 223 can each assume three signal states in an advantageous embodiment: enable, wait, equal. For a pure instruction cache, write access is not necessary; in this case, a signal state "equal" is sufficient for the output signals 213 and 223.
  • the address comparator prevents the data from being fetched from the memory again if there is no hit but the address comparator indicates an equal signal (a component or state of 213 and 223). In the case of reading via both ports, the equal signal is formed only from the significant address bits, because the entire block is always fetched from the memory. Only when the block has been stored in the cache can the waiting execution unit access the cache.
  • two separate dual port caches for data and for instructions are provided, wherein usually no write operations are provided in the instruction cache.
  • the address comparator always checks in this case only
  • the simultaneous read access from both ports to the internal memory will only work fully if the requested data are in different address ranges that allow concurrent access. This saves effort in the hardware implementation, because not all access mechanisms have to be duplicated in the memory.
  • the cache has been implemented in several sub-memory areas that can be operated independently of each other. Each sub-memory allows only one port to be served at a time, controlled via enable signals.
  • FIG. 5 shows such a memory 230, which includes two partial memory areas 235 and 236.
  • The two enable signals are derived from an address bit A.
  • the two enable signals and the low-order address bits A_i ... A_0 are included.
  • the 4 enable signals can be generated from two address bits, since each partial memory uniquely serves a specific address range. For example, with the 2 address bits A_{i+2} and A_{i+1}, four partial memory areas can be addressed by generating the four enable signals E0 to E3 corresponding to the binary significance in accordance with Table 1.
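A one-hot decoder of this kind can be sketched as follows. Table 1 is not reproduced in this text, so the mapping used here (enable signal E_k is active when the two select bits have the binary value k) is an assumption derived from the phrase "corresponding to the binary significance"; the names `enable_signals`, `ports_conflict` and `low_bits` are hypothetical.

```python
from typing import List


def enable_signals(address: int, low_bits: int, select_bits: int = 2) -> List[int]:
    """Decode the select bits just above the low-order address bits
    into one-hot enable signals [E0, E1, ...]."""
    sel = (address >> low_bits) & ((1 << select_bits) - 1)
    return [1 if k == sel else 0 for k in range(1 << select_bits)]


def ports_conflict(addr_a: int, addr_b: int, low_bits: int, select_bits: int = 2) -> bool:
    """Two ports collide when they enable the same partial memory area."""
    return enable_signals(addr_a, low_bits, select_bits) == enable_signals(
        addr_b, low_bits, select_bits
    )
```

With `low_bits=4`, the address `0b100101` has select bits `10` and therefore activates only E2. Two simultaneous accesses proceed in parallel only if `ports_conflict` is false, which matches the statement that each partial memory serves a single port at a time.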
  • For the partial memories 235 and 236 shown in FIG. 5, an exemplary embodiment is shown in FIG.
  • The partial memory designated 260 there is implemented in this particular embodiment as a single port RAM 280 whose addresses, data and control signals are switched as required. Switching is accomplished by a control circuit 270 by means of a multiplexer 275, depending on the enable signals and other control signals 2901 and 2902 (e.g., read, write) from the respective ports. These signals are included together with the data and addresses in the signals 233 and 234, respectively, and are fed via 5281 and 5282, respectively, to the multiplexer 275, which, depending on the decision of the control circuit 270 as expressed by the output signal 2701, connects either 5281 or 5282 to the signals 2801.
  • the control circuit can thus forward the signals 5281 or 5282 to 2801 and hence to the single port RAM 280, and can also forward the data and other signals from 280 in the opposite direction. This is done in response to a valid enable signal and the signals 233 and 234 and / or the order in which the ports initiate a read or write operation to the memory 280 via these signals. If the read or write signals in the signals 233 and 234 become active at the same time, a previously defined port is served first. This preferred port remains connected to 2801 even if no read or write signal is active. Alternatively, the preferred port may be set dynamically by the processor system, preferably depending on state information of the processor system.
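The arbitration in front of the single port RAM can be modelled cycle by cycle. The sketch below is a simplified behavioural model under stated assumptions, not the circuit of FIG. 6: the class name `SinglePortArbiter`, the request tuple format `(op, addr, data)` and the return convention are hypothetical, and ordering by arrival time is not modelled.

```python
from typing import List, Optional, Tuple

Request = Tuple[str, int, Optional[int]]  # ("read" | "write", address, data)


class SinglePortArbiter:
    """Two ports share one single-port RAM; a preferred port wins ties."""

    def __init__(self, size: int, preferred: int = 0) -> None:
        self.ram = [0] * size
        self.preferred = preferred  # could also be changed dynamically

    def cycle(
        self, req0: Optional[Request] = None, req1: Optional[Request] = None
    ) -> Tuple[Optional[int], Optional[int], List[int]]:
        """Serve at most one request; return (served port, read value, waiting ports)."""
        requests = {0: req0, 1: req1}
        active = [p for p, r in requests.items() if r is not None]
        if not active:
            return None, None, []
        # Simultaneous requests: the preferred port is served first.
        served = self.preferred if self.preferred in active else active[0]
        waiting = [p for p in active if p != served]
        op, addr, data = requests[served]
        if op == "write":
            self.ram[addr] = data
            return served, None, waiting
        return served, self.ram[addr], waiting
```

A port listed as waiting would repeat its request in the next cycle, which corresponds to the wait signals that delay the other execution unit.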
  • This arrangement with a single port RAM is less expensive than a dual port RAM with parallel access, but delays the processing of at least one execution unit when a partial memory area is accessed simultaneously (including for reads).
  • This arrangement can also be expanded to accesses of more than two execution units:
  • a multi-port RAM can likewise be implemented in the same way if the switching over of the addresses, data and control signals is provided successively in succession via several multiplexers (FIG. 7).
  • Such a multi-port RAM 290 is shown in FIG. There, the port input signals 261, 262, ... 267 are decoded in the decoding devices 331, 332, ..337 to the signals 291, 292 ... 297. This decoding generates the enable signals for the accesses to the individual RAMs in 281, 282 and 288.
  • Fig. 7a shows an embodiment of a partial memory 28x (281 ... 288) in more detail.
  • the enable signals and control signals 3901, 3902, ... 3908 are processed from the control signals 291, 292 ... 298 to the output signals 3701, .. 3707.
  • These output signals each control a multiplexer 375 which, depending on the signal value, establishes the connections of the buses 381 or 382 to 387 or 388 with the signals 481... 488.
  • similar controllers 370 and multiplexers 375 are switched accordingly until in a last stage the signals 5901 and 5902 are used for the controller.
  • the output signal 5701 then connects either 581 or 582 to 681 connected to the single port RAM 280.
  • the multiplexers 375 of FIG. 7a connect, in addition to the address, data and control signals, also the enable signals of the next stages, which are included in 381, 382 ... 388. Furthermore, comparison means may be included in 375 which determine the validity of the data read from the subareas in the case of a multi-associative addressing mode.
  • FIG. 8 shows an example of a configurable dual port cache for this purpose.
  • a mode or configuration signal 1000 is used in decoding the input signals for each of the two ports.
  • This configuration signal may correspond to the mode signal N150 of FIG. 12, may include portions of the information of the mode signal N150, or may be formed by combining N150 with other information of the multiprocessor system.
  • each port will only have access to half of the cache, but each port will have full access to that area (unaffected by activity on the other port).
  • the address bit A is not used to address the cache (in the direct-mapped mode); instead, data whose addresses differ only in this bit are stored in the same place in the cache. Only when the cache content is read can the identifier be used to determine whether it is the datum sought, and the cache-hit signal is generated accordingly.
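  • A minimal sketch of this lookup follows, under the assumption that address bit A is the lowest bit above the index bits and therefore becomes part of the identifier (tag). The class `HalfCache` and its word-granular layout (no block offset) are hypothetical simplifications, not the patent's circuit.

```python
class HalfCache:
    """Direct-mapped cache half in which address bit A is not an index bit."""

    def __init__(self, index_bits: int):
        self.index_bits = index_bits
        # One entry per index: None or a (tag, data) pair.
        self.lines = [None] * (1 << index_bits)

    def _split(self, address: int):
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits  # bit A is the lowest tag bit here
        return index, tag

    def fill(self, address: int, data):
        index, tag = self._split(address)
        self.lines[index] = (tag, data)

    def lookup(self, address: int):
        """Return (hit, data); the hit signal comes from the tag comparison."""
        index, tag = self._split(address)
        entry = self.lines[index]
        hit = entry is not None and entry[0] == tag
        return hit, (entry[1] if hit else None)
```

  • Two addresses that differ only in bit A map to the same line, so filling one evicts the other, and only the identifier comparison on read tells them apart.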
  • the data including identifier and control bits are available via the signals 291, 292,... 297 to the ports 331, 332,... 337 and further to the signals 261, 262,. .267.
  • the various execution units do not interfere with one another, provided that, as far as possible, they access only independent cache areas via the various ports. Since these conditions depend on the programs intended for the application, it is advantageous if a different configuration can be chosen depending on the application. On the other hand, if the system state (comparison mode / performance mode) changes, the cache can be switched over automatically by the mode signal 1000.
  • a two-valued mode signal 1000 is not sufficient. In this case, it is advantageous to implement the mode signal as a multi-valued signal or to introduce a further mode signal 2000, not shown in FIGS. 8-10.
  • Another embodiment is shown in FIG. 10 for the case of a multi-associative cache in which the data is read back from each partial memory 281, 282, ... 288 along with the identifier and control bits.
  • the comparators 2811, 2812, ... 2817, 2821, 2822, ... 2827, ... 2881,2882, .. 2887 the validity is then checked and
  • FIG. 11 shows a multiprocessor system (W500) with two execution units (W510a, W510b), a switchover and comparison unit (W520) and a dual port cache memory (W550).
  • the comparison unit (W520) signals via signal path W518 if an error has occurred in comparison mode.
  • the execution units are each connected via signal lines or buses (W512a, W512b) to the switching and comparison unit (W520), which in turn has two connections (W513a, W513b) to a dual port cache memory and two connections (W514a, W514b) to two system interfaces (W535, W545).
  • the signals (W512a, W512b) correspond to the signals 211 and 221 from FIG. 2 and have the same or similar amount of data.
  • the dual port cache memory (W550) is connected via a connection 241 and an optional memory interface (W530) with a (main)
  • the system interfaces can be used to connect additional units of the processor system (W580), such as coprocessors or timer units, and peripherals (W590), such as digital outputs, D/A converters and communication controllers.
  • An execution unit can be implemented as a processor/core/CPU, or as an FPU (floating point unit), DSP (digital signal processor), coprocessor or ALU (arithmetic logic unit).
  • In FIG. 11a, the multiprocessor system (W500) of FIG. 11 is shown in abstracted form, for clarity without the system interfaces (W535, W545), the connections (W514a, W514b) and the units W580 and W590.
  • This multiprocessor system can be operated in at least two modes of operation, a compare mode VM and a performance mode PM.
  • FIG. 13 shows the internal switching state of the switching and comparison unit (W520) for write and read access to the cache memory in the performance mode.
  • the comparator (W522) of the switching and comparison device (W520) is deactivated, ie no data is compared.
  • a third possibility is to ignore at system level the status or error signal (W518, shown in FIG. 14) of the comparator. Furthermore, the error signal itself can also be interrupted. All options have in common that they create a state in the system in which it does not matter whether two or more data items that could potentially be compared differ. If this state is reached by a measure in the comparator or in its input or output signals, the comparator is referred to as passive or disabled.
  • each execution unit is connected to one port of the dual port cache
  • port cache memory, and thus a higher-priority execution unit, i.e. both execution units are prevented from writing to the same memory cell or the same memory block simultaneously.
  • This priority can also be dynamic, e.g. determined by the switching and comparison unit of the multiprocessor system or depending on the addressed memory area. Only when the preferred port has finished its write operation may the other port write.
  • An access conflict, e.g. a simultaneous write access of two execution units to the same memory cell or the same memory block, can be recognized and resolved as follows:
  • the addresses 212 and 222 of the processing units 215 and 225, contained in the signals 211 and 221 (corresponding to the signals 512a and 512b in FIG. 11), are compared with each other in an address comparator 251 of the device 250 and are checked for compatibility together with the control signals also transmitted in 211 and 221. If these addresses are the same, and if at least one of the accesses is a write access, then there is a conflict.
  • the access of at least one port to the dual port RAM 230 is prevented by means of the control signals contained in the signals 213 or 223.
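  • The conflict rule above (same address, at least one write) can be sketched as follows. The function names, the `block_bits` parameter for block-granular comparison and the fixed `preferred` port are illustrative assumptions.

```python
def detect_conflict(addr_a: int, write_a: bool,
                    addr_b: int, write_b: bool,
                    block_bits: int = 0) -> bool:
    """True if both ports target the same cell/block and at least one writes.

    block_bits = 0 compares full addresses (same memory cell); a larger
    value ignores the low bits and compares memory blocks instead.
    """
    same_target = (addr_a >> block_bits) == (addr_b >> block_bits)
    return same_target and (write_a or write_b)


def grant_ports(addr_a: int, write_a: bool,
                addr_b: int, write_b: bool,
                preferred: int = 0, block_bits: int = 0):
    """Return (grant_a, grant_b); on a conflict only the preferred port proceeds."""
    if detect_conflict(addr_a, write_a, addr_b, write_b, block_bits):
        return (preferred == 0, preferred == 1)
    return (True, True)
```

  • Two simultaneous reads of the same address are not treated as a conflict in this sketch, which matches the condition stated in the text.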
  • a cache miss, i.e. if, during a read access of a first execution unit to the cache memory, the requested data or instruction is not contained in the cache memory, the data or instruction must be transferred via the bus system from a program or data memory, such as the main memory (W570 from FIG. 11).
  • the incoming data is forwarded to the execution unit that requested that data and is written to the cache in parallel along with the identifier and control bits. If, during the transmission of a memory block from the main memory into the cache memory, an address from the same memory block is addressed, for example, by a second execution unit, this is detected by the address comparator and this prevents the memory block from being re-transmitted from the main memory.
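  • The suppression of a repeated block transfer can be sketched with a small table of pending fills. This is an illustration under assumptions: the class `FillTracker`, the dictionary-based main memory and cache, and the choice to deliver the block to every waiting unit are hypothetical and not stated in the patent.

```python
class FillTracker:
    """Tracks blocks currently being transferred from main memory."""

    def __init__(self, block_bits: int, main_memory: dict):
        self.block_bits = block_bits
        self.main_memory = main_memory   # block number -> block data
        self.pending = {}                # block number -> set of waiting units
        self.transfers_started = 0

    def miss(self, unit: int, address: int) -> bool:
        """Register a miss; returns True only if a new transfer is started."""
        block = address >> self.block_bits
        if block in self.pending:
            # Address comparison found the block already in flight:
            # no second transfer from main memory.
            self.pending[block].add(unit)
            return False
        self.pending[block] = {unit}
        self.transfers_started += 1
        return True

    def complete(self, block: int, cache: dict):
        """Write the block into the cache and return (data, waiting units)."""
        data = self.main_memory[block]
        cache[block] = data
        return data, sorted(self.pending.pop(block))
```

  • With `block_bits=4`, misses at addresses 0x20 and 0x2C fall into the same block, so only the first one starts a transfer.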
  • the switching and comparing unit (W520) is advantageously arranged between the execution units (W510a, W510b) and the dual port cache memory W550. This allows several advantageous in comparison mode
  • each execution unit is connected via the signal connections W512a and W512b to the switching and comparison unit (W520) and thus also to the comparator (W522) contained therein according to FIGS. 15 and 16.
  • In FIG. 14 and FIG. 15, the internal interconnection of the switching and comparison unit is shown for two embodiments of a read access in comparison mode.
  • the switch (W525) is in the same position as in the performance mode, i.e. each of the execution units (W510a, W510b) is connected to one port of the dual-port cache memory (W550). Since the two execution units execute the same programs, program segments or commands in comparison mode, they also access the same memory addresses. First, the addresses can be checked by the comparator (W522), and the addresses are transferred only if they match. Furthermore, it is alternatively or additionally possible to read out the data via different ports of the cache memory
  • Dual port cache memory e.g. according to FIG. 3, in which at the same time read access can be made to the same memory cell via the two ports.
  • the changeover switch here designated W526, is in a different switching position.
  • The connection W513a to the dual port cache memory is made only if the access addresses routed to the comparator via the signal connections W512a and W512b are identical. If the comparator detects a discrepancy between the addresses, the memory access can be prevented via the signal W517 and the delay and interruption device W519.
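  • A minimal sketch of this gated read in comparison mode, assuming a dictionary stands in for the cache and a boolean return value stands in for the error signal; the function name `compare_mode_read` is hypothetical.

```python
def compare_mode_read(cache: dict, addr_a: int, addr_b: int):
    """Read in comparison mode: access the cache only if both addresses match.

    Returns (error, data). error=True models the comparator reporting a
    discrepancy, in which case the cache access is blocked.
    """
    if addr_a != addr_b:
        return True, None
    # Only one connection is used for the actual access.
    return False, cache.get(addr_a)
```

  • For example, `compare_mode_read({0x40: 7}, 0x40, 0x40)` performs the read, while differing addresses leave the cache untouched.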
  • a configuration of the switch and compare unit (W520) for write access to the dual port cache memory in the compare mode. In this case, the access addresses routed to the comparator via the signal connections W512a and W512b and the data to be written are compared. If they match, only the data of one execution unit (in the embodiment shown, the data obtained via the connection W512a) is written into the dual port cache memory via a connection (W513a) and a port. In the event of a discrepancy in the addresses and/or data obtained via the connections W512a and W512b, in a first exemplary embodiment the access to the cache memory is made via the signal W517 and the delay and interrupt device
  • the switch W527 routes the signal connection W512b exclusively to the comparator W522.
  • the access addresses passed through the signal links W512a and W512b to the comparator and the data to be written are compared, and at the same time, the data obtained through a link (e.g., W512a) is written in a buffer memory W529.
  • the buffer memory W529 is supplied with information representing the comparison result.
  • the information stored in the buffer memory is written into the dual port cache memory via a connection (e.g., W513a) and a port.
  • If the comparator detects a discrepancy, the data in the buffer memory is not released for write access and is advantageously deleted. This has the advantage that only valid or correct data is written into the cache, while the execution units are not delayed.
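  • The buffered write can be sketched in two steps: staging the data while the comparison runs, then releasing or discarding it. The class `BufferedCompareWrite` and its method names are hypothetical, and a one-entry buffer is assumed for simplicity.

```python
class BufferedCompareWrite:
    """Write path with a buffer that is released only after a successful compare."""

    def __init__(self):
        self.cache = {}
        self.buffer = None  # holds one staged (address, data) pair

    def stage(self, addr_a, data_a, addr_b, data_b) -> bool:
        """Buffer the data of unit A and compare it with unit B in parallel."""
        self.buffer = (addr_a, data_a)
        return addr_a == addr_b and data_a == data_b

    def resolve(self, match: bool) -> bool:
        """Release the buffered data into the cache on a match, else drop it."""
        entry, self.buffer = self.buffer, None
        if match and entry is not None:
            addr, data = entry
            self.cache[addr] = data
        return match
```

  • Neither step makes an execution unit wait, which is the advantage the text attributes to this embodiment.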
  • one of the two execution units writes the data to the cache memory.
  • a comparison of the addresses and/or data of the two execution units obtained via the connections W512a and W512b takes place in parallel with, or after, the writing of the data into the cache. If the comparator detects a discrepancy, the datum written into the cache must subsequently be declared invalid.
  • the cache memory is provided by the switching and comparison unit with information representing a comparison result via a comparison / error signal (W518 from FIG. 11a and FIGS. 14-17 or N170 from FIG. 12).
  • the datum already written is then marked as invalid by setting a bit in a status or control register of the cache or by deleting information from the HIT table, or it is replaced with the old datum if that is still available in the cache or in the system.
  • the advantage of this embodiment is that the execution units are not delayed by the comparison of the addresses and / or data and no buffer memory in the switching and comparison unit is necessary.
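  • The write-first, invalidate-later variant can be sketched as follows. A per-line valid flag is assumed as the invalidation mechanism; the names `InvalidatingCache`, `report_comparison` and `compare_mode_write` are hypothetical, and restoring an old datum is not modeled.

```python
class InvalidatingCache:
    """Cache whose lines can be invalidated after a failed comparison."""

    def __init__(self):
        self.lines = {}  # address -> [data, valid flag]

    def write(self, addr, data):
        self.lines[addr] = [data, True]

    def report_comparison(self, addr, mismatch: bool):
        """Model of the comparison/error signal reaching the cache."""
        if mismatch and addr in self.lines:
            self.lines[addr][1] = False

    def read(self, addr):
        line = self.lines.get(addr)
        if line is None or not line[1]:
            return None  # treated as a miss
        return line[0]


def compare_mode_write(cache, addr_a, data_a, addr_b, data_b) -> bool:
    """Write unit A's data immediately, then compare; returns True on mismatch."""
    cache.write(addr_a, data_a)
    mismatch = (addr_a != addr_b) or (data_a != data_b)
    cache.report_comparison(addr_a, mismatch)
    return mismatch
```

  • No buffer is needed here; the cost is that a mismatching write briefly exists in the cache before its line is marked invalid.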
  • FIG. 18 shows a multiprocessing system (W501) with two separate dual port cache memories W550c and W550d.
  • W550c for example, is an instruction cache accessed read-only by the execution units (W510a, W510b) via W513ac or W513bc
  • W550d is a data cache which the execution units can access for reading and writing via W513ad or W513bd.
  • Each cache has access to the (main) memory via the signal connection 241c and the memory access unit W530c or the signal connection 241d and the memory access unit W530d.
  • the multiprocessor system (W501) is shown abstracted in FIG. 18 for better clarity.
  • Units W580 and W590 and memory system W570 may be wholly or partially included in the multiprocessor system (W501) of FIG.
  • the characterizing features of the invention may also be applied to multiprocessor systems having more than two execution units. Changes are necessary mainly in the switching and comparison unit.
  • the dual port cache can be extended to a multi-port cache, as described for example in the exemplary embodiments illustrated in FIG. 9 and FIG.
  • the number of execution units of the multiprocessor system and the number of ports of the multi-port cache and the number of partial storage areas of the cache memory need not be identical.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Hardware Redundancy (AREA)
  • Multi Processors (AREA)

Abstract

The present invention relates to a method and a device for storing data and/or commands in a computer system comprising at least two processing units and at least one first memory or memory area for data and/or commands. Switching means allow switching between at least two operating modes. Comparison means are also provided. A first operating mode corresponds to a comparison mode and a second operating mode corresponds to a performance mode. The invention is characterized in that the device also comprises a second memory or memory area, the device is designed as a cache memory system, and the device is equipped with at least two separate ports. These ports allow said processing units to access the same or different memory cells of the second memory or memory area. The data and/or commands from the first memory system are then entered into the buffer memory in blocks, and switching takes place.
PCT/EP2006/064661 2005-08-08 2006-07-26 Method and device for storing data and/or commands in a computer system having at least two processing units and at least one first memory or memory area for data and/or commands WO2007017376A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2008525519A JP2009505181A (ja) 2005-08-08 2006-07-26 Method and device for storing commands and/or data in a computer system having at least two processing units and at least one first memory or memory area for data and/or commands
EP06777976A EP1915695A1 (fr) 2005-08-08 2006-07-26 Method and device for storing data and/or commands in a computer system having at least two processing units and at least one first memory or memory area for data and/or commands

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102005037215.5 2005-08-08
DE102005037215A DE102005037215A1 (de) 2005-08-08 2005-08-08 Method for storing data and/or commands in a computer system having at least two processing units and at least one first memory or memory area for data and/or commands

Publications (1)

Publication Number Publication Date
WO2007017376A1 true WO2007017376A1 (fr) 2007-02-15

Family

ID=37192655

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2006/064661 WO2007017376A1 (fr) 2005-08-08 2006-07-26 Method and device for storing data and/or commands in a computer system having at least two processing units and at least one first memory or memory area for data and/or commands

Country Status (5)

Country Link
EP (1) EP1915695A1 (fr)
JP (1) JP2009505181A (fr)
CN (1) CN101243415A (fr)
DE (1) DE102005037215A1 (fr)
WO (1) WO2007017376A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345910B (zh) * 2013-06-09 2015-11-18 苏州国芯科技有限公司 Single-port palette SRAM controller and control method thereof
US11269777B2 (en) * 2019-09-25 2022-03-08 Facebook Technologies, Llc. Systems and methods for efficient data buffering

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5247649A (en) 1988-05-06 1993-09-21 Hitachi, Ltd. Multi-processor system having a multi-port cache memory
US6101589A (en) * 1998-04-01 2000-08-08 International Business Machines Corporation High performance shared cache
DE10332700A1 (de) 2003-06-24 2005-01-13 Robert Bosch Gmbh Method for switching over between at least two operating modes of a processor unit, as well as corresponding processor unit

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01280860A (ja) * 1988-05-06 1989-11-13 Hitachi Ltd Multiprocessor system having a multi-port cache memory
JPH0973436A (ja) * 1995-09-05 1997-03-18 Mitsubishi Electric Corp Operating mode switching method in a multiplexed computer
US20070277023A1 (en) * 2003-06-24 2007-11-29 Reinhard Weiberle Method For Switching Over Between At Least Two Operating Modes Of A Processor Unit, As Well Corresponding Processor Unit

Also Published As

Publication number Publication date
JP2009505181A (ja) 2009-02-05
DE102005037215A1 (de) 2007-02-15
CN101243415A (zh) 2008-08-13
EP1915695A1 (fr) 2008-04-30

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2006777976

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2008525519

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 200680029401.3

Country of ref document: CN

WWP Wipo information: published in national office

Ref document number: 2006777976

Country of ref document: EP