EP1915684A1 - Device and method for storing data and/or commands in a computer system having at least two execution units and at least one first memory or memory area for data and/or commands

Device and method for storing data and/or commands in a computer system having at least two execution units and at least one first memory or memory area for data and/or commands

Info

Publication number
EP1915684A1
Authority
EP
European Patent Office
Prior art keywords
data
memory
cache
counter
comparison
Legal status
Withdrawn
Application number
EP06777936A
Other languages
German (de)
English (en)
Inventor
Reinhard Weiberle
Bernd Mueller
Eberhard Boehl
Yorck Collani
Rainer Gmehlich
Current Assignee
Robert Bosch GmbH
Original Assignee
Robert Bosch GmbH
Application filed by Robert Bosch GmbH
Publication of EP1915684A1
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824 Operand accessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1629 Error detection by comparing the output of redundant processing systems
    • G06F11/1641 Error detection by comparing the output of redundant processing systems where the comparison is not performed by the redundant processing components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1675 Temporal synchronisation or re-synchronisation of redundant processing components
    • G06F11/1683 Temporal synchronisation or re-synchronisation of redundant processing components at instruction level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/084 Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181 Instruction operation extension or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/30189 Instruction operation extension or modification according to execution mode, e.g. mode flag
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/845 Systems in which the redundancy can be transformed in increased performance

Definitions

  • Computer system with at least two execution units and at least one first memory or memory area for data and / or commands
  • The present invention relates to microprocessor systems with a cache. In this context it describes a dual-port cache and, more particularly, its use in a data processing system having at least two execution units that either operate independently or process the same tasks.
  • Processors are equipped with caches to speed up access to instructions and data. This is necessary because of the constantly growing amount of data on the one hand and the increasing complexity of data processing with ever faster processors on the other.
  • A cache partially avoids slow accesses to the large (main) memory, so that the processor does not have to wait for the data to be provided.
  • Caches exclusively for instructions or exclusively for data are known, as are "unified caches", in which both data and instructions are stored in the same cache.
  • Systems with several levels (hierarchy levels) of caches are also known. Such multi-level caches are used to match the speeds of the processor and the (main) memory as well as possible, using graduated memory sizes and different addressing strategies for the caches at the different levels.
  • In one possible embodiment, the unit that compares the generated data (the comparator) is arranged after the caches.
  • As a result, the data cannot be compared until it is written back from the cache to the main memory, which can delay the evaluation of the validity of the data.
  • If the comparator is instead arranged between the execution units and the cache, the data transfer between execution unit and cache is slowed down by the higher electrical load on the signals.
  • The object of the invention is to allow the data to be compared in real time, as it is stored in the cache, and independently of when it is written back to the main memory.
  • At the same time, the data transfer between at least one execution unit and the cache should not be affected by a higher electrical bus load.
  • Implementing a dual-port cache memory is not obvious in known processor systems with one or more execution units (single or multiple cores) because of the increased hardware complexity.
  • A dual-port cache architecture can advantageously be used.
  • The main advantage over multiprocessor systems with several caches is that, when switching between the operating modes of the multiprocessor system, the content of the cache does not have to be deleted or invalidated, because the data is stored only once and therefore remains consistent even after a switchover.
  • In a multiprocessor system with several operating modes, a dual-port cache has the advantage that the data/instructions do not need to be cached, and possibly maintained, several times. Furthermore, only one memory location per datum/instruction has to be provided in hardware, even if this datum or instruction is used by several execution units. Moreover, in the different operating modes of the multiprocessor system, the data need not be distinguished according to the mode in which they were processed or fetched. A particular advantage is that the cache does not have to be cleared when the operating mode changes. With a dual-port cache, two processors can read the same data/instructions simultaneously. A further particular advantage is that a "write back" mode can be used for the cache instead of the "write through" mode.
  • This method does not require the (main) memory to be updated constantly, but only when data in the cache is overwritten. There are no consistency problems, because the cache provides the data for both processors from the same source, and the time of comparison is not affected, because the comparison is independent of the write-back to the main memory.
  • An asymmetric dual-port cache structure as proposed here according to the invention has the particular advantage that writing data into the cache is not hindered, at least for one execution unit, and that, on the other hand, there is no need to wait until the data from the cache has been written to the main memory. This makes it possible to write the data back to main memory in blocks, and only when the block in the cache is replaced by another block ("write back" mode).
  • In "write back" mode, the execution units work only with the cache as long as the data is available there. When an execution unit writes into the cache, the dirty bit is set to indicate that the block's data is no longer identical to the copy in main memory. If all participating execution units work with the shared cache, the data in main memory need not be updated as long as the block in question remains in the cache; individual data words can be changed several times without affecting data consistency for the execution units.
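The write-back behaviour described above can be illustrated with a small behavioural model. This is a hypothetical Python sketch, not the patented hardware: it assumes a direct-mapped cache with four-word blocks and uses a dictionary as main memory, and the class and method names are illustrative only. Both execution units would call the same `WriteBackCache` object, so they see a single copy of each block, and main memory is updated only when a dirty block is evicted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4  # words per block (assumed)


@dataclass
class Line:
    tag: int
    data: List[int]
    dirty: bool = False  # set on write: block differs from main memory


class WriteBackCache:
    """Direct-mapped write-back cache shared by all execution units (hypothetical)."""

    def __init__(self, num_lines: int, main_memory: Dict[int, int]) -> None:
        self.num_lines = num_lines
        self.mem = main_memory                        # word address -> value
        self.lines: List[Optional[Line]] = [None] * num_lines
        self.writebacks = 0                           # blocks written back so far

    def _split(self, addr: int) -> Tuple[int, int, int]:
        block = addr // BLOCK_SIZE
        return block % self.num_lines, block // self.num_lines, addr % BLOCK_SIZE

    def _base(self, index: int, tag: int) -> int:
        return (tag * self.num_lines + index) * BLOCK_SIZE

    def _line_for(self, addr: int) -> Tuple[Line, int]:
        index, tag, offset = self._split(addr)
        line = self.lines[index]
        if line is None or line.tag != tag:
            if line is not None and line.dirty:
                # Only when a dirty block is replaced is main memory updated.
                base = self._base(index, line.tag)
                for i, word in enumerate(line.data):
                    self.mem[base + i] = word
                self.writebacks += 1
            base = self._base(index, tag)
            line = Line(tag, [self.mem.get(base + i, 0) for i in range(BLOCK_SIZE)])
            self.lines[index] = line
        return line, offset

    def read(self, addr: int) -> int:
        line, offset = self._line_for(addr)
        return line.data[offset]

    def write(self, addr: int, value: int) -> None:
        line, offset = self._line_for(addr)
        line.data[offset] = value
        line.dirty = True
```

For example, writing address 0 twice and address 1 once leaves main memory untouched. Only a later access to address 8, which maps to the same cache line, forces the dirty block out to memory.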
  • A device is provided for storing data and/or commands in a computer system having at least two execution units and at least one first memory or memory area for data and/or commands. Switching means are provided for switching between at least two operating modes, and comparison means are provided, a first operating mode corresponding to a comparison mode and a second operating mode corresponding to a performance mode. The device is characterized in that it contains a second memory or memory area, is designed as a cache memory system and is equipped with at least two separate ports. One port is connected directly to a first execution unit, and a third device is arranged between the second port and the at least one second execution unit, the third device being designed such that the second execution unit accesses the second memory or memory area via the third device.
  • Advantageously, at least one memory means is present in the switching means and/or the comparison means, and the switchover is effected by at least one bit in that memory means.
  • Advantageously, the switchover is triggered by at least one signal that is external or internal to the computer system.
  • Advantageously, in performance mode, the third device provides the execution unit connected to it with read and write access to the second memory via the port to which it is connected.
  • Advantageously, the cache memory system contains at least one counter that is incremented or decremented each time the first execution unit stores data to be compared via the first port.
  • Furthermore, a second counter is provided in the third device, and the value of the first counter is used to set the second counter of the third device.
  • A method for storing data and/or commands in a computer system having at least two execution units and at least one first memory or memory area for data and/or commands is also described. Switching means are provided for switching between at least two operating modes, and comparison means are provided, a first operating mode corresponding to a comparison mode and a second operating mode corresponding to a performance mode. The method is characterized in that a second memory or memory area is provided, which is contained in a cache system and is equipped with at least two separate ports. The first execution unit accesses the second memory or memory area directly via a first port, and the second execution unit accesses it via a third device.
  • Advantageously, the third device contains memory means in which the connected execution unit can store data and/or signals, and the third device can exchange data with the second memory or memory area independently of the state of that execution unit.
  • A method is described, characterized in that the third device receives data and/or addresses and/or control signals from the second execution unit and then reads or writes the corresponding data in the second memory or memory area.
  • A method is described, characterized in that the cache memory system determines whether the data is present and, if it is not, sends a signal to the third device.
  • A method is described, characterized in that the data and/or commands in the third device are checked for validity and forwarded if they are valid.
  • A method is described, characterized in that validity is checked on the basis of additional information stored with the data and/or commands.
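The validity check in the third device can be sketched as follows. This hypothetical Python model assumes that the "additional information" consists of a valid flag and a process ID, two of the per-block control bits the text lists for the cache. The class `ThirdDevice` and its fields are illustrative names, and a dictionary stands in for the cache behind port 2.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    data: int
    valid: bool       # valid bit stored with the data
    process_id: int   # additional information (assumed field)


class ThirdDevice:
    """Hypothetical model of the third device between port 2 and the second execution unit."""

    def __init__(self, cache: Dict[int, Entry]) -> None:
        self.cache = cache            # stands in for the cache behind port 2
        self.miss_signalled = False   # signal from the cache system: data not present

    def request(self, address: int, process_id: int) -> Optional[int]:
        """Fetch data for the second execution unit and forward it only if it is valid."""
        entry = self.cache.get(address)
        self.miss_signalled = entry is None
        if entry is None:
            return None
        if entry.valid and entry.process_id == process_id:
            return entry.data
        return None
```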
  • A method is described, characterized in that synchronization signals are sent to the respective execution units when switching to comparison mode.
  • A method is described, characterized in that, in comparison mode, an error is signaled if the data to be compared deviate from one another.
  • A method is described, characterized in that, when voting, the status and/or an error is signaled if at least one datum deviates from the other data being voted on.
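The comparison and voting steps reduce to a few lines of logic. The sketch below assumes two execution units for comparison. When three or more units take part, it assumes majority voting in which a strict majority wins. The text does not specify the voting rule, so `vote` is an illustrative choice and not the patented circuit.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def compare(a: int, b: int) -> bool:
    """Comparison mode with two units: returns True if an error must be signalled."""
    return a != b


def vote(values: Sequence[int]) -> Tuple[Optional[int], bool]:
    """Majority vote over the outputs of three or more execution units.

    Returns (voted_value, deviation). voted_value is None if no strict
    majority exists. deviation is True if at least one value differs
    from the others, in which case a status or error would be signalled.
    """
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    deviation = len(counts) > 1
    if 2 * count > len(values):
        return value, deviation
    return None, deviation
```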
  • A method is described, characterized in that a counter is provided in the cache memory system and, when switching to comparison mode, the counter outputs its value at the correspondingly connected port, and this value in the third …
  • A method is described, characterized in that a second counter is provided in the third device and the value of the first counter is used to set the second counter of the third device.
  • Advantageously, a method is described in which the counter in the cache memory system is assigned to a port and is set to a fixed value when comparison mode is switched on in the execution unit connected to that port.
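Here is one reading of the counter mechanism. On entering comparison mode, both counters start from the same value and advance once per stored datum, so equal counter values mark data that belong together. This pairing interpretation, the default fixed value and the class names are assumptions made for illustration. The text does not state how the counter values are used afterwards.

```python
class PortCounter:
    """Counter in the cache memory system assigned to port 1 (hypothetical model)."""

    def __init__(self, fixed_value: int = 0, step: int = 1) -> None:
        self.fixed_value = fixed_value  # the 'fixed value'; not specified in the text
        self.step = step                # +1 to increment, -1 to decrement
        self.value = fixed_value

    def enter_comparison_mode(self) -> int:
        """Reset to the fixed value and output it at the port."""
        self.value = self.fixed_value
        return self.value

    def on_store(self) -> int:
        """Called each time the directly connected unit stores a datum to be compared."""
        self.value += self.step
        return self.value


class ThirdDeviceCounter:
    """Second counter in the third device, loaded from the port counter."""

    def __init__(self, step: int = 1) -> None:
        self.step = step
        self.value = 0

    def load(self, value: int) -> None:
        self.value = value

    def on_store(self) -> int:
        self.value += self.step
        return self.value
```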
  • FIG. 1 shows a dual port cache for data and / or commands.
  • Figure 2 shows a dual port cache with details.
  • FIG. 3 shows the transformation table in 214 and 224, respectively.
  • FIG. 4 shows a division of the dpRAM into two subareas which can be operated independently of one another and are accessed with two separate select signals from each port in the access.
  • Table 1 describes the generation of 4 select signals from 2 address bits by means of decoding
  • Figure 5 shows a realization of a dual port RAM area by a single port RAM by means of port switching.
  • FIG. 6 shows a division of a multiple port RAM with p ports into a plurality of sub-ports.
  • Figure 7 shows a realization of a multi-port RAM area by a single port RAM by means of port switching.
  • FIG. 8 shows a division of the RAM areas for the ports as a function of a system state or a configuration.
  • Table 2 describes the generation of two select signals on each port from one
  • Table 3 describes the generation of two select signals on each port from one
  • FIG. 9 shows a division of a multi-port RAM into areas as a function of a system state or a configuration by generation of the corresponding select signals.
  • FIG. 10 shows a division of a multi-port RAM into areas with multiple associative access.
  • Figure B1 shows the basic principle of the asymmetric DCSL architecture with dual port
  • FIG. B2 shows the basic principle of a switching and comparison unit for two execution units.
  • FIG. B3 shows a read request unit B106 in the inactive state.
  • FIG. B4 shows an alternative access of an execution unit to the memory without
  • Figure B5 shows a read request to a cache by a unit when a compare bit is set and data is output by an execution unit B110.
  • Table 4 describes the structure of a cache.
  • Table 5 explains one possible design of a control bit.
  • Figure B7 shows a dual port cache through single port RAM and access control.
  • FIG. B8 shows the basic structure of an asymmetrical data processing unit with the possibility of switching between several modes and a multi-port cache.
  • In the following, an execution unit can be a processor, a core, a CPU, an FPU (floating point unit), a DSP (digital signal processor), a coprocessor or an ALU (arithmetic logic unit).
  • The dual-port cache 200 essentially consists of a dual-port RAM (dpRAM 230).
  • This dpRAM 230 is preferably provided with two mutually independent address decoders and two data read/write stages. Unlike a simple memory cell matrix, it also has duplicated word and bit lines, so that at least read operations on any memory cells of the dpRAM can be carried out from both ports simultaneously.
  • A dual-port RAM is therefore understood to mean any RAM that has two ports 231 and 232 which can be used independently. This applies regardless of how much time is needed to complete a read or write request from a port, i.e., how long the requested read or write takes to complete, possibly interacting with requests from the other port.
  • The two ports of the dpRAM are connected via the signals 201 and 202 to the devices 210 and …
  • The data is output via 201 through 210 to 211, or via 202 through 220 to 221. In the reverse direction, the execution units write data into the cache memory.
  • Both ports of the dpRAM are also connected via the signals 201 and 202 to a bus access controller 240. This controller is connected via the signals 241 to a main memory (not shown) or to a next-level cache.
  • The units 210, 220 and 250 are described in more detail below.
  • The addresses 212 and 222 of the execution units 215 and 225, contained in the signals 211 and 221, are compared in an address comparator 251 of the device 250. They are checked for compatibility together with the control signals that are also transmitted in 211 and 221.
  • Where necessary, access to the dual-port RAM 230 is prevented by means of the control signals contained in the signals 213 or 223.
  • The cache may be partially or fully associative, i.e., the data can be stored in several or even arbitrary locations of the cache.
  • To enable access to the dpRAM, the address at which the desired data/instructions can be accessed must first be determined. Depending on the addressing mode, one or more block addresses are selected, and the datum is searched for in the cache at these addresses. All these blocks are read, and the identifier stored in the cache with the data is compared with the index address (part of the original address). If they match, the block is additionally validated by means of the control bits also stored in the cache for each block (e.g., valid bits, dirty bits and process ID). A cache hit signal indicating validity is then generated.
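The lookup described above follows the usual associative-cache pattern. The Python sketch below assumes 16-byte blocks and four sets. It uses the conventional split of the address into identifier (tag), set index and offset. Field names such as `process_id` mirror the control bits mentioned in the text but are otherwise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

OFFSET_BITS = 4  # 16-byte blocks (assumed)
SET_BITS = 2     # 4 sets (assumed)


@dataclass
class Block:
    identifier: int   # tag stored together with the data
    valid: bool
    dirty: bool
    process_id: int
    data: bytes


def split(address: int) -> Tuple[int, int, int]:
    """Split an address into (identifier, set index, offset)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    set_index = (address >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    identifier = address >> (OFFSET_BITS + SET_BITS)
    return identifier, set_index, offset


def lookup(sets: List[List[Block]], address: int, process_id: int) -> Optional[Block]:
    """Return the hit block, or None on a cache miss."""
    identifier, set_index, _ = split(address)
    for block in sets[set_index]:              # all blocks of the selected set are read
        if (block.identifier == identifier      # identifier comparison
                and block.valid                 # control bits validate the hit
                and block.process_id == process_id):
            return block
    return None
```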
  • For this purpose, a table is preferably used. It is held in a memory unit 214 or 224 (register or RAM, also referred to as TAG RAM), shown in FIG. 2, which is located in units 210 and 220, respectively.
  • The table is an address transformation unit that converts the virtual address into a physical address. In the case of a direct-mapped cache, it also provides the exact (unique) cache access address. In a multi-associative cache organization, multiple blocks are addressed, and in a fully associative cache, all blocks of the cache must be read and compared.
  • Such an address transformation unit is described, for example, in US Pat. No. 4,669,043.
  • For each address or address group of a block, the table stores the access address of the dpRAM.
  • The significant address bits (index address) serve as the address into the table, and the table content is the access address of the dpRAM (FIG. 3).
  • A block is the number of bytes that, on a cache miss (the required data is not in the cache), are fetched together from memory into the cache when an address in this range is read.
  • This block transfer can take place consecutively in several parts or in parallel.
  • The address bits significant for the block are transformed with the table, and the remaining (low-order) address bits are adopted unchanged.
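The table lookup, in which only the significant bits are transformed and the low-order bits pass through unchanged, can be modelled in a few lines. This is a hypothetical sketch of the direct-mapped case only. It assumes 16-byte blocks and omits the virtual-to-physical translation, and `TransformationTable` is an illustrative name.

```python
from typing import Dict, Optional

LOW_BITS = 4  # offset bits within a 16-byte block (assumed), passed through unchanged


class TransformationTable:
    """Hypothetical model of the table in 214/224 for a direct-mapped cache."""

    def __init__(self) -> None:
        self.entries: Dict[int, int] = {}  # significant address bits -> dpRAM block

    def install(self, address: int, dpram_block: int) -> None:
        """Record where the block containing `address` is held in the dpRAM."""
        self.entries[address >> LOW_BITS] = dpram_block

    def translate(self, address: int) -> Optional[int]:
        """Return the dpRAM access address, or None if the block is not cached."""
        block = self.entries.get(address >> LOW_BITS)
        if block is None:
            return None
        low = address & ((1 << LOW_BITS) - 1)   # low-order bits adopted unchanged
        return (block << LOW_BITS) | low
```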
  • To prevent writes from both ports at the same time, one of the two ports is given a higher priority. Only when the preferred port has completed its write operation may the other port write; if necessary, only one processor is given write access to correspondingly allocated memory areas.
  • During any write operation to a memory cell, reading the same memory cell from the other port can be prevented. Alternatively, the read operation can be delayed by stalling the reading processor until the write operation is complete.
  • For this purpose, an address comparator 251 covering all address bits, shown in FIG. 2, is provided together with a corresponding arbiter 252. The arbiter also evaluates the control signals of the processors and forms the output signals 213 and 223 that control these sequences.
  • In an advantageous embodiment, the output signals 213 and 223 can each take three signal states: select, wait and equal.
  • For a pure instruction cache, write accesses are unnecessary. In this case, the signal state "equal" is sufficient for the output signals 213 and 223.
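The arbitration rules above can be summarised as a pure function of the two port requests. This is a hypothetical Python sketch. The mapping of situations to the states select, wait and equal is our reading of the text, and the block size used for the "equal" check (16 bytes) is assumed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

BLOCK_BITS = 4  # low-order bits ignored by the block-level "equal" check (assumed)


class State(Enum):
    SELECT = "select"
    WAIT = "wait"
    EQUAL = "equal"


@dataclass
class Request:
    address: int
    write: bool


def arbitrate(p1: Optional[Request], p2: Optional[Request],
              preferred: int = 1) -> Tuple[Optional[State], Optional[State]]:
    """Return the states of output signals 213 and 223 for one access cycle."""
    if p1 is None or p2 is None:
        return (State.SELECT if p1 is not None else None,
                State.SELECT if p2 is not None else None)
    if p1.write and p2.write:
        # Never write from both ports at once: the preferred port goes first.
        return (State.SELECT, State.WAIT) if preferred == 1 else (State.WAIT, State.SELECT)
    if p1.write or p2.write:
        if p1.address == p2.address:
            # Same cell written and read: the reading port waits for the write.
            return (State.SELECT, State.WAIT) if p1.write else (State.WAIT, State.SELECT)
        return (State.SELECT, State.SELECT)
    if p1.address >> BLOCK_BITS == p2.address >> BLOCK_BITS:
        # Both ports read the same block: compare only the significant bits.
        return (State.EQUAL, State.EQUAL)
    return (State.SELECT, State.SELECT)
```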
  • In the case of a cache miss, the datum or instruction must be fetched from a program or data memory via the bus system.
  • The incoming data is forwarded to the execution unit and, in parallel, written to the cache together with the identifier and control bits.
  • If there is no hit but the address comparator signals "equal" (a state of 213 and 223), the address comparator prevents the datum from being fetched from memory a second time.
  • When both ports read, the signal "equal" is formed from the significant address bits only, because the whole block is always fetched from memory. Only once the block is in the cache can the waiting execution unit access the cache.
  • Advantageously, two separate dual-port caches are provided for data and for instructions, the latter usually not being written to.
  • In this case, the address comparator checks only the significant address bits for equality and provides the corresponding control signal "equal" in the signals 213 and 223, respectively.
  • Alternatively, concurrent read access from both ports can be restricted so that it works fully only when the requested data lie in different address ranges that allow concurrent access. This saves hardware implementation effort, because not all access mechanisms in the memory have to be duplicated.
  • For this purpose, the cache can be implemented as several sub-memory areas that can be operated independently of one another. Via select signals, each sub-memory serves only one port at a time. FIG. 4 shows such a memory 230, which comprises two sub-memory areas 235 and 236.
  • The signals supplied from each port include the two select signals and the low-order address bits $A_{i-1} \ldots A_0$.
  • The four select signals can be generated from two address bits, because each sub-memory uniquely serves a specific address range.
  • With the two address bits $A_{i+1}$ and $A_i$, four sub-memory areas can be addressed by generating the four select signals $E_0$ to $E_3$ according to their binary significance, as shown in Table 1.
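The decoding of Table 1 can be expressed as a 2-to-4 one-hot decoder. The sketch assumes active-high select signals and that $E_k$ is asserted when the bits $(A_{i+1}, A_i)$ encode $k$. The helper `can_access_in_parallel` is a hypothetical addition showing that two ports conflict only when they select the same sub-memory.

```python
from typing import List


def select_signals(address: int, i: int) -> List[int]:
    """Decode address bits A_{i+1} and A_i into one-hot select signals [E0, E1, E2, E3]."""
    code = (address >> i) & 0b11
    return [1 if k == code else 0 for k in range(4)]


def can_access_in_parallel(addr1: int, addr2: int, i: int) -> bool:
    """Two ports can be served together only if they select different sub-memories."""
    return select_signals(addr1, i) != select_signals(addr2, i)
```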
  • In this particular embodiment, the sub-memory designated 260 is designed as a single-port RAM 280 whose addresses, data and control signals are switched as required. The switching is performed by a control circuit 270, which controls a multiplexer 275 depending on the select signals and on further control signals 2901 and 2902 (e.g., read, write) from the respective ports. These signals are contained, together with the data and addresses, in the signals 233 and 234, respectively, and are fed to the multiplexer 275 via 5281 and 5282, respectively. Depending on the decision of the control circuit 270, expressed by the output signal 2701, the multiplexer connects either 5281 or 5282 to the signals 2801.
  • Without loss of generality, direct addressing of the cache (direct-mapped) is assumed here. In a multi-associative cache organization, one of two approaches is needed. Either the validity check is performed in the units 275 and the cache-hit signal is forwarded to the port, or all data is forwarded via port 5331 and signal 233 to 231, and via port 5332 and signal 234 to 232, respectively, where validity is checked.
  • The control circuit forwards the signals 5281 or 5282 to 2801, and thus to the single-port RAM 280, and also forwards the data and other signals from 280 in the opposite direction. This is done in response to a valid select signal and the signals 233 and 234, and/or to the order in which the ports initiate a read or write operation on the memory 280 via these signals. If the read or write signals in 233 and 234 become active at the same time, a previously defined port is served first. This preferred port remains connected to 2801 even when no read or write signal is active. Alternatively, the preferred port can be set dynamically by the processor system, preferably depending on state information of the processor system.
  • This arrangement with a single-port RAM is less expensive than a dual-port RAM with parallel access. However, it delays execution of at least one execution unit whenever both units access the same sub-memory at the same time (even for reading).
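The cost of the single-port variant shows up as stall cycles. The following hypothetical Python sketch serves two request streams on one single-port RAM, one access per cycle. On simultaneous requests, the previously defined preferred port always goes first. The timing, the request format and the function name are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def serve_single_port(requests1: Iterable[str], requests2: Iterable[str],
                      preferred: int = 1) -> Tuple[List[Tuple[int, str]], Dict[int, int]]:
    """Serve two ports on one single-port RAM, one access per cycle.

    Returns the access log as (port, request) pairs and the number of
    cycles each port was stalled.
    """
    pending = {1: deque(requests1), 2: deque(requests2)}
    stalls = {1: 0, 2: 0}
    log: List[Tuple[int, str]] = []
    while pending[1] or pending[2]:
        if pending[1] and pending[2]:
            # Simultaneous requests: the previously defined preferred port goes first.
            port = preferred
            stalls[3 - port] += 1
        else:
            port = 1 if pending[1] else 2
        log.append((port, pending[port].popleft()))
    return log, stalls
```

With a fixed preferred port, a port that keeps issuing requests can delay the other port for several cycles. Choosing the preferred port dynamically, as the text allows, could reduce this effect.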
  • This arrangement can also be extended to accesses by more than two processors:
  • A multi-port RAM can be realized in the same way if the addresses, data and control signals are switched in stages over several multiplexers (FIGS. 6 and 7).
  • Such a multi-port RAM 290 is shown in FIG. … There, the port input signals 261, 262, ... 267 are decoded in the decoding devices 331, 332, ... 337 into the signals 291, 292, ... 297. This decoding generates the select signals for the accesses to the individual RAMs in 281, 282, ... 288.
  • FIG. 7 shows an embodiment of a sub-memory 28x (281 ... 288) in more detail.
  • There, the select signals and control signals 3901, 3902, ... 3908 derived from the control signals 291, 292, ... 298 are processed into the output signals 3701, ... 3707.
  • Each of these output signals controls a multiplexer 375 which, depending on the signal value, connects the buses 381 or 382 to …
  • In addition to the address, data and control signals, the multiplexers 375 of FIG. 7 also connect the select signals of the next stages contained in 381, 382, ... 388. Furthermore, 375 can include comparators that determine the validity of the data read from the sub-areas in a multi-associative addressing mode. In a further advantageous embodiment, the connection of RAM areas to different execution units can be made dependent on one or more system states or configurations.
  • FIG. 8 shows an example of a configurable dual-port cache for this purpose. The system or configuration signal 1000 is used in decoding the input signals for each of the two ports. Table 2 shows one way of changing the decoding in response to this signal 1000, designated M here.
  • FIG. 10 shows a further embodiment for a multi-associative cache in which, from each sub-memory 281, 282, ... 288, the data together with the identifier and the …
  • Figure B1 shows a structure that uses the dual-port cache for an asymmetric system with two execution units.
  • B110 and B111 are the two execution units with their data, address and control signals B120 and B121.
  • B100 is a switching and comparison unit (UVE).
  • Figure B2 shows the basic function of this switching and comparison unit when used with two execution units B10 and B11.
  • Various output signals of the execution units B10 and B11, such as the data, control and address signals B20 and B21, are connected to the switching unit.
  • The switching unit also has two output signals, B40 and B41, each of which is connected to one of the execution units.
  • The switching unit contains at least one control register B15 with at least one binary memory element (bit) B16 that switches the comparison mode on or off.
  • This bit B16 can take the values 0 and 1. It can be set or reset by the signals B20 or B21 of the execution units or by internal processes of the switching unit.
  • The switchover bit B16 can be omitted if a permanent comparison mode is to be set.
  • If this bit B16 is set to 1, the switching unit operates in comparison mode. In this mode, all incoming data signals from B20 are compared with the data signals …
  • In the switching unit of Figure B2, one of the signals B40 or B41 can be omitted if it is always ensured that the associated execution unit never provides data to be compared earlier than the other execution unit. If the bit B16 is not set, the synchronization signals B40 and B41 and the error signal B17 are always set to 0. No comparison takes place, and both execution units work independently in performance mode.
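The behaviour of the switching and comparison unit can be sketched as a small state machine. In this hypothetical Python model, the two execution units are indexed 0 and 1. B40/B41 are assumed to mean "wait for the partner unit", although the text does not define their exact semantics. The attribute names mirror the reference signs, but the logic is illustrative.

```python
from typing import List, Optional


class SwitchingComparisonUnit:
    """Hypothetical model of the UVE (Figure B2) for two execution units 0 and 1."""

    def __init__(self) -> None:
        self.b16 = 0                                 # mode bit in control register B15
        self.b17 = 0                                 # error signal
        self.sync: List[int] = [0, 0]                # B40, B41 (assumed: "wait for partner")
        self.pending: List[Optional[int]] = [None, None]

    def set_mode(self, bit: int) -> None:
        """Set or reset B16; this also clears any half-finished comparison."""
        self.b16 = bit
        self.b17 = 0
        self.sync = [0, 0]
        self.pending = [None, None]

    def deliver(self, unit: int, data: int) -> None:
        """An execution unit presents a datum (via B20 or B21)."""
        if not self.b16:
            return  # performance mode: no comparison, B40/B41/B17 stay 0
        other = 1 - unit
        if self.pending[other] is None:
            # First of the pair: hold the datum and stall this unit.
            self.pending[unit] = data
            self.sync[unit] = 1
            return
        partner = self.pending[other]
        self.pending[other] = None
        self.sync[other] = 0                         # release the waiting unit
        if partner != data:
            self.b17 = 1                             # deviation: signal an error
```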
  • In performance mode, both execution units of Figure B1 independently process programs, program parts or program segments.
  • Execution unit B111 accesses cache B105 via B121, and the cache is connected to main memory or other memory devices via B161.
  • Execution unit B110 preferably also accesses the dual-port cache, via unit B106 (deactivated in this mode by control signal B101), using the second port of the cache (see Figure B3).
  • The switchover and comparison unit (UVE) B100 is inactive, i.e. no data are compared (B16 not set).
  • In a further embodiment, execution unit B110 can instead bypass the cache and access main memory or other memory devices directly via B160 (see Figure B4).
  • The disadvantage of this embodiment is that the data in the cache are then no longer consistent.
  • The corresponding block must then be invalidated in the cache by resetting its valid bit. This can be done either by execution unit B110 itself or independently by the cache, which observes bus B161 to detect whether data are written to a block that is present in the cache. This process is also called bus snooping.
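Bus snooping, as used here, can be sketched as follows. This is a minimal Python model that assumes a fully associative cache keyed by block number and a 32-byte block size. The class `SnoopingCache` and its methods are hypothetical names chosen for illustration.

```python
from typing import Dict


class SnoopingCache:
    """Hypothetical sketch of valid-bit invalidation by bus snooping."""

    def __init__(self, block_size: int = 32) -> None:
        self.block_size = block_size          # assumed block size in bytes
        self.valid: Dict[int, bool] = {}      # block number -> valid bit
        self.data: Dict[int, bytes] = {}      # block number -> cached block

    def fill(self, addr: int, block: bytes) -> None:
        """Load a block from main memory and mark it valid."""
        n = addr // self.block_size
        self.data[n] = block
        self.valid[n] = True

    def snoop_write(self, addr: int) -> None:
        """Called for every write observed on the memory bus (B161 in the text)."""
        n = addr // self.block_size
        if n in self.valid:
            self.valid[n] = False  # reset the valid bit: the next access misses

    def hit(self, addr: int) -> bool:
        """Hit only if the block is present and its valid bit is set."""
        return self.valid.get(addr // self.block_size, False)
```

A write by B110 to an address inside a cached block clears that block's valid bit, so the next read fetches the current data from main memory. Writes to uncached blocks are ignored.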
  • The embodiment of Figure B3 is preferable because of its lower cost.
  • Figure B5 shows the comparison mode of the structure of Figure B1 in detail. When this mode is activated by setting B16 in the UVE B100, both execution units start executing the same program. Where necessary, the program runs diversely on the two execution units, i.e. it uses different algorithms and/or instructions to generate the data to be compared.
  • Execution unit B110 outputs the data to be compared, together with a corresponding identifier, to the switching and comparison unit (UVE) B100.
  • This causes the read request unit B106 to store the control signals (e.g., write) and, where applicable, additional tags (status or process information, processing cycle) in memory element B1061. It stores the associated address in memory element B1062.
  • Unit B106 then triggers a read operation at port 2 of the dual-port cache B105. For this it uses the control signals (e.g., read) and identifiers B1021 generated from B1061 in control unit B1064, together with the address signals B1022 output from B1062.
  • The data value received is written to B1063 via B1023.
  • The validity of the data is indicated by the received control signals B1024 (evaluation of the cache hit signal and the valid bit). If the data are valid, the data, addresses and corresponding control signals are provided via B1003, B1002 and B1001 for comparison in B100.
  • The identifier (control bits) returned from the cache with the data is compared with the current identifier. If the identifiers do not match, the read is repeated at the same address in the cache. Once they match, the data and identifier, together with the addresses (which may also serve as an additional identifier), are released for comparison by means of suitable control signals B1001.
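The read-and-check sequence of the read request unit B106 can be sketched in Python as follows. The callback `read_port2`, the `CacheReply` record and the retry limit `max_retries` are assumptions for illustration. The text itself only says that the read is repeated on an identifier mismatch and does not bound the number of retries.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CacheReply:
    """Hypothetical reply from cache port 2: hit/valid status, identifier, data."""
    hit: bool
    valid: bool
    ident: int
    data: int


def request_comparison_data(read_port2: Callable[[int], CacheReply],
                            addr: int, expected_ident: int,
                            max_retries: int = 8) -> Optional[Tuple[int, int]]:
    """Read at the latched address until the returned identifier matches.

    Returns (addr, data) to be released to the UVE, or None if the
    identifier never matched within max_retries reads.
    """
    for _ in range(max_retries):
        reply = read_port2(addr)  # read at the address held in B1062
        if reply.hit and reply.valid and reply.ident == expected_ident:
            return addr, reply.data  # release data and address for comparison
        # Miss, invalid block or stale identifier: re-read the same address.
    return None
```

Bounding the retries is a design choice of this sketch, so that a missing write by B111 surfaces as an error instead of an endless loop.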
  • A FIFO (first in, first out) is a memory unit that can store several data words and outputs them in the order in which they were stored, i.e. the first word stored is the first one output.
  • The two execution units can be synchronized by having execution unit B111 store the comparison data in the cache together with an identifier (address and additional control bits, e.g. valid bit, dirty bit, process ID). The validity of this identifier is then checked before the comparison.
  • The stored address data and the valid bit are used to generate the cache hit signal.
  • The dirty bit merely indicates whether the data of the block have been modified in the cache and not yet written back to main memory.
  • The process ID identifies the program execution and is changed in the cache each time B111 writes new valid data. This makes it possible to check whether the data are current, provided the process ID is, e.g., set to a known value when the comparison mode starts and then changed in a known manner, for example incremented.
  • Table 4 shows the internal structure of a cache. Each line corresponds to a data block.
  • The address tag is the part of the address that identifies a block; on each access it is compared with the corresponding part of the current address. If they match and the valid bit (one of the control bits) is set, a cache hit signal is generated.
  • A data block can contain anything from a few bytes to several kilobytes of data.
  • Table 5 shows example control bits.
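Tables 4 and 5 are not reproduced here. A conventional cache row with an address tag, control bits and a data block, together with the generation of the hit signal, might look like the following sketch. It assumes a direct-mapped organisation with 16 lines of 32 bytes. These sizes and all names are illustrative assumptions, not values from the source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OFFSET_BITS = 5  # assumed: 32-byte blocks, address bits 0..k-1 with k = 5
INDEX_BITS = 4   # assumed: 16 lines, direct-mapped


@dataclass
class CacheLine:
    """One row of the cache table: address tag, control bits, data block."""
    tag: int = 0
    valid: bool = False
    dirty: bool = False     # modified in the cache, not yet written back
    process_id: int = 0     # identifies the program execution that wrote the block
    data: bytearray = field(default_factory=lambda: bytearray(1 << OFFSET_BITS))


def split(addr: int) -> Tuple[int, int, int]:
    """Split an address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def cache_hit(lines: List[CacheLine], addr: int) -> bool:
    """Hit signal: stored tag equals the address tag and the valid bit is set."""
    tag, index, _ = split(addr)
    line = lines[index]
    return line.valid and line.tag == tag
```

For example, `split(0x1234)` yields tag 9, index 1 and byte offset 20.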
  • This value may be the hexadecimal value 0x0000 when the first comparison cycle begins.
  • This counter can then be incremented with each write of comparison-relevant data by B111. "Comparison-relevant" means that the address and/or other control signals can be used to specify whether the data are intended for comparison.
  • Counter B1069 is preferably set to the same start value as counter B1059 when B101 is activated. It is incremented with each write signal for comparison-relevant data from B110. If the process ID bits in the read block have the same or a higher value than the count of B1069, then the
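The counter-based timeliness check can be modelled as follows. This minimal Python sketch rests on three assumptions: each counter is incremented before the corresponding write, the value of B1059 is stored as the process ID of the written block, and counter wrap-around can be ignored. The class and method names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class CounterSync:
    """Hypothetical model: B1059 counts comparison-relevant writes by B111
    (stored as process ID); B1069 in B106 counts those by B110."""

    START = 0x0000  # common start value when the comparison mode is activated

    def __init__(self) -> None:
        self.b1059 = self.START
        self.b1069 = self.START
        self.cache: Dict[int, Tuple[int, int]] = {}  # addr -> (process_id, data)

    def b111_write(self, addr: int, data: int) -> None:
        """B111 writes comparison data; the block gets the new count as process ID."""
        self.b1059 += 1
        self.cache[addr] = (self.b1059, data)

    def b110_write(self, addr: int) -> Optional[int]:
        """B110 offers comparison data; returns B111's value if it is current."""
        self.b1069 += 1
        return self.check(addr)

    def check(self, addr: int) -> Optional[int]:
        """Re-read without counting: data are current if process_id >= B1069."""
        entry = self.cache.get(addr)
        if entry is not None and entry[0] >= self.b1069:
            return entry[1]
        return None  # B111 has not yet written this cycle's data: retry later
```

Keeping `check` separate from `b110_write` matters. A repeated read must not advance B1069, otherwise a retry would wrongly demand newer data.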
  • In a further embodiment of the counter, the low-order address bits (address bits 0, ..., K-1 according to FIG. 3, which are not relevant for block addressing) are also stored as part of the process ID, while the remaining process-ID bits form the cycle mark. If the word width is greater than one byte, the corresponding least significant bits are omitted. This makes it possible to recognize which word in the block was written last. For this purpose, either the program always writes the data to consecutive addresses, or, with linear program execution, a look-up table provides a unique assignment between the last value written by B111 and the current comparison value of B110.
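Packing the word offset into the process ID, as described above, could look like this. The sketch assumes 32-byte blocks (k = 5) and 4-byte words, so the two least significant address bits are dropped. It also assumes an 8-bit cycle mark. All widths and function names are assumptions chosen for illustration.

```python
BLOCK_OFFSET_BITS = 5  # k: address bits 0..k-1 address bytes within a 32-byte block (assumed)
BYTE_BITS = 2          # 4-byte words: the two least significant bits are omitted (assumed)
WORD_BITS = BLOCK_OFFSET_BITS - BYTE_BITS
CYCLE_BITS = 8         # assumed width of the cycle mark


def make_process_id(addr: int, cycle: int) -> int:
    """Process ID = cycle mark (upper bits) | index of the word last written (lower bits)."""
    word = (addr & ((1 << BLOCK_OFFSET_BITS) - 1)) >> BYTE_BITS
    return ((cycle & ((1 << CYCLE_BITS) - 1)) << WORD_BITS) | word


def unpack_process_id(pid: int) -> tuple:
    """Return (cycle_mark, word_index)."""
    return pid >> WORD_BITS, pid & ((1 << WORD_BITS) - 1)
```

For instance, a write to address 0x1234 in cycle 3 gives word index 5 and process ID 29. The cycle mark wraps modulo 256 in this sketch.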
  • A further embodiment of the synchronization, without counters B1059 and B1069, transmits a control signal B141 (e.g., an interrupt) to execution unit B111 to request the comparison mode.
  • Execution unit B111 then starts executing the program part in comparison mode within a known maximum time T.
  • Execution unit B110 is initialized accordingly and to the
  • Execution unit B111 must prevent data that have not yet been compared from being overwritten, e.g. if the data are updated cyclically.
  • the resetting of the bit B16 may be e.g. also a
  • B106 requests no further data for comparison, and the comparison operations already under way are completed. As long as B141 is not set again, execution unit B111 then operates in performance mode. The data in the cache may therefore be overwritten again, but the comparison mode is initiated again only when B141 is reactivated.
  • Signal B141 is only ever active briefly, to prepare the comparison mode by means of an interrupt. Overwriting of the data is then prevented until B141 is briefly set again and the interrupt it triggers causes a jump to the relevant program location, which provides the comparison data again.
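The protection of uncompared data between B141 pulses can be sketched as a small state model of execution unit B111. This is an illustrative Python sketch that assumes one protected address per comparison value. The class `B111Model` and its methods are hypothetical.

```python
from typing import Dict, Set


class B111Model:
    """Hypothetical model of how B111 protects uncompared data between B141 pulses."""

    def __init__(self) -> None:
        self.cache: Dict[int, int] = {}
        self.protected: Set[int] = set()  # addresses holding data not yet compared

    def on_b141_interrupt(self, addr: int, value: int) -> None:
        """Interrupt handler: jump to the program part that provides comparison data."""
        self.cache[addr] = value
        self.protected.add(addr)

    def cyclic_update(self, addr: int, value: int) -> bool:
        """Ordinary cyclic update; refused while the data await comparison."""
        if addr in self.protected:
            return False
        self.cache[addr] = value
        return True

    def end_comparison(self) -> None:
        """B16 reset: back to performance mode, overwriting allowed again."""
        self.protected.clear()
```

Only the interrupt-triggered program part may replace a protected value. Ordinary cyclic updates wait until the comparison mode ends.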
  • Once the comparison has started, all synchronization measures can be dispensed with if suitable measures ensure three conditions. First, the interval between successive provisions of comparison data by execution unit B111 is always longer than the time needed to compare them with the data from B110. Second, the data from B111 are always available at the same time as or earlier than those from B110. Third, because the data from B110 are buffered in B100 in a simple memory or FIFO, execution unit B110 never has to be stopped via signal B140 for synchronization purposes.
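The synchronization-free variant relies on buffering the data from B110 in B100. The following minimal Python sketch assumes that B111's value for each address is already in the cache when the corresponding FIFO entry is compared, as the timing guarantees stated above require. The cache is modelled as a plain dictionary, and `FifoComparator` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class FifoComparator:
    """Hypothetical sketch of B100 buffering B110's comparison data in a FIFO."""

    def __init__(self, cache: Dict[int, int]) -> None:
        self.cache = cache                            # addr -> value written by B111
        self.fifo: Deque[Tuple[int, int]] = deque()   # (addr, value) pairs from B110
        self.mismatches: List[int] = []               # addresses that failed comparison

    def b110_output(self, addr: int, value: int) -> None:
        """B110 hands over comparison data and continues without stalling."""
        self.fifo.append((addr, value))

    def compare_one(self) -> bool:
        """Compare the oldest buffered entry; return False if the FIFO is empty."""
        if not self.fifo:
            return False
        addr, value = self.fifo.popleft()  # first in, first out
        if self.cache.get(addr) != value:
            self.mismatches.append(addr)
        return True
```

The FIFO is unbounded in this sketch. In hardware its depth would have to cover the worst-case backlog that the timing conditions allow.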
  • The dual-port cache need not be implemented with a dual-port RAM; a single-port RAM B1056 can be used instead (see Figure B7).
  • An access control B1057 then services the two ports one after the other as requests arrive, with only one of them at a time accessing the RAM via signals B1058.
  • In the event of access conflicts, the read-back by B106 may be delayed by one or more additional clock cycles. Since the two execution units do not supply their data synchronously anyway, this is not a disadvantage in many applications. It is only important that execution unit B111 has the higher priority so that its accesses are not hindered.
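A fixed-priority arbiter for the single-port RAM variant can be sketched as follows. The sketch assumes one RAM access per clock, with port 1 served by B111 and port 2 by B106. Requests are given as `(arrival_cycle, address)` pairs. The function `arbitrate` is a hypothetical illustration of the priority rule, not the access control B1057 itself.

```python
from collections import deque
from typing import List, Tuple


def arbitrate(port1: List[Tuple[int, int]],
              port2: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Grant at most one RAM access per clock; port 1 (B111) always wins.

    port1, port2: (arrival_cycle, address) requests sorted by arrival.
    Returns the grants as (cycle, port, address).
    """
    pending1, pending2 = deque(port1), deque(port2)
    queue1, queue2 = deque(), deque()
    grants = []
    cycle = 0
    while pending1 or pending2 or queue1 or queue2:
        # Move requests that have arrived by this clock into the wait queues.
        while pending1 and pending1[0][0] <= cycle:
            queue1.append(pending1.popleft()[1])
        while pending2 and pending2[0][0] <= cycle:
            queue2.append(pending2.popleft()[1])
        if queue1:
            grants.append((cycle, 1, queue1.popleft()))
        elif queue2:
            grants.append((cycle, 2, queue2.popleft()))  # B106 only when port 1 is idle
        cycle += 1
    return grants
```

With `arbitrate([(0, 0x10), (1, 0x14)], [(0, 0x80)])`, B111's two accesses are granted in cycles 0 and 1, and B106's read is delayed to cycle 2. Under continuous B111 traffic B106 would wait indefinitely, which this sketch accepts as the price of strict priority.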
  • Unit B106 need not be a separate unit; it may be integrated in the UVE B100, or integrated on one chip together with the cache or an execution unit.
  • If a cache with more than two ports according to FIG. 6 is available, i.e. a multi-port cache B205, more than two processors can also compare their data in a comparison mode or vote on them, i.e. determine the valid value by majority vote. For each additional execution unit B112, ..., an additional read request unit B107, ... according to Figure B8 must then be provided, and the UVE B200 must have a corresponding number of inputs.
  • In comparison or voting mode, the execution unit connected directly to the cache writes its data item directly into the cache for comparison or voting. Whichever of the other execution units B110, B112, ... first provides a data item for comparison/voting then requests the corresponding data through its read request unit B106, B107, ... via its port of cache B205. This execution unit signals this status to the other execution units and to the UVE by means of signals B8105. Once the cache has supplied the data via B102, B104, ..., they are provided via B8105 to UVE B200 for comparison.
  • The comparison/voting takes place once all participating execution units have provided the corresponding data. If necessary, the other execution units must be stopped until then via the control signals B140, B142, .... Time monitoring ensures that the comparison takes place within a tolerated time window; otherwise an error is signaled.
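Majority voting within a tolerated time window might be sketched as follows. This Python sketch assumes that each participating unit's data item arrives with a timestamp. It measures the window between the first and last arrival and requires a strict majority for a valid value. The function name `vote` and the error strings are hypothetical. With only two units it reduces to a plain comparison, where any mismatch yields no majority.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def vote(arrivals: Dict[str, Tuple[int, int]],
         window: int) -> Tuple[Optional[int], Optional[str]]:
    """arrivals: unit name -> (arrival_time, value), one entry per participating unit.

    Returns (valid_value, None) on success, or (None, error) otherwise.
    """
    times = [t for t, _ in arrivals.values()]
    if max(times) - min(times) > window:
        return None, "timeout"  # time monitoring: outside the tolerated window
    counts = Counter(v for _, v in arrivals.values())
    value, n = counts.most_common(1)[0]
    if 2 * n > len(arrivals):
        return value, None      # strict majority found
    return None, "no majority"
```

For example, three units delivering 5, 5 and 7 within the window yield the valid value 5, and the deviating unit can then be flagged.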
  • RAM random access memory
  • FERAM ferroelectric RAM
  • MRAM magnetoresistive RAM

Abstract

The present invention relates to a device and a method for storing data and/or instructions in a computer system having at least two execution units and at least one first memory or memory area for data and/or instructions. The invention uses switching means, by which a switchover takes place between at least two operating modes, and comparison means, a first operating mode corresponding to a comparison mode and a second operating mode corresponding to a performance mode. The invention is characterized in that the device comprises a second memory or memory area and takes the form of a cache memory system with at least two separate ports. One port is connected directly to a first execution unit, and a third system is interposed between the second port and the second execution unit(s). This third system is designed so that the second execution unit accesses the second memory or memory area via the third system.
EP06777936A 2005-08-08 2006-07-24 Dispositif et procede pour enregistrer des donnees et/ou des ordres dans un systeme informatique comprenant au moins deux unites d'execution et au moins une premiere memoire ou zone memoire destinee a des donnees et/ou des ordres Withdrawn EP1915684A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102005037234A DE102005037234A1 (de) 2005-08-08 2005-08-08 Vorrichtung und Verfahren zur Speicherung von Daten und/oder Befehlen in einem Rechnersystem mit wenigstens zwei Ausführungseinheiten und wenigstens einem ersten Speicher oder Speicherbereich für Daten und/oder Befehle
PCT/EP2006/064588 WO2007017367A1 (fr) 2005-08-08 2006-07-24 Dispositif et procede pour enregistrer des donnees et/ou des ordres dans un systeme informatique comprenant au moins deux unites d'execution et au moins une premiere memoire ou zone memoire destinee a des donnees et/ou des ordres

Publications (1)

Publication Number Publication Date
EP1915684A1 true EP1915684A1 (fr) 2008-04-30

Family

ID=36926336

Family Applications (1)

Application Number Title Priority Date Filing Date
EP06777936A Withdrawn EP1915684A1 (fr) 2005-08-08 2006-07-24 Dispositif et procede pour enregistrer des donnees et/ou des ordres dans un systeme informatique comprenant au moins deux unites d'execution et au moins une premiere memoire ou zone memoire destinee a des donnees et/ou des ordres

Country Status (6)

Country Link
EP (1) EP1915684A1 (fr)
JP (1) JP2009505178A (fr)
KR (1) KR20080033338A (fr)
CN (1) CN101243404A (fr)
DE (1) DE102005037234A1 (fr)
WO (1) WO2007017367A1 (fr)


Also Published As

Publication number Publication date
CN101243404A (zh) 2008-08-13
DE102005037234A1 (de) 2007-02-15
KR20080033338A (ko) 2008-04-16
JP2009505178A (ja) 2009-02-05
WO2007017367A1 (fr) 2007-02-15


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20080310

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

17Q First examination report despatched

Effective date: 20090728

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20110201