CN101356511A - Power conservation via DRAM access - Google Patents

Power conservation via DRAM access

Info

Publication number
CN101356511A
Authority
CN
China
Prior art keywords
buffer
cache
data
mini-cache
buffer memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2006800508506A
Other languages
Chinese (zh)
Other versions
CN101356511B (en)
Inventor
劳伦特·R·莫尔
先勇·皮特·宋
皮特·N·格拉斯科斯奇
程宇庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Montalvo Systems Inc
Sun Microsystems Inc
Original Assignee
Sun Microsystems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/351,070 external-priority patent/US7516274B2/en
Priority claimed from US11/559,192 external-priority patent/US7899990B2/en
Application filed by Sun Microsystems Inc filed Critical Sun Microsystems Inc
Priority claimed from PCT/US2006/044129 external-priority patent/WO2007097791A2/en
Publication of CN101356511A publication Critical patent/CN101356511A/en
Application granted granted Critical
Publication of CN101356511B publication Critical patent/CN101356511B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Power conservation via DRAM access reduction is provided by a buffer/mini-cache selectively operable in a normal mode and a buffer mode. In the buffer mode, entered when CPUs begin operating in low-power states, non-cacheable accesses (such as those generated by a DMA device) matching specified physical address ranges, or having specified characteristics of the accesses themselves, are processed by the buffer/mini-cache instead of by a memory controller and DRAM. The buffer/mini-cache processing includes allocating lines when references miss, and returning cached data from the buffer/mini-cache when references hit. Lines are replaced in the buffer/mini-cache according to one of a plurality of replacement policies, including ceasing replacement when there are no available free lines. In the normal mode, entered when CPUs begin operating in high-power states, the buffer/mini-cache operates akin to a conventional cache and non-cacheable accesses are not processed therein.

Description

Power conservation via DRAM access
Background technology
In some microprocessor systems, DRAM accesses are performed during DMA operations (such as a GPU referencing frame buffer information) while other processing in the system is reduced or suspended, and each DRAM access consumes substantial power. Similarly, in some microprocessor systems, all or portions of an otherwise powered-down processor, together with an associated cache subsystem, are powered up (or prevented from powering down) in order to process coherent DMA operations (such as USB device transactions). Techniques are needed to perform DRAM accesses and to process DMA accesses more efficiently, enabling improvements in performance, power efficiency, and utility of use.
In ACPI-compliant systems, the relatively low-power ACPI-compliant states C3, C4, C5, and so forth do not permit coherent DMA, since no snooping occurs in those states, while the relatively high-power ACPI-compliant states C0, C1, and C2 do permit coherent DMA, since caches are snooped in those states. Hereinafter, the terms C0, C1, C2, C3, C4, and C5 refer to the like-named ACPI-compliant power states.
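The relation just described between C-states, snooping, and coherent DMA can be sketched as a simple lookup (a minimal illustration of the text above; the state names come from the ACPI specification, the mapping itself is this document's claim):

```python
# Map ACPI C-states to whether caches are snooped in that state.
# Per the text above: C0-C2 are snooped (coherent DMA permitted),
# C3-C5 are not snooped (coherent DMA not permitted).
SNOOPED_STATES = {"C0": True, "C1": True, "C2": True,
                  "C3": False, "C4": False, "C5": False}

def coherent_dma_allowed(c_state: str) -> bool:
    """Coherent DMA is permitted only while caches are snooped."""
    return SNOOPED_STATES[c_state]
```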
Description of drawings
Fig. 1 illustrates selected details of several embodiments of a system that implements transparent use of one or more buffer/mini-caches for satisfying selected non-cacheable accesses and for satisfying some background DMA device accesses.
Fig. 2 illustrates selected aspects of an embodiment of any of the buffer/mini-caches of Fig. 1.
Fig. 3 illustrates selected aspects of an embodiment of a state machine adapted to control instances of the buffer/mini-cache of Fig. 2 to satisfy selected non-cacheable accesses.
Fig. 4 illustrates an embodiment of memory range information used to determine the selected matching non-cacheable accesses and the selected matching DMA accesses to be processed by an instance of the buffer/mini-cache of Fig. 2.
Fig. 5 illustrates selected operations performed by an embodiment implementing a coherent buffer/mini-cache for satisfying background DMA device accesses.
Fig. 6 illustrates selected operations performed by an embodiment implementing an incoherent buffer/mini-cache for satisfying background DMA device accesses.
Figs. 7A-7F illustrate various embodiments of processor contexts that include a buffer/mini-cache.
Detailed description
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures illustrating the principles of the invention. Some features of certain embodiments, or variations thereof, may be notable. The invention is described in connection with the embodiments, which are to be understood as merely illustrative rather than restrictive; the invention is expressly not limited to any or all of the embodiments herein. The scope of the invention is limited only by the claims appended to the end of the issued patent, and the invention encompasses numerous alternatives, modifications, and equivalents. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for the purpose of example, and the invention may be practiced according to the claims without some or all of the specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Introduction
This introduction is included only to facilitate more rapid understanding of the detailed description. The invention is not limited to the concepts presented in the introduction, as the paragraphs of any introduction are necessarily an abridged view of the entire subject and are not meant to be an exhaustive or restrictive description. For example, the introduction that follows provides overview information, limited by space and organization, to only certain embodiments. There are in fact many other embodiments, including those to which claims will ultimately be drawn, which are discussed throughout the balance of the specification.
Acronyms
Various shorthand abbreviations, or acronyms, are used herein to refer to certain elements; descriptions of the acronyms follow.
Acronym — Description
ACPI — Advanced Configuration and Power Interface
CPU — Central Processing Unit
CRT — Cathode Ray Tube
DMA — Direct Memory Access
DRAM — Dynamic Random Access (read/write) Memory
FIFO — First In First Out
GPU — Graphics Processing Unit
I/O — Input/Output
L1 — First-level cache
L2 — Second-level cache
L3 — Third-level cache
LRU — Least Recently Used
MRU — Most Recently Used
MSR — Machine/Model Specific Register
OS — Operating System
PC — Personal Computer
PDA — Personal Digital Assistant
USB — Universal Serial Bus
Transparent use of an internal, power-efficient buffer/mini-cache in a processor system enables a reduction in power consumption by eliminating some DRAM accesses, via satisfying selected non-cacheable accesses with the buffer/mini-cache. The buffer/mini-cache enables a further reduction in power consumption by reducing occurrences where a CPU or a cache subsystem of a microprocessor system must be powered up, via satisfying some background DMA device accesses with the buffer/mini-cache. In some embodiments, the microprocessor system implements a plurality of processors (or CPUs), each processor (or CPU) having an associated cache subsystem (for example, various arrangements of first-, second-, third-, and higher-level caches).
Power conservation via DRAM access reduction is provided by the buffer/mini-cache, which is selectively operable in a normal mode and a buffer mode. In the buffer mode (entered when CPUs begin operating in low-power states), non-cacheable accesses (such as those generated by a DMA device) matching specified physical address ranges are processed by the buffer/mini-cache instead of by a memory controller and DRAM. The buffer/mini-cache processing includes allocating lines when references miss, and returning cached data from the buffer/mini-cache when references hit. Lines are replaced in the buffer/mini-cache according to one of a plurality of replacement policies (including ceasing replacement when there are no available free lines). In the normal mode (entered when CPUs begin operating in high-power states), the buffer/mini-cache operates akin to a conventional cache, and non-cacheable accesses are not processed therein. In one usage scenario, the data retained in the buffer/mini-cache is graphics refresh data retained in a compressed format.
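The buffer-mode behavior described above (allocate on miss, return cached data on hit, fall back to the memory path in normal mode) might be sketched roughly as follows; the `dram` dictionary stands in for the memory controller path and the four-line capacity is an assumption of this illustration, not a detail of the patent:

```python
class BufferMiniCache:
    """Toy model of the buffer/mini-cache's two operating modes."""

    def __init__(self, dram, capacity=4):
        self.dram = dram            # backing store (memory controller + DRAM)
        self.capacity = capacity    # number of lines
        self.lines = {}             # address -> data
        self.buffer_mode = False    # False = normal mode

    def set_mode(self, buffer_mode: bool):
        self.buffer_mode = buffer_mode

    def noncacheable_read(self, addr):
        if not self.buffer_mode:
            return self.dram[addr]           # normal mode: not processed here
        if addr in self.lines:
            return self.lines[addr]          # hit: return cached data
        data = self.dram[addr]               # miss: fetch once from DRAM...
        if len(self.lines) < self.capacity:  # ...and allocate if a line is free
            self.lines[addr] = data
        return data
```

After entering buffer mode, repeated reads of the same address hit in the buffer and never reach the DRAM path again, which is the source of the power saving.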
In some embodiments, the buffer/mini-cache is a portion of a cache (such as any of the first-, second-, or third-level caches) coupled to one or more CPUs. In some embodiments, the buffer/mini-cache is a portion of a highest-level cache (or "outer-level" cache) in a cache subsystem coupled to one or more CPUs. For example, in a cache subsystem having only first-level caches, the buffer/mini-cache may be a portion of the first-level cache (or one of the first-level caches, if there is more than one). For another example, in a cache subsystem having first- and second-level caches, the buffer/mini-cache may be a portion of the second-level cache (or one of the second-level caches, if there is more than one). For another example, in a cache subsystem having first-, second-, and third-level caches, the buffer/mini-cache may be a portion of the third-level cache (or one of the third-level caches, if there is more than one).
Power conservation via reduced CPU and/or cache subsystem power-ups is provided by the buffer/mini-cache sourcing and sinking selected DMA accesses directed to a memory space included in a coherency domain of the CPUs, when cached data in the CPUs is inaccessible because any or all of the CPUs (or the associated microprocessor system) are in low-power states that do not support snooping. Satisfying the selected DMA accesses via the buffer/mini-cache enables a reduction in power consumption by allowing the microprocessor system (or portions thereof) to remain in the low-power states. The buffer/mini-cache may be operated (temporarily) incoherently with respect to data cached by the microprocessor system and flushed before the cached data is synchronized, i.e. when the microprocessor (or portions thereof) transitions to a high-power state where snooping is possible. Alternatively, the buffer/mini-cache may be operated (incrementally) coherently with respect to the cached data.
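The incoherent variant just described must be flushed before the CPU caches become visible again; a rough sketch of that flush, where the `dram` dictionary and the dirty-set bookkeeping are assumptions of the illustration:

```python
def flush_buffer(lines: dict, dirty: set, dram: dict) -> None:
    """Write dirty lines back to DRAM, then invalidate everything.

    Models the flush performed when the processor transitions back to a
    snoop-capable (high-power) state, so the temporarily incoherent
    buffer/mini-cache never conflicts with the CPU caches.
    """
    for addr in dirty:
        dram[addr] = lines[addr]   # write-back of modified lines
    dirty.clear()
    lines.clear()                  # invalidate all lines
```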
The buffer/mini-cache may be managed as a direct-mapped, fully associative, or set-associative storage, with allocation policies including LRU, MRU, or variations thereof. The allocation policies may include ceasing allocation in the buffer/mini-cache when no free lines remain available. Allocation state (such as LRU or MRU state) may be maintained independently of, or dependent upon, the power state of the microprocessor system (or of selected elements therein).
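Two of the policies named above — LRU replacement, and ceasing allocation when no free lines remain — might be contrasted in a sketch like this (the capacity and the policy names are assumptions of the illustration):

```python
from collections import OrderedDict

def allocate(lines: OrderedDict, addr, data, capacity, policy="lru"):
    """Allocate a line for addr, applying one of two replacement policies.

    'lru'   -> evict the least recently used line when full.
    'cease' -> stop allocating entirely once no free line remains.
    Returns True if addr now occupies a line.
    """
    if addr in lines:
        lines.move_to_end(addr)        # refresh recency on a hit
        return True
    if len(lines) >= capacity:
        if policy == "cease":
            return False               # no free lines: cease replacement
        lines.popitem(last=False)      # LRU victim is the oldest entry
    lines[addr] = data
    return True
```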
The buffer/mini-cache may be directed to process all DMA transactions, or to select DMA transactions for processing based on transaction address ranges or on which DMA devices are initiating the transactions. The buffer/mini-cache processes the selected DMA transactions only when the microprocessor system (or portions thereof) is operating in a low-power state or a snoop-disabled state. Optionally, the buffer/mini-cache may also process DMA transactions when the microprocessor system (or portions thereof) is operating in a high-power state or a snoop-enabled state.
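Selecting which DMA transactions to handle, by address range or by initiating device, reduces to a simple match; a sketch, where the (low, high) range tuples and the device-name set are assumed representations invented for the illustration:

```python
def selects_transaction(addr, device, ranges, devices):
    """Return True if this DMA transaction should be handled by the
    buffer/mini-cache: either its address falls within a configured
    physical address range, or it was initiated by a configured device."""
    in_range = any(lo <= addr <= hi for lo, hi in ranges)
    return in_range or device in devices
```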
Before cached data in the microprocessor becomes unavailable when the microprocessor transitions to non-snooping operation, data may be stored (or "pre-filled") in the buffer/mini-cache, in expectation of future use by DMA references. The pre-filling may be according to programmed address ranges, or dynamically according to previously observed DMA transactions.
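The dynamic variant of pre-filling — replaying previously observed DMA addresses to warm the buffer before snooping is disabled — could be sketched as follows; the history depth and the dictionary-based cache/DRAM stand-ins are assumptions of the illustration:

```python
from collections import deque

class PreFiller:
    """Remember recent DMA read addresses; replay them to warm the
    buffer/mini-cache just before the CPU caches become unavailable."""

    def __init__(self, depth=8):
        self.history = deque(maxlen=depth)   # most recent DMA addresses

    def observe(self, addr):
        self.history.append(addr)

    def prefill(self, cache, dram):
        for addr in self.history:            # fetch while the DRAM path is up
            cache[addr] = dram[addr]
```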
Illustrative combinations
This introduction concludes with a collection of paragraphs that tersely summarize illustrative systems and methods in accordance with the concepts taught herein. Each of the paragraphs highlights various combinations of features using an informal pseudo-claim format. These compressed descriptions are not meant to be mutually exclusive, exhaustive, or restrictive, and the invention is not limited to these highlighted combinations. As is discussed in more detail in the Conclusion section, the invention encompasses all possible modifications and variations within the scope of the issued claims, which are appended to the very end of the patent.
A first illustrative combination of a system comprising: a memory array operable according to a cacheable mode and a non-cacheable mode; and a memory array controller controlling the memory array, the memory array controller designed to recognize an event and, in response, to transition the memory array from the cacheable mode to the non-cacheable mode. The first illustrative combination, wherein the event is a first event, and the memory array controller is further designed to recognize a second event and, in response, to transition the memory array from the non-cacheable mode to the cacheable mode. The foregoing illustrative combination, wherein the first event includes any of: a processor switching to a lower-power state; a processor switching to a lower-performance state; a processor disabling snooping; mode information being programmed to disable the cacheable mode; and mode information being programmed to enable the non-cacheable mode. The foregoing illustrative combination, wherein the second event includes any of: a processor switching to a higher-power state; a processor switching to a higher-performance state; a processor enabling snooping; mode information being programmed to enable the cacheable mode; and mode information being programmed to disable the non-cacheable mode.
A second illustrative combination of a system comprising: a memory array operable according to a cacheable mode and a non-cacheable mode; and a state machine controlling the memory array, the state machine designed to receive notification of an event and, in response, to transition operation of the memory array from the non-cacheable mode to the cacheable mode. The second illustrative combination, wherein the event is a first event, and the state machine is further designed to receive notification of a second event and, in response, to transition operation of the memory array from the cacheable mode to the non-cacheable mode. The foregoing illustrative combination, wherein the first event includes any of: a processor switching to a higher-power state; a processor switching to a higher-performance state; a processor enabling snooping; mode information being programmed to enable the cacheable mode; and mode information being programmed to disable the non-cacheable mode. The foregoing illustrative combination, wherein the second event includes any of: a processor switching to a lower-power state; a processor switching to a lower-performance state; a processor disabling snooping; mode information being programmed to disable the cacheable mode; and mode information being programmed to enable the non-cacheable mode.
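The two pseudo-claim combinations above both reduce to a small state machine keyed on events; a sketch, in which the event names are paraphrases of the event lists above rather than terms from the patent:

```python
# Events that move the memory array into the non-cacheable mode,
# and events that move it back to the cacheable mode.
TO_NONCACHEABLE = {"lower_power", "lower_performance", "snoop_disabled",
                   "noncacheable_mode_enabled"}
TO_CACHEABLE = {"higher_power", "higher_performance", "snoop_enabled",
                "cacheable_mode_enabled"}

def next_mode(mode: str, event: str) -> str:
    """One transition of the mode-control state machine."""
    if event in TO_NONCACHEABLE:
        return "non-cacheable"
    if event in TO_CACHEABLE:
        return "cacheable"
    return mode   # unrecognized events leave the mode unchanged
```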
Any of the first and second illustrative combinations, wherein when the memory array operates according to the cacheable mode, the memory array acts as a cache in response to cacheable accesses. Any of the first and second illustrative combinations, wherein when the memory array operates according to the non-cacheable mode, the memory array acts as a cache in response to non-cacheable accesses.
Any of the first and second illustrative combinations, further comprising an outer-level cache that includes the memory array. The foregoing illustrative combination, wherein the outer-level cache is at least one of a first-level cache, a second-level cache, and a third-level cache. The foregoing illustrative combination, further comprising a microprocessor that includes the outer-level cache. The foregoing illustrative combination, further comprising a memory controller included in the microprocessor. The foregoing illustrative combination, further comprising a memory device accessible via the memory controller.
Any of the first and second illustrative combinations, further comprising a display processor. The foregoing illustrative combination, wherein while the memory array is operating in the non-cacheable mode, the display processor produces non-cacheable accesses that result at least in part in allocation of entries in the memory array. The foregoing illustrative combination, wherein while the memory array is operating in the non-cacheable mode, the display processor further produces non-cacheable accesses that result at least in part in reading of the allocated entries in the memory array.
Any of the first and second illustrative combinations, wherein the memory array is flushed as part of transitioning from the cacheable mode. Any of the first and second illustrative combinations, wherein the memory array is flushed as part of transitioning from the non-cacheable mode. Any of the first and second illustrative combinations, wherein the memory array is flushed as part of transitioning to the cacheable mode. Any of the first and second illustrative combinations, wherein the memory array is flushed as part of transitioning to the non-cacheable mode.
Any of the first and second illustrative combinations, wherein the memory array includes a plurality of entries, each entry managed as a single unit. The foregoing illustrative combination, wherein each of the entries includes at least one of a tag field, an address field, a data field, a valid field, a dirty field, and an access-type field. The foregoing illustrative combination, wherein the access-type field indicates whether the respective entry was created in response to at least one of a cacheable reference and a non-cacheable reference. The foregoing illustrative combination, wherein during a transition between at least one of the non-cacheable and cacheable modes, all entries marked as created in response to non-cacheable references are flushed. The foregoing illustrative combination, wherein during a transition between at least one of the non-cacheable and cacheable modes, all entries marked as created in response to cacheable references are flushed.
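The per-entry fields listed above (tag, data, valid, dirty, access type) and the selective flush of entries according to the access type that created them can be sketched as follows; the field types and the string encoding of the access type are assumptions of the illustration:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """One buffer/mini-cache entry, managed as a single unit."""
    tag: int
    data: bytes
    valid: bool = True
    dirty: bool = False
    access_type: str = "cacheable"   # or "non-cacheable": who created it

def flush_by_type(entries, access_type):
    """On a mode transition, invalidate only the entries created by the
    given access type, leaving the other entries resident."""
    for e in entries:
        if e.valid and e.access_type == access_type:
            e.valid = False
    return [e for e in entries if e.valid]
```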
Any of the first and second illustrative combinations, further comprising a control register. The foregoing illustrative combination, wherein the control register specifies conditions associated with the events. The foregoing illustrative combination, wherein the conditions include at least one of: a processor transitioning between power states; a processor transitioning to a relatively lower-power state; a processor transitioning to a relatively higher-power state; a processor transitioning between performance levels; a processor transitioning to a relatively lower performance level; a processor transitioning to a relatively higher performance level; a processor transitioning between snoop modes; programming of an indication to disable the non-cacheable mode; programming of an indication to disable the cacheable mode; programming of an indication to enable the non-cacheable mode; and programming of an indication to enable the cacheable mode. The foregoing illustrative combination, wherein a portion of the control register is a mode register.
Any of the first and second illustrative combinations, further comprising a mode register having a field specifying a range of physical memory addresses to be processed with the memory array. The foregoing illustrative combination, wherein the field includes at least one of a physical address range upper limit and a physical address range lower limit. The foregoing illustrative combination, wherein the mode register has an additional field specifying an additional range of physical memory addresses to be processed with the memory array. The foregoing illustrative combination, wherein the additional field includes at least one of an additional physical address range upper limit and an additional physical address range lower limit.
Any of the first and second illustrative combinations, wherein the memory array is operated to retain at least one of: compressed data, and uncompressed data corresponding to at least a portion of the compressed data. Any of the first and second illustrative combinations, wherein the memory array is operated to retain at least one of: uncompressed data, and compressed data corresponding to at least a portion of the uncompressed data.
Any of the first and second illustrative combinations, wherein the memory array is operated to retain compressed data, and further comprising a display processor enabled to process the compressed data. The foregoing illustrative combination, wherein the processing of the compressed data includes expanding the compressed data. The foregoing illustrative combination, wherein the expanding of the compressed data produces uncompressed data that is retained in the memory array.
Any of the first and second illustrative combinations, wherein the memory array is operated to retain uncompressed data, and further comprising a display processor enabled to process the uncompressed data. The foregoing illustrative combination, wherein the processing of the uncompressed data includes compressing the uncompressed data. The foregoing illustrative combination, wherein the compressing of the uncompressed data produces compressed data that is retained in the memory array.
Any of the foregoing illustrative combinations having a display processor, wherein the display processor is a processing element of at least one of a graphics processing unit, a microprocessor, and a virtual display controller.
Any of the foregoing illustrative combinations, wherein the memory array is further operated as two partitions. The foregoing illustrative combination, wherein a first one of the two partitions retains non-cacheable data according to the non-cacheable mode. The foregoing illustrative combination, wherein a second one of the two partitions retains cacheable data according to the cacheable mode. The foregoing illustrative combination, wherein the retention of the non-cacheable data overlaps in time with the retention of the cacheable data. The foregoing illustrative combination, wherein the size of at least one of the two partitions is specified by at least one of: programming of partition-size mode information; and dynamic determination of a working-set size. The foregoing illustrative combination, wherein the partition-size mode information includes at least one of: a number of ways; and a bit field per way.
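Partitioning the array by ways, as in the last combination above, is a simple split; a sketch, where the 8-way geometry and the way-index representation are assumptions of the illustration:

```python
def partition_ways(total_ways: int, noncacheable_ways: int):
    """Split the ways of a set-associative array into a non-cacheable
    partition and a cacheable partition; both may be in use at once."""
    assert 0 <= noncacheable_ways <= total_ways
    noncacheable = list(range(noncacheable_ways))
    cacheable = list(range(noncacheable_ways, total_ways))
    return noncacheable, cacheable
```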
A third illustrative combination of a method comprising the steps of: selectively operating a memory structure in a non-cacheable mode and a cacheable mode; detecting an event; and, in response to the detecting of the event, switching from the non-cacheable mode to the cacheable mode. The third illustrative combination, further comprising: in response to detecting another event, switching from the cacheable mode to the non-cacheable mode.
A fourth illustrative combination of a method comprising the steps of: selectively operating a memory structure in a non-cacheable mode and a cacheable mode; recognizing an event; and, in response to the recognizing of the event, switching from the cacheable mode to the non-cacheable mode. The fourth illustrative combination, further comprising: in response to recognizing another event, switching from the non-cacheable mode to the cacheable mode.
Any of the third and fourth illustrative combinations, further comprising: programming mode information to at least in part specify one or more of the events. Any of the third and fourth illustrative combinations, further comprising: processing a portion of non-cacheable references with the memory structure. The foregoing illustrative combination, wherein the portion is determined at least in part according to a physical address range. The foregoing illustrative combination, further comprising: programming additional mode information to specify the physical address range. The foregoing illustrative combination, wherein the portion is further determined at least in part according to an additional physical address range. The foregoing illustrative combination, further comprising: programming additional mode information to specify the additional physical address range. The foregoing illustrative combination, further comprising: processing cacheable references with the memory structure.
Any of the third and fourth illustrative combinations, further comprising: allocating an entry in the memory structure in response to a non-cacheable access. The foregoing illustrative combination, wherein the allocating occurs while operating the memory structure in the non-cacheable mode. The foregoing illustrative combination, further comprising: allocating another entry in the memory structure in response to a cacheable access. The foregoing illustrative combination, wherein the allocating of the other entry occurs while operating the memory structure in the cacheable mode. The foregoing illustrative combination, further comprising: marking the entry allocated while operating in the non-cacheable mode as a non-cacheable entry. The foregoing illustrative combination, further comprising: marking the other entry allocated while operating in the cacheable mode as a cacheable entry. The foregoing illustrative combination, further comprising: flushing the entries marked as non-cacheable when switching from the non-cacheable mode. The foregoing illustrative combination, further comprising: flushing the entries marked as cacheable when switching from the cacheable mode.
Any of the third and fourth illustrative combinations, wherein the non-cacheable mode is exited in response to at least one of: a processor exiting a relatively lower-power state; a processor exiting a relatively lower-performance mode; a processor enabling snooping; and software disabling the non-cacheable mode. Any of the third and fourth illustrative combinations, wherein the cacheable mode is exited in response to at least one of: a processor exiting a relatively higher-power state; a processor exiting a relatively higher-performance mode; a processor disabling snooping; and software disabling the cacheable mode. Any of the third and fourth illustrative combinations, wherein the non-cacheable mode is entered in response to at least one of: a processor entering a relatively lower-power state; a processor entering a relatively lower-performance mode; a processor disabling snooping; and software enabling the non-cacheable mode. Any of the third and fourth illustrative combinations, wherein the cacheable mode is entered in response to at least one of: a processor entering a relatively higher-power state; a processor entering a relatively higher-performance mode; a processor enabling snooping; and software enabling the cacheable mode.
Any of the third and fourth illustrative combinations further comprises: generating the non-cacheable accesses by a non-cacheable access generating agent. The foregoing illustrative combination further comprises: processing, by the memory structure, at least one of the generated non-cacheable accesses while the memory structure is operating according to the non-cacheable mode. In the foregoing illustrative combination, the non-cacheable access generating agent is at least one of a graphics processing unit, a direct memory access (DMA) device, and a video display controller. In the foregoing illustrative combination, at least a portion of at least some of the data associated with the non-cacheable accesses is at least one of compressed data and uncompressed data. In the foregoing illustrative combination, some of the compressed data is derived from uncompressed data. In the foregoing illustrative combination, some of the uncompressed data is derived from compressed data.
Summary
In various microprocessor systems, devices frequently transfer data using non-cacheable and cacheable memory accesses, often via DMA transactions. Some DMA transactions are "background" accesses that are required even while the microprocessor system is otherwise idle.
Non-cacheable accesses are advantageous because they require no snooping, resulting in reduced snoop bandwidth requirements and reduced power consumption. In conventional systems, non-cacheable accesses are satisfied by DRAM (rather than by a cache). In the embodiments described herein, however, in some situations all or part of the non-cacheable accesses are satisfied by a buffer/mini-cache (rather than by DRAM) while still being performed according to non-cacheable semantics, thus transparently eliminating (or reducing) DRAM accesses. Other than reducing DRAM accesses, the operation of the buffer/mini-cache is invisible to other agents and requires no changes to existing operating systems and related device-driver code. Because accessing the buffer/mini-cache uses less energy than accessing DRAM, the elimination of DRAM accesses reduces power consumption.
For example, consider an enhanced single-chip microprocessor system having one or more CPUs, an embedded memory controller (for interfacing to DRAM, for example), and a buffer/mini-cache (as described herein) for satisfying non-cacheable accesses. All of the CPUs and the memory controller may remain in low-power states even while the buffer/mini-cache satisfies non-cacheable requests from other agents. In some embodiments the low-power states include ACPI-compatible low-power states (such as ACPI states C3, C4, C5, and so forth) in which no cache snooping is provided. The non-cacheable requests may also be satisfied while performing fewer DRAM accesses (or no DRAM accesses at all). Thus, substantial power savings are possible using the buffer/mini-cache while the enhanced microprocessor system is idle, such as when waiting for keyboard input while data displayed by a GPU remains constant and is repeatedly accessed for display refresh. Display refresh data (DRD) can therefore be obtained in an efficient manner, providing the microprocessor system with substantial power savings.
In some respects the buffer/mini-cache is similar to a cache, and includes a memory structure having a plurality of lines, each line having data and status (such as a valid bit) and being associated with an address or address range. The operating mode of the buffer/mini-cache is changed, under the control of one or more state machines, in response to detecting one of a set of events (such as the CPUs entering a low-power state that enables the buffer/mini-cache to respond to non-cacheable accesses). Steering logic associated with all or any portion of the state machines also provides bidirectional coupling between selected non-cacheable accesses and the buffer/mini-cache. The selected non-cacheable accesses may be determined in part by physical address ranges (corresponding to DRAM, for example) specified in one or more programmable mode registers.
Cacheable accesses are advantageous because they require none of the special handling associated with non-cacheable accesses (such as explicit flushing of all or part of a cache). In conventional systems, cacheable accesses are satisfied by a CPU or its associated cache subsystem, which together form a coherency domain. Disadvantageously, at least part of the CPU or the associated cache subsystem must be powered up to perform a cacheable access (that is, the processor logic must exit any snoop-disabled state). In the embodiments described herein, however, in some situations all or part of the cacheable accesses are satisfied by a buffer/mini-cache (rather than by the CPU or the cache subsystem). The CPU and the cache subsystem can therefore remain powered down if otherwise idle (that is, in a snoop-disabled state), greatly reducing power consumption.
System
Fig. 1 shows selected details of several embodiments of a system that implements transparent use of one or more buffers/mini-caches for satisfying selected non-cacheable accesses and for satisfying some background DMA device accesses. Satisfying the selected non-cacheable accesses with one of the buffers/mini-caches enables DRAM accesses to be reduced, thereby reducing power consumption. Satisfying some of the DMA accesses enables reduced power-up of the CPUs and/or the cache subsystems, thereby reducing power consumption. In some usage scenarios the system is included in a PC-compatible machine (such as a notebook or desktop computer, or an embedded-application device). In some usage scenarios the system is included in a PDA-class device or another similar mobile handheld or portable unit.
The system includes several solid-box elements that are partitioned, according to various scenarios, into a variety of distinct integrated circuits (or chips), as shown by several dashed-box elements. Three variations are illustrated. The first variation has a buffer/mini-cache included in the processor (such as buffer/mini-cache 112A) for satisfying the selected non-cacheable accesses. The first variation also has a buffer/mini-cache external to the processor (such as buffer/mini-cache 112B) for satisfying some background DMA device accesses. The second variation has the buffer/mini-cache included in the processor but no buffer/mini-cache external to the processor. The third variation has the buffer/mini-cache external to the processor but no buffer/mini-cache included in the processor. In some usage scenarios buffer/mini-cache 112A may also satisfy some background DMA device accesses. In some usage scenarios buffer/mini-cache 112B may also satisfy the selected non-cacheable accesses.
CPU and cache element 110, having one or more CPUs and associated caches and/or cache subsystems, is coupled to (processor) control unit 130A, which has buffer/mini-cache 112A according to the first and second variations. The processor control unit is coupled via link 120 to (chipset) control unit 130B, which has buffer/mini-cache 112B according to the first and third variations. The chipset control unit is coupled to GPU/DMA device 115, (internal) DMA device 132, and (external) DMA device 133. Two techniques for interfacing to DRAM are illustrated. In the first technique, processor-centric DRAM controller 113A is coupled to (processor) control unit 130A and DRAM 114A. In the second technique, chipset-centric DRAM controller 113B is coupled to (chipset) control unit 130B and DRAM 114B. Various embodiments may implement any combination of the DRAM interfacing techniques.
The illustrated partitioning scenarios include processor chip 102, implemented as a single integrated circuit having CPU and cache element 110, control unit 130A (optionally including buffer/mini-cache 112A, according to variation), and optional DRAM controller 113A. The partitioning scenarios also include chipset 103, implemented as another single integrated circuit having control unit 130B (optionally including buffer/mini-cache 112B, according to variation), (internal) DMA device 132, and optional DRAM controller 113B. The partitioning scenarios also include integrated graphics chipset 104, implemented as a single chip having chipset 103 and GPU/DMA device 115.
The partitioning scenarios also include processor system 101, implemented as a single chip, that includes processor chip 102 and chipset 103. In some usage scenarios, (single-chip) processor system 101 operates in conjunction with GPU/DMA device 115, (external) DMA device 133, and DRAM 114A or 114B implemented as separate chips. The partitioning scenarios also include processor and DRAM chip 100, implemented as a single-die, multi-die, or multi-chip module, that includes all elements of processor chip 102 and all or any portion of DRAM 114A. The partitioning scenarios also include integrated graphics and DRAM chipset 105, implemented as a single-die, multi-die, or multi-chip module, that includes all elements of integrated graphics chipset 104 and all or any portion of DRAM 114B. The foregoing partitioning scenarios are illustrative only and not limiting, as other partitioning scenarios are possible and contemplated. For example, elements described as implemented in a single die may instead be implemented as individual integrated circuit dice included in a single-module or multi-module package.
The illustrated unit and block boundaries are not limiting, as other element partitionings may be used. For example, all or part of the chipset control unit and the buffer/mini-cache external to the processor may be implemented in any of the DMA devices. As another example, the buffer/mini-cache included in the processor may be implemented separately from the CPUs and caches (as illustrated), or wholly or partially included in the CPUs and caches. As another example, an instance of the chipset control unit (or any portion thereof) and of the buffer/mini-cache external to the processor may be implemented in each of a plurality of DMA devices.
Embodiments according to the first and third variations (having at least a buffer/mini-cache external to the processor) enable the system to perform the selected non-cacheable accesses and some DMA operations even while all of the CPUs, the caches, the processor control unit, and the link coupling the processor to the chipset are powered off, or are operating in a low-power state in which, for example, no snooping is provided. The chipset control unit remains operational while the other elements are powered off (or are in various low-power states), and satisfies (via the buffer/mini-cache external to the processor) the selected non-cacheable accesses as well as some DMA requests generated internally or externally with respect to the chipset. While these accesses and requests are being satisfied, the other elements of the chipset may remain powered off, further reducing overall power consumption. Alternatively, the chipset control unit (including the buffer/mini-cache external to the processor) remains in a low-power or powered-off state while no such accesses or requests are being processed, and upon receiving them transitions to an operational state temporarily, long enough to process them.
Similarly to the embodiments having at least a buffer/mini-cache external to the processor, embodiments according to the first and second variations (having at least a buffer/mini-cache internal to the processor) enable the system to perform the selected non-cacheable accesses and some DMA operations even while all of the CPUs, the caches, and portions of the chipset are powered off, or are operating in a low-power state in which, for example, no snooping is provided. The processor control unit remains operational while the other elements are powered off (or are in various low-power states), and satisfies (via the buffer/mini-cache internal to the processor) the non-cacheable accesses and the DMA requests conveyed from the chipset over the link. The chipset control unit and the link remain operational while these accesses and requests are being satisfied, in order to recognize the processor-directed accesses and requests and to communicate them to the processor, whereas the other elements of the chipset may remain powered off, reducing overall power consumption. Alternatively, the processor control unit (including the buffer/mini-cache internal to the processor), the link, and the chipset control unit remain in a low-power or powered-off state while no such accesses or requests are being processed, and upon receiving them transition to an operational state temporarily, long enough to process them.
In some embodiments the buffer/mini-cache (whether internal or external to the processor) is kept synchronized (i.e., coherent) with any caching structures implemented in the processor, such as a first-level cache (L1) and a second-level cache (L2). In some embodiments the buffer/mini-cache is kept incrementally coherent while the processor is performing accesses, that is, the buffer/mini-cache is snooped as needed. In some embodiments the buffer/mini-cache is kept coherent by explicit flushing as the processor transitions from a non-snooping power state to a snooping power state. In some embodiments no explicit operations are performed to synchronize the buffer/mini-cache, that is, it is operated incoherently with respect to any processor-implemented caches. In some embodiments in which the buffer/mini-cache is operated incoherently, system software guarantees that no stale data is retained in the buffer/mini-cache.
In some embodiments of the second variation (having the buffer/mini-cache included in the processor but no buffer/mini-cache external to the processor), the chipset control acts as a link interface for the chipset. In some embodiments of the third variation (having the buffer/mini-cache external to the processor but no buffer/mini-cache included in the processor), the processor control acts as a link interface for the processor.
Buffer/mini-cache
Fig. 2 shows selected aspects of an embodiment of either of buffer/mini-caches 112A-B of Fig. 1, as buffer/mini-cache 112. The buffer/mini-cache includes memory structure 201, which operates under the control of, and is accessed by, state machine 202 and associated steering logic according to information from mode register 221. The memory structure is organized into a plurality of identical lines (or groups of identical lines, according to embodiment) illustrated as lines 201.0...201.N. Each line includes one or more fields of one or more bits, as exemplified by line 201.0, which has optional tag field 211, data field 212, valid bit 213, dirty bit 214, and optional cacheable bit 215. In some embodiments any combination of the dirty bit and the optional cacheable bit is implemented as a single field (hereinafter the status field). The status field is not limited to two bits in width, and may include three or more bits to encode a variety of line status conditions.
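The per-line fields just described can be modeled concretely. The following Python sketch is illustrative only: the class name `Line`, the 64-byte line size, and the list-based organization are assumptions, not details from the patent.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Line:
    """One line of memory structure 201, with the fields of line 201.0."""
    tag: Optional[int] = None      # optional tag field 211 (absent when direct-mapped)
    data: bytes = bytes(64)        # data field 212 (64-byte line size is an assumption)
    valid: bool = False            # valid bit 213
    dirty: bool = False            # dirty bit 214
    cacheable: bool = False        # optional cacheable bit 215

# The memory structure is a collection of identical lines, 201.0...201.N.
memory_structure = [Line() for _ in range(8)]
```

In a fully associative organization each access would compare its address against every line's `tag`; a direct-mapped organization would drop `tag` and index the list directly.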
In some embodiments the memory structure is similar to a cache and may be organized in a direct-mapped manner (i.e., with no tag field) or in a fully associative manner (i.e., with a tag field matching an entire line address). In some embodiments the memory structure is similar to a set-associative cache (with the tag field matching a portion of the entire address), having two or more groups of lines operating as respective sets. In various embodiments the memory structure operates in any combination of modes, including direct-mapped, fully associative, and set-associative modes, in response to buffer/mini-cache control information provided by the state machine and by the steering logic associated with the mode register.
The mode register is not limited to implementation within the buffer/mini-cache, nor to implementation as a single register. In some embodiments the mode register (or any portion thereof) may be implemented in any element of the processor system or the chipset, including one or more MSRs associated with a processor or CPU, the buffer/mini-cache (as shown in Fig. 2), and the DRAM controllers.
In some embodiments the memory structure is identical to a conventional cache (i.e., it has no cacheable bit 215). In some embodiments the memory structure is a modification of a conventional cache. In some embodiments the allocation and replacement functions of a conventional cache are used in part to manage the memory structure. In some embodiments the memory structure is partially combined with a CPU cache, or integrated with an outer-level cache such as an L2 or L3 cache (see the "Buffer/Mini-Cache Included in the Processor" embodiment section elsewhere herein for more information).
State machine
Fig. 3 shows selected aspects (a state diagram) of an embodiment of state machine 202, adapted to control an instance of buffer/mini-cache 112 (Fig. 2) to satisfy selected non-cacheable accesses (such as with a buffer/mini-cache external to the processor, according to the aforementioned first and third variations). As shown by the dashed-line ellipses, the state machine implements two groups of states according to a normal operating mode ("Normal Mode" 301) and a buffer operating mode ("Buffer Mode" 302). In the normal mode the buffer/mini-cache processes no non-cacheable accesses, while in the buffer mode selected non-cacheable accesses may be processed by the buffer/mini-cache. The normal mode transitions to the buffer mode in response to any of a plurality of normal-to-buffer-mode events. The buffer mode transitions to the normal mode in response to any of a plurality of buffer-to-normal-mode events.
After transitioning to the normal mode, the state machine begins operating in "Normal Operation" state 312, in which buffer/mini-cache 112 processes no non-cacheable transactions. The normal operation state is exited when any of the normal-to-buffer-mode events is detected. The state machine then transitions via "Buffer Mode Entry Event" 311 to "(Normal) Flush Buffer/Mini-Cache" state 313, in which all dirty lines (if any) are flushed from the buffer/mini-cache to memory (such as DRAM 114A or 114B of Fig. 1).
When the flushing is complete, the state machine transitions via "(Normal) Flush Complete" 303 to "Buffer Operation" state 323, in which selected non-cacheable transactions may be processed by the buffer/mini-cache. The buffer operation state is exited when any of the buffer-to-normal-mode events is detected. The state machine then transitions via "Normal Mode Entry Event" 321 to "(Buffer) Flush Buffer/Mini-Cache" state 322, in which all dirty lines (if any) are flushed from the buffer/mini-cache. When the flushing is complete, the state machine transitions via "(Buffer) Flush Complete" 304 to "Normal Operation" 312, in which non-cacheable transactions are no longer processed by the buffer/mini-cache.
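The two-mode state diagram of Fig. 3 can be sketched as a small class. This is a minimal illustration under stated assumptions: the state and method names are invented here, lines are modeled as dictionaries, and the actual write-back to DRAM during a flush is elided.

```python
class BufferModeStateMachine:
    """Sketch of the Fig. 3 state diagram: normal mode <-> buffer mode,
    with all dirty lines flushed on every mode transition."""

    def __init__(self, lines):
        self.lines = lines                    # list of {"valid": ..., "dirty": ...}
        self.state = "normal_operation"       # state 312

    def _flush(self):
        # Flush states 313/322: write back dirty lines (elided), then invalidate.
        for line in self.lines:
            line["dirty"] = False
            line["valid"] = False

    def buffer_mode_entry_event(self):        # event 311
        if self.state == "normal_operation":
            self._flush()                     # "(normal) flush" state 313
            self.state = "buffer_operation"   # state 323: non-cacheable accesses handled

    def normal_mode_entry_event(self):        # event 321
        if self.state == "buffer_operation":
            self._flush()                     # "(buffer) flush" state 322
            self.state = "normal_operation"   # state 312: non-cacheable accesses ignored
```

Note that both transitions pass through a flush, matching the requirement that no dirty data survives a mode change.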
The normal-to-buffer-mode events include a variety of programmable events, such as one or more CPUs operating in a reduced-power state (for example, any of the ACPI-compatible C3, C4, and C5 states) or in a lower-performance mode for a specified period of time. The states/modes to be detected, the time required in each state/mode, and the CPUs to be observed for operation in each state/mode may be programmed via mode information stored in mode register 221 (Fig. 2).
The buffer-to-normal-mode events include a variety of programmable events, such as one or more CPUs exiting a reduced-power state, returning to operation in a higher-performance mode (i.e., exiting a lower-performance mode), or generating core traffic. The states/modes to be detected and the CPUs to be observed for operation in each state/mode may be programmed via the mode information in mode register 221. Some programmable events may also be related to a snoop-bandwidth threshold that, when exceeded, is recognized as a buffer-to-normal-mode event (see the "Reduction of DRAM Accesses via Non-Cacheable Accesses" section elsewhere herein for more information).
In some embodiments, transitioning into the buffer-mode states is disabled via enable/disable information (stored as a bit or a code field) from mode register 221. In some embodiments the buffer-to-normal-mode events include enable/disable information indicating that the buffer mode is to be disabled. In some embodiments the normal-to-buffer-mode events include enable/disable information indicating that the buffer mode is to be enabled. In some embodiments the enable/disable information is programmed by software executing on one of the CPUs (such as a driver writing to a bit or field of an MSR), and in some embodiments the enable/disable information is processed by the state machine (for example, in conjunction with the recognition of buffer-to-normal-mode and normal-to-buffer-mode events).
In some embodiments the buffer mode may be operational even while one or all of the CPUs are not operating in a reduced-power state, a snoop-disabled state, or a lower-performance state, such as while one or all of the CPUs are operating in the ACPI-compatible C0, C1, or C2 states or in a higher-performance state.

Memory ranges
Fig. 4 shows an embodiment of memory-range information used to determine matching non-cacheable accesses and matching DMA accesses selected for processing by an instance of buffer/mini-cache 112. Full physical address space 401 represents the entire physical address space implemented by the system, and includes programmable memory range 402, which may be programmably specified. While the buffer/mini-cache is operating in the buffer mode to process selected non-cacheable accesses, non-cacheable accesses falling within programmable memory range 402 are processed by the buffer/mini-cache (other non-cacheable accesses are not processed by the buffer/mini-cache). While the buffer/mini-cache is operating to satisfy selected matching DMA accesses (in either a coherent or an incoherent manner), DMA accesses falling within programmable memory range 402 are processed by the buffer/mini-cache (other DMA accesses are not processed by the buffer/mini-cache).
In some embodiments the programmable memory range is specified as range 403, having highest and lowest physical addresses defined respectively by the contents of mode address fields 221.1 and 221.3. In some embodiments only a portion of the buffer/mini-cache is allocated to processing non-cacheable accesses in the buffer mode, as conceptually illustrated by sub-range 404, defined by the contents of mode address field 221.2. In some embodiments the allocated portion may be specified by other mechanisms (such as a particular way, or a selection of several ways), and is not necessarily limited to specification as a single contiguous physical address range.
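The range check that steers an access toward the buffer/mini-cache or toward DRAM can be sketched in a few lines. The function names and the sample addresses below are illustrative assumptions; only the bounds-from-mode-fields idea comes from the text.

```python
def in_programmable_range(addr: int, lowest: int, highest: int) -> bool:
    """True when a physical address falls within range 403, whose bounds come
    from mode address fields 221.3 (lowest) and 221.1 (highest)."""
    return lowest <= addr <= highest

def route_non_cacheable(addr: int, lowest: int, highest: int) -> str:
    # Matching accesses are handled by the buffer/mini-cache;
    # non-matching accesses fall through to DRAM.
    if in_programmable_range(addr, lowest, highest):
        return "buffer/mini-cache"
    return "DRAM"
```

Multiple independent ranges (as described below) would simply repeat this check per range, with each range tagged for non-cacheable, coherent-DMA, or incoherent-DMA use.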
In some embodiments any combination of mode address fields 221.1-221.3 is implemented as fields of mode register 221 of Fig. 2. In some embodiments any combination of the mode address fields may be implemented in separate mode registers. In some embodiments any combination of the mode address fields may be implemented in MSRs located in one or more of the CPUs implemented in CPU and cache element 110 of Fig. 1, in processor chip 102, in processor system 101, and in processor and DRAM chip 100.
Although the foregoing is described with respect to a single memory range, various embodiments may provide a plurality of contiguous or non-contiguous ranges (and sub-ranges). Each of the ranges (and sub-ranges) may be specified independently of the others. For example, a first plurality of programmable memory ranges may be specified for non-cacheable buffer operation, a second plurality for coherent DMA-access operation, and a third plurality for incoherent DMA-access operation. In embodiments with a plurality of buffers/mini-caches, separate programmable memory ranges may be specified for each buffer/mini-cache, or one or more of the buffers/mini-caches may share one or more of the ranges.
In some embodiments all or a portion of the ranges may be programmed by address-observing logic implemented to monitor DMA transfer addresses over time (see the "Reduced-Power DMA Accesses" section elsewhere herein for more information).
Request attributes
In addition to identification based on memory ranges, display refresh data may also be identified by attributes of the requests used on a high-speed I/O link, such as a HyperTransport (HT) bus (from AMD), a Common System Interface (CSI) bus (from Intel), or PCI Express.
The attributes include, but are not limited to:

Source ID (Unit ID in HT): potentially sufficient by itself to identify the source as a display refresh engine.

Tag (e.g., SrcTag in HyperTransport): when combined with the source ID, may precisely identify the desired traffic source.

Any indication of isochronous or particular virtual-channel traffic (i.e., anything separating the display refresh traffic from other traffic).

Particular bits, such as the PassPW and RespPassPW bits in HyperTransport, or related or equivalent bits in other protocols.
Furthermore, CSI and PCI Express have similar attributes, and their use is within the spirit and scope of the present invention. In addition, other high-speed I/O links may exist that can be utilized, and their use is within the spirit and scope of the present invention. These request attributes may be used by themselves, or in combination with memory-range information, to identify the source of display refresh data (DRD).
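A classifier combining the attribute checks above with a fallback range match might look as follows. Every dictionary key, attribute encoding, and value in this sketch is a hypothetical stand-in; the actual link-level field formats are defined by the respective bus specifications, not here.

```python
def is_display_refresh(req: dict, display_unit_ids: set, drd_range: tuple) -> bool:
    """Identify display refresh data (DRD) traffic from request attributes,
    optionally combined with memory-range information."""
    # Source ID alone may suffice (e.g., the HT Unit ID of the refresh engine).
    if req.get("source_id") in display_unit_ids:
        return True
    # A dedicated virtual channel, if the link separates display traffic.
    if req.get("virtual_channel") == "isochronous":
        return True
    # Otherwise fall back to a memory-range match.
    lowest, highest = drd_range
    return lowest <= req.get("addr", -1) <= highest
```

In practice the source ID and tag together would give the most precise match; the range check serves as the coarse backstop described under "Memory ranges."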
Reduction of DRAM accesses via non-cacheable accesses
Operation of a buffer/mini-cache to reduce DRAM accesses by processing selected non-cacheable accesses (such as with a buffer/mini-cache external to the processor, according to the aforementioned first and third variations) is as follows. After a system reset, a CPU included in CPU and cache element 110 executes software to program operating-mode information and memory ranges in mode register 221 (Fig. 2), specifying the non-cacheable accesses to be optimized. The buffer/mini-cache (such as buffer/mini-cache 112B of Fig. 1) begins processing according to "Normal Operation" state 312 (Fig. 3), and processes no non-cacheable accesses (such as those generated by GPU/DMA device 115 of Fig. 1). After a programmable event has occurred (such as a specified time spent in a low-power/performance state), state machine 202 (Fig. 2) begins enabling the caching of matching non-cacheable transactions in the buffer/mini-cache by flushing any dirty lines in the buffer/mini-cache and marking all lines of the buffer/mini-cache invalid (such as by deasserting valid bits 213). After the buffer/mini-cache has been completely flushed, the buffer/mini-cache operates in the buffer mode and processes matching non-cacheable transactions.
Non-cacheable transactions generated by the GPU/DMA devices are compared against ranges (such as those described by programmable memory range 402 of Fig. 4). If the address of a non-cacheable transaction matches one of the ranges, the transaction is processed by the buffer/mini-cache (non-matching transactions are processed elsewhere, for example by DRAM 114A or 114B of Fig. 1). Matching non-cacheable transactions (and, in some embodiments, optionally matching cacheable transactions) are processed by the buffer/mini-cache similarly to processing by a traditional cache, with allocation, replacement, and snoop policies. In some embodiments, the allocation policy is generally set to allocate on reads, so that after a line has been read once from DRAM, the line resides in the buffer/mini-cache. In some embodiments, the allocation policy includes allocation on writes, or writethrough.
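The range comparison described above can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the range bounds, function names, and the frame-buffer-like example range are all assumptions made for the example.

```python
# Hypothetical sketch of matching a non-cacheable transaction address
# against programmable memory ranges (cf. memory range 402 of Fig. 4).

def matches_ranges(addr, ranges):
    """Return True if addr falls within any [base, base+size) range."""
    return any(base <= addr < base + size for base, size in ranges)

# Assumed example: a frame-buffer-like range at 0xE000_0000, 16 MiB long.
RANGES = [(0xE000_0000, 16 * 2**20)]

def route_transaction(addr):
    # Matching non-cacheable transactions go to the buffer/mini-cache;
    # non-matching transactions are handled elsewhere (e.g., DRAM).
    return "buffer/mini-cache" if matches_ranges(addr, RANGES) else "DRAM"
```

A transaction at `0xE000_1000` would route to the buffer/mini-cache under these assumed bounds, while one at `0x1000` would route to DRAM.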
The replacement policy may be random, LRU, FIFO, fixed-order, round-robin, greedy, or any other policy that enables (or is readily adapted to enable) efficient utilization of the buffer/mini-cache capacity while avoiding poor performance when the non-cacheable range(s) exceed that capacity. In some embodiments, the replacement policy is tuned so that DRAM accesses are reduced, minimized, or grouped in time, enabling various system elements (such as CPUs, memory controllers, and DRAMs) to reach lower power states through the elimination of DRAM accesses. In some embodiments, the tuned replacement policy includes allocating only when a "free" line (or entry) exists, and not allocating otherwise; that is, allocation continues only while free lines are available, and stops when no free lines remain. A free line may be an invalid entry, or may be any entry not yet allocated since entering buffer mode when a fixed-order replacement policy is used (for example, from way 0 to way N-1). A sweep of an address range fitting entirely within the buffer/mini-cache is thereby managed optimally, while a sweep that overflows the buffer/mini-cache is also managed optimally, since the entire capacity of the buffer/mini-cache is accessed, thus reducing the DRAM accesses and the power consumption associated with them. Since the overflowing portion of the sweep is a coalesced whole, the DRAM accesses are also grouped in time, enabling the DRAM (and the memory controller) to enter low-power states.
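The tuned "allocate only while free lines remain" policy described above can be sketched as follows. The class and method names are illustrative assumptions; the point of the sketch is that once capacity is exhausted, allocation declines and overflow traffic falls through to DRAM as a coalesced group.

```python
# Minimal sketch, under assumed names, of the tuned replacement policy:
# allocate only while free lines remain, then stop allocating so that
# overflow accesses coalesce into grouped DRAM traffic.

class MiniCache:
    def __init__(self, num_lines):
        self.lines = {}          # addr -> data for allocated lines
        self.capacity = num_lines

    def try_allocate(self, addr, data):
        """Allocate a line only if a free line exists; else decline."""
        if addr in self.lines:          # already resident: update in place
            self.lines[addr] = data
            return True
        if len(self.lines) < self.capacity:
            self.lines[addr] = data     # take a free line
            return True
        return False                    # no free line: access goes to DRAM
```

In this sketch, a caller receiving `False` would forward the access to DRAM rather than evicting a resident line.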
In some usage scenarios, when snoop bandwidth is high, non-cacheable data is not retained in the buffer/mini-cache (even while in buffer mode); conversely, non-cacheable data is retained in the buffer/mini-cache only when the additional snoop bandwidth required is known to be small and/or the matching range(s) are finely controlled (such as for graphics refresh traffic). In some usage scenarios, when snoop pressure nears a peak or exceeds a programmed threshold, buffer mode is exited (see the "State Machine" section elsewhere herein for more information).
In some embodiments, the buffer/mini-cache operates similarly to a direct-mapped cache, covering only as much of the matching address range(s) as the capacity of the buffer/mini-cache allows. Tag fields (such as the tag field of address 211 of Fig. 2) are not used, but valid bits (such as valid bits 213 of Fig. 2) are. As a non-cacheable read matching one of the ranges is received, the valid bits of the buffer/mini-cache are accessed. If the read misses (i.e., the corresponding valid bit is clear), the read data is fetched from DRAM, copied into the buffer/mini-cache (such as into data field 212 of Fig. 2), and the valid bit is set. If the read hits (i.e., the valid bit is set), the data is provided from the buffer/mini-cache (such as from data field 212 of Fig. 2). In some usage scenarios, the buffer/mini-cache operating in direct-mapped mode may be write-allocate or writethrough.
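The tag-less direct-mapped read handling just described can be sketched as follows. The line size, index derivation, and names are assumptions for illustration: the index is taken from the address offset within the matched range, and only a valid bit decides hit versus miss.

```python
# Hedged sketch of the tag-less direct-mapped operation: valid bits only,
# index derived from the offset within the matched range. Geometry and
# names (LINE_SIZE, DirectMappedBuffer) are illustrative assumptions.

LINE_SIZE = 64

class DirectMappedBuffer:
    def __init__(self, base, num_lines):
        self.base = base
        self.valid = [False] * num_lines
        self.data = [None] * num_lines

    def read(self, addr, dram):
        idx = ((addr - self.base) // LINE_SIZE) % len(self.valid)
        if not self.valid[idx]:              # miss: fetch from DRAM,
            self.data[idx] = dram(addr)      # copy in, set valid bit
            self.valid[idx] = True
        return self.data[idx]                # hit (or just-filled) data
```

Under this sketch, a second read of the same address is served without touching DRAM, which is the access reduction the passage describes.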
After another programmable event occurs (such as exiting the low-power/performance state), the state machine begins to disable caching of non-cacheable accesses by disabling allocation of lines in the buffer/mini-cache while processing non-cacheable accesses, flushing all dirty lines (if any) in the buffer/mini-cache, and invalidating all lines. After the buffer/mini-cache has been completely flushed, it operates in normal mode, and the buffer/mini-cache does not process non-cacheable transactions. In some operating modes, the buffer/mini-cache is used as a traditional cache while in normal mode.
In some embodiments or operating modes, the buffer/mini-cache is reserved entirely (while in buffer mode) for processing (matching) non-cacheable accesses. In other embodiments or operating modes, the buffer/mini-cache processes cacheable accesses in both normal mode and buffer mode, and cacheable bits 215 of Fig. 2 are used to distinguish lines allocated for cacheable accesses (such as asserted cacheable bits 215) from lines allocated for non-cacheable accesses (such as deasserted cacheable bits 215). In some operating modes, when the lines are so distinguished, flushing of the lines (such as when switching between buffer mode and normal mode) is conditional; for example, only the non-cacheable lines are flushed when switching modes.
In some embodiments, only a portion of the buffer/mini-cache is operated in buffer mode to process non-cacheable transactions, while the remainder is operated in normal mode to process cacheable transactions. In some embodiments, the portion is configurable by mode information, or may be configured dynamically upon entering buffer mode. Dynamic configuration enables the working set to be discerned, and the portion of the buffer/mini-cache used in buffer mode to be determined (and optimized) accordingly. In some embodiments, the remainder is turned off if it has been flushed, or, if it has not been flushed, is operated only in a data-retention state (i.e., not snooped), thereby reducing power consumption. In some operating modes, a portion of the buffer/mini-cache operates in buffer mode at all times, so that matching non-cacheable accesses are always processed by that portion. The specification of the portion of the buffer/mini-cache operated (conditionally) in buffer mode, and of the portion operated in buffer mode at all times, may each be by identifying a number of ways, or by providing one bit per way, for use in each buffer/mini-cache. The number-of-ways settings and the per-way bits may be accessed by software executing on one of the CPUs via one or more MSRs implementing all or a portion of the mode register.
Any combination of the replacement, allocation, and snoop policies used in conjunction with the buffer/mini-cache may vary according to operation in normal mode versus buffer mode. For example, in buffer mode the allocation policy is augmented so that non-cacheable references are recognized as belonging to buffered-memory regions, and the recognized references are cached. As another example, in some embodiments where cacheable data and non-cacheable data coexist in the buffer/mini-cache during normal mode and buffer mode, allocation of non-cacheable data may be restricted to a single read access. The restriction enables non-cacheable lines to be removed by replacement during normal-mode processing. As another example, in some embodiments, only the lines of the buffer/mini-cache used to buffer graphics and/or frame-buffer traffic are snooped.
Reduced-Power DMA Accesses
Operation of the buffer/mini-cache to process selected DMA accesses without powering up (or snoop-enabling) all or portions of a CPU or its associated cache subsystem may be according to a coherent flow or an incoherent flow. Fig. 5 is directed toward the coherent flow, and Fig. 6 toward the incoherent flow. Any combination of the coherent and incoherent flows may be used with the first, second, and third variations of the buffer/mini-cache located internal to or external to the processor.
Operation in embodiments where the buffer/mini-cache is external to the processor or included in a chipset (such as buffer/mini-cache 112B, external to the processor, of Fig. 1) enables the link connecting the processor and the chipset to remain powered down as long as the buffer/mini-cache services the DMA requests; even while DMA requests are being serviced, buses and snoop logic in the processor may remain in low-power states, resulting in high power savings. Under usage scenarios where the buffer/mini-cache is flushed, the processor temporarily "pops up" to a higher-power state (such as transitioning from C3, C4, C5, and so forth to C2, C1, or C0) to service the writebacks associated with the flush. A chipset operating coherently with the processor delays processing of memory traffic that modifies state (for example, dirty lines) until all buffer/mini-cache modified state has been flushed into the coherency domain of the processor. In some embodiments, the chipset may participate fully in the coherency domain (such as in systems implemented with a so-called "front-side" bus, as in some x86-compatible systems). In embodiments where the chipset participates fully in the coherency domain, the buffer/mini-cache may be snooped as a coherent cache, avoiding explicit flushes.
Operation in embodiments where the buffer/mini-cache is included in the processor (such as buffer/mini-cache 112A, internal to the processor, of Fig. 1) powers up the link connecting the processor and the chipset whenever DMA activity is to be processed, so that the DMA activity may be communicated from the chipset to the buffer/mini-cache residing in the processor. The processor therefore keeps at least a portion of the processor control unit powered up, to respond to the DMA activity. In embodiments where the buffer/mini-cache is operated incoherently, the buffer/mini-cache is explicitly flushed when the cache system associated with the processor becomes operational (such as when the processor exits a low-power or non-snooping state to a fully operational and snooping state). In embodiments operated coherently, coherency is maintained incrementally, incurring additional power consumption in the buffer/mini-cache, but no explicit flush is used.
When the buffer/mini-cache is used to process DMA accesses, any suitable replacement policy may be employed to operate the buffer/mini-cache. In some embodiments, the replacement policy selects new (or unused) cache lines rather than used lines, or selects new lines with a higher priority than used lines, allowing the buffer/mini-cache to buffer additional modified data before requesting writebacks. Selecting unused lines rather than used lines is suitable under usage scenarios where the amount of data transferred fits within the buffer/mini-cache, enabling the entire data transfer to reside in the buffer/mini-cache simultaneously.
Under usage scenarios where the transferred data set is too large for the buffer/mini-cache, or where the data associated with an address changes over time, in some embodiments several replacement policies are used depending on the operational context. For example, if the data set is "fixed" but too large (i.e., it would otherwise overflow the buffer/mini-cache), then only unused lines are allocated, and allocation stops when all unused lines have been used. Stopping allocation once all lines are in use enables the overflowing accesses to be bundled together, resulting in more power-efficient operation. If the data set changes continuously (i.e., all data is written into the buffer/mini-cache), then allocation may stop when the buffer/mini-cache is full. Alternatively, allocation may follow an LRU or MRU policy, depending on whether the older data continues to be accessed. In some embodiments, replacement policy state (such as LRU or MRU state) is updated without regard to the power state the processor is operating in. In other embodiments, the replacement policy state is updated only when the processor is operating in a C3 or deeper state.
Valid bits in lines of the buffer/mini-cache (such as valid bits 213 of Fig. 2) are cleared at system reset to an "invalid" encoding, and status fields in the lines (such as the status fields implemented by dirty bits 214 in conjunction with optional cacheable bits 215 of Fig. 2) are written to a "free" encoding. System reset processing continues by resetting the remaining valid bits and writing the remaining status fields, until all lines in the buffer/mini-cache are marked "invalid" and "free". In some embodiments, all lines are processed simultaneously; in other embodiments, some of the lines are processed sequentially.
After system reset processing is complete, as lines are brought into the buffer/mini-cache, the corresponding valid bits are written to a "valid" encoding. The corresponding status fields are set to a "clean" encoding for DMA read operations, or to a "dirty" encoding for DMA write operations. When a line is flushed, the status field is written to "free", while the valid bit remains "valid".
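The per-line encodings described above (a valid bit plus a free/clean/dirty status field) can be sketched as a small state model. The transition helper names are assumptions; the transitions themselves follow the passage: a DMA read fill yields valid+clean, a DMA write fill yields valid+dirty, and a flush returns the status to free while the valid bit is retained.

```python
# Hedged sketch of the assumed per-line state model: invalid/valid plus
# free/clean/dirty, with the transitions described in the text.

from enum import Enum

class Status(Enum):
    FREE = "free"
    CLEAN = "clean"
    DIRTY = "dirty"

class Line:
    def __init__(self):
        self.valid = False        # cleared at system reset ("invalid")
        self.status = Status.FREE # status field written to "free"

    def fill_for_read(self):      # DMA read brings the line in
        self.valid, self.status = True, Status.CLEAN

    def fill_for_write(self):     # DMA write brings the line in
        self.valid, self.status = True, Status.DIRTY

    def flush(self):              # status goes "free"; valid is retained
        self.status = Status.FREE
```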
When a DMA operation is requested, the control unit including the buffer/mini-cache is powered up, and the buffer/mini-cache is accessed to process the DMA operation. If the buffer/mini-cache contains the read data for the operation (i.e., a "hit"), or has space to store the write data of the operation, then the buffer/mini-cache services the DMA operation without using any portion of any processor cache system. If the buffer/mini-cache does not contain the read data (i.e., a "miss"), or has no space for the write data, then the processor "pops up", or transitions, to a snooping state (such as C2 or higher) in which the processor may respond to coherent traffic, and the processor is requested to service the DMA operation. The buffer/mini-cache is then updated with the data provided by the processor. After a programmable amount of time (specified, for example, by information from mode register 221 of Fig. 2), the processor is allowed to return to a lower-power state.
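The hit/miss decision just described can be sketched for the DMA read case. This is an assumed illustration, not the patented logic: the `processor` callback stands in for popping the processor up to a snooping state and having it service the access, and the event strings merely record the described transitions.

```python
# Minimal sketch of DMA read servicing: a hit is served entirely from
# the buffer/mini-cache; a miss "pops up" the processor to a snooping
# state to supply the data, then lets it return to a lower-power state.
# Names and the event-list mechanism are illustrative assumptions.

def service_dma_read(addr, cache, processor):
    if addr in cache:                     # hit: no processor involvement
        return cache[addr], []
    events = ["pop_up_to_C2"]             # miss: wake processor to snoop
    data = processor(addr)                # processor services the access
    cache[addr] = data                    # update the buffer/mini-cache
    events.append("return_to_low_power")  # after a programmable delay
    return data, events
```

A hit therefore produces no power-state events at all, which is the power saving the passage attributes to the buffer/mini-cache.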
When a power-state transition (or "change") event is detected that is unrelated to the DMA operations satisfied by the buffer/mini-cache, the buffer/mini-cache is first synchronized with the processor (or any associated cache) by a flush, and memory operations (including DMA operations) are then allowed to proceed.
In some embodiments, the buffer/mini-cache is activated only in low-power states (such as any of C3, C4, C5, and deeper states) in which coherent DMA is not otherwise supported, because the cache system associated with the (low-power-state) processor is not operational. In some x86-compatible embodiments, the buffer/mini-cache is not activated in the C3 power state, but is activated in C4 and deeper power states. In some embodiments, the buffer/mini-cache is activated in higher power states (such as any of C2, C1, and C0).
In some embodiments, only a portion of the DMA requests are processed by the buffer/mini-cache, filtered by one or more physical address ranges (see the "Memory Range" section elsewhere herein). In some embodiments, all or a portion of the address ranges are programmed by a processor or CPU. In some embodiments, all or a portion of the ranges are programmed by address-observation logic via dynamic observation of DMA transfers, either over selected periods of time, or when the processor cache state is inaccessible due to operation in a low-power state, or when the processor cache state is accessible, or under both conditions, according to various embodiments. In some embodiments (such as some embodiments having a buffer/mini-cache in a chipset), filtering is according to a DMA device identifier (rather than, or in addition to, an address range). For example, the mode information may be programmed to identify devices (such as network and USB interfaces) that continue to operate even when the system is otherwise powered down or asleep, such that only DMA accesses from the identified devices are processed by the buffer/mini-cache (irrespective of the addresses associated with the accesses).
In some embodiments, the processor may "pre-fill" all of the buffer/mini-cache, or a portion thereof, before entering a low-power (such as non-snooping) state. The processor locates "valid" lines in the buffer/mini-cache and fills in the corresponding data according to the address information in the respective tag fields. Under some usage scenarios (i.e., where DMA operations performed while the processor is in the low-power state reference the same lines as were operated on before the state was entered), pre-filling reduces the occurrences of processor power-state pop-ups otherwise required to process buffer/mini-cache misses.
In some embodiments, the processor may pre-fill all of the buffer/mini-cache, or a portion thereof, with data supplied by device drivers associated with DMA devices that will potentially generate DMA accesses while the processor is in a low-power state. The device drivers determine (or are enabled to learn) the addresses associated with requests originating from the DMA devices. Data from the addresses associated with the requests is copied into the buffer/mini-cache, and the tag information is set accordingly.
In some embodiments, the buffer/mini-cache is operated in conjunction with relatively fine-grained power control of the processor cache system. For example, the buffer/mini-cache is activated when all or any portion of the caches in all or any portion of the processors either contain no valid data (i.e., are fully powered down without state retention) and thus cannot be snooped, or retain valid data (i.e., are powered up sufficiently to retain data) but cannot be snooped. As another example, the buffer/mini-cache is activated when "inner" portions of the cache system (such as one or more first-level caches) cannot be snooped while "outer" portions of the cache system (such as one or more second- or third-level caches) can be snooped and may therefore respond to coherent transactions. As another example, the buffer/mini-cache is activated when the inner portions cannot be snooped, a portion of the outer portions (such as a second-level cache) has been cleaned and disabled from snooping, and the remainder of the outer portions (such as a third-level cache) can be snooped. The cleaning may be performed by any combination of hardware and software agents.
Reduced-Power DMA Accesses: Coherent Operation
Fig. 5 illustrates selected operations performed by an embodiment implementing a coherent buffer/mini-cache for satisfying background DMA device accesses (such as buffer/mini-cache 112A of Fig. 1, included in a processor, or buffer/mini-cache 112B of Fig. 1, external to a processor). Processing is according to two main flows, one for DMA read accesses and one for DMA write accesses. Processing for either flow begins with a DMA access from a DMA device ("Idle" 501; "DMA Received" 502), and continues according to the type of access (i.e., read or write).
Processing of a DMA read ("Read" 502R) begins by determining whether the read may be satisfied by data already present in the buffer/mini-cache (such as either of buffer/mini-caches 112A-B of Fig. 1) ("Hit?" 503R). If not ("No" 503RN), then processing continues to determine whether the buffer/mini-cache has any remaining lines available for allocation ("Space Available?" 504R). If not ("No" 504RN), then a line is selected for eviction from the buffer/mini-cache ("Choose Victim" 505R). If the selected line has any modified data ("Dirty" 505RD), then the line is stored into the coherency domain ("Write-Back to Processor" 506R). The line is then allocated for the DMA read being processed ("Reserve Line" 507R). If the selected line was not previously dirty ("Clean" 505RC), then no writeback is performed, and the line is allocated immediately ("Reserve Line" 507R). If a remaining line is available ("Yes" 504RY), then no victim is chosen (and hence there is no writeback), and the selected line is allocated immediately ("Reserve Line" 507R).
After the line has been allocated for the DMA read data, the DMA access is passed to the coherency domain for further processing ("DMA Request to Processor" 508R). Data is provided by the coherency domain (such as after popping up to a snoop-enabled state), stored in the allocated buffer/mini-cache line, and marked as "clean" and "valid" ("Write; Mark Clean & Valid" 509R). The data is also provided to the DMA device ("Data to Device" 510R), processing of the DMA access is complete, and waiting begins for a new DMA access ("Idle" 501). If the buffer/mini-cache already has the necessary data to satisfy the DMA read access ("Yes" 503RY), then no miss processing is required, and the data is delivered immediately to the DMA device ("Data to Device" 510R), omitting the line allocation and fill.
Processing of a DMA write ("Write" 502W) begins by determining whether a line has already been allocated in the buffer/mini-cache for the write ("Hit?" 503W). If not ("No" 503WN), then processing continues to determine whether the buffer/mini-cache has any remaining lines available for allocation ("Space Available?" 504W). If not ("No" 504WN), then a line is selected for eviction from the buffer/mini-cache ("Choose Victim" 505W). If the selected line has any modified data ("Dirty" 505WD), then the line is stored into the coherency domain ("Write-Back to Processor" 506W). The line is then allocated for the DMA write being processed ("Reserve Line" 507W). If the selected line was not previously dirty ("Clean" 505WC), then no writeback is performed, and the line is allocated immediately ("Reserve Line" 507W). If a remaining line is available ("Yes" 504WY), then no victim is chosen (and hence there is no writeback), and the selected line is allocated immediately ("Reserve Line" 507W).
After the line has been allocated for the DMA write data, the DMA write data is stored therein and marked as not clean ("Write; Mark Dirty" 508W). Processing of the DMA access is then complete, and waiting begins for a new DMA access ("Idle" 501). If the buffer/mini-cache already has a line allocated for the DMA write ("Yes" 503WY), then no miss processing is required, and the DMA write data is immediately stored into the buffer/mini-cache ("Write; Mark Dirty" 508W), omitting the line allocation.
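The read and write flows of Fig. 5 share a victim/allocate path, which can be sketched as follows. The data structures, the victim-choice shortcut, and the `writeback`/`fetch` callbacks are illustrative assumptions; the comments map the steps onto the figure's labels.

```python
# Hedged sketch of the Fig. 5 coherent flows under assumed structures:
# a shared reserve path feeding the DMA read and DMA write cases.

def reserve_line(cache, capacity, dirty, writeback):
    """Free a slot if needed, writing back a dirty victim first."""
    if len(cache) >= capacity:                # "Space Available?" -> No
        victim = next(iter(cache))            # any policy may choose this
        if victim in dirty:                   # "Dirty" 505RD / 505WD
            writeback(victim, cache[victim])  # "Write-Back to Processor"
            dirty.discard(victim)
        del cache[victim]                     # "Reserve Line"

def dma_read(addr, cache, capacity, dirty, writeback, fetch):
    if addr not in cache:                     # miss ("No" 503RN)
        reserve_line(cache, capacity, dirty, writeback)
        cache[addr] = fetch(addr)             # "Write; Mark Clean & Valid"
    return cache[addr]                        # "Data to Device"

def dma_write(addr, data, cache, capacity, dirty, writeback):
    if addr not in cache:                     # miss ("No" 503WN)
        reserve_line(cache, capacity, dirty, writeback)
    cache[addr] = data                        # "Write; Mark Dirty"
    dirty.add(addr)
```

Note that only dirty victims trigger writeback traffic into the coherency domain, matching the "Clean" branches of the figure where the writeback step is skipped.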
Reduced-Power DMA Accesses: Incoherent Operation
Fig. 6 illustrates selected operations performed by an embodiment implementing an incoherent buffer/mini-cache for satisfying background DMA device accesses (such as buffer/mini-cache 112A of Fig. 1, included in a processor, or buffer/mini-cache 112B of Fig. 1, external to a processor). Processing is according to two main flows, one for entering a lower-power state ("Lower C-State" 600L) and one for entering a higher-power state ("Higher C-State" 600H).
Processing for entering the lower-power state begins by filling (or "pre-filling") the buffer/mini-cache with as much data from the coherency domain as possible, so that as much DMA traffic as possible may be satisfied by the buffer/mini-cache without powering up coherency-domain elements (such as a CPU or the associated cache subsystem). Entry into the lower-power state awaits completion of the filling, irrespective of whether all of the buffer/mini-cache has been filled, or none of it.
Lower-power-state entry processing ("Idle" 601) begins with notification of a transition to a lower-power C-state ("Enter Lower C-State" 601L), such as upon entering a deep C-state (for example, C3, C4, and so forth). A determination is made as to whether any lines remain in the buffer/mini-cache that are available to receive system data, i.e., that have a "valid" tag and a "free" status ("More Lines?" 602L). If so ("Yes" 602LY), then processing continues by selecting one of the "valid" and "free" lines ("Choose Line" 603L). Data is then obtained from the coherency domain for storage into the selected line ("Data from System" 604L). The data is stored into the line and marked as clean ("Write; Mark Clean" 605L); since the line is no longer "free", it is unavailable for other system data.
The flow then loops back to determine whether any additional lines in the buffer/mini-cache are available to receive system data ("More Lines?" 602L). If no additional lines are available ("No" 602LN), then the filling of the buffer/mini-cache in preparation for entering the lower-power state is complete, the buffer/mini-cache is ready for entry into the lower-power state, and the flow loops back to await another C-state transition ("Idle" 601).
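The pre-fill loop of Fig. 6 can be sketched as follows. The per-line dictionary layout and the `fetch` callback (standing in for obtaining data from the coherency domain) are illustrative assumptions; the comments reference the figure's labels.

```python
# Hedged sketch, under assumed structures, of the Fig. 6 pre-fill loop:
# every line that is "valid" and "free" is filled from the coherency
# domain and marked "clean" before the lower-power state is entered.

def prefill(lines, fetch):
    """lines: list of dicts with 'valid', 'status', 'addr', 'data'."""
    for line in lines:                          # "More Lines?" 602L
        if line["valid"] and line["status"] == "free":
            line["data"] = fetch(line["addr"])  # "Data from System" 604L
            line["status"] = "clean"            # "Write; Mark Clean" 605L
    return lines                                # ready for lower C-state
```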
In some embodiments, the processing relating to entering the reduced-power state ("Lower C-State" 600L) is omitted; that is, there is no "pre-filling" of the buffer/mini-cache.
Processing for entering the higher-power (or snoop-enabled) state synchronizes the buffer/mini-cache with the coherency domain by emptying the buffer/mini-cache of data cached while any coherency-domain elements (such as a CPU or the associated cache subsystem) were in lower-power (or snoop-disabled) states. The buffer/mini-cache is thus explicitly flushed of all data that may be newer than the corresponding data in the coherency domain.
Higher-power-state entry processing ("Idle" 601) begins with notification of a transition to a higher-power C-state ("Enter Higher C-State" 601H), such as upon entering a snoop-enabled C-state (for example, C2, C1, or C0). A determination is made as to whether any lines remain in the buffer/mini-cache that may have new data to be written back to the coherency domain, i.e., that have a status other than "free", such as "clean" or "dirty" ("More Lines?" 602H). If so ("Yes" 602HY), then processing continues by selecting one of the non-"free" lines ("Choose Line" 603H). If the selected line has any modified data, such as indicated by a "dirty" status ("Dirty" 603HD), then the line is stored into the coherency domain ("Write-Back to Coherency Domain" 604H), and the line status is then changed to "free" ("Mark Free" 605H). If the selected line has no modified data, such as indicated by a "clean" status ("Clean" 603HC), then the writeback is omitted, and the line status is changed to free immediately ("Mark Free" 605H).
Flow then loops back to determine whether additional lines remain to be examined for possibly newer data ("More Lines?" 602H). If no further lines require processing ("No" 602HN), the buffer/mini-cache is synchronized with the coherency domain, accesses to the coherency domain may resume, the buffer/mini-cache is ready for the higher-power state, and flow returns to await another C-state transition ("Idle" 601).
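As a sketch, the 601H-605H flush sequence described above can be modeled as a simple loop over line states. The class and function names, the state encodings, and the use of a dictionary standing in for the coherency domain are illustrative assumptions, not the patent's actual implementation.

```python
# Illustrative sketch of the buffer/mini-cache flush performed before
# entering a higher-power (snoop-enabled) C-state: dirty lines are
# written back, then every non-invalid line is invalidated. All names
# here are hypothetical.

INVALID, CLEAN, DIRTY = "invalid", "clean", "dirty"

class MiniCacheLine:
    def __init__(self, addr, state, data=None):
        self.addr, self.state, self.data = addr, state, data

def flush_for_higher_c_state(lines, coherency_domain):
    """Write back dirty lines, then invalidate every non-invalid line."""
    for line in lines:                                  # "More Lines?" (602H)
        if line.state == INVALID:
            continue                                    # nothing newer than the domain
        if line.state == DIRTY:                         # "Dirty" (603HD)
            coherency_domain[line.addr] = line.data     # write-back (604H)
        line.state = INVALID                            # "Mark 'Invalid'" (605H)
    # the buffer/mini-cache is now synchronized with the coherency domain

# usage: one dirty line is written back, the clean line is simply invalidated
domain = {}
lines = [MiniCacheLine(0x100, DIRTY, b"new"), MiniCacheLine(0x140, CLEAN, b"old")]
flush_for_higher_c_state(lines, domain)
```

After the loop, only the dirty line's data reaches the coherency domain, matching the figure's distinction between the 603HD and 603HC paths.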
Data Compression/Decompression
In some embodiments, data stored in the buffer/mini-cache (such as graphics update data) is retained in compressed form and decompressed on access. According to various embodiments, either or both of the compression and decompression operations may be performed in any combination of the GPU and the processor system (or the chipset, if implemented as a separate device).
In embodiments where decompression is computationally cheaper than compression, processing may include the following. The GPU requests raw frame buffer data from the processor system (or chipset) according to an address range corresponding to an uncompressed representation of the frame buffer. The processor system (or chipset) fetches the raw (that is, uncompressed) frame buffer data from memory, where, depending on where the most recent and most accessible copy resides, the memory includes any combination of processor write buffers, first- and second-level caches, the buffer/mini-cache, and DRAM.
The GPU then compresses the raw data and writes the resulting compressed data to an address range corresponding to a compressed representation of the frame buffer (or a portion thereof), which may map directly to the graphics buffer. Reads directed to the compressed representation receive compressed data (suitable for expansion by the GPU), while reads directed to the uncompressed representation of the graphics buffer receive decompressed data, the processor system (or chipset) supplying the decompressed data by expanding the appropriate portion of the compressed data. The processor system (or chipset) thus presents the appearance (or view) of an uncompressed frame buffer to devices other than the GPU. According to various embodiments, any combination of relatively simple graphics devices (such as a simple CRT controller incapable of decompression), debugging operations, and software rendering functions may use the uncompressed frame buffer view.
In embodiments where bus utilization is to be minimized or the GPU is to be simplified, processing may include the following. The GPU requests compressed frame buffer data from the processor system (or chipset) according to the address range corresponding to the compressed representation of the frame buffer. If the requested data is not already present in the graphics buffer, the processor system (or chipset) fetches the appropriate raw (that is, uncompressed) frame buffer data from memory. As before, depending on where the most recent and most accessible copy resides, the memory includes any combination of processor write buffers, first- and second-level caches, the buffer/mini-cache, and DRAM.
The processor system (or chipset) then compresses the raw data and writes the resulting compressed data into the graphics buffer. Compressed data requested by the GPU is subsequently returned from the graphics buffer and expanded (that is, decompressed) by the GPU. Compressed data thus makes only a single round trip over any one bus, reducing energy consumption and bandwidth use, while the processor system (or chipset) retains the ability to access uncompressed frame buffer data.
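The bus-minimizing flow just described (the processor system compresses on fill, the GPU expands after a single bus trip) can be sketched as follows. The patent does not specify a compression scheme, so zlib stands in for it, and all class, method, and address names are illustrative assumptions.

```python
# Sketch of the bus-utilization-minimizing flow: the processor system
# (or chipset) compresses raw frame buffer data into the graphics
# buffer on a miss, and the GPU decompresses locally, so only
# compressed data crosses the bus. zlib and all names are hypothetical.
import zlib

class ProcessorSystem:
    def __init__(self, dram):
        self.dram = dram            # raw (uncompressed) frame buffer data
        self.graphics_buffer = {}   # compressed data held in the buffer/mini-cache

    def read_compressed(self, addr):
        if addr not in self.graphics_buffer:          # fill on miss
            self.graphics_buffer[addr] = zlib.compress(self.dram[addr])
        return self.graphics_buffer[addr]             # single trip over the bus

class GPU:
    def fetch(self, chipset, addr):
        # GPU expands (decompresses) the returned compressed data itself
        return zlib.decompress(chipset.read_compressed(addr))

# usage: fetch one blank 4 KiB scanline through the compressed path
chipset = ProcessorSystem({0x0: b"\x00" * 4096})
pixels = GPU().fetch(chipset, 0x0)
```

Note that the chipset still holds the raw data in `dram`, matching the text's point that the processor system retains access to the uncompressed frame buffer.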
In some embodiments where the GPU performs both compression and decompression, the processor system (or chipset) lacks direct access to the uncompressed frame buffer. The GPU therefore provides a virtual frame buffer, with a defined virtual frame buffer address range, for access by devices other than the GPU (such as CPUs, video mirroring peripherals, and other similar requesters seeking frame buffer data).
In some of the aforementioned compression/decompression embodiments, the graphics buffer is implemented as the whole of the buffer/mini-cache or as a portion of it. In some embodiments, the graphics buffer portion of the buffer/mini-cache is operated according to a first buffer/mini-cache management policy, while the remainder of the buffer/mini-cache is operated according to a second buffer/mini-cache management policy. For example, the first policy may include keeping the entire graphics buffer "clean" (that is, having no lines in a dirty state). Keeping the graphics buffer clean eliminates any need to flush it, and in some designs writes to frame buffer address ranges (as distinct from the graphics buffer address range) are cached separately in another portion of the buffer/mini-cache. In some usage scenarios, such writes correspond to a video-in-window region whose data is directed from a video capture device (or card). Video capture data is overwritten frequently, so storing it in the buffer/mini-cache can greatly reduce DRAM accesses.
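One way to read the two-policy arrangement above is that the graphics-buffer partition is kept clean by writing through to DRAM, while the remainder uses write-back and may hold dirty lines. The sketch below assumes exactly that interpretation; the address range, class, and field names are hypothetical.

```python
# Sketch of two management policies within one buffer/mini-cache:
# writes landing in the graphics-buffer range are written through
# (the partition stays "clean" and never needs flushing), while other
# writes are held back as dirty lines. Ranges and names are
# illustrative assumptions.

GFX_BASE, GFX_LIMIT = 0x8000, 0xFFFF   # hypothetical graphics-buffer range

class PartitionedMiniCache:
    def __init__(self, dram):
        self.dram = dram
        self.lines = {}     # addr -> (data, dirty flag)

    def write(self, addr, data):
        if GFX_BASE <= addr <= GFX_LIMIT:
            self.lines[addr] = (data, False)   # first policy: write-through, stays clean
            self.dram[addr] = data
        else:
            self.lines[addr] = (data, True)    # second policy: write-back, dirty until flushed

    def needs_flush(self, addr):
        return self.lines[addr][1]

# usage: a video-in-window write stays clean; an ordinary write is dirty
cache = PartitionedMiniCache(dram={})
cache.write(0x9000, b"video-in-window pixels")   # graphics partition
cache.write(0x1000, b"ordinary data")            # remainder of the mini-cache
```

Under this reading, frequently rewritten capture data is absorbed by the mini-cache, and only the non-graphics partition ever participates in the flush sequence of the preceding section.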
In some embodiments, the graphics buffer provided by the buffer/mini-cache is independent of the normal mode and buffer mode described with respect to Fig. 3. In other words, the graphics buffer provided by the buffer/mini-cache remains operable even when all CPUs are operating in full-power and/or high-performance states, thereby reducing DRAM accesses during normal CPU operation.
Although the foregoing description has focused on non-cacheable traffic provided by a GPU, the description applies equally to non-cacheable traffic from any agent, such as any DMA device. For example, according to various embodiments, non-cacheable traffic from various DMA agents (such as network controllers, storage interfaces, and other similar high-bandwidth I/O components) may be handled by the buffer/mini-cache.
Embodiments of a Buffer/Mini-Cache Included in a Processor
Figs. 7A-7F illustrate various embodiments of a processor-included buffer/mini-cache in contexts relating to all or part of processor chip 102 of Fig. 1. The figures illustrate various arrangements of CPUs and associated cache subsystems, including several combinations of L1, L2, and L3 cache architectures. The figures also illustrate embodiments in which the processor-included buffer/mini-cache is distinct from, or combined with, the cache subsystem.
Fig. 7A illustrates processor chip 102A, a variation of processor chip 102, having four CPU and L1 units 700.0-3 coupled to control unit 130A, which has a processor-included buffer/mini-cache 112A. Other components (such as a DRAM controller) may be included in the processor chip but are omitted from the figure for simplicity. According to various embodiments, each CPU and L1 unit may include one or more CPUs and one or more L1 caches (such as instruction and data caches), respectively. Although four CPU and L1 units are illustrated, those of ordinary skill in the art will understand that more or fewer units may be used. In some embodiments, each of the CPU and L1 units is identical, while in some embodiments one or more of the CPU and L1 units may differ (that is, include a CPU or cache having higher or lower power or performance characteristics). In some embodiments, all or part of the buffer/mini-cache is implemented within one or more of the CPU and L1 units.
Fig. 7B illustrates processor chip 102B, a variation of processor chip 102, having a pair of processors 701.0-1 coupled to control unit 130A, which has a processor-included buffer/mini-cache 112A. Other components (such as a DRAM controller) may be included in the processor chip but are omitted from the figure for simplicity. As illustrated, each of the processors includes a pair of CPU and L1 units coupled to a shared L2 cache (such as processor 701.0 having CPU and L1 units 710.0-1 and L2 711.0). The L2 caches are in turn coupled to the control unit to exchange data with the buffer/mini-cache. Although a pair of processors each having a pair of CPUs is illustrated, those of ordinary skill in the art will understand that more or fewer CPUs may be used in each processor, and that more or fewer processors may be used. In some embodiments, each of the processors is identical, while in some embodiments one or more of the processors may differ (such as having more or fewer CPUs). In some embodiments, each of the CPU and L1 units is identical, while in some embodiments one or more of the CPU and L1 units may differ (that is, include a CPU or cache having higher or lower power or performance characteristics).
Fig. 7C illustrates processor chip 102C, a variation of processor chip 102, similar to processor chip 102B (Fig. 7B) except that the L2 cache resource is a single unit (L2 711) within a single processor 701. Other components (such as a DRAM controller) may be included in the processor chip but are omitted from the figure for simplicity. As in the embodiments illustrated in Figs. 7A and 7B, the number, arrangement, and characteristics of the CPUs and L1s may vary according to embodiment.
Fig. 7D illustrates processor chip 102D, a variation of processor chip 102, similar to processor chip 102C (Fig. 7C) except that the L2 and the buffer/mini-cache are combined. Control unit 130D is similar to control unit 130A except that it is adapted to manage buffer/mini-cache 112D as implemented within L2 711D, and L2 711D is similar to L2 711 except that it includes the buffer/mini-cache. In some embodiments, the included buffer/mini-cache is implemented by reserving a portion of the L2 for use as the buffer/mini-cache. The reservation may be according to one or more ways of the L2, by tagging, or by any other similar mechanism (see the "DRAM Access Reduction for Non-Cacheable Traffic" section elsewhere herein for more information). As in the embodiments illustrated in Figs. 7A-7C, other components may be included in the processor chip, and the number, arrangement, and characteristics of the CPUs and L1s may vary according to embodiment.
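Reserving L2 ways for the buffer/mini-cache, as described above, can be sketched as a way mask that splits each set's ways between normal L2 allocation and mini-cache use. The 8-way geometry, the particular reserved ways, and all names below are illustrative assumptions.

```python
# Sketch of implementing the included buffer/mini-cache by reserving
# ways of the L2: fills for non-cacheable (mini-cache) traffic may
# only allocate into the reserved ways, and ordinary cacheable fills
# only into the remaining ways. The 8-way geometry and names are
# hypothetical.

N_WAYS = 8
MINI_CACHE_WAYS = {6, 7}                        # ways reserved by the control unit
L2_WAYS = set(range(N_WAYS)) - MINI_CACHE_WAYS  # ways left for normal L2 use

def allocation_ways(is_non_cacheable):
    """Return the set of ways a fill is allowed to allocate into."""
    return MINI_CACHE_WAYS if is_non_cacheable else L2_WAYS

def victim_way(set_fill_order, is_non_cacheable):
    """Pick the least-recently-filled way from the allowed partition."""
    allowed = allocation_ways(is_non_cacheable)
    return min(allowed, key=lambda w: set_fill_order[w])

# usage: way 6 was filled before way 7, so a non-cacheable fill evicts way 6
order = {w: t for t, w in enumerate([6, 7, 0, 1, 2, 3, 4, 5])}
victim = victim_way(order, is_non_cacheable=True)
```

The same way-mask idea applies unchanged to the L3-based embodiment of Fig. 7F, since only the identity of the partitioned cache differs.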
Fig. 7E illustrates processor chip 102E, a variation of processor chip 102, similar to processor chip 102B (Fig. 7B) except that an additional layer of cache, L3 720, is inserted between the CPUs and the buffer/mini-cache. As in the embodiments illustrated in Figs. 7A-7D, other components may be included in the processor chip, and the number, arrangement, and characteristics of the CPUs, L1s, and L2s may vary according to embodiment.
Fig. 7F illustrates processor chip 102F, a variation of processor chip 102, similar to processor chip 102E (Fig. 7E) except that the L3 and the buffer/mini-cache are combined. Control unit 130F is similar to control unit 130A except that it is adapted to manage buffer/mini-cache 112F as implemented within L3 720F, and L3 720F is similar to L3 720 except that it includes the buffer/mini-cache. As in the embodiment illustrated in Fig. 7D, the included buffer/mini-cache is implemented by reserving a portion of the L3 for use as the buffer/mini-cache. The reservation may be according to one or more ways of the L3, by tagging, or by any other similar mechanism (see the "DRAM Access Reduction for Non-Cacheable Traffic" section elsewhere herein for more information).
Conclusion
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many ways of implementing the invention. The disclosed embodiments are illustrative, not restrictive.
It will be understood that many variations in construction, arrangement, and use are possible consistent with the teachings herein, and are within the scope of the claims of the issued patent. For example, interconnect and function-unit bit-widths, clock speeds, and the type of technology used may generally be varied in each block. The names given to interconnects and logic are merely illustrative and should not be construed as limiting the concepts taught. The order and arrangement of flowchart and flow-diagram processes, actions, and functional elements may generally be varied. Also, unless specifically stated to the contrary, the value ranges specified, the maximum and minimum values used, and other particular specifications (such as the number and type of non-cacheable references; the number and type of DMA devices; the number, capacity, and organization of buffer/mini-caches; the number, width, and organization of fields in buffer/mini-cache structures and associated mode registers; and the number of entries or stages in registers and buffers) are merely those of the illustrative embodiments, may be expected to track improvements and changes in implementation technology, and should not be construed as limitations.
Functionally equivalent techniques known to those of ordinary skill in the art may be employed instead of those illustrated to implement various components, subsystems, functions, operations, routines, and subroutines. It will also be understood that many design-functionality aspects may be realized either in hardware (that is, generally dedicated circuitry) or in software (that is, via some manner of programmed controller or processor), as a function of implementation-dependent design constraints and the technology trends of faster processing (facilitating migration of functions previously in hardware into software) and higher integration density (facilitating migration of functions previously in software into hardware). Specific variations may include, but are not limited to: differences in partitioning; different form factors and configurations; use of different operating systems and other system software; use of different interface standards, network protocols, or communication links; and other variations to be expected when implementing the concepts taught herein in accordance with the unique engineering and business constraints of a particular application.
The embodiments have been illustrated with detail and environmental context well beyond that required for a minimal implementation of many aspects of the concepts taught. Those of ordinary skill in the art will recognize that variations may omit disclosed components or features without altering the basic cooperation among the remaining elements. It is thus understood that much of the detail disclosed is not required to implement various aspects of the concepts taught. To the extent that the remaining elements are distinguishable from the prior art, the components and features that may be omitted are not limiting on the concepts taught herein.
All such variations in design comprise insubstantial changes over the teachings conveyed by the illustrative embodiments. It is also understood that the concepts taught herein have broad applicability to other computing and networking applications and are not limited to the particular application or industry of the illustrated embodiments. The invention is thus to be construed as including all possible modifications and variations encompassed within the scope of the claims of the issued patent.

Claims (22)

1. A method comprising:
storing at least a portion of non-cacheable data in a cache memory, the cache memory being configured to be associated with a microprocessor operating in a buffer mode.
2. The method of claim 1, wherein the non-cacheable data is display refresh data (DRD).
3. The method of claim 1, wherein the buffer mode comprises enabling the microprocessor to operate in a low-power state.
4. The method of claim 1, wherein the storing comprises using a lower-limit register and an upper-limit register to specify the at least a portion of the non-cacheable data to match a size of the cache memory.
5. The method of claim 1, wherein the storing comprises using a base register and an offset-limit register to specify the at least a portion of the non-cacheable data, such that a size of the at least a portion of the non-cacheable data matches the size of the cache memory.
6. The method of claim 1, wherein the storing comprises using at least one request attribute, the request attribute identifying a source of the non-cacheable data.
7. The method of claim 1, wherein a direct-mapping policy is used in the buffer mode.
8. The method of claim 1, wherein the storing is initiated by a normal-mode-to-buffer-mode event.
9. The method of claim 1, wherein the normal-mode-to-buffer-mode event comprises the microprocessor entering a reduced power state.
10. The method of claim 1, wherein the non-cacheable data is stored in the cache memory such that an amount of the non-cacheable data does not exceed the size of the cache memory.
11. A microprocessor system comprising:
a processor; and
a cache memory system comprising a cache memory and a controller coupled to the cache memory, the controller being configured to store at least a portion of non-cacheable data in the cache memory while the microprocessor is operating in a buffer mode.
12. The microprocessor system of claim 11, wherein the non-cacheable data is display refresh data (DRD).
13. The microprocessor system of claim 11, wherein the buffer mode comprises enabling the microprocessor system to operate in a low-power state.
14. The microprocessor system of claim 11, wherein the controller is further configured to use a lower-limit register and an upper-limit register to specify the at least a portion of the non-cacheable data.
15. The microprocessor system of claim 11, wherein the controller is further configured to use a base register and an offset-limit register to specify the at least a portion of the non-cacheable data to match a size of the cache memory.
16. The microprocessor system of claim 11, wherein the storing comprises using at least one request attribute, the request attribute identifying a source of the non-cacheable data.
17. The microprocessor system of claim 11, wherein the controller is further configured to use a direct-mapping policy while the microprocessor is in the buffer mode.
18. The microprocessor system of claim 11, wherein storing of the at least a portion of the non-cacheable data in the cache memory by the controller is initiated by a normal-mode-to-buffer-mode event.
19. The microprocessor system of claim 11, wherein the normal-mode-to-buffer-mode event comprises the microprocessor entering a reduced power state.
20. The microprocessor system of claim 11, wherein the controller is further configured to store the non-cacheable data in the cache memory such that an amount of the non-cacheable data does not exceed the size of the cache memory.
21. A computer-readable medium comprising program instructions for:
storing at least a portion of non-cacheable data in a cache memory, the cache memory being configured to be associated with a microprocessor operating in a buffer mode.
22. A medium readable by a computer system comprising a description that, when interpreted by the computer system, produces a process comprising:
storing at least a portion of non-cacheable data in a cache memory, the cache memory being configured to be associated with a microprocessor operating in a buffer mode.
CN2006800508506A 2005-11-15 2006-11-14 Power conservation via DRAM access Active CN101356511B (en)

Applications Claiming Priority (13)

Application Number Priority Date Filing Date Title
US73673605P 2005-11-15 2005-11-15
US73663205P 2005-11-15 2005-11-15
US60/736,736 2005-11-15
US60/736,632 2005-11-15
US76122006P 2006-01-23 2006-01-23
US60/761,220 2006-01-23
US11/351,070 US7516274B2 (en) 2005-11-15 2006-02-09 Power conservation via DRAM access reduction
US11/351,070 2006-02-09
US11/559,133 2006-11-13
US11/559,192 US7899990B2 (en) 2005-11-15 2006-11-13 Power conservation via DRAM access
US11/559,192 2006-11-13
US11/559,133 US7904659B2 (en) 2005-11-15 2006-11-13 Power conservation via DRAM access reduction
PCT/US2006/044129 WO2007097791A2 (en) 2005-11-15 2006-11-14 Power conservation via dram access

Publications (2)

Publication Number Publication Date
CN101356511A true CN101356511A (en) 2009-01-28
CN101356511B CN101356511B (en) 2012-01-11

Family

ID=40308486

Family Applications (2)

Application Number Title Priority Date Filing Date
CN2006800508506A Active CN101356511B (en) 2005-11-15 2006-11-14 Power conservation via DRAM access
CN2006800507749A Active CN101356510B (en) 2005-11-15 2006-11-14 Small and power-efficient cache that can provide data for background DMA devices while the processor is in a low-power state

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN2006800507749A Active CN101356510B (en) 2005-11-15 2006-11-14 Small and power-efficient cache that can provide data for background DMA devices while the processor is in a low-power state

Country Status (1)

Country Link
CN (2) CN101356511B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103262035A (en) * 2010-12-15 2013-08-21 超威半导体公司 Device discovery and topology reporting in a combined CPU/GPU architecture system
CN104407985A (en) * 2014-12-15 2015-03-11 泰斗微电子科技有限公司 Memorizer address mapping method and memorizer address mapping system
CN110569001A (en) * 2019-09-17 2019-12-13 深圳忆联信息系统有限公司 Solid state disk-based method and device for marking dirty bit of L2P table
CN111522754A (en) * 2012-08-17 2020-08-11 英特尔公司 Memory sharing through unified memory architecture
CN112969002A (en) * 2021-02-04 2021-06-15 浙江大华技术股份有限公司 Image transmission method and device based on PCIe protocol and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE112012006164B4 (en) 2012-03-31 2020-03-19 Intel Corporation Control power management in microservers
CN109727183B (en) * 2018-12-11 2023-06-23 中国航空工业集团公司西安航空计算技术研究所 Scheduling method and device for compression table of graphics rendering buffer

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6219745B1 (en) * 1998-04-15 2001-04-17 Advanced Micro Devices, Inc. System and method for entering a stream read buffer mode to store non-cacheable or block data
EP1157370B1 (en) * 1999-11-24 2014-09-03 DSP Group Switzerland AG Data processing unit with access to the memory of another data processing unit during standby
JP3857661B2 (en) * 2003-03-13 2006-12-13 インターナショナル・ビジネス・マシーンズ・コーポレーション Information processing apparatus, program, and recording medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103262035A (en) * 2010-12-15 2013-08-21 超威半导体公司 Device discovery and topology reporting in a combined CPU/GPU architecture system
CN111522754A (en) * 2012-08-17 2020-08-11 英特尔公司 Memory sharing through unified memory architecture
CN111522754B (en) * 2012-08-17 2023-12-12 英特尔公司 Memory sharing through unified memory architecture
CN104407985A (en) * 2014-12-15 2015-03-11 泰斗微电子科技有限公司 Memorizer address mapping method and memorizer address mapping system
CN104407985B (en) * 2014-12-15 2018-04-03 泰斗微电子科技有限公司 Storage address mapping method and storage address mapped system
CN110569001A (en) * 2019-09-17 2019-12-13 深圳忆联信息系统有限公司 Solid state disk-based method and device for marking dirty bit of L2P table
CN112969002A (en) * 2021-02-04 2021-06-15 浙江大华技术股份有限公司 Image transmission method and device based on PCIe protocol and storage medium
CN112969002B (en) * 2021-02-04 2023-07-14 浙江大华技术股份有限公司 Image transmission method and device based on PCIe protocol and storage medium

Also Published As

Publication number Publication date
CN101356510B (en) 2013-04-03
CN101356511B (en) 2012-01-11
CN101356510A (en) 2009-01-28

Similar Documents

Publication Publication Date Title
US7958312B2 (en) Small and power-efficient cache that can provide data for background DMA devices while the processor is in a low-power state
US7412570B2 (en) Small and power-efficient cache that can provide data for background DNA devices while the processor is in a low-power state
US7516274B2 (en) Power conservation via DRAM access reduction
US7899990B2 (en) Power conservation via DRAM access
CN101356511B (en) Power conservation via DRAM access
EP0936555B1 (en) Cache coherency protocol with independent implementation of optimised cache operations
US6370622B1 (en) Method and apparatus for curious and column caching
CN100416515C (en) Cache line flush micro-architectural implementation method ans system
US5787478A (en) Method and system for implementing a cache coherency mechanism for utilization within a non-inclusive cache memory hierarchy
CN103246613B (en) Buffer storage and the data cached acquisition methods for buffer storage
US20050160234A1 (en) Multi-processor computing system that employs compressed cache lines' worth of information and processor capable of use in said system
US6334172B1 (en) Cache coherency protocol with tagged state for modified values
CN100419715C (en) Embedded processor system and its data operating method
CN102063406A (en) Network shared Cache for multi-core processor and directory control method thereof
CN103365794A (en) Data processing method and system
CN103348333A (en) Methods and apparatus for efficient communication between caches in hierarchical caching design
US20110302374A1 (en) Local and global memory request predictor
EP1552396B1 (en) Data processing system having a hierarchical memory organization and method for operating the same
US6247098B1 (en) Cache coherency protocol with selectively implemented tagged state
CN100592268C (en) Method and apparatus for joint cache coherency states in multi-interface caches
US6701416B1 (en) Cache coherency protocol with tagged intervention of modified values
US6341336B1 (en) Cache coherency protocol having tagged state used with cross-bars
JPH10301850A (en) Method and system for providing pseudo fine inclusion system in sectored cache memory so as to maintain cache coherency inside data processing system
US20070101064A1 (en) Cache controller and method
TWI352906B (en) Method, microprocessor system, medium, memory elem

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant