TWI524184B - Method, device and system for processing address conflicts in a distributed memory organization - Google Patents

Info

Publication number
TWI524184B
Authority
TW
Taiwan
Prior art keywords
memory
conflict
request
cache
memory access
Prior art date
Application number
TW103105661A
Other languages
Chinese (zh)
Other versions
TW201447580A (en)
Inventor
Ramadass Nagarajan
Robert G Milstrey
Michael T Klinglesmith
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US13/785,908 priority Critical patent/US9405688B2/en
Application filed by Intel Corp filed Critical Intel Corp
Publication of TW201447580A publication Critical patent/TW201447580A/en
Application granted granted Critical
Publication of TWI524184B publication Critical patent/TWI524184B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0844 Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/62 Details of cache specific to multiprocessor cache arrangements
    • G06F2212/621 Coherency control relating to peripheral accessing, e.g. from DMA or I/O device

Description

Method, device and system for handling address conflicts in a decentralized memory organization architecture

Field of invention

The present invention relates to computing systems, and in particular (but not exclusively), to dealing with address conflicts in distributed memory organizations.

Background of the invention

As computing systems advance, their components become more complex. As a result, the interconnect architecture that couples the components and communicates between them also grows in complexity to ensure that bandwidth requirements for optimal component operation are met. Furthermore, different market segments demand different aspects of interconnect architectures to suit their needs. For example, servers require higher performance, while the mobile ecosystem can sometimes sacrifice overall performance for power savings. Nonetheless, providing the highest possible performance with maximum power saving remains a prominent design goal. A number of interconnects are discussed below that would potentially benefit from the aspects of the invention described herein.

Summary of invention

According to an embodiment of the present invention, a device is provided, comprising: a memory access request arbiter configured to grant one memory access request from a plurality of input memory access requests, the plurality of input memory access requests including memory access requests originating from a plurality of cache agents, memory access requests originating from a plurality of input/output (I/O) agents, and conflicting memory access requests previously arbitrated by the arbiter, each memory access request identifying an address associated with a cache line for which access is requested; a decentralized memory organization including a plurality of pipelines configured to operate in parallel; at least one cache agent conflict queue; at least one I/O conflict queue; and address conflict processing logic configured to determine whether a currently evaluated memory access request conflicts with another pending memory access request, and configured to queue conflicting memory access requests from the cache agents into the at least one cache agent conflict queue and conflicting memory access requests from the I/O agents into the at least one I/O conflict queue.

100, 910‧‧ ‧ processor

101, 102, 202-0, 202-N, 806, 807‧‧ core

101a, 101b‧‧‧ Hardware Thread Slot/Architecture Status Register/Logical Processor

102a, 102b‧‧‧Architecture Status Register

105‧‧‧Bus

110‧‧‧On-chip interface module/on-core portion

120‧‧‧Instruction translation buffer (I-TLB)/branch target buffer/fetch unit

125‧‧‧Decoding module/decoding logic/decoder

126‧‧‧Decoder

130‧‧‧Allocator and renamer block

135‧‧‧Reorder/retirement unit/out-of-order unit

140‧‧‧Execution unit

150‧‧‧Data translation buffer (D-TLB)

175, 308-0, 308-1‧‧‧ system memory

176‧‧‧Application code

177‧‧‧Translator code

180‧‧‧Graphics/Graphics Processor

200‧‧‧System Architecture

202‧‧‧Processor Core/Cache Agent

204‧‧‧Consistency unit

206, 206a‧‧‧System Agent

208-0, 208-1‧‧‧Memory controller

209‧‧‧Primary interface

210‧‧‧IO root complex

212‧‧‧Primary IO switching organization

214, 216, 218‧‧‧Switching organization

220‧‧‧ Bridge

222, 222-1, 222-2, 222-3, 222-4, 222-5, 222-6, 222-7, 222-L, 222-M‧‧‧ IO agents

302‧‧‧Shared Memory Organization

304‧‧‧Common Memory Access Request Arbiter (Arbiter)

306-0, 306-1‧‧‧Memory organization pipeline

310‧‧‧Shared consistency organization

312-0, 312-1‧‧‧Consistency organization pipeline

314‧‧‧Miscellaneous functions

316‧‧‧Common snoop/response arbitration logic

400-0, 400-N‧‧‧Cache agent request queue

401‧‧‧I/O root complex request queue

402‧‧‧Per-category I/O conflict queue

404-0, 404-1‧‧‧Conflict check logic block

406-0, 406-1‧‧‧Cache agent conflict queue

407-0, 407-1‧‧‧Scoreboard

408‧‧‧Conflict ordering block (COB)

410, 411, 412, 413, 415, 417‧‧‧Flip-flop

414‧‧‧Rejected IO request multiplexer (mux)

416‧‧‧Rejected IO request demultiplexer

418‧‧‧Conflict queue arbiter

420‧‧‧Hash logic

700, 900‧‧‧ system

702, 800‧‧‧System Single Chip (SoC)

704‧‧‧ motherboard

706‧‧‧Chassis

708, 815‧‧‧Graphic Processing Unit (GPU)

710, 845‧‧‧ flash controller

712‧‧‧ Large capacity storage device

714‧‧‧ display

716-0, 716-1‧‧‧ memory

718‧‧‧Portion of the IO interconnect hierarchy

720‧‧‧PCIe root complex

722, 724‧‧‧PCIe roots

726‧‧‧ IEEE 802.11 (also known as "WiFi") interface on the chip

728‧‧‧WiFi radio chip

730‧‧‧Universal Serial Bus (USB) 2 or USB3 interface

734‧‧‧USB2/USB3 interface chip

736‧‧‧Antenna

738‧‧‧USB2/USB3 port

740‧‧‧Mobile phone

742‧‧‧Tablet PC

744‧‧‧Portable computer (e.g., laptop, notebook or Ultrabook™)

746‧‧‧Basic Input/Output System (BIOS)

808‧‧‧Cache memory control

809‧‧‧ bus interface unit

810‧‧‧L2 cache memory

820‧‧‧Video codec

825‧‧‧Video interface

830, 957‧‧‧Subscriber Identity Module (SIM)

835‧‧‧Boot ROM

840‧‧‧SDRAM controller

860‧‧‧Dynamic Random Access Memory (DRAM)

865‧‧‧Flash

870‧‧‧Bluetooth module

875‧‧‧3G modem

880‧‧‧Global Positioning System (GPS)

885‧‧‧WiFi

915‧‧‧System Memory

920‧‧‧large capacity storage device

922‧‧‧flash device

924‧‧‧ display

925‧‧‧ touch screen

930‧‧‧ Trackpad

935‧‧‧ embedded controller

936‧‧‧ keyboard

937‧‧‧fan

938‧‧‧Trusted Platform Module (TPM)

939‧‧‧ Thermal Sensor

940‧‧‧Sensor Hub

941‧‧‧Accelerometer

942‧‧‧ Ambient Light Sensor (ALS)

943‧‧‧ compass

944‧‧‧Gyroscope

945‧‧‧Near Field Communication (NFC) unit

946‧‧‧ Thermal Sensor

950‧‧‧ WLAN unit

952‧‧‧Bluetooth unit

954‧‧‧ camera module

955‧‧‧GPS module

956‧‧‧WWAN unit

960‧‧‧Digital Signal Processor (DSP)

962‧‧‧Integrated Encoder/Decoder (CODEC) and Amplifier

963‧‧‧Output speakers

964‧‧‧ headphone jack

965‧‧‧Microphone

Figure 1 illustrates an embodiment of a block diagram of a computing system including a multi-core processor.

Figure 2 illustrates an embodiment of a block diagram of a computing system including a system agent that implements decentralized consistency and memory organization.

Figure 3 illustrates an embodiment of a system agent that includes decentralized consistency and memory organization.

Figure 4 illustrates additional details of the decentralized memory organization of Figure 3, in accordance with one embodiment.

Figure 5 is a flow diagram illustrating operations and logic for enforcing memory access request ordering for a virtual channel, in accordance with one embodiment.

Figure 6 is a flow diagram illustrating operations and logic for performing conflict ordering operations, in accordance with one embodiment.

Figure 7 illustrates an embodiment of a block diagram of a system architecture in which various aspects of the embodiments disclosed herein are implemented.

Figure 8 illustrates an embodiment of a system-on-chip computing system.

Figure 9 illustrates an embodiment of a block diagram of a computing system.

Detailed description of the preferred embodiment

In the following description, numerous specific details are set forth, such as examples of specific types of processors and system configurations, specific hardware structures, specific architectural and micro-architectural details, specific register configurations, specific instruction types, specific system components, specific dimensions/heights, specific processor pipeline stages, operations, and the like, in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that these specific details need not be employed to practice the present invention. In other instances, well-known components or methods, such as specific and alternative processor architectures, specific logic/code for described algorithms, specific firmware code, specific interconnect operation, specific logic configurations, specific manufacturing techniques and materials, specific compiler implementations, specific expression of algorithms in code, specific power-down and gating techniques/logic, and other specific operational details of computer systems have not been described in detail in order to avoid unnecessarily obscuring the present invention.

While the following embodiments may be described with reference to energy conservation and energy efficiency in specific integrated circuits, such as in computing platforms or microprocessors, other embodiments are applicable to other types of integrated circuits and logic devices. Similar techniques and teachings of the embodiments described herein may be applied to other types of circuits or semiconductor devices that may also benefit from better energy efficiency and energy conservation. For example, the disclosed embodiments may be used in devices such as, but not limited to, handheld devices, tablet computers, Ultrabooks™ and other thin notebook computers, system single-chip (SoC) devices, desktop computer systems, and embedded applications. Some examples of handheld devices include cellular phones, Internet protocol devices, digital cameras, personal digital assistants (PDAs), and handheld PCs. Embedded applications typically include microcontrollers, digital signal processors (DSPs), system single-chips, network computers (NetPCs), set-top boxes, network hubs, wide area network (WAN) switches, or any other system that can perform the functions and operations described below. Moreover, the devices, methods, and systems described herein are not limited to physical computing devices, but may also relate to software optimizations for energy conservation and efficiency. As will become readily apparent from the description below, embodiments of the methods, devices, and systems described herein (whether in reference to hardware, firmware, software, or a combination thereof) are vital to a "green technology" future balanced with performance considerations.

As computing systems advance, their components become more complex. As a result, the interconnect architecture that couples the components and communicates between them also grows in complexity to ensure that bandwidth requirements for optimal component operation are met. Furthermore, different market segments demand different aspects of interconnect architectures to suit the market's needs. For example, servers require higher performance, while the mobile ecosystem can sometimes sacrifice overall performance for power savings. Yet, providing the highest possible performance with maximum power saving is the singular purpose of most interconnect fabrics. A number of interconnects are discussed below that would potentially benefit from the aspects of the invention described herein.

Figure 1

Referring to Figure 1, an embodiment of a block diagram for a computing system including a multi-core processor is depicted. Processor 100 includes any processor or processing device, such as a microprocessor, an embedded processor, a digital signal processor (DSP), a network processor, a handheld processor, an application processor, a co-processor, a system single-chip (SoC), or other device to execute code. In one embodiment, processor 100 includes at least two cores (cores 101 and 102), which may include asymmetric cores or symmetric cores (the illustrated embodiment). However, processor 100 may include any number of processing elements that may be symmetric or asymmetric.

In one embodiment, a processing element refers to hardware or logic to support a software thread. Examples of hardware processing elements include: a thread unit, a thread slot, a thread, a process unit, a context, a context unit, a logical processor, a hardware thread, a core, and/or any other element capable of holding a state for a processor, such as an execution state or an architectural state. In other words, a processing element, in one embodiment, refers to any hardware capable of being independently associated with code, such as a software thread, operating system, application, or other code. A physical processor (or processor socket) typically refers to an integrated circuit, which potentially includes any number of other processing elements, such as cores or hardware threads.

The core often refers to logic on an integrated circuit that is capable of maintaining an independent architectural state, with each independently maintained architectural state associated with at least some dedicated execution resources. In contrast to the core, a hardware thread generally refers to any logic located on an integrated circuit that is capable of maintaining an independent architectural state, wherein the independently maintained architectural state shares access to the execution resources. As can be seen, when certain resources are shared and other resources are dedicated to an architectural state, the boundaries between the hardware thread and the core naming overlap. Often, however, core and hardware threads are treated by the operating system as individual logical processors, where the operating system is capable of individually scheduling operations on each logical processor.

As illustrated in FIG. 1, physical processor 100 includes two cores, cores 101 and 102. Here, cores 101 and 102 are considered symmetric cores, i.e., cores with the same configurations, functional units, and/or logic. In another embodiment, core 101 includes an out-of-order processor core, while core 102 includes an in-order processor core. However, cores 101 and 102 may be individually selected from any type of core, such as a native core, a software-managed core, a core adapted to execute a native instruction set architecture (ISA), a core adapted to execute a translated ISA, a co-designed core, or other known cores. In a heterogeneous core environment (i.e., asymmetric cores), some form of translation, such as binary translation, may be utilized to schedule or execute code on one or both cores. Yet, to further the discussion, the functional units illustrated in core 101 are described in further detail below, as the units in core 102 operate in a similar manner in the depicted embodiment.

As depicted, core 101 includes two hardware threads 101a and 101b, which may also be referred to as hardware thread slots 101a and 101b. Therefore, a software entity such as an operating system, in one embodiment, potentially views processor 100 as four separate processors, i.e., four logical processors or processing elements capable of executing four software threads concurrently. As alluded to above, a first thread is associated with architecture state registers 101a, a second thread is associated with architecture state registers 101b, a third thread may be associated with architecture state registers 102a, and a fourth thread may be associated with architecture state registers 102b. Here, each of the architecture state registers (101a, 101b, 102a, and 102b) may be referred to as a processing element, thread slot, or thread unit, as described above. As illustrated, architecture state registers 101a are replicated in architecture state registers 101b, so individual architecture states/contexts can be stored for logical processor 101a and logical processor 101b. In core 101, other smaller resources for threads 101a and 101b, such as instruction pointers and renaming logic in allocator and renamer block 130, may also be replicated. Some resources, such as the reorder buffers in reorder/retirement unit 135, the I-TLB 120, load/store buffers, and queues, may be shared through partitioning. Other resources, such as general-purpose internal registers, page-table base registers, the low-level data cache and data TLB 115, execution unit 140, and out-of-order unit 135, are potentially fully shared.

Processor 100 often includes other resources, which may be fully shared, shared through partitioning, or dedicated to/by processing elements. In FIG. 1, an embodiment of a purely exemplary processor with illustrative logical units/resources of a processor is illustrated. Note that a processor may include, or omit, any of these functional units, as well as include any other known functional units, logic, or firmware not depicted. As illustrated, core 101 includes a simplified, representative out-of-order (OOO) processor core. However, an in-order processor may be utilized in different embodiments. The OOO core includes a branch target buffer 120 to predict branches to be executed/taken and an instruction translation buffer (I-TLB) 120 to store address translation entries for instructions.

Core 101 further includes decode module 125 coupled to fetch unit 120 to decode fetched elements. Fetch logic, in one embodiment, includes individual sequencers associated with thread slots 101a, 101b, respectively. Usually core 101 is associated with a first ISA, which defines/specifies instructions executable on processor 100. Machine code instructions that are part of the first ISA often include a portion of the instruction (referred to as an opcode), which references/specifies an instruction or operation to be performed. Decode logic 125 includes circuitry that recognizes these instructions from their opcodes and passes the decoded instructions on in the pipeline for processing as defined by the first ISA. For example, as discussed in more detail below, decoder 125, in one embodiment, includes logic designed or adapted to recognize specific instructions, such as transactional instructions. As a result of the recognition by decoder 125, the architecture or core 101 takes specific, predefined actions to perform tasks associated with the appropriate instruction. It is important to note that any of the tasks, blocks, operations, and methods described herein may be performed in response to a single or multiple instructions; some of which may be new or old instructions. Note that decoder 126, in one embodiment, recognizes the same ISA (or a subset thereof). Alternatively, in a heterogeneous core environment, decoder 126 recognizes a second ISA (either a subset of the first ISA or a distinct ISA).

In one example, allocator and renamer block 130 includes an allocator to reserve resources, such as register files to store instruction processing results. However, threads 101a and 101b are potentially capable of out-of-order execution, where allocator and renamer block 130 also reserves other resources, such as reorder buffers to track instruction results. Block 130 may also include a register renamer to rename program/instruction reference registers to other registers internal to processor 100. Reorder/retirement unit 135 includes components, such as the reorder buffers mentioned above, load buffers, and store buffers, to support out-of-order execution and later in-order retirement of instructions executed out of order.

In one embodiment, scheduler and execution unit block 140 includes a scheduler unit to schedule instructions/operations on execution units. For example, a floating point instruction is scheduled on an array of execution units that has an available floating point execution unit. Register files associated with the execution units are also included to store instruction processing results. Exemplary execution units include a floating point execution unit, an integer execution unit, a jump execution unit, a load execution unit, a store execution unit, and other known execution units.

The lower level data cache memory and data translation buffer (D-TLB) 150 is coupled to the execution unit 140. The data cache memory is used to store recently used/operated elements (such as data operands) that are potentially maintained in a memory coherency state. The D-TLB is used to store recent virtual/linear to physical address translations. As a specific example, the processor can include a page table structure to decompose the physical memory into a plurality of virtual pages.

Here, cores 101 and 102 share access to a higher-level or further-out cache memory, such as a second-level cache associated with on-chip interface 110. Note that higher-level or further-out refers to cache levels increasing or getting further away from the execution units. In one embodiment, the higher-level cache is a last-level data cache (the last cache in the memory hierarchy on processor 100), such as a second- or third-level data cache. However, the higher-level cache is not so limited, as it may be associated with an instruction cache or include an instruction cache. A trace cache (a type of instruction cache) may instead be coupled after decoder 125 to store recently decoded traces. Here, an instruction potentially refers to a macro-instruction (i.e., a general instruction recognized by the decoders), which may decode into a number of micro-instructions (micro-operations).

In the depicted configuration, processor 100 also includes on-chip interface module 110. Historically, a memory controller, described in more detail below, has been included in a computing system external to processor 100. In this scenario, on-chip interface 110 is to communicate with devices external to processor 100, such as system memory 175, a chipset (often including a memory controller hub to connect to memory 175 and an I/O controller hub to connect peripheral devices), a memory controller hub, a northbridge, or other integrated circuit. And in this scenario, bus 105 may include any known interconnect, such as a multi-drop bus, a point-to-point interconnect, a serial interconnect, a parallel bus, a coherent (e.g., cache coherent) bus, a layered protocol architecture, a differential bus, and a GTL bus.

Memory 175 may be dedicated to processor 100 or shared with other devices in a system. Common examples of types of memory 175 include DRAM, SRAM, non-volatile memory (NV memory), and other known storage devices. Note that device 180 may include a graphics accelerator, processor, or card coupled to a memory controller hub, a data storage device coupled to an I/O controller hub, a wireless transceiver, a flash device, an audio controller, a network controller, or other known device.

Recently, however, as more logic and devices are being integrated on a single die, such as in a system single-chip (SoC), each of these devices may be incorporated on processor 100. For example, in one embodiment, a memory controller hub is on the same package and/or die as processor 100. Here, a portion of the core (an on-core portion) 110 includes one or more controllers for interfacing with other devices such as memory 175 or a graphics device 180. The configuration including an interconnect and controllers for interfacing with such devices is often referred to as an on-core (or un-core) configuration. As an example, in one embodiment, on-chip interface 110 includes a ring interconnect for on-chip communication and a high-speed serial point-to-point link 105 for off-chip communication. Alternatively, on-chip communication may be facilitated by one or more switching organizations having a mesh-type configuration. Yet, in the SoC environment, even more devices, such as a network interface, co-processors, memory 175, graphics processor 180, and any other known computer devices/interfaces may be integrated on a single die or integrated circuit to provide a small form factor with high functionality and low power consumption.

In one embodiment, processor 100 is capable of executing compiler, optimization, and/or translator code 177 to compile, translate, and/or optimize application code 176 to support the devices and methods described herein or to interface therewith. A compiler often includes a program or set of programs to translate source text/code into target text/code. Usually, compilation of program/application code with a compiler is done in multiple phases and passes to transform high-level programming language code into low-level machine or assembly language code. Yet, single-pass compilers may still be utilized for simple compilation. A compiler may utilize any known compilation techniques and perform any known compiler operations, such as lexical analysis, preprocessing, parsing, semantic analysis, code generation, code transformation, and code optimization.

Larger compilers often include multiple phases, but these phases are most often included in two general phases: (1) front-end, which usually involves syntax processing, semantic processing, and some transformation/optimization, and 2) The back end, that is, where analysis, transformation, optimization, and code generation usually occur. Some compilers involve an intermediate part that illustrates the blurring of the outline between the front end and the back end of the compiler. As a result, references to compiler insertion, association, generation, or other operations may occur in any of the foregoing stages or passes, as well as in any other known phase or pass of the compiler. As an illustrative example, a compiler potentially inserts operations, calls, functions, etc. in one or more phases of compilation, such as inserting a call/operation in a pre-compilation phase, and then transforming the call/operation during the transformation phase Into lower-level code. It should be noted that during dynamic compilation, the compiler code or dynamic optimization code can be inserted into such operations/calls, and the code is optimized for execution during the execution phase. As a specific illustrative example, the binary code (the compiled code) can be dynamically optimized during the execution phase. Here, the code may include a dynamic optimization code, a binary code, or a combination thereof.

Similar to compilers, a translator, such as a binary translator, translates code either statically or dynamically to optimize and/or translate code. Therefore, reference to execution of code, application code, program code, or other software environment may refer to: (1) execution of a compiler program, optimization code optimizer, or translator, either dynamically or statically, to compile program code, to maintain software structures, to perform other operations, to optimize code, or to translate code; (2) execution of main program code including operations/calls, such as application code that has been optimized/compiled; (3) execution of other program code, such as libraries, associated with the main program code to maintain software structures, to perform other software-related operations, or to optimize code; or (4) a combination thereof.

Referring now to Figure 2, a block diagram of an embodiment of a system architecture 200 is shown. In one embodiment, system architecture 200 corresponds to a SoC architecture that includes a multi-core processor. More generally, the components illustrated in Figure 2 can be implemented as one or more integrated devices, such as semiconductor wafers.

System architecture 200 includes a plurality of processor cores 202 (depicted as 202-0...202-N) that are coupled to a coherency unit 204 in a system agent 206 via a coherent bus. System agent 206 supports various system functions, including interfacing the cache agents in cores 202 and other non-caching IO agents to memory controllers 208-0 and 208-1. (As used herein, the terms IO and I/O both refer to input/output and are used interchangeably.) As described in further detail below, in one embodiment system agent 206 is configured to implement a decentralized consistency and memory organization, including facilities provided by coherency unit 204 to support coherent operations.

In one embodiment, a cache agent is associated with each "logical" processor core associated with a processor core 202. For example, under Intel's® Hyperthreading™ architecture, each physical core is implemented as two logical cores. In general, a cache agent may be associated with one or more logical processors and with other entities that access coherent memory, such as a graphics engine or the like.

System agent 206 is also coupled to an IO root complex 210 via a primary interface 209, which in turn is coupled to or integrated in a primary IO switching organization 212. IO root complex 210 and primary IO switching organization 212 are implemented at the top level of a multi-level IO interconnect hierarchy that employs multiple switching organizations residing at different levels in the hierarchy, such as depicted by switching organizations 214, 216, and 218. A given branch of the IO interconnect hierarchy may employ the same type of switching organizations (which use a common architecture and protocol) or different types of switching organizations (which use different architectures and/or protocols). In the latter case, a bridge will typically be implemented to operate as an interface between switching organizations of the different architectures/protocols, such as depicted by a bridge 220. As used herein, a switching organization may generally include the interconnects, bus structures, multi-dimensional mesh organizations, or other known interconnect architectures, along with associated interface logic, configured to facilitate communication between components coupled to the switching organization. IO switching organizations/protocols may include, but are not limited to, Peripheral Component Interconnect Express (PCIe™), Open Core Protocol (OCP), Intel On-chip System Fabric (IOSF), and Advanced Microcontroller Bus Architecture (AMBA) interconnects.

Various IO devices are coupled to the various switching organizations in the IO hierarchy, with each IO device configured to implement or otherwise be associated with an IO agent 222, as depicted by IO agents 222-1 through 222-M (also labeled IO Agent 1-M). In the illustrated embodiment, each of these IO agents comprises a non-caching agent. Generally, the IO agents are configured to perform operations (such as communication operations) on behalf of their associated devices, enabling those devices to communicate with other components in the system. Accordingly, IO agents are often implemented in connection with communication interfaces and other types of interfaces, which may be depicted herein as separate components or may be integrated in other components, such as devices, bridges, and switching organizations.

Under system architecture 200, system agent 206 implements the central portion of the coherency and memory interconnect organization in the system. The system agent implements the cache coherency protocol for the various agents requesting memory access and other devices in the system, the producer-consumer ordering rules specified by the architecture's programming model, IO root complex functionality, and Quality of Service (QoS)-aware arbitration.

Under a typical architecture, the consistency and memory organization within the system agent typically operates only on cache coherency requests to distinct cache lines, in order to guarantee correct coherent behavior across all cache agents in the system. Therefore, processing of a request must be suspended until a previous request to the same cache line as the currently requested cache line has been completed. These suspension conditions are commonly referred to as address conflicts. Proper handling of address conflicts is a challenging problem for a number of reasons, including: a) ordering, in that conflicting requests have certain ordering requirements among themselves and also with respect to unrelated newer requests; b) uncompromised performance, in that conflicts from one agent should not degrade the performance and QoS of unrelated agents sharing the same interconnect links and consistency organization; c) distribution, in that the ordering of conflicts and related requests must be maintained across any distributed pipelines in the organization; and d) low cost, in that for optimal power-constrained performance, the system resources needed to handle address conflicts should be kept as low as possible.

FIG. 3 shows additional details of an embodiment of a system agent 206a that includes a decentralized consistency and memory organization. System agent 206a includes a shared memory organization 302 comprising a common memory access request arbiter (arbiter) 304 and two memory organization pipelines 306-0 and 306-1 to schedule requests to system memories 308-0 and 308-1 (accessed via memory controllers 208-0 and 208-1, respectively); and a shared consistency organization 310 comprising two consistency organization pipelines 312-0 and 312-1 for managing cache coherency, a non-coherent engine to handle miscellaneous functions 314 (such as interrupts), and common snoop/response arbitration logic 316. The cache agents 202 present their requests directly to the arbiter 304. The IO agents 222 (not shown) issue their requests via the IO root complex 210, which in turn issues requests to the arbiter 304 on behalf of the IO agents.

To implement differentiated QoS, in one embodiment both the IO organizations and the primary interface 209 implement multiple virtual channels that share a single physical interconnect interface. For ease of design, in one embodiment a typical system configuration may run the common hardware shared by the two memory organization pipelines and the two consistency organization pipelines at twice the frequency of the individual pipelines. In addition, the multiple-pipeline architecture is configured to perform pipelined operations to service memory access requests in parallel. Although two consistency and memory pipelines are depicted in FIG. 3, it will be understood that the general teachings and principles disclosed herein may be extended to implement similar parallel operations across more pipelines.

The arbiter 304 arbitrates and grants one request at a time, and routes the request to one of the two memory organization pipelines, as determined by a hash algorithm applied to information contained in the request (such as the cache line address). The hash algorithm ensures that requests to the same cache line are always routed to the same memory organization pipeline. Because the arbiter 304 is the common entry point for both memory organization pipelines 306-0 and 306-1, the arbiter in a typical system may run at up to twice the frequency of the memory organization pipelines.
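As an illustration only (the patent does not specify the hash function), the following C++ sketch shows one way a cache-line-address hash could steer requests; the line size and XOR-fold hash are assumptions:

```cpp
#include <cstdint>

constexpr unsigned kNumPipelines   = 2;   // two memory organization pipelines (306-0, 306-1)
constexpr unsigned kCacheLineShift = 6;   // assumes 64-byte cache lines (illustrative)

// Route a request to a pipeline based only on its cache line address.
// Because the hash depends solely on the line-aligned address, two requests
// to the same cache line always map to the same pipeline.
unsigned SelectPipeline(uint64_t physical_address) {
    uint64_t line = physical_address >> kCacheLineShift;   // drop the byte offset
    uint64_t h = line;
    h ^= h >> 16;                                           // simple XOR-fold hash
    h ^= h >> 8;
    return static_cast<unsigned>(h & (kNumPipelines - 1));  // pipeline 0 or 1
}
```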

FIG. 4 shows additional details of the memory organization 302, in accordance with one embodiment. In addition to the arbiter 304 and the memory organization pipelines 306-0 and 306-1, the components and logic illustrated in FIG. 4 include cache agent request queues 400-0 and 400-N, an I/O root complex request queue 401, per-category I/O conflict queues 402, conflict check logic blocks 404-0 and 404-1, cache agent conflict queues 406-0 and 406-1, scoreboards 407-0 and 407-1, a conflict ordering block (COB) 408, flip-flops 410, 411, 412, 413, 415, and 417, a rejected IO request multiplexer (mux) 414, a rejected IO request demultiplexer 416, a conflict queue arbiter 418, and hash logic 420.

Each of the memory organization pipelines 306-0 and 306-1 implements a conflict check to ensure that the system agent as a whole only operates on consistency requests to distinct cache lines. This is implemented via address-matching logic that detects a conflict against previously pending requests and, if any conflict exists, prevents further processing of the request. If there is no conflict, the memory organization pipeline notifies its corresponding consistency pipeline to perform cache consistency operations. The memory organization pipeline also logs the request into a scoreboard 407 in the memory organization pipeline for scheduling to system memory.
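As a rough sketch of this address-match step (the structure and field names below are hypothetical, not the patent's implementation), the conflict check can be thought of as a search of the pipeline's scoreboard of previously accepted, still-pending requests:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct PendingRequest {
    uint64_t line_address;    // cache line being accessed
    uint32_t tag;             // identifier broadcast when the request retires
    bool     conflict_block;  // used later for anti-starvation bookkeeping
};

// Scoreboard 407: pool of accepted requests not yet completed in this pipeline.
using Scoreboard = std::vector<PendingRequest>;

// Returns the tag of the pending request the new address conflicts with,
// or std::nullopt if no conflict exists and the request may proceed.
std::optional<uint32_t> CheckConflict(const Scoreboard& sb, uint64_t line_address) {
    for (const PendingRequest& p : sb) {
        if (p.line_address == line_address) return p.tag;  // address match: conflict
    }
    return std::nullopt;  // no match: notify the consistency pipeline, log in the scoreboard
}
```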

A request that is picked by the arbiter 304 but encounters a conflict condition (as described above) cannot be accepted. Instead, such requests are queued into one of a number of conflict queues. The conflict queues are separate for the cache agents (i.e., cache agent conflict queues 406-0 and 406-1) and the I/O agents (i.e., the per-category I/O conflict queues 402). Each memory organization pipeline implements a single conflict queue intended for requests from all cache agents. For ordering and QoS reasons, the conflict queues for the I/O agents are kept separate and are implemented as individual queues shared by both pipelines.

Conflict queues for cache agents

The cache agent conflict queues 406-0 and 406-1 are intended to be dedicated to requests from the cache agents. In general, the cache agent conflict queues may have configurable depths that can differ between implementations. In one embodiment, a cache agent conflict queue reserves one entry for each logical processor, plus a shared pool to cover the depth of the pipeline from the arbiter 304 to the conflict check block 404 associated with the conflict queue.

In one embodiment, each entry in a cache agent conflict queue has the following attributes: valid: whether the entry is valid; is_mmio: whether the request targets MMIO (memory-mapped IO); conflict_btag: the tag of the previous request with which this request conflicts; conflict_cleared: whether the conflict has been cleared for this entry, initially set to 0 (with the exceptions noted below) and set to 1 when the previous request that caused the conflict retires; and all original request attributes associated with the memory access request received at the arbiter.
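A minimal sketch of such an entry, expressed as a C++ struct, is shown below; the field names follow the attributes listed above, while the request-attribute payload is a placeholder of our own:

```cpp
#include <cstdint>

struct RequestAttributes {       // placeholder for "all original request attributes"
    uint64_t line_address;
    uint32_t source_agent;
    // ... command type, size, virtual channel for I/O requests, etc.
};

struct CacheAgentConflictEntry {
    bool     valid            = false;  // is the entry valid?
    bool     is_mmio          = false;  // does the request target MMIO?
    uint32_t conflict_btag    = 0;      // tag of the earlier request it conflicts with
    bool     conflict_cleared = false;  // 0 on allocation; set to 1 when that earlier
                                        // request retires (see the rules below)
    RequestAttributes attrs;            // original attributes captured at the arbiter
};
```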

In one embodiment, in order to maintain program ordering, the rules governing cache agent conflicts are as follows:

1. All requests to the same address are processed in the order in which they are granted by the arbiter 304. An address match check is performed for each incoming request against each entry in the cache agent conflict queue and the scoreboard of the memory organization pipeline to which the request is routed.

2. If the incoming request is from a cache agent: if there is an address match with a pending request, the request is queued into the cache agent conflict queue (for the applicable memory organization pipeline), the tag of the previous request is stored in the conflict_btag attribute, and conflict_cleared is set to 0. If there is a match with a previous entry in the conflict queue but no match with a pending request, the new request is also queued into the conflict queue, but conflict_cleared is set to 1. This means that when this entry reaches the head of the conflict queue, it is immediately eligible for re-arbitration.

I/O conflict queue

The system agent implements a number of I/O conflict queues 402 for requests from the I/O root complex (the per-category I/O conflict queues). The number of queues may vary by system implementation. These conflict queues ensure that deadlocks are avoided and provide QoS guarantees to the agents requiring such guarantees. The depth of each queue is expected to be what is required to cover the pipeline latency from the I/O root complex 210 to the conflict check logic. If a request is determined to have a conflict, it is placed into the appropriate I/O conflict category queue. In one embodiment, the mapping of each virtual channel (VC) to a conflict queue category may be specified by BIOS-programmed registers (or other system firmware); that is, each VC is assigned to one category. A given VC can belong to only one conflict category. The system agent determines the category based on the request's VC and the mapping provided by the configuration registers, and thereby determines which I/O conflict queue to queue the request into.

The conflict categories take advantage of the observation that conflicts across I/O agents are rare events and are rare for certain types of I/O agents. Thus, in some embodiments, the system compresses into the same conflict queue category those virtual channels that are not dependent on one another (to ensure deadlocks are avoided), that are expected to cause conflicts only in rare cases, and that require similar QoS guarantees. This scheme allows many virtual channels to be compressed into relatively few categories for conflict-checking purposes, reducing area overhead: the I/O conflict-check hardware only needs to deploy dedicated resources per category, rather than per virtual channel, where the number of categories is smaller than the number of virtual channels.
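As a sketch of the VC-to-category compression (the table size, values, and names below are illustrative assumptions, not the patent's register layout), the mapping can be viewed as a small firmware-programmed lookup table:

```cpp
#include <array>
#include <cstdint>

constexpr unsigned kNumVirtualChannels = 8;  // illustrative VC count
constexpr unsigned kNumConflictClasses = 3;  // fewer categories than VCs

// BIOS (or other system firmware) programs this table at boot; each VC is
// assigned to exactly one conflict category, so only kNumConflictClasses
// per-category I/O conflict queues need dedicated hardware resources.
std::array<uint8_t, kNumVirtualChannels> g_vc_to_class = {0, 0, 1, 1, 1, 2, 2, 2};

unsigned ConflictClassForVC(unsigned vc) {
    return g_vc_to_class.at(vc);  // selects which per-category queue to use
}
```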

In one embodiment, each entry of the I/O conflict queue contains attributes similar to the conflict queues for the cache agent presented above. In the case of an I/O request, the original request attribute will include the VC associated with the I/O agent from which the request originated. In one embodiment, the rules for the I/O conflict queue are as follows:

1. All requests for the same address from a given VC are processed in the order they are granted by the arbiter 304.

2. Once a write request encounters a conflict and is queued into one of the category queues in the I/O conflict queue 402, all write requests from the same VC are processed in order, regardless of the address.

In the absence of Rule #2 above, producer-consumer ordering involving I/O agents may be violated. Consider the following sequence.

1. The I/O device generates data in the cacheable memory. Therefore, the CPU must be snooped before the write can be observed globally. However, this request encounters an address conflict in one of the memory organization pipelines, which can occur if the device performs several writes to the same address. The conflict will cause the request to be placed in the conflict queue.

2. The I/O device updates a flag in cacheable memory. Assume that the flag update targets the other memory organization pipeline, has no conflict, and proceeds.

3. The CPU reads the flag. The CPU can get the updated value from step 2.

4. The CPU reads the data. Since the request from step 1 has not yet been globally observed, the CPU may read stale data from its cache. This is an ordering violation.

Thus, in one embodiment, even if consecutive I/O requests are targeted to different memory organization pipelines, request ordering from the same VC is maintained after the conflict. From the perspective of the I/O proxy request initiator, requests from I/O agents sent via the same virtual channel appear to be served in the order in which they are received in the decentralized memory organization. Referring to flowchart 500 of FIG. 5, in one embodiment, this situation is achieved as follows.

Processing of an I/O request (i.e., a memory access request originating from an IO agent) begins at block 502, in which an I/O request is received from the I/O root complex request queue 401 at the arbiter 304 and sent to the conflict check logic 404 of the appropriate memory organization pipeline 306-0 or 306-1 based on the result of the request hash algorithm. Two conditions apply for each incoming I/O root complex request, as depicted by decision block 504, which determines whether the VC has an existing conflict:

A. Condition A, the VC currently does not have a conflict: In block 506, the conflict check logic 404 performs an address match against both the pool of previously accepted pending requests in the scoreboard 407 and the cache agent conflict queue 406 of the memory organization pipeline 306 processing the request. As depicted by decision block 508 and block 510, if there is a conflict, the request is sent along with its other attributes to the conflict ordering block (COB) so that it can be queued into the appropriate category conflict queue in the correct age order. A status bit is also set to indicate that the VC has encountered a conflict. If there is no conflict, the result of the conflict address match check is sent to the COB along with the VC and an age indication, as shown in block 512. The COB provides the final indication of whether the request can be accepted by the pipeline, as depicted by decision block 514. If the request cannot be accepted, then in block 516 the COB queues the request into the appropriate per-category I/O conflict queue 402, and a status bit is set to indicate that the VC has encountered a conflict. If the request can be accepted, it is forwarded to the applicable system memory queue for further processing of the requested memory transaction, as depicted by block 518.

B. Condition B, the VC already has a conflict: As determined by decision block 504, if the VC already has a conflict, the conflict check logic bypasses the address match and, in block 510, the request is sent along with its other attributes to the COB to be queued into its category conflict queue in the correct age order.
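The two conditions of flowchart 500 can be summarized in the following C++ sketch; it is only an illustration under stated assumptions (the data structures, helper names, and the flattened "pending addresses" pool are ours, not the patent's):

```cpp
#include <cstdint>
#include <deque>
#include <vector>

struct IoRequest {
    uint64_t line_address;
    unsigned vc;         // virtual channel the request arrived on
    uint8_t  age_token;  // 2-bit age token granted by the arbiter (see below)
};

struct PerVcState { bool has_conflict = false; };  // status bit per VC

enum class Disposition { AcceptedToMemoryQueue, QueuedInConflictQueue };

// Illustrative handling of one granted I/O request (flowchart 500).
Disposition HandleIoRequest(const IoRequest& req,
                            const std::vector<uint64_t>& pending_addresses,  // scoreboard + cache agent conflict queue
                            std::vector<PerVcState>& vc_state,
                            std::deque<IoRequest>& per_category_queue) {
    // Condition B: the VC already has an outstanding conflict, so the address
    // match is bypassed and the request goes straight to its category queue,
    // preserving per-VC ordering.
    if (vc_state[req.vc].has_conflict) {
        per_category_queue.push_back(req);
        return Disposition::QueuedInConflictQueue;
    }
    // Condition A: check for an address match against previously accepted
    // pending requests and queued cache agent conflicts.
    for (uint64_t addr : pending_addresses) {
        if (addr == req.line_address) {
            per_category_queue.push_back(req);      // COB queues it in age order
            vc_state[req.vc].has_conflict = true;   // future requests on this VC queue too
            return Disposition::QueuedInConflictQueue;
        }
    }
    return Disposition::AcceptedToMemoryQueue;      // forwarded to the system memory queue
}
```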

During each cycle, the COB uses the age tokens of the requests sent to it by the two pipelines to determine which request is older and which is newer. In one embodiment, the age token is a 2-bit counter value passed by the arbiter 304 along with each request. The arbiter increments the counter each time it grants a request and passes the token value, along with the rest of the request attributes, to the applicable memory organization pipeline 306. Given enough grants, the counter will wrap around; the ordering decision logic in the COB is therefore configured to handle the wrap condition.
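One common way to compare such wrapping tokens, shown below as a hedged sketch (the patent does not give the comparison logic), is to interpret the modular difference as a signed value; this assumes the two compared requests were granted close together, which holds when one request per pipeline is compared per cycle:

```cpp
#include <cstdint>

// Age tokens are 2-bit counter values stamped by the arbiter on each grant,
// so they wrap around after four grants. Interpreting the mod-4 distance as a
// signed value recovers the relative age of two recently granted requests.
bool IsOlder(uint8_t token_a, uint8_t token_b) {
    int8_t diff = static_cast<int8_t>((token_b - token_a) & 0x3);  // mod-4 distance
    if (diff >= 2) diff -= 4;            // map {2,3} to {-2,-1}: the wrapped case
    return diff > 0;                     // positive distance: a was granted first
}
```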

For IO agent requests, the COB provides the final notification to a memory organization pipeline of whether a request can be accepted by that pipeline. The operations and logic for performing the COB operations and related operations are illustrated in flowchart 600 of FIG. 6. As depicted toward the top of flowchart 600, the operations and logic are performed in parallel for each of the memory organization pipelines. During a given cycle, a pair of memory requests will have advanced to the same location in each pipeline. In this instance, a pair of memory access requests originating from IO agents are processed in parallel, thus supporting simultaneous processing of multiple memory access requests.

As depicted in blocks 602-0 and 602-1, the conflict check logic in each of pipelines 0 and 1 performs an address match against the pool of previously accepted pending requests whose servicing has yet to be completed. As discussed above, the identities of such requests are maintained in the scoreboard 407 for each pipeline. The attributes of each request, including a conflict_status bit indicating the result of the address conflict check, are passed to block 604, where the conflict_status values are examined and the relative ages of the two requests are determined based on the age tokens, resulting in identification of the older and the newer request.

In decision block 606, a determination is made whether both requests have no conflict. If the answer is yes, the logic proceeds to block 608, in which both pipelines are signaled by the COB to indicate that their requests can be accepted. Accordingly, at each pipeline, the request associated with that pipeline is added to the pipeline's accepted request pool and the pipeline's scoreboard is updated.

Next, if the answer to decision block 606 is no, a determination is made in decision block 610 whether the older request has no conflict while the newer request has a conflict. If the answer is yes, the logic proceeds to block 612, in which the COB signals the pipeline associated with the older request to indicate that its request can be accepted. That request is then added to the accepted request pool of the pipeline and the pipeline's scoreboard is updated. In block 614, the conflicted newer request is queued into the per-category I/O conflict queue to which its VC is mapped. The COB also notifies the pipeline associated with the older request that future requests sent over the same VC as the newer request will be queued into that VC's per-category I/O conflict queue until the newer request is accepted during subsequent processing.

If the answer to decision block 610 is no, a determination is made in decision block 618 whether the older request has a conflict and both requests were sent over the same VC. If the answer is yes, the logic proceeds to block 620, in which the COB signals both pipelines to indicate that their requests cannot be accepted. The requests are then placed into the per-category I/O conflict queue assigned to that VC in age order (i.e., the older request followed by the newer request). The COB also notifies both pipelines that future requests sent over the same VC will be queued into that VC's per-category I/O conflict queue until the older request is accepted during subsequent processing.

If the answer to decision block 618 is no, a determination is made in decision block 622 whether the older request has a conflict and the requests were sent over different VCs. If the answer is yes, the logic proceeds to block 624, in which the COB signals the pipeline of the older request that its request cannot be accepted and signals the pipeline of the newer request that its request can be accepted. The COB also notifies the pipeline associated with the newer request that future requests sent over the same VC as the older request will be queued into that VC's per-category I/O conflict queue until the older request is accepted during subsequent processing, as depicted in block 626.

In the flowchart 600, it should be noted that although some operations are depicted in order, this is for the purpose of explanation only and is not intended to be limiting. Rather, various operations can be performed in parallel. For example, each of the decisions associated with decision blocks 606, 610, 618, and 622 can be performed in parallel. Similarly, the operations performed in blocks 612, 614, and 616 and the operations performed in blocks 624 and 626 can be performed in parallel.
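To make the ordering decisions of flowchart 600 concrete, the following C++ sketch (our illustration, not the patent's logic; the types and enum names are assumed) captures the four cases corresponding to decision blocks 606, 610, 618, and 622:

```cpp
#include <cstdint>

struct PipelineRequest {
    bool     has_conflict;  // conflict_status from the pipeline's address match
    unsigned vc;            // virtual channel (for I/O requests)
    uint8_t  age_token;     // used beforehand to identify the older request
};

enum class CobAction { AcceptBoth, AcceptOlderQueueNewer, QueueBothSameVc, QueueOlderAcceptNewer };

// Per-cycle ordering decision in the conflict ordering block. `older` and
// `newer` have already been identified from the age tokens.
CobAction Decide(const PipelineRequest& older, const PipelineRequest& newer) {
    if (!older.has_conflict && !newer.has_conflict)
        return CobAction::AcceptBoth;              // block 608: both admitted
    if (!older.has_conflict && newer.has_conflict)
        return CobAction::AcceptOlderQueueNewer;   // blocks 612-616
    if (older.has_conflict && older.vc == newer.vc)
        return CobAction::QueueBothSameVc;         // block 620: preserve per-VC order
    return CobAction::QueueOlderAcceptNewer;       // blocks 624-626: different VCs
}
```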

As described above, after a conflict the system agent queues every request from the same VC into the conflict queue in the original request order. In effect, this results in head-of-line (HOL) blocking for each VC after a conflict, but it yields a simpler micro-architecture. To mitigate the performance degradation due to HOL blocking, the system agent employs a request-combining solution for certain types of requests. For example, non-snooped requests may be combined under the following conditions: a. the requests are all from the same VC; b. the requests are all reads or all writes; and c. only a maximum of N requests to the same 32-bit chunk may be combined, with the next request tagged as having a conflict. (N is an implementation-dependent combining limit.)
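A minimal sketch of the combining eligibility test is shown below; the struct fields and function are our own illustration of conditions a, b, and c, assuming N is supplied by the implementation:

```cpp
#include <cstdint>

struct NonSnoopRequest {
    unsigned vc;             // originating virtual channel
    bool     is_write;       // groups must be all-read or all-write
    uint64_t chunk_address;  // address of the chunk being targeted
};

// Returns true if `next` may be combined with the group headed by `head`.
// The (N+1)th request to the same chunk is instead tagged as having a conflict.
bool CanCombine(const NonSnoopRequest& head, const NonSnoopRequest& next,
                unsigned combined_so_far, unsigned N) {
    return head.vc == next.vc &&                      // (a) same VC
           head.is_write == next.is_write &&          // (b) all reads or all writes
           head.chunk_address == next.chunk_address &&
           combined_so_far < N;                       // (c) at most N per chunk
}
```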

Re-arbitration from the conflict queues

Once a conflict condition has been cleared, requests from both the cache agent and I/O conflict queues can be returned to the arbiter for re-arbitration. When a request retires from the system agent, it broadcasts its tag to all entries in all conflict queues. If the tag matches an entry, the conflict is cleared for that conflict queue entry, making the request eligible for re-arbitration. In one embodiment, a simple round-robin arbiter selects among the head requests of all conflict queues. The selected request is re-arbitrated by the arbiter at the highest priority.
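The retire broadcast and the round-robin selection among queue heads can be sketched as follows (an illustration only; the container types and function names are assumptions):

```cpp
#include <cstdint>
#include <deque>

struct ConflictEntry {
    uint32_t conflict_btag;     // tag of the earlier request this one waits on
    bool     conflict_cleared;  // becomes true once that request retires
};

// When a request retires from the system agent, its tag is broadcast to every
// entry of every conflict queue; matching entries become eligible for
// re-arbitration once they reach the head of their queue.
void OnRetire(uint32_t retired_tag, std::deque<ConflictEntry>& queue) {
    for (ConflictEntry& e : queue) {
        if (e.conflict_btag == retired_tag) e.conflict_cleared = true;
    }
}

// Simple round-robin pick among conflict-queue heads whose conflict has
// cleared; the chosen request re-enters the arbiter at highest priority.
int PickForRearbitration(const std::deque<ConflictEntry>* queues, int num_queues, int& rr_index) {
    for (int i = 0; i < num_queues; ++i) {
        int q = (rr_index + i) % num_queues;
        if (!queues[q].empty() && queues[q].front().conflict_cleared) {
            rr_index = (q + 1) % num_queues;  // advance the round-robin pointer
            return q;                          // index of the queue to re-arbitrate from
        }
    }
    return -1;  // nothing eligible this cycle
}
```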

Anti-starvation: To avoid starvation, the conflict check logic continues to flag the conflict until it sees the original request re-arbitrated from the conflict queue. In one embodiment, this is done via additional bookkeeping logic within the scoreboard 407 in the memory organization pipeline, as follows (a sketch of this bookkeeping appears after the list below).

‧ When a conflicted request is queued back to the arbiter for re-arbitration, a bit called "prior_conflict" is asserted and sent along with the request to the memory organization pipeline for address match detection.

‧ When the request passes the subsequent conflict check and retires, it asserts a bit called "conflict_block" in the scoreboard.

‧ "new_conflict" with revoked confirmation but any new request with an address match with a scoreboard entry with a validated conflict_block will be flagged as a conflict and placed in the conflict queue. The "conflict_cleared" bit will be set to indicate that the request is suitable for re-arbitration when the request reaches the head of its conflict queue.

‧ Any new request with "prior_conflict" asserted and with an address match against a scoreboard entry with conflict_block asserted will not be flagged as a conflict. The new request will be assigned to the same scoreboard entry, and the "conflict_block" bit will be cleared.
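The sketch below ties the prior_conflict and conflict_block bits together in C++; it is an illustration of the bookkeeping described in the bullets above, with our own data layout rather than the patent's scoreboard implementation:

```cpp
#include <cstdint>
#include <vector>

struct ScoreboardEntry {
    uint64_t line_address;
    bool     conflict_block = false;  // asserted when a request retires after having
                                      // cleared a conflict (see the bullets above)
};

// Address-match outcome with anti-starvation bookkeeping. A re-arbitrated
// request carries prior_conflict = true and is allowed past an entry whose
// conflict_block is asserted (taking over that entry), while a new request
// without prior_conflict that hits such an entry is still flagged as a conflict.
bool FlagAsConflict(std::vector<ScoreboardEntry>& sb, uint64_t addr, bool prior_conflict) {
    for (ScoreboardEntry& e : sb) {
        if (e.line_address != addr) continue;
        if (prior_conflict && e.conflict_block) {
            e.conflict_block = false;   // the new request is assigned the same entry
            return false;               // not flagged as a conflict
        }
        return true;                    // address match without the exemption
    }
    return false;                       // no address match at all
}
```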

Conflict queue flow control

Cache agent conflict queues: Flow control for the cache agent conflict queues 406-0 and 406-1 is managed entirely between the arbiter 304 and the memory organization pipelines 306-0 and 306-1. Each memory organization pipeline advertises both reserved credits for each logical processor and credits for a shared pool. The arbiter 304 may grant a request from a cache agent only if the request has a credit (reserved or shared) for the conflict queue in the pipeline selected by the hash. The arbiter consumes a credit when granting a request. The memory organization pipeline returns the credit when the conflict check passes without a conflict or when the request is removed from its cache agent conflict queue 406.

I/O agent conflict queues: Flow control for the IO conflict queues 402 is managed by the I/O root complex 210. The I/O root complex maintains a credit counter for each conflict queue category. The credits for each IO conflict queue 402 are initialized and subsequently exchanged with the memory organization pipelines 306-0 and 306-1. The I/O root complex consumes a credit before launching a request for a given conflict queue category. The memory organization pipeline returns a credit when it detects that a request has no conflict or when a request is removed from the category conflict queue.
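For illustration, the following C++ sketch models the credit exchange described above for the cache agent conflict queues, with reserved per-logical-processor credits and a shared pool; the same consume/return pattern applies to the per-category credit counters maintained by the I/O root complex for the IO conflict queues 402. The counter layout is an assumption of this sketch, not the actual hardware.

#include <vector>

struct ConflictQueueCredits {
    std::vector<int> reserved; // one reserved counter per logical processor
    int shared = 0;            // shared pool advertised by the memory organization pipeline

    // Arbiter side: a request may be granted only if a reserved or shared credit exists.
    bool CanGrant(int lp) const { return reserved[lp] > 0 || shared > 0; }

    // Arbiter side: consume a credit when granting a request from logical processor lp.
    void Consume(int lp) {
        if (reserved[lp] > 0) --reserved[lp]; else --shared;
    }

    // Pipeline side: return a credit when the conflict check passes with no conflict
    // or when the request leaves its cache agent conflict queue.
    void Return(int lp, bool wasReserved) {
        if (wasReserved) ++reserved[lp]; else ++shared;
    }
};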

Aspects of the embodiments described and illustrated herein may be implemented in various system architectures as described above. By way of example and not limitation, FIG. 7 shows a system 700 of an exemplary embodiment. System 700 includes a SoC 702 that is mounted on or otherwise coupled to a motherboard 704 in a chassis 706. The SoC includes a multi-core processor that uses architectural aspects and logic similar to those illustrated in Figures 2 through 6 and described above, wherein like components share common reference numerals. These components include the cores 202-0 to 202-N, the system agent 206a including the decentralized consistency and memory organization unit 204, the primary interface 209, the IO root complex 210, and the primary IO switching organization 212.

In one embodiment, the decentralized consistency and memory organization includes the pipeline architecture illustrated in Figures 3 and 4, as described above. In addition, the system agent can be configured to facilitate various other operations, such as interfacing with other components. In addition to the memory controllers 208-0 and 208-1, these components include an interface (not shown) to a graphics processing unit (GPU) 708 and a flash controller 710 coupled to a mass storage device 712 comprising flash memory. GPU 708 is configured to interface with a display 714, such as an LCD-type display. In some embodiments, display 714 comprises a touch screen display, and the SoC includes additional circuitry for facilitating touch screen operation (see further details below with respect to FIG. 9).

Memory controllers 208-0 and 208-1 are coupled to memories 716-0 and 716-1, respectively, which collectively comprise the system memory of system 700. In general, memory controllers 208-0 and 208-1 can be integrated on SoC 702 (as shown), or can be implemented as separate components off-chip (i.e., separate from SoC 702) or integrated into memories 716-0 and 716-1. Similarly, GPU 708 can be integrated on SoC 702 or implemented as an off-chip component.

As described above, the primary IO switching organization 212 is located at the top of an IO interconnect hierarchy that includes two or more switching organizations. For convenience and clarity, a portion of the interconnect hierarchy depicted below the right-hand side of the primary IO switching organization 212 is labeled 718 and comprises an IO interconnect sub-hierarchy in which various IO devices and IO agents are coupled to one or more switching organizations. These include IO agents (IOAs) implemented as interfaces, labeled IF in the blocks.

Depicted below the left-hand side of the primary IO switching organization 212 is a PCIe root complex 720 that includes a pair of PCIe root ports 722 and 724. PCIe root port 722 facilitates communication (using the PCIe protocol) with an on-chip IEEE 802.11 (also referred to as "WiFi") interface 726 coupled to a WiFi radio chip 728 mounted on the motherboard 704. Similarly, PCIe root port 724 facilitates communication with a universal serial bus (USB) 2 or USB 3 interface 730 that is coupled to a USB2/USB3 interface chip 734 on motherboard 704. The WiFi radio chip 728 is coupled to an antenna 736, and the USB2/USB3 interface chip 734 is coupled to a USB2/USB3 port 738.

As depicted near the bottom of FIG. 7, system 700 is illustrative of a system architecture that can be implemented in devices including, but not limited to, a mobile phone 740, a tablet 742, and a portable computer (e.g., a notebook, laptop, or Ultrabook™) 744. In addition to the components illustrated in system 700, those skilled in the art will recognize that other components will typically be included in a particular device, such as a mobile radio subsystem for a mobile phone, a keyboard for a portable computer, and the like. In addition, each device will employ a power subsystem and power management logic (e.g., implemented on SoC 702), and typically can support other types of communication ports, such as Thunderbolt™, external display ports (e.g., HDMI, DVI, Mini DisplayPort, or DisplayPort), Ethernet ports, etc.

During initialization operations in response to a power-on event or reset, firmware such as that depicted by BIOS 746 is loaded into a protected portion of system memory and used to initialize and configure various system components, including the system agent and the switching organizations, bridges, and interfaces of the IO interconnect hierarchy. As used herein, various endpoint components or devices are operatively coupled to other system components through the use of switching organizations, bridges, interfaces, and IO agents, using the corresponding protocols for the particular interconnect architecture. These interconnect structures and protocols facilitate virtual connections between components during operation of SoC 702 and system 700.

Turning now to Figure 8, an embodiment of a system-on-chip (SOC) design in accordance with the present invention is depicted. As a specific illustrative example, SOC 800 is included in a user equipment (UE). In one embodiment, a UE refers to any device used by an end user to communicate, such as a hand-held phone, a smart phone, a tablet, an ultra-thin notebook, a notebook with a broadband adapter, or any other similar communication device. A UE often connects to a base station or node, which potentially corresponds in essence to a mobile station (MS) in a GSM network.

Here, SOC 800 includes two cores, 806 and 807. Similar to the discussion above, cores 806 and 807 may conform to an instruction set architecture, such as an Intel® Architecture Core™-based processor, an Advanced Micro Devices, Inc. (AMD) processor, a MIPS-based processor, an ARM-based processor design, or a design of their licensees or adopters. Cores 806 and 807 are coupled to cache memory control 808, which is associated with bus interface unit 809 and L2 cache 810, to communicate with other portions of system 800. Interconnect 810 includes an on-chip interconnect (such as an IOSF, AMBA, or other interconnect discussed above) that potentially implements one or more aspects of the described invention.

The interconnect 810 provides communication channels to other components, such as a Subscriber Identity Module (SIM) 830 for interfacing with a SIM card, a boot ROM 835 for holding boot code for execution by cores 806 and 807 to initialize and boot the SOC 800, an SDRAM controller 840 for interfacing with external memory (for example, DRAM 860), a flash controller 845 for interfacing with non-volatile memory (for example, flash 865), a peripheral controller Q1650 (e.g., a serial peripheral interface) for connecting to peripheral devices, video codec 820 and video interface 825 for displaying and receiving input (e.g., touch input), GPU 815 for performing graphics-related computations, and so forth. Any of these interfaces may incorporate aspects of the invention described herein.

In addition, the system illustrates peripheral devices for communication, such as a Bluetooth module 870, a 3G modem 875, a GPS 885, and WiFi 885. Note that, as stated above, a UE includes a radio for communication; as a result, not all of these peripheral communication modules are required. However, in a UE, some form of radio for external communication is to be included.

It should be noted that the apparatus, methods, and systems described above can be implemented in any electronic device or system as described above. As specific illustrations, the following figures provide exemplary systems for utilizing the invention as described herein. As the systems below are described in more detail, several different interconnects from the above discussion are revisited and described. And as is apparent, the advances described above can be applied to any of those interconnects, organizations, or architectures.

Referring now to Figure 9, a block diagram of components present in a computer system in accordance with an embodiment of the present invention is illustrated. As shown in Figure 9, system 900 includes any combination of components. These components can be implemented as ICs, portions thereof, discrete electronic devices or other modules, logic, hardware, software, firmware, or a combination thereof adapted in a computer system, or as components otherwise incorporated within the chassis of a computer system. It should also be noted that the block diagram of Figure 9 is intended to show a high-level view of many components of a computer system. However, it should be understood that some of the illustrated components may be omitted, additional components may be present, and different arrangements of the components shown may occur in other implementations. As a result, the invention described above can be implemented in any portion of one or more of the interconnects illustrated or described below.

As seen in FIG. 9, in one embodiment, processor 910 includes a microprocessor, a multi-core processor, a multi-threaded processor, an ultra low voltage processor, an embedded processor, or other known processing element. In the illustrated implementation, processor 910 acts as the primary processing unit and central hub for communication with many of the various components of system 900. As one example, processor 910 is implemented as a system on chip (SoC). As a specific illustrative example, processor 910 includes an Intel® Architecture Core™-based processor such as an i3, i5, i7, or another such processor available from Intel Corporation, Santa Clara, CA. However, it should be understood that other low-power processors, such as those available from Advanced Micro Devices, Inc. (AMD) of Sunnyvale, CA, MIPS-based designs from MIPS Technologies of Sunnyvale, CA, or ARM-based designs licensed from ARM Holdings, Ltd. or its customers, or their licensees or adopters, may instead be present in other embodiments, such as an Apple A5/A6 processor, a Qualcomm Snapdragon processor, or a TI OMAP processor. It should be noted that many of the customer versions of such processors are modified and varied; however, they may support or recognize a specific instruction set that performs defined algorithms as set forth by the processor licensor. Here, the microarchitectural implementation may vary, but the architectural function of the processor is generally consistent. Certain details regarding the architecture and operation of processor 910 in one implementation are discussed further below to provide an illustrative example.

In one embodiment, processor 910 communicates with a system memory 915. As an illustrative example, the system memory can be implemented via multiple memory devices to provide a given amount of system memory. As examples, the memory can be in accordance with a JEDEC low power double data rate (LPDDR)-based design, such as the current LPDDR2 standard according to JEDEC JESD 209-2E (published April 2009), or a next-generation LPDDR standard to be referred to as LPDDR3 or LPDDR4, which will offer extensions to LPDDR2 to increase bandwidth. In various implementations, the individual memory devices can have different package types, such as single die package (SDP), dual die package (DDP), or quad die package (Q17P). In some embodiments, these devices are soldered directly onto the motherboard to provide a lower-profile solution, while in other embodiments the devices are configured as one or more memory modules that in turn couple to the motherboard by a given connector. Of course, other memory implementations are possible, such as other types of memory modules, e.g., dual inline memory modules (DIMMs) of different varieties, including but not limited to microDIMMs and MiniDIMMs. In a particular illustrative embodiment, the memory is sized between 2 GB and 16 GB, and can be configured as a DDR3LM package or an LPDDR2 or LPDDR3 memory that is soldered onto the motherboard via a ball grid array (BGA).

To provide persistent storage of information such as data, applications, one or more operating systems, and so forth, a mass storage device 920 can also be coupled to processor 910. In various embodiments, to enable a thinner and lighter system design as well as to improve system responsiveness, this mass storage device can be implemented via an SSD. However, in other embodiments, the mass storage device can be implemented primarily using a hard disk drive (HDD) with a smaller amount of SSD storage acting as an SSD cache to enable non-volatile storage of context state and other such information during power-down events, so that a fast power-up can occur when system activity is re-initiated. As also shown in FIG. 9, a flash device 922 can be coupled to processor 910, e.g., via a serial peripheral interface (SPI). This flash device provides non-volatile storage of system software, including basic input/output system (BIOS) firmware, and other firmware of the system.

In various embodiments, the mass storage of the system is implemented by an SSD alone, or as a disk, optical, or other drive with an SSD cache. In some embodiments, the mass storage is implemented as an SSD or as an HDD along with a restore (RST) cache module. In various implementations, the HDD provides storage of between 320 GB and 4 terabytes (TB) and upward, while the RST cache is implemented with an SSD having a capacity of 24 GB to 256 GB. It should be noted that such an SSD cache may be configured as a single level cache (SLC) or multi-level cache (MLC) option to provide an appropriate level of responsiveness. In an SSD-only option, the module may be accommodated in various locations, such as in an mSATA or NGFF slot. As an example, an SSD has a capacity ranging from 120 GB to 1 TB.

Various input/output (IO) devices may be present within system 900. Specifically shown in the embodiment of Figure 9 is a display 924, which may be a high definition LCD or LED panel configured within a lid portion of the chassis. This display panel may also provide a touch screen 925, e.g., adapted externally over the display panel, such that via a user's interaction with this touch screen, user inputs can be provided to the system to enable desired operations, e.g., with regard to the display of information, accessing of information, and so forth. In one embodiment, display 924 may be coupled to processor 910 via a display interconnect that can be implemented as a high-performance graphics interconnect. Touch screen 925 may be coupled to processor 910 via another interconnect, which in an embodiment can be an I2C interconnect. As further shown in FIG. 9, in addition to touch screen 925, user input by way of touch can also occur via a touch pad 930, which may be configured within the chassis and may also be coupled to the same I2C interconnect as touch screen 925.

The display panel can operate in multiple modes. In a first mode, the display panel can be arranged in a transparent state in which the display panel is transparent to visible light. In various embodiments, the majority of the display panel may be a display, except for a bezel around the periphery. When the system is operated in a notebook mode and the display panel is operated in a transparent state, a user can view information that is presented on the display panel while also being able to view objects behind the display. In addition, information displayed on the display panel can be viewed by a user positioned behind the display. Alternatively, the operating state of the display panel can be an opaque state in which visible light does not transmit through the display panel.

In a tablet mode, the system is folded shut such that the back display surface of the display panel comes to rest in a position such that it faces outwardly towards a user, when the bottom surface of the base panel is rested on a surface or held by the user. In the tablet mode of operation, the back display surface performs the role of a display and user interface, as this surface can have touch screen functionality and can perform other known functions of conventional touch screen devices, such as tablet devices. To this end, the display panel may include a transparency-adjusting layer that is disposed between a touch screen layer and a front display surface. In some embodiments, the transparency-adjusting layer can be an electrochromic (EC) layer, an LCD layer, or a combination of EC and LCD layers.

In various embodiments, the display can be of different sizes, e.g., an 11.6" or a 13.3" screen, and can have a 16:9 aspect ratio and a brightness of at least 300 nits. The display can also be of full high definition (HD) resolution (at least 1920 x 1080p), be compatible with an embedded DisplayPort (eDP), and be a low-power panel with panel self refresh.

As to touch screen capabilities, the system can provide a display multi-touch panel that is multi-touch capacitive and capable of using at least 5 fingers; and in some embodiments, 10 fingers can be used on the display. In one embodiment, the touch screen is accommodated within a damage- and scratch-resistant glass and coating (e.g., Gorilla Glass™ or Gorilla Glass 2™) for low friction to reduce "finger burn" and avoid "finger skipping." To provide an enhanced touch experience and responsiveness, in some implementations the touch panel has multi-touch functionality, such as less than 2 frames (30 Hz) per static view during pinch zoom, and single-touch functionality of less than 1 cm per frame (30 Hz) with 200 ms lag (finger to pointer). In some implementations, the display supports edge-to-edge glass with a minimal screen bezel that is also flush with the panel surface, and limited IO interference when using multi-touch.

Various sensors may be present within the system for perceptual computing and other purposes, and may couple to processor 910 in different manners. Certain inertial and environmental sensors may couple to processor 910 through a sensor hub 940, e.g., via an I2C interconnect. In the embodiment shown in FIG. 9, these sensors may include an accelerometer 941, an ambient light sensor (ALS) 942, a compass 943, and a gyroscope 944. Other environmental sensors may include one or more thermal sensors 946, which in some embodiments couple to processor 910 via a system management bus (SMBus).

Many different conditions of use can be achieved through the use of various inertial and environmental sensors present in the platform. These usage conditions allow for the implementation of advanced computational operations including perceptual calculations, and also allow for enhancements in power management/battery life, safety, and system responsiveness.

For example, regarding power management/battery life issues, ambient light conditions in the location of the platform are determined based at least in part on information from the ambient light sensor, and the intensity of the display is controlled accordingly. Therefore, the power consumed in operating the display is reduced under certain light conditions.

As to security operations, based on context information obtained from the sensors, such as location information, it may be determined whether a user is allowed to access certain secure documents. For example, a user may be permitted to access such documents at a work place or a home location. However, the user is prevented from accessing such documents when the platform is present at a public location. This determination, in one embodiment, is based on location information, e.g., determined via a GPS sensor or camera recognition of landmarks. Other security operations may include providing for pairing of devices within close range of each other, e.g., a portable platform as described herein and a user's desktop computer, mobile telephone, or so forth. Certain sharing, in some implementations, is realized via near field communication when these devices are so paired. However, when the devices exceed a certain range, such sharing may be disabled. Furthermore, when pairing a platform as described herein and a smartphone, an alarm may be configured to be triggered when the devices move more than a predetermined distance from each other when in a public location. In contrast, when these paired devices are in a safe location, e.g., a work place or home location, the devices may exceed this predetermined limit without triggering such an alarm.

Responsiveness may also be enhanced using the sensor information. For example, even when the platform is in a low power state, the sensors may still be enabled to run at a relatively low frequency. Accordingly, any changes in the location of the platform, e.g., as determined by inertial sensors, GPS sensor, or so forth, are determined. If no such changes have been registered, a faster connection to a previous wireless hub (such as a Wi-Fi™ access point or similar wireless enabler) occurs, as there is no need to scan for available wireless network resources in this case. Thus, a greater level of responsiveness when waking from a low power state is achieved.

It should be understood that many other conditions of use may be implemented using sensor information obtained via an integrated sensor within a platform as described herein, and the above examples are for illustrative purposes only. By using a system as described herein, the perceptual computing system can allow for the addition of alternate input modalities (including gesture recognition) and enables the system to sense user operations and intent.

In some embodiments, one or more infrared or other heat-sensing elements, or any other element for sensing the presence or movement of a user, may be present. Such sensing elements may include multiple different elements working together, working in sequence, or both. For example, sensing elements include elements that provide initial sensing, such as light or sound projection, followed by sensing for gesture detection by, for example, an ultrasonic time-of-flight camera or a patterned light camera.

Also in some embodiments, the system includes a light generator to produce an illuminated line. In some embodiments, this line provides a visual cue regarding a virtual boundary, namely an imaginary or virtual location in space, where action of the user to pass or break through the virtual boundary or plane is interpreted as an intent to engage with the computing system. In some embodiments, the illuminated line may change colors as the computing system transitions into different states with regard to the user. The illuminated line may be used to provide a visual cue for the user of a virtual boundary in space, and may be used by the system to determine transitions in state of the computer with regard to the user, including determining when the user wishes to engage with the computer.

In some embodiments, the computer senses user position and operates to interpret the movement of a hand of the user through the virtual boundary as a gesture indicating an intention of the user to engage with the computer. In some embodiments, upon the user passing through the virtual line or plane, the light generated by the light generator may change, thereby providing visual feedback to the user that the user has entered an area for providing gestures to provide input to the computer.

The display screen provides a visual indication of the transition of the user's computing system status. In some embodiments, a first screen is provided in the first state, wherein the presence of the user is sensed by the system, such as via the use of one or more of the sensing elements.

In some implementations, the system acts to sense user identity, such as by facial recognition. Here, transition to a second screen can be provided in a second state, in which the computing system has recognized the user identity, where this second screen provides visual feedback to the user that the user has transitioned into a new state. Transition to a third screen can occur in a third state in which the user has confirmed recognition of the user.

In some embodiments, the computing system can use a transition mechanism to determine the location of the virtual boundary for the user, where the location of the virtual boundary can vary with the user and context. The computing system can generate light, such as an illumination line, to indicate a virtual boundary for use of the system. In some embodiments, the computing system can be in a wait state and can produce light in a first color. The computing system can detect if the user has reached a virtual boundary, such as by sensing the presence and movement of the user by using the sensing element.

In some embodiments, if the user has been detected as having crossed the virtual boundary (such as the hand of the user being closer to the computing system than the virtual boundary line), the computing system can transition to a state for receiving gesture inputs from the user, where a mechanism to indicate the transition may include the light indicating the virtual boundary changing to a second color.

In some embodiments, the computing system may then determine whether gesture movement is detected. If gesture movement is detected, the computing system may proceed with a gesture recognition process, which may include the use of data from a gesture data library, which may reside in memory in the computing device or may be otherwise accessible by the computing device.

If a gesture of the user is recognized, the computing system may perform a function in response to the input, and return to receive additional gestures if the user is within the virtual boundary. In some embodiments, if the gesture is not recognized, the computing system may transition into an error state, where a mechanism to indicate the error state may include the light indicating the virtual boundary changing to a third color, with the system returning to receive additional gestures if the user is within the virtual boundary for engaging with the computing system.
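For illustration only, the virtual-boundary interaction described above can be modeled as a small state machine, as in the following C++ sketch; the state and color names are assumptions introduced for the sketch and do not appear in the specification.

enum class BoundaryState { Waiting, ReceivingGestures, Error };
enum class LineColor { First, Second, Third };

struct BoundaryMachine {
    BoundaryState state = BoundaryState::Waiting;
    LineColor     color = LineColor::First;      // wait state: light in a first color

    void UserCrossedBoundary() {                 // hand closer to the system than the virtual line
        state = BoundaryState::ReceivingGestures;
        color = LineColor::Second;               // indicate the transition with a second color
    }
    void GestureRecognized() { /* perform the function; keep receiving gestures */ }
    void GestureNotRecognized() {
        state = BoundaryState::Error;
        color = LineColor::Third;                // error state indicated by a third color
    }
    void UserStillInsideBoundary() {             // return to receiving additional gestures
        state = BoundaryState::ReceivingGestures;
        color = LineColor::Second;
    }
};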

As mentioned above, in other embodiments the system can be configured as a convertible tablet system that can be used in at least two different modes, a tablet mode and a notebook mode. The convertible system may have two panels, namely a display panel and a base panel, such that in the tablet mode the two panels are disposed in a stack on top of one another. In the tablet mode, the display panel faces outwardly and may provide touch screen functionality as found in conventional tablets. In the notebook mode, the two panels may be arranged in an open clamshell configuration.

In various embodiments, the accelerometer may be a 3-axis accelerometer having data rates of at least 50 Hz. A gyroscope may also be included, which can be a 3-axis gyroscope. In addition, an e-compass/magnetometer may be present. Also, one or more proximity sensors may be provided (e.g., for lid open to sense when a person is in proximity (or not) to the system and adjust power/performance to extend battery life). For some OSs, sensor fusion capability including the accelerometer, gyroscope, and compass may provide enhanced features. In addition, via a sensor hub having a real-time clock (RTC), a wake-from-sensors mechanism may be realized to receive sensor input when the remainder of the system is in a low power state.

In some embodiments, an internal lid/display open switch or sensor indicates when the lid is closed/open, and can be used to place the system into Connected Standby or to automatically wake from the Connected Standby state. Other system sensors can include ACPI sensors for internal processor, memory, and skin temperature monitoring to enable changes to processor and system operating states based on sensed parameters.

In an embodiment, the OS may be a Microsoft® Windows® 8 OS that implements Connected Standby (also referred to herein as Win8 CS). Windows 8 Connected Standby, or another OS having a similar state, can provide, via a platform as described herein, very low ultra-idle power to enable applications to remain connected, e.g., to a cloud-based location, at very low power consumption. The platform can support three power states, namely screen on (normal), Connected Standby (as a default "off" state), and shutdown (zero watts of power consumption). Thus in the Connected Standby state, the platform is logically on (at minimal power levels) even though the screen is off. In such a platform, power management can be made to be transparent to applications and maintain constant connectivity, in part due to offload technology that enables the lowest-powered component to perform an operation.

As also seen in FIG. 9, various peripheral devices may couple to processor 910 via a low pin count (LPC) interconnect. In the embodiment shown, various components can be coupled through an embedded controller 935. Such components can include a keyboard 936 (e.g., coupled via a PS2 interface), a fan 937, and a thermal sensor 939. In some embodiments, touch pad 930 may also couple to EC 935 via a PS2 interface. In addition, a security processor such as TPM 938, in accordance with the Trusted Computing Group (TCG) Trusted Platform Module (TPM) Specification Version 1.2 (dated October 2, 2003), may also couple to processor 910 via this LPC interconnect. However, it should be understood that the scope of the present invention is not limited in this regard, and secure processing and storage of secure information may be in another protected location, such as a static random access memory (SRAM) in a security coprocessor, or as encrypted data blobs (binary large objects) that are only decrypted when protected by a secure enclave (SE) processor mode.

In a particular implementation, the peripheral ports may include a high definition media interface (HDMI) connector (which can be of different form factors such as full size, mini, or micro); one or more USB ports, such as full-size external ports in accordance with the Universal Serial Bus Revision 3.0 Specification (November 2008), with at least one powered for charging of USB devices (such as smartphones) when the system is in Connected Standby state and is plugged into AC wall power. In addition, one or more Thunderbolt™ ports can be provided. Other ports may include an externally accessible card reader, such as a full-size SD-XC card reader and/or a SIM card reader for WWAN (e.g., an 8-pin card reader). For audio, a 3.5 mm jack with stereo sound and microphone capability (e.g., combination functionality) can be present, with support for jack detection (e.g., headphone-only support using a microphone in the lid, or headphones with a microphone in the cable). In some embodiments, this jack can be re-taskable between stereo headphone and stereo microphone input. Also, a power jack can be provided for coupling to an AC adapter.

System 900 can communicate with external devices in a variety of manners, including wirelessly. In the embodiment shown in FIG. 9, various wireless modules are present, each of which can correspond to a radio configured for a particular wireless communication protocol. One manner for wireless communication in a short range such as a near field may be via a near field communication (NFC) unit 945, which, in one embodiment, may communicate with processor 910 via an SMBus. It should be noted that via this NFC unit 945, devices in close proximity to each other can communicate. For example, a user can enable system 900 to communicate with another, e.g., portable device such as the user's smartphone, by adapting the two devices into close relation and enabling the transfer of information such as identification information, payment information, data such as image data, or so forth. Wireless power transfer can also be performed using an NFC system.

Using the NFC unit described herein, users can bump devices side-to-side and place devices side-by-side for near-field coupling functions (such as near field communication and wireless power transfer (WPT)) by leveraging the coupling between coils of one or more of such devices. More specifically, embodiments provide devices with strategically shaped, and placed, ferrite materials to provide for better coupling of the coils. Each coil has an inductance associated with it, which can be chosen in conjunction with the resistive, capacitive, and other features of the system to enable a common resonant frequency for the system.

As further seen in FIG. 9, additional wireless units can include other short-range wireless engines, including a WLAN unit 950 and a Bluetooth unit 952. Using WLAN unit 950, Wi-Fi™ communications in accordance with a given Institute of Electrical and Electronics Engineers (IEEE) 802.11 standard can be realized, while via Bluetooth unit 952, short-range communications via a Bluetooth protocol can occur. These units may communicate with processor 910 via, e.g., a USB link or a universal asynchronous receiver transmitter (UART) link. Or, these units may couple to processor 910 via an interconnect according to a Peripheral Component Interconnect Express™ (PCIe™) protocol, e.g., in accordance with the PCI Express™ Specification Base Specification version 3.0 (published January 17, 2007), or another such protocol, such as a serial data input/output (SDIO) standard. Of course, the actual physical connection between these peripheral devices, which may be configured on one or more add-in cards, can be by way of the NGFF connectors adapted to a motherboard.

In addition, wireless wide area communications, e.g., according to a cellular or other wireless wide area protocol, can occur via a WWAN unit 956, which in turn may couple to a subscriber identity module (SIM) 957. In addition, to enable receipt and use of location information, a GPS module 955 may also be present. Note that in the embodiment shown in FIG. 9, WWAN unit 956 and an integrated capture device such as a camera module 954 may communicate via a given USB protocol, e.g., a USB 2.0 or 3.0 link, or a UART or I2C protocol. Again, the actual physical connection of these units can be via adaptation of an NGFF add-in card to an NGFF connector configured on the motherboard.

In a particular embodiment, wireless functionality can be provided modularly, e.g., with a WiFi™ 802.11ac solution (e.g., an add-in card that is backward compatible with IEEE 802.11abgn) with support for Windows 8 CS. This card can be configured in an internal slot (e.g., via an NGFF adapter). An additional module may provide for Bluetooth capability (e.g., Bluetooth 4.0 with backwards compatibility) as well as Intel® wireless display functionality. In addition, NFC support may be provided via a separate device or multi-function device, and can be positioned, as an example, in a front right portion of the chassis for easy access. A still additional module may be a WWAN device that can provide support for 3G/4G/LTE and GPS. This module can be implemented in an internal (e.g., NGFF) slot. Integrated antenna support can be provided for WiFi™, Bluetooth, WWAN, NFC, and GPS, enabling seamless transition from WiFi™ to WWAN radios, wireless gigabit (WiGig) in accordance with the Wireless Gigabit Specification (July 2010), and vice versa.

As described above, the integrated camera can be incorporated into the cover. As an example, the camera can be a high resolution camera, for example having a resolution of at least 2.0 megapixels (MP) and extending to 6.0 MP and higher.

To provide for audio inputs and outputs, an audio processor can be implemented via a digital signal processor (DSP) 960, which may couple to processor 910 via a high definition audio (HDA) link. Similarly, DSP 960 may communicate with an integrated coder/decoder (CODEC) and amplifier 962 that in turn may couple to output speakers 963, which may be implemented within the chassis. Similarly, amplifier and CODEC 962 can be coupled to receive audio inputs from a microphone 965, which in an embodiment can be implemented via dual array microphones (such as a digital microphone array) to provide for high quality audio inputs to enable voice-activated control of various operations within the system. Note also that audio outputs can be provided from amplifier/CODEC 962 to a headphone jack 964. Although shown with these particular components in the embodiment of FIG. 9, it should be understood that the scope of the present invention is not limited in this regard.

In a particular embodiment, the digital audio codec and amplifier are capable of driving a stereo headset jack, a stereo microphone jack, an internal microphone array, and a stereo speaker. In various implementations, the codec can be integrated into the audio DSP or coupled to a Peripheral Controller Hub (PCH) via the HD audio path. In some implementations, in addition to the integrated stereo speakers, one or more subwoofers may be provided, and the speaker solution may support DTS audio.

In some embodiments, processor 910 can be powered by an external voltage regulator (VR) and by multiple internal voltage regulators that are integrated within the processor die, referred to as fully integrated voltage regulators (FIVRs). The use of multiple FIVRs in the processor enables the grouping of components into separate power planes, such that power is regulated and supplied by the FIVR only to those components in its group. During power management, a given power plane of one FIVR may be powered down or off when the processor is placed into a certain low power state, while another power plane of another FIVR remains active, or fully powered.

In one embodiment, a continuous power plane can be used during some deep sleep states to power on the I/O pins for several I/O signals, such as the interface between the processor and a PCH, the interface with the external VR, and the interface with EC 935. This continuous power plane also powers an on-die voltage regulator that supports the on-board SRAM or other cache memory in which the processor context is stored during the sleep state. The continuous power plane is also used to power on the processor's wakeup logic that monitors and processes the various wakeup source signals.

During power management, while the other power planes are powered down or off when the processor enters certain deep sleep states, the continuous power plane remains powered on to support the above-referenced components. However, this can lead to unnecessary power consumption or dissipation when those components are not needed. To this end, embodiments may provide a Connected Standby sleep state to maintain processor context using a dedicated power plane. In one embodiment, the Connected Standby sleep state facilitates processor wakeup using resources of a PCH, which itself may be present in a package with the processor. In one embodiment, the Connected Standby sleep state facilitates sustaining processor architectural functions in the PCH until processor wakeup, which enables turning off all of the unnecessary processor components that previously remained powered on during the deep sleep state, including turning off all of the clocks. In one embodiment, the PCH contains a timestamp counter (TSC) and Connected Standby logic for controlling the system during the Connected Standby state. The integrated voltage regulator for the continuous power plane may reside on the PCH as well.

In an embodiment, during the Connected Standby state an integrated voltage regulator may function as the dedicated power plane that remains powered on to support the dedicated cache memory in which the processor context, such as critical state variables, is stored when the processor enters the deep sleep and Connected Standby states. This critical state may include state variables associated with the architecture, micro-architecture, debug state, and/or similar state variables associated with the processor.

The wakeup source signals from EC 935 may be sent to the PCH instead of the processor during the Connected Standby state so that the PCH, rather than the processor, can manage the wakeup processing. In addition, the TSC is maintained in the PCH to facilitate sustaining processor architectural functions. Although shown with these particular components in the embodiment of FIG. 9, it should be understood that the scope of the present invention is not limited in this regard.

Power control in the processor can result in increased power savings. For example, power can be dynamically distributed between cores, individual cores can change frequency/voltage, and multiple deep low power states can be provided to allow for very low power consumption. In addition, dynamic control of the core or independent core portion can provide reduced power consumption by turning off its power when the component is not in use.

Some implementations may provide a specific power management IC (PMIC) to control platform power. Using this solution, the system may experience very low (e.g., less than 5%) battery degradation over an extended duration (e.g., 16 hours) when in a given standby state, such as when in a Win8 Connected Standby state. In a Win8 idle state, a battery life exceeding, e.g., 9 hours may be realized (e.g., at 150 nits). As to video playback, a long battery life can be enabled, e.g., full HD video playback can occur for a minimum of 6 hours. A platform in one implementation may have an energy capacity of, e.g., 35 watt hours (Whr) for Win8 CS using an SSD and, e.g., 40-44 Whr for Win8 CS using an HDD with a RST cache configuration.

A particular implementation may provide support for a 15 W nominal CPU thermal design power (TDP), with a configurable CPU TDP up to approximately a 25 W TDP design point. The platform may include minimal vents owing to the thermal features described above. In addition, the platform is pillow-friendly (in that no hot air is blowing at the user). Different maximum temperature points can be realized depending on the chassis material. In one implementation of a plastic chassis (at least having a lid or base portion of plastic), the maximum operating temperature can be 52 degrees Celsius (C). And for an implementation of a metal chassis, the maximum operating temperature can be 46 °C.

In various implementations, a security module such as a TPM can be integrated into the processor or can be a discrete device, such as a TPM 2.0 device. With integrated security modules (also known as Platform Trust Technology (PTT)), BIOS/firmware can be enabled for exposure to certain security features (including security instructions, secure boot, Intel® anti-theft technology (Anti) -Theft Technology), Intel® Identity Protection Technology, Intel® Trusted Execution Technology (TXT) and Intel® Manageability Engine Technology) and secure user interface (such as , some features of the security keyboard and display).

The embodiments described herein provide several advantages and differences over current systems. The decentralized consistency and memory organization architecture facilitates concurrent access by cache agents and non-caching IO agents through the use of parallel pipelines, including support for shared access to cache lines by both cache and non-cache agents, while maintaining memory consistency and enforcing correct ordering. The use of parallel pipelines facilitates greater memory throughput than is available under conventional architectures employing a single pipeline. By providing shared access to memory resources for both cache agents and non-cache agents, the architecture provides an improvement over existing approaches that employ separate pipelines for cache and non-cache agents, where the separate pipelines operate independently and do not provide shared access. By decoupling the address-matching hardware from the ordering hardware, the architecture allows efficient, decentralized conflict checking of I/O requests while maintaining proper ordering behavior. By mapping multiple virtual channels to a smaller number of conflict classes using the techniques described above, the architecture reduces the associated area overhead typically incurred by systems that use dedicated resources for each virtual channel, while achieving the required QoS.

Although the embodiments described and illustrated herein focus on address conflicts, embodiments of the invention may include other types of conflict checks, such as resource conflicts due to a full shared resource being unavailable to any agent, or resource conflicts due to over-subscription of resources within the same agent.

The following examples pertain to further embodiments. In an embodiment, a method is implemented in a computer system having system memory. Memory access requests originating from a plurality of cache agents and a plurality of I/O agents in the computer system are received, each memory access request identifying an address of at least one cache line for which access is requested, wherein at least a portion of system memory is accessible to both at least one cache agent and an I/O agent. Multiple memory access requests are serviced concurrently via a decentralized memory organization employing parallel pipelines, while maintaining memory consistency for cache lines associated with the cache agents and enforcing memory access ordering for memory access requests originating from the I/O agents.

In an embodiment of the method, memory access requests from the I/O agents are sent via a plurality of virtual channels, and enforcing memory access ordering of memory access requests originating from the I/O agents comprises enforcing memory access ordering such that requests from I/O agents sent via the same virtual channel appear to be serviced in the order in which they are received at the decentralized memory organization.

In an embodiment of the method, an address conflict check is performed on each memory access request originating from a cache agent to determine whether the request conflicts with a previous memory access request whose servicing is pending. If an address conflict is detected, the request is queued in a cache agent conflict queue; otherwise, the request is allowed to proceed.

In an embodiment, first and second pipelines are implemented in the decentralized memory organization. For each of the first and second pipelines, conflict checking logic is implemented at each pipeline, and requests for which an address conflict is detected are placed in a cache agent conflict queue associated with that pipeline.

In an embodiment, for each of the first and second pipelines, a scoreboard is implemented to track pending memory access requests that have been accepted to proceed in that pipeline. In addition, for each memory access request received at each pipeline, it is determined whether there is an address conflict by comparing the addresses of the cache lines of the memory access requests in the cache agent conflict queue associated with that pipeline and in the scoreboard with the address of the cache line included in the memory access request.
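A minimal C++ sketch of this per-pipeline check, assuming a flat list of scoreboard addresses and a simple queue of conflicting-request addresses, is given below; the data layout is illustrative and not taken from the specification.

#include <cstdint>
#include <deque>
#include <vector>

struct PipelineState {
    std::vector<uint64_t> scoreboardAddrs;      // addresses of pending requests accepted by this pipeline
    std::deque<uint64_t>  cacheAgentConflictQ;  // addresses of queued conflicting requests
};

// Compare the incoming request's cache line address against the scoreboard and the
// cache agent conflict queue associated with this pipeline.
bool HasAddressConflict(const PipelineState& p, uint64_t cacheLineAddr) {
    for (uint64_t a : p.scoreboardAddrs)
        if (a == cacheLineAddr) return true;
    for (uint64_t a : p.cacheAgentConflictQ)
        if (a == cacheLineAddr) return true;
    return false;
}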

In an embodiment of the method, a plurality of virtual channels are employed to send memory access requests from the I/O agents, each memory access request being sent via a virtual channel associated with the request. An address conflict check is performed on each memory access request originating from an I/O agent to determine whether the request conflicts with a pending memory access request, and if an address conflict is detected, the request is identified as a conflicted request and a conflict ordering operation is performed to order the conflicted request relative to other pending requests associated with the same virtual channel so as to maintain the order in which the requests were received via that virtual channel, and the conflicted request is queued in an I/O conflict queue associated with the virtual channel.

In an embodiment, each virtual channel is mapped to a class, wherein the number of classes is less than the number of virtual channels, and conflicted requests are queued into a plurality of service-class I/O conflict queues based on the service class to which the virtual channel associated with each conflicted request is mapped.

In an embodiment, first and second pipelines are implemented in the decentralized memory organization. For each of the first and second pipelines, an address conflict check is performed in parallel on a memory access request originating from an I/O agent to determine whether the request conflicts with a pending memory access request, and if there is no address conflict for either of the memory access requests being processed by the first and second pipelines during a given cycle, both requests are accepted for further processing by their associated pipelines.

In an embodiment, operations are performed in parallel by each of the first and second pipelines. The operations include performing an address conflict check on a memory access request originating from an I/O agent to determine whether the request conflicts with a pending memory access request, and, if there is an address conflict for one of the memory access requests being processed by the first and second pipelines during a given cycle, determining the relative age of the two requests. If the older of the two requests does not have an address conflict and the newer request does have an address conflict, the older request is accepted for further processing by its associated pipeline.

In an embodiment of the method, the operations further include placing the newer request in an I/O conflict queue and notifying the pipeline associated with the older request that future requests sent via the same virtual channel with which the newer request is associated will be queued in the I/O conflict queue until the newer request is accepted for further processing by its associated pipeline.

In an embodiment, a plurality of virtual channels are employed to send memory access requests from the I/O agents, each memory access request being sent via a virtual channel associated with the request. Operations performed in parallel for each of the first and second pipelines include performing an address conflict check on a memory access request originating from an I/O agent to determine whether the request conflicts with a pending memory access request, and, if there is an address conflict for one of the memory access requests being processed by the first and second pipelines during a given cycle, determining the relative age of the two requests. If one of the two requests has an address conflict and the two requests were sent via the same virtual channel, both requests are queued in the same I/O conflict queue, with the older request ahead of the newer request.

In an embodiment of the method, operations are performed in parallel by the first and second pipelines. The operations include performing an address conflict check on a memory access request originating from an I/O agent, sent via one of a plurality of virtual channels implemented for sending requests from the I/O agents, to determine whether the request conflicts with a pending memory access request, and, if there is an address conflict for one of the memory access requests being processed by the first and second pipelines during a given cycle, determining the relative age of the two requests. If the older of the two requests has an address conflict and the requests were sent via different virtual channels, the newer request is accepted for further processing by its associated pipeline. In an embodiment, the method further includes placing the older request in an I/O conflict queue, and notifying the pipeline associated with the newer request that future requests sent via the same virtual channel with which the older request is associated will be queued in the I/O conflict queue until the older request is accepted for further processing by its associated pipeline.

In an embodiment, the method includes employing a hash algorithm on data contained in a memory access request to route the memory access request to one of the first or second pipelines for further processing. In an embodiment, arbitration is performed during each of a plurality of cycles, wherein a plurality of memory access requests are received as inputs to an arbiter, the inputs including a plurality of inputs associated with memory access requests originating from the cache agents and at least one input associated with memory access requests originating from the I/O agents. For each cycle, there is an arbitration cycle winner, and the arbitration cycle winner is forwarded to logic configured to implement the hash algorithm. In one embodiment, the inputs to the arbiter further include at least one input corresponding to memory access requests that were previously arbitrated by the arbiter and for which an address conflict was detected. In another embodiment, an anti-starvation mechanism is implemented and configured to prevent a memory access request from being repeatedly blocked across multiple conflict-check cycles for the same memory access request.
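For illustration, a minimal C++ sketch of such hash-based routing follows; the specification does not define the hash function, so a single address bit above the cache-line offset is used here purely as an example.

#include <cstdint>

// Pick pipeline 0 or 1 from the request's cache line address. A real design could
// fold more address bits together; the bit position used here is an assumption.
int SelectPipeline(uint64_t cacheLineAddr) {
    return static_cast<int>((cacheLineAddr >> 6) & 0x1);
}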

According to further embodiments, apparatus are configured with means for performing the operations of the foregoing methods. In an embodiment, an apparatus includes a memory access request arbiter configured to grant memory access requests from among a plurality of input memory access requests, the plurality of input memory access requests including memory access requests originating from a plurality of cache agents, memory access requests originating from a plurality of I/O agents, and conflicted memory access requests previously arbitrated by the arbiter, wherein each memory access request identifies an address of a cache line for which access is requested. The apparatus further includes a decentralized memory organization comprising a plurality of pipelines configured to operate in parallel, at least one cache agent conflict queue, at least one I/O conflict queue, and address conflict processing logic configured to determine whether a currently evaluated memory access request conflicts with another pending memory access request, and configured to queue conflicted memory access requests from the cache agents into the at least one cache agent conflict queue and conflicted memory access requests from the I/O agents into the at least one I/O conflict queue.

In an embodiment, the decentralized memory organization comprises a decentralized consistency and memory organization including a plurality of consistency pipelines, each operatively coupled to an associated memory organization pipeline, each consistency pipeline being configured to facilitate memory consistency for memory access requests originating from the cache agents.

In an embodiment, the memory access requests originating from the I/O agents are sent via a plurality of virtual channels, and the apparatus further includes conflict ordering logic configured to ensure that pending memory access requests sent via the same virtual channel appear to be processed in the order in which they were originally granted by the memory access request arbiter.
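
By way of illustration, a minimal sketch of per-virtual-channel ordering is shown below; the class name ConflictOrderingBlock and its methods are hypothetical and only mirror the ordering property described above (same-channel requests appear to be processed in grant order).

from collections import deque, defaultdict

class ConflictOrderingBlock:
    """Illustrative stand-in for the conflict ordering logic described above."""

    def __init__(self):
        self.parked = defaultdict(deque)   # virtual channel -> FIFO of parked requests

    def admit_or_park(self, request, vc, has_conflict):
        # Once anything on this channel is parked, later same-channel requests
        # park behind it, so they are observed in their original grant order.
        if has_conflict or self.parked[vc]:
            self.parked[vc].append(request)
            return None                    # not admitted this cycle
        return request                     # no conflict and nothing ahead: admit

    def retire_oldest(self, vc):
        # Called when the head-of-line request for this channel is finally accepted.
        return self.parked[vc].popleft() if self.parked[vc] else None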

In an embodiment of the apparatus, the address conflict processing logic includes address conflict checking logic for each memory fabric pipeline, and the at least one cache agent conflict queue includes a cache agent conflict queue associated with each pipeline. In an embodiment, each memory fabric pipeline includes a scoreboard in which accepted memory requests are buffered and in which the addresses of the accepted memory requests are stored. In an embodiment, the address conflict checking logic in each memory fabric pipeline is configured to determine whether there is an address conflict by comparing the cache line addresses of the memory access requests in its associated cache agent conflict queue and its scoreboard with the cache line address referenced in a currently evaluated memory access request.
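
The address comparison itself can be sketched as follows; the 64-byte line size, the dictionary-style entries, and the function names are assumptions for illustration only.

def line_addr(addr: int, line_bytes: int = 64) -> int:
    return addr & ~(line_bytes - 1)        # drop the offset within the cache line

def has_address_conflict(request_addr, scoreboard, conflict_queue) -> bool:
    target = line_addr(request_addr)
    pending = [line_addr(e["address"]) for e in list(scoreboard) + list(conflict_queue)]
    return target in pending

def handle_cache_agent_request(request, scoreboard, conflict_queue):
    if has_address_conflict(request["address"], scoreboard, conflict_queue):
        conflict_queue.append(request)     # park it until the older access retires
    else:
        scoreboard.append(request)         # accept and track it as in flight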

In an embodiment, the apparatus further includes a conflict queue arbiter configured to arbitrate the conflicting memory access requests in the at least one cache agent conflict queue and the at least one I/O conflict queue, wherein an output of the conflict queue arbiter is coupled to an input of the memory access request arbiter.

In an embodiment, the at least one I/O conflict queue comprises a plurality of class-based I/O conflict queues, and each virtual channel is assigned to a class. In an embodiment, the apparatus further includes a plurality of queues coupled to respective inputs of the memory access request arbiter, the plurality of queues comprising: a plurality of cache agent request queues, each configured to queue requests from a respective cache agent; and an I/O request queue configured to queue requests from the plurality of I/O agents. In an embodiment, the apparatus further includes an anti-starvation mechanism configured to prevent a memory access request from being repeatedly blocked across multiple conflict-check cycles for the same memory access request.
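
A minimal sketch of mapping virtual channels onto class-based I/O conflict queues might look like the following; the particular channel-to-class table and class count are invented for the example (the embodiments only require that each channel maps to a class).

from collections import deque

VC_TO_CLASS = {0: 0, 1: 0, 2: 1, 3: 1}    # e.g. four virtual channels, two classes

class ClassBasedIOConflictQueues:
    def __init__(self, num_classes: int = 2):
        self.queues = [deque() for _ in range(num_classes)]

    def park(self, request, vc: int):
        # Conflicting I/O requests are queued per class; all channels of a class
        # share one queue.
        self.queues[VC_TO_CLASS[vc]].append((vc, request))

    def next_candidate(self, cls: int):
        # Head-of-line request of a class, offered back to the conflict queue arbiter.
        return self.queues[cls][0] if self.queues[cls] else None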

In an embodiment, an apparatus includes: a plurality of processor cores, each having at least one associated cache agent; a system agent operatively coupled to each of the processor cores, the system agent including a distributed coherent memory fabric comprising a plurality of coherency pipelines and a plurality of memory fabric pipelines, each memory fabric pipeline configured to interface with a respective memory controller; an I/O root complex operatively coupled to the system agent; an I/O interconnect hierarchy including at least one switch fabric communicatively coupled to the I/O root complex; and a plurality of I/O agents, each coupled to a switch fabric in the I/O interconnect hierarchy. When the apparatus is installed in a computer system including system memory accessed via respective memory controllers coupled to the plurality of memory fabric pipelines, and during operation of the computer system, the apparatus is configured to concurrently service memory access requests for accessing cache lines from the plurality of cache agents and the plurality of I/O agents while maintaining memory coherency for the cache lines associated with the cache agents, wherein a portion of the cache lines is accessible to both at least one cache agent and at least one I/O agent.

In an embodiment, the apparatus is configured to enforce memory access ordering for memory access requests originating from the I/O agents. In an embodiment, each memory fabric pipeline includes a scoreboard in which accepted memory requests are buffered and in which the addresses of the accepted memory requests are stored, and an associated cache agent conflict queue, and the apparatus further includes address conflict checking logic for each memory fabric pipeline that is configured to determine whether there is an address conflict by comparing the cache line addresses of the memory access requests in the cache agent conflict queue and scoreboard associated with the pipeline with the cache line address referenced in a currently evaluated memory access request.

In an embodiment of the apparatus, the system agent further comprises: at least one cache agent conflict queue; at least one I/O conflict queue; and address conflict processing logic configured to determine whether a currently evaluated memory access request conflicts with another pending memory access request, and configured to queue conflicting memory access requests from the cache agents into the at least one cache agent conflict queue and to queue conflicting memory access requests from the I/O agents into the at least one I/O conflict queue. In an embodiment, the system agent further comprises a conflict queue arbiter configured to arbitrate the conflicting memory access requests in the at least one cache agent conflict queue and the at least one I/O conflict queue. In an embodiment, when the apparatus is installed in a computer system and during operation of the computer system, the apparatus is configured to facilitate communication between the I/O agents and the system agent via a plurality of virtual channels having associated classes, wherein the at least one I/O conflict queue includes a plurality of class-based I/O conflict queues, and wherein the number of classes is less than the number of virtual channels. In an embodiment, the system agent further includes a memory access request arbiter configured to grant a memory access request from among a plurality of input memory access requests, the plurality of input memory access requests including: memory access requests originating from the plurality of cache agents; memory access requests originating from the plurality of input/output (I/O) agents; and conflicting memory access requests previously arbitrated by the arbiter.

In an embodiment of the apparatus, the apparatus includes an integrated circuit including a plurality of cache agents, a plurality of I/O agents, and a distributed memory fabric. The distributed memory fabric includes at least two pipelines and is configured to receive a plurality of requests from the plurality of cache agents and the plurality of I/O agents, wherein each pipeline includes a first conflict storage and a second conflict storage. Each pipeline is further configured to: in response to determining that no address conflict exists between a particular request of the plurality of requests and one or more pending requests, accept the particular request; and, in response to determining that an address conflict exists between the particular request and one or more pending requests, direct the particular request to the first conflict storage or the second conflict storage based on whether the particular request originates from one of the plurality of cache agents or one of the plurality of I/O agents.
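
As an illustrative sketch only, the per-pipeline decision described in this embodiment can be modeled as below; the request fields (including the "origin" tag) and container names are assumptions of the example.

def dispatch(request, pending_addresses, cache_agent_conflicts, io_conflicts):
    """Accept a non-conflicting request; otherwise divert it by origin."""
    if request["address"] not in pending_addresses:
        pending_addresses.add(request["address"])     # track it as pending
        return "accepted"
    if request["origin"] == "cache":                  # "cache" / "io" tag is assumed
        cache_agent_conflicts.append(request)         # first conflict storage
    else:
        io_conflicts.append(request)                  # second conflict storage
    return "parked"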

According to other embodiments, a system for implementing the foregoing methods is disclosed. In an embodiment, the system includes a motherboard and a multi-core processor coupled to or mounted on the motherboard, the multi-core processor including: a system agent operatively coupled to each of the processor cores, the system agent including a distributed coherent memory fabric comprising a plurality of coherency pipelines and a plurality of memory fabric pipelines, each memory fabric pipeline configured to interface with a respective memory controller; an I/O root complex operatively coupled to the system agent; an I/O interconnect hierarchy including at least two switch fabrics communicatively coupled to the I/O root complex; and a plurality of I/O interfaces, each coupled to a switch fabric and including an I/O agent. The system further includes: at least two memory devices coupled to or mounted on the motherboard, the at least two memory devices configured as first and second blocks of system memory; first and second memory controllers operatively coupled to respective memory fabric pipelines, each memory controller coupled to at least one memory device and configured to access a respective block of the system memory; a plurality of I/O devices coupled to or mounted on the motherboard and coupled to respective I/O interfaces; and a flash memory coupled to the multi-core processor, the flash memory having BIOS instructions stored therein for configuring the multi-core processor. During operation of the system, the multi-core processor facilitates concurrently servicing memory access requests for accessing cache lines from the plurality of cache agents and the plurality of I/O agents while maintaining memory coherency for the cache lines associated with the cache agents, wherein a portion of the cache lines is accessible to both at least one cache agent and at least one I/O agent.

In an embodiment, during operation of the system, the multi-core processor is configured to enforce memory access ordering for memory access requests originating from the I/O agents. In an embodiment of the system, each memory fabric pipeline includes a scoreboard in which accepted memory requests are buffered and in which the addresses of the accepted memory requests are stored, and an associated cache agent conflict queue, and the system further includes address conflict checking logic for each memory fabric pipeline that is configured to determine whether there is an address conflict by comparing the cache line addresses of the memory access requests in the cache agent conflict queue and scoreboard associated with the pipeline with the cache line address referenced in a currently evaluated memory access request.

In an embodiment of the system, the system agent further comprises: at least one cache agent conflict queue; at least one I/O conflict queue; and address conflict processing logic configured to determine whether a currently evaluated memory access request conflicts with another pending memory access request, and configured to queue conflicting memory access requests from the cache agents into the at least one cache agent conflict queue and to queue conflicting memory access requests from the I/O agents into the at least one I/O conflict queue.

While the invention has been described in terms of a limited number of embodiments, many modifications and variations are apparent to those skilled in the art. All such modifications and variations are intended to be included within the true spirit and scope of the invention.

A design may go through various stages, from creation to simulation to fabrication. Data representing a design may represent the design in a number of ways. First, as is useful in simulations, the hardware may be represented using a hardware description language or another functional description language. Additionally, a circuit-level model with logic and/or transistor gates may be produced at some stages of the design process. Furthermore, most designs, at some stage, reach a level of data representing the physical placement of various devices in the hardware model. In the case where conventional semiconductor fabrication techniques are used, the data representing the hardware model may be the data specifying the presence or absence of various features on different mask layers of the masks used to produce the integrated circuit. In any representation of the design, the data may be stored in any form of a non-transitory machine-readable medium.

A module or component as used herein refers to any combination of hardware, software, and/or firmware. As an example, a module or component includes hardware, such as a microcontroller, associated with a non-transitory medium storing code adapted to be executed by the microcontroller. Therefore, in one embodiment, reference to a module or component refers to hardware that is specifically configured to recognize and/or execute code held on the non-transitory medium. Furthermore, in another embodiment, use of a module or component refers to the non-transitory medium including the code, which is specifically adapted to be executed by the microcontroller to perform predetermined operations. And, as can be inferred, in yet another embodiment, the term module (in this example) may refer to the combination of the microcontroller and the non-transitory medium. Module and/or component boundaries that are illustrated as separate commonly vary and potentially overlap. For example, first and second modules may share hardware, software, firmware, or a combination thereof, while potentially retaining some independent hardware, software, or firmware. In one embodiment, use of the term logic includes hardware such as transistors or registers, or other hardware such as programmable logic devices.

In one embodiment, use of the phrases "to" or "configured to" refers to configuring, assembling, manufacturing, selling, importing, and/or designing an apparatus, hardware, logic, or element to perform a designated or determined task. In this example, an apparatus or element thereof that is not operating is still "configured to" perform a designated task if it is designed, coupled, and/or interconnected to perform that task. As a purely illustrative example, a logic gate may provide a 0 or a 1 during operation. But a logic gate "configured to" provide an enable signal to a clock does not include every potential logic gate that may provide a 1 or a 0. Instead, the logic gate is one coupled in some manner such that, during operation, its 1 or 0 output is to enable the clock. Note once again that use of the term "configured to" does not require operation, but instead focuses on the latent state of the apparatus, hardware, and/or element, wherein in the latent state the apparatus, hardware, and/or element is designed to perform a particular task when the apparatus, hardware, and/or element is operating.

Furthermore, in one embodiment, use of the phrases "capable of" or "operable to" refers to some apparatus, logic, hardware, and/or element designed in such a way as to enable use of the apparatus, logic, hardware, and/or element in a specified manner. As noted above, in one embodiment, use of "to", "capable of", or "operable to" refers to the latent state of an apparatus, logic, hardware, and/or element, wherein the apparatus, logic, hardware, and/or element is not operating but is designed in such a manner as to enable use of the apparatus in a specified manner.

As used herein, a value includes any known representation of a number, a state, a logical state, or a binary logical state. Often, the use of logic levels, logic values, or logical values is also referred to as 1's and 0's, which simply represent binary logic states. For example, a 1 refers to a high logic level and a 0 refers to a low logic level. In one embodiment, a storage cell, such as a transistor or a flash cell, may be capable of holding a single logical value or multiple logical values. However, other representations of values in computer systems have been used. For example, the decimal number ten may also be represented as the binary value 1010 and the hexadecimal letter A. Therefore, a value includes any representation of information capable of being held in a computer system.

Moreover, states may be represented by values or portions of values. As an example, a first value, such as a logical one, may represent a default or initial state, while a second value, such as a logical zero, may represent a non-default state. In addition, the terms reset and set, in one embodiment, refer to a default and an updated value or state, respectively. For example, a default value potentially includes a high logical value, i.e. reset, while an updated value potentially includes a low logical value, i.e. set. Note that any combination of values may be utilized to represent any number of states.

The embodiments of methods, hardware, software, firmware, or code set forth above may be implemented via instructions or code stored on a machine-accessible, machine-readable, computer-accessible, or computer-readable medium, which are executable by a processing element. A non-transitory machine-accessible/readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine, such as a computer or electronic system. For example, a non-transitory machine-accessible medium includes random-access memory (RAM), such as static RAM (SRAM) or dynamic RAM (DRAM); ROM; magnetic or optical storage media; flash memory devices; electrical storage devices; optical storage devices; acoustical storage devices; other forms of storage devices for holding information received from transitory (propagated) signals (e.g., carrier waves, infrared signals, digital signals); etc., which are to be distinguished from the non-transitory media that may receive information therefrom.

Instructions used to program logic to perform embodiments of the invention may be stored within a memory in the system, such as DRAM, cache, flash memory, or other storage. Furthermore, the instructions can be distributed via a network or by way of other computer-readable media. Thus a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memory (CD-ROM), magneto-optical disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable storage used in the transmission of information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Accordingly, a computer-readable medium includes any type of tangible machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).

Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In the foregoing specification, a detailed description has been given with reference to specific exemplary embodiments. It will, however, be evident that various modifications and changes may be made to the invention without departing from its broader spirit and scope. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. Furthermore, the foregoing use of embodiment and other exemplary language does not necessarily refer to the same embodiment or the same example, but may refer to different and distinct embodiments, as well as potentially the same embodiment.

304‧‧‧ Arbitrator

306-0, 306-1‧‧‧ memory fabric pipeline

400-0, 400-N‧‧‧ cache agent request queue

401‧‧‧I/O root complex request queue

402‧‧‧ class-based I/O conflict queue

404-0, 404-1‧‧‧ Conflict Check Logic Block

406-0, 406-1‧‧‧ cache agent conflict queue

407-0, 407-1‧‧‧ scoreboard

408‧‧‧ conflict ordering block (COB)

410, 411, 412, 413, 415, 417 ‧ ‧ forward and reverse

414‧‧‧rejected IO request multiplexer (mux)

416‧‧‧rejected IO request demultiplexer

418‧‧‧conflict queue arbiter

420‧‧‧hash logic

Claims (29)

  1. An apparatus comprising: a plurality of cache agents, each associated with a different processor core; a memory access request arbiter configured to grant a memory access request from among a plurality of input memory access requests, the plurality of input memory access requests including: memory access requests originating from the plurality of cache agents; memory access requests originating from a plurality of input/output (I/O) agents; and conflicting memory access requests previously arbitrated by the arbiter, each memory access request identifying an address associated with a cache line for which access is requested; a distributed memory fabric comprising a plurality of pipelines configured to operate in parallel; at least one cache agent conflict queue; at least one I/O conflict queue; and address conflict processing logic configured to determine whether a currently evaluated memory access request conflicts with another pending memory access request, and configured to queue conflicting memory access requests from the cache agents into the at least one cache agent conflict queue and to queue conflicting memory access requests from the I/O agents into the at least one I/O conflict queue.
  2. The apparatus of claim 1, wherein the distributed memory fabric comprises a distributed coherent memory fabric including a plurality of coherency pipelines, each coherency pipeline operatively coupled to an associated memory fabric pipeline, wherein each coherency pipeline is configured to facilitate memory coherency for memory access requests originating from the cache agents.
  3. The apparatus of claim 1, wherein the memory access requests originating from the I/O agents are sent via a plurality of virtual channels, the apparatus further comprising conflict ordering logic configured to ensure that pending memory access requests sent via the same virtual channel appear to be processed in the order in which they were originally granted by the memory access request arbiter.
  4. The apparatus of claim 1, wherein the address conflict processing logic includes address conflict checking logic for each memory fabric pipeline, and the at least one cache agent conflict queue includes a cache agent conflict queue associated with each pipeline.
  5. The apparatus of claim 4, wherein each memory fabric pipeline includes a scoreboard in which accepted memory requests are buffered and in which the addresses of the accepted memory requests are stored.
  6. The apparatus of claim 5, wherein the address conflict checking logic in each memory fabric pipeline is configured to determine whether there is an address conflict by comparing the cache line addresses of the memory access requests in its associated cache agent conflict queue and its scoreboard with the cache line address referenced in a currently evaluated memory access request.
  7. The apparatus of claim 1, further comprising a conflict queue arbiter configured to arbitrate the conflicting memory access requests in the at least one cache agent conflict queue and the at least one I/O conflict queue, wherein an output of the conflict queue arbiter is coupled to an input of the memory access request arbiter.
  8. The apparatus of claim 1, wherein the at least one I/O conflict queue comprises a plurality of class-based I/O conflict queues, and wherein each virtual channel is assigned to a class.
  9. The apparatus of claim 1, further comprising a plurality of queues coupled to respective inputs of the memory access request arbiter, the plurality of queues comprising: a plurality of cache agent request queues, each configured to queue requests from a respective cache agent; and an I/O request queue configured to queue requests from the plurality of I/O agents.
  10. The apparatus of claim 1, further comprising an anti-starvation mechanism configured to prevent a memory access request from being repeatedly blocked across multiple conflict-check cycles for the same memory access request.
  11. A method comprising: receiving memory access requests originating from a plurality of cache agents and a plurality of input/output (I/O) agents in a computer system having system memory, each memory access request identifying an address associated with a cache line for which access is requested, wherein at least a portion of the system memory is accessible to both at least one cache agent and an I/O agent; and concurrently servicing the memory access requests with a distributed memory fabric employing parallel pipelines, at least a portion of the memory accesses being to portions of the system memory accessible to both at least one cache agent and an I/O agent, while maintaining memory coherency for the cache lines associated with the cache agents and enforcing memory access ordering for memory access requests originating from the I/O agents.
  12. The method of claim 11, wherein the memory access requests from the I/O agents are sent via a plurality of virtual channels, and wherein enforcing memory access ordering for memory access requests originating from the I/O agents comprises enforcing memory access ordering such that requests from the I/O agents sent via the same virtual channel appear to be serviced in the order in which they were received at the distributed memory fabric.
  13. The method of claim 11, further comprising: performing an address conflict check on each memory access request originating from a cache agent to determine whether the request conflicts with a pending memory access request previously accepted for servicing; and, if an address conflict is detected, queueing the request into a cache agent conflict queue; otherwise, accepting the request to proceed.
  14. The method of claim 13, further comprising: implementing first and second pipelines in the distributed memory fabric; and, for each of the first and second pipelines: implementing conflict check logic for the pipeline; and queueing requests for which an address conflict is detected into a cache agent conflict queue associated with the pipeline.
  15. The method of claim 14, further comprising, for each of the first and second pipelines: tracking, via a scoreboard, pending memory access requests that have been accepted to proceed in the pipeline; and, for each memory access request received at the pipeline, determining whether there is an address conflict by comparing the cache line addresses of the memory access requests in the cache agent conflict queue associated with the pipeline and in the scoreboard with the cache line address contained in the memory access request.
  16. The method of claim 11, further comprising: using a plurality of virtual channels to send memory access requests from the I/O agents, each memory access request being sent via a virtual channel associated with the request; performing an address conflict check on each memory access request originating from an I/O agent to determine whether the request conflicts with a pending memory access request; and, if an address conflict is detected: identifying the request as a conflicting request and performing a conflict ordering operation to order the conflicting request relative to other pending requests associated with the same virtual channel so as to maintain the order in which the requests were received via that virtual channel; and queueing the conflicting request into an I/O conflict queue associated with the virtual channel.
  17. The method of claim 11, further comprising: implementing first and second pipelines in the distributed memory fabric; and, in parallel for each of the first and second pipelines: performing an address conflict check on a memory access request originating from an I/O agent to determine whether the request conflicts with a pending memory access request; and, if for a given cycle there is no address conflict for either of the memory access requests being processed by the first and second pipelines, accepting both requests for further processing by their associated pipelines.
  18. The method of claim 11, further comprising: implementing first and second pipelines in the distributed memory fabric; and, in parallel for each of the first and second pipelines: performing an address conflict check on a memory access request originating from an I/O agent to determine whether the request conflicts with a pending memory access request; and, if for a given cycle there is an address conflict for one of the memory access requests being processed by the first and second pipelines: determining the relative age of the two requests; and, if the older of the two requests does not have an address conflict and the newer request has an address conflict, accepting the older request for further processing by its associated pipeline.
  19. The method of claim 11, further comprising: implementing first and second pipelines in the distributed memory fabric; using a plurality of virtual channels to send memory access requests from the I/O agents, each memory access request being sent via a virtual channel associated with the request; and, in parallel for each of the first and second pipelines: performing an address conflict check on a memory access request originating from an I/O agent to determine whether the request conflicts with a pending memory access request; and, if for a given cycle there is an address conflict for one of the memory access requests being processed by the first and second pipelines: determining the relative age of the two requests; and, if the older of the two requests has an address conflict and both requests were sent via the same virtual channel, queueing the two requests into the same I/O conflict queue with the older request ahead of the newer request.
  20. The method of claim 11, further comprising: implementing first and second pipelines in the memory fabric; using a hash algorithm on data contained in a memory access request to route the memory access request to one of the first or second pipelines for further processing; for each of a plurality of cycles, arbitrating a plurality of memory access requests received as inputs to an arbiter, the inputs including a plurality of inputs associated with memory access requests originating from the cache agents and at least one input associated with memory access requests originating from an I/O agent; and, for each cycle, granting an arbitration winner and forwarding the arbitration winner to logic configured to implement the hash algorithm, wherein the inputs to the arbiter further include at least one input corresponding to a memory access request previously arbitrated by the arbiter for which an address conflict was detected.
  21. An apparatus comprising: an integrated circuit including: a plurality of processor cores, each having at least one associated cache agent; a system agent operatively coupled to each of the processor cores, the system agent including a distributed coherent memory fabric comprising a plurality of coherency pipelines and a plurality of memory fabric pipelines, each memory fabric pipeline configured to interface with a respective memory controller; an input/output (I/O) root complex operatively coupled to the system agent; an I/O interconnect hierarchy including at least one switch fabric communicatively coupled to the I/O root complex; and a plurality of I/O agents, each coupled to a switch fabric in the I/O interconnect hierarchy, wherein the integrated circuit is configured to concurrently service memory access requests for accessing cache lines from the plurality of cache agents and the plurality of I/O agents while maintaining memory coherency for the cache lines associated with the cache agents, wherein a portion of the cache lines is accessible to both at least one cache agent and at least one I/O agent.
  22. The apparatus of claim 21, wherein the apparatus is configured to enforce memory access ordering for memory access requests originating from an I/O agent.
  23. The apparatus of claim 21, wherein each memory fabric pipeline includes a scoreboard in which accepted memory requests are buffered and in which the addresses of the accepted memory requests are stored, and an associated cache agent conflict queue, the apparatus further comprising address conflict checking logic for each memory fabric pipeline configured to determine whether there is an address conflict by comparing the cache line addresses of the memory access requests in its associated cache agent conflict queue and scoreboard with the cache line address referenced in a currently evaluated memory access request.
  24. The apparatus of claim 21, wherein the system agent further comprises: at least one cache agent conflict queue; at least one I/O conflict queue; and address conflict processing logic configured to determine whether a currently evaluated memory access request conflicts with another pending memory access request, and configured to queue conflicting memory access requests from the cache agents into the at least one cache agent conflict queue and to queue conflicting memory access requests from the I/O agents into the at least one I/O conflict queue.
  25. The apparatus of claim 24, wherein the system agent further comprises: a conflict queue arbiter configured to arbitrate the conflicting memory access requests in the at least one cache agent conflict queue and the at least one I/O conflict queue.
  26. A system comprising: a chassis; a motherboard disposed in the chassis; a multi-core processor coupled to or mounted on the motherboard, the multi-core processor comprising: a plurality of processor cores, each having at least one associated cache agent; a system agent operatively coupled to each of the processor cores, the system agent including a distributed coherent memory fabric comprising a plurality of coherency pipelines and a plurality of memory fabric pipelines, each memory fabric pipeline configured to interface with a respective memory controller; an input/output (I/O) root complex operatively coupled to the system agent; an I/O interconnect hierarchy including at least two switch fabrics communicatively coupled to the I/O root complex; and a plurality of I/O interfaces, each coupled to a switch fabric and including an I/O agent; at least two memory devices coupled to or mounted on the motherboard, the at least two memory devices configured as first and second blocks of system memory; first and second memory controllers operatively coupled to respective memory fabric pipelines, each memory controller coupled to at least one memory device and configured to access a respective block of the system memory; a plurality of I/O devices coupled to or mounted on the motherboard and coupled to respective I/O interfaces; a touchscreen display mounted to the chassis and operatively coupled to the multi-core processor; and a flash memory coupled to the multi-core processor, the flash memory having BIOS instructions stored therein for configuring the multi-core processor, wherein, upon loading the BIOS instructions, the multi-core processor is configured to facilitate concurrently servicing memory access requests for accessing cache lines from the plurality of cache agents and the plurality of I/O agents while maintaining memory coherency for the cache lines associated with the cache agents, wherein a portion of the cache lines is accessible to both at least one cache agent and at least one I/O agent.
  27. The system of claim 26, wherein the multi-core processor is configured to enforce memory access ordering for memory access requests originating from an I/O agent.
  28. The system of claim 26, wherein each memory fabric pipeline includes a scoreboard in which accepted memory requests are buffered and in which the addresses of the accepted memory requests are stored, and an associated cache agent conflict queue, the system further comprising address conflict checking logic for each memory fabric pipeline configured to determine whether there is an address conflict by comparing the cache line addresses of the memory access requests in its associated cache agent conflict queue and scoreboard with the cache line address referenced in a currently evaluated memory access request.
  29. The system of claim 26, wherein the system agent further comprises: at least one cache agent conflict queue; at least one I/O conflict queue; and address conflict processing logic configured to determine whether a currently evaluated memory access request conflicts with another pending memory access request, and configured to queue conflicting memory access requests from the cache agents into the at least one cache agent conflict queue and to queue conflicting memory access requests from the I/O agents into the at least one I/O conflict queue.
TW103105661A 2013-03-05 2014-02-20 A method for processing address conflict, device and system in a distributed memory organization in TWI524184B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/785,908 US9405688B2 (en) 2013-03-05 2013-03-05 Method, apparatus, system for handling address conflicts in a distributed memory fabric architecture

Publications (2)

Publication Number Publication Date
TW201447580A TW201447580A (en) 2014-12-16
TWI524184B true TWI524184B (en) 2016-03-01

Family

ID=51489345

Family Applications (1)

Application Number Title Priority Date Filing Date
TW103105661A TWI524184B (en) 2013-03-05 2014-02-20 A method for processing address conflict, device and system in a distributed memory organization in

Country Status (3)

Country Link
US (1) US9405688B2 (en)
TW (1) TWI524184B (en)
WO (1) WO2014137864A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9405688B2 (en) 2013-03-05 2016-08-02 Intel Corporation Method, apparatus, system for handling address conflicts in a distributed memory fabric architecture
CN104346283B (en) * 2013-08-01 2018-09-28 腾讯科技(北京)有限公司 The inquiry amount distribution method and device of network media information amount of storage
US9911477B1 (en) * 2014-04-18 2018-03-06 Altera Corporation Memory controller architecture with improved memory scheduling efficiency
US10289585B1 (en) * 2014-08-20 2019-05-14 Altera Corporation Cross-point programming of pipelined interconnect circuitry
US9971711B2 (en) * 2014-12-25 2018-05-15 Intel Corporation Tightly-coupled distributed uncore coherent fabric
US20160191420A1 (en) * 2014-12-27 2016-06-30 Intel Corporation Mitigating traffic steering inefficiencies in distributed uncore fabric
US10133670B2 (en) * 2014-12-27 2018-11-20 Intel Corporation Low overhead hierarchical connectivity of cache coherent agents to a coherent fabric
US10474569B2 (en) 2014-12-29 2019-11-12 Toshiba Memory Corporation Information processing device including nonvolatile cache memory and processor
US9971542B2 (en) 2015-06-09 2018-05-15 Ultrata, Llc Infinite memory fabric streams and APIs
US10235063B2 (en) 2015-12-08 2019-03-19 Ultrata, Llc Memory fabric operations and coherency using fault tolerant objects
US10241676B2 (en) 2015-12-08 2019-03-26 Ultrata, Llc Memory fabric software implementation
US9965185B2 (en) 2015-01-20 2018-05-08 Ultrata, Llc Utilization of a distributed index to provide object memory fabric coherency
US9886210B2 (en) 2015-06-09 2018-02-06 Ultrata, Llc Infinite memory fabric hardware implementation with router
US9858190B2 (en) * 2015-01-27 2018-01-02 International Business Machines Corporation Maintaining order with parallel access data streams
US10157160B2 (en) 2015-06-04 2018-12-18 Intel Corporation Handling a partition reset in a multi-root system
US9990327B2 (en) * 2015-06-04 2018-06-05 Intel Corporation Providing multiple roots in a semiconductor device
CN108885607A (en) * 2015-12-08 2018-11-23 乌尔特拉塔有限责任公司 Use the memory construction operation of fault tolerant object and consistency
TWI557577B (en) * 2016-01-12 2016-11-11 Inventec Corp The system is used to prevent address conflicts and method thereof
US10037150B2 (en) * 2016-07-15 2018-07-31 Advanced Micro Devices, Inc. Memory controller with virtual controller mode
US10372642B2 (en) * 2016-09-29 2019-08-06 Intel Corporation System, apparatus and method for performing distributed arbitration

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5832304A (en) * 1995-03-15 1998-11-03 Unisys Corporation Memory queue with adjustable priority and conflict detection
US5875472A (en) * 1997-01-29 1999-02-23 Unisys Corporation Address conflict detection system employing address indirection for use in a high-speed multi-processor system
US6622225B1 (en) * 2000-08-31 2003-09-16 Hewlett-Packard Development Company, L.P. System for minimizing memory bank conflicts in a computer system
US20050149654A1 (en) * 2004-01-06 2005-07-07 Holloway Marty M. Modular audio/video device and method
US7360069B2 (en) 2004-01-13 2008-04-15 Hewlett-Packard Development Company, L.P. Systems and methods for executing across at least one memory barrier employing speculative fills
US8560795B2 (en) * 2005-06-30 2013-10-15 Imec Memory arrangement for multi-processor systems including a memory queue
US7437518B2 (en) * 2005-09-07 2008-10-14 Intel Corporation Hiding conflict, coherence completion and transaction ID elements of a coherence protocol
US7447844B2 (en) * 2006-07-13 2008-11-04 International Business Machines Corporation Data processing system, processor and method of data processing in which local memory access requests are serviced on a fixed schedule
US20080034146A1 (en) 2006-08-04 2008-02-07 Via Technologies, Inc. Systems and Methods for Transactions Between Processor and Memory
US8239633B2 (en) * 2007-07-11 2012-08-07 Wisconsin Alumni Research Foundation Non-broadcast signature-based transactional memory
KR100922732B1 (en) 2007-12-10 2009-10-22 한국전자통신연구원 Apparatus and method for reducing memory access conflict
US8205057B2 (en) 2009-06-30 2012-06-19 Texas Instruments Incorporated Method and system for integrated pipeline write hazard handling using memory attributes
US8521963B1 (en) * 2009-09-21 2013-08-27 Tilera Corporation Managing cache coherence
US8533399B2 (en) 2010-01-15 2013-09-10 International Business Machines Corporation Cache directory look-up re-use as conflict check mechanism for speculative memory requests
US8375171B2 (en) 2010-04-08 2013-02-12 Unisys Corporation System and method for providing L2 cache conflict avoidance
US9021306B2 (en) * 2012-12-13 2015-04-28 Apple Inc. Debug access mechanism for duplicate tag storage
US9405688B2 (en) 2013-03-05 2016-08-02 Intel Corporation Method, apparatus, system for handling address conflicts in a distributed memory fabric architecture

Also Published As

Publication number Publication date
WO2014137864A1 (en) 2014-09-12
US20140258620A1 (en) 2014-09-11
TW201447580A (en) 2014-12-16
US9405688B2 (en) 2016-08-02

Similar Documents

Publication Publication Date Title
KR101700261B1 (en) High performance interconnect coherence protocol
US9841807B2 (en) Method and apparatus for a zero voltage processor sleep state
US9304570B2 (en) Method, apparatus, and system for energy efficiency and energy conservation including power and performance workload-based balancing between multiple processing elements
EP2075696A2 (en) Interrupt- related circuits, systems and processes
US8347012B2 (en) Interrupt morphing and configuration, circuits, systems, and processes
CN104364750B (en) The pretreated methods, devices and systems of distribution controlled for touch data and display area
EP3210093B1 (en) Configurable volatile memory data save triggers
US9075610B2 (en) Method, apparatus, and system for energy efficiency and energy conservation including thread consolidation
JP2013545205A (en) Interrupt distribution scheme
CN104583900A (en) Dynamically switching a workload between heterogeneous cores of a processor
CN104063290B (en) Handle system, the method and apparatus of time-out
US9753529B2 (en) Systems, apparatuses, and methods for synchronizing port entry into a low power status
US10168758B2 (en) Techniques to enable communication between a processor and voltage regulator
CN103631656A (en) Task scheduling in big and little cores
US9292076B2 (en) Fast recalibration circuitry for input/output (IO) compensation finite state machine power-down-exit
KR101637075B1 (en) Method, apparatus, and system for improving resume times for root ports and root port integrated endpoints
CN104956347A (en) Leveraging an enumeration and/or configuration mechanism of one interconnect protocol for a different interconnect protocol
US9494998B2 (en) Rescheduling workloads to enforce and maintain a duty cycle
US9405688B2 (en) Method, apparatus, system for handling address conflicts in a distributed memory fabric architecture
US9575537B2 (en) Adaptive algorithm for thermal throttling of multi-core processors with non-homogeneous performance states
EP3155521B1 (en) Systems and methods of managing processor device power consumption
US10084698B2 (en) Selectively enabling first and second communication paths using a repeater
CN105009101B (en) The monitoring filtering associated with data buffer is provided
TWI522792B (en) The apparatus required to generate, the method for memory requirements, and the computing system
DE112016002913T5 (en) Dynamic configuration of connection modes to a system based on host device capabilities