CN107278295B - Buffer overflow detection for byte level granularity of memory corruption detection architecture

Info

Publication number
CN107278295B
CN107278295B
Authority
CN
China
Prior art keywords
memory
mcd
processor
contiguous
block
Prior art date
Legal status
Active
Application number
CN201680012160.5A
Other languages
Chinese (zh)
Other versions
CN107278295A
Inventor
T. Stark
A. Tal
R. Gabor
J. Nuzman
Current Assignee
Intel Corp
Original Assignee
Intel Corp
Priority date
Filing date
Publication date
Priority claimed from US 14/668,862 (granted as US9766968B2)
Application filed by Intel Corp filed Critical Intel Corp
Publication of CN107278295A
Application granted granted Critical
Publication of CN107278295B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0706 Error or fault processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/073 Error or fault processing taking place in a memory management context, e.g. virtual memory or cache management
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0772 Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis

Abstract

Memory corruption detection techniques are described. The processor may include a memory to store data from an application, wherein the memory includes a Memory Corruption Detection (MCD) table. The processor may also include a processor core coupled to the memory. The processor core may receive a memory access request from an application to access data of one or more contiguous memory blocks in a memory object of the memory. The processor core may also retrieve the data stored in the one or more contiguous memory blocks based on a location indicated by a pointer of the memory access request. The processor core may also retrieve allocation information associated with the one or more contiguous memory blocks from the MCD table. The processor core may further be operative to send an error message to the application, based on the allocation information, when an error event associated with the retrieved data occurs.

Description

Buffer overflow detection for byte level granularity of memory corruption detection architecture
Background
Memory corruption can be a major resource issue leading to system failures and can negatively impact system performance. Memory corruption has a variety of causes, including: programming errors on memory, out-of-range accesses, dangling pointers, and malicious attacks. Using corrupted memory content in a computer program may cause the program to crash or behave abnormally. Software solutions, such as debug tools, may be used for memory corruption detection. However, software solutions may cause computer programs to run significantly slower and may be difficult to use when debugging a computer program.
Drawings
FIG. 1 illustrates a Memory Corruption Detection (MCD) system according to one embodiment.
Fig. 2 illustrates an architecture of an MCD system having a system memory and an MCD table, according to one embodiment.
FIG. 3 depicts a flowchart of a method for associating one or more memory blocks of a memory object with memory allocation information of an MCD metadata word, in accordance with one embodiment.
Figure 4A illustrates an MCD metadata word associated with a memory block, according to one embodiment.
Fig. 4B illustrates another MCD metadata word associated with a memory block in accordance with one embodiment.
FIG. 5A illustrates a first memory block and a second memory block allocated to a memory object, according to one embodiment.
FIG. 5B depicts a flowchart of a method for checking out of range memory accesses in memory, according to one embodiment.
FIG. 6A is a block diagram illustrating a microarchitecture of a processor implementing secure memory repartitioning, according to one embodiment.
FIG. 6B is a block diagram illustrating an in-order pipeline and a register renaming stage, out-of-order issue/execution pipeline, according to one embodiment.
FIG. 7 illustrates a block diagram of a microarchitecture of a processor including logic circuitry to perform secure memory repartitioning, in accordance with one embodiment.
FIG. 8 is a block diagram of a computer system according to one implementation.
FIG. 9 is a block diagram of a computer system according to another implementation.
FIG. 10 is a block diagram of a system-on-chip in accordance with one implementation.
FIG. 11 illustrates another implementation of a block diagram of a computing system.
FIG. 12 illustrates another implementation of a block diagram of a computing system.
Detailed Description
Memory corruption can occur when the contents of a memory location are modified unexpectedly. The contents of a memory location may be inadvertently modified due to programming errors, or intentionally modified due to malicious attacks. There are a variety of different causes of memory corruption. One cause of memory corruption may be a coding error, where an application erroneously writes or reads an unintended memory block of system memory. Another cause of memory corruption may be an application using an invalid pointer to write data to a memory block that has been freed. Another cause of memory corruption may be an application attempting to write data to a header of a memory block (or other restricted or reserved memory area) that may be managed by an Operating System (OS). Various other causes of memory corruption may exist. Using corrupted memory can lead to data corruption (e.g., corrupted contents of a database system), memory management problems, performance degradation, unpredictable program execution, or program crashes. Memory Corruption Detection (MCD) may be used to detect memory corruption. However, an MCD architecture may attach metadata to fixed-size blocks of N bytes of memory (such as 64-byte blocks). When the MCD uses fixed-size memory blocks, an N-byte (B) granular allocation may be used to prevent multiple allocations from sharing the same metadata. Conventionally, however, a buffer overflow within a fixed-size metadata block may not be detected. Furthermore, conventional MCD architectures may incur performance or memory overhead that limits their use to debug or pre-production environments.
Embodiments described herein may address the above deficiencies by using an MCD architecture with byte-level out-of-range detection. The MCD architecture may include a metadata table with MCD unique identifiers (such as MCD colors) and MCD boundary values for indicating used (legal) and unused (illegal) access regions of memory blocks or memory objects. The MCD architecture may also include a processor or a software library (such as an allocation library) executed by the processor for setting or allocating MCD metadata words. Furthermore, the MCD architecture may use the MCD metadata words to check whether a load or store access is legitimate or authorized, at byte-level granularity.
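As an illustration only (not the patented implementation), the metadata concept just described can be sketched in C as a per-block record pairing an MCD color with an MCD boundary value; the field names and widths here are assumptions chosen for readability.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one MCD metadata entry for a single
 * fixed-size memory block: an MCD "color" (unique ID shared by all blocks
 * of one allocation) plus a boundary value marking how much of the block
 * is legal to access. Names and widths are illustrative only. */
typedef struct {
    uint8_t color;    /* MCD unique ID for the owning memory object       */
    uint8_t boundary; /* size of the unused (illegal) region of the block */
} mcd_metadata_t;
```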
Heap memory is an area of reserved memory that a program or application may use to store a variable amount of data that may be used while the program is running. For example, an application may receive different amounts or types of input data for processing (such as from different users) and store the input data in heap memory. While an application is running, the application may process different amounts or types of input data. An allocation library executed by the processor may be used for memory allocation, memory release, and Memory Corruption Detection (MCD) data management. To prevent memory corruption, the processing system or processor may validate pointers generated by memory access instructions of an application being executed by the processing system or processor. In one example, a processing system may maintain a metadata table that stores identifiers for different allocated buffers (e.g., memory allocations), including one or more contiguous memory blocks of system memory. In another example, contiguous memory blocks of system memory may be the same predefined size, such as 64 bytes (B) or 32 B. In another example, contiguous memory blocks of system memory may be of different sizes.
When a portion of the processor's memory may be allocated to a newly created memory object, a unique Identifier (ID) may be generated and associated with one or more contiguous memory blocks that may store data written to the memory object. The unique identifier of the contiguous memory block may be an MCD unique identifier or an MCD color designation. For example, a contiguous block of memory allocated to a memory object may be assigned an MCD color value, such as a 6-bit data value.
The MCD unique identifiers of the different memory objects may be stored in one or more MCD table entries of the MCD table, the one or more MCD table entries corresponding to contiguous memory blocks allocated to the memory objects. The MCD unique identifier may also be stored in one or more bits (e.g., upper bits) of a pointer, which may be returned by the memory allocation routine to the application that requested the memory allocation. When the processor receives the memory access instruction, the processor may compare the MCD unique identifier retrieved from the MCD table with the MCD unique identifier extracted from the pointer specified by the memory access instruction. When the MCD unique identifiers do not match, an error may be generated.
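A minimal sketch of this pointer-coloring idea, assuming a 6-bit color kept in otherwise unused upper pointer bits (the exact bit positions are not specified here and are chosen only for illustration):

```c
#include <stdint.h>

#define MCD_COLOR_SHIFT 58u     /* assumed: color occupies pointer bits 58-63 */
#define MCD_COLOR_MASK  0x3Fu   /* 6-bit MCD color                            */

/* Embed an MCD color in the upper bits of a linear address. */
static inline uint64_t mcd_make_pointer(uint64_t linear_addr, uint8_t color) {
    uint64_t addr_bits = linear_addr & ((1ull << MCD_COLOR_SHIFT) - 1);
    return addr_bits | ((uint64_t)(color & MCD_COLOR_MASK) << MCD_COLOR_SHIFT);
}

/* Recover the color and the plain linear address from a colored pointer. */
static inline uint8_t mcd_pointer_color(uint64_t ptr) {
    return (uint8_t)((ptr >> MCD_COLOR_SHIFT) & MCD_COLOR_MASK);
}

static inline uint64_t mcd_pointer_address(uint64_t ptr) {
    return ptr & ((1ull << MCD_COLOR_SHIFT) - 1);
}
```

On an access, the color extracted with mcd_pointer_color would be compared against the color stored in the MCD table entry for the touched block; a mismatch triggers the error described above.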
Fig. 1 illustrates an MCD system 100 according to one embodiment. MCD system 100 may include a pointer 102 and a system memory 104. The pointer 102 may include an MCD unique ID field or MCD color value field 110 and a memory address field. For example, the pointer 106 may include an MCD unique ID 110 and a memory address 112, and the pointer 108 may include an MCD unique ID 114 and a memory address 118. The MCD unique IDs 110 and 114 may be stored in one or more bits (such as upper bits, which may not be part of a linear address) of the pointers 106 and 108, respectively. The memory addresses 112 and 118 may reference the starting address locations of the memory objects 138 and 140 in the system memory 104. For example, memory address 112 may reference an address location of contiguous memory block 128, and memory address 118 may reference an address location of contiguous memory block 132. Memory objects 138 and 140 may comprise one or more contiguous memory blocks. For example, memory object 138 may include contiguous memory blocks 128 and 130, and memory object 140 may include contiguous memory blocks 132, 134, and 136. When portions of system memory 104 are allocated to newly created memory objects 138 and 140 for memory object data 122 and 126, a memory allocation routine (e.g., a calloc routine, a malloc routine, or a realloc routine) may generate MCD unique IDs 120 and 124 to be associated with contiguous memory blocks 128-130 and 132-136, respectively.
MCD system 100 may receive memory access instructions from an application requesting object data for a contiguous block of memory. For example, MCD system 100 may receive a memory access instruction, where the memory access instruction includes pointer 106 having memory address 112 indicating a starting location of object data 122 at contiguous memory block 128. When executing the memory access instruction, the MCD system 100 can compare the MCD unique ID 110 of the pointer 106 with the MCD unique ID 120 associated with the contiguous memory block 128. When the MCD unique ID 110 matches the MCD unique ID 120, the MCD system 100 can pass the object data 122 to the requesting application. The MCD system 100 can iterate through the contiguous memory blocks 128 and 130 of the memory object 138 until the MCD system 100 reaches the contiguous memory block 132. When the MCD unique ID 124 does not match the MCD unique ID 110, the MCD system 100 can determine that it has reached the end of the contiguous memory blocks 128 and 130. When the MCD unique ID 124 does not match the MCD unique ID 110, the MCD system 100 may generate an error message (such as an exception) indicating that the end of the memory object 138 has been reached.
Fig. 2 illustrates an architecture of an MCD system 200 having a system memory 104 and an MCD table 202, according to one embodiment. The MCD system 200 includes a pointer 106, the pointer 106 including an MCD unique ID 110 and a memory address 112 referencing a memory object 138. The memory object 138 may include contiguous memory blocks 122A-122N. The MCD table 202 may include MCD unique IDs 120A-120N and MCD boundary values 121A-121N associated with contiguous memory blocks 122A-122N, respectively. The MCD unique IDs 120A-120N and MCD boundary values 121A-121N may be stored at offsets derived from base addresses of corresponding contiguous memory blocks 122A-122N.
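One way to derive that offset, assuming one metadata byte per 64-byte block stored contiguously from the start of the MCD table (a simplification for illustration; helper names are hypothetical):

```c
#include <stdint.h>

#define MCD_BLOCK_SIZE 64u  /* fixed block size assumed in the examples above */

/* The MCD table index of the block containing `linear_addr` is simply its
 * block number; one metadata word is kept per 64-byte block. */
static inline uint64_t mcd_table_index(uint64_t linear_addr) {
    return linear_addr / MCD_BLOCK_SIZE;
}

static inline uint8_t mcd_load_metadata(const uint8_t *mcd_table, uint64_t linear_addr) {
    return mcd_table[mcd_table_index(linear_addr)];
}
```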
Fig. 3 depicts a flow diagram of a method 300 for associating one or more memory blocks of a memory object with memory allocation information of an MCD metadata word. The method 300 may be performed by a computer system that may comprise hardware (e.g., circuitry, dedicated logic, and/or programmable logic), software (e.g., instructions executable on a computer system to perform hardware simulation), or a combination thereof. The method 300 and/or each of its functions, routines, subroutines, or operations may be performed by one or more physical processors of a computer system executing the method. Two or more functions, routines, subroutines, or operations of method 300 may be performed in parallel or in a different order than described above. In some implementations, the method 300 may be performed by a single processing thread. Alternatively, the method 300 may be performed by two or more processing threads, each thread performing one or more individual functions, routines, subroutines, or operations of the method. In an illustrative example, the processing threads implementing method 300 may be synchronized (e.g., using semaphores, critical sections, and/or other thread synchronization mechanisms). Alternatively, the processing threads implementing method 300 may be executed asynchronously with respect to each other.
Referring to fig. 3, a method 300 may begin with a processor or a software library executed by a processor, such as a runtime library, receiving an allocation request from an application for one or more contiguous memory blocks in a memory object of a memory (block 310). A memory object may be a contiguous portion of memory (such as a contiguous block of memory) that includes one or more memory blocks. In one example, a processor or library may receive an allocation request (e.g., an initial allocation request for memory from an application) when the application starts or is started. In another example, a processor or software library may receive an allocation request (e.g., a subsequent allocation request for memory from an application) while the application is running.
The method may further include determining, by the processor or software library, a size of the memory object requested by the allocation request, such as a number of bytes (N bytes) of memory (block 320). In one example, the memory may be partitioned into fixed-size memory blocks (e.g., contiguous memory blocks). In the following paragraphs, it may be assumed for exemplary purposes that a memory block is 64 bytes (B) of contiguous memory. However, the 64 B memory block size is not intended to be limiting, and the memory block size may be any size set by the processor or allocation library of the MCD system.
In one example, the software library may determine the size of the requested memory based on allocation size information included in the allocation request. The method may include allocating, by the processor or the software library, one or more contiguous memory blocks for the memory object in view of the size of the requested memory object (block 330). The method may also include writing an MCD metadata word to an MCD table (block 340). In one example, for allocation sizes larger than an MCD block (e.g., 64 B), an MCD metadata word may be written into the MCD table for each block. The MCD metadata word may include an MCD unique ID and an MCD boundary value associated with the one or more contiguous memory blocks. The MCD boundary value may indicate the size of a used (legal) memory region or an unused (illegal) memory region of the memory block. The method may include determining a location of the MCD boundary based on the MCD boundary value, as discussed in the following paragraphs. In one example, each of the memory blocks may be 64 B, e.g., 64 consecutive bytes. For a 64 B memory block, the MCD metadata word may include 6 bits (b) for the MCD boundary value. In another example, the MCD boundary value may be log2(total memory block size) bits to describe the size of a legal or illegal memory region. In this example, the size of the MCD metadata word may be the number of MCD unique ID bits plus the number of MCD boundary value bits. The method may also include creating a pointer (block 350). In one example, the pointer may include a memory address indicating a location of the memory object in memory. In another example, the pointer may include a second MCD unique ID associated with the memory object. The method may also include sending the pointer to the application (block 360).
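A hedged sketch of blocks 310-360 as an allocation-library routine in C: it rounds the request up to whole 64-byte blocks, writes one metadata word per block (using the 1-byte layout of FIG. 4A, with a non-zero boundary only on a partially used last block), and returns a pointer carrying the color in its upper bits. The helper names, bit-layout choices, and use of aligned_alloc are assumptions; on real hardware the colored pointer would additionally require architectural support for the upper pointer bits.

```c
#include <stdint.h>
#include <stdlib.h>

#define MCD_BLOCK_SIZE   64u
#define MCD_COLOR_SHIFT  58u   /* assumed pointer bit position for the color */

/* Illustrative allocation-library sketch of blocks 310-360. */
void *mcd_alloc(uint8_t *mcd_table, uint8_t color, size_t nbytes) {
    size_t nblocks = (nbytes + MCD_BLOCK_SIZE - 1) / MCD_BLOCK_SIZE;  /* blocks 320/330 */
    void  *mem = aligned_alloc(MCD_BLOCK_SIZE, nblocks * MCD_BLOCK_SIZE);
    if (mem == NULL)
        return NULL;

    uint64_t base = (uint64_t)(uintptr_t)mem;
    for (size_t i = 0; i < nblocks; i++) {                            /* block 340 */
        /* Boundary = number of unused trailing bytes; non-zero only in the
         * last block when the request is not a multiple of 64 bytes. */
        uint8_t boundary = 0;
        if (i == nblocks - 1 && nbytes % MCD_BLOCK_SIZE != 0)
            boundary = (uint8_t)(MCD_BLOCK_SIZE - nbytes % MCD_BLOCK_SIZE);

        /* FIG. 4A layout: 2-bit MCD ID in bits 0-1, 6-bit boundary in bits 2-7. */
        mcd_table[base / MCD_BLOCK_SIZE + i] =
            (uint8_t)((color & 0x3u) | (boundary << 2));
    }

    /* Blocks 350/360: hand back a pointer with the color in its upper bits. */
    return (void *)(uintptr_t)(base | ((uint64_t)(color & 0x3Fu) << MCD_COLOR_SHIFT));
}
```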
Fig. 4A illustrates an MCD metadata word 406 associated with a memory block 416, according to one embodiment. In one exemplary embodiment, the memory block 416 may be 64 bytes (e.g., bytes 0-63) and the MCD metadata word may be 1 byte in size. The memory block 416 may include a legal access portion 412 of memory that may be a first length 418 and an illegal access portion 414 of memory that may be a second length 420, where the first length 418 plus the second length 420 may be 64 bytes. The MCD metadata word 406 may include an MCD unique ID 402, which may be a third length 408, and an MCD boundary value 404, which may be a fourth length 410. In this example, the third length 408 may be 2 bits (stored in bits 0-1 of the MCD metadata word 406) and the fourth length 410 may be 6 bits (stored in bits 2-7 of the MCD metadata word 406).
In one embodiment, the MCD boundary value 404 may indicate a size of the illegitimate access portion 414 of the memory. In one example, the processor or software library may determine the location of the MCD boundary between the lawful access portion of memory 412 and the illegitimate access portion of memory 414 by subtracting the size of the illegitimate access portion of memory 414 from the last byte of the memory block. For example, to determine the MCD boundary for a 64-byte memory block, the processor or software library may subtract the MCD boundary value 404 from the last byte (e.g., byte 63) in the 64-byte memory block. For example, when the MCD boundary value 404 for a memory block 416 that may be 64 bytes is 24 bytes, the legal access portion 412 of memory is 40 bytes and the illegal access portion 414 of memory is 24 bytes (e.g., for a total 64 byte memory block). In another example, when the MCD boundary value 404 is 0 bytes, the legal access portion 412 of memory is 64 bytes and the illegal access portion 414 of memory is 0 bytes (e.g., for a total of 64 bytes).
In another embodiment, the MCD boundary value 404 may indicate the size of the legal access portion 412 of memory. In one example, the processor or software library may determine the location of the MCD boundary between the legal access portion 412 of memory and the illegal access portion 414 of memory by adding the size of the legal access portion 412 of memory to the first byte of the memory block. For example, the processor or software library may add the MCD boundary value 404 to the first byte (byte 0) of the 64-byte memory block. For example, when the MCD boundary value 404 for a memory block 416 that may be 64 bytes is 20 bytes, the legal access portion 412 of the memory is 20 bytes and the illegal access portion 414 of the memory is 44 bytes (e.g., for a total of 64 bytes). In another example, when the MCD boundary value 404 is 1, the legal access portion 412 of memory is 1 byte and the illegal access portion 414 of memory is 63 bytes (e.g., for a total of 64 bytes). The memory block size and/or MCD metadata word size are not intended to be limiting, and the memory block size and/or MCD metadata word size may be any size set by the processor or software library of the MCD system.
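A small sketch of the 1-byte metadata word of FIG. 4A and the two boundary interpretations just described (the bit layout follows the figure; the helper names are hypothetical):

```c
#include <stdint.h>

#define MCD_BLOCK_SIZE 64u

/* FIG. 4A layout: bits 0-1 hold the MCD unique ID, bits 2-7 the boundary. */
static inline uint8_t mcd_meta_pack(uint8_t id2, uint8_t boundary6) {
    return (uint8_t)((id2 & 0x3u) | ((boundary6 & 0x3Fu) << 2));
}
static inline uint8_t mcd_meta_id(uint8_t meta)       { return meta & 0x3u; }
static inline uint8_t mcd_meta_boundary(uint8_t meta) { return (meta >> 2) & 0x3Fu; }

/* First interpretation: the boundary counts the illegal trailing bytes, so
 * the last legal byte offset in the block is 63 - boundary. */
static inline uint8_t mcd_last_legal_offset(uint8_t meta) {
    return (uint8_t)((MCD_BLOCK_SIZE - 1) - mcd_meta_boundary(meta));
}

/* Second interpretation: the boundary counts the legal bytes from byte 0,
 * so offsets 0 .. boundary-1 are legal. */
static inline int mcd_offset_is_legal_alt(uint8_t meta, uint8_t offset) {
    return offset < mcd_meta_boundary(meta);
}
```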
Fig. 4B illustrates another MCD metadata word 430 associated with the memory block 416, according to one embodiment. In one exemplary embodiment, the memory block 416 may be 64 bytes (e.g., from byte 0 to byte 63) and the MCD metadata word 430 may be 2 bytes in size. The memory block 416 may include a legal access portion 412 of memory that may be a first length 418 and an illegal access portion 414 of memory that may be a second length 420, where the first length 418 plus the second length 420 may be 64 bytes. The MCD metadata word 430 may include: an MCD unique ID 402 that may be a third length 408; an MCD boundary value 404 that may be a fourth length 410; first reserved bits 422, which may be a fifth length 424, wherein the first reserved bits 422 may be located between the MCD unique ID 402 and the MCD boundary value 404; and second reserved bits 426, which may be a sixth length 428, wherein the second reserved bits 426 may be located after the MCD boundary value 404. In this example, the MCD metadata word 430 may be 16 bits in length, where the third length 408 may be 6 bits (located in bits 0-5), the fifth length 424 may be 2 bits (located in bits 6-7), the fourth length 410 may be 6 bits (located in bits 8-13), and the sixth length 428 may be 2 bits (located in bits 14-15). The total number of bits used in the MCD metadata word 430 is the size of the MCD unique ID 402 (length 408) plus the size of the MCD boundary value 404 (length 410), which in this example is 12 bits. Because standard computer memory accesses are made in whole bytes (8 bits), the 12 used bits may be padded to 16 bits with reserved bits (e.g., 4 reserved bits). For example, the reserved bits may be reserved bits 422 having a length 424 and reserved bits 426 having a length 428. The arrangement and size of the bits of the MCD metadata word 430 are not intended to be limiting, and the reserved bits may be any size set by the processor or an allocation library of the MCD system.
In one embodiment, the MCD boundary value 404 may indicate the size of the illegal access portion 414 of the memory. In one example, the processor or allocation library may determine the location of the MCD boundary between the legal access portion 412 of memory and the illegal access portion 414 of memory by subtracting the size of the illegal access portion 414 of memory from the last byte of the memory block 416. For example, the processor or allocation library may determine the location of the MCD boundary by subtracting the MCD boundary value from the last byte (byte 63) in the 64-byte memory block 416. In another embodiment, the MCD boundary value 404 may indicate the size of the legal access portion 412 of memory. In one example, the processor or allocation library may determine the location of the MCD boundary between the legal access portion 412 of memory and the illegal access portion 414 of memory by adding the size of the legal access portion 412 of memory to the first byte of the memory block. For example, the processor or allocation library may determine the location of the MCD boundary by adding the MCD boundary value to the first byte (byte 0) in the 64-byte memory block 416. The memory block size and/or MCD metadata word size are not intended to be limiting, and the memory block size and/or MCD metadata word size may be any size set by the processor or software library of the MCD system.
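The 2-byte layout of FIG. 4B can be packed the same way; here the 6-bit ID sits in bits 0-5 and the 6-bit boundary in bits 8-13, with the reserved bits left zero (helper names are hypothetical):

```c
#include <stdint.h>

/* FIG. 4B layout: bits 0-5 MCD unique ID, bits 6-7 reserved, bits 8-13 MCD
 * boundary value, bits 14-15 reserved (left as zero here). */
static inline uint16_t mcd_meta16_pack(uint8_t id6, uint8_t boundary6) {
    return (uint16_t)((id6 & 0x3Fu) | ((uint16_t)(boundary6 & 0x3Fu) << 8));
}
static inline uint8_t mcd_meta16_id(uint16_t meta)       { return (uint8_t)(meta & 0x3Fu); }
static inline uint8_t mcd_meta16_boundary(uint16_t meta) { return (uint8_t)((meta >> 8) & 0x3Fu); }
```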
FIG. 5A illustrates a first memory block 540 and a second memory block 562 allocated to a memory object 560 according to one embodiment. In an exemplary embodiment, an application may request a 100-byte memory object 560, where the memory object 560 may include a first memory block 540 and a second memory block 562. In this embodiment, each of the first memory block 540 and the second memory block 562 may be 64 bytes. The first MCD metadata word 530 may be associated with the first memory block 540 and the second MCD metadata word 550 may be associated with the second memory block 562.
When a processor or allocation library allocates 100 bytes for the memory object 560, the processor or allocation library may allocate all of the memory (64 bytes) of the first memory block 540 for storing data. The first MCD metadata word 530 associated with the first memory block 540 may have a first MCD boundary value 504 of 0, indicating that all bytes (64 bytes) of the first memory block 540 may be legal (e.g., available). The processor or software library may allocate 36 bytes of memory of the second memory block 562 for storing data (64 bytes + 36 bytes = 100 bytes in total). A second MCD metadata word 550 associated with the second memory block 562 may have a second MCD boundary value 546 of 28. A second MCD boundary value 546 of 28 may indicate that the first 36 consecutive bytes of memory block 562 may be legal and the last 28 consecutive bytes may be illegal. The memory block size and/or MCD metadata word size are not intended to be limiting, and the memory block size and/or MCD metadata word size may be any size set by the processor or software library of the MCD system.
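A quick arithmetic check of this example (illustrative only):

```c
#include <assert.h>
#include <stdio.h>

/* FIG. 5A example: a 100-byte request spans two 64-byte blocks. The first
 * block is fully used (boundary 0); the second uses 36 bytes, leaving
 * 64 - 36 = 28 illegal trailing bytes (boundary 28). */
int main(void) {
    unsigned request = 100, block_size = 64;
    unsigned blocks        = (request + block_size - 1) / block_size;   /* = 2  */
    unsigned used_in_last  = request - (blocks - 1) * block_size;       /* = 36 */
    unsigned boundary_last = block_size - used_in_last;                 /* = 28 */

    assert(blocks == 2 && used_in_last == 36 && boundary_last == 28);
    printf("blocks=%u, last-block boundary=%u\n", blocks, boundary_last);
    return 0;
}
```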
FIG. 5B depicts a flowchart of a method 500 for checking out-of-range memory accesses in memory, according to one embodiment. Method 500 may be performed by a computer system that may comprise hardware (e.g., circuitry, dedicated logic, and/or programmable logic), software (e.g., instructions executable on a computer system to perform hardware simulation), or a combination thereof. The method 500 and/or each of its functions, routines, subroutines, or operations may be performed by one or more physical processors of a computer system executing the method. Two or more functions, routines, subroutines, or operations of method 500 may be performed in parallel or in an order different from that described above. In some implementations, the method 500 may be performed by a single processing thread. Alternatively, the method 500 may be performed by two or more processing threads, each thread performing one or more individual functions, routines, subroutines, or operations of the method. In an illustrative example, the processing threads implementing method 500 may be synchronized (e.g., using semaphores, critical sections, and/or other thread synchronization mechanisms). Alternatively, the processing threads implementing method 500 may be executed asynchronously with respect to each other.
Referring to FIG. 5B, the method 500 may begin with the processor or software library receiving a memory access request from an application to access data of a memory object having contiguous memory blocks (block 570). The memory access request may include a pointer indicating a location of a start of a memory object that the application may request access to. The pointer may also include a first MCD unique ID of the memory object. The method may include the processor or software library retrieving data (e.g., MCD metadata) stored in the contiguous memory block based on the location indicated by the pointer (block 572). The method may include retrieving allocation information associated with the contiguous memory block from an MCD table (block 574). The allocation information may include a second MCD unique identifier associated with the contiguous memory block and an MCD boundary value indicating a size of the first memory region of the contiguous memory block. The method may include the processor or software library comparing the first MCD unique ID to the second MCD unique ID to determine when the retrieved data is from the memory object indicated by the pointer (block 576). The method may include the processor or software library determining when the retrieved data is from an available region of memory based on the allocation information (block 578). The method may include the processor or software library sending an error message to the application based on the allocation information when an error event associated with the retrieved data occurs (block 580). The error events may include: a mismatch between the first MCD unique ID and the second MCD unique ID or an access to an illegal region within a memory block of the memory object.
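A hedged sketch of the checks in blocks 570-580, assuming the 1-byte metadata layout of FIG. 4A (2-bit ID, 6-bit boundary interpreted as illegal trailing bytes) and the pointer bit layout assumed earlier; the function and names are illustrative, not the patented hardware interface:

```c
#include <stddef.h>
#include <stdint.h>

#define MCD_BLOCK_SIZE  64u
#define MCD_COLOR_SHIFT 58u   /* assumed pointer bit position for the color */

enum mcd_status { MCD_OK, MCD_COLOR_MISMATCH, MCD_ILLEGAL_REGION };

/* For every byte touched by the access, compare the color stored for that
 * block with the pointer's color (block 576), then check the byte offset
 * against the block's boundary (block 578); report an error event (block 580). */
enum mcd_status mcd_check_access(const uint8_t *mcd_table, uint64_t ptr, size_t len) {
    uint8_t  ptr_color = (uint8_t)((ptr >> MCD_COLOR_SHIFT) & 0x3u);
    uint64_t addr      = ptr & ((1ull << MCD_COLOR_SHIFT) - 1);

    for (uint64_t a = addr; a < addr + len; a++) {
        uint8_t meta     = mcd_table[a / MCD_BLOCK_SIZE];
        uint8_t id       = meta & 0x3u;           /* stored MCD unique ID   */
        uint8_t boundary = (meta >> 2) & 0x3Fu;   /* illegal trailing bytes */
        uint8_t offset   = (uint8_t)(a % MCD_BLOCK_SIZE);

        if (id != ptr_color)
            return MCD_COLOR_MISMATCH;            /* underflow or overflow past the object   */
        if (offset > (uint8_t)((MCD_BLOCK_SIZE - 1) - boundary))
            return MCD_ILLEGAL_REGION;            /* byte-granular overflow inside the block */
    }
    return MCD_OK;
}
```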
Each memory block of the memory may have a different MCD unique ID. In one example, an application may underflow a memory block access (e.g., an out-of-range access before the start of the current memory block) when storing data in memory or reading data from memory. When the application underflows a memory block, the processor or software library may detect the underflow when the first MCD unique ID of the current memory block does not match the second MCD unique ID of the previous memory block.
In another example, an application may overflow a memory block access (e.g., an out-of-range access past the end of the current memory block) when storing data in memory or reading data from memory. When the application overflows a memory block, the processor or software library may detect the overflow. In one example, when an application overflows into an illegal access region of the current memory block, the processor or software library may determine that the access is in the illegal access region based on the MCD boundary value (as discussed in the preceding paragraphs) associated with the last memory block of the memory object. For example, in FIG. 5A, the processor or software library may use the MCD boundary value 546 to determine that the illegal access region spans bytes 36 through 63. In this example, when an application attempts to access any of the bytes between byte 36 and byte 63, the processor or software library may determine that the application has overflowed into the illegal access region 534.
In another example, when an application attempts to access a byte beyond the current memory object, the processor or software library may determine that the application has overflowed into the next memory object based on a mismatch between the first MCD unique ID of the current memory block and the third MCD unique ID of the next memory block. For example, in fig. 5A, when an application attempts to access memory beyond byte 63 of memory block 562, the processor or software library may determine that the application has overflowed because the first MCD unique ID 542 of memory block 562 may not match the third MCD unique ID of the next memory block. As discussed in the preceding paragraphs, an advantage of the processor or software library detecting an underflow or overflow in this manner is that underflows or overflows as small as 1 byte can be detected.
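As a toy demonstration of the underflow case (illustrative values only, using the 1-byte metadata layout assumed above):

```c
#include <assert.h>
#include <stdint.h>

/* Two adjacent 64-byte blocks belonging to different objects carry different
 * colors. A 1-byte underflow from the second object touches the previous
 * block, whose stored color no longer matches the pointer's color. */
int main(void) {
    uint8_t mcd_table[2];
    mcd_table[0] = 0x1;          /* previous object's block: color 1, boundary 0 */
    mcd_table[1] = 0x2;          /* current object's block:  color 2, boundary 0 */

    uint8_t  ptr_color   = 0x2;  /* pointer into the current object              */
    uint64_t underflowed = 0;    /* block index touched by the underflow         */

    assert((mcd_table[underflowed] & 0x3u) != ptr_color);  /* mismatch => error event */
    return 0;
}
```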
In one example, an MCD metadata word corresponding to a blank region in memory (such as an all-zeros region) may allow software to use any canonical pointer to access that region. For example, a non-MCD pointer may access any data in memory when a blank or empty region in memory is accessed. An advantage of allowing non-MCD pointers to access any data in memory may be to avoid changing the behavior of legacy programs (e.g., programs not configured for the MCD architecture).
FIG. 6A is a block diagram illustrating a microarchitecture of a processor 600 that implements secure memory repartitioning, according to one embodiment. In particular, FIG. 6A depicts an in-order architecture core and register renaming logic and out-of-order issue/execution logic included in a processor in accordance with at least one embodiment of the present disclosure. Embodiments of page addition and content copying may be implemented in the processor 600.
The processor 600 comprises a front end unit 630 coupled to an execution engine unit 650, and both coupled to a memory unit 670. Processor 600 may include Reduced Instruction Set Computing (RISC) cores, Complex Instruction Set Computing (CISC) cores, Very Long Instruction Word (VLIW) cores, or hybrid or alternative core types. As yet another option, processor 600 may include dedicated cores, such as network or communication cores, compression engines, graphics cores, and so forth. In one embodiment, processor 600 may be a multi-core processor or may be part of a multi-processor system.
The front end unit 630 includes a branch prediction unit 632 coupled to an instruction cache unit 634, the instruction cache unit 634 coupled to an instruction Translation Lookaside Buffer (TLB) 636, the instruction TLB 636 coupled to an instruction fetch unit 638, and the instruction fetch unit 638 coupled to a decode unit 640. The decode unit 640 (also referred to as a decoder) may decode instructions and generate as output one or more micro-operations, microcode entry points, microinstructions, other instructions, or other control signals decoded from, or otherwise reflective of, the original instructions. The decoder 640 may be implemented using a variety of different mechanisms. Examples of suitable mechanisms include, but are not limited to: look-up tables, hardware implementations, Programmable Logic Arrays (PLAs), microcode read-only memories (ROMs), and the like. The instruction cache unit 634 is further coupled to the memory unit 670. The decode unit 640 is coupled to a rename/allocator unit 652 in the execution engine unit 650.
Execution engine unit 650 includes a rename/allocator unit 652 coupled to a retirement unit 654, and a set of one or more scheduler units 656. Scheduler unit 656 represents any number of different schedulers, including a Reservation Station (RS), a central instruction window, etc. Scheduler unit 656 is coupled to physical register file unit 658. Each physical register file unit 658 represents one or more physical register files, where different physical register files store one or more different data types (such as scalar integer, scalar floating point, packed integer, packed floating point, vector integer, vector floating point, etc.), states (such as an instruction pointer that is the address of the next instruction to be executed), and so forth. Physical register file unit 658 overlaps retirement unit 654 to illustrate the various ways in which register renaming and out-of-order execution may be implemented (e.g., using reorder buffers and retirement register files; using future files, history buffers, and retirement register files; using register maps and register pools, etc.).
Typically, architectural registers are visible from outside the processor or from the programmer's perspective. These registers are not limited to any particular circuit type known. Various different types of registers are applicable as long as they are capable of storing and providing the data described herein. Examples of suitable registers include, but are not limited to: dedicated physical registers, dynamically allocated physical registers using register renaming, a combination of dedicated physical registers and dynamically allocated physical registers, and the like. Retirement unit 654 and physical register file unit 658 are coupled to execution clusters 660. The execution cluster 660 includes a set of one or more execution units 662 and a set of one or more memory access units 664. The execution units 662 may perform various operations (e.g., shifts, additions, subtractions, multiplications) and operate on various types of data (e.g., scalar floating point, packed integer, packed floating point, vector integer, vector floating point).
While some embodiments may include several execution units dedicated to a particular function or group of functions, other embodiments may include only one execution unit or multiple execution units all performing all functions. The scheduler unit 656, physical register file unit 658, and execution cluster 660 are shown as possibly plural in that certain embodiments create separate pipelines for certain types of data/operations (e.g., a scalar integer pipeline, a scalar floating point/packed integer/packed floating point/vector integer/vector floating point pipeline, and/or a memory access pipeline, each having its own scheduler unit, physical register file unit, and/or execution cluster-and in the case of a separate memory access pipeline, certain embodiments are implemented in which only the execution cluster of this pipeline has a memory access unit 664). It should also be understood that where separate pipelines are used, one or more of these pipelines may be issued/executed out-of-order, while the rest are in-order.
The set of memory access units 664 are coupled to memory units 670, and memory units 670 may include a data prefetcher 680, a data TLB unit 672, a Data Cache Unit (DCU)674, and a level 2 (L2) cache unit 676, to name a few examples. In some embodiments, DCU 674 is also referred to as a first level data cache (L1 cache). The DCU 674 can handle multiple outstanding cache misses and continue to service incoming stores and loads. It also supports maintaining cache coherency. The data TLB unit 672 is a cache used to improve virtual address translation speed by mapping virtual and physical address spaces. In one exemplary embodiment, the memory access unit 664 may include a load unit, a store address unit, and a store data unit, each of which is coupled to the data TLB unit 672 in the memory unit 670. L2 cache element 676 may be coupled to one or more other levels of cache and ultimately to main memory.
In one embodiment, the data prefetcher 680 speculatively loads/prefetches data to the DCU 674 by automatically predicting which data a program is about to consume. Prefetching may refer to transferring data stored in one memory location (e.g., position) of a memory hierarchy (e.g., lower-level caches or memory) to a higher-level memory location that is closer to (e.g., yields lower access latency for) the processor before the data is actually demanded by the processor. More specifically, prefetching may refer to the early retrieval of data from one of the lower-level caches/memory to a data cache and/or prefetch buffer before the processor issues a demand for the specific data being returned.
The processor 600 may support one or more instruction sets, such as the x86 instruction set (with some extensions that have been added with newer versions), the MIPS instruction set of MIPS Technologies of Sunnyvale, California, and the ARM instruction set (with optional additional extensions such as NEON) of ARM Holdings of Sunnyvale, California.
It should be appreciated that a core may support multithreading (performing two or more parallel sets of operations or threads), and may do so in a variety of ways, including time-sliced multithreading, simultaneous multithreading (where a single physical core provides a logical core for each of the threads that the physical core is simultaneously multithreading), or a combination thereof (e.g., time-sliced fetching and decoding and simultaneous multithreading thereafter, such as in Intel® Hyper-Threading Technology).
Although register renaming is described in the context of out-of-order execution, it should be understood that register renaming may be used in an in-order architecture. While the illustrated embodiment of the processor also includes separate instruction and data cache units and a shared L2 cache unit, alternative embodiments may have a single internal cache for both instructions and data, such as, for example, a level one (L1) internal cache or multiple levels of internal cache. In some embodiments, a system may include a combination of internal caches and external caches that are external to the core and/or processor. Alternatively, all of the cache may be external to the core and/or the processor.
FIG. 6B is a block diagram illustrating an in-order pipeline and a register renaming stage, out-of-order issue/execution pipeline implemented by processor 600 of FIG. 6A according to some embodiments of the invention. The solid line boxes in FIG. 6B show an in-order pipeline, while the dashed line boxes show a register renaming, out-of-order issue/execution pipeline. In FIG. 6B, the processor pipeline 600 includes a fetch stage 602, a length decode stage 604, a decode stage 606, an allocation stage 608, a rename stage 610, a scheduling (also known as dispatch or issue) stage 612, a register read/memory read stage 614, an execute stage 616, a write back/memory write stage 618, an exception handling stage 622, and a commit stage 624. In some embodiments, the order of stages 602-624 may be different than that shown and is not limited to the particular order shown in FIG. 6B.
FIG. 7 illustrates a block diagram of a microarchitecture of a processor 700 that includes logic circuitry to perform secure memory repartitioning, according to one embodiment. In some embodiments, an instruction according to one embodiment may be implemented to operate on data elements having a byte size, word size, double word size, quad word size, etc., and having a number of data types (e.g., single and double precision integer and floating point data types). In one embodiment, in-order front end 701 is the portion of processor 700 that fetches instructions to be executed and prepares them for later use in the processor pipeline. Embodiments of page addition and content copying may be implemented in processor 700.
The front end 701 may include several units. In one embodiment, the instruction prefetcher 716 fetches instructions from memory and feeds these instructions to the instruction decoder 718, which in turn decodes or interprets the instructions. For example, in one embodiment, the decoder decodes a received instruction into one or more operations called "microinstructions" or "micro-operations" (also called micro-ops or uops) that the machine can execute. In other embodiments, the decoder parses the instruction into an opcode and corresponding data and control fields that may be used by the micro-architecture to perform operations in accordance with one embodiment. In one embodiment, the trace cache 730 takes decoded uops and combines them into program ordered sequences or traces in the uop queue 734 for execution. When the trace cache 730 encounters a complex instruction, the microcode ROM 732 provides the uops needed to complete the operation.
Some instructions are converted into a single micro-op, while others require several micro-ops to complete the full operation. In one embodiment, if more than four micro-ops are needed to complete an instruction, the decoder 718 accesses the microcode ROM 732 to execute the instruction. For one embodiment, an instruction may be decoded into a small number of micro-ops for processing at the instruction decoder 718. In another embodiment, an instruction may be stored in the microcode ROM 732 if a number of micro-ops are needed to complete the operation. The trace cache 730 references an entry point Programmable Logic Array (PLA) to determine the correct micro-instruction pointer for reading a micro-code sequence from the micro-code ROM 732 to complete one or more instructions according to one embodiment. After the microcode ROM 732 finishes sequencing micro-ops for an instruction, the front end 701 of the machine resumes fetching micro-ops from the trace cache 730.
Out-of-order execution engine 703 is where instructions are prepared for execution. The out-of-order execution logic has several buffers to smooth and reorder the instruction streams to optimize performance as they flow down the pipeline and are scheduled for execution. The allocator logic allocates the machine buffers and resources required for each uop to execute. The register renaming logic renames the logical registers to entries in a register file. The allocator also allocates an entry for each uop in one of two uop queues (one for memory operations and the other for non-memory operations) before the instruction schedulers (memory scheduler, fast scheduler 702, slow/general floating point scheduler 704 and simple floating point scheduler 706). The uop schedulers 702, 704, 706 determine when the uops are ready for execution based on the readiness of their dependent input register operand sources and the availability of execution resources required for the uops to complete their operations. Fast scheduler 702 for one embodiment may schedule on each half of the main clock cycle while the other schedulers may only schedule once per main processor clock cycle. The scheduler arbitrates for the dispatch ports to schedule the uops for execution.
In execution block 711, register files 708 and 710 are located between the schedulers 702, 704, and 706 and the execution units 712, 714, 716, 718, 720, 722, and 724. Separate register files 708, 710 are present for integer and floating point operations, respectively. Each register file 708, 710 in one embodiment also includes a bypass network that can bypass or forward just-completed results that have not yet been written into the register file to new dependent uops. The integer register file 708 and the floating point register file 710 are also capable of transferring data to each other. For one embodiment, the integer register file 708 is split into two separate register files, one register file for the low-order 32 bits of data and a second register file for the high-order 32 bits of data. The floating point register file 710 in one embodiment has 128-bit wide entries because floating point instructions typically have operands from 64 to 128 bits in width.
The execution block 711 includes the execution units 712, 714, 716, 718, 720, 722, and 724, in which the instructions are actually executed. This block includes the register files 708 and 710 that store the integer and floating point data operand values needed for execution of the micro-instructions. The processor 700 for one embodiment comprises a number of execution units: an Address Generation Unit (AGU) 712, AGU 714, fast ALU 716, fast ALU 718, slow ALU 720, floating point ALU 722, and floating point move unit 724. For one embodiment, the floating point execution blocks 722 and 724 execute floating point, MMX, SIMD, SSE, or other operations. The floating point ALU 722 in one embodiment includes a 64-bit by 64-bit floating point divider to execute divide, square root, and remainder micro-ops. For embodiments of the present disclosure, floating point hardware may be utilized to process instructions involving floating point values.
In one embodiment, the ALU operations go to the high-speed ALU execution units 716 and 718. The fast ALUs 716 and 718 in one embodiment can execute fast operations with an effective latency of half a clock cycle. For one embodiment, most complex integer operations go to the slow ALU 720 because the slow ALU 720 includes integer execution hardware for long-latency type operations, such as a multiplier, shifts, flag logic, and branch processing. Memory load/store operations are executed by the AGUs 712 and 714. For one embodiment, the integer ALUs 716, 718, and 720 are described in the context of performing integer operations on 64-bit data operands. In alternative embodiments, the ALUs 716, 718, 720 may be implemented to support a variety of data bits, including 16, 32, 128, 256, and so on. Similarly, the floating point units 722, 724 may be implemented to support a range of operands having bits of various widths. For one embodiment, the floating point units 722, 724 may operate on 128-bit wide packed data operands in conjunction with SIMD and multimedia instructions.
In one embodiment, the uop schedulers 702, 704, and 706 dispatch dependent operations before the parent load has finished executing. Because uops are speculatively scheduled and executed in processor 700, the processor 700 also includes logic to handle memory misses. If a data load misses in the data cache, there may be dependent operations in flight in the pipeline that have left the scheduler with temporarily incorrect data. A replay mechanism tracks and re-executes instructions that use incorrect data. Only the dependent operations need to be replayed, while the independent operations are allowed to complete. The schedulers and replay mechanism of one embodiment of the processor may also be designed to catch instruction sequences for text string comparison operations.
According to one embodiment, processor 700 also includes logic to implement secure memory repartitioning. In one embodiment, execution block 711 of processor 700 may include MCU 115 to perform secure memory repartitioning as described herein.
The term "register" may refer to an on-board (on-board) processor memory location that is used as part of an instruction to identify an operand. In other words, registers may be those available from outside the processor (from the programmer's perspective). However, the registers in an embodiment should not be limited to meaning a particular type of circuit. Rather, the registers in an embodiment are capable of storing and providing data, and of performing the functions described herein. The registers described herein may be implemented by circuitry in a processor using any number of different techniques, such as dedicated physical registers, dynamically allocated physical registers using register renaming, a combination of dedicated and dynamically allocated physical registers, and so forth. In one embodiment, the integer register stores 32 bits of integer data. The register file of one embodiment also includes eight multimedia SIMD registers for packed data.
For the purposes of the discussion herein, the registers should be understood to be data registers designed to hold packed data, such as the 64-bit wide MMX™ registers (also referred to as "mm" registers in some instances) in microprocessors enabled with MMX technology from Intel Corporation of Santa Clara, California. These MMX registers, available in both integer and floating point forms, can operate with packed data elements that accompany SIMD and SSE instructions. Similarly, 128-bit wide XMM registers relating to SSE2, SSE3, SSE4, or other (collectively "SSEx") technologies may also be used to hold such packed data operands. In one embodiment, when storing packed data and integer data, the registers do not need to differentiate between the two data types. In one embodiment, integer and floating point data may be contained in the same register file or in different register files. Furthermore, in one embodiment, floating point and integer data may be stored in different registers or in the same registers.
Embodiments may be implemented in many different system types. Referring now to FIG. 8, shown is a block diagram of a multiprocessor system 800 in accordance with one implementation. As shown in FIG. 8, multiprocessor system 800 is a point-to-point interconnect system, and includes a first processor 870 and a second processor 880 coupled via a point-to-point interconnect 850. As shown in fig. 8, each of processors 870 and 880 may be multicore processors, including first and second processor cores (i.e., processor cores 874a and 874b and processor cores 884a and 884b), although potentially many more cores may be present in the processors. The processors may each include hybrid write mode logic in accordance with embodiments of the present disclosure. Embodiments of page addition and content copying may be implemented in processor 870, processor 880, or both.
Although shown with two processors 870, 880, it is to be understood that the scope of the present disclosure is not so limited. In other implementations, one or more additional processors may be present in a given processor.
Processors 870 and 880 are shown including integrated memory controller units 872 and 882, respectively. Processor 870 also includes, as part of its bus controller units, point-to-point (P-P) interfaces 876 and 878; similarly, the second processor 880 includes P-P interfaces 886 and 888. Processors 870, 880 may exchange information via a point-to-point (P-P) interface 850 using P-P interface circuits 878, 888. As shown in FIG. 8, IMCs 872 and 882 couple the processors to respective memories, namely a memory 832 and a memory 834, which may be portions of main memory locally attached to the respective processors.
Processors 870, 880 may each exchange information with a chipset 890 via individual P-P interfaces 852, 854 using point to point interface circuits 876, 894, 886, 898. Chipset 890 may also exchange information with a high-performance graphics circuit 838 via a high-performance graphics interface 839.
A shared cache (not shown) may be included within either processor, or outside of both processors but connected to the processors via a P-P interconnect, such that if a processor is placed in a low power mode, local cache information for either or both processors may be stored in the shared cache.
Chipset 890 may be coupled to a first bus 816 via an interface 896. In one embodiment, first bus 816 may be a Peripheral Component Interconnect (PCI) bus, or a bus such as a PCI express bus or another third generation IO interconnect bus, although the scope of the present disclosure is not so limited.
As shown in FIG. 8, various IO devices 814 may be coupled to first bus 816, along with a bus bridge 818, which couples first bus 816 to a second bus 820. In one embodiment, second bus 820 may be a Low Pin Count (LPC) bus. In one embodiment, various devices may be coupled to second bus 820 including, for example, a keyboard and/or mouse 822, communication devices 827 and a storage unit 828 (such as a disk drive or other mass storage device which may include instructions/code and data 830). Further, an audio IO 824 may be coupled to second bus 820. Note that other architectures are possible. For example, instead of the point-to-point architecture of fig. 8, a system may implement a multi-drop bus or other such architecture.
Referring now to fig. 9, shown is a block diagram of a third system 900 in accordance with an embodiment of the present invention. Like elements in fig. 8 and 9 bear like reference numerals, and certain aspects of fig. 8 have been omitted from fig. 9 to avoid obscuring other aspects of fig. 9.
Fig. 9 shows that processors 970, 980 may include integrated memory and IO control logic ("CL") 972 and 982, respectively. For at least one embodiment, the CL 972, 982 may include an integrated memory controller unit such as described herein. Additionally, the CL 972, 982 may also include IO control logic. Fig. 9 shows that memories 932, 934 are coupled to CLs 972, 982, and that IO device 914 is also coupled to control logic 972, 982. Legacy IO devices 915 are coupled to the chipset 990. Embodiments of page addition and content copying may be implemented in processor 970, processor 980, or both.
Fig. 10 is an exemplary system on a chip (SoC) that may include one or more of the cores 1002A-N. Other system designs and configurations known in the art for laptops, desktops, handheld PCs, personal digital assistants, engineering workstations, servers, network appliances, hubs, switches, embedded processors, Digital Signal Processors (DSPs), graphics devices, video game devices, set-top boxes, microcontrollers, cell phones, portable media players, handheld devices, and various other electronic devices are also suitable. In general, a wide variety of systems or electronic devices capable of incorporating the processors and/or other execution logic disclosed herein are suitable.
Referring now to fig. 10, shown is a block diagram of a SoC 1000 in accordance with one embodiment of the present disclosure. In addition, dashed boxes represent features on more advanced SoCs. In fig. 10, interconnect unit 1002 is coupled to: an application processor 1010 including a set of one or more cores 1002A-N and a shared cache unit 1006; a system agent unit 1010; a bus controller unit 1016; an integrated memory controller unit 1014; a set of one or more media processors 1020, which may include integrated graphics logic 1008, an image processor 1024 to provide still and/or video camera functionality, an audio processor 1026 to provide hardware audio acceleration, and a video processor 1028 to provide video encoding/decoding acceleration; a Static Random Access Memory (SRAM) unit 1030; a Direct Memory Access (DMA) unit 1032; and a display unit 1040 for coupling to one or more external displays. Embodiments of page addition and content copying may be implemented in SoC 1000.
Referring next to FIG. 11, an embodiment of a System On Chip (SOC) design is depicted, in accordance with embodiments of the present invention. As one illustrative example, SoC 1100 is included in a User Equipment (UE). In one embodiment, UE refers to any device used by an end user for communication, such as a handheld phone, a smart phone, a tablet, an ultra-thin notebook, a notebook with a broadband adapter, or any other similar communication device. The UE may be connected to a base station or node, which may essentially correspond to a Mobile Station (MS) in a GSM network. Embodiments of page addition and content copying may be implemented in SoC 1100.
Here, SOC 1100 includes two cores, 1106 and 1107. Similar to the discussion above, cores 1106 and 1107 may conform to an instruction set architecture, such as an Intel® Architecture Core™-based processor, an Advanced Micro Devices, Inc. (AMD) processor, a MIPS-based processor, an ARM-based processor design, or a customer thereof, as well as their licensees or adopters. Cores 1106 and 1107 are coupled to cache control 1108, which is associated with bus interface unit 1109 and L2 cache 1110, to communicate with other portions of system 1100. Interconnect 1111 includes an on-chip interconnect, such as IOSF, AMBA, or other interconnects discussed above, which may implement one or more aspects of the described disclosure.
The interconnect 1111 provides a communication channel to other components, such as a SIM 1130 that interfaces with a Subscriber Identity Module (SIM) card, a boot ROM 1135 that holds boot code for execution by the cores 1106 and 1107 to initialize and boot the SOC 1100, an SDRAM controller 1140 that interfaces with external memory (e.g., DRAM 1160), a flash controller 1145 that interfaces with non-volatile memory (e.g., flash memory 1165), a peripheral control device 1150 (e.g., serial peripheral interface) that interfaces with peripherals, a video codec 1120 and video interface 1125 for displaying and receiving input (e.g., touch-enabled input), a GPU 1115 for performing graphics-related computations, and so forth. Any of these interfaces may include aspects of the embodiments described herein.
In addition, the system shows peripherals used for communication, such as a Bluetooth module 1170, a 3G modem 1175, a GPS 1180, and Wi-Fi 1185. Note that, as described above, a UE includes a radio for communication, so these peripheral communication modules may not all be included. However, in a UE, some form of radio for external communication should be included.
Fig. 12 shows a diagrammatic representation of a machine in the exemplary form of a computer system 1200, within which computer system 1200 a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative implementations, the machine may be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, or the internet. The machine may operate in the capacity of a server or a client machine in client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a Personal Computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The exemplary computer system 1200 includes a processing device (processor) 1202, a main memory 1204 (e.g., Read Only Memory (ROM), flash memory, Dynamic Random Access Memory (DRAM), such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 1206 (e.g., flash memory, Static Random Access Memory (SRAM), etc.), and a data storage device 1218, which communicate with each other via a bus 1230.
Processor 1202 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More specifically, the processor 1202 may be a Complex Instruction Set Computing (CISC) microprocessor, Reduced Instruction Set Computing (RISC) microprocessor, Very Long Instruction Word (VLIW) microprocessor, or a processor implementing other instruction sets, or a processor implementing a combination of instruction sets. The processor 1202 may also be one or more special-purpose processing devices such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), network processor, or the like. The processor 1202 is configured to execute instructions 1226 for performing the operations and steps discussed herein.
Computer system 1200 may further include a network interface device 1222. Computer system 1200 may also include a video display unit 1208 (e.g., a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT), or a touch screen), an alphanumeric input device 1210 (e.g., a keyboard), a cursor control device 1214 (e.g., a mouse), and a signal generation device 1216 (e.g., a speaker).
The data storage device 1218 may include a computer-readable storage medium 1224 on which is stored one or more sets of instructions 1226 (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions 1226 may also reside, completely or at least partially, within the main memory 1204 and/or within the processor 1202 during execution thereof by the computer system 1200, with the main memory 1204 and the processor 1202 also constituting computer-readable storage media. The instructions 1226 may further be transmitted or received over a network 1220 via the network interface device 1222.
While the computer-readable storage medium 1224 is shown in an exemplary implementation to be a single medium, the term "computer-readable storage medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "computer-readable storage medium" shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. Accordingly, the term "computer readable storage medium" shall be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media. The following examples pertain to further embodiments.
In example 1, a processor includes: 1) a memory to store data from an application, wherein the memory includes a Memory Corruption Detection (MCD) table and a memory object; and 2) a processor core coupled to the memory, wherein the processor core is to: a) receive a memory access request from the application to access data of a memory object having contiguous memory blocks, wherein the memory access request comprises: i) a pointer indicating a location of the memory object in memory; and ii) a first MCD unique identifier (ID); b) retrieve data stored in the contiguous memory blocks of the memory object based on the location indicated by the pointer; c) retrieve allocation information associated with the contiguous memory blocks from the MCD table, wherein the allocation information comprises: i) a second MCD unique identifier associated with a contiguous memory block; and ii) an MCD boundary value indicating a size of a first memory region of the contiguous memory block; and d) send an error message to the application based on the allocation information when an error event associated with the retrieved data occurs.
In example 2, the processor of example 1, wherein the contiguous memory block comprises: 1) an available memory area and an unavailable memory area; and 2) the first memory region is an available memory region or an unavailable memory region.
In example 3, the processor of examples 1-2, wherein the processor core is further to: 1) comparing the first MCD unique ID with the second MCD unique ID to determine when the retrieved data is from the memory object indicated by the pointer; and 2) determining when the retrieved data is from an available area of memory based on the allocation information.
In example 4, the processor of examples 1-3, wherein the error event occurs when: 1) the first MCD unique ID does not match the second MCD unique ID; or 2) the memory access is within an unavailable memory region.
In example 5, the processor of examples 1-4, wherein the processor core is further to determine the available memory region by: 1) subtracting the MCD boundary value from the size value of the contiguous memory block to obtain a boundary position value; and 2) identifying an MCD boundary location in the contiguous memory block based on the boundary location value, wherein the MCD boundary location indicates a boundary between the available memory region and the unavailable memory region.
In example 6, the processor of examples 1-5, wherein the size of the contiguous memory block is 64 bytes.
In example 7, the processor of examples 1-6, wherein when the MCD boundary value is zero, all of the contiguous memory blocks are available memory regions.
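For illustration only, the following C sketch models in software the access check described in examples 1 through 7. The 64-byte block size, the 2-byte metadata word split into a 1-byte MCD unique ID and a 1-byte boundary value, and the rule that a zero boundary value marks a fully available block are taken from the examples; everything else (64-bit pointers, the ID carried in the upper pointer byte, the available region occupying the low offsets of the block, and the names mcd_table and mcd_check_access) is an assumption made to keep the sketch self-contained and is not the architected encoding.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MCD_BLOCK_SIZE 64u

/* One 2-byte MCD metadata word per 64-byte contiguous memory block. */
typedef struct {
    uint8_t id;        /* MCD unique ID written at allocation time       */
    uint8_t boundary;  /* MCD boundary value: size of the bounded region */
} mcd_meta_t;

extern mcd_meta_t mcd_table[];  /* MCD table, indexed by block number */

/* Returns true when the access may proceed; false signals the error
 * event (ID mismatch or access into the unavailable memory region). */
static bool mcd_check_access(uintptr_t tagged_ptr, size_t access_size)
{
    /* Assumed pointer encoding: MCD unique ID in the top byte. */
    uint8_t   ptr_id  = (uint8_t)(tagged_ptr >> 56);
    uintptr_t address = tagged_ptr & 0x00FFFFFFFFFFFFFFULL;

    size_t     block  = address / MCD_BLOCK_SIZE;
    size_t     offset = address % MCD_BLOCK_SIZE;
    mcd_meta_t meta   = mcd_table[block];

    /* Error event: the pointer's MCD ID does not match the ID stored
     * for the contiguous memory block (examples 3 and 4). */
    if (ptr_id != meta.id)
        return false;

    /* A boundary value of zero means the whole block is available
     * (example 7). */
    if (meta.boundary == 0)
        return true;

    /* Boundary position = block size - boundary value (example 5);
     * offsets at or beyond it are treated here as unavailable. */
    size_t boundary_pos = MCD_BLOCK_SIZE - meta.boundary;
    if (offset + access_size > boundary_pos)
        return false;  /* access reaches the unavailable memory region */

    return true;
}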
Embodiments may have different combinations of the structural features described above. For example, all optional features of the processors and methods described above may also be implemented with the systems described herein, and the details of the examples may be used anywhere in one or more embodiments.
In example 8, a processor includes: 1) a memory to store data from an application, wherein the memory includes a Memory Corruption Detection (MCD) table; and 2) a processor core coupled to the memory, wherein the processor core is to: a) receive an allocation request from the application to allocate one or more contiguous memory blocks in the memory to a memory object; b) allocate the one or more contiguous memory blocks to the memory object in view of the size of the requested memory object, wherein a contiguous memory block of the one or more contiguous memory blocks comprises a first memory region and a second memory region; and c) write an MCD metadata word to the MCD table, wherein the MCD metadata word includes: i) a first MCD unique identifier associated with the contiguous memory block; and ii) an MCD boundary value indicating a size of the first memory region of the contiguous memory block.
In example 9, the processor of example 8, wherein the first memory area is a used portion of the contiguous memory block.
In example 10, the processor of examples 8-9, wherein the first memory region is an unused portion of the contiguous memory block.
In example 11, the processor of examples 8-10, wherein the processor core is further to: 1) creating a pointer having a memory address of the memory object and a second MCD unique identifier associated with the memory object; and 2) sending the pointer to the application.
In example 12, the processor of examples 8-11, wherein the size of the contiguous memory block is 64 bytes.
In example 13, the processor of examples 8-12, wherein the MCD metadata word is 2 bytes in size, and wherein: 1) the MCD unique ID is 1 byte of the MCD metadata word, and 2) the MCD boundary value is 1 byte of the MCD metadata word.
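A corresponding allocation-side sketch, again for illustration only, follows the flow of examples 8 through 13: the requested size is rounded up to whole 64-byte blocks, a 2-byte metadata word (1-byte MCD unique ID plus 1-byte boundary value) is written for each block, and the application receives a pointer carrying the ID. The use of aligned_alloc, the per-allocation ID counter mcd_next_id, the tag placed in the upper pointer byte, and the choice to bound only the tail of the last block are assumptions of this sketch, not the architected behavior.

#include <stdint.h>
#include <stdlib.h>

#define MCD_BLOCK_SIZE 64u

typedef struct {
    uint8_t id;        /* 1 byte of the MCD metadata word (example 13) */
    uint8_t boundary;  /* 1 byte of the MCD metadata word (example 13) */
} mcd_meta_t;

extern mcd_meta_t mcd_table[];   /* MCD table, indexed by block number    */
extern uint8_t    mcd_next_id;   /* hypothetical per-allocation ID source */

void *mcd_alloc(size_t size)
{
    size_t  blocks = (size + MCD_BLOCK_SIZE - 1) / MCD_BLOCK_SIZE;
    uint8_t id     = mcd_next_id++;

    /* 64-byte alignment so that every block maps to one table entry. */
    void *mem = aligned_alloc(MCD_BLOCK_SIZE, blocks * MCD_BLOCK_SIZE);
    if (mem == NULL)
        return NULL;

    size_t first_block = (uintptr_t)mem / MCD_BLOCK_SIZE;
    size_t tail        = blocks * MCD_BLOCK_SIZE - size;  /* unused bytes in last block */

    for (size_t i = 0; i < blocks; i++) {
        mcd_table[first_block + i].id = id;
        /* Only the last block carries a non-zero boundary value; a value
         * of zero marks a fully available block. */
        mcd_table[first_block + i].boundary =
            (i == blocks - 1) ? (uint8_t)tail : 0;
    }

    /* Create the pointer for the application with the MCD unique ID
     * folded into the upper byte (example 11; encoding assumed). */
    return (void *)(((uintptr_t)id << 56) | (uintptr_t)mem);
}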
Embodiments may have different combinations of the structural features described above. For example, all optional features of the processors and methods described above may also be implemented with the systems described herein, and the details of the examples may be used anywhere in one or more embodiments.
In example 14, a system on a chip (SoC) includes: 1) a processor; 2) a memory device coupled to the processor for storing data from an application, wherein the memory comprises a Memory Corruption Detection (MCD) table and a memory object; and 3) a memory controller coupled to the memory device, the memory controller to: a) receive a memory access request from the application to access data of the memory object having contiguous memory blocks, wherein the memory access request comprises: i) a pointer indicating a location of the memory object in memory; and ii) a first MCD unique identifier (ID); b) retrieve data stored in the contiguous memory blocks based on the location indicated by the pointer; c) retrieve allocation information associated with the contiguous memory blocks from the MCD table, wherein the allocation information comprises: i) a second MCD unique identifier associated with a contiguous memory block; and ii) an MCD boundary value indicating a size of a first memory region of the contiguous memory block; d) determine when the retrieved data is from an available memory region based on the allocation information; and e) send the retrieved data to the application.
In example 15, the SoC of example 14, wherein the contiguous memory block comprises: 1) an available memory area and an unavailable memory area; and 2) the first memory region is an available memory region or an unavailable memory region.
In example 16, the SoC of examples 14-15, wherein when the MCD boundary value is zero, all of the contiguous memory blocks are available memory regions.
In example 17, the SoC of examples 14-16, wherein the memory controller is further to compare the first MCD unique ID to the second MCD unique ID to determine when the retrieved data is from the memory object indicated by the pointer.
In example 18, the SoC of examples 14-17, wherein the memory controller is further to send an error message to the application when an error event associated with the retrieved data occurs, wherein the error event occurs when: 1) the first MCD unique ID does not match the second MCD unique ID; or 2) the memory access is within an unavailable memory region.
In example 19, the SoC of examples 14-18, wherein the memory controller is further to determine the available memory region by: 1) subtracting the MCD boundary value from the size value of the contiguous memory block to obtain a boundary position value; and 2) identifying an MCD boundary location in the contiguous memory block based on the boundary location value, wherein the MCD boundary location indicates a boundary between the available memory region and the unavailable memory region.
In example 20, the SoC of examples 14-19, wherein the size of the contiguous memory block is 64 bytes.
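As a worked illustration of the boundary arithmetic in examples 18 and 19 (the numbers here are chosen purely for illustration): if a 50-byte memory object occupies a single 64-byte contiguous memory block, the allocator may record an MCD boundary value of 14. The memory controller then obtains the boundary position value as 64 - 14 = 50, so byte offsets 0 through 49 of the block form the available memory region and offsets 50 through 63 form the unavailable memory region; an access that reaches offset 50 or beyond, or that presents a mismatching MCD unique ID, triggers the error event of example 18.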
Embodiments may have different combinations of the structural features described above. For example, all optional features of the processors and methods described above may also be implemented with the systems described herein, and the details of the examples may be used anywhere in one or more embodiments.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.
In the description herein, numerous specific details are set forth, such as examples of specific types of processors and system configurations, specific hardware structures, specific architectural and microarchitectural details, specific register configurations, specific instruction types, specific system components, specific measurements/heights, specific processor pipeline stages and operations, etc., in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that these specific details need not be employed to practice the present invention. In other instances, well known components or methods, such as specific or alternative processor architectures, specific logic circuits/code for described algorithms, specific firmware code, specific interconnect operation, specific logic configurations, specific manufacturing techniques and materials, specific compiler implementations, specific expressions of algorithms in code, specific power down and gating techniques/logic, and other specific operational details of computer systems have not been described in detail in order to avoid unnecessarily obscuring the present invention.
Embodiments are described with reference to secure memory repartitioning within a particular integrated circuit, such as within a computing platform or microprocessor. Embodiments may also be applicable to other types of integrated circuits and programmable logic devices. For example, the disclosed embodiments are not limited to desktop computer systems or portable computers, such as the Intel® Ultrabooks™ computers, and may also be used in other devices, such as handheld devices, tablets, other thin notebooks, system on a chip (SOC) devices, and embedded applications. Some examples of handheld devices include cellular telephones, internet protocol devices, digital cameras, Personal Digital Assistants (PDAs), and handheld PCs. Embedded applications typically include microcontrollers, Digital Signal Processors (DSPs), systems on a chip, network computers (netPCs), set-top boxes, network hubs, Wide Area Network (WAN) switches, or any other system that can perform the functions and operations taught below. The system described may be any type of computer or embedded system. The disclosed embodiments may be particularly useful for low-end devices, such as wearable devices (e.g., watches), electronic implants, sensing and control infrastructure devices, controllers, supervisory control and data acquisition (SCADA) systems, and so forth. Furthermore, the apparatus, methods, and systems described herein are not limited to physical computing devices, but may also relate to software optimization for energy conservation and efficiency. As will become apparent in the following description, embodiments of the methods, apparatus, and systems described herein (whether in terms of hardware, firmware, software, or a combination thereof) are vital to a 'green technology' future balanced with performance considerations.
Although various embodiments herein are described with reference to a processor, other embodiments are applicable to other types of integrated circuits and logic devices. Similar techniques and teachings of embodiments of the invention may be applied to other types of circuits or semiconductor devices that may benefit from higher pipeline throughput and improved performance. The teachings of embodiments of the present invention are applicable to any processor or machine that performs data manipulation. However, the present invention is not limited to processors or machines that perform 512-bit, 256-bit, 128-bit, 64-bit, 32-bit, or 16-bit data operations, and may be applicable to any processor or machine that performs data manipulation or management. Further, the description herein provides examples, and the accompanying drawings illustrate various examples for illustrative purposes. However, these examples should not be construed in a limiting sense as they are intended to provide only examples of embodiments of the invention and are not intended to provide an exhaustive list of all possible implementations of embodiments of the invention.
While the following examples describe instruction processing and distribution in the context of execution units and logic circuits, other embodiments of the invention may also be implemented by data and/or instructions stored on a machine-readable tangible medium, which when executed by a machine, cause the machine to perform functions consistent with at least one embodiment of the invention. In one embodiment, the functionality associated with embodiments of the invention is embodied in machine-executable instructions. These instructions may be used to cause a general-purpose processor or special-purpose processor that is programmed with the instructions to perform the steps of the present invention. Embodiments of the present invention may also be provided as a computer program product or software which may include a machine or computer-readable medium having stored thereon instructions which may be used to program a computer (or other electronic devices) to perform one or more operations according to embodiments of the present invention. Alternatively, various operations of embodiments of the present invention may be performed by specific hardware components that contain fixed-function logic for performing the operations, or by any combination of programmed computer components and fixed-function hardware components.
Instructions for programming logic to perform various embodiments of the present invention may be stored within a memory (such as DRAM, cache, flash, or other storage device) in a system. Further, the instructions may be distributed via a network or by other computer readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memory (CD-ROM), magneto-optical disks, read-only memory (ROM), Random Access Memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable storage used in transmitting information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a computer-readable medium includes any type of tangible machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
A design may go through multiple stages, from creation to simulation to fabrication. The data representing the design may represent the design in a number of ways. First, as is useful in simulations, the hardware may be represented using a hardware description language or another functional description language. Furthermore, a circuit level model with logic and/or transistor gates may be generated at some stages of the design process. Furthermore, most designs, at some stage, reach a level of data representing the physical placement of various devices in the hardware model. In the case where conventional semiconductor fabrication techniques are used, the data representing the hardware model may be the data specifying the presence or absence of various features on different mask layers for masks used to produce the integrated circuit. In any design representation, the data may be stored in any form of a machine-readable medium. A memory or magnetic or optical storage device (such as a disk) may be a machine-readable medium that stores information transmitted via optical or electrical waves, which are modulated or otherwise generated to transmit the information. When an electrical carrier wave indicating or carrying the code or design is transmitted, to the extent that copying, buffering, or re-transmission of the electrical signal is achieved, a new copy is made. Accordingly, a communication provider or a network provider may store, at least temporarily, an article of manufacture (such as information encoded into a carrier wave) embodying techniques of various embodiments of the invention on a tangible, machine-readable medium.
A module, as used herein, refers to any combination of hardware, software, and/or firmware. By way of example, a module includes hardware, such as a microcontroller, associated with a non-transitory medium for storing code adapted to be executed by the microcontroller. Thus, in one embodiment, reference to a module refers to hardware specifically configured to identify and/or execute code to be stored on non-transitory media. Additionally, in another embodiment, the use of a module refers to a non-transitory medium including code specifically adapted to be executed by a microcontroller to perform predetermined operations. And it may be inferred that in yet another embodiment, the term module (in this example) may refer to a combination of a microcontroller and a non-transitory medium. In general, module boundaries that are shown as separate are generally different and may overlap. For example, the first and second modules may share hardware, software, firmware, or a combination thereof, while possibly retaining some separate hardware, software, or firmware. In one embodiment, use of the term logic includes hardware such as transistors, registers, or other hardware such as programmable logic devices.
In one embodiment, use of the phrase "configured to" refers to arranging, bringing together, manufacturing, offering for sale, importing and/or designing a device, hardware, logic or element to perform a specified and/or determined task. In this example, a device that is not operating, or an element thereof, is still "configured to" perform the specified task(s), if it is designed, coupled, and/or interconnected to perform the specified task(s). As a purely illustrative example, during operation, a logic gate may provide either a 0 or a 1. But logic gates "configured to" provide an enable signal to the clock do not include every potential logic gate that can provide a 1 or a 0. Rather, the logic gate is a logic gate that is coupled in some manner that the output of a 1 or 0 is used to enable the clock during operation. It is again noted that use of the term "configured to" does not require operation, but instead focuses on the potential state of a device, hardware, and/or element in which the device, hardware, and/or element is designed to perform a particular task while the device, hardware, and/or element is operating.
Furthermore, in one embodiment, use of the phrases "for," "capable of/capable of being used for" and/or "capable of being used for" refers to some apparatus, logic, hardware, and/or elements designed in such a way as to enable use of the apparatus, logic, hardware, and/or elements in a specified manner. As noted above, in one embodiment, a use for, capable of, or available for use refers to a potential state of a device, logic, hardware, and/or element that is not operating but is designed in such a way as to enable use of the device in a specified manner.
A value, as used herein, includes any known representation of a number, state, logic state, or binary logic state. In general, the use of a logic level, logic value, or multiple logic values is also referred to as 1 and 0, which simply represents a binary logic state. For example, 1 refers to a logic high level and 0 refers to a logic low level. In one embodiment, a memory cell, such as a transistor or flash memory cell, can hold a single logic value or multiple logic values. However, other representations of values in a computer system are also used. For example, the decimal number ten may also be represented as the binary value 1010 and the hexadecimal letter A. Accordingly, a value includes any representation of information that can be stored in a computer system.
Also, a state may also be represented by a value or a portion of a value. By way of example, a first value, such as a logic 1, may represent a default or initial state, while a second value, such as a logic 0, may represent a non-default state. Further, in one embodiment, the terms reset and set refer to default and updated values or states, respectively. For example, the default value potentially comprises a high logic value, i.e., reset, while the updated value potentially comprises a low logic value, i.e., set. Note that any combination of values may be used to represent any number of states.
The embodiments of methods, hardware, software, firmware, or code described above may be implemented via instructions or code stored on a machine-accessible, machine-readable, computer-accessible, or computer-readable medium that are executable by a processing element. A non-transitory machine-accessible/readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine, such as a computer or electronic system. For example, a non-transitory machine-accessible medium includes: random Access Memory (RAM), such as static RAM (sram) or dynamic RAM (dram); a ROM; a magnetic or optical storage medium; a flash memory device; an electrical storage device; an optical storage device; an acoustic storage device; other forms of storage devices for holding information received from transient (propagated) signals (e.g., carrier waves, infrared signals, digital signals); and the like, as distinguished from non-transitory media from which information may be received.
Instructions used to program logic to perform embodiments of the present invention may be stored in memory (such as DRAM, cache, flash, or other memory) in the system. Further, the instructions may be distributed via a network or by other computer readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memory (CD-ROM), magneto-optical disks, read-only memory (ROM), Random Access Memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable storage used in transmitting information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a computer-readable medium includes any type of tangible machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In the foregoing specification, a detailed description has been given with reference to specific exemplary embodiments. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. Moreover, the foregoing use of embodiment and other exemplary language does not necessarily refer to the same embodiment or the same example, but may refer to different and distinct embodiments, as well as potentially the same embodiment.
Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. The blocks described herein may be hardware, software, firmware, or a combination thereof.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as "defining," "receiving," "determining," "issuing," "linking," "associating," "obtaining," "authenticating," "blocking," "executing," "requesting," "passing," or the like, refer to the actions and/or processes of a computing system, or similar electronic computing device, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computing system's registers and memories into other data similarly represented as physical quantities within the computing system memories or registers or other such information storage, transmission and/or display devices. The word "example" or "exemplary" may be used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as "example" or "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word "example" or "exemplary" is intended to present concepts in a concrete fashion. As used in this application, the term "or" means an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise, or clear from context, "X includes A or B" means any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then "X includes A or B" is satisfied under any of the above examples. In addition, the articles "a" and "an" as used in this application and the appended claims should generally be construed to mean "one or more" unless specified otherwise or clear from context to be directed to a singular form. Furthermore, the use of the terms "embodiment" or "one embodiment" or "an implementation" or "one implementation" throughout is not intended to mean the same embodiment or implementation unless so described. Furthermore, the terms "first," "second," "third," "fourth," and the like, as used herein, are intended to be used as labels to distinguish between different elements, and may not necessarily have the sequential meaning dictated by their number.

Claims (17)

1. A processor, comprising:
a memory to store data from an application, wherein the memory comprises a Memory Corruption Detection (MCD) table and a memory object; and
a processor core coupled to the memory, wherein the processor core is to:
receiving, from the application, a memory access request to access data of the memory object in the memory having a contiguous memory block, wherein the contiguous memory block includes an available memory region and an unavailable memory region, wherein the memory access request includes:
a pointer indicating a location of the memory object in the memory; and
a first MCD unique identifier ID;
retrieving data stored in the contiguous memory block based on the location indicated by the pointer;
retrieving allocation information associated with the contiguous memory block from the MCD table, wherein the allocation information comprises:
a second MCD unique identifier ID associated with the contiguous memory block; and
an MCD boundary value indicating a size of a first memory region of the contiguous memory block; and
sending an error message to the application based on the allocation information when an error event associated with the retrieved data occurs; and
determining the available memory region by:
subtracting the MCD boundary value from a size value of the contiguous memory block to obtain a boundary position value; and
identifying an MCD boundary location in the contiguous memory block based on a boundary location value, wherein the MCD boundary location indicates a boundary between the available memory region and the unavailable memory region.
2. The processor of claim 1, wherein the first memory region is the available memory region or the unavailable memory region.
3. The processor of claim 1, wherein the processor core is further to:
comparing the first MCD unique ID to the second MCD unique ID to determine when the retrieved data is from the memory object indicated by the pointer; and
determining when the retrieved data is from the available memory region based on the allocation information.
4. The processor of claim 3, wherein the error event occurs when:
the first MCD unique ID does not match the second MCD unique ID; or
The memory access is within the unavailable memory region.
5. The processor of claim 1, wherein the contiguous memory block is 64 bytes in size.
6. The processor of claim 1, wherein when the MCD boundary value is zero, all contiguous memory blocks are the available memory region.
7. A processor, comprising:
a memory for storing data from an application, wherein the memory comprises a Memory Corruption Detection (MCD) table; and
a processor core coupled to the memory, wherein the processor core is to:
receiving, from the application, an allocation request to allocate one or more contiguous memory blocks in the memory to a memory object;
allocating the one or more contiguous memory blocks to the memory object in view of a size of the requested memory object, wherein a contiguous memory block of the one or more contiguous memory blocks comprises a first memory region and a second memory region; and
writing an MCD metadata word to the MCD table, wherein the MCD metadata word comprises:
a first MCD unique Identifier (ID) associated with the contiguous memory block, wherein the first MCD unique ID is 1 byte of the MCD metadata word; and
an MCD boundary value indicating a size of the first memory region of the contiguous memory block, wherein the MCD boundary value is 1 byte of the MCD metadata word.
8. The processor of claim 7, wherein the first memory area is a used portion of the contiguous memory block.
9. The processor of claim 7, wherein the first memory region is an unused portion of the contiguous memory block.
10. The processor of claim 7, wherein the processor core is further to:
creating a pointer having a memory address of the memory object and a second MCD unique identifier ID associated with the memory object; and
sending the pointer to the application.
11. The processor of claim 7, wherein the contiguous memory block is 64 bytes in size.
12. A system-on-chip SoC, comprising:
a processor;
a memory device, coupled to the processor, for storing data from an application, wherein the memory comprises a Memory Corruption Detection (MCD) table and a memory object; and
a memory controller coupled to the memory device, the memory controller to:
receiving a memory access request from the application to access data of the memory object having a contiguous memory block, wherein the contiguous memory block includes an available memory region and an unavailable memory region, wherein the memory access request includes:
a pointer indicating a location of the memory object in the memory; and
a first MCD unique identifier ID;
retrieving data stored in the contiguous memory block based on the location indicated by the pointer;
retrieving allocation information associated with the contiguous memory block from the MCD table, wherein the allocation information comprises:
a second MCD unique identifier ID associated with the contiguous memory block; and
an MCD boundary value indicating a size of a first memory region of the contiguous memory block; and
determining when the retrieved data is from an available area of memory based on the allocation information; and
sending the retrieved data to the application; and
determining the available memory region by:
subtracting the MCD boundary value from a size value of the contiguous memory block to obtain a boundary position value; and
identifying an MCD boundary location in the contiguous memory block based on a boundary location value, wherein the MCD boundary location indicates a boundary between the available memory region and the unavailable memory region.
13. The system-on-chip SoC of claim 12, wherein the first memory region is the available memory region or the unavailable memory region.
14. The system-on-chip SoC of claim 12, wherein when the MCD boundary value is zero, all consecutive memory blocks are the available memory region.
15. The system-on-chip SoC of claim 12, wherein the memory controller is further to compare the first MCD unique ID to the second MCD unique ID to determine when the retrieved data is from the memory object indicated by the pointer.
16. The system-on-chip SoC of claim 15, wherein the memory controller is further to send an error message to the application when an error event associated with the retrieved data occurs, wherein the error event occurs when:
the first MCD unique ID does not match the second MCD unique ID; or
The memory access is within the unavailable memory region.
17. The system-on-chip SoC of claim 12, in which the contiguous memory block is 64 bytes in size.
CN201680012160.5A 2015-03-25 2016-01-20 Buffer overflow detection for byte level granularity of memory corruption detection architecture Active CN107278295B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14/668,862 2015-03-25
US14/668,862 US9766968B2 (en) 2015-03-02 2015-03-25 Byte level granularity buffer overflow detection for memory corruption detection architectures
PCT/US2016/014180 WO2016153586A1 (en) 2015-03-25 2016-01-20 Byte level granularity buffer overflow detection for memory corruption detection architectures

Publications (2)

Publication Number Publication Date
CN107278295A CN107278295A (en) 2017-10-20
CN107278295B true CN107278295B (en) 2021-04-27

Family

ID=56979289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680012160.5A Active CN107278295B (en) 2015-03-25 2016-01-20 Buffer overflow detection for byte level granularity of memory corruption detection architecture

Country Status (4)

Country Link
EP (1) EP3274832A4 (en)
CN (1) CN107278295B (en)
TW (1) TWI587127B (en)
WO (1) WO2016153586A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10191791B2 (en) 2016-07-02 2019-01-29 Intel Corporation Enhanced address space layout randomization
US10540261B2 (en) 2017-04-07 2020-01-21 International Business Machines Corporation Problem diagnosis technique of memory corruption based on regular expression generated during application compiling
CN108038014B (en) * 2017-11-30 2021-06-04 中国人民解放军国防科技大学 Image compression multi-core parallel fault-tolerant method, computer and processor
EP3502898A1 (en) * 2017-12-20 2019-06-26 Vestel Elektronik Sanayi ve Ticaret A.S. Devices and methods for determining possible corruption of data stored in a memory of an electronic device
US11467970B1 (en) 2021-08-03 2022-10-11 Kioxia Corporation Metadata management in non-volatile memory devices using in-memory journal
US20230044942A1 (en) * 2021-08-03 2023-02-09 Kioxia Corporation Conditional update, delayed lookup

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814324A (en) * 2009-02-23 2010-08-25 南亚科技股份有限公司 Method for reducing leakage current of memory and memory access method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9323096D0 (en) * 1993-11-09 1994-01-05 Lucas Ind Plc Memory device,manufacture of such a device and a method of simulating a contiguous memory
JP2001273794A (en) * 2000-03-28 2001-10-05 Ando Electric Co Ltd Pre-fail information obtaining circuit, and its obtaining method
US7930491B1 (en) * 2004-04-19 2011-04-19 Cisco Technology, Inc. Memory corruption detection system and method using contingency analysis regulation
US8621337B1 (en) * 2010-09-30 2013-12-31 Juniper Networks, Inc. Detecting memory corruption
US8549379B2 (en) * 2010-11-19 2013-10-01 Xilinx, Inc. Classifying a criticality of a soft error and mitigating the soft error based on the criticality
US8930657B2 (en) * 2011-07-18 2015-01-06 Infineon Technologies Ag Method and apparatus for realtime detection of heap memory corruption by buffer overruns
US8751736B2 (en) * 2011-08-02 2014-06-10 Oracle International Corporation Instructions to set and read memory version information
KR20130078973A (en) * 2012-01-02 2013-07-10 삼성전자주식회사 Method for managing bed storage space in memory device and storage device using method thereof
US10123187B2 (en) * 2012-04-17 2018-11-06 Qualcomm Incorporated Methods and apparatus for multiplexing application identifiers for peer-to-peer discovery systems
US9043559B2 (en) * 2012-10-23 2015-05-26 Oracle International Corporation Block memory engine with memory corruption detection
CN103839591A (en) * 2014-03-05 2014-06-04 福州瑞芯微电子有限公司 Automatic fault detection and fault-tolerant circuit of memory as well as control method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814324A (en) * 2009-02-23 2010-08-25 南亚科技股份有限公司 Method for reducing leakage current of memory and memory access method

Also Published As

Publication number Publication date
WO2016153586A1 (en) 2016-09-29
TWI587127B (en) 2017-06-11
EP3274832A1 (en) 2018-01-31
CN107278295A (en) 2017-10-20
TW201643715A (en) 2016-12-16
EP3274832A4 (en) 2018-10-24

Similar Documents

Publication Publication Date Title
US10521361B2 (en) Memory write protection for memory corruption detection architectures
US10095573B2 (en) Byte level granularity buffer overflow detection for memory corruption detection architectures
CN109564552B (en) Method, apparatus and system for memory management
US10901899B2 (en) Reducing conflicts in direct mapped caches
CN107278295B (en) Buffer overflow detection for byte level granularity of memory corruption detection architecture
US10635447B2 (en) Scatter reduction instruction
CN109643283B (en) Apparatus, method and device for managing enclave memory pages
EP3716078A1 (en) Enforcing unique page table permissions with shared page tables
CN109690546B (en) Supporting oversubscription of client enclave memory pages
US10133669B2 (en) Sequential data writes to increase invalid to modified protocol occurrences in a computing system
US10691454B2 (en) Conflict mask generation
US9823984B2 (en) Remapping of memory in memory control architectures
EP3716064A1 (en) Performance management unit aided tier selection in heterogeneous memory
US10303605B2 (en) Increasing invalid to modified protocol occurrences in a computing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant