CN1242087A - Gated store buffer for an advanced microprocessor - Google Patents
Gated store buffer for an advanced microprocessor
- Publication number
- CN1242087A (application CN97180942A)
- Authority
- CN
- China
- Prior art keywords
- data
- instruction
- memory
- address
- storage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/3834—Maintaining memory consistency
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3861—Recovery, e.g. branch miss-prediction, exception handling
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Executing Machine-Instructions (AREA)
- Advance Control (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
Original C code

    while ((n--) > 0) { *s++ = c; }

========================================
Win32 x86 instructions produced by a compiler compiling this C code.

    mov %ecx,[%ebp+0xc]     // load c from memory address into %ecx
    mov %eax,[%ebp+0x8]     // load s from memory address into %eax
    mov [%eax],%ecx         // store c into memory address s held in %eax
    add %eax,#4             // increment s by 4
    mov [%ebp+0x8],%eax     // store (s+4) back into memory
    mov %eax,[%ebp+0x10]    // load n from memory address into %eax
    lea %ecx,[%eax-1]       // decrement n and store the result in %ecx
    mov [%ebp+0x10],%ecx    // store (n-1) into memory
    and %eax,%eax           // test n to set the condition codes
    jg .-0x1b               // branch to the top of this section if "n>0"
Shows each x86 instruction from the listing above, followed by the host instructions necessary to implement it.

    mov %ecx,[%ebp+0xc]     // load c from memory address into %ecx
        add R0,Rebp,0xc     ; form the memory address and put it in R0
        ld Recx,[R0]        ; load c from memory address in R0 into Recx
    mov %eax,[%ebp+0x8]     // load s from memory address into %eax
        add R2,Rebp,0x8     ; form the memory address and put it in R2
        ld Reax,[R2]        ; load s from memory address in R2 into Reax
    mov [%eax],%ecx         // store c into memory address s held in %eax
        st [Reax],Recx      ; store c into memory address s held in Reax
    add %eax,#4             // increment s by 4
        add Reax,Reax,4     ; increment s by 4
    mov [%ebp+0x8],%eax     // store (s+4) back into memory
        add R5,Rebp,0x8     ; form the memory address and put it in R5
        st [R5],Reax        ; store (s+4) back into memory
    mov %eax,[%ebp+0x10]    // load n from memory address into %eax
        add R7,Rebp,0x10    ; form the memory address and put it in R7
        ld Reax,[R7]        ; load n from memory address into Reax
    lea %ecx,[%eax-1]       // decrement n and store the result in %ecx
        sub Recx,Reax,1     ; decrement n and store the result in Recx
    mov [%ebp+0x10],%ecx    // store (n-1) into memory
        add R9,Rebp,0x10    ; form the memory address and put it in R9
        st [R9],Recx        ; store (n-1) into memory
    and %eax,%eax           // test n to set the condition codes
        andcc R11,Reax,Reax ; test n to set the condition codes
    jg .-0x1b               // branch to the top of this section if "n>0"

Host Instruction key:
    ld = load
    add = ADD
    st = store
    sub = subtract
    jg = jump if condition codes indicate greater
    andcc = and set the condition codes
Adds host instructions necessary to perform x86 address computation and upper and lower segment limit checks.

    mov %ecx,[%ebp+0xc]     // load c
        add R0,Rebp,0xc     ; form logical address into R0
        chkl R0,Rss_limit   ; check the logical address against segment lower limit
        chku R0,R_FFFFFFFF  ; check the logical address against segment upper limit
        add R1,R0,Rss_base  ; add the segment base to form the linear address
        ld Recx,[R1]        ; load c from memory address in R1 into Recx
    mov %eax,[%ebp+0x8]     // load s
        add R2,Rebp,0x8     ; form logical address into R2
        chkl R2,Rss_limit   ; check the logical address against segment lower limit
        chku R2,R_FFFFFFFF  ; check the logical address against segment upper limit
        add R3,R2,Rss_base  ; add the segment base to form the linear address
        ld Reax,[R3]        ; load s from memory address in R3 into Reax
    mov [%eax],%ecx         // store c into [s]
        chku Reax,Rds_limit ; check the logical address against segment upper limit
        add R4,Reax,Rds_base ; add the segment base to form the linear address
        st [R4],Recx        ; store c into memory address s
    add %eax,#4             // increment s by 4
        addcc Reax,Reax,4   ; increment s by 4
    mov [%ebp+0x8],%eax     // store (s+4) to memory
        add R5,Rebp,0x8     ; form logical address into R5
        chkl R5,Rss_limit   ; check the logical address against segment lower limit
        chku R5,R_FFFFFFFF  ; check the logical address against segment upper limit
        add R6,R5,Rss_base  ; add the segment base to form the linear address
        st [R6],Reax        ; store (s+4) to memory address in R6
    mov %eax,[%ebp+0x10]    // load n
        add R7,Rebp,0x10    ; form logical address into R7
        chkl R7,Rss_limit   ; check the logical address against segment lower limit
        chku R7,R_FFFFFFFF  ; check the logical address against segment upper limit
        add R8,R7,Rss_base  ; add the segment base to form the linear address
        ld Reax,[R8]        ; load n from memory address in R8 into Reax
    lea %ecx,[%eax-1]       // decrement n
        sub Recx,Reax,1     ; decrement n
    mov [%ebp+0x10],%ecx    // store (n-1)
        add R9,Rebp,0x10    ; form logical address into R9
        chkl R9,Rss_limit   ; check the logical address against segment lower limit
        chku R9,R_FFFFFFFF  ; check the logical address against segment upper limit
        add R10,R9,Rss_base ; add the segment base to form the linear address
        st [R10],Recx       ; store n-1 in Recx into memory using address in R10
    and %eax,%eax           // test n to set the condition codes
        andcc R11,Reax,Reax ; test n to set the condition codes
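The limit checks and base addition in the listing above can be modeled in a few lines. This is a minimal Python sketch, not a definition of the host hardware: the names `linear_address` and `SegmentLimitError`, the example values, and the exact comparison semantics attributed to `chkl` and `chku` are assumptions made for illustration.

```python
class SegmentLimitError(Exception):
    """Hypothetical stand-in for the fault a failed chkl/chku would raise."""


def linear_address(logical, seg_base, lower_limit, upper_limit):
    """Check a logical address against both limits, then add the segment base."""
    if logical < lower_limit:       # models chkl (assumed semantics)
        raise SegmentLimitError(f"below lower limit: {logical:#x}")
    if logical > upper_limit:       # models chku (assumed semantics)
        raise SegmentLimitError(f"above upper limit: {logical:#x}")
    return (logical + seg_base) & 0xFFFFFFFF    # 32-bit linear address


# Made-up example: the stack segment is based at 0x1000 and ebp holds 0x200.
ebp = 0x200
addr_of_c = linear_address(ebp + 0xC, seg_base=0x1000,
                           lower_limit=0x0, upper_limit=0xFFFFFFFF)
```

With a base of 0 and limits covering the whole 32-bit range, the function returns its input unchanged, which is why a flat address space lets most of these host instructions be dropped.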
Adds instructions to maintain the target x86 instruction pointer "eip" and the commit instructions that use the special morph host hardware to update x86 state.

    mov %ecx,[%ebp+0xc]     // load c
        add R0,Rebp,0xc
        chkl R0,Rss_limit
        chku R0,R_FFFFFFFF
        add R1,R0,Rss_base
        ld Recx,[R1]
        add Reip,Reip,3     ; add x86 instruction length to eip in Reip
        commit              ; commits working state to official state
    mov %eax,[%ebp+0x8]     // load s
        add R2,Rebp,0x8
        chkl R2,Rss_limit
        chku R2,R_FFFFFFFF
        add R3,R2,Rss_base
        ld Reax,[R3]
        add Reip,Reip,3     ; add x86 instruction length to eip in Reip
        commit              ; commits working state to official state
    mov [%eax],%ecx         // store c into [s]
        chku Reax,Rds_limit
        add R4,Reax,Rds_base
        st [R4],Recx
        add Reip,Reip,2     ; add x86 instruction length to eip in Reip
        commit              ; commits working state to official state
    add %eax,#4             // increment s by 4
        addcc Reax,Reax,4
        add Reip,Reip,5     ; add x86 instruction length to eip in Reip
        commit              ; commits working state to official state
    mov [%ebp+0x8],%eax     // store (s+4)
        add R5,Rebp,0x8
        chkl R5,Rss_limit
        chku R5,R_FFFFFFFF
        add R6,R5,Rss_base
        st [R6],Reax
        add Reip,Reip,3     ; add x86 instruction length to eip in Reip
        commit              ; commits working state to official state
    mov %eax,[%ebp+0x10]    // load n
        add R7,Rebp,0x10
        chkl R7,Rss_limit
        chku R7,R_FFFFFFFF
        add R8,R7,Rss_base
        ld Reax,[R8]
        add Reip,Reip,3     ; add x86 instruction length to eip in Reip
        commit              ; commits working state to official state
    lea %ecx,[%eax-1]       // decrement n
        sub Recx,Reax,1
        add Reip,Reip,3     ; add x86 instruction length to eip in Reip
        commit              ; commits working state to official state
    mov [%ebp+0x10],%ecx    // store (n-1)
        add R9,Rebp,0x10
        chkl R9,Rss_limit
        chku R9,R_FFFFFFFF
        add R10,R9,Rss_base
        st [R10],Recx
        add Reip,Reip,3     ; add x86 instruction length to eip in Reip
        commit              ; commits working state to official state
    and %eax,%eax           // test n
        andcc R11,Reax,Reax
        add Reip,Reip,3
        commit              ; commits working state to official state
    jg .-0x1b               // branch "n>0"
        add Rseq,Reip,Length(jg)
        ldc Rtarg,EIP(target)
        selcc Reip,Rseq,Rtarg
        commit              ; commits working state to official state
        jg mainloop,mainloop

Host Instruction key:
    commit = copy the contents of the working registers to the official target registers and send working stores to memory
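The `commit` entry in the key above describes two effects: working registers are copied to the official target registers, and stores held back so far are sent to memory. A minimal Python sketch of that behavior follows. The class name `GatedState`, its method names, and in particular the `rollback` method are assumptions for illustration; this section only defines `commit`.

```python
class GatedState:
    """Toy model of working/official registers plus a gated store buffer."""

    def __init__(self, registers, memory):
        self.official = dict(registers)   # last committed register state
        self.working = dict(registers)    # state the translation modifies
        self.memory = memory              # committed memory (address -> value)
        self.pending = []                 # uncommitted stores, in program order

    def store(self, address, value):
        self.pending.append((address, value))    # held back, not yet in memory

    def load(self, address):
        for addr, value in reversed(self.pending):
            if addr == address:                  # newest pending store wins
                return value
        return self.memory.get(address, 0)

    def commit(self):
        self.official = dict(self.working)       # working -> official registers
        for addr, value in self.pending:         # send working stores to memory
            self.memory[addr] = value
        self.pending.clear()

    def rollback(self):
        """Assumed counterpart to commit: discard all uncommitted work."""
        self.working = dict(self.official)
        self.pending.clear()


state = GatedState({"Reax": 0}, memory={})
state.working["Reax"] = 5
state.store(0x10, 9)
uncommitted_view = dict(state.memory)    # still empty: the store is gated
state.commit()
```

After `commit()` the store is visible in `state.memory` and `Reax` is 5 in the official registers; work done after that point and then rolled back leaves both untouched.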
Optimization
========================================
Assumes a 32-bit flat address space, which allows the elimination of segment base additions and some limit checks.

Win32 uses Flat 32b segmentation.
Record Assumptions:
    Rss_base==0
    Rss_limit==0
    Rds_base==0
    Rds_limit==FFFFFFFF
    SS and DS protection check

    mov %ecx,[%ebp+0xc]     // load c
        add R0,Rebp,0xc
        chku R0,R_FFFFFFFF
        ld Recx,[R0]
        add Reip,Reip,3
        commit
    mov %eax,[%ebp+0x8]     // load s
        add R2,Rebp,0x8
        chku R2,R_FFFFFFFF
        ld Reax,[R2]
        add Reip,Reip,3
        commit
    mov [%eax],%ecx         // store c into [s]
        chku Reax,R_FFFFFFFF
        st [Reax],Recx
        add Reip,Reip,2
        commit
    add %eax,#4             // increment s by 4
        addcc Reax,Reax,4
        add Reip,Reip,5
        commit
    mov [%ebp+0x8],%eax     // store (s+4)
        add R5,Rebp,0x8
        chku R5,R_FFFFFFFF
        st [R5],Reax
        add Reip,Reip,3
        commit
    mov %eax,[%ebp+0x10]    // load n
        add R7,Rebp,0x10
        chku R7,R_FFFFFFFF
        ld Reax,[R7]
        add Reip,Reip,3
        commit
    lea %ecx,[%eax-1]       // decrement n
        sub Recx,Reax,1
        add Reip,Reip,3
        commit
    mov [%ebp+0x10],%ecx    // store (n-1)
        add R9,Rebp,0x10
        chku R9,R_FFFFFFFF
        st [R9],Recx
        add Reip,Reip,3
        commit
    and %eax,%eax           // test n
        andcc R11,Reax,Reax
        add Reip,Reip,3
        commit
    jg .-0x1b               // branch "n>0"
        add Rseq,Reip,Length(jg)
        ldc Rtarg,EIP(target)
        selcc Reip,Rseq,Rtarg
        commit
        jg mainloop,mainloop
Assume the data addressed includes no bytes outside of computer memory limits. Such a reference can only occur on an unaligned, page-crossing memory reference at the upper memory limit, and can be handled by special-case software or hardware.

    mov %ecx,[%ebp+0xc]     // load c
        add R0,Rebp,0xc
        ld Recx,[R0]
        add Reip,Reip,3
        commit
    mov %eax,[%ebp+0x8]     // load s
        add R2,Rebp,0x8
        ld Reax,[R2]
        add Reip,Reip,3
        commit
    mov [%eax],%ecx         // store c into [s]
        st [Reax],Recx
        add Reip,Reip,2
        commit
    add %eax,#4             // increment s by 4
        addcc Reax,Reax,4
        add Reip,Reip,5
        commit
    mov [%ebp+0x8],%eax     // store (s+4)
        add R5,Rebp,0x8
        st [R5],Reax
        add Reip,Reip,3
        commit
    mov %eax,[%ebp+0x10]    // load n
        add R7,Rebp,0x10
        ld Reax,[R7]
        add Reip,Reip,3
        commit
    lea %ecx,[%eax-1]       // decrement n
        sub Recx,Reax,1
        add Reip,Reip,3
        commit
    mov [%ebp+0x10],%ecx    // store (n-1)
        add R9,Rebp,0x10
        st [R9],Recx
        add Reip,Reip,3
        commit
    and %eax,%eax           // test n
        andcc R11,Reax,Reax
        add Reip,Reip,3
        commit
    jg .-0x1b               // branch "n>0"
        add Rseq,Reip,Length(jg)
        ldc Rtarg,EIP(target)
        selcc Reip,Rseq,Rtarg
        commit
        jg mainloop,mainloop

Host Instruction key:
    selcc = select one of the source registers and copy its contents to the destination register based on the condition codes
Detect and eliminate redundant address calculations. The example shows the code after eliminating the redundant operations.

    mov %ecx,[%ebp+0xc]     // load c
        add R0,Rebp,0xc
        ld Recx,[R0]
        add Reip,Reip,3
        commit
    mov %eax,[%ebp+0x8]     // load s
        add R2,Rebp,0x8
        ld Reax,[R2]
        add Reip,Reip,3
        commit
    mov [%eax],%ecx         // store c into [s]
        st [Reax],Recx
        add Reip,Reip,2
        commit
    add %eax,#4             // increment s by 4
        addcc Reax,Reax,4
        add Reip,Reip,5
        commit
    mov [%ebp+0x8],%eax     // store (s+4)
        st [R2],Reax
        add Reip,Reip,3
        commit
    mov %eax,[%ebp+0x10]    // load n
        add R7,Rebp,0x10
        ld Reax,[R7]
        add Reip,Reip,3
        commit
    lea %ecx,[%eax-1]       // decrement n
        sub Recx,Reax,1
        add Reip,Reip,3
        commit
    mov [%ebp+0x10],%ecx    // store (n-1)
        st [R7],Recx
        add Reip,Reip,3
        commit
    and %eax,%eax           // test n
        andcc R11,Reax,Reax
        add Reip,Reip,3
        commit
    jg .-0x1b               // branch "n>0"
        add Rseq,Reip,Length(jg)
        ldc Rtarg,EIP(target)
        selcc Reip,Rseq,Rtarg
        commit
        jg mainloop,mainloop
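The elimination of redundant address calculations can be illustrated with a small pass over a tuple-based instruction list. This is a sketch under stated assumptions, not the translator's actual algorithm: the tuple encoding, the function name `eliminate_redundant_adds`, and the restriction that a reused register is never overwritten later are all choices made for this example. The `add Reip` and `commit` operations are left out for brevity.

```python
WRITES_FIRST_OPERAND = {"add", "sub", "ld"}   # ops whose first operand is a destination


def eliminate_redundant_adds(ops):
    """Drop 'add dst,src,imm' when src+imm is already held in another register.

    Sketch-level assumption: a register that stands in for an eliminated one
    is not overwritten afterwards (true for the address temporaries here).
    """
    available = {}   # (src, imm) -> register currently holding src + imm
    rename = {}      # eliminated register -> register to use instead
    out = []
    for name, *args in ops:
        if name in WRITES_FIRST_OPERAND:
            dst, srcs = args[0], [rename.get(a, a) for a in args[1:]]
        else:                                    # "st" only reads its operands
            dst, srcs = None, [rename.get(a, a) for a in args]
        key = tuple(srcs)
        if name == "add" and dst != srcs[0] and key in available:
            rename[dst] = available[key]         # reuse the earlier result
            continue
        if dst is not None:
            if dst in rename.values():
                raise ValueError("sketch does not handle this overwrite")
            rename.pop(dst, None)
            # Facts that mention the overwritten register are no longer valid.
            available = {k: v for k, v in available.items()
                         if k[0] != dst and v != dst}
        if name == "add" and dst != srcs[0]:
            available[key] = dst
        out.append((name, *([] if dst is None else [dst]), *srcs))
    return out


before = [
    ("add", "R0", "Rebp", 0xC), ("ld", "Recx", "R0"),
    ("add", "R2", "Rebp", 0x8), ("ld", "Reax", "R2"),
    ("st", "Reax", "Recx"),                      # st [Reax],Recx
    ("add", "Reax", "Reax", 4),
    ("add", "R5", "Rebp", 0x8), ("st", "R5", "Reax"),
    ("add", "R7", "Rebp", 0x10), ("ld", "Reax", "R7"),
    ("sub", "Recx", "Reax", 1),
    ("add", "R9", "Rebp", 0x10), ("st", "R9", "Recx"),
]
after = eliminate_redundant_adds(before)
```

On this input the pass removes the additions into `R5` and `R9` and rewrites the two stores to use `R2` and `R7`, as in the listing above.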
Assume that target exceptions will not occur within the translation, so delay updating eip and target state.

    mov %ecx,[%ebp+0xc]     // load c
        add R0,Rebp,0xc
        ld Recx,[R0]
    mov %eax,[%ebp+0x8]     // load s
        add R2,Rebp,0x8
        ld Reax,[R2]
    mov [%eax],%ecx         // store c into [s]
        st [Reax],Recx
    add %eax,#4             // increment s by 4
        add Reax,Reax,4
    mov [%ebp+0x8],%eax     // store (s+4)
        st [R2],Reax
    mov %eax,[%ebp+0x10]    // load n
        add R7,Rebp,0x10
        ld Reax,[R7]
    lea %ecx,[%eax-1]       // decrement n
        sub Recx,Reax,1
    mov [%ebp+0x10],%ecx    // store (n-1)
        st [R7],Recx
    and %eax,%eax           // test n
        andcc R11,Reax,Reax
    jg .-0x1b               // branch "n>0"
        add Rseq,Reip,Length(block)
        ldc Rtarg,EIP(target)
        selcc Reip,Rseq,Rtarg
        commit
        jg mainloop,mainloop
In summary:

    add R0,Rebp,0xc
    ld Recx,[R0]
    add R2,Rebp,0x8
    ld Reax,[R2]
    st [Reax],Recx
    add Reax,Reax,4
    st [R2],Reax
    add R7,Rebp,0x10
    ld Reax,[R7]            // Live out
    sub Recx,Reax,1         // Live out
    st [R7],Recx
    andcc R11,Reax,Reax
    add Rseq,Reip,Length(block)
    ldc Rtarg,EIP(target)
    selcc Reip,Rseq,Rtarg
    commit
    jg mainloop,mainloop

The comment "Live out" refers to the need to actually maintain Reax and Recx correctly prior to the commit. Otherwise further optimization might be possible.
========================================
Renaming to reduce register resource dependencies. This will allow subsequent scheduling to be more effective. From this point on, the original target x86 code is omitted, as the relationship between individual target x86 instructions and host instructions becomes increasingly blurred.

    add R0,Rebp,0xc
    ld R1,[R0]
    add R2,Rebp,0x8
    ld R3,[R2]
    st [R3],R1
    add R4,R3,4
    st [R2],R4
    add R7,Rebp,0x10
    ld Reax,[R7]            // Live out
    sub Recx,Reax,1         // Live out
    st [R7],Recx
    andcc R11,Reax,Reax
    add Rseq,Reip,Length(block)
    ldc Rtarg,EIP(target)
    selcc Reip,Rseq,Rtarg
    commit
    jg mainloop,mainloop
The code after the scheduling process, which organizes the primitive host operations into multiple operations that can execute in parallel on the host VLIW hardware. Each line shows the parallel operations that the VLIW machine executes, and the "&" indicates the parallelism.

    add R2,Rebp,0x8       & add R0,Rebp,0xc
    nop                   & add R7,Rebp,0x10
    ld R3,[R2]            & add Rseq,Reip,Length(block)
    ld R1,[R0]            & add R4,R3,4
    st [R3],R1            & ldc Rtarg,EIP(target)
    ld Reax,[R7]          & nop
    st [R2],R4            & sub Recx,Reax,1
    st [R7],Recx          & andcc R11,Reax,Reax
    selcc Reip,Rseq,Rtarg & jg mainloop,mainloop & commit

Host Instruction key:
    nop = no operation
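To make the scheduling step concrete, the following Python sketch packs a straight-line sequence into two-wide bundles with a greedy, in-order list scheduler. It is an illustration only and is not claimed to be the scheduler that produced the listing above, so its output need not match that listing. The operand encoding, the unit latency, and the conservative rule that memory operations are only reordered when both are loads are assumptions.

```python
def schedule(ops, width=2):
    """Greedy in-order list scheduler.

    ops: list of (text, reads, writes, mem) where reads/writes are sets of
    register names and mem is "ld", "st" or None.
    Returns a list of bundles; each bundle is a list of op texts.
    """
    cycle_of = []    # cycle chosen for each op scheduled so far
    bundles = []
    for text, reads, writes, mem in ops:
        earliest = 0
        for j, (_, reads_j, writes_j, mem_j) in enumerate(ops[:len(cycle_of)]):
            register_dep = ((writes_j & reads)      # read after write
                            or (reads_j & writes)   # write after read
                            or (writes_j & writes)) # write after write
            memory_dep = mem and mem_j and "st" in (mem, mem_j)
            if register_dep or memory_dep:
                earliest = max(earliest, cycle_of[j] + 1)
        cycle = earliest
        while cycle < len(bundles) and len(bundles[cycle]) >= width:
            cycle += 1                              # bundle full, try the next
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(text)
        cycle_of.append(cycle)
    return bundles


program = [
    ("add R0,Rebp,0xc", {"Rebp"}, {"R0"}, None),
    ("ld R1,[R0]", {"R0"}, {"R1"}, "ld"),
    ("add R2,Rebp,0x8", {"Rebp"}, {"R2"}, None),
    ("ld R3,[R2]", {"R2"}, {"R3"}, "ld"),
    ("st [R3],R1", {"R3", "R1"}, set(), "st"),
    ("add R4,R3,4", {"R3"}, {"R4"}, None),
    ("st [R2],R4", {"R2", "R4"}, set(), "st"),
]
bundles = schedule(program)
for bundle in bundles:
    print(" & ".join(bundle))
```

The renamed registers matter here: because `R1`, `R3` and `R4` are distinct, the two address additions and the two loads can share a bundle, which would not be possible if they all targeted `Reax` and `Recx`.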
Resolve host branch targets and chain stored translations.

    add R2,Rebp,0x8       & add R0,Rebp,0xc
    nop                   & add R7,Rebp,0x10
    ld R3,[R2]            & add Rseq,Reip,Length(block)
    ld R1,[R0]            & add R4,R3,4
    st [R3],R1            & ldc Rtarg,EIP(target)
    ld Reax,[R7]          & nop
    st [R2],R4            & sub Recx,Reax,1
    st [R7],Recx          & andcc R11,Reax,Reax
    selcc Reip,Rseq,Rtarg & jg Sequential,Target & commit
Advanced Optimizations, Backward Code Motion:
This and subsequent examples start with the code prior to scheduling. This optimization first depends on detecting that the code is a loop. Then invariant operations can be moved out of the loop body and executed once before entering the loop body.

Entry:
    add R0,Rebp,0xc
    add R2,Rebp,0x8
    add R7,Rebp,0x10
    add Rseq,Reip,Length(block)
    ldc Rtarg,EIP(target)
Loop:
    ld R1,[R0]
    ld R3,[R2]
    st [R3],R1
    add R4,R3,4
    st [R2],R4
    ld Reax,[R7]
    sub Recx,Reax,1
    st [R7],Recx
    andcc R11,Reax,Reax
    selcc Reip,Rseq,Rtarg
    commit
    jg mainloop,Loop
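The idea of moving invariant operations out of the loop body can be sketched as a simple pass. The Python below is a minimal illustration under assumptions, not the translator's algorithm: the tuple encoding and the name `hoist_invariants` are invented for the example, and only register-to-register operations whose sources are never written inside the loop are hoisted. The eip bookkeeping (`Rseq`, `Rtarg`) that the listing also hoists is not modeled, since justifying it needs knowledge about the branch target that this simple rule does not have.

```python
PURE_OPS = {"add", "sub", "ldc"}   # register-only ops; loads and stores are never hoisted


def hoist_invariants(body):
    """Split a loop body into (hoisted, remaining).

    body: list of (name, dst, *sources); dst is None for stores, sources are
    register names (str) or immediates (int).
    """
    write_count = {}
    for _name, dst, *_sources in body:
        if dst is not None:
            write_count[dst] = write_count.get(dst, 0) + 1

    hoisted, remaining, read_so_far = [], [], set()
    for op in body:
        name, dst, *sources = op
        regs = {s for s in sources if isinstance(s, str)}
        invariant = (
            name in PURE_OPS
            and write_count[dst] == 1           # only definition of dst in the loop
            and regs.isdisjoint(write_count)    # no source changes inside the loop
            and dst not in read_so_far          # no earlier read of the old value
        )
        (hoisted if invariant else remaining).append(op)
        read_so_far |= regs
    return hoisted, remaining


loop_body = [
    ("add", "R0", "Rebp", 0xC), ("ld", "R1", "R0"),
    ("add", "R2", "Rebp", 0x8), ("ld", "R3", "R2"),
    ("st", None, "R3", "R1"),                   # st [R3],R1
    ("add", "R4", "R3", 4),
    ("st", None, "R2", "R4"),                   # st [R2],R4
    ("add", "R7", "Rebp", 0x10), ("ld", "Reax", "R7"),
    ("sub", "Recx", "Reax", 1),
    ("st", None, "R7", "Recx"),                 # st [R7],Recx
]
entry, loop = hoist_invariants(loop_body)
```

Here the three address computations from `Rebp` end up in `entry`, while `add R4,R3,4` stays in the loop because `R3` is reloaded on every iteration.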
Schedule the loop body after backward code motion. For example purposes, only code in the loop body is shown scheduled.

Entry:
    add R0,Rebp,0xc
    add R2,Rebp,0x8
    add R7,Rebp,0x10
    add Rseq,Reip,Length(block)
    ldc Rtarg,EIP(target)
Loop:
    ld R3,[R2]            & nop
    ld R1,[R0]            & add R4,R3,4
    st [R3],R1            & nop
    ld Reax,[R7]          & nop
    st [R2],R4            & sub Recx,Reax,1
    st [R7],Recx          & andcc R11,Reax,Reax
    selcc Reip,Rseq,Rtarg & jg sequential,Loop & commit

Host Instruction key:
    ldc = load a 32-bit constant
After Backward Code Motion:

Target:
    add R0,Rebp,0xc
    add R2,Rebp,0x8
    add R7,Rebp,0x10
    add Rseq,Reip,Length(block)
    ldc Rtarg,EIP(target)
Loop:
    ld R1,[R0]
    ld R3,[R2]
    st [R3],R1
    add R4,R3,4
    st [R2],R4
    ld Reax,[R7]            // Live out
    sub Recx,Reax,1         // Live out
    st [R7],Recx
    andcc R11,Reax,Reax
    selcc Reip,Rseq,Rtarg
    commit
    jg mainloop,Loop
========================================
Register Allocation:
This shows the use of the register alias detection hardware of the morph host, which allows variables to be safely moved from memory into registers. The starting point is the code after "backward code motion". This shows the optimization that can eliminate loads.

First the loads are performed. The address is protected by the alias hardware, such that should a store to the address occur, an "alias" exception is raised. The loads in the loop body are then replaced with copies. After the main body of the loop, the alias hardware is freed.

Entry:
    add R0,Rebp,0xc
    add R2,Rebp,0x8
    add R7,Rebp,0x10
    add Rseq,Reip,Length(block)
    ldc Rtarg,EIP(target)
    ld Rc,[R0]              ; First do the load of the variable from memory
    prot [R0],Alias1        ; Then protect the memory location from stores
    ld Rs,[R2]
    prot [R2],Alias2
    ld Rn,[R7]
    prot [R7],Alias3
Loop:
    copy R1,Rc
    copy R3,Rs
    st [R3],R1
    add R4,Rs,4
    copy Rs,R4
    st [R2],Rs,NoAliasCheck
    copy Reax,Rn            // Live out
    sub Recx,Reax,1         // Live out
    copy Rn,Recx
    st [R7],Rn,NoAliasCheck
    andcc R11,Reax,Reax
    selcc Reip,Rseq,Rtarg
    commit
    jg Epilog,Loop
Epilog:
    FA Alias1               ; Free the alias detection hardware
    FA Alias2               ; Free the alias detection hardware
    FA Alias3               ; Free the alias detection hardware
    j Sequential

Host Instruction key:
    protect = protect address from loads
    FA = free alias
    copy = copy
    j = jump
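The protect, store and free sequence described above can be modeled with a small class. This is a minimal Python sketch under assumptions: `AliasUnit`, `AliasException` and the method names are hypothetical, and the model only covers the behavior stated in the text (a store to a protected address raises an "alias" exception unless it is marked NoAliasCheck).

```python
class AliasException(Exception):
    """Hypothetical model of the "alias" exception raised by the hardware."""


class AliasUnit:
    """Toy model of alias detection: protected addresses trap on stores."""

    def __init__(self, memory):
        self.memory = memory
        self.protected = {}                  # alias tag -> protected address

    def prot(self, address, tag):            # models: prot [R],AliasN
        self.protected[tag] = address

    def free(self, tag):                     # models: FA AliasN
        self.protected.pop(tag, None)

    def store(self, address, value, no_alias_check=False):
        if not no_alias_check and address in self.protected.values():
            raise AliasException(f"store to protected address {address:#x}")
        self.memory[address] = value


unit = AliasUnit(memory={0x208: 0x1000})
rs = unit.memory[0x208]                      # ld Rs,[R2]
unit.prot(0x208, "Alias2")                   # prot [R2],Alias2
unit.store(rs, 7)                            # ordinary store elsewhere: allowed
unit.store(0x208, rs + 4, no_alias_check=True)   # st [R2],Rs,NoAliasCheck
try:
    unit.store(0x208, 0)                     # unexpected store to the variable
    trapped = False
except AliasException:
    trapped = True
unit.free("Alias2")                          # FA Alias2
```

The trap is what makes it safe to keep `s` in a register: if some store in the loop did hit the variable's address, the register copy would be stale, and the exception signals that case.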
Copy Propagation:
After using the alias hardware to turn loads within the loop body into copies, copy propagation allows the elimination of some copies.

Entry:
    add R0,Rebp,0xc
    add R2,Rebp,0x8
    add R7,Rebp,0x10
    add Rseq,Reip,Length(block)
    ldc Rtarg,EIP(target)
    ld Rc,[R0]
    prot [R0],Alias1
    ld Rs,[R2]
    prot [R2],Alias2
    ld Recx,[R7]
    prot [R7],Alias3
Loop:
    st [Rs],Rc
    add Rs,Rs,4
    st [R2],Rs,NoAliasCheck
    copy Reax,Recx          // Live out
    sub Recx,Reax,1         // Live out
    st [R7],Recx,NoAliasCheck
    andcc R11,Reax,Reax
    selcc Reip,Rseq,Rtarg
    commit
    jg Epilog,Loop
Epilog:
    FA Alias1
    FA Alias2
    FA Alias3
    j Sequential
Example illustrating scheduling of the loop body only.

Entry:
    add R0,Rebp,0xc
    add R2,Rebp,0x8
    add R7,Rebp,0x10
    add Rseq,Reip,Length(block)
    ldc Rtarg,EIP(target)
    ld Rc,[R0]
    prot [R0],Alias1
    ld Rs,[R2]
    prot [R2],Alias2
    ld Recx,[R7]
    prot [R7],Alias3
Loop:
    st [Rs],Rc            & add Rs,Rs,4 & copy Reax,Recx
    st [R2],Rs,NAC        & sub Recx,Reax,1
    st [R7],Recx,NAC      & andcc R11,Reax,Reax
    selcc Reip,Rseq,Rtarg & jg Epilog,Loop & commit
Epilog:
    FA Alias1
    FA Alias2
    FA Alias3
    j Sequential

Host Instruction key:
    NAC = No Alias Check
Store Elimination by use of the alias hardware.

Entry:
    add R0,Rebp,0xc
    add R2,Rebp,0x8
    add R7,Rebp,0x10
    add Rseq,Reip,Length(block)
    ldc Rtarg,EIP(target)
    ld Rc,[R0]
    prot [R0],Alias1        ; protect the address from loads and stores
    ld Rs,[R2]
    prot [R2],Alias2        ; protect the address from loads and stores
    ld Recx,[R7]
    prot [R7],Alias3        ; protect the address from loads and stores
Loop:
    st [Rs],Rc            & add Rs,Rs,4 & copy Reax,Recx
    sub Recx,Reax,1       & andcc R11,Reax,Reax
    selcc Reip,Rseq,Rtarg & jg Epilog,Loop & commit
Epilog:
    FA Alias1
    FA Alias2
    FA Alias3
    st [R2],Rs              ; writeback the final value of Rs
    st [R7],Recx            ; writeback the final value of Recx
    j Sequential
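One way to see why the per-iteration stores of `s` and `n` can be replaced by a single writeback in the epilog is to simulate both versions of the loop and compare the resulting memory. The Python below is a sketch under assumptions: memory is a dictionary of words keyed by address, 32-bit wraparound is ignored, the initial values are made up, and the buffer that `s` points to does not overlap the stack slots (the case the alias hardware is there to detect).

```python
def run_with_stores(mem, ebp):
    """Loop body that writes s and n back to memory on every iteration."""
    mem = dict(mem)
    rc, rs, recx = mem[ebp + 0xC], mem[ebp + 0x8], mem[ebp + 0x10]
    while True:
        mem[rs] = rc                 # st [Rs],Rc
        rs += 4                      # add Rs,Rs,4
        mem[ebp + 0x8] = rs          # st [R2],Rs,NoAliasCheck
        reax = recx                  # copy Reax,Recx
        recx = reax - 1              # sub Recx,Reax,1
        mem[ebp + 0x10] = recx       # st [R7],Recx,NoAliasCheck
        if not reax > 0:             # jg Epilog,Loop falls through to Epilog
            return mem


def run_with_writeback(mem, ebp):
    """Same loop with the two stores deferred to the epilog."""
    mem = dict(mem)
    rc, rs, recx = mem[ebp + 0xC], mem[ebp + 0x8], mem[ebp + 0x10]
    while True:
        mem[rs] = rc                 # st [Rs],Rc
        rs += 4                      # add Rs,Rs,4
        reax = recx                  # copy Reax,Recx
        recx = reax - 1              # sub Recx,Reax,1
        if not reax > 0:
            break
    mem[ebp + 0x8] = rs              # st [R2],Rs   (writeback final s)
    mem[ebp + 0x10] = recx           # st [R7],Recx (writeback final n)
    return mem


ebp = 0x200
initial = {ebp + 0xC: 7, ebp + 0x8: 0x1000, ebp + 0x10: 3}   # c, s, n
final_a = run_with_stores(initial, ebp)
final_b = run_with_writeback(initial, ebp)
```

Both runs end with identical memory. If `s` pointed at one of the stack slots instead, the two versions would diverge, which is the situation the protected addresses are meant to catch.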
Claims (34)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/772,686 | 1996-12-23 | ||
US08/772,686 US6011908A (en) | 1996-12-23 | 1996-12-23 | Gated store buffer for an advanced microprocessor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1242087A (zh) | 2000-01-19 |
CN1103079C (zh) | 2003-03-12 |
Family
ID=25095868
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN97180942A (Expired - Lifetime), CN1103079C (zh) | Gated store buffer for an advanced microprocessor | 1996-12-23 | 1997-12-12 |
Country Status (7)
Country | Link |
---|---|
US (1) | US6011908A (zh) |
EP (1) | EP1010077A4 (zh) |
JP (1) | JP3537448B2 (zh) |
KR (1) | KR100394967B1 (zh) |
CN (1) | CN1103079C (zh) |
CA (1) | CA2270122C (zh) |
WO (1) | WO1998028689A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100418072C (zh) * | 2004-12-27 | 2008-09-10 | 英特尔公司 | 用于基于填充缓冲器命中来预取的装置和方法 |
CN102200926A (zh) * | 2010-03-24 | 2011-09-28 | 北京兆易创新科技有限公司 | 一种存储器读操作功能的仿真验证方法 |
CN103502945A (zh) * | 2011-04-07 | 2014-01-08 | 英特尔公司 | 基于旋转的别名保护寄存器中的寄存器分配 |
CN112199669A (zh) * | 2020-09-25 | 2021-01-08 | 杭州安恒信息技术股份有限公司 | 一种检测rop攻击的方法和装置 |
Families Citing this family (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6205537B1 (en) | 1998-07-16 | 2001-03-20 | University Of Rochester | Mechanism for dynamically adapting the complexity of a microprocessor |
US7111290B1 (en) | 1999-01-28 | 2006-09-19 | Ati International Srl | Profiling program execution to identify frequently-executed portions and to assist binary translation |
US8065504B2 (en) * | 1999-01-28 | 2011-11-22 | Ati International Srl | Using on-chip and off-chip look-up tables indexed by instruction address to control instruction execution in a processor |
US7941647B2 (en) | 1999-01-28 | 2011-05-10 | Ati Technologies Ulc | Computer for executing two instruction sets and adds a macroinstruction end marker for performing iterations after loop termination |
US8127121B2 (en) | 1999-01-28 | 2012-02-28 | Ati Technologies Ulc | Apparatus for executing programs for a first computer architechture on a computer of a second architechture |
US6954923B1 (en) | 1999-01-28 | 2005-10-11 | Ati International Srl | Recording classification of instructions executed by a computer |
US7013456B1 (en) | 1999-01-28 | 2006-03-14 | Ati International Srl | Profiling execution of computer programs |
US8074055B1 (en) | 1999-01-28 | 2011-12-06 | Ati Technologies Ulc | Altering data storage conventions of a processor when execution flows from first architecture code to second architecture code |
US6826748B1 (en) | 1999-01-28 | 2004-11-30 | Ati International Srl | Profiling program execution into registers of a computer |
US6779107B1 (en) | 1999-05-28 | 2004-08-17 | Ati International Srl | Computer execution by opportunistic adaptation |
US7634635B1 (en) | 1999-06-14 | 2009-12-15 | Brian Holscher | Systems and methods for reordering processor instructions |
US7089404B1 (en) | 1999-06-14 | 2006-08-08 | Transmeta Corporation | Method and apparatus for enhancing scheduling in an advanced microprocessor |
US7254806B1 (en) | 1999-08-30 | 2007-08-07 | Ati International Srl | Detecting reordered side-effects |
US7761857B1 (en) * | 1999-10-13 | 2010-07-20 | Robert Bedichek | Method for switching between interpretation and dynamic translation in a processor system based upon code sequence execution counts |
US6880152B1 (en) | 1999-10-13 | 2005-04-12 | Transmeta Corporation | Method of determining a mode of code generation |
US6748589B1 (en) | 1999-10-20 | 2004-06-08 | Transmeta Corporation | Method for increasing the speed of speculative execution |
WO2001061476A2 (en) * | 2000-02-14 | 2001-08-23 | Chicory Systems, Inc. | System including cpu and code translator for translating code from a second instruction set to a first instruction set |
US6671664B1 (en) * | 2000-02-22 | 2003-12-30 | Hewlett-Packard Development Copany, L.P. | Management of uncommitted register values during random program generation |
US6594821B1 (en) | 2000-03-30 | 2003-07-15 | Transmeta Corporation | Translation consistency checking for modified target instructions by comparing to original copy |
US6349361B1 (en) | 2000-03-31 | 2002-02-19 | International Business Machines Corporation | Methods and apparatus for reordering and renaming memory references in a multiprocessor computer system |
US7389208B1 (en) * | 2000-06-30 | 2008-06-17 | Accord Solutions, Inc. | System and method for dynamic knowledge construction |
GB2367653B (en) | 2000-10-05 | 2004-10-20 | Advanced Risc Mach Ltd | Restarting translated instructions |
US6829719B2 (en) | 2001-03-30 | 2004-12-07 | Transmeta Corporation | Method and apparatus for handling nested faults |
US6820216B2 (en) * | 2001-03-30 | 2004-11-16 | Transmeta Corporation | Method and apparatus for accelerating fault handling |
US7310723B1 (en) | 2003-04-02 | 2007-12-18 | Transmeta Corporation | Methods and systems employing a flag for deferring exception handling to a commit or rollback point |
JP2005032018A (ja) * | 2003-07-04 | 2005-02-03 | Semiconductor Energy Lab Co Ltd | 遺伝的アルゴリズムを用いたマイクロプロセッサ |
US7681046B1 (en) | 2003-09-26 | 2010-03-16 | Andrew Morgan | System with secure cryptographic capabilities using a hardware specific digital secret |
US7694151B1 (en) | 2003-11-20 | 2010-04-06 | Johnson Richard C | Architecture, system, and method for operating on encrypted and/or hidden information |
WO2005106648A2 (en) * | 2004-04-15 | 2005-11-10 | Sun Microsystems, Inc. | Entering scout-mode when speculatiive stores exceed the capacity of the store buffer |
KR100591769B1 (ko) * | 2004-07-16 | 2006-06-26 | 삼성전자주식회사 | 분기 예측 정보를 가지는 분기 타겟 버퍼 |
TWI285841B (en) * | 2004-07-16 | 2007-08-21 | Samsung Electronics Co Ltd | Branch target buffer, branch target buffer memory array, branch prediction unit and processor with a function of branch instruction predictions |
US7406634B2 (en) * | 2004-12-02 | 2008-07-29 | Cisco Technology, Inc. | Method and apparatus for utilizing an exception handler to avoid hanging up a CPU when a peripheral device does not respond |
US8413162B1 (en) * | 2005-06-28 | 2013-04-02 | Guillermo J. Rozas | Multi-threading based on rollback |
US7496727B1 (en) | 2005-12-06 | 2009-02-24 | Transmeta Corporation | Secure memory access system and method |
US7865885B2 (en) * | 2006-09-27 | 2011-01-04 | Intel Corporation | Using transactional memory for precise exception handling in aggressive dynamic binary optimizations |
US8112604B2 (en) * | 2007-12-17 | 2012-02-07 | International Business Machines Corporation | Tracking load store ordering hazards |
US8131953B2 (en) * | 2007-12-17 | 2012-03-06 | International Business Machines Corporation | Tracking store ordering hazards in an out-of-order store queue |
US20100153776A1 (en) * | 2008-12-12 | 2010-06-17 | Sun Microsystems, Inc. | Using safepoints to provide precise exception semantics for a virtual machine |
US9069918B2 (en) * | 2009-06-12 | 2015-06-30 | Cadence Design Systems, Inc. | System and method implementing full-rate writes for simulation acceleration |
US20120173843A1 (en) * | 2011-01-04 | 2012-07-05 | Kamdar Chetan C | Translation look-aside buffer including hazard state |
US9442735B1 (en) * | 2012-04-13 | 2016-09-13 | Marvell International Ltd. | Method and apparatus for processing speculative, out-of-order memory access instructions |
JP2016081169A (ja) * | 2014-10-14 | 2016-05-16 | Fujitsu Ltd. | Information processing device, data processing system, data processing management program, and data processing management method |
US10296343B2 (en) | 2017-03-30 | 2019-05-21 | Intel Corporation | Hybrid atomicity support for a binary translation based microprocessor |
US20220413870A1 (en) * | 2021-06-25 | 2022-12-29 | Intel Corporation | Technology For Optimizing Memory-To-Register Operations |
DE102022003674A1 (de) * | 2022-10-05 | 2024-04-11 | Mercedes-Benz Group AG | Method for statically allocating information to memory areas, information technology system, and vehicle |
US20240320110A1 (en) * | 2023-03-24 | 2024-09-26 | Western Digital Technologies, Inc. | Securely erasing data on inoperative storage device |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3863228A (en) * | 1973-12-13 | 1975-01-28 | Honeywell Inf Systems | Apparatus for detecting and eliminating a transfer of noise records to a data processing apparatus |
US4467411A (en) * | 1981-03-06 | 1984-08-21 | International Business Machines Corporation | Scheduling device operations in a buffered peripheral subsystem |
US4458316A (en) * | 1981-03-06 | 1984-07-03 | International Business Machines Corporation | Queuing commands in a peripheral data storage system |
JPS59117800A (ja) * | 1982-12-25 | 1984-07-07 | Fujitsu Ltd | One-bit error processing system for buffer storage |
US4607331A (en) * | 1983-05-13 | 1986-08-19 | Motorola, Inc. | Method and apparatus for implementing an algorithm associated with stored information |
JP2679715B2 (ja) * | 1989-06-30 | 1997-11-19 | Fujitsu Ltd. | Data transfer method |
US5463767A (en) * | 1992-02-24 | 1995-10-31 | Nec Corporation | Data transfer control unit with memory unit fault detection capability |
DE69429612T2 (de) * | 1993-10-18 | 2002-09-12 | Via-Cyrix, Inc. | Write buffer for a pipelined superscalar microprocessor |
US5566298A (en) * | 1994-03-01 | 1996-10-15 | Intel Corporation | Method for state recovery during assist and restart in a decoder having an alias mechanism |
US5517615A (en) * | 1994-08-15 | 1996-05-14 | Unisys Corporation | Multi-channel integrity checking data transfer system for controlling different size data block transfers with on-the-fly checkout of each word and data block transferred |
US5564111A (en) * | 1994-09-30 | 1996-10-08 | Intel Corporation | Method and apparatus for implementing a non-blocking translation lookaside buffer |
1996
- 1996-12-23: US, application US08/772,686, published as US6011908A, not active (Expired - Lifetime)
1997
- 1997-12-12: EP, application EP97951635A, published as EP1010077A4, not active (Withdrawn)
- 1997-12-12: CA, application CA002270122A, published as CA2270122C, not active (Expired - Fee Related)
- 1997-12-12: WO, application PCT/US1997/022768, published as WO1998028689A1, active (IP Right Grant)
- 1997-12-12: CN, application CN97180942A, published as CN1103079C, not active (Expired - Lifetime)
- 1997-12-12: KR, application KR19997005717A, published as KR100394967B1, not active (IP Right Cessation)
- 1997-12-12: JP, application JP52883098A, published as JP3537448B2, not active (Expired - Fee Related)
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
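CN100418072C (zh) * | 2004-12-27 | 2008-09-10 | Intel Corporation | Apparatus and method for prefetching based on fill buffer hits |
CN102200926A (zh) * | 2010-03-24 | 2011-09-28 | GigaDevice Semiconductor (Beijing) Inc. | Simulation verification method for the read operation function of a memory |
CN102200926B (zh) * | 2010-03-24 | 2014-05-07 | GigaDevice Semiconductor (Beijing) Inc. | Simulation verification method for the read operation function of a memory |
CN103502945A (zh) * | 2011-04-07 | 2014-01-08 | Intel Corporation | Register allocation in rotation-based alias protection register |
US9405547B2 (en) | 2011-04-07 | 2016-08-02 | Intel Corporation | Register allocation for rotation based alias protection register |
CN103502945B (zh) * | 2011-04-07 | 2017-09-22 | Intel Corporation | Register allocation in rotation-based alias protection register |
CN112199669A (zh) * | 2020-09-25 | 2021-01-08 | Hangzhou DBAPPSecurity Co., Ltd. | Method and device for detecting ROP attacks |
CN112199669B (zh) * | 2020-09-25 | 2022-05-17 | Hangzhou DBAPPSecurity Co., Ltd. | Method and device for detecting ROP attacks |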
Also Published As
Publication number | Publication date |
---|---|
JP3537448B2 (ja) | 2004-06-14 |
JP2001507151A (ja) | 2001-05-29 |
US6011908A (en) | 2000-01-04 |
CA2270122A1 (en) | 1998-07-02 |
CA2270122C (en) | 2001-09-04 |
EP1010077A4 (en) | 2003-01-22 |
EP1010077A1 (en) | 2000-06-21 |
WO1998028689A1 (en) | 1998-07-02 |
KR100394967B1 (ko) | 2003-08-19 |
CN1103079C (zh) | 2003-03-12 |
KR20000062300A (ko) | 2000-10-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
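CN1103079C (zh) | | Gated store buffer for an advanced microprocessor |
CN1141647C (zh) | | Method and apparatus for aliasing memory data in an advanced microprocessor |
CN1161691C (zh) | | Memory controller for detecting a failure of speculation of a component being addressed |
US7840776B1 (en) | | Translated memory protection apparatus for an advanced microprocessor |
US6031992A (en) | | Combining hardware and software to provide an improved microprocessor |
JP3776132B2 (ja) | | Improvements to a microprocessor |
JP3621116B2 (ja) | | Translated memory protection apparatus for an advanced processor |
EP0998707B1 (en) | | Host microprocessor with apparatus for temporarily holding target processor state |
CN1107909C (zh) | | Host processor with apparatus for temporarily holding target processor state |
CN1163826C (zh) | | Improved microprocessor |
CN1286772A (zh) | | Translated memory protection apparatus for an advanced microprocessor |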
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
ASS | Succession or assignment of patent right | Owner name: KNOWLEDGE VENTURE CAPITAL ROMPLAST-14 O., LTD; Free format text: FORMER OWNER: TRANSMITAR CO., LTD; Effective date: 20091030 |
C41 | Transfer of patent application or patent right or utility model | |
C56 | Change in the name or address of the patentee | Owner name: TRANSMITAR CO., LTD; Free format text: FORMER NAME: TRANSMITAR CO., LTD. |
CP03 | Change of name, title or address | Address after: California, USA; Patentee after: Full simeida LLC; Address before: California, USA; Patentee before: Transmeta Corp. |
TR01 | Transfer of patent right | Effective date of registration: 20091030; Address after: Nevada; Patentee after: TRANSMETA Corp.; Address before: California, USA; Patentee before: Full simeida LLC |
CX01 | Expiry of patent term | Granted publication date: 20030312 |