CN1237441C

CN1237441C - Apparatus and method for selectively controlling memory attributes

Info

Publication number: CN1237441C
Application number: CNB031030416A
Authority: CN
Inventors: G·葛兰·亨利; 罗德·E·胡克; 泰瑞·派克斯
Original assignee: INTELLIGENCE FIRST CO
Current assignee: INTELLIGENCE FIRST CO
Priority date: 2002-08-22
Filing date: 2003-01-28
Publication date: 2006-01-18
Anticipated expiration: 2023-01-28
Also published as: TWI245221B; CN1431587A

Abstract

An apparatus and method are provided for extending a microprocessor instruction set to allow for selective override of memory traits at the instruction level. The apparatus includes translation logic and extended execution logic. The translation logic translates an extended instruction into a micro instruction sequence. The extended instruction has an extended prefix and an extended prefix tag. The extended prefix specifies a memory trait for a memory reference prescribed by the extended instruction, where the memory trait for the memory reference cannot be specified by an existing instruction from an existing instruction set. The extended prefix tag indicates the extended prefix, where the extended prefix tag is an otherwise architecturally specified opcode within the existing instruction set. The extended execution logic is coupled to the translation logic. The extended execution logic receives the micro instruction sequence, and employs the memory trait to execute the memory reference.

Description

Apparatus and method for selectively controlling memory attributes

Cross-Reference to Related Applications

(0001) This application claims priority to the following U.S. application: Serial No. 10/227527, filed August 22, 2002.

(0002) This application is related to the following co-pending U.S. patent applications, all of which have the same applicant and inventors.

Taiwan application no.   Filing date   Docket number   Title
91116957                 7/30/02       CNTR:2176       Apparatus and method for extending a microprocessor instruction set
91116958                 7/30/02       CNTR:2186       Apparatus and method for executing conditional instructions
91116956                 7/30/02       CNTR:2188       Apparatus and method for selectively controlling condition code write-back
91116959                 7/30/02       CNTR:2189       Apparatus for increasing the number of registers in a microprocessor
91124005                 10/18/02      CNTR:2190       Apparatus and method for extending microprocessor data modes
91124006                 10/18/02      CNTR:2191       Apparatus and method for extending microprocessor address modes
(not listed)             (not listed)  CNTR:2192       Suppression of store checking
(not listed)             (not listed)  CNTR:2193       Selective interrupt suppression
91124007                 10/18/02      CNTR:2195       Non-temporal memory reference control apparatus
91116672                 7/26/02       CNTR:2198       Apparatus and method for selectively controlling result write-back

Technical Field

(0003) The present invention relates to the field of microelectronics, and more particularly to a technique for incorporating extended address mode control into an existing microprocessor instruction set architecture.

Background

(0004) Since the early 1970s, the use of microprocessors has grown exponentially. From their earliest applications in science and technology, microprocessors have moved out of those specialized fields into the commercial consumer market, in products such as desktop and laptop computers, video game controllers, and many other common household and commercial devices.

(0005) This explosive growth in use has been accompanied by a corresponding advance in technology, characterized by steadily rising demands for faster speed, greater addressing capability, faster memory access, larger operands, more kinds of general-purpose operations (e.g., floating point, single-instruction multiple-data (SIMD), conditional moves), and added special-purpose operations (e.g., digital signal processing functions and other multimedia operations). These demands have driven remarkable advances in the field, all of which have been applied to microprocessor design: extensive pipelining, superscalar architecture, cache structures, out-of-order processing, burst access mechanisms, branch prediction, and speculative execution. In short, compared with its first appearance 30 years ago, today's microprocessor is astonishingly complex and capable.

(0006) Unlike many other products, however, microprocessors are subject to another very important factor that has constrained, and continues to constrain, the evolution of their architecture. Much of the complexity of today's microprocessors is attributable to this factor: legacy software compatibility. For market reasons, many manufacturers have chosen to incorporate new architectural features into their latest microprocessor designs while retaining in those same products all of the capabilities needed to ensure compatibility with older, so-called "legacy" application programs.

(0007) Nowhere is this burden of legacy software compatibility more apparent than in the history of x86-compatible microprocessors. It is well known that a present-day 32/16-bit virtual-mode x86 microprocessor can still execute 8-bit real-mode application programs written in the 1980s. Those skilled in the art will also acknowledge that a significant amount of architectural "baggage" is carried in the x86 architecture solely to support compatibility with legacy applications and operating modes. In the past, developers could add newly developed architectural features to an existing instruction set architecture, but the means by which those features are used, namely programmable instructions, have now become scarce. Put more simply, certain important instruction sets have no "spare" instructions left with which designers could incorporate newer features into an existing architecture.

(0008) In the x86 instruction set architecture, for example, no undefined one-byte opcode state remains unused. In the primary one-byte x86 opcode map, all 256 opcode states are already occupied by existing instructions. As a result, x86 microprocessor designers must now choose between providing new features and preserving legacy software compatibility. To provide new programmable features, opcode states must be assigned to those features. If the existing instruction set architecture has no spare opcode states, some existing opcode states must be redefined to serve the new features. Providing new features therefore comes at the expense of legacy software compatibility.

(0009) One area of concern to microprocessor designers today is how efficiently application programs use cache structures. As cache technology has evolved, more and more features have been provided that let system programmers control when and how a cache is used within a system. Early cache control features provided only on/off capability. By setting an internal register of the microprocessor, or by asserting an external signal pin on its package, a designer could either enable caching of memory or make the entire memory space uncacheable. Uncacheable memory references (i.e., loads/reads and stores/writes) are all issued to the system memory bus and incur the latency of the external bus architecture. In contrast, cacheable memory references are issued to the system memory bus only when a cache miss occurs (i.e., when the target of the memory reference is not valid in the internal cache). Caching greatly increases the execution speed of application programs, particularly those that repeatedly reference the same data structures in memory.

(0010) More recent improvements in microprocessor architecture give system designers more precise control over how caching is used. These improvements let a designer define the properties of a range of addresses within the microprocessor's address space, in terms of how references to those addresses are carried out with respect to the cache hierarchy. In general, references to these addresses can be defined as uncacheable, write combining, write through, write back, or write protected. These properties are called memory attributes, or memory traits. Thus, a store reference to an address having the write-back attribute is sent to the cache, where a location is speculatively allocated for it. A store reference to another address having the uncacheable attribute is sent to the system bus, and no speculative allocation takes place.
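The two store cases described above can be sketched as a small lookup. This is only an illustration under stated assumptions: the names `MemoryTrait` and `route_store` are hypothetical, and the sketch models only the write-back and uncacheable cases, because the text does not spell out how the other three traits are handled.

```python
from enum import Enum


class MemoryTrait(Enum):
    """The five memory traits named in the text."""
    UNCACHEABLE = "uncacheable"
    WRITE_COMBINING = "write combining"
    WRITE_THROUGH = "write through"
    WRITE_BACK = "write back"
    WRITE_PROTECTED = "write protected"


def route_store(trait):
    """Return (destination, speculatively_allocates) for a store reference.

    Hypothetical helper: only the two cases the text describes are modeled.
    """
    if trait is MemoryTrait.WRITE_BACK:
        # Sent to the cache; a location there is speculatively allocated.
        return ("cache", True)
    if trait is MemoryTrait.UNCACHEABLE:
        # Sent to the system bus; nothing is allocated in the cache.
        return ("system bus", False)
    raise NotImplementedError(
        f"handling of {trait.value!r} stores is not described in the text"
    )
```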

(0011) A detailed description of memory attributes, and of how a microprocessor handles particular attributes with respect to its cache, is beyond the scope of this application. It is sufficient here to understand that the present state of the art lets a designer assign a memory attribute to a region of memory, and that all subsequent memory references to addresses within that region are handled according to the cache policy associated with the assigned attribute.

(0012) Although modern microprocessor designs allow different regions of memory to be assigned different memory traits, the designs remain limited in two important respects. First, microprocessor instruction set architectures restrict the instructions that define or change memory traits to execution at a privilege level that user-level applications cannot access. Consequently, when a desktop or laptop microprocessor boots up, its operating system establishes the memory traits of the physical memory space before any user-level application is launched, and user-level applications cannot change the memory traits of the host system. Second, in modern microprocessors the finest granularity at which memory traits can be established is the page level. In common architectures that allow memory paging, the memory attributes of each memory page are further defined by the operating system in page directory/table entries. Hence, every reference to an address within a particular page uses the memory attributes assigned to that page when the associated memory access operation is performed.
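The page-level limitation can be illustrated with a toy lookup in which the trait is keyed by page number. The 4 KB page size, the dictionary layout, and the name `trait_for_address` are assumptions made for the sketch and do not come from the patent.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def trait_for_address(page_traits, address, default="write back"):
    """Look up the memory trait that applies to an address.

    page_traits maps a page number to a trait name. Every address inside
    a page shares that page's trait, which is the granularity limit the
    text describes.
    """
    page_number = address // PAGE_SIZE
    return page_traits.get(page_number, default)


# Hypothetical setup performed by the operating system at boot.
page_traits = {0: "write back", 1: "uncacheable"}
```

With this layout, two data structures that happen to share a page cannot be given different traits, no matter how differently the program uses them.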

(0013) For many applications, the control features described above allow noticeable improvements in the execution speed of user-level programs. The present inventors have observed, however, that other applications remain limited, both because current memory trait controls are unavailable at the user level and because memory attributes can be established only at page-level granularity. For example, a user program that repeatedly accesses a first data structure loses execution efficiency if, when it makes an incidental reference to a second data structure, cache entries for the first data structure must be evicted to make room in the cache for the second. Because the operating system has no foreknowledge of how frequently a user-level application references its data structures, the application's data space is typically assigned the write-back trait, which sets up the conditions for the conflict just described. The programmer has no means of changing the traits of the data space so as to force the incidental reference out to the memory bus (e.g., by assigning the uncacheable trait to the second data structure) and thereby avoid the conflict.
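The conflict described above can be reproduced with a toy simulation. The sketch assumes a small, fully associative cache with least-recently-used replacement; that cache model, the function name `count_hot_misses`, and the parameters are illustrative choices and are not taken from the patent.

```python
from collections import OrderedDict


def count_hot_misses(rounds, hot_lines, capacity, cache_incidental):
    """Count cache misses on the frequently used ("hot") data structure.

    Each round touches every hot line once, then makes one incidental
    reference to a different line of a second data structure. When
    cache_incidental is False, the incidental reference bypasses the
    cache, as an uncacheable reference would.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hot_misses = 0

    def touch(line):
        """Reference a cacheable line; return True on a miss."""
        if line in cache:
            cache.move_to_end(line)
            return False
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used line
        cache[line] = None
        return True

    for round_number in range(rounds):
        for line in range(hot_lines):
            if touch(("hot", line)):
                hot_misses += 1
        if cache_incidental:
            touch(("incidental", round_number))
        # Otherwise the reference goes to the bus and the cache is untouched.
    return hot_misses
```

With 10 rounds, 4 hot lines, and a 4-line cache, the hot structure misses 40 times when the incidental references are cached and only 4 times (the initial fills) when they bypass the cache.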

(0014) What is needed, therefore, is an apparatus and method for incorporating selective memory attribute control into an existing microprocessor instruction set architecture whose instruction set is completely populated with defined opcodes, such that incorporating the attribute control feature leaves a legacy-conforming microprocessor able to execute legacy applications while also giving programmers the ability to modify memory attributes.

Summary of the Invention

(0015) Like the other applications noted above, the present invention is directed to overcoming the problems and disadvantages of the prior art described above, among others. The present invention provides a superior technique for extending a microprocessor's instruction set beyond its current capabilities so as to provide instruction-level control of memory traits. In one embodiment, an apparatus is provided for instruction-level control of memory attributes within a microprocessor. The apparatus includes translation logic and extended execution logic. The translation logic translates an extended instruction into a micro instruction sequence. The extended instruction has an extended prefix and an extended prefix tag. The extended prefix specifies a memory trait for a memory reference prescribed by the extended instruction, where the memory trait for the memory reference cannot be specified by an existing instruction from an existing instruction set. The extended prefix includes an attribute field that specifies the memory trait, where the memory trait is one of a plurality of memory attributes. The extended prefix tag indicates the extended prefix, where the extended prefix tag is an otherwise architecturally specified opcode within the existing instruction set. The extended execution logic is coupled to the translation logic; it receives the micro instruction sequence and employs the memory trait to execute the memory reference. The translation logic includes: an escape instruction detector, which detects the extended prefix tag; an instruction decoder, which determines an operation to be performed, where the operation includes the memory reference; and an extended decoder, coupled to the escape instruction detector and the instruction decoder, which determines the memory trait and specifies it within the micro instruction sequence.
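The three parts of the translation logic named above can be sketched in software. The sketch rests on several assumptions that are not given in this passage: the tag value 0xF1 (the reference numerals list opcode F1H as element 202, but its use as the tag is assumed here), a tag and prefix that precede the opcode, a 3-bit trait encoding, and a two-entry opcode table. All names are hypothetical.

```python
ESCAPE_TAG = 0xF1  # assumed value of the extended prefix tag

# Hypothetical encoding of the attribute field in the extended prefix.
TRAIT_ENCODING = {
    0b000: "uncacheable",
    0b001: "write combining",
    0b010: "write through",
    0b011: "write back",
    0b100: "write protected",
}

# Toy opcode table; a real decoder covers the whole instruction set.
OPERATIONS = {0x8B: "load", 0x89: "store"}


def translate(instruction_bytes):
    """Translate one instruction into a micro instruction (a dict)."""
    data = bytes(instruction_bytes)
    trait = None  # None means the default trait of the address applies

    # Escape instruction detector: is the extended prefix tag present?
    if data and data[0] == ESCAPE_TAG:
        if len(data) < 3:
            raise ValueError("extended instruction is truncated")
        # Extended decoder: read the trait from the extended prefix byte.
        code = data[1] & 0b111
        if code not in TRAIT_ENCODING:
            raise ValueError(f"reserved trait encoding {code:#05b}")
        trait = TRAIT_ENCODING[code]
        data = data[2:]

    if not data:
        raise ValueError("no opcode byte")

    # Instruction decoder: determine the operation from the opcode byte.
    operation = OPERATIONS.get(data[0])
    if operation is None:
        raise ValueError(f"opcode {data[0]:#04x} is not in the toy table")

    return {"operation": operation, "operands": data[1:], "memory_trait": trait}
```

An instruction without the tag decodes exactly as before, with `memory_trait` left as `None`, which mirrors the requirement that legacy code keep working unchanged.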

(0016) One object of the present invention is to provide a microprocessor mechanism that extends an existing instruction set to provide selective control of memory traits. The microprocessor mechanism has translation logic configured to receive an extended instruction. The extended instruction prescribes a memory attribute for a memory access, and includes a selected opcode from the existing microprocessor instruction set followed by an n-bit extended prefix. The selected opcode indicates the extended instruction, and the n-bit extended prefix indicates the memory attribute. The memory attribute for the memory access cannot otherwise be prescribed by an instruction of the existing instruction set. The n-bit extended prefix includes a memory trait field configured to specify the memory attribute, where the memory attribute is one of a plurality of memory access traits. The translation logic receives the extended instruction and generates a micro instruction sequence directing the microprocessor to perform the memory access according to the memory attribute. The translation logic includes: an escape instruction detector, which detects the selected opcode within the extended instruction; an instruction decoder, which decodes the remaining parts of the extended instruction to determine the memory access; and an extended decoder, coupled to the escape instruction detector and the instruction decoder, which decodes the n-bit extended prefix and specifies the memory attribute within the micro instruction sequence.

(0017) Another object of the present invention is to provide an apparatus for adding instruction-level memory trait control to an existing instruction set. The apparatus includes translation logic and extended execution logic. The translation logic receives and translates an instruction comprising an escape tag and accompanying parts, where the escape tag is an opcode within the existing instruction set, and where the accompanying parts include a memory trait specifier that designates one of a plurality of memory traits for a memory access. The extended execution logic is coupled to the translation logic and performs the memory access using the specified memory trait, where the existing instruction set provides only a default memory trait for the memory access, and where the extended execution logic applies the specified memory trait in place of the default. The translation logic includes: an escape tag detector, which detects the escape tag and directs that the accompanying parts be translated according to extended translation conventions; and a decoder, coupled to the escape tag detector, which translates instructions according to the conventions of the existing instruction set and translates the instruction according to the extended translation conventions, so as to enable the memory access to be performed with the specified memory trait.

(0018) A further object of the present invention is to provide a method for extending an existing instruction set architecture to enable selective, instruction-level control of memory attributes. The method includes: providing an extended instruction comprising an extended instruction tag, an extended prefix, and remaining parts, where the extended instruction tag is a first opcode entry of the existing instruction set architecture; prescribing, via the extended prefix, a memory attribute to be applied to a corresponding memory access, where the memory access is prescribed by the remaining parts of the extended instruction, and where the extended prefix includes an attribute field that specifies the memory trait as one of a plurality of memory attributes; and employing the memory attribute to perform the memory access, thereby overriding a default memory attribute for the memory access.

Brief Description of the Drawings

(0019) The foregoing and other objects, features, and advantages of the present invention will be better understood with reference to the following description and the accompanying drawings, in which:

(0020) FIG. 1 is a block diagram of a related art microprocessor instruction format;

(0021) FIG. 2 is a table depicting how instructions in an instruction set architecture map to the logic states of the bits of an 8-bit opcode byte within the instruction format of FIG. 1;

(0022) FIG. 3 is a block diagram of an extended instruction format according to the present invention;

(0023) FIG. 4 is a table showing how extended architectural features map to the logic states of the bits of an 8-bit extended prefix embodiment according to the present invention;

(0024) FIG. 5 is a block diagram of a pipelined microprocessor that employs selective memory attribute control according to the present invention;

(0025) FIG. 6 is a block diagram of one embodiment of an extended prefix for specifying extended memory traits in a microprocessor according to the present invention;

(0026) FIG. 7 is a block diagram of another embodiment of an extended prefix for specifying extended memory traits in a microprocessor according to the present invention;

(0027) FIG. 8 is a table illustrating an example encoding of typical memory traits in the extended prefix of FIG. 7;

(0028) FIG. 9 is a detailed block diagram of the translate stage logic within the microprocessor of FIG. 5;

(0029) FIG. 10 is a block diagram of the extended execution logic within the microprocessor of FIG. 5; and

(0030) FIG. 11 is a flow chart depicting a method according to the present invention for overriding memory traits in a microprocessor.

Description of Reference Numerals

100  instruction format
101  prefix
102  opcode
103  address specifier
200  8-bit opcode map
201  opcode value
202  opcode F1H
300  extended instruction format
301  prefix
302  opcode
303  address specifier
304  extended instruction tag
305  extended prefix
400  8-bit prefix map
401  architectural feature
500  pipelined microprocessor
501  fetch logic
502  instruction cache/external memory
503  instruction queue
504  translation logic
505  extended translation logic
506  micro instruction queue
507  execution logic
508  extended execution logic
600  extended prefix
601  source trait field
602  destination trait field
700  memory attribute prefix
701  attribute field
702  source bit
703  destination bit
704  spare field
800  table
900  translate stage logic
901  boot state signal
902  machine specific register
903  extended feature field
904  instruction buffer
905  translation logic
906  translation controller
907  disable signal
908  escape instruction detector
909  extended decoder
910  instruction decoder
911  control read-only memory (ROM)
912  micro instruction buffer
913  opcode extension field
914  micro opcode field
915  destination field
916  source field
917  displacement field
1000  extended execution stage logic
1001  extended micro instruction buffer
1002  address buffer
1003  address buffer
1004  destination operand buffer
1005  extended access logic
1006  memory trait descriptor
1007  cache
1008  bus unit
1009  access controller
1010  store buffer
1011  cache
1012  bus
1013  bus
1014  store buffer
1015  source operand buffer
1100-1128  flow of the method for overriding memory traits in a microprocessor

Detailed Description

(0031) The following description is presented in the context of a particular embodiment and its requirements, to enable one of ordinary skill in the art to make use of the present invention. Various modifications to the preferred embodiment will, however, be apparent to those skilled in the art, and the general principles discussed herein may be applied to other embodiments. The present invention is therefore not limited to the particular embodiments shown and described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

(0032) The foregoing background discussed the techniques by which today's microprocessors extend their architectural features beyond the capabilities of their instruction sets. In view of that discussion, a related art example is now discussed with reference to FIGS. 1 and 2. The discussion highlights the dilemma that microprocessor designers continually face: on the one hand they want to incorporate newly developed architectural features into a microprocessor design, but on the other hand they must retain the ability to execute legacy applications. In the example of FIGS. 1 and 2, a completely populated opcode map rules out adding new opcodes to the example architecture. Designers are thus forced either to incorporate the new features at the cost of some degree of legacy software compatibility, or to forgo recent architectural advances altogether in order to keep the microprocessor compatible with legacy applications. Following the related art discussion, the present invention is discussed with reference to FIGS. 3 through 11. By employing an existing but unused opcode as the prefix tag of an extended instruction, the present invention lets microprocessor designers overcome the limits of a completely used instruction set architecture: they can give programmers the ability to assign a memory trait to a specific memory reference at the instruction level while retaining compatibility with legacy applications.

(0033) Referring to FIG. 1, a block diagram is presented of a related art microprocessor instruction format 100. The related art instruction 100 has a variable number of data items 101-103, each set to a specific value, that together make up a specific instruction 100 for a microprocessor. The specific instruction 100 directs the microprocessor to perform a specific operation, such as adding two operands, moving an operand from memory to an internal register, or moving an operand from the internal register to memory. In general, an opcode item 102 within the instruction 100 prescribes the specific operation to be performed, and optional address specifier items 103 follow the opcode 102 to prescribe additional information about the operation, such as how it is to be performed and where the operands are located. The instruction format 100 also allows a programmer to place prefix items 101 before an opcode 102. The prefixes 101 direct whether particular architectural features are used when the operation prescribed by the opcode 102 is performed. Typically, these architectural features can be applied to most of the operations prescribed by any of the opcodes 102 in the instruction set. For example, prefixes 101 exist today in a number of microprocessors that can perform operations using virtual addresses of different sizes (e.g., 8-bit, 16-bit, 32-bit). While many such processors are programmed to a default address size (say, 32-bit), the prefixes 101 provided in their respective instruction sets still let programmers selectively override the default address size on an instruction-by-instruction basis (e.g., to generate 16-bit virtual addresses). Selectable address size is only one example of an architectural feature that, in many modern microprocessors, applies to a large number of the operations that opcodes 102 can prescribe (e.g., add, subtract, multiply, Boolean logic).

(0034) One well-known instance of the instruction format 100 shown in FIG. 1 is the x86 instruction format 100, which is employed by all present-day x86-compatible microprocessors. More specifically, the x86 instruction format 100 (also known as the x86 instruction set architecture 100) uses 8-bit prefixes 101, 8-bit opcodes 102, and 8-bit address specifiers 103. The x86 architecture 100 also has several prefixes 101: two override the default address/data size of an x86 microprocessor (i.e., opcode states 66H and 67H); another directs the microprocessor to interpret the following opcode byte 102 according to different translation rules (i.e., prefix value 0FH, which causes translation to follow the so-called two-byte opcode rules); and others cause particular operations to be repeated until a repetition condition is satisfied (i.e., the REP opcodes: F0H, F2H, and F3H).

(0035) Referring now to FIG. 2, a table 200 is presented depicting how instructions 201 of an instruction set architecture are mapped to the bit values of an 8-bit opcode byte 102 within the instruction format of FIG. 1. The table 200 presents an exemplary 8-bit opcode map 200 that associates up to 256 values of an 8-bit opcode entry 102 with corresponding microprocessor opcode instructions 201. The table 200 maps a particular value of the opcode entry 102, say 02H, to a corresponding opcode instruction 201 (i.e., instruction I02 201). In the case of the x86 opcode map, it is well known to those skilled in the art that opcode value 14H maps to the x86 Add with Carry (ADC) instruction, which adds an 8-bit immediate operand to the contents of architectural register AL. Those skilled in the art will also appreciate that the x86 prefixes 101 noted above (i.e., 66H, 67H, 0FH, F0H, F2H, and F3H) are actual opcode values 201 that, in different contexts, direct that particular architectural extensions be applied to the operation prescribed by the opcode entry 102 that follows. For example, preceding opcode 14H (normally the ADC opcode noted above) with prefix 0FH causes an x86 processor to perform an Unpack and Interleave Low Packed Single-Precision Floating-Point Values operation instead of the ADC operation. Features such as those described in this x86 example are made possible in part because the instruction translation/decode logic in a microprocessor interprets the entries 101-103 of an instruction 100 in order. Thus, using particular opcode values as prefixes 101 in an instruction set architecture has, in the past, allowed microprocessor designers to incorporate a good number of advanced architectural features into legacy-compatible microprocessor designs without adversely affecting the execution of legacy programs that do not use those particular opcode states. For example, a legacy program that never uses x86 opcode 0FH will still run on a present-day x86 microprocessor, while a newer application can use x86 opcode 0FH as a prefix 101 to access many more recently incorporated x86 architectural features, such as single instruction multiple data (SIMD) operations and conditional move operations.
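The in-order interpretation described in this paragraph can be sketched in a few lines of Python. This is an illustration only, not the decoder of any actual processor: the function name decode_opcode and the two dictionaries are hypothetical, and the maps hold just the two entries named in the text (14H as ADC AL, imm8, and 0FH 14H as the unpack-and-interleave operation, whose x86 mnemonic is UNPCKLPS).

```python
# Tiny excerpts of the x86 one-byte and two-byte (0FH-escaped) opcode maps.
# Only the two entries named in the text are included; everything else is
# deliberately left out of this sketch.
ONE_BYTE_MAP = {0x14: "ADC AL, imm8"}
TWO_BYTE_MAP = {0x14: "UNPCKLPS"}
TWO_BYTE_ESCAPE = 0x0F


def decode_opcode(code: bytes) -> str:
    """Return a mnemonic for the opcode at the start of `code`.

    Bytes are read in order, so a leading 0FH changes how the next byte
    is interpreted.
    """
    if not code:
        raise ValueError("empty instruction")
    if code[0] == TWO_BYTE_ESCAPE:
        if len(code) < 2:
            raise ValueError("0FH escape with no opcode byte after it")
        return TWO_BYTE_MAP.get(code[1], "not in this excerpt")
    return ONE_BYTE_MAP.get(code[0], "not in this excerpt")


print(decode_opcode(bytes([0x14, 0x05])))        # ADC AL, imm8
print(decode_opcode(bytes([0x0F, 0x14, 0xC1])))  # UNPCKLPS
```

The same byte value, 14H, yields two different operations depending only on what precedes it, which is why a program that never emits 0FH is unaffected by the features 0FH unlocks.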

(0036) Although architectural features have been provided in the past by designating available/spare opcode values 201 as prefixes 101 (also known as architectural feature tags/indicators 101 or escape instructions 101), many instruction set architectures 100 have run into a roadblock in providing further enhancements for a very simple reason: all of the available/spare opcode values have been used up; that is, every opcode value in the opcode map 200 has been architecturally specified. Once all available values have been assigned as either opcode entries 102 or prefix entries 101, no opcode values remain for incorporating new features. This significant problem exists in many present-day microprocessor architectures and forces designers to choose between adding architectural features and retaining compatibility with legacy programs.

(0037) It is worth noting that the instructions 201 shown in FIG. 2 are depicted generically (i.e., I24, I86) rather than as specific operations (such as Add with Carry, Subtract, or Exclusive-OR). This is because a completely populated opcode map 200 architecturally precludes the incorporation of more recent advances in a number of different microprocessor architectures. Although the example of FIG. 2 refers to an 8-bit opcode entry 102, those skilled in the art will appreciate that the specific size of the opcode 102 is irrelevant to the problem itself, except as a particular case for discussing the problem posed by a completely populated opcode structure 200. Accordingly, a completely populated 6-bit opcode map would have 64 architecturally specified opcodes/prefixes 201 and would likewise provide no available/spare opcode values for expansion.

(0038) An alternative approach, rather than discarding the original instruction set entirely and replacing it with a new format 100 and opcode map 200, is to replace only a subset of the existing opcodes 201 with new instruction semantics, say opcodes 40H through 4FH in FIG. 2. Under this hybrid technique, a microprocessor operates exclusively in one of two modes: a legacy mode, in which opcodes 40H-4FH are interpreted according to legacy rules, or an enhanced mode, in which opcodes 40H-4FH are interpreted according to enhanced architectural rules. This technique does allow designers to incorporate new features into a design, but a drawback remains: while a legacy-compatible microprocessor is running in enhanced mode, it cannot execute any application program that uses opcodes 40H-4FH. Hence, from the standpoint of retaining legacy software compatibility, the legacy-compatible/enhanced mode technique is unacceptable.

(0039) The present inventors, however, have examined how the opcodes 201 are used in instruction sets 200 whose opcode space is completely populated, across all application programs that execute on legacy-compatible microprocessors, and have observed that some instructions 202, although architecturally specified, are not employed in application programs that the microprocessor is able to execute. Instruction IF1 202 in FIG. 2 is an example of this phenomenon. In fact, the same opcode value 202 (i.e., F1H) maps to a valid instruction 202 that is unused in the x86 instruction set architecture. Although the unused x86 instruction 202 is a valid x86 instruction 202 that directs an architecturally specified operation on an x86 microprocessor, it is not employed in any application program that can be executed on a present-day x86 microprocessor. This particular x86 instruction 202 is known as In Circuit Emulation Breakpoint (i.e., ICE BKPT, opcode value F1H), and was formerly used exclusively in a class of microprocessor emulation equipment that no longer exists. ICE BKPT 202 was never employed in an application program outside of an in-circuit emulator, and the in-circuit emulation equipment that formerly used ICE BKPT 202 no longer exists. Hence, in the x86 case, the present inventors have identified a means within a completely populated instruction set architecture 200 whereby a valid yet unused opcode 202 allows advanced architectural features to be incorporated into a microprocessor design without sacrificing legacy software compatibility. In a completely populated instruction set architecture 200, the present invention employs an architecturally specified yet unused opcode 202 as an indicator tag for an n-bit prefix that follows, thus allowing microprocessor designers to incorporate up to 2^n newly developed architectural features into a microprocessor design while retaining complete compatibility with all legacy software.

(0040) The present invention exploits the prefix tag/extended prefix concept by providing an n-bit extended memory trait specifier prefix, which allows programmers to specify, on an instruction-by-instruction basis, a memory attribute for a corresponding memory access operation in a microprocessor. When the corresponding memory access operation is performed, the memory attribute is used in place of a default attribute specified by memory trait descriptor tables/mechanisms previously established by operating system programs. The present invention will now be discussed with reference to FIGS. 3 through 11.

(0041) Referring now to FIG. 3, a block diagram is presented of an extended instruction format 300 according to the present invention. Much like the format 100 discussed with reference to FIG. 1, the extended instruction format 300 has a variable number of instruction entries 301-305, each set to a specific value, that together make up a specific instruction 300 for a microprocessor. The specific instruction 300 directs the microprocessor to perform a specific operation, such as adding two operands together or moving an operand from memory to a register within the microprocessor. In general, an opcode entry 302 of the instruction 300 prescribes the specific operation to be performed, and optional address specifier entries 303 follow the opcode 302 to prescribe additional information about the operation, such as how it is to be performed, the registers in which operands are located, and the direct and indirect data used to compute the memory addresses of source/result operands. The instruction format 300 also allows a programmer to precede an opcode 302 with prefix entries 301. The prefix entries 301 direct whether existing architectural features are to be employed while the operation prescribed by the opcode 302 is performed.

(0042) The extended instruction 300 according to the present invention, however, is a superset of the instruction format 100 described with reference to FIG. 1. It has two additional entries 304 and 305 that can optionally be provided as an instruction extension, placed ahead of all the remaining entries 301-303 in a formatted extended instruction 300. These two additional entries 304 and 305 allow a programmer to specify a memory trait for a memory reference prescribed by the extended instruction 300, where the memory trait for that memory reference cannot otherwise be specified by the existing instruction set of a legacy-compatible microprocessor. The optional entries 304 and 305 are an extended prefix tag 304 and an extended memory trait specifier prefix 305. The extended prefix tag 304 is an otherwise architecturally specified opcode within the microprocessor instruction set. In an x86 embodiment, the extended prefix tag 304, or escape tag 304, is opcode state F1H, the formerly used ICE BKPT instruction. The escape tag 304 indicates to the microprocessor logic that the extended prefix 305, or extended features specifier 305, follows, where the extended prefix 305 specifies a memory attribute for a prescribed memory access. In one embodiment, the escape tag 304 indicates that the accompanying parts 301-303 and 305 of a corresponding extended instruction 300 prescribe a memory access to be performed by the microprocessor. The memory trait specifier 305, or extended prefix 305, specifies one of a plurality of memory traits for the memory access. Extended execution logic within the microprocessor then performs the memory access according to the specified memory trait, overriding the default memory attribute that would otherwise be specified by other means, including the control register bits, memory type registers, page tables, and other kinds of memory attribute descriptors found in present-day microprocessor architectures.

(0043) The selective memory attribute control technique of the present invention is summarized here. An extended instruction is configured to specify a memory attribute for a memory access within an existing microprocessor instruction set, where the memory attribute for the memory access cannot otherwise be specified by instructions of the existing instruction set. The extended instruction includes one of the opcodes/instructions 304 of the existing instruction set and an n-bit extended prefix 305. The selected opcode/instruction serves as an indicator 304 that the instruction 300 is an extended features instruction 300 (i.e., one that prescribes extensions to the microprocessor architecture), and the n-bit features prefix 305 specifies the memory attribute. In one embodiment, the extended prefix 305 is eight bits wide and can specify up to 256 different attributes, or combinations of memory attributes and other extended features. An n-bit prefix embodiment can specify up to 2^n different memory traits.
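As a rough illustration of the format just summarized, the Python sketch below peels the escape tag and an 8-bit extended prefix off the front of an instruction byte string and computes the 2^n capacity of an n-bit prefix. The names split_extended, trait_capacity, and SplitInstruction are hypothetical, the remaining entries are left undecoded, and the sample bytes are arbitrary; only the F1H tag value and the tag-then-prefix ordering come from the text.

```python
from typing import NamedTuple, Optional

ESCAPE_TAG = 0xF1  # x86 embodiment: the otherwise unused ICE BKPT opcode


class SplitInstruction(NamedTuple):
    extended_prefix: Optional[int]  # entry 305, or None for a legacy instruction
    remainder: bytes                # remaining entries, left undecoded here


def trait_capacity(prefix_bits: int) -> int:
    """Number of distinct values an n-bit extended prefix can encode (2^n)."""
    return 2 ** prefix_bits


def split_extended(code: bytes) -> SplitInstruction:
    """Peel the tag and an 8-bit extended prefix off the front of `code`."""
    if code[:1] == bytes([ESCAPE_TAG]):
        if len(code) < 2:
            raise ValueError("escape tag with no extended prefix after it")
        return SplitInstruction(code[1], code[2:])
    return SplitInstruction(None, code)


print(trait_capacity(8))                                  # 256
print(split_extended(bytes([0xF1, 0x2A, 0x8B, 0x03])))    # prefix 0x2A found
print(split_extended(bytes([0x8B, 0x03])))                # legacy: no prefix
```

An instruction that does not begin with the tag passes through untouched, which is the property that keeps legacy code unaffected.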

(0044) Referring now to FIG. 4, a table 400 is presented showing how memory attributes for a prescribed memory reference are mapped to the logic states of an 8-bit extended prefix embodiment according to the present invention. Like the opcode map 200 discussed with reference to FIG. 2, the table 400 of FIG. 4 presents an exemplary 8-bit extended prefix map 400 that associates up to 256 values of an 8-bit extended prefix entry 305 with corresponding memory traits 401 (e.g., E34, E4D) of a legacy-compatible microprocessor. In an x86 embodiment, the 8-bit extended features prefix 305 according to the present invention provides instruction-level control of memory traits 401 (i.e., E00-EFF) that the current x86 instruction set architecture cannot specify at the instruction level.

(0045) The extended features 401 shown in FIG. 4 are depicted generically rather than specifically, because the technique of the present invention is applicable to a variety of different architectural extensions 401 and specific instruction set architectures. Those skilled in the art will appreciate that many different architectural features 401, a few of which have been noted above, can be incorporated into an existing instruction set using the escape tag 304/extended prefix 305 technique described herein. The 8-bit prefix embodiment of FIG. 4 provides for up to 256 different features 401; an n-bit prefix embodiment allows for programmable selection of up to 2^n different features 401.

(0046) Referring now to FIG. 5, a block diagram is presented illustrating a pipelined microprocessor 500 for performing selective memory attribute control operations according to the present invention. The microprocessor 500 has three notable stage categories: fetch, translate, and execute. The fetch stage has fetch logic 501 that retrieves instructions from an instruction cache 502 or external memory 502. The retrieved instructions are provided to the translate stage via an instruction queue 503. The translate stage has translation logic 504 that is coupled to a micro instruction queue 506. The translation logic 504 includes extended translation logic 505. The execute stage has execution logic 507 that includes extended execution logic 508.

(0047) In operation according to the present invention, the fetch logic 501 retrieves formatted instructions from the instruction cache/external memory 502 and places them in the instruction queue 503 in execution order. The instructions are then retrieved from the instruction queue 503 and provided to the translation logic 504. The translation logic 504 translates/decodes each provided instruction into a corresponding sequence of micro instructions that directs the microprocessor 500 to perform the operations prescribed by the instructions. According to the present invention, the extended translation logic 505 detects those instructions that have an extended prefix tag and translates/decodes the corresponding extended memory trait specifier prefixes. In an x86 embodiment, the extended translation logic 505 is configured to detect an extended prefix tag of value F1H, the x86 ICE BKPT opcode. Extended micro instruction fields are provided in the micro instruction queue 506 so that the memory trait of the associated memory reference prescribed by the accompanying parts of the instruction can be specified.

(0048) The micro instructions are provided from the micro instruction queue 506 to the execution logic 507, in which the extended execution logic 508 is configured either to perform a prescribed memory reference according to a default memory trait (defined by existing memory trait descriptor mechanisms), or to override the default memory trait, as specified by the extended micro instruction fields, with a memory trait programmed at the user level by means of an extended prefix according to the present invention. In one embodiment, memory traits are assigned at cache line granularity.
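The choice between the default trait and the prefix-supplied override can be pictured with the following Python sketch. It is a simplification under stated assumptions: the 64-byte line size, the DEFAULT_TRAITS table, the trait names, and the function effective_trait are all hypothetical stand-ins, since the text does not say how defaults are stored or looked up.

```python
from typing import Dict, Optional

CACHE_LINE_BYTES = 64  # hypothetical line size; the text gives no value

# Hypothetical default traits keyed by cache line index, standing in for
# whatever the existing memory trait descriptor mechanisms would report.
DEFAULT_TRAITS: Dict[int, str] = {0: "write back", 1: "uncacheable"}


def effective_trait(address: int, override: Optional[str] = None) -> str:
    """Pick the trait for one memory reference.

    `override` models the extended micro instruction field: None means the
    instruction carried no extended prefix, so the default applies.
    """
    if override is not None:
        return override
    return DEFAULT_TRAITS[address // CACHE_LINE_BYTES]


print(effective_trait(0x10))                 # write back (default for line 0)
print(effective_trait(0x10, "uncacheable"))  # uncacheable (override wins)
```

The override affects only the single reference that carries it; the stored defaults are never modified.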

(0049) Those skilled in the art will appreciate that the microprocessor 500 shown in FIG. 5 is a simplified representation of a present-day pipelined microprocessor 500. In fact, a present-day pipelined microprocessor 500 may comprise as many as 20 to 30 different pipeline stages. These stages can, however, be generally categorized into the three stage groups shown in the block diagram, and so the block diagram 500 of FIG. 5 serves to identify the essential elements needed for the embodiments of the present invention described above. For clarity, elements of the microprocessor 500 that are not relevant have been omitted.

(0050) Referring now to FIG. 6, a block diagram is presented of one embodiment of an extended prefix 600 for specifying memory attributes of a programmed memory access in a microprocessor according to the present invention. The memory trait specifier prefix 600 is 8 bits wide and includes a source trait field 601 and a destination trait field 602. The source trait field 601 specifies a memory attribute for source operand memory accesses (i.e., loads, reads) prescribed by the remainder of an associated extended instruction, and the destination trait field 602 specifies a memory attribute for destination operand memory accesses (i.e., stores, writes) prescribed by that remainder. Accordingly, the exemplary 8-bit prefix 600 can assign one of 16 different memory traits to each of the source and destination operands, overriding the default traits prescribed for the associated address ranges or memory pages. The embodiment shown in FIG. 6 specifies a single source memory trait for all source operand addresses associated with the corresponding instruction, and a single (possibly different) destination memory trait for all destination operands. Those skilled in the art will appreciate that being able to specify source and destination attributes separately is particularly useful in conjunction with repeated string instructions, such as REP MOVS in the x86 architecture. A variation of the above embodiment provides a corresponding destination trait field 602 and source trait field 601 for each destination/source operand referenced by the corresponding instruction, increasing or decreasing the number of bits in the prefix 600 accordingly.
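The two-field prefix of FIG. 6 might be unpacked as in the Python sketch below. Sixteen traits per field implies two 4-bit fields in an 8-bit prefix, but the text does not state which bits hold which field; placing the source trait in the upper four bits and the destination trait in the lower four is an assumption made only for illustration, as is the function name unpack_trait_prefix.

```python
from typing import Tuple


def unpack_trait_prefix(prefix: int) -> Tuple[int, int]:
    """Split an 8-bit prefix into (source_trait, destination_trait).

    Assumed layout (not given in the text): bits 7:4 carry the source
    trait, bits 3:0 carry the destination trait.
    """
    if not 0 <= prefix <= 0xFF:
        raise ValueError("prefix must fit in 8 bits")
    source_trait = (prefix >> 4) & 0xF
    destination_trait = prefix & 0xF
    return source_trait, destination_trait


print(unpack_trait_prefix(0x3A))  # (3, 10)
```

With separate fields, a string move could, for instance, read its source under one trait and write its destination under another in the same instruction.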

(0051) Referring now to FIG. 7, a block diagram is presented of an alternative embodiment of an extended prefix 700 for specifying memory attributes of a programmed memory access in a microprocessor according to the present invention. The memory attribute prefix 700 is 8 bits wide and includes an attribute field 701, a source bit 702, a destination bit 703, and a spare field 704. The 3-bit attribute field 701 specifies one of eight different memory traits for the memory access operations prescribed by a corresponding instruction. The source bit 702 enables the attribute specified by the attribute field 701 for memory accesses of all source operands, and the destination bit 703 enables the specified attribute for memory accesses of all destination operands. Accordingly, the 8-bit prefix 700 embodiment can assign one of eight different memory traits to memory accesses, applied to source references, destination references, or both, overriding the default traits prescribed for the associated address ranges or memory pages.

(0052) Referring now to FIG. 8, a table 800 is presented illustrating an exemplary encoding of typical memory traits for the attribute field 701 of the extended prefix 700 of FIG. 7. The table 800 has an attribute column, ATTR, and a trait column, TRAIT. Values of the attribute field 701 in the ATTR column map to corresponding memory traits in the TRAIT column. The exemplary encoding provides commonly used memory traits such as uncacheable (value 000) and write back (value 011); those skilled in the art will appreciate, however, that other traits suited to a particular microprocessor architecture can also be encoded in the fields 601, 602, 701 of FIGS. 6 and 7.
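The prefix of FIG. 7 and the encoding of FIG. 8 can be combined in a short Python sketch. The bit positions are assumptions, since the text gives only the field widths: here the attribute field occupies bits 2:0, the source bit is bit 3, the destination bit is bit 4, and bits 7:5 are spare. Only the two encodings quoted in the text (000 and 011) appear in the table; the remaining rows of FIG. 8 are not reproduced here and are not guessed at.

```python
# The two ATTR encodings quoted in the text; other rows are left out.
KNOWN_TRAITS = {0b000: "uncacheable", 0b011: "write back"}

# Assumed bit layout (the text gives field widths, not positions).
ATTR_MASK = 0b0000_0111
SOURCE_BIT = 0b0000_1000
DEST_BIT = 0b0001_0000


def decode_attribute_prefix(prefix: int) -> dict:
    """Decode an 8-bit attribute prefix under the assumed layout."""
    if not 0 <= prefix <= 0xFF:
        raise ValueError("prefix must fit in 8 bits")
    attr = prefix & ATTR_MASK
    return {
        "trait": KNOWN_TRAITS.get(attr, "not given in the text"),
        "source": bool(prefix & SOURCE_BIT),
        "destination": bool(prefix & DEST_BIT),
    }


# Write back, applied to destination operand accesses only.
print(decode_attribute_prefix(0b0001_0011))
# Uncacheable, applied to source operand accesses only.
print(decode_attribute_prefix(0b0000_1000))
```

Compared with the two-field layout, this form spends fewer bits by sharing one trait between source and destination and leaves spare bits for other uses.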

(0053) Referring now to FIG. 9, a detailed block diagram is presented of translate stage logic 900 within the microprocessor of FIG. 5. The translate stage logic 900 has an instruction buffer 904 that, according to the present invention, provides extended instructions to translation logic 905. The translation logic 905 is coupled to a machine specific register 902 that has an extended features field 903. The translation logic 905 has a translation controller 906 that provides a disable signal 907 to an escape instruction detector 908 and an extended decoder 909. The escape instruction detector 908 is coupled to the extended decoder 909 and to an instruction decoder 910. The extended decoder 909 and the instruction decoder 910 access a control read-only memory (ROM) 911, in which template micro instruction sequences corresponding to certain extended instructions are stored. The translation logic 905 also includes a micro instruction buffer 912 having an opcode extension field 913, a micro opcode field 914, a destination field 915, a source field 916, and a displacement field 917.

(0054) Operationally, during power-up of the microprocessor, the state of the extended features field 903 within the machine specific register 902 is established via a signal power-up state 901 to indicate whether the particular microprocessor is capable of translating and executing extended instructions according to the present invention for overriding the microprocessor's default memory attributes. In one embodiment, the signal 901 is derived from a feature control register (not shown) that reads a fuse array (not shown) configured at manufacture. The machine specific register 902 provides the state of the extended features field 903 to the translation controller 906. The translation controller 906 controls whether instructions retrieved from the instruction buffer 904 are interpreted according to extended translation rules or conventional translation rules. Providing such a control feature allows supervisory applications (e.g., BIOS) to enable/disable the extended execution features of the microprocessor. If extended features are disabled, instructions having the opcode state selected as the extended features tag are translated according to the conventional translation rules. In an x86 embodiment that selects opcode state F1H as the tag, encountering F1H under conventional translation rules results in an illegal instruction exception. With extended translation disabled, the instruction decoder 910 translates/decodes all provided instructions and configures all fields 913-917 of the micro instruction 912. Under extended translation rules, however, an occurrence of the tag is detected by the escape instruction detector 908. The escape instruction detector 908 directs the extended decoder 909 to translate/decode the extended prefix portion of the extended instruction according to the extended translation rules and to configure the opcode extension field 913 to indicate the memory trait to be applied to the memory access prescribed by the remainder of the extended instruction. The instruction decoder 910 translates/decodes the remainder of the extended instruction and configures the micro opcode field 914, source field 916, destination field 915, and displacement field 917 of the micro instruction 912. Certain instructions cause the control ROM 911 to be accessed to obtain corresponding micro instruction sequence templates. The configured micro instructions 912 are provided to a micro instruction queue (not shown) for subsequent execution by the processor.
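The enable/disable behavior of the translate stage can be sketched in Python as follows. This models control flow only and is not the hardware: the class MicroInstruction, the exception IllegalInstruction, and the function translate are hypothetical names, the micro opcode, source, destination, and displacement fields are collapsed into one undecoded byte string, and the boolean argument stands in for the extended features field 903.

```python
from dataclasses import dataclass
from typing import Optional

ESCAPE_TAG = 0xF1  # x86 embodiment: opcode state selected as the tag


class IllegalInstruction(Exception):
    """Models the exception raised for F1H under conventional rules."""


@dataclass
class MicroInstruction:
    opcode_extension: Optional[int]  # models field 913; None if no override
    remainder: bytes                 # stands in for fields 914-917, undecoded


def translate(code: bytes, extended_enabled: bool) -> MicroInstruction:
    """Translate one instruction under extended or conventional rules."""
    if code[:1] == bytes([ESCAPE_TAG]):
        if not extended_enabled:
            # Conventional translation rules: the tag is not accepted.
            raise IllegalInstruction("F1H with extended features disabled")
        if len(code) < 2:
            raise ValueError("escape tag with no extended prefix after it")
        # Escape detector fires; extended decoder fills the extension field.
        return MicroInstruction(opcode_extension=code[1], remainder=code[2:])
    # No tag: the ordinary instruction decoder handles everything.
    return MicroInstruction(opcode_extension=None, remainder=code)


print(translate(bytes([0xF1, 0x03, 0x8B, 0x03]), extended_enabled=True))
print(translate(bytes([0x8B, 0x03]), extended_enabled=False))
```

Because the gate is a single flag consulted at translation time, the same byte stream is either accepted as an extended instruction or rejected exactly as a legacy part would reject it.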

(0055) Referring now to FIG. 10, a block diagram is presented of the extended execution stage logic 1000 within the microprocessor of FIG. 5. The extended execution stage logic 1000 has extended access logic 1005 that is coupled to a cache 1007 and a bus unit 1008 via buses 1012 and 1013, respectively. The bus unit 1008 directs memory transactions over a memory bus (not shown). According to the present invention, the extended access logic 1005 receives microinstructions from an extended microinstruction buffer 1001 in the preceding stage of the microprocessor, two address operands from address buffers 1002 and 1003, and a destination operand from a destination operand buffer 1004. The extended access logic 1005 is also coupled to a plurality of memory characteristic descriptors 1006 that are configured according to the architectural conventions of the host microprocessor. The extended access logic 1005 includes an access controller 1009, a store buffer 1010, and a load buffer 1011. The load buffer 1011 outputs a source operand to a source operand buffer 1015.

(0056) In operation, the extended execution logic 1000 performs memory accesses, reading operands from memory and writing operands to memory, as directed by the microinstructions in the extended microinstruction buffer 1001. For a read/load operation, the access controller 1009 receives one or more memory addresses from the address buffers 1002 and 1003 and reads the memory characteristic descriptors 1006 to determine the memory attributes associated with the load. In an x86 embodiment, the memory characteristic descriptors 1006 include the x86 cache and paging control registers, page directory and page table entries, memory type range registers (MTRRs), the page attribute table (PAT), and the external signal pins KEN#, WB/WT#, PCD, and PWT. The access controller 1009 uses the information obtained from these sources 1006 to determine the default memory attributes for the load according to x86 hierarchical memory attribute conventions. For non-x86 embodiments, the access controller 1009 uses the information obtained from the memory characteristic descriptors 1006 to determine the default memory attributes for the load according to the hierarchical memory attribute conventions of the host microprocessor's particular architecture. The memory addresses, together with the attributes for the corresponding access, are provided to the load buffer 1011. According to the attributes provided, the load buffer 1011 obtains the source operand either from the cache 1007 via bus 1012 or directly from system memory (not shown) via the bus unit 1008. The obtained source operand is provided to the source operand buffer 1015 in synchronization with a pipeline clock signal (not shown). The extended microinstruction is likewise piped, in synchronization with the pipeline clock signal, to the extended microinstruction register 1014. The source operand is thereby provided to the next stage of the microprocessor.

(0057) For a write/store operation directed by an extended microinstruction, the access controller 1009 receives the address information for the operation from the address buffers 1002 and 1003 and the operand to be stored from the buffer 1004. The access controller 1009 accesses the memory characteristic descriptors 1006 as described above to determine the memory characteristic that corresponds to the store. The memory characteristic, the address information, and the destination operand are provided to the store buffer 1010. According to the particular attributes provided, the store buffer 1010 writes the destination operand either to the cache 1007 via bus 1012 or directly to system memory via the bus unit 1008.
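The attribute-dependent routing of loads and stores described in paragraphs (0056) and (0057) can be sketched as a toy model. The class name `ExtendedAccess`, the dictionary-backed cache and memory, and the restriction to three attributes ("UC" for non-cacheable, "WT" for write through, "WB" for write back) are assumptions made for illustration; eviction, ordering, and the remaining attributes are not modeled.

```python
class ExtendedAccess:
    """Toy model of how store buffer 1010 and load buffer 1011 route accesses."""

    def __init__(self):
        self.cache = {}   # stands in for cache 1007 (reached via bus 1012)
        self.memory = {}  # stands in for system memory behind bus unit 1008

    def load(self, addr, attribute):
        if attribute == "UC":
            # Non-cacheable: read system memory directly, leave the cache alone.
            return self.memory.get(addr, 0)
        if addr not in self.cache:
            # Cacheable miss: fill the cache from system memory.
            self.cache[addr] = self.memory.get(addr, 0)
        return self.cache[addr]

    def store(self, addr, value, attribute):
        if attribute == "UC":
            # Simplification: drop any cached copy so the model stays coherent.
            self.cache.pop(addr, None)
            self.memory[addr] = value
        elif attribute == "WT":
            # Write through: update the cache and system memory together.
            self.cache[addr] = value
            self.memory[addr] = value
        elif attribute == "WB":
            # Write back: update the cache only; later write-back is not modeled.
            self.cache[addr] = value
        else:
            raise ValueError(f"attribute not modeled: {attribute}")
```

In this sketch a "WB" store leaves system memory untouched, a "WT" store updates both copies, and a "UC" store or load never allocates a cache entry.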

(0058) The store buffer 1010 and the load buffer 1011 according to the present invention are configured to perform store and load accesses in accordance with the processing requirements of the host processor's memory attribute model, where those requirements include strong/weak ordering conventions (e.g., speculative execution rules) and cache access policies. In one embodiment, load and store operations are performed in different pipeline stages of the host microprocessor.

(0059) For extended instructions that employ selective memory attribute override prefixes, the overriding memory characteristic for the associated memory access (i.e., a load, a store, or both) is provided to the access controller 1009 via the opcode extension field (not shown) of the extended microinstruction within the extended microinstruction buffer 1001. As described above, the access controller 1009 determines the default memory characteristic for the specified memory access from the information obtained from the memory characteristic descriptors 1006. If the specified overriding characteristic is stronger than the corresponding default characteristic, the access controller 1009 provides the overriding characteristic, along with the address and/or destination operand described above, to the store buffer 1010/load buffer 1011. If the specified overriding characteristic is weaker than the corresponding default characteristic, the access controller 1009 provides the default characteristic, along with the address and/or destination operand, to the store buffer 1010/load buffer 1011. Selective memory override is therefore performed only to strengthen a memory characteristic, in accordance with the particular architecture employed. For example, in the x86 architecture, the non-cacheable characteristic of a memory access cannot be weakened to write back. Conversely, a write-back characteristic cannot be strengthened to non-cacheable. The characteristic to be used for a memory access is assigned on a per-cache-line basis, and in many modern desktop/laptop microprocessor architectures a cache line is 32 bytes.
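The selection rule of paragraph (0059) can be sketched as follows. Because the text gives no complete strength ordering, the sketch represents "stronger than" as an explicit set of (default, override) pairs; the single pair listed, the attribute abbreviations, and the name `effective_attribute` are hypothetical and serve only to make the example runnable.

```python
# Hypothetical table: (default, override) pairs in which the override is
# treated as stronger than the default. A real design would derive this
# from the host architecture's memory model.
STRONGER_OVERRIDES = frozenset({("WB", "WT")})


def effective_attribute(default, override, stronger=STRONGER_OVERRIDES):
    """Return the characteristic the access controller forwards to the buffers.

    `override` is the value carried in the opcode extension field, or None
    when the instruction has no extended prefix.
    """
    if override is not None and (default, override) in stronger:
        return override  # override strengthens the default: apply it
    return default       # otherwise the default characteristic is kept
```

Under this table, `effective_attribute("UC", "WB")` keeps the non-cacheable default, matching the x86 example in the text, while `effective_attribute("WB", "WT")` applies the override.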

(0060) Referring now to FIG. 11, a flow chart is presented depicting a method according to the present invention for translating and executing instructions that allow a programmer to override default memory attributes within a microprocessor at the instruction level. Flow begins at block 1102, where a program configured with extended feature instructions is provided to the microprocessor. Flow then proceeds to block 1104.

(0061) In block 1104, the next instruction is fetched from cache/external memory. Flow then proceeds to decision block 1106.

(0062) In decision block 1106, the next instruction fetched in block 1104 is evaluated to determine whether it contains an extended escape code according to the present invention. In an x86 embodiment, the evaluation detects opcode value F1 (ICE BKPT). If the extended escape code is detected, flow proceeds to block 1108. If the extended escape code is not detected, flow proceeds to block 1112.

(0063) In block 1108, the extended prefix portion of the extended instruction is decoded/translated to determine a memory attribute that is specified to override the default memory attribute for the associated memory access specified by the next instruction. Flow then proceeds to block 1110.

(0064) In block 1110, the memory attribute for the associated memory access is configured in an extension field of a corresponding microinstruction sequence. Flow then proceeds to block 1112.

(0065) In block 1112, all remaining parts of the instruction are decoded/translated to determine the specified memory access, the locations of register operands, memory address specifiers, and the use of existing architectural features specified by prefixes according to the existing microprocessor instruction set. Flow then proceeds to block 1114.

(0066) In block 1114, a microinstruction sequence is configured to specify the prescribed memory reference along with its corresponding opcode extension. Flow then proceeds to block 1116.

(0067) In block 1116, the microinstruction sequence is provided to a microinstruction queue for execution by the microprocessor. Flow then proceeds to block 1118.

(0068) In block 1118, the microinstruction sequence is retrieved by address logic according to the present invention. The address logic generates the address for the memory access and provides it to the extended execution logic. Flow then proceeds to block 1120.

(0069) In block 1120, the extended execution logic employs the memory characteristic description mechanisms of the microprocessor architecture to determine a default memory characteristic. Flow then proceeds to decision block 1122.

(0070) In decision block 1122, an evaluation is made to determine whether the cache/memory model of the microprocessor architecture allows the specified memory attribute to override the default attribute. If the override is allowed, flow proceeds to block 1124. If the override is not allowed, flow proceeds to block 1126.

(0071) In block 1124, the memory access is performed using the overriding memory attribute specified by the extended prefix field decoded in block 1108. Flow then proceeds to block 1128.

(0072) In block 1126, the memory access is performed using the default memory attribute determined in block 1120. Flow then proceeds to block 1128.

(0073) In block 1128, the method completes.
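The branch structure of blocks 1102 through 1128 can be summarized by a short sketch that returns the sequence of blocks visited for one instruction. The function name `flow_trace` and its two boolean inputs are hypothetical; the sketch also assumes that an instruction without an escape code, having no override to apply, takes the default path through block 1126.

```python
def flow_trace(has_escape_code: bool, override_permitted: bool) -> list:
    """Return the FIG. 11 block numbers visited for a single instruction."""
    blocks = [1102, 1104, 1106]
    if has_escape_code:
        # Decode the extended prefix and record the attribute in the
        # extension field of the microinstruction sequence.
        blocks += [1108, 1110]
    # Remaining decode, microinstruction configuration, queueing, address
    # generation, default determination, and the override decision.
    blocks += [1112, 1114, 1116, 1118, 1120, 1122]
    # Assumption: with no escape code there is nothing to override.
    if has_escape_code and override_permitted:
        blocks.append(1124)  # access performed with the overriding attribute
    else:
        blocks.append(1126)  # access performed with the default attribute
    blocks.append(1128)
    return blocks
```

Calling `flow_trace(True, False)` follows the prefix-decoding blocks but still ends at block 1126, which corresponds to an override that the architecture's cache/memory model does not allow.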

(0074) Although the present invention and its objects, features, and advantages have been described in detail, other embodiments are encompassed by the invention as well. For example, the present invention has been described in terms of a technique that employs a single, unused opcode state within a completely full instruction set architecture as a tag to indicate that an extended feature prefix follows. The scope of the present invention, however, is not limited in any sense to full instruction set architectures, unused instructions, or single tags. On the contrary, the present invention comprehends instruction sets that are not entirely mapped, embodiments having used opcodes, and embodiments that employ more than one instruction tag. For example, consider an instruction set architecture in which there are no unused opcode states. One embodiment of the present invention comprises selecting an opcode state to serve as the escape tag, where the selection criteria are determined according to market-driven factors. Another embodiment comprises employing a peculiar combination of opcodes as the tag, such as successive occurrences of opcode state 7FH. The essence of the present invention thus lies in the use of a tag sequence followed by an n-bit extended prefix, which allows a programmer to specify, at the instruction level, memory attributes for memory accesses that are not otherwise provided for by existing instructions of the microprocessor instruction set.
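The generalization in paragraph (0074), a tag sequence followed by an n-bit extended prefix, can be sketched as a small helper that works for a single-opcode tag and for a multi-opcode tag alike. The name `split_tagged`, the whole-byte prefix length, and the choice of two consecutive 7FH bytes as the example sequence are assumptions; the text does not say how many occurrences form the tag.

```python
def split_tagged(instr: bytes, tag: bytes, prefix_len: int = 1):
    """Split an instruction into (extended prefix, remainder).

    Returns (None, instr) when `instr` does not begin with the tag sequence
    or is too short to hold the prefix. `prefix_len` is counted in whole
    bytes, which is a simplification of the n-bit prefix.
    """
    if instr.startswith(tag) and len(instr) >= len(tag) + prefix_len:
        start = len(tag)
        return instr[start:start + prefix_len], instr[start + prefix_len:]
    return None, instr
```

For instance, `split_tagged(b"\x7f\x7f\x02\x8b", b"\x7f\x7f")` treats the two 7FH bytes as the tag, and `split_tagged(b"\xf1\x02\x8b", b"\xf1")` covers the single-tag case.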

(0075) In addition, although a microprocessor setting has been employed above to describe the present invention and its objects, features, and advantages, one skilled in the art will appreciate that the scope of the invention is not limited to microprocessor architectures and extends to all forms of programmable devices, such as signal processors, industrial controllers, array processors, and the like.

In summary, the foregoing describes only preferred embodiments of the present invention and should not be taken to limit the scope in which the invention may be practiced. All equivalent changes and modifications made in accordance with the claims of the present invention remain within the scope covered by this patent.

Claims

1. A device for providing instruction-level control of memory attributes in a microprocessor, characterized in that it comprises:

a translator for translating an extended instruction into a microinstruction sequence, wherein the extended instruction includes an extended prefix and an extended prefix tag; wherein the extended prefix specifies a memory characteristic for a memory reference prescribed by the extended instruction, wherein the memory characteristic for the memory reference cannot be specified by an existing instruction of an existing instruction set, the extended prefix including an attribute field that specifies the memory characteristic, wherein the memory characteristic comprises one of a plurality of memory attributes; and the extended prefix tag indicates the extended prefix, wherein the extended prefix tag is an otherwise architecturally specified opcode within the existing instruction set; and

extended execution logic, coupled to the translator, for receiving the microinstruction sequence and applying the memory characteristic to execute the memory reference;

wherein the translator includes: an escape instruction detector for detecting the extended prefix tag; an instruction decoder for determining an operation to be performed, wherein the operation includes the memory reference; and an extended decoder, coupled to the escape instruction detector and the instruction decoder, for determining the memory characteristic and specifying the memory characteristic within the microinstruction sequence.

2. The apparatus of claim 1, wherein the extended instruction further comprises instruction entries of the existing instruction set.

3. The apparatus of claim 2, wherein the instruction entries specify an operation to be performed by the microprocessor, and wherein the operation includes the memory reference.

4. The apparatus of claim 1, wherein the memory reference comprises an operand load operation, an operand store operation or both.

5. The apparatus of claim 1, wherein the memory characteristic specifies how a cache is used when the memory reference is executed.

6. The apparatus of claim 1, wherein the memory characteristic specifies how the memory reference is ordered relative to other memory references.

7. The apparatus of claim 1, wherein the extended prefix directs the microprocessor to override a default memory characteristic when performing the memory reference.

8. The apparatus of claim 1, wherein the plurality of memory attributes include non-cacheable, write combining, write through, write back, and write protected.

9. A microprocessor device extending an existing instruction set to provide selective control of memory characteristics, comprising:

a translator configured to receive an extended instruction and generate a microinstruction sequence directing a microprocessor to perform a memory access, the extended instruction being configured to specify a memory attribute for the memory access, wherein the extended instruction includes a selected opcode of the existing instruction set followed by an n-bit extended prefix, the selected opcode identifies the extended instruction, and the n-bit extended prefix indicates the memory attribute, wherein the memory attribute for the memory access cannot otherwise be specified by an instruction of the existing instruction set; wherein the n-bit extended prefix includes a memory attribute field configured to specify the memory attribute, wherein the memory attribute comprises one of a plurality of memory access characteristics, and wherein the memory access is to be performed according to the memory attribute; wherein the translator includes: an escape instruction detector for detecting the selected opcode within the extended instruction; an instruction decoder for decoding the remainder of the extended instruction to determine the memory access; and an extended decoder, coupled to the escape instruction detector and the instruction decoder, for decoding the n-bit extended prefix and specifying the memory attribute within the microinstruction sequence.

10. The microprocessor device as claimed in claim 9, wherein said extended instruction further comprises:

remaining instruction entries configured to specify the memory access, wherein the memory attribute is used to override a default memory attribute when the memory access is executed.

11. The microprocessor device of claim 9, wherein the memory access characteristics include non-cacheable, write combining, write through, write back, and write protected.

12. A device for adding instruction-level memory characteristic control features to an existing instruction set, characterized in that it comprises:

a translator that receives and translates an instruction including an escape tag and a set of accompanying parts, wherein the escape tag is an opcode within the existing instruction set, and the accompanying parts include a memory attribute specifier for designating one of a plurality of memory attributes for a memory access; and

extended execution logic, coupled to the translator, that executes the memory access using the specified memory characteristic, wherein the existing instruction set specifies only a default memory characteristic for the memory access, and wherein the extended execution logic uses the specified memory characteristic in place of the default memory characteristic;

wherein the translator includes: an escape tag detector for detecting the escape tag and indicating that the accompanying parts are to be translated according to extended translation conventions; and a decoder, coupled to the escape tag detector, for translating instructions according to the conventions of the existing instruction set and for translating the instruction according to the extended translation conventions, so as to enable execution of the memory access according to the specified memory characteristic.

13. The apparatus of claim 12, wherein the translator translates the escape tag and the accompanying parts into corresponding microinstructions, and the corresponding microinstructions direct the extended execution logic to perform the memory access according to the specified memory characteristic.

14. The apparatus of claim 12, wherein the memory characteristics include non-cacheable, write combining, write through, write back, and write protected.

15. A method for extending an existing instruction set architecture to provide instruction-level selective memory attribute control, characterized in that the method comprises:

providing an extended instruction that includes an extended instruction tag, an extended prefix, and a remainder, wherein the extended instruction tag is a first opcode entry of the existing instruction set architecture;

specifying, by the extended prefix, a memory attribute to be applied to a corresponding memory access, wherein the memory access is specified by the remainder of the extended instruction, and the extended prefix includes an attribute field for specifying the memory characteristic, wherein the memory characteristic comprises one of a plurality of memory attributes; and

applying the memory attribute to perform the memory access, wherein the applying overrides a default memory attribute of the memory access with the memory attribute.

16. The method of claim 15, wherein the act of specifying a memory attribute to be applied to a corresponding memory access comprises:

first specifying the memory access in the remainder of the extended instruction, wherein the first specifying includes employing a second opcode entry of the existing instruction set architecture.

17. The method of claim 15, further comprising, after the act of specifying a memory attribute to be applied to a corresponding memory access and before the act of performing the memory access:

translating the extended instruction into a microinstruction sequence that directs extended execution logic to perform the memory access according to the memory attribute.

18. The method according to claim 17, wherein the action of translating the extended instruction comprises:

within a translator, detecting the extended instruction tag; and

decoding the extended prefix and the remainder of the extended instruction according to extended translation rules to generate the microinstruction sequence.

19. The method of claim 15, wherein the act of specifying a memory attribute to be applied to a corresponding memory access comprises:

specifying one of the following memory attributes as the memory attribute that overrides the default memory attribute: non-cacheable, write combining, write through, write back, and write protected.